Service Function Chaining (SFC) Operations, Administration, and Maintenance (OAM) Framework
Note: This ballot was opened for revision 13 and is now closed.
Martin Vigoureux Yes
Alvaro Retana No Objection
Comment (2020-05-07 for -13)
I support Murray's DISCUSS. I appreciate the fact that "More specifics on the mechanism to characterize SF-specific OAM to validate the service offering are outside the scope of this document." (§3.1.1) The issue I want to point out, which I believe is a significant omission in this document, is the lack of mention in §4 and §5 of the validation of the service offering as an SFC OAM function or in the gap analysis. IOW, the availability of the SF from the point of view of its ability to provide the service is pointed out as important in §3.1.1, but there is no further consideration later in the document. [I believe that this issue borders on a DISCUSS -- and while I would really like to see further consideration in the text, I decided to trust that the authors and the responsible AD will take care of it.]
Benjamin Kaduk No Objection
Comment (2020-05-06 for -13)
Section 2
I'm not sure that I understand why a node in the underlay network (the "VM2" one) doesn't line up with a node in the link layer, in Figure 1.

Section 3
   classifiers, controllers, other service nodes). Testing an SF may not be restricted to connectivity to the SF, but also whether
nit(?) perhaps "may be more expansive than just checking connectivity to the SF", to avoid questions about what the "not" binds to.
   3. Classifier component: OAM functions applicable at this component includes testing the validity of the classification rules and detecting any incoherence among the rules installed in different classifiers.
It seems important to include both positive and negative tests of classification functionality (i.e., that traffic that should not match in fact does not match). Section 3.3 might be a good place to mention this.

Section 3.1.1
   service function is available?. On one end of the spectrum, one might argue that an SF is sufficiently available if the service node (physical or virtual) hosting the SF is available and is functional. On the other end of the spectrum, one might argue that the SF's availability can only be concluded if the packet, after passing
I agree with the other reviewers that the first "end" of the spectrum seems surprising.
   firewall). The cost of this approach is that the OAM mechanism for some SF will need to be continuously modified in order to "keep up" with new functionality being introduced: lack of extendibility.
nit: the grammar doesn't seem right around "lack of extendibility" (also, is "extendibility" preferred or "extensibility"?).
   The SF availability can be performed using a generalized approach (i.e., an adequate granularity to provide a basic SF service). More
nit: I think it's an availability *check* that can be performed.

Section 3.2.2
Mandating the ability to measure every arbitrary segment of SFs within an SFC seems like it might be over-constraining.

Section 4
   to perform OAM functionality at different layers.
   In order to apply such OAM functions at the service layer, they need to be enhanced to operate a single SF/SFF to multiple SFs/SFFs in an SFC and also in multiple SFCs.
I don't understand what "operate a single SFF to multiple SFs/SFFs" means.

Section 4.1
   o Verify any packet re-ordering and corruption.
nit: this wording doesn't really make sense. Do we want to verify the absence of such things, or that it is within configured tolerances, or something else? Just noting any occurrences and verifying that the noted occurrences are as noted doesn't seem useful...
   o Verify the policy of an SFC or SF.
nit: similarly, is this to verify the configuration, or to verify that operation matches the expected configuration?

Section 4.2
   Continuity is a model where OAM messages are sent periodically to validate or verify the reachability to a given SF within an SFC or
nit: while it's "connectivity to", I think it's "reachability of".
   o Notifying the detected failures to other OAM functions or applications to take appropriate action.
nit: the subject of a notification is the entity receiving notification, not the content of the notification. So "notifying other OAM functions or applications of the detected failures so they can take appropriate action", or "Sending notifications of the detected failures to other [...]".

Section 4.4
   delay [RFC7679] is important. In order to measure one-way delay, time synchronization MUST be supported by means such as NTP, PTP, GPS, etc.
I think (informational) references are in order for these. (PTP is not listed as "well-known" at https://www.rfc-editor.org/materials/abbrev.expansion.txt , though the other two are.)
   One-way delay variation [RFC3393] could also be calculated by sending OAM packets and measuring the jitter between the packets passing through an SFC.
Looking at jitter between (measurement) packets to ascertain delay variation seems to require foreknowledge of the (e.g., periodic) pacing of the initial packet transmission.
If, on the other hand, the idea is to look at the jitter across the measured delay of each packet, then that works fine (but that's not what the current text says).
   o Ability to measure the packet loss [RFC7680] within an SF or an SFP bound to a given SFC.
nit(?): packet loss "within an SF" (as opposed to between two SFs) is not something I would have expected to need measuring, on first thought. Though on further reflection it is less surprising; still, I wanted to check that this is indeed as intended.

Section 5.2
   As shown in Table 3, there are no standards-based tools available for the verification of SFs and SFCs.
Some note about "at the time of this writing" or similar seems advised; otherwise this statement is unlikely to age well.

Section 6.2
   An SFF may choose not to forward the OAM packet to an SF if the SF does not support OAM or if the policy does not allow to forward OAM packet to an SF. The SFF may choose to skip the SF, modify the header and forward to next SFC node in the chain. It should be noted that skipping an SF might have implication on some OAM functions (e.g. the delay measurement may not be accurate). The method by
This behavior was initially surprising to me, and "might have implication on" feels like weaker text than is merited. While I can perhaps imagine that not forwarding an OAM packet to an SF that will choke on it instead of doing something useful, it seems like it's rather divergent from the OAM expectations to silently bypass a given SF and is quite likely to affect the accuracy of the resulting OAM results.
   process OAM packets is outside the scope of this document. It could be a configuration parameter instructed by the controller or it can be done by dynamic negotiation between the SF and SFF.
(Is there an existing mechanism for dynamic negotiation between SF and SFF?)

Section 6.3
   As described in Section 4, there are different OAM functions that may require different OAM solutions.
   While the presence of the OAM marker in the overlay header (e.g., O bit in the NSH header) indicates it as OAM packet, it is not sufficient to indicate what OAM function the packet is intended for. The Next Protocol field in NSH header may be used to indicate what OAM function is intended to or what toolset is used.
Elsewhere in the document we make reference to what would be required of a non-NSH encapsulation header; is it appropriate to also do so here?

Section 6.4.2
   BFD or S-BFD could be leveraged to perform continuity function for SF or SFC. An initiator could generate a BFD control packet and set the "Your Discriminator" value as last SFF in the control packet. Upon
nit: I think this "your discriminator" would be the address or identifier of the last SFF, not just "the last SFF" itself, right? Or be set "to indicate" the last SFF, or similar. (Also occurs a few sentences later.)
   with relevant DIAG code. The TTL field in the NSH header could be used to perform partial SFC availability. For example, the initiator
nit: availability checks/checking

Section 6.4.4
   [I-D.penno-sfc-trace] defines a protocol that checks for path liveliness and traces the service hops in any SFP. Section 3 of [I-D.penno-sfc-trace] defines the SFC trace packet format while Sections 4 and 5 of [I-D.penno-sfc-trace] defines the behavior of SF and SFF respectively. While [I-D.penno-sfc-trace] has expired, the proposal is implemented in Open Daylight and available.
Why is draft-penno-sfc-trace not progressing towards publication?

Section 7
I think this table really needs more lead-in to what it's communicating.
   As depicted in Table 4, information and data models are needed for configuration, manageability and orchestration for SFC. With
I don't see where that is actually indicated by the table.

Section 8
OAM information (though most usefully as summary statistics), if leaked, could also be used by an attacker to gauge the efficacy of an ongoing attack.
   Any security consideration defined in [RFC7665] and [RFC8300] are applicable for this document.
The rest of the document implies that NSH is not mandatory, so I'd suggest rewording this reference to clarify what from RFC 8300 is (or is not) applicable to the whole of the document.
   The mapping and the rules information at the classifier component may reveal the traffic rules and the traffic mapped to the SFC. The SFC information collected at an SFC component may reveal the SF associated within each chain and this information together with
nit: s/the SF/the SFs/
   To address the above concerns, SFC and SF OAM may provide mechanism for:
(1) The crucial missing word that Roman notes is indeed crucial! (2) Can we say something stronger than "may"?
   The documents proposing the OAM solution for any service layer components should consider some form of message filtering to prevent leaking any internal service layer information outside the administrative domain.
"should consider" is fairly weak guidance; would "should provide" be appropriate? Also, is it worth mentioning that this filtering would include dropping OAM-marked messages from outside the domain (at least by default)?

Section 12.1
The only citation to RFC 8300 that appears to require a normative reference is the bit in the security considerations that I noted was unclear about its scope of applicability.

Section 12.2
RFC 6291 is a BCP, so we should probably cite it as such. Also, given its content, perhaps it ought to be normative?
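The Section 4.4 point about delay variation can be illustrated with a minimal Python sketch. It assumes synchronized sender and receiver clocks and computes variation as the difference between the one-way delays of consecutive packets, in the spirit of RFC 3393. The OamSample type, its field names, and the function names are hypothetical and do not come from the draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OamSample:
    """One OAM measurement packet (hypothetical structure, not from the draft)."""
    seq: int
    tx_time: float  # send timestamp in seconds, sender clock
    rx_time: float  # receive timestamp in seconds, receiver clock (assumed synchronized)


def one_way_delays(samples):
    """Return the one-way delay of each packet, ordered by sequence number."""
    ordered = sorted(samples, key=lambda s: s.seq)
    return [s.rx_time - s.tx_time for s in ordered]


def delay_variation(samples):
    """Difference between the one-way delays of consecutive packets.

    Only per-packet measured delay is used, so the sender's pacing
    does not need to be known in advance.
    """
    delays = one_way_delays(samples)
    return [later - earlier for earlier, later in zip(delays, delays[1:])]


def arrival_spacing(samples):
    """Gaps between receive times; this mixes path jitter with sender pacing."""
    ordered = sorted(samples, key=lambda s: s.seq)
    return [b.rx_time - a.rx_time for a, b in zip(ordered, ordered[1:])]


# Unevenly paced probes: sent at t = 0, 1, 3 with delays 0.5, 0.5, 0.75.
probes = [
    OamSample(seq=1, tx_time=0.0, rx_time=0.5),
    OamSample(seq=2, tx_time=1.0, rx_time=1.5),
    OamSample(seq=3, tx_time=3.0, rx_time=3.75),
]
print(delay_variation(probes))   # per-packet delay differences
print(arrival_spacing(probes))   # inter-arrival gaps, dominated by send pacing
```

With uneven pacing, the inter-arrival gaps mostly reflect when the probes were sent, while the per-packet delay differences isolate the variation introduced along the path. That is the distinction the comment draws.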
Erik Kline No Objection
Comment (2020-05-06 for -13)
[[ nits ]]
Several minor grammar tweaks, which others have identified, and hopefully can be fixed in these final stages of publication. A few of them I did mention below.

[ section 2 ]
* The first paragraph describing Figure 1 has a few grammar bugs. May I suggest something to the effect of:
   In Figure 1, the service layer elements such as classifier and SF are depicted as virtual machines that are interconnected using an overlay network. The underlay network may comprise multiple intermediate nodes not shown in the figure that provide underlay connectivity between the service layer elements.

[ section 3.1.1 ]
* s/the got/the/
* The 2nd to last paragraph could use a simplifying rewrite. The conversational tone I think impedes the direct, clear transmission of intended meaning.
* "The SF availability can be performed": does this refer to the *measurement* of SF availability?

[ section 3.2.1 ]
* s/comprised of/composed of/

[ section 7 ]
* Do 6.4.1 and/or 6.4.4 suffice to perform topology exploration of the SFC?
Martin Duke No Objection
Comment (2020-05-04 for -13)
Please address the issues in the (detailed!) Tsvarea review, in particular the bits from section 4 that incorrectly describe the work coming out of IPPM.

Additional nits:
Sec 3.1.1: s/extendibility/extensibility
Sec 3.1.2: there are also “hybrid” methods like IOAM that do not fit the active and passive definitions neatly.
4.1 s/packet to be of variable length packet size/packet to be of variable length
4.1 you are probably not trying to “verify” packet reordering and corruption! I suggest “detect” instead.
6.3 s/is intended to/is intended
6.3 s/in NSH header/in the NSH header
6.4.1 s/describes/describe
6.4.1 s/incrementing the ttl/incrementing ttl
6.4.3 s/using NSH header/using the NSH header
8. s/Any security consideration/Any security considerations
8. Missing period at end of second paragraph
8. s/mechanism/mechanisms
Murray Kucherawy (was Discuss) No Objection
Comment (2020-05-07 for -13)
Thank you for engaging my DISCUSS question about Section 3.1.1. I don't feel it was fully resolved, but I also don't feel there's anything to be gained by pressing my point further.

My original comments:

I think this is pretty well done. I had little trouble following it and this is my first foray into the realm of SFC.

I find myself almost suggesting that you don't need the BCP 14 boilerplate or its language at all in this document. You barely use it, and it might not even be needed in the places you do have it especially since this is a framework document and not a protocol document.

Finally, lots of editorial stuff:

Section 2:
* “In Figure 1, the service layer element such as …” -- s/element/elements/
* “The underlay network may comprise of …” -- s/comprise/consist/, or alternatively, s/comprise of/comprise/
* “... nodes but not shown …” -- remove “but”, or s/but/but these/

Section 3, all three numbered list entries:
* “... applicable at this component includes …” -- s/includes/include/

Section 3.1.1:
OLD: "SF availability is an aspect that raises an interesting question -- How to determine that a service function is available?."
NEW: "SF availability is an aspect that raises an interesting question: How does one determine that a service function is available?"
* “... the packet did indeed get the got expected service.” -- remove “got”
* This section uses both “a SF” and “an SF”. I don’t know which one is grammatically correct, but we should find out and use that. Or, if both are, pick one and be consistent.

Section 3.2.1:
* “An SFC could be comprised of …” -- either remove “of”, or use “composed of”

Section 3.3:
* Either capitalize “Classifier” generally, or not at all.

Section 3.4:
* What constitutes an availability check of the underlay network? You discussed this in 3.1.1, but not here. If this is covered in Section 4.1, a forward reference would be helpful here.

Section 3.5:
* “... and are mostly transparent …” -- s/are/is/; “network” is singular
* What constitutes an availability check of the overlay network?

Section 4:
* “...for more than one SFC components.” -- use “component”, or change “more than one” to “multiple”

Section 4.1:
* “Verify any packet re-ordering and corruption.” -- perhaps “detect” would be better than “verify” here

Section 4.2:
* “Ability to provision continuity check …” -- put an “a” before “continuity”
* “... supported by continuity function are as follows:” -- s/function/functions/

Section 4.3:
* “... from every transit devices …” -- s/devices/device/
* “Ability to trigger action from …” -- s/action/an action/

Section 5.1:
* I would quote “ping” and “trace”.
* “Table 3 below is not exhaustive” -- needs a period at the end
* Table 3 needs a bit of tidying in terms of alignment.

Section 6.1:
* “... network layer must mark the packet …” -- why isn’t that a MUST?

Section 6.2:
* “... skipping an SF might have implication …” -- s/implication/implications/
* “Any SFC-aware node that initiates an OAM packet must set …” -- why isn’t that a MUST?

Section 6.3:
* “... indicates it as OAM packet …” -- s/as/as an/

Section 6.4.1:
* “[RFC0792] and [RFC4443] describes …” -- s/describes/describe/
* “... verify the availability of SF or SFC.” -- s/SF/an SF/
* “... can generate ICMP echo …” -- s/ICMP/an ICMP/
* “... from last SFF and thereby …” -- add comma after “SFF”
* “Alternately, … Alernatively, … ” -- use “Alternatively” for the first one, and “or” for the second, all in one sentence

Section 6.4.2:
* “[RFC5880] defines Bidirectional …” -- s/defines/defines the/
* “... to perform continuity function …” -- “functions”, or “the continuity function”
* “... value as last SFF …” -- s/last/the last/
* “... with relevant DIAG code.” -- s/with/with the/ (or “a”)

Section 6.4.3:
* “... transported using NSH header.” -- s/using/using the/

Section 6.4.4:
* “... is implemented in Open Daylight and available.” -- s/and/and is/

Section 7:
* Table 4 needs some work on spacing consistency.

Section 8:
* “... are applicable for this document.” -- s/are/is/, as “consideration” is singular
* “The OAM information from service layer …” -- s/from/from the/
* “... service function paths etc.” -- s/paths etc./paths, etc./
* “... information from SFC layer raises a need for careful security considerations” -- s/from/from the/, and the sentence is missing a period
* “... SFC and SF OAM may provide mechanism …” -- s/mechanism/mechanisms/
* “... the OAM solution for SF component should …” -- s/component/components/
Robert Wilton No Objection
Comment (2020-05-07 for -13)
I found this document to be pretty easy to read and understand, so thank you for your work in this area. I have a few comments, that may have already been raised by other reviewers:

2. SFC Layering Model
   While Figure 1 depicts an example where SFs are enabled as virtual entities, the SFC architecture does not make any assumptions on how the SFC data plane elements are deployed. The SFC architecture is flexible and accommodates physical or virtual entity deployment. SFC OAM accounts for this flexibility and accordingly it is applicable whether SFC data plane elements are deployed directly on physical hardware, as one or more Virtual Machines, or any combination thereof.
Would "SF data plane elements" be more clear than "SFC data plane elements"?

3. SFC OAM Components
   3. Classifier component: OAM functions applicable at this component includes testing the validity of the classification rules and detecting any incoherence among the rules installed in different classifiers.
It was not entirely clear to me what is meant by different classifiers, so possibly this could be elaborated slightly.

4.3. Trace Functions
"TTL" is used in various places. Does that need to be listed in the acronyms?

6.2. OAM Packet Processing and Forwarding Semantic
   Upon receiving an OAM packet, SFC-aware SFs may choose to discard the packet if it does not support OAM functionality or if the local policy prevents them from processing the OAM packet. When an SF supports OAM functionality, it is desirable to process the packet and provide an appropriate response to allow end-to-end verification. To limit performance impact due to OAM, SFC-aware SFs should rate limit the number of OAM packets processed.
Doesn't this mean that SFC is potentially altering the thing being measured? Wouldn't it be better instead to rate limit the number of OAM packets that are being generated in the first place?

6.1. SFC OAM Packet Marker
   The SFC OAM function described in Section 4 performed at the service layer or overlay network layer must mark the packet as an OAM packet so that relevant nodes can differentiate an OAM packet from data packets. The base header defined in Section 2.2 of [RFC8300] assigns a bit to indicate OAM packets. When NSH encapsulation is used at the service layer, the O bit must be set to differentiate the OAM packet. Any other overlay encapsulations used at the service layer must have a way to mark the packet as OAM packet.
"must be set" => "MUST be set" & "must have a way" => "MUST have a way"? But I question whether these should be musts at all. E.g. by setting an OAM bit you allow the intermediate functions to potentially modify their behaviour, making it harder to know that the thing under test isn't changing its behaviour because it is being tested. E.g. could another choice be to use some reserved address space to simulate flows without requiring the packets to be explicitly marked?

7. Manageability Considerations

   Table 4: OAM Tool GAP Analysis

   +----------------+--------------+-------------+--------+-------------+
   | Layer          |Configuration |Orchestration|Topology|Notification |
   +----------------+--------------+-------------+--------+-------------+
   | Underlay N/w   |CLI, NETCONF  | CLI, NETCONF|SNMP    |SNMP, Syslog,|
   |                |              |             |        |NETCONF      |
   +----------------+--------------+-------------+--------+-------------+
   | Overlay N/w    |CLI, NETCONF  | CLI, NETCONF|SNMP    |SNMP, Syslog |
   |                |              |             |        |NETCONF      |
   +----------------+--------------+-------------+--------+-------------+
   | Classifier     | CLI, NETCONF | CLI, NETCONF| None   | None        |
   +----------------+--------------+-------------+--------+-------------+
   | SF             |CLI, NETCONF  | CLI, NETCONF| None   | None        |
   +----------------+--------------+-------------+--------+-------------+
   | SFC            |CLI, NETCONF  | CLI, NETCONF| None   | None        |
   +----------------+--------------+-------------+--------+-------------+

It would probably be useful for YANG to be listed here as well under configuration and Orchestration. RESTCONF or gNMI could potentially also be listed, although I note that this table is not intended to be exhaustive.

There is also a base YANG topology model, RFC 8345, and other extensions being defined, at least for Overlay and Underlay networks. Would they be appropriate for the topology column?

Regards,
Rob
Roman Danyliw No Objection
Comment (2020-05-06 for -13)
** Section 1. Per “OAM controllers are assumed to be within the same administrative domain as the target SFC enabled domain”, is there a circumstance where the OAM controller would be working across administrative domains? Can this statement be made stronger – “OAM controller MUST be within the same administrative …”

** Section 2. Example technologies for the underlay (i.e., IP MPLS) and link (i.e., POS DWDM) layers are provided. It might help readability to provide such examples for the overlay network layer too.

** Section 3.0. “Testing an SF may not be restricted to connectivity to the SF, but also whether the SF is providing its intended service. Refer to Section 3.1.1 for a more detailed discussion.” Section 3.1.1 appears to discuss availability but not validation practices to ensure “the SF is providing its intended service”. Is providing the intended service as simple as the service being up? Or does it include confirmation that functionally the right service is happening?

** Section 3.4. Typo. s/and so/so/

** Section 3.5. Per “The overlay network establishes the service plane between the SFC components and are mostly transparent to the SFC data plane elements.”, in what way is the overlay network not transparent?

** Section 5.1. Per “Tools like ping and trace …” and “BFD is another tool …”, what specific instances of ping and trace are you referencing? Or is this generically saying ping = Section 6.4.1, trace = Section 6.4.4 and BFD = Section 6.4.2?

** Section 5.1. I initially found the contents of this table confusing. The paragraph introducing this table refers to “ping” and “trace” as tools. However, E-OAM, IPPM, etc., which are also noted in this table, don’t seem easily categorized by the “tool” designation.

** Section 6.4.3. Is there a reason that In-Situ OAM is mentioned here but not in Section 5.1?

** Section 8. Per “The sensitivity of the information from SFC layer raises a need for careful security considerations”, what are these concerns?
It also isn’t clear what information is being mentioned.

** Section 8. (I’m not calling this a DISCUSS since this is a trivial editorial fix but it MUST be done before publication) Per “To address the above concerns, SFC and SF OAM may provide mechanism for: o Misuse of the OAM channel for denial-of-services, …” A crucial missing word here as in – s/provide mechanisms for:/provide mechanisms for mitigating:/

** Section 8. Per “To address the above concerns, SFC and SF OAM may provide mechanism for: … o Leakage of OAM packets across SFC instances, and o Leakage of SFC information beyond the SFC domain.”
-- the text above mentions the risk of SFC evasion but this isn’t mentioned here
-- the two bullets here seem appropriate and needed OAM activities but they don’t follow from the text above (as explicitly promised by the text “to address the above …”)

** Section 8. What is the difference between the guidance:
-- “… SFC and SF OAM may provide mechanism to [mitigate]: … leakage of SFC information beyond the SFC domain”
-- “The documents proposing the OAM solution for any service layer component should consider some form of message filtering to prevent leaking any internal service layer information outside the administrative domain”
To me they are saying the same thing, except the second item explicitly notes message filtering.
Éric Vyncke No Objection
Comment (2020-05-06 for -13)
Thank you for the work put into this document. The document is easy to read; BTW, I found that its content is more about the justifications / requirements for an OAM system and tools descriptions than about a framework description. The core of the document appears to be section 6: this should probably be reflected in the abstract and introduction.

Please address the comments raised in the Internet directorate review by Carlos Bernardos: https://mailarchive.ietf.org/arch/msg/int-dir/TgQulH7hytGPNxdAPWcSgkTx1IM

Please find below a couple of non-blocking COMMENTs. I would really appreciate a reply to all these COMMENTs.

I hope that this helps to improve the document,

Regards,

-éric

== COMMENTS ==

I would not refer to BCP 14 (RFC 8174) as this is an architectural/framework document (informational) and not a protocol specification.

It seems that most of the described tools are about synthetic traffic. Is there any other means to do OAM in SFC (not that I have any suggestion...)?

-- Section 1 --
About "to be applied to packets and/or frames", for me packets are layer-3 PDUs and frames are layer-2 PDUs. While I am not familiar with SFC, I could envision SFC being applied to transport or application layer PDUs. So, why restrict the use of this document to layers 2 and 3 only?

-- Section 2 --
Is there a reason why all 'virtual links' are not mentioned in this section? I.e., SR-IOV network, tun/tap, ... Similar question about why the example is limited to VMs and does not include containers?

-- Section 3 --
The word "performance" is often used in the document but it is not described in depth though: is it about the SF CPU/memory or 'client traffic' latency & throughput? Section 4 partially addresses my question but not completely; also, adding forward pointers to section 4 would be nice.

-- Section 4.3 --
Please bear with my ignorance of SFC world... but, if a SF is doing proxying / rewriting the application message, how useful is an end-to-end PMTUd check?
As there are two stitched TCP connections? The overall assumption of this section is that all SFs are pure layer-3, leaving the IP header intact so that ECMP & TTL checks can be done. Is it always the case? Section 5.2 addresses the above points, but I suggest that section 4.3 be restricted to 'link-layer OAM'.

-- Section 6.4.1 --
"TTL field in NSH header to 63", not familiar with NSH, but, if there is a TTL field in NSH, then it could be useful to point to the RFC & section describing it. Especially in a section whose title is "ICMP" (referring obviously to the IP header).

-- Section 8 --
In this security section, I wonder whether the trace tool deserves a paragraph or two: if it is trusted while being forgeable/spoofed, then operators could trust an SFC which is "owned" and not reliable (i.e., with a bypass of some security SF). Trusting the security AD to raise a DISCUSS if they think it is a DISCUSS.

== NITS ==

-- Section 6.3 --
Is it really required to re-specify the use of bit O in NSH?

-- Section 6.4.1 --
Sigh... using the IPv4 terminology of TTL...
(Alissa Cooper; former steering group member) (was Discuss) No Objection
No Objection (2020-07-05)
Thanks for addressing my DISCUSS and COMMENT.
(Barry Leiba; former steering group member) No Objection
No Objection (2020-05-05 for -13)
Please use the new BCP 14 boilerplate exactly as in RFC 8174.
(Deborah Brungard; former steering group member) No Objection
( for -13)
(Magnus Westerlund; former steering group member) No Objection
( for -13)