An Overview of Operations, Administration, and Maintenance (OAM) Tools
draft-ietf-opsawg-oam-overview-16
Discuss
Yes
(Benoît Claise)
(Ron Bonica)
No Objection
(Gonzalo Camarillo)
(Robert Sparks)
(Russ Housley)
(Sean Turner)
Note: This ballot was opened for revision 05 and is now closed.
Stewart Bryant Former IESG member
Discuss
[Treat as non-blocking comment]
(2011-08-10)
The following is an edited version of the comments received from the Routing Directorate review.

Summary: There are significant concerns that need to be addressed before publication.

The document needs to clearly identify the target audience. Since a document written as a tutorial for a beginner has different requirements from one written for a subject matter expert, this clarification is important for setting expectations about the depth and precision of the text. A tutorial document for the beginner would be most welcome considering the extent of OAM discussions that have taken place in the IETF, and it is assumed by the reviewer that this is the intent of the document. To that end the document needs to:

- Include a “Historical Background” section that goes beyond the single sentence in Section 1 (“OAM was originally used in the world of telephony, and has been adopted in packet based networks”).
- Provide a clear view of OAM functionality and its relationship to the various “planes” of networking (data plane, control plane, management plane). In particular, the importance of fate-sharing of OAM and user traffic flows in packet networks should be explained.
- Explicitly map the ideas, terms and methods that have been adopted from technologies owned by ITU-T and/or IEEE to IETF-owned technologies. If such a mapping is not possible, it should be explicitly stated.
- Explain in a neutral way points of contention regarding various OAM-related issues.

The draft as written is a partial annotated list of references to IETF and non-IETF protocols and mechanisms that deal with certain aspects of OAM in IP, IP/MPLS, MPLS-TP and Ethernet networks. The draft does not describe the underlying reasons for selecting particular protocols for description. It is not clear why the now obsolete ITU-T Y.1711 is considered in detail. The reviewer proposed giving consideration to I.610 as a protocol, although I am not sufficiently familiar with I.610 to determine its relevance.
It should however be examined. Similarly it may be useful to introduce the reader to E-LMI (defined by MEF). In terms of MPLS-TP, why is there no discussion of MPLS-TP fault management OAM (draft-ietf-mpls-tp-fault-05)?

There are a number of readability issues that arise from the terms and concepts taken from the referenced documents having different meanings in these documents. E.g., in Section 4.1 the draft states that ICMP ping provides “connectivity verification for Internet Protocol”. However, in Section 3.2.4 the draft says that “connectivity verification function allows an MP to check whether it is connected to a peer MP or not”. Since MPs are not mentioned with regard to ICMP, it is not clear whether “connectivity verification” means the same thing in these two cases.

In some cases the text is detailed beyond the needs of the beginner, whilst other important concepts are not detailed sufficiently, for example:

- The OWAMP TCP port information is not needed, whilst the IPPM
- In Section 3.2.3 the draft defines the term “Maintenance Entity” (ME), whilst “Maintenance Entity Group” (MEG), a.k.a. “Maintenance Association” (MA), is only defined by reference.
- In Section 4.5.2 the draft mentions security aspects of IPPM protocols. However, these aspects are not even mentioned in Section 4.2, which discusses BFD.

The document therefore needs another pass to ensure consistency of detail.

Major Issues:

1. The concepts of data plane, control plane and management plane are not well explored in the draft and need to be explained in their OAM context.

2. The relationship between OAM functionality and network management as presented in the draft is unclear. For example:

a. (Section 1) Other aspects associated with the OAM acronym, such as management, are outside the scope of this document <<Management is out of scope>>

b. (Section 4.6.4) The FDI function is used by an LSR to report a defect to affected client layers, allowing them to suppress alarms about this defect <<Alarms are arguably part of management>>

c. (Section 4.7.2) When the ETH-CC function detects a defect, it reports one of the following defect conditions: i. Loss of continuity (LOC): Occurs when at least when no CCM messages have been received from a peer MEP during a period of 3.5 times the configured transmission period iii. Unexpected period: Occurs when the transmission period field in the CCM does not match the expected transmission period value <<Since the transmission period field in ETH-CC is defined by management, this defect reports a management issue>>

d. (Section 4.7.6) The Alarm Indication Signal indicates that a MEG should suppress alarms about a defect condition at a lower MEG level, i.e., since a defect has occurred in a lower hierarchy in the network, it should not be reported by the current node <<Alarms’ suppression again…>>

e. (Section 4.7.9) The Y.1731 standard defines the frame format for Automatic Protection Switching frames. The protection switching operations are defined in other ITU-T standards. <<Whether PS is part of OAM seems to depend on which SDO is considering the problem and this needs to be made clear to the reader>>

3. OAM in connectionless vs. connection-oriented networks:

a. (2a) above suggests that OAM is applicable only to connection-oriented networks (if you do not have connections, connection problems do not exist by definition).

b. At the same time, the draft discusses ICMP Ping (Section 4.1) operating in connectionless IP networks, and Ethernet OAM (Sections 4.7 and 4.8) operating in connectionless Ethernet networks.

The authors should define the scope of OAM explicitly and clearly, and then remove the sections dealing with protocols and mechanisms that happen to be out of this scope.
In particular, they should explain the relationship of each specific defect to a specific networking plane.

MEs, MPs, MEPs and MIPs

Caveat: It may well be that the problem is not with the draft but with the concept itself (or at least with the attempts to extend it to IP, IP/MPLS and MPLS-TP networks).

Consider the following statements:

1. (Section 3.2.2) A Maintenance Entity (ME) is a point-to-point relationship between two Maintenance Points (MP). The connectivity between these Maintenance Points is managed and monitored by the OAM protocol. A pair of MPs engaged in an ME are connected by a Communication Link

2. (Section 3.2.3) A Maintenance Point (MP) is a functional entity that is defined at a node in the network, and either initiates or reacts to OAM messages. A Maintenance End Point (MEP) is one of the end points of an ME, and can initiate OAM messages and respond to them. A Maintenance Intermediate Point (MIP) is an intermediate point between two MEPs, that does not initiate OAM frames, but is able to respond to OAM frames that are destined to it, and to forward others.

3. (Section 3.2.3) The 802.1ag defines a finer distinction between Up MPs and Down MPs. An MP is a bridge interface, that is monitored by an OAM protocol…

4. (Section 4.1) ICMP provides a connectivity verification function for the Internet Protocol… ICMP is also used in Traceroute for path discovery.

An OAM beginner would not be able to answer the following questions:

1. Can a communication link exist without any MPs on it?

2. Suppose that I have defined a P2P bidirectional communication link with two MEPs forming an ME. What would happen to this ME if I add a MIP between the two MEPs?

3. What is the relationship (if any) between MEPs and interfaces? Or is it just something specific to Ethernet bridges?

4. Does a MIP really forward OAM frames that are not destined to it?

5. Operation of ICMP Ping does not require creation of MPs. How does it provide a connectivity verification function for IP?
The authors need to remove conflicting definitions, to fix typos (e.g., the definition of ME would be less problematic if it referred to a pair of MEPs and not to a pair of MPs) and to correct inaccurate statements (in IP, IP/MPLS and MPLS-TP, MIPs (as a component) do NOT forward OAM packets that are not destined to them, but they do that in Ethernet OAM).

Minor Issues:

Connectivity Check vs. Continuity Check

The draft mainly uses the term “Continuity Check”. However, in some places the term “Connectivity Check” is used as well, e.g.:

1. (Section 4.12) A key element in some of the OAM standards that are analyzed in this document is the continuity check. It is thus important to present a more detailed comparison of the connectivity check mechanisms defined in OAM standards.

2. (Section 4.3) LSP Ping extends the basic ICMP Ping operation (of data-plane connectivity and continuity check)…

Please look at the use of the terms and ensure they are applied consistently. Caveat: There is a similar inconsistency in IEEE 802.1ag (but not in ITU-T Y.1731).

Continuity Check vs. Connectivity Verification

In Section 3.2.4 the draft refers to RFC 5860 as the ultimate source of information about the difference between Continuity Check and Connectivity Verification. Looking up RFC 5860 (Section 2.2.3), I’ve learned that connectivity verification is a function that allows an End Point to find out whether it is connected to a specific End Point(s) by means of an expected PW, LSP or Section. At the same time, the draft says (in the same Section 3.2.4) that “A connectivity verification function allows an MP to check whether it is connected to a peer MP or not”. Omitting the words “by means of…” from RFC 5860 makes such a definition unclear; it is also unclear whether End Points (of a Section, LSP or PW), which presumably are MEPs, can be extended to be MEPs or MIPs (the draft uses the term MPs).

It is also not clear whether the draft considers the LSP Ping (see Section 4.3) functionality “to verify data-plane vs. control-plane consistency for a Forwarding Equivalence Class (FEC)” as related to Connectivity Verification. This is especially strange since the draft also states (in the same section) that “LSP Ping extends the basic ICMP Ping operation” while Section 4.1 states that “ICMP provides a connectivity verification function for the Internet Protocol”.

Another problem is the statement (in Section 4.2.3) that “BFD Echo provides a connectivity verification function”, especially since draft-ietf-mpls-tp-cc-cv-rdi-05 in Section 3.5 expands the format of the BFD control packets in order to provide the CV function, while BFD Echo is not even mentioned in that document. It might be worth noting that we are not considering BFD Echo mode for MPLS-TP.

Finally, the draft does not explain whether there is any correlation between the defects detected by the continuity check and those detected by connectivity verification (Section 4.10.3.1 looks like a logical place for this).

Inaccurate Representation of IEEE 802.1ag

In Section 3.2.3 of the draft there is the following text: “The 802.1ag defines a finer distinction between Up MPs and Down MPs. An MP is a bridge interface, that is monitored by an OAM protocol either in the direction facing the network, or in the direction facing the bridge. A Down MP is an MP that receives OAM packets from, and transmits them to the direction of the network. An Up MP receives OAM packets from, and transmits them to the direction of the bridging entity”.

However, IEEE 802.1ag states (see Section 22.1.3 of that document) that: “All Up MEPs belonging to MAs that are attached to specific VIDs are placed between the Frame filtering entity (8.6.3) and the Port filtering entities (8.6.1, 8.6.2, and 8.6.4). Separately for each VLAN, there can be from zero to eight Up MEPs, ordered by increasing MD Level, from Frame filtering towards Port filtering”.
That seems to imply that 802.1ag MEPs are NOT bridge interfaces (since there can be multiple MEPs per VLAN and multiple VLANs per bridge interface).

Defects, Faults and Failures

In Section 3.2.5 the draft discusses the terms Defect, Fault and Failure. However, these terms seem to apply to the “communication link”; that term needs to be clarified to indicate that this is a data plane entity, or the term “data plane” should be used in its place. At the same time, “Unexpected Period” and “Unexpected MEP” are mentioned as defects detected by ETH-CC in Section 4.7.2 even if, to the best of my understanding, these conditions are side effects of mis-configuration, i.e., a management plane problem.

VCCV: An OAM Mechanism or a Control Channel?

In Section 4.4 the draft states that VCCV “provides end-to-end fault detection and diagnostics for PWs”. This seems to indicate that VCCV is an OAM mechanism/protocol. However, later in the same section it states that “The VCCV switching function provides a control channel associated with each PW… and allows sending OAM packets in-band with PW data”. And on the next line it explains that “VCCV currently supports the following OAM mechanisms: ICMP Ping, LSP Ping, and BFD” (which are all mentioned as OAM mechanisms providing continuity check and/or connectivity verification in the draft). So it remains completely unclear whether VCCV is an OAM mechanism or just a channel for separating user data from OAM flows. The issue here may well be historic because VCCV predates the modern ACH mechanism. This should be clarified in the text.

MEs, MEGs and MEG levels

The draft explicitly defines a Maintenance Entity (ME) in Section 3.2.2, but defers to the MPLS-TP OAM Framework for the definition of the Maintenance Entity Group (MEG). The text defining ME in the draft differs from that in the MPLS-TP OAM Framework document (see http://datatracker.ietf.org/doc/draft-ietf-mpls-tp-oam-framework/?include_text=1, Section 2.2).
At the same time, it resembles the definition of ME in Section 3.1 of this document. MEG level is mentioned a couple of times in the draft, but the only explanation given (in Section 4.7.2) is “The MEG level is a 3-bit number that defines the level of hierarchy of the MEG”; and this seems to be the only text in the draft that deals with MEG hierarchy. A more detailed description should be provided.

Differences between Approaches to Packet/Frame Loss Measurement

There is no description of the fundamental difference between the two approaches to measuring packet loss: that of the IPPM WG (based on counting synthetic packets) and that of Y.1731 (based on counting the user packets), even though both are mentioned in the draft. Incidentally, MPLS-TP provides a tool for doing loss measurement and notes that the instrumentation technique is independent of the method of making the measurement.
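The difference between the two loss-measurement approaches named above can be illustrated with a minimal sketch. The function names and counter values are hypothetical and are not taken from the draft, from any IPPM document, or from Y.1731; the sketch only contrasts counting injected test packets with comparing counters of user traffic sampled at two instants.

```python
def synthetic_loss_ratio(probes_sent: int, probes_received: int) -> float:
    """Loss ratio estimated from injected (synthetic) test packets only.

    Assumption: the sender knows how many probes it injected and the
    receiver reports how many of them arrived.
    """
    if probes_sent <= 0:
        raise ValueError("no probes were sent")
    return (probes_sent - probes_received) / probes_sent


def counter_based_loss(tx_prev: int, tx_now: int, rx_prev: int, rx_now: int) -> int:
    """Number of user frames lost between two counter samples.

    Assumption: tx_* are the sender's transmit counters and rx_* the
    receiver's receive counters for the same flow, sampled at the same
    two instants. No test traffic is injected.
    """
    sent = tx_now - tx_prev
    received = rx_now - rx_prev
    return sent - received


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise both functions.
    print(synthetic_loss_ratio(probes_sent=100, probes_received=97))
    print(counter_based_loss(tx_prev=1000, tx_now=1500, rx_prev=990, rx_now=1485))
```

The first function says nothing about user traffic directly: it infers loss from the fate of the probes. The second measures the user traffic itself but depends on both ends sampling their counters consistently.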
Benoît Claise Former IESG member
Yes
(for -14)
Ron Bonica Former IESG member
Yes
Adrian Farrel Former IESG member
(was Discuss)
No Objection
(2011-08-08 for -06)
The Ballot Text write-up seems to be missing the Technical Summary.

---

I'm nervous of a document that makes a comparative analysis of OAM mechanisms developed in another SDO without seeking input from that SDO.

---

idnits warns about the unnecessary 2119 boilerplate and the unresolved references. There is no reason for an I-D to reach this stage with these warnings. Please clean up before passing to the RFC Editor.

---

You say:

o ICMP Echo request, also known as Ping, as defined in [ICMPv4], and [ICMPv6]. ICMP Ping is a very simple and basic mechanism in failure diagnosis, and is not traditionally associated with OAM,

"Traditionally" gives me an image of my great grandfather hand-crafting packets from kiln-dried apple wood. You might want to find out which tools are most commonly used by network operators to diagnose their networks. According to that research and your definition of OAM, you will possibly find that ICMP Ping is very much associated with OAM.

---

Odd that Section 1 calls out MPLS-TP and RFC 5860, but does not call out RFCs 4377 and 4378.

---

Table 1 seems confused about whether it needs to make citations (in square brackets). It does not need to state "work in progress" for I-Ds that are referenced and marked as such in the references section.

---

Table 1 seems to be missing some of the references used in the text. For example, for p2mp LSP ping. Can you do a cross-check with the text? Actually, the table seems a bit mixed. Some protocols are listed, while in other areas you just list the requirements and frameworks.

---

Did you consider discussing performance metrics at other layers as part of the diagnostic toolset? You certainly seem open to OAM at "various layers." Have a look at draft-ietf-pmol-metrics-framework and maybe think about RFC 6076.
---

Section 3.1

Add ACH, ETH, FEC, GAL, LDP, LOC, LOCV, MC, MTU, UC

LSP is a Label Switched Path

I thought the 'M' in ME and MIP stood for MEG

---

Section 3.2.6

The table shows "System" for BFD Maintenance Point Terminology. It is not clear to me what that word means.

---

Section 4.12

|BFD |BFD    |Negotiat|UC |My Discr| Control Detection Time |
|    |Control|ed durin|   |iminator| Expired                |

"My Discriminator"? Who are you?

---

I should have liked Section 5 to have included a discussion of the security considerations of OAM in general, and the security provisions available for the various OAM mechanisms discussed.

---

Should you include RFC 4950?

---

NEW COMMENT

I wonder if you need to also consider draft-ietf-trill-rbridge-channel
Gonzalo Camarillo Former IESG member
No Objection
Pete Resnick Former IESG member
No Objection
(2011-08-11)
RFC Editor note addresses my comment. And now, some snark and sarcasm for the amusement of my fellow ADs and anyone else who cares: <sarcasm>My ballot notwithstanding, I hereby object to the fact that this document (a) defines OAM and (b) does not normatively reference RFC 6291/BCP 161. (*snort*)</sarcasm>
Peter Saint-Andre Former IESG member
No Objection
(2011-08-08)
1. The use of the term "localization" in the Abstract is potentially confusing, since localization in application protocols refers to presenting textual strings that are appropriate for a given locale. Perhaps the term "isolation" might be more appropriate?

2. This paragraph is confusing:

o IP Performance Metrics (IPPM) is a working group in the IETF that defined common metrics for performance measurement, as well as a protocol for measuring delay and packet loss in IP networks. Alternative protocols for performance measurement are defined, for example, in MPLS-TP OAM [MPLS-TP OAM], and in Ethernet OAM [ITU-T Y.1731].

As far as I can see, MPLS-TP OAM and Ethernet OAM were not developed in the IETF's IPPM WG; I suggest moving the second sentence to a separate paragraph.
Ralph Droms Former IESG member
No Objection
(2011-08-10)
Minor editorial suggestions...

In section 3.2.5, the word "intermittently" doesn't seem right. Perhaps "interchangeably"?

I was OK with this sentence in section 3.2.5:

The terms Failure, Fault, and Defect are intermittently used in the standards, [...]

until I read in the next paragraph that ITU-T differentiates among the three terms. Perhaps the quoted sentence should specify which standards? Also in the title of 3.2.6?
Robert Sparks Former IESG member
No Objection
Russ Housley Former IESG member
(was Discuss)
No Objection
(for -06)
Sean Turner Former IESG member
(was Discuss)
No Objection
(for -06)
Wesley Eddy Former IESG member
No Objection
(2011-08-06)
IPPM has defined other metrics that aren't mentioned here (e.g. duplication and reordering) ... is there a reason why those aren't included? It was also unclear whether PSAMP, NetFlow, and IPFIX were excluded for a reason.