Ballot for draft-ietf-sfc-nsh-28

Comment (2017-09-27 for -24) Unknown

I provided long (and somewhat grumpy!) comments on the previous version of this document -- I'd like to thank the authors, especially Carlos, for addressing them.
This version is, IMO, much improved.

Yes (for -23) Unknown

No Objection (2017-09-27 for -24) Unknown

I have the same concern as Kathleen's DISCUSS, and would have blocked the draft on the same grounds if such a position were not already in place. The "crunchy perimeter, soft center" model of security was flawed to start with; and, even in those arenas where it was once fashionable, it's starting to be considered dated (e.g., much of the traffic inside data centers is secured using TLS -- see the recent discussions in the TLS working group for evidence of this situation). More notably, this "unconditionally trusted network zone" approach to security has led to some spectacular exploits recently (cf. https://www.wired.com/2016/04/the-critical-hole-at-the-heart-of-cell-phone-infrastructure/). Rather than explicitly fostering this model, the security section really needs to normatively disallow it.

(n.b., I reviewed version -21 of the document -- but I don't find the changes between that version and -24 to address the issue Kathleen raises)

----

Section 3 says the following about reclassification behavior:

When the logical classifier performs re-
classification that results in a change of service path, it MUST
replace the existing NSH with a new NSH with the Base Header and
Service Path Header reflecting the new service path information
and MUST set the initial SI. The O bit, as well as unassigned
flags, MUST be copied transparently from the old NSH to a new
NSH. Metadata MAY be preserved in the new NSH.

I don't see anything here about copying the TTL. If the TTL isn't copied, you can end up with a stable (and unending) loop involving two classifiers (which seems even more damaging than usual, as the SI value won't generally survive a reclassification, right?). I would suggest adding "TTL" to the list of things that MUST be copied when reclassification occurs.
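
A minimal sketch of the suggested behavior, assuming a hypothetical `Nsh` record of already-parsed fields (the class, its field names, and `reclassify` are illustrative and not taken from the draft). The TTL is carried over together with the O bit and the unassigned flags when the old NSH is replaced:

```python
from dataclasses import dataclass, field


@dataclass
class Nsh:
    """Hypothetical parsed view of an NSH; field names are illustrative."""
    spi: int                      # Service Path Identifier
    si: int                       # Service Index
    ttl: int
    o_bit: bool = False
    unassigned_flags: int = 0
    metadata: dict = field(default_factory=dict)


def reclassify(old: Nsh, new_spi: int, initial_si: int,
               keep_metadata: bool = True) -> Nsh:
    """Build the replacement NSH for a new service path."""
    return Nsh(
        spi=new_spi,                            # new service path information
        si=initial_si,                          # initial SI for the new path
        ttl=old.ttl,                            # suggested addition: copy the TTL
        o_bit=old.o_bit,                        # copied transparently
        unassigned_flags=old.unassigned_flags,  # copied transparently
        metadata=dict(old.metadata) if keep_metadata else {},
    )
```

With the TTL carried across, two classifiers that keep handing a packet back to each other would still see the TTL run down, even though the SI is reset on every reclassification.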

No Objection (2017-09-27 for -24) Unknown

I agree with Kathleen's DISCUSS.

Also, have you thought about the likelihood of introducing new versions and, if it is likely, what kind of restrictions you want to impose on future versions (e.g., requirements on backward compatibility) and what the criteria are for bumping the version number? For example, future versions must use the same Base Header and Service Path Header, but can add new mandatory fields after that, etc.

No Objection (2017-09-26 for -24) Unknown

(1) While describing the MD Type field, Section 2.2. (NSH Base Header) talks about the specific scenario in which "a device will support MD Type 0x1 (as per the MUST) metadata, yet be deployed in a network with MD Type 0x2 metadata packets", and it specifies that "the MD Type 0x1 node, MUST utilize the base header length field to determine the original payload offset if it requires access to the original packet/frame." This is the case where the node in question *does not* support MD Type 0x2, right? If so, then the specification above seems to go against (in the last sentence of the same paragraph): "Packets with MD Type values not supported by an implementation MUST be silently dropped." IOW, if the node doesn't support 0x2, why wouldn't it just drop the packet?
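
To make the question concrete: the draft's Length definition counts the whole NSH in 4-byte words, so a node can find the original payload without understanding the metadata. A minimal sketch, assuming the Length value has already been extracted from the Base Header (the helper name is illustrative):

```python
def payload_offset(length_words: int) -> int:
    """Byte offset of the original packet/frame from the start of the NSH.

    Assumes length_words is the Base Header Length field, i.e. the total
    NSH length in 4-byte words. Illustrative helper, not from the draft.
    """
    if length_words < 2:
        # Base Header + Service Path Header already occupy two words.
        raise ValueError("Length field too small for an NSH")
    return length_words * 4
```

Whether a node that does not support MD Type 0x2 should ever get as far as computing this offset, rather than dropping the packet, is exactly the ambiguity raised above.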

(2) Section 2.5.1. (Optional Variable Length Metadata) says that this document "does not make any assumption about Context Headers that are mandatory-to-implement or those that are mandatory-to-process. These considerations are deployment-specific." But the next couple of paragraphs specify explicit actions for them (mandatory-to-process):

Upon receipt of a packet that belongs to a given SFP, if a mandatory-
to-process context header is missing in that packet, the SFC-aware SF
MUST NOT process the packet and MUST log an error at least once per
the SPI for which the mandatory metadata is missing.

If multiple mandatory-to-process context headers are required for a
given SFP, the control plane MAY instruct the SFC-aware SF with the
order to consume these Context Headers. If no instructions are
provided and the SFC-aware SF will make use of or modify the specific
context header, then the SFC-aware SF MUST process these Context
Headers in the order they appear in an NSH packet.

Maybe I'm confused about considerations being deployment specific vs specifying what to do here. Can you please clarify?

(3) "SFFs MUST use the Service Path Header for selecting the next SF or SFF in the service path." Section 6 explains most of what has to be done -- what I think is still not clear in this document is where the information in Tables 1-4 comes from. There may be different ways for an SFF to learn that, and I would imagine that it is out-of-scope of this document. Please say so -- maybe there's a relevant reference to rfc7665 (?).

(4) Section 11.1. (NSH EtherType) seems out of place in this document because (1) the document doesn't discuss the transport itself, and (2) it is in the IANA section...

(5) What is the "IETF Base NSH MD Class" (Section 11.2.4)? Ahh, I see that Section 11.2.6 talks about "the type values owned by the IETF"; it would be good to say that MD Class 0x0000 is being assigned to the IETF (in 11.2.4).

Nits:

In section 2.2. (NSH Base Header), it would be nice to have a forward reference when the Service Index is first mentioned.

It may be nice to explicitly state in the description of the MD Type field (Section 2.2) that for length = 0x2 and MD Type = 0x2, there are in fact no optional context headers. (I know there's some text about this later in section 2.5.)
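
The arithmetic behind this nit, as a sketch: the Base Header and Service Path Header take one 4-byte word each, so anything in Length beyond 0x2 is Context Header space. The helper name is hypothetical; the one-word-per-header assumption follows from the draft requiring Length 0x6 for MD Type 0x1 with its 16-byte Context Header.

```python
FIXED_HEADER_WORDS = 2  # Base Header + Service Path Header, one word each


def context_header_bytes(length_words: int) -> int:
    """Bytes of Context Header(s) implied by the NSH Length field."""
    if length_words < FIXED_HEADER_WORDS:
        raise ValueError("Length too small to hold the two fixed headers")
    return (length_words - FIXED_HEADER_WORDS) * 4
```

Under these assumptions, Length 0x2 yields zero bytes of context headers and Length 0x6 yields the 16 bytes of the fixed-length case.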

"...all domain edges MUST filter based on the carried protocol in the VxLAN-gpe". That "MUST" is out of place because the text is an example.

No Objection (2017-09-27 for -24) Unknown

Substantive:

- General: This is a mechanism to add metadata to user flows. There is very little discussion about how that metadata may relate to the application layer payloads. It's likely that some of those payloads will be encrypted by the user in an attempt to control what information is shared with middleboxes. I'd like to see some discussion about how this relates to the guidance in RFC 8165. (Note: I am on the fence about whether this should be a DISCUSS. But since "on the fence" is probably insufficient grounds for a DISCUSS, I'm leaving it as a comment.)

- General: I support Kathleen's DISCUSS points concerning integrity protection. The document leaves that up to the transporting protocol. I think it's reasonable to recommend that that protocol at least default to providing integrity protection unless there's a good reason not to.

-2.2, "version": How is the version field to be used by consumers? That is, what should a recipient do if the field contains a version number it doesn't support/recognize?

-2.2, MD type 0x0: "Implementations SHOULD silently
discard packets with MD Type 0x0."
Why not MUST?

-- MD type 0xF: "Implementations not explicitly configured to be part of
an experiment SHOULD silently discard packets with MD Type 0xF."
Why not MUST?

-2.2, Next Protocol Values:
Why are there 2 experimental values? (as opposed to 1, or, well, 3).

-2.3, last paragraph (and several other places):
This draft seems to take a position that a failed SFP means the service level flow fails. Are there no use cases where delivery of the service flow is critical and should happen even if the chain of middleboxes fails?

-2.4, paragraph starting with "An SFC-aware SF MUST receive the data semantics..."
I'm not sure what the intent of this paragraph is. Is that MUST really a statement of fact? Or is there really an expectation of an out-of-band delivery of some semantic definition?

-3, list item 1: "A service classifier MUST insert an NSH at the start of an SFP."
What if an initial classifier receives a packet that already has an NSH? Can multiple NSHs be stacked?

-7.1, last paragraph: "Depending on the information carried in the metadata, data privacy
considerations may need to be considered. "
"may need to be considered" is weak sauce. Data privacy always needs to be considered, even if the _output_ of that consideration is that there is nothing sensitive being carried. Please consider dropping the "may".

Also, this seems like an odd place to bury a privacy discussion. Please consider moving this to a "Privacy Considerations" section.

-8, first paragraph:
It seems like insider attacks are worth at least a mention when discussing a single operator environment as a mitigator against attacks.

-8.1, 2nd paragraph:
This doesn't seem like a single operator scenario, in the sense that part of the flow crosses a network that is not controlled by that operator.

-8.3, 4th paragraph: Please elaborate on what is meant by "obfuscating" subscriber identifying information (as opposed to "encrypting" or "leaving it out in the first place".)

Editorial:

-2.2, "O bit", last paragraph: "The configurable parameter MUST be
disabled by default."
Does "disabled" mean "unset" (or "set to zero")?

-2.2, "unassigned bits": "At reception, all
elements MUST NOT modify their actions based on these unknown bits."
Isn't that MUST NOT just a restatement of the "MUST ignore" from the previous sentence? There's no problem with reinforcing a point, but there shouldn't be multiple instances of the same 2119 requirement. Also, would logging a warning violate the "MUST NOT modify their actions/MUST ignore" requirement?

-8, first paragraph: "NSH is designed for use within operator environments."
Is there a missing "single" before "operator"?

No Objection (2017-09-28 for -25) Unknown

- Section 2.5.1., you might want to mention that no metadata are specified at this point in time.
Indeed, the "New IETF Assigned Optional Variable Length Metadata Type Registry" is specified in this document, but it is empty.

- Section 2.3
OPS question: SPI must be unique per admin domain? Otherwise, you're looking for trouble, right?
This would be typically addressed in an "Operational Considerations" section.
Where is my "Operational Considerations" section...?

- Section 2.4
Fixed length metadata.

This specification does not make any assumptions about the content of
the 16 byte Context Header that must be present when the MD Type
field is set to 1, and does not describe the structure or meaning of
the included metadata.

An SFC-aware SF MUST receive the data semantics first in order to
process the data placed in the mandatory context field. The data
semantics include both the allocation schema and the meaning of the
included data.

I understand that the order of the metadata in the Fixed Length Context Header is important, right? Should it be mentioned?
I understand that the fixed length metadata are specific per service, and that's the reason why there is no IANA for fixed length.
Should this be mentioned?

- if you publish a new version, change the order of these two paragraphs:

Unassigned bits: All other flag fields, marked U, are unassigned and
available for future use, see Section 11.2.1. Unassigned bits MUST
be set to zero upon origination, and MUST be ignored and preserved
unmodified by other NSH supporting elements. At reception, all
elements MUST NOT modify their actions based on these unknown bits.

Length: The total length, in 4-byte words, of the NSH including the
Base Header, the Service Path Header, the Fixed Length Context Header
or Variable Length Context Header(s). The length MUST be 0x6 for MD
Type equal to 0x1, and MUST be 0x2 or greater for MD Type equal to
0x2. The length of the NSH header MUST be an integer multiple of 4
bytes, thus variable length metadata is always padded out to a
multiple of 4 bytes.

I ran short of time before the telechat, but this is not worth deferring (there are enough DISCUSSes). FYI, I got as far as Section 5.

No Objection (for -24) Unknown

No Objection (2017-11-06) Unknown

Thank you for adding the new text in S 8.2.1. I continue to be concerned about the challenge of securing this kind of protocol, but that's a bigger problem we as the IETF need to resolve.

No Objection (2017-10-16 for -26) Unknown

I'm clearing my DISCUSS now; however, I don't think all of my comments have been adequately addressed. Enough points were clarified that the remaining open points no longer warrant holding a DISCUSS.

--------------------
Old discuss text:
--------------------
I have a couple of comments on the design. I know, as always in IESG review state, it's probably too late to make any changes to the actual header format, therefore most of my comments are actually in the comment section below. I still decided to note them so at least people can consider these points. However, there are a few things that I need clarification for before publication, which I note in this section:

1) Sec 2.2
"SF/SFF/SFC Proxy/Classifier implementations that do not support SFC
OAM procedures SHOULD discard packets with O bit set, but MAY support
a configurable parameter to enable forwarding received SFC OAM
packets unmodified to the next element in the chain. Forwarding OAM
packets unmodified by SFC elements that do not support SFC OAM
procedures may be acceptable for a subset of OAM functions, but can
result in unexpected outcomes for others; thus, it is recommended to
analyze the impact of forwarding an OAM packet for all OAM functions
prior to enabling this behavior. The configurable parameter MUST be
disabled by default."
This part is really unclear to me and I believe needs to be further specified. Where should this configurable parameter be? In the Context header? Why don't you just use one of the unassigned bits to indicate if an unknown (OAM) packet should be forwarded or not?
Moreover, I also disagree with this text. If there is a bit (or another way) to indicate whether an unsupported OAM packet should be forwarded or not, it should just be defined as such; any considerations about whether that bit should be set depend on the OAM function itself and do not need to be discussed here.
Finally, it is not well explained what an OAM packet is at all. Is that a 'fake' packet that is generated by the operator to actively test the (potentially newly configured) SFP? If so, why does an SF need to know if a packet is an OAM packet or not? Usually it's a bad idea to use a different kind of traffic for testing compared to what will be used in operations. Please provide more explanation here!

2) section 2.4
"An SFC-aware SF MUST receive the data semantics first in order to
process the data placed in the mandatory context field. The data
semantics include both the allocation schema and the meaning of the
included data. How an SFC-aware SF gets the data semantics is
outside the scope of this specification."
This is really confusing to me. I think this is what you need an actual data semantics (a.k.a. type) field for in the base header. Or is there an actual reason not to put this information directly in the base header, where it is needed, and instead assume some magical way for this information to reach the node? If the assumption is that the SF is configured to know, based on the SFI, what the content of the context header has to be, then a) you need to say that in the draft, and b) that's really error-prone, because it's really hard to tell if the context header actually holds the information that you need or just random crap (of course depending on the expected data type of this information). In short, I think you really need a type field somewhere here. In any case, you really need to explain this more!
Also, the text further says:
"An SF or SFC Proxy that does not know the format or semantics of the
Context Header for an NSH with MD Type 1 MUST discard any packet with
such an NSH..."
How does the SFC proxy know that it knows the format or not if there is no type field or identifier that indicates what the format should be?

Also, a related question from me: why is the context header present in all types of NSH if there is no use for it defined in this document yet? Why is there no fixed length NSH without a context header then?

3) Section 2.5.1:
"If multiple instances of the same metadata are included in an NSH
packet, but the definition of that context header does not allow for
it, the SFC-aware SF MUST process the first instance and ignore
subsequent instances."
This seems error-prone to me. If the same metadata appears multiple times where it should not, that clearly seems like an error case to me. Just using the first one and proceeding normally might not be the right thing to do. In any case, I think such an occurrence should at least be logged. If the multiple instances are just copies of each other and carry the same information, it's probably okay to use that information and proceed. If the different instances carry different information, it may be a bit dangerous to just use the first one and silently ignore the others. In this case I would rather recommend dropping the packet...
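
One way to express the handling suggested here, as a sketch rather than anything the draft specifies: log every unexpected duplicate, proceed if all copies agree, and drop the packet if they differ. The function, its return convention (`None` meaning "drop"), and the logger name are assumptions made for illustration.

```python
import logging
from typing import Optional, Sequence

log = logging.getLogger("nsh.metadata")  # hypothetical logger name


def resolve_duplicates(md_type: int,
                       instances: Sequence[bytes]) -> Optional[bytes]:
    """Pick the metadata value to use, or return None to signal a drop.

    instances holds every occurrence of one context header type that is
    not allowed to repeat, in the order the occurrences appear in the NSH.
    """
    if not instances:
        raise ValueError("at least one instance is required")
    first = instances[0]
    if len(instances) == 1:
        return first
    if all(value == first for value in instances[1:]):
        # Identical copies: usable, but still worth recording.
        log.warning("metadata type %#x repeated %d times, identical content",
                    md_type, len(instances))
        return first
    # Conflicting copies: do not silently trust the first one.
    log.error("metadata type %#x repeated with conflicting content", md_type)
    return None
```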

4) In line with the second comment from the tsv-art review (thanks Wes!), I don't really understand why this document says (sec 6) that there can be multiple next hops for the same SFP or that SFs can be traversed in a different order. My understanding (from a quick look at RFC7665) would be that, if those things are needed, e.g. for load balancing, then one should define different SFPs, and the Classifier must know that two SFPs are equivalent and select between them accordingly. The reason why I'm really concerned about this is that usually a number of packets belong to a flow, and all packets belonging to the same flow should ideally take the same route. But usually only the Classifier has a notion of what a flow is and will accordingly assign the SFI to the packets belonging to the same flow. If any SF on the path can now more or less randomly decide to forward packets belonging to the same flow to one or another next node, I would assume that this is a problem not only for the flow (e.g., reordering) but in many cases also for the SF itself.

------------------------
Old comment text:
------------------------

Further considerations:

1) I don't really see why a TTL in the base header is needed. I mostly understand why there is the Service Index in the service header, although I think there should be a better means to validate that your SFP is correct, and ideally you should not really need this. However, loop prevention can be provided by both mechanisms; moreover, there is probably often also a TTL in the encapsulation protocol, and loop prevention should really be a function of that forwarding protocol and not the NSH.

2) I don't see why you need the type field in the base header. This is fully redundant because all you need is the length field. If the length is 0x2 it is what you have defined as type 1; if the length field is larger it is type 2. I also don't see the need for any other types in future because you also have the version field; if you need anything else you should, or probably have to, go for a new version. Note that the general problem with unnecessary redundancy is that it adds complexity. If you keep this redundancy you have to separately handle and implement the case(s) where the type is 1 but the length is larger than expected. If anything, you could probably just use one bit to indicate that the length field is present, and if not, the length is 0x2. However, saving bits does not really seem to be a concern for you, so that might not actually be an advantage.

3) sec 2.5.1
"Unassigned bit: One unassigned bit is available for future use. This
bit MUST NOT be set, and MUST be ignored on receipt."
Is there an actual reason to have an unassigned bit here? I would assume that the type already provides enough flexibility to extend the metadata format in any way needed.

4) Also section 2.5.1:
" Length: Indicates the length of the variable metadata, in bytes. In
case the metadata length is not an integer number of 4-byte words,
the sender MUST add pad bytes immediately following the last metadata
byte to extend the metadata to an integer number of 4-byte words.
The receiver MUST round up the length field to the nearest 4-byte
word boundary, to locate and process the next field in the packet."
Your definition of the length field might be more error-prone than needed. It would probably be easier to simply define the length in 4-byte words; the type of course defines the content of the metadata field and as such can simply define which part of the total metadata field holds data of a certain type and which part is padding.
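
For reference, the rounding that the quoted text asks of the receiver amounts to the following arithmetic (a sketch; the function names are illustrative):

```python
def padded_length(length_bytes: int) -> int:
    """Round a metadata length in bytes up to the next 4-byte boundary."""
    if length_bytes < 0:
        raise ValueError("length must be non-negative")
    return (length_bytes + 3) & ~3


def pad_bytes_needed(length_bytes: int) -> int:
    """Pad bytes the sender appends after the last metadata byte."""
    return padded_length(length_bytes) - length_bytes
```

A sender and a receiver that disagree on this rounding would misplace every field that follows, which is the kind of error a length expressed directly in 4-byte words avoids.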

5) And finally I have to say it is unclear to me why the SFI and SI fields are described as a separate header. Given they have to be present in all SFH, I would consider them as two fields of the base header. But it is after all really just an editorial issue. However, all this together with my previous comments makes the protocol spec actually much more complicated than it needs to be...

6) I also have to agree with the last comment of the tsv-art review: I think it would have been nicer to not only describe the NSH but also define mappings to a set of possible encapsulations, because I would assume that for each encapsulation there are a couple of specific considerations that need to be made to make things work successfully. I don't think that all encapsulations can be captured by general considerations, and I cannot go through all the cases in my head to figure out if there are things that need to be noted.

No Objection (2017-09-26 for -24) Unknown

Thank you for responding to Wes Eddy's TSV-ART review of -19 (and, of course, for making text changes that seemed appropriate).

It seems to me that you describe expectations about the applicability of NSH in various places in the document, and in various ways. You might consider (for example) pulling the common elements of statements like (from Section 5)

Within a managed administrative domain, an operator can ensure that
the underlay MTU is sufficient to carry SFC traffic without requiring
fragmentation. Given that the intended scope of the NSH is within a
single provider's operational domain, that approach is sufficient.

and (from Section 8)

NSH is designed for use within operator environments. As such, it
does not include any mandatory security mechanisms. As with many
other protocols, without enhancements, the NSH encapsulation can be
spoofed and is subject to snooping and modification in transit.

However, the deployment scope (as defined in [RFC7665]) of the NSH
encapsulation is limited to a single network administrative domain as
a controlled environment, with trusted devices (e.g., a data center)
hence mitigating the risk of unauthorized manipulation of the
encapsulation headers or metadata. This controlled environment is an
important assumption for NSH. There is one additional important
assumption: All of the service functions used by an operator in
service chains are assumed to be selected and vetted by the operator.

into one section describing the applicability of NSH, appearing MUCH earlier in the document (the most detailed description of your expectations looks like it appears in the Security Considerations section, but parts of that description are applicable to the Fragmentation Considerations section, which appears three sections earlier in the document). The reader would have your intended applicability in mind much earlier and more clearly, and you could just invoke your expectations by reference when you need to explain how they apply elsewhere in the document, so the expectations in play would be consistent across mentions throughout the document.

I'm still bothered that this document doesn't explicitly mention ICMP blocking as a problem for PMTUD with IP encapsulations. We're just not good at path MTU discovery, so it seems useful to call this out explicitly when a document expects to use PMTUD. That way, people who use NSH will know to check for ICMP blocking on their networks before they receive their first trouble reports. This almost reached my threshold for balloting Discuss, so I'd hope you folks would consider that.

I see that the applicability of NSH includes encapsulations that don't provide a path MTU discovery mechanism, and that your resolution for those encapsulations is to log events when a "too big" packet is dropped. Could you educate me, as to whether all encapsulations detect that this is happening? It might be that encapsulations are using a fixed maximum MTU by definition, so that what you're logging is an attempt to send a payload that violates the protocol definition of the encapsulation, but I don't know that that's true in all cases, so thought I should ask.

I saw a suggestion from Joe Touch (in a response to the TSV-ART review) to consider looking at the terminology developed for draft-ietf-intarea-tunnels. I didn't see a reply to that suggestion, and I didn't see a reference to draft-ietf-intarea-tunnels in -24 - was this considered?

(I'm also asking because I want to keep track of whether people applying encapsulations find that document useful, of course)

(Joe's follow-up is at https://mailarchive.ietf.org/arch/msg/tsv-art/CsdWwR9B5_AB64D0eFl-KIE7_NA)

No Objection (2017-10-03 for -25) Unknown

Thanks for quickly addressing my DISCUSS and COMMENT points.

No Objection (for -24) Unknown

Abstain (2017-10-20 for -27) Unknown

Thank you very much for the new security considerations section.  The added text to fully explain the current state as well as the pitfalls of the solution is quite helpful.  The work put into understanding the issues (both from the security side as well as from the architecture/solution limitations) and documenting them was substantial and I hope it helps improve future work.  I do understand that the WG has hurdles to provide integrity protection, but also don't see immediate solutions.  I do appreciate the text that acknowledges this problem and that offers some mitigating options.  While I appreciate the changes, I'd like to see better options going forward in routing area protocols and as such, I will abstain since the security property of integrity is not addressed in this solution.

Network Service Header (NSH) draft-ietf-sfc-nsh-28
