Last Call Review of draft-ietf-bfd-seamless-use-case-04
review-ietf-bfd-seamless-use-case-04-genart-lc-worley-2016-04-05-00

Request Review of draft-ietf-bfd-seamless-use-case
Requested rev. no specific revision (document currently at 08)
Type Last Call Review
Team General Area Review Team (Gen-ART) (genart)
Deadline 2016-04-12
Requested 2016-03-24
Draft last updated 2016-04-05
Completed reviews Genart Last Call review of -04 by Dale Worley (diff)
Genart Telechat review of -06 by Dale Worley (diff)
Secdir Last Call review of -04 by Tobias Gondrom (diff)
Opsdir Telechat review of -05 by Benoît Claise (diff)
Assignment Reviewer Dale Worley
State Completed
Review review-ietf-bfd-seamless-use-case-04-genart-lc-worley-2016-04-05
Reviewed rev. 04 (document currently at 08)
Review result On the Right Track
Review completed: 2016-04-05

Review

I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair.  Please treat these comments just
like any other last call comments.

For more information, please see the FAQ at

<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Document: draft-ietf-bfd-seamless-use-case-04
Reviewer: Dale R. Worley
Review Date: 2016-04-01
IETF LC End Date: 2016-04-12
IESG Telechat date: 2016-05-05

Summary:

This draft is on the right track but has open issues, described in the
review.

Major issues:

In various places the description needs to be made clearer.  I believe
that the authors have a good idea of what is intended, but in some
places the descriptions are not clear to the general reader.

Nits/editorial comments:

There are various problems with English usage (e.g., missing articles)
and punctuation (e.g., excessive commas), which can be taken care of
by the Editor.  But the overall structure and clarity of several
paragraphs need improvement.

----------------------------------------------------------------------

General

What is the meaning of "seamless"?  The term "seamless BFD" is used in
the title and in the title of section 2, "Introduction to Seamless
BFD", and in exactly one other place in the document:

   If this information is already known to the end-points of a potential
   BFD session, the initial handshake including an exchange of this
   node-specific information is unnecessary and it is possible for the
   end points to begin BFD messaging seamlessly.

At no point is "seamless BFD" or the specific meaning of "seamless"
defined.

I suspect that the authors have a strong intuitive sense of the
behaviors they identify as "seamless", and it would be helpful if that
could be stated in the Introduction.

Abstract

The Abstract reads:

   This document provides various use cases for Bidirectional Forwarding
   Detection (BFD) and various requirements such that extensions could
   be developed to allow for simplified detection of forwarding
   failures.

It seems unlikely that adding extensions to a protocol will "simplify"
it (other than in the case of "MPLS BFD Session Per ECMP Path"), so it
seems that the Abstract could be phrased better.

It seems like a major goal of the draft is making it possible to
accelerate the establishment of a BFD session.  But that is not
mentioned in the Abstract.

Section 1

   Bidirectional Forwarding Detection (BFD) is a lightweight protocol,
   as defined in [RFC5880], used to detect forwarding failures.  Various
   protocols and applications rely on BFD for failure detection.  Even
   though the protocol is simple, there are certain use cases, where
   faster setting up of sessions and continuity check of the data
   forwarding paths is necessary.  This document identifies various use
   cases and requirements related to those, such that necessary
   enhancements could be made to BFD protocol.

The phrase "Even though the protocol is simple" is not relevant to the
remainder of the sentence it appears in and probably can be deleted.

"This document..." would better be "This document identifies these use
cases and the consequent requirements for extensions to the BFD
protocol."

The phrase "continuity check of the data forwarding paths" seems to be
disconnected.  I suspect the problem is a lack of parallelism, due to
"setting up" and "check".  You probably want to say "faster setting up
of sessions and faster continuity checking of the data forwarding
paths".

The phrase "complexity, not only from an operations point of view, but
also in terms of the speed at which these sessions could be
established or deleted" attaches "speed" to "complexity", which isn't
quite correct.  Better would be "not only creates operational
complexity, but also causes undesirable delay in establishing or
deleting sessions".

Section 2

The second paragraph says:

   In order for BFD to be able to initially verify that a
   connection is valid and that it connects the expected set of end
   points, it is necessary to provide the node information associated
   with the connection at each end point prior to initiating BFD
   sessions, such that this information can be used to verify that the
   connection is up and verifiable.

I think it would help if the nature of the "node information" was made
explicit.  It seems that this paragraph is strongly related to the
aspect of BFD that is *not* defined in RFC 5880:

   The method of demultiplexing the initial packets (in which Your
   Discriminator is zero) is application dependent, and is thus outside
   the scope of this specification.

Presumably the "node information" is what is used to perform the
demultiplexing of the initial packets.  Explaining this in more detail
might make the design problem(s) clearer to the inexperienced reader.
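
As an illustration only, the demultiplexing step could be sketched in
Python as below.  The lookup by Your Discriminator follows the RFC 5880
text quoted above; the fallback key for initial packets (peer address
plus receiving interface) is purely an assumption standing in for the
"node information", and every class and field name is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Session:
    local_discr: int   # discriminator this node chose for the session
    peer_addr: str     # assumed "node information": peer address
    interface: str     # assumed "node information": receiving interface


class Demux:
    """Maps a received BFD Control packet to a session, if any."""

    def __init__(self) -> None:
        self.by_discr: Dict[int, Session] = {}
        self.by_node_info: Dict[Tuple[str, str], Session] = {}

    def add(self, session: Session) -> None:
        self.by_discr[session.local_discr] = session
        self.by_node_info[(session.peer_addr, session.interface)] = session

    def lookup(self, your_discr: int, src_addr: str,
               interface: str) -> Optional[Session]:
        if your_discr != 0:
            # Normal case: the peer echoes back our discriminator.
            return self.by_discr.get(your_discr)
        # Initial packet: Your Discriminator is zero, so the lookup is
        # application dependent.  The key used here is an assumption.
        return self.by_node_info.get((src_addr, interface))
```

In this sketch the second table is exactly the information that has to
be provisioned at each end point before the first packet arrives.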

The third paragraph seems to be about accelerating the establishment
of a BFD session between two nodes.  With baseline BFD, establishing a
session requires the two nodes to exchange BFD packets, which include
the discriminators assigned by each node to the session.  It seems
that a goal of this draft is to avoid needing to exchange the initial
packets before the BFD session is established, with the goal of
getting to the established state more quickly.  But this is not
explicitly stated, nor is the manner in which "seamless BFD" would
avoid it.  As far as I can tell, the problem is that before a session
is established, BFD is limited to sending one packet per second, and
so the establishment of a session requires one or two seconds,
regardless of the speed of the link.

If the time to establish a BFD session is of central concern, it would
be helpful to present an analysis of how long it takes baseline BFD to
establish a session, and how long it might take an alternative BFD
startup method to establish a session.
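
The kind of analysis meant here could start from a back-of-the-envelope
model such as the following Python sketch.  It is not derived from the
RFC 5880 state machine: the packet counts, the 1 ms one-way delay, and
the idea that each paced packet waits up to one full interval are all
assumptions chosen for illustration, and the function name is
hypothetical.

```python
def establishment_time_us(paced_packets: int,
                          unpaced_packets: int,
                          tx_interval_us: int,
                          one_way_delay_us: int) -> int:
    """Upper bound on the time for a chain of sequential packets.

    Each paced packet is assumed to wait up to one full transmit
    interval before it is sent; unpaced packets are sent at once.
    Every packet then takes one_way_delay_us to arrive.
    """
    if min(paced_packets, unpaced_packets,
           tx_interval_us, one_way_delay_us) < 0:
        raise ValueError("all arguments must be non-negative")
    paced = paced_packets * (tx_interval_us + one_way_delay_us)
    unpaced = unpaced_packets * one_way_delay_us
    return paced + unpaced


# Assumed baseline: two paced packets at the one-second pre-Up rate.
baseline = establishment_time_us(2, 0, 1_000_000, 1_000)
# Assumed alternative: one request and one reply, neither paced.
alternative = establishment_time_us(0, 2, 1_000_000, 1_000)
```

Under these assumptions the one-second pacing, not the link delay,
dominates the baseline figure, which is the point the draft would need
to make explicitly.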

In addition to the discriminators, the initial BFD packets also
include the BFD packet interval parameters, "Detect Mult", "Desired
Min TX Interval", "Required Min RX Interval", and "Required Min Echo
RX Interval".  What allows BFD to have a very short Detection Time
in favorable situations is that the interval parameters can be much
shorter than one second.  But that implies that any system for
quick-starting a BFD session has to transmit the interval parameters
as well as the discriminators, or the BFD startup process still has to
exchange packets before the full sending rate has been established.

Then again, perhaps the phrase "node information" in this paragraph
includes the interval parameters, instead of just the discriminators
mentioned in the previous paragraph, in which case that should be made
clearer.
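
To show why the interval parameters matter, here is a small Python
sketch of the Asynchronous-mode Detection Time as I read RFC 5880
section 6.8.4: the Detect Mult received from the remote system times
the greater of the local Required Min RX Interval and the remote
Desired Min TX Interval.  The function name and the example values are
assumptions, not taken from the draft.

```python
def detection_time_us(remote_detect_mult: int,
                      local_required_min_rx_us: int,
                      remote_desired_min_tx_us: int) -> int:
    """Asynchronous-mode Detection Time, in microseconds."""
    agreed_remote_tx_us = max(local_required_min_rx_us,
                              remote_desired_min_tx_us)
    return remote_detect_mult * agreed_remote_tx_us


# Assumed values: 50 ms intervals once negotiated, versus the
# one-second rate that applies before the session is Up.
fast = detection_time_us(3, 50_000, 50_000)
slow = detection_time_us(3, 50_000, 1_000_000)
```

None of the three inputs is known to the local system until it has
either received a packet from the peer or been provisioned with the
values, so a quick-start mechanism has to supply them one way or the
other.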

Is the fourth paragraph a description of the proposed "seamless BFD"
and how it differs from baseline BFD?

The fourth paragraph contains "Each of those network entities is
assigned a BFD discriminator, to establish a BFD session."  But this
seems to be incorrect -- each network entity is assigned a BFD
discriminator for each BFD session that the entity will participate in
(RFC 5880 section 6.3).  I can't tell whether this is a fundamental
misunderstanding on the part of the authors, merely incorrect wording,
or if S-BFD includes a technique by which a node can use the same
discriminator for all of its BFD sessions -- that should be clarified.
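
For contrast, the per-session model of RFC 5880 can be sketched as
follows: a node hands out a fresh, nonzero, locally unique 32-bit
discriminator for every session it takes part in.  This is only an
illustration of baseline behavior; the class name is hypothetical and
nothing here describes how S-BFD assigns discriminators.

```python
import secrets
from typing import Set


class DiscriminatorAllocator:
    """Hands out one unique, nonzero discriminator per BFD session."""

    def __init__(self) -> None:
        self._in_use: Set[int] = set()

    def allocate(self) -> int:
        while True:
            # Random value in 1 .. 2**32 - 1 (zero is never used).
            candidate = secrets.randbelow(2**32 - 1) + 1
            if candidate not in self._in_use:
                self._in_use.add(candidate)
                return candidate

    def release(self, discr: int) -> None:
        self._in_use.discard(discr)
```

If the draft intends one discriminator per network entity instead, the
allocator above would be called once per entity rather than once per
session, and the text should say so.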

Section 3.1

This section isn't clear about the distinction between "verifying
forwarding in one direction only" and "not needing to provision the
target node, only the source node" -- the first is a relaxation of the
requirements on what BFD detects, the second is a strengthening of the
requirements on how BFD can be configured.

Despite saying that the target would not need to be configured, as
discussed in this section, BFD would still need to be configured at
the target node to know the discriminator of the source node:  "When
the targeted network entity receives the packet, it knows that BFD
packet, based on the discriminator and processes it."

I do not understand the sense in which "unidirectional" is being
used.  It seems that the only need is to verify transmission in one
direction between the two nodes.  The target node can verify
successful transmission if it receives the control packets from the
source node.  But the source node can only know that transmission is
working if it receives reply packets from the target node.  So despite
only needing to test transmission in one direction, transmission
must be done in both directions.  Or is the purpose to send the
live/dead determination to the "centralized controller", and it is not
required that the source know the state of the path?

Section 3.2

The first paragraph is

   BFD provides data delivery confidence when reachability validation is
   performed prior to traffic utilizing specific paths/LSPs.  However
   this comes with a cost, where, traffic is prevented to use such
   paths/LSPs until BFD is able to validate the reachability, which
   could take seconds due to BFD session bring-up sequences [RFC5880],
   LSP ping bootstrapping [RFC5884], etc.  This use case could be well
   supported by eliminating the need for session negotiation and
   discriminator exchanges in order to establish the BFD session.

As far as I can tell, the use case is "when reachability validation is
performed prior to traffic utilizing specific paths/LSPs".  But the
first sentence isn't structured to emphasize that, so it's difficult
to tell what "This use case" means.  Better would be something like:

   This use case is when BFD is used to verify reachability before
   sending traffic via a path/LSP.  This comes with a cost, which is
   that traffic is prevented from using the path/LSP until BFD is able
   to validate the reachability, which could take seconds ... .  This
   use case would be better supported by eliminating the need for the
   initial BFD session negotiation.

The second paragraph says "All it takes is for the network entities to
know what the discriminator values to be used for the session."  But
as in section 2, the interval parameters must be configured as well
before a BFD session is functioning.

Section 3.3

The last two paragraphs are

   Traditional BFD session establishment and validation of the
   forwarding path must not become a bottleneck in the case of
   centralized traffic engineering.  If the controller or other
   centralized entity is able to instantly verify a forwarding path of
   the TE tunnel , it could steer the traffic onto the traffic
   engineered tunnel very quickly thus minimizing adverse effect on a
   service.  This is especially useful and needed when the scale of the
   network and number of TE tunnels is very high.

Don't use the word "instantly":  Nothing happens "instantly" if it
involves events at two or more physically distinct locations.
(299,792,458 metres per second -- It's not just a good idea, it's the
law!)

   The cost associated with BFD session negotiation and establishment of
   BFD sessions to identify valid paths is very high and providing
   network redundancy becomes a critical issue.

It would help to specify that the "cost" is primarily due to the time
delay:  "The cost associated with the time required for BFD session
negotiation and ... is very high when providing network redundancy is
a critical issue."

Section 3.4

The final paragraph is:

   To support this use case, BFD MUST be able to perform liveness
   detection initated from centralized controller for any given segment
   under its domain.

This isn't a requirement on BFD per se, it's a requirement on the
agents that implement BFD in nodes.  But that is not a protocol
requirement either, since this document isn't specifying a protocol
between a centralized controller and a BFD agent.  I think what is
intended is that there should be a standard way by which a centralized
controller can instruct the two BFD agents in two nodes to initiate a
BFD session along a path, and can then monitor whether the BFD
session determines that the path between the nodes is working.  But if
so, that should be stated clearly.

Section 3.5

The final paragraph is:

   The established BFD session parameters and attributes like
   transmission interval, receiver interval, etc., MUST be modifiable
   without changing the state of the session.

Unfortunately, the term "state" has this definition (RFC 5880 section
4.1):

   State (Sta)

      The current BFD session state as seen by the transmitting system.
      Values are:

         0 -- AdminDown
         1 -- Down
         2 -- Init
         3 -- Up

It seems to me that the requirement is better captured by the last
sentence of the preceding paragraph:  "In these scenarios, it is
desirable for BFD to slow down, speed up, stop or resume at will witho
minimal [sic] additional BFD packets exchanged to establish a new or
modified session."  But that sentence is not quite good enough, since
what the preceding part of the paragraph asked for was "... with no
additional BFD packets exchanged", whereas the final sentence says
"minimal".

What is the requirement?  If it is "no additional packets", that's
clear.  If what is needed is a reduction in the additional packets, it
would help if there was an analysis of how many additional packets are
now needed and what potential reduction might be obtained, so that the
reader has some idea what "minimal" means.
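
To make the ambiguity of "state" concrete, the Python sketch below
models the Sta values from RFC 5880 section 4.1 alongside two interval
parameters.  The dataclass, its field names, and the retune() helper
are hypothetical; the sketch says nothing about how many packets a
real implementation would exchange to apply the change.

```python
from dataclasses import dataclass
from enum import IntEnum


class State(IntEnum):
    """Sta field values from RFC 5880 section 4.1."""
    ADMIN_DOWN = 0
    DOWN = 1
    INIT = 2
    UP = 3


@dataclass(frozen=True)
class SessionParams:
    state: State
    desired_min_tx_us: int
    required_min_rx_us: int


def retune(params: SessionParams, desired_min_tx_us: int,
           required_min_rx_us: int) -> SessionParams:
    """Return a copy with new intervals; the Sta value is untouched."""
    return SessionParams(params.state, desired_min_tx_us,
                         required_min_rx_us)
```

Read literally, "without changing the state of the session" is
satisfied by any change that leaves the Sta value alone, as retune()
does, which is why the requirement needs to be stated in terms of the
packets exchanged instead.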

Section 3.6

First, this use case needs to make it clear what it is testing:  That
a source node can send a packet to an anycast address, and that the
target node to which the packet is delivered can send a response
packet to the source node.  Of course, baseline BFD doesn't verify
that, because it does not provide for a set of BFD agents to
collectively form one endpoint of a BFD session.

Within that goal, there is an additional requirement that there is no
need to establish separate BFD sessions between the source node and
every node that receives packets for the anycast address.  But there
is an
ambiguity -- is it required that target nodes that do not happen to
receive any of the BFD packets do not need to maintain any state, or
is it that the source node does not need to maintain separate state
for each target node?

Section 3.7

This section talks about fault isolation very abstractly.  Is there a
definition as to what constitutes fault isolation?  (Or is this
definition well-known in the routing world?)

Section 3.8

   With distributed architectures of BFD implementations, this can be
   protected, if a node was to run multiple BFD sessions to targets,
   hosted on different parts of the system (ex: different CPU
   instances).  This can reduce BFD false failures, resulting in more
   stable network.

This is true, but it is not clear what the new requirements are.  I
see in RFC 5880 section 6.3

   Since multiple BFD sessions may be running between two systems, there
   needs to be a mechanism for demultiplexing received BFD packets to
   the proper session.
   ...
   The method of demultiplexing the initial packets (in which Your
   Discriminator is zero) is application dependent, and is thus outside
   the scope of this specification.

Is the question one of how to demultiplex the initial packets from
multiple BFD sessions in the same source device?

Section 3.9

[no complaints]

Section 4

It would help if there were cross-references between the scenarios/use
cases and the requirements.

REQ#1

"MUST start processing for the discriminator" is unclear.  Does this
mean "MUST establish a session", "MUST be able to send a response", or
what?

REQ#2

See comments on section 3.1.

REQ#3

Does this include not needing to exchange interval parameters as well?

REQ#4

I suspect this requirement is only operational in the scenario of
section 3.4, a Segment Routed network.  It might be useful to qualify
the requirement this way, since otherwise "centralized controller" and
"segment" don't have a context.  Or is S-BFD only intended for
situations with a centralized controller?

REQ#5

See comments for section 3.5.

REQ#6

"This requirement does not require BFD session establishment with
every node hosting the anycast address." is not what is intended.
Rather, it should be something like appending "... without
establishing a separate BFD session with every node hosting the anycast
address" to the first sentence.  As written, the requirement "does not
require session establishment with every node" whereas the intention
is to "require that there not be session establishment with every
node".

REQ#7

See comments for section 3.7.