6TiSCH Minimal Scheduling Function (MSF)
draft-ietf-6tisch-msf-18

Note: This ballot was opened for revision 11 and is now closed.

Erik Kline Yes

Comment (2020-09-11 for -17)
[[ nits ]]

[ section 2 ]

* "to do at each time slots" -> "to do at each time slot"

* "there is a frame to sent" -> "there is a frame to be sent"

[ section 4.3 ]

* "the pledge continue listening" -> "the pledge continues listening"

[ section 4.4 ]

* "After selected a JP" ->
  "After selecting a JP" or "After having selected a JP"

[ section 4.8 ]

* "AutRxCell" -> "AutoRxCell"?

* "for new pledge" -> "for new pledges"?

[ section 5.1 ]

* "used to both type of" -> "used for both types of"?

[ section 8 ]

* "consequence of randomly cell selection" ->
  "consequence of random cell selection"?

[ section 9 ]

* "define din" -> "defined in"

[ section 16 ]

* "considrations" -> "considerations"

* "to hand that packet" -> "to handle that packet", maybe

Alvaro Retana No Objection

Comment (2020-03-12 for -12)
(1) I support Roman's DISCUSS.


(2) The datatracker should point at draft-chang-6tisch-msf being replaced by this document.


(3) §2: "MSF RECOMMENDS the use of 3 slotframes."  Why isn't this REQUIRED?  How does an implementation signal to a neighboring node that a different number of slotframes is being used (or a different length, which is also RECOMMENDED later)?  It seems to me that RECOMMENDING may not be enough for an interoperable implementation...but I may also be missing something in 802.15.4 or rfc8180 (or somewhere else).  [BTW, the rfc2119 keyword is RECOMMENDED (not RECOMMENDS).]

I think the use of RECOMMENDED (vs REQUIRED) may be related to this text a couple of paragraphs before:

   A node implementing MSF SHOULD implement the Minimal 6TiSCH
   Configuration [RFC8180], which defines the "minimal cell", a single
   shared cell providing minimal connectivity between the nodes in the
   network.  The MSF implementation provided in this specification is
   based on the implementation of the Minimal 6TiSCH Configuration.
   However, an implementor MAY implement MSF based on other
   specifications as long as the specification defines a way to
   advertise the EB/DIO among the network.

I understand that a configuration other than rfc8180 is possible, but if this document is based on rfc8180, then it would be clearer if the language was stronger (s/SHOULD/MUST) with the understanding that the specification refers to that case.


(4) §4.3:

   While the exact behavior is implementation-specific, it is
   RECOMMENDED that after having received the first EB, a node keeps
   listen for at most MAX_EB_DELAY seconds until it has received EBs
   from NUM_NEIGHBOURS_TO_WAIT distinct neighbors, which is defined in
   [RFC8180].

rfc8180/§6.2 says that "after having received the first EB, a node MAY listen for at most MAX_EB_DELAY seconds until it has received EBs from NUM_NEIGHBOURS_TO_WAIT distinct neighbors."  The use of RECOMMENDED here is not consistent with the optional nature of the MAY.  


(5) Nits...

s/represents node's preferred parent/represents the node's preferred parent

s/no restrictions to go multiple MSF sessions/no restrictions to use (?) multiple MSF sessions

s/One of the algorithm met the rule/One of the algorithms that meet the rule

s/Alternative behaviors may involved/Alternative behaviors may be involved

s/when alternative security/when an alternative security

s/node keeps listen/node keeps listening

s/pairs of following counters/pairs of the following counters

Benjamin Kaduk (was Discuss) No Objection

Comment (2020-04-02 for -16)
Thanks for clarifying the non-issue nature of my original Discuss points!

Original COMMENT section preserved below (possibly stale).

I support Roman's Discuss -- we need more information for this to be a
useful reference; even what seem to be the official DASFAA 1997
proceedings (https://dblp.org/db/conf/dasfaa/dasfaa97) do not have an
associated document.

Basing various scheduling aspects on (a hash of) the EUI64 ties
functionality to a persistent identifier for a device.  How significant
a disruption would be incurred if a device periodically changes its
presented EUI64 for anonymization purposes?

There seems to be a general pattern of "if you don't have a
6P-negotiated Tx cell, install an AutoTxCell to send your one message
and then remove it after sending"; I wonder if it would be easier on the
reader to consolidate this as a general principle and not repeat the
details every time it occurs.

Requirements Language

"NOT RECOMMENDED" is not in the RFC2119 boilerplate (but is a BCP 14 keyword).

Section 1

   the 6 steps described in Section 4.  The end state of the join
   process is that the node is synchronized to the network, has mutually
   authenticated to the network, has identified a routing parent, and

nit(?): I guess maybe "mutually authenticated with" is more correct for
the bidirectional operation.

   It does so for 3 reasons: to match the link-layer resources to the
   traffic, to handle changing parent, to handle a schedule collision.

nit: end the list with "or" (or "and"?).

   MSF works closely with RPL, specifically the routing parent defined
   in [RFC6550].  This specification only describes how MSF works with
   one routing parent, which is phrased as "selected parent".  The

nit: I suggest '''one routing parent; this parent is referred to as the
"selected parent"'''.

   activity of MSF towards to single routing parent is called as a "MSF

nit: "towards the"

   *  We added sections on the interface to the minimal 6TiSCH
      configuration (Section 2), the use of the SIGNAL command
      (Section 6), the MSF constants (Section 14), the MSF statistics
      (Section 15).

nit: end the list with "and".

Section 2

   In a TSCH network, time is sliced up into time slots.  The time slots
   are grouped as one of more slotframes which repeat over time.  The

nit(?): should this be "one or more"?

   channel) is indicated as a cell of TSCH schedule.  MSF is one of the
   policies defining how to manage the TSCH schedule.

nit: if there is only one such policy active at a given time for a given
network, I suggest "MSF is a policy for managing the TSCH schedule".
(If multiple policies are active simultaneously, no change is needed.)

   MSF uses the minimal cell for broadcast frames such as Enhanced
   Beacons (EBs) [IEEE802154] and broadcast DODAG Information Objects
   (DIOs) [RFC6550].  Cells scheduled by MSF are meant to be used only
   for unicast frames.

If this paragraph was moved before the previous paragraph, then EB and
DIO would be defined before their first usage.

   bandwidth of minimal cell.  One of the algorithm met the rule is the
   Trickle timer defined in [RFC6206] which is applied on DIO messages
   [RFC6550].  However, any such algorithm of limiting the broadcast

nit(?): "One of the algorithms that fulfills this requirement"?

   MSF RECOMMENDS the use of 3 slotframes.  MSF schedules autonomous
   cells at Slotframe 1 (Section 3) and 6P negotiated cells at Slotframe
   2 (Section 5) , while Slotframe 0 is used for the bootstrap traffic
   as defined in the Minimal 6TiSCH Configuration.  It is RECOMMENDED to
   use the same slotframe length for Slotframe 0, 1 and 2.  Thus it is

Perhaps this is just a question of writing style, but if an
implementation is free to use an alternative SF or a variant of MSF,
could we not say that "MSF uses 3 slotframes", "MSF uses the same
slotframe length for", etc.?

Section 3

Is there any risk of unwanted correlation between slot and channel
offsets when using the same hash function and input for both
calculations?

   hash function.  Other optional parameters defined in SAX determine
   the performance of SAX hash function.  Those parameters could be
   broadcasted in EB frame or pre-configured.  For interoperability
   purposes, an example how the hash function is implemented is detailed
   in Appendix B.

Given the lack of a usable reference for [SAX-DASFAA], I assume that the
content in Appendix B is going to be used as a specification, not just
an example.

   *  The AutoRxCell MUST always remain scheduled after synchronized.

nit: s/synchronized/synchronization/

   AutoRxCell.  In case of conflicting with a negotiated cell,
   autonomous cells take precedence over negotiated cell, which is
   stated in [IEEE802154].  However, when the Slotframe 0, 1 and 2 use
   the same length value, it is possible for negotiated cell to avoid
   the collision with AutoRxCell.

Presumably this factors in to the recommendation to have the three
listed slotframes use the same length, but mentioning it explicitly
(whether here or where the recommendation is made) might be nice.

Section 4

   network.  Alternative behaviors may involved, for example, when
   alternative security solution is used for the network.  Section 4.1

nit: singular/plural mismatch "behaviors"/"solution is used"

Section 4.1

   A node implementing MSF SHOULD implement the Minimal Security
   Framework for 6TiSCH [I-D.ietf-6tisch-minimal-security].  As a

Didn't this get renamed to CoJP?

Section 4.2

I wonder a little whether there is a better description than "available
frequencies", but I don't have one to offer.

Section 4.3

   While the exact behavior is implementation-specific, it is
   RECOMMENDED that after having received the first EB, a node keeps
   listen for at most MAX_EB_DELAY seconds until it has received EBs
   from NUM_NEIGHBOURS_TO_WAIT distinct neighbors, which is defined in
   [RFC8180].

nit(?): this phrasing implies that only NUM_NEIGHBOURS_TO_WAIT is
defined in RFC 8180, but MAX_EB_DELAY is also defined there.

not-nit: this phrasing is ambiguous as to whether one of MAX_EB_DELAY
and NUM_NEIGHBOURS_TO_WAIT is sufficient to move to the next step or
whether both are required.
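
The two readings of the quoted sentence can be written out side by side.
This is only an illustrative sketch: the function name, the argument
names, and the numeric values in the usage note are assumptions made
for the example and are not taken from the draft.

```python
def done_listening(elapsed_s, neighbors_heard,
                   max_eb_delay_s, num_neighbours_to_wait,
                   require_both=False):
    """Return True when the pledge may stop listening for EBs.

    require_both=False: either condition is sufficient (timeout OR
    enough distinct neighbors heard).
    require_both=True: both conditions must hold.
    """
    timed_out = elapsed_s >= max_eb_delay_s
    enough = neighbors_heard >= num_neighbours_to_wait
    if require_both:
        return timed_out and enough
    return timed_out or enough
```

With hypothetical limits of 180 seconds and 2 neighbors, a pledge that
hears 2 neighbors after 10 seconds stops immediately under the "either"
reading but keeps listening under the "both" reading, and a pledge that
hears no second neighbor never stops under the "both" reading.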

Section 4.4

   After selected a JP, a node generates a Join Request and installs an
   AutoTxCell to the JP.  The Join Request is then sent by the pledge to
   its JP over the AutoTxCell.  The AutoTxCell is removed by the pledge

editorial: I'd suggest s/its JP/its selected JP/

   Response is sent out.  The pledge receives the Join Response from its
   AutoRxCell, thereby learns the keying material used in the network,
   as well as other configurations, and becomes a "joined node".

nit: maybe "other configuration values" or "other configuration
settings"?

Section 4.6

   Once it has selected a routing parent, the joined node MUST generate
   a 6P ADD Request and install an AutoTxCell to that parent.  The 6P
   ADD Request is sent out through the AutoTxCell with the following
   fields:

   *  CellOptions: set to TX=1,RX=0,SHARED=0
   *  NumCells: set to 1
   *  CellList: at least 5 cells, chosen according to Section 8

Is this listing describing the contents of the ADD request or the
AutoTxCell used to send it?  (I presume the former, in which case I
suggest to use "containing" or similar in preference to "with".)

Section 5.1

   The goal of MSF is to manage the communication schedule in the 6TiSCH
   schedule in a distributed manner.  For a node, this translates into
   monitoring the current usage of the cells it has to the selected
   parent:

Is this goal strictly limited to traffic "to the selected parent" vs.
all traffic?

   *  If the node determines that the number of link-layer frames it is
      attempting to exchange with the selected parent per unit of time
      is larger than the capacity offered by the TSCH negotiated cells
      it has scheduled with it, the node issues a 6P ADD command to that
      parent to add cells to the TSCH schedule.
   *  If the traffic is lower than the capacity, the node issues a 6P
      DELETE command to that parent to delete cells from the TSCH
      schedule.

As written, this would potentially lead to oscillation when demand is
basically at capacity, due to the quantization of capacity.  Perhaps
some provisioning for hysteresis is appropriate?
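
One possible shape for such hysteresis, sketched under stated
assumptions: the two thresholds (75% and 25%) and the function name are
hypothetical and chosen only to illustrate a dead band between the ADD
and DELETE decisions. The counter names follow NumCellsUsed and
NumCellsElapsed as quoted from the draft.

```python
def adapt(num_cells_used, num_cells_elapsed, high=0.75, low=0.25):
    """Decide whether to add, delete, or keep cells.

    The gap between `low` and `high` is a dead band: usage that sits
    near capacity neither adds nor deletes a cell, so the schedule does
    not oscillate. Both thresholds are assumed values.
    """
    if num_cells_elapsed == 0:
        return "keep"  # nothing measured yet
    usage = num_cells_used / num_cells_elapsed
    if usage > high:
        return "add"
    if usage < low:
        return "delete"
    return "keep"
```

With a single threshold, usage hovering around that value would
alternate between ADD and DELETE; with two thresholds, usage of 50%
leaves the schedule unchanged.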

   The cell option of cells listed in CellList in 6P Request frame
   SHOULD be either Tx=1 only or Rx=1 only.  Both NumCellsElapsed and
   NumCellsUsed counters can be used to both type of negotiated cells.

Would this be more clear as "(Tx=1,Rx=0) or (Tx=0,Rx=1)"?

   *  NumCellsElapsed is incremented by exactly 1 when the current cell
      is AutoRxCell.

This holds for all peers/parents we're keeping counters for, so the
AutoRxCell can get "double counted"?

   In case that a node booted or disappeared from the network, the cell
   reserved at the selected parent may be kept in the schedule forever.
   A clean-up mechanism MUST be provided to resolve this issue.  The
   clean-up mechanism is implementation-specific.  It could either be a
   periodic polling to the neighbors the nodes have negotiated cells
   with, or monitoring the activities on those cells.  The goal is to
   confirm those negotiated cells are not used anymore by the associated
   neighbors and remove them from the schedule.

I'm not sure that "monitoring the activities on those cells" is safe
with the current level of specification; if a node negotiates a 6P
transmit cell to a parent and uses it only sparingly, with the parent
eventually reclaiming it due to inactivity, I don't see a mechanism by
which the node will reliably discover the negotiated cell to be
nonfunctional and fall back to (e.g.) the corresponding AutoTxCell.  It
may be most prudent to just not mention that as an example (a "periodic
polling" procedure does not seem to have the same potential for
information skew).

Section 5.3

   schedule is executed and the node sends frames to that parent.  When
   NumTx reaches MAX_NUMTX, both NumTx and NumTxAck MUST be divided by
   2.  For example, when MAX_NUMTX is set to 256, from NumTx=255 and
   NumTxAck=127, the counters become NumTx=128 and NumTxAck=64 if one
   frame is sent to the parent with an Acknowledgment received.  This
   operation does not change the value of the PDR, but allows the
   counters to keep incrementing.  The value of MAX_NUMTX is
   implementation-specific.

Does MAX_NUMTX need to be a power of two (to avoid errors when the
division occurs)?
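
A minimal sketch of one way to read the quoted example, assuming that
the counters are incremented first and then halved with integer (floor)
division. The function name and the choice of floor division are
assumptions; the draft text quoted above does not state the rounding
rule.

```python
MAX_NUMTX = 256  # value used in the draft's example


def record_tx(num_tx, num_tx_ack, acked, max_numtx=MAX_NUMTX):
    """Update the counters for one transmitted frame.

    Assumed order: increment, then halve both counters once NumTx
    reaches max_numtx. Floor division is an assumption.
    """
    num_tx += 1
    if acked:
        num_tx_ack += 1
    if num_tx >= max_numtx:
        num_tx //= 2
        num_tx_ack //= 2
    return num_tx, num_tx_ack
```

Under this reading, NumTx=255 and NumTxAck=127 followed by one
acknowledged frame first become 256 and 128 and then 128 and 64, which
matches the draft's numbers. If the frame is not acknowledged, NumTxAck
is still 127 when the halving occurs and floor division yields 63, so
the PDR changes slightly whether or not MAX_NUMTX is a power of two.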

   4.  For any other cell, it compares its PDR against that of the cell
       with the highest PDR.  If the difference is larger than
       RELOCATE_PDRTHRES, it triggers the relocation of that cell using
       a 6P RELOCATE command.

The recommended RELOCATE_PDRTHRES is given as "50 %".  Is this
"difference" performed as a subtraction (so that if the highest PDR is
less than 50%, no cells can ever be relocated) or a ratio (a PDR that's
half the maximum PDR or smaller will trigger relocation)?
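
The two interpretations can be compared directly. This is a sketch
only: both function names are hypothetical, PDR values are expressed as
fractions in [0, 1], and the threshold of 0.5 corresponds to the
recommended "50 %".

```python
def relocate_by_difference(pdr, best_pdr, thres=0.5):
    """Subtraction reading: relocate if the absolute gap exceeds thres."""
    return (best_pdr - pdr) > thres


def relocate_by_ratio(pdr, best_pdr, thres=0.5):
    """Ratio reading: relocate if pdr is at most thres times the best PDR."""
    return best_pdr > 0 and pdr <= thres * best_pdr
```

With a best PDR of 0.4 and a cell at 0.1, the subtraction reading never
relocates (the gap is 0.3), while the ratio reading does (0.1 is below
half of 0.4). With a best PDR of 0.9 and a cell at 0.3, both readings
trigger relocation.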

Section 7

Maybe reference Section 17.1 where the allocation will occur?

Section 8

   *  The slotOffset of a cell in the CellList SHOULD be randomly and
      uniformly chosen among all the slotOffset values that satisfy the
      restrictions above.
   *  The channelOffset of a cell in the CellList SHOULD be randomly and
      uniformly chosen in [0..numFrequencies], where numFrequencies
      represents the number of frequencies a node can communicate on.

Do these random selections need to be independent from each other?  (I
note that the selections for the autonomous cells are not.)
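
For illustration, independent selection could look like the sketch
below. The function name and parameters are hypothetical. Two choices
here are assumptions rather than statements from the draft: the
slotOffsets in one CellList are drawn without repetition, and the
channelOffset is drawn from 0 to numFrequencies-1, that is, the upper
bound of the quoted range is treated as exclusive.

```python
import random


def candidate_cells(allowed_slot_offsets, num_frequencies, count, rng=None):
    """Pick `count` candidate cells as (slotOffset, channelOffset) pairs.

    Each channelOffset is drawn separately from the slotOffsets, so the
    two values of a cell are independent of each other.
    """
    rng = rng or random.Random()
    slots = rng.sample(sorted(allowed_slot_offsets), count)
    return [(slot, rng.randrange(num_frequencies)) for slot in slots]
```

A call such as candidate_cells(range(1, 101), 16, 5) returns five cells
with distinct slotOffsets and channelOffsets between 0 and 15.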

Section 9

Is there a reference for these three parameters (MAXBE, MAXRETRIES,
SLOTFRAME_LENGTH)?  SLOTFRAME_LENGTH seems new in this document and is
listed in the table in Section 14, but the other two are not listed
there.

Section 14

Why is MAX_NUMTX not listed in the table?

Can we really give a recommended NUM_CH_OFFSET value, since this is in
effect dependent on the number of channels available?

KA_PERIOD is defined but not used elsewhere in the document.

What are the considerations in using a power of 10 vs. a power of 2 as
MAX_NUM_CELLS?

Section 16

   MSF defines a series of "rules" for the node to follow.  It triggers
   several actions, that are carried out by the protocols defined in the
   following specifications: the Minimal IPv6 over the TSCH Mode of IEEE
   802.15.4e (6TiSCH) Configuration [RFC8180], the 6TiSCH Operation

I'd suggest a brief note that the security considerations of those
protocols continue to apply (even though it ought to be obvious);
reading them could help a reader understand the behavior of this
document as well.

   Sublayer Protocol (6P) [RFC8480], and the Minimal Security Framework
   for 6TiSCH [I-D.ietf-6tisch-minimal-security].  In particular, MSF

[CoJP again]

   prevent it from receiving the join response.  This situation should
   be detected through the absence of a particular node from the network
   and handled by the network administrator through out-of-band means,
   e.g. by moving the node outside the radio range of the attacker.

"the radio range of the attacker" is not exactly a fixed constant ...
attackers are not in general bound by legal limits and can increase Tx
power subject only to their equipment and budget.

   MSF adapts to traffics containing packets from IP layer.  It is
   possible that the IP packet has a non-zero DSCP (Diffserv Code Point
   [RFC2597]) value in its IPv6 header.  The decision whether to hand

RFC 2597 is talking more about specifically assured forwarding PHB groups
than "DSCP codepoint"s per se.

Section 18.1

RFC 6206 seems to only be used as an example (Trickle), and could
probably be informative.

RFC 8505 might also not need to be normative.

Appendix B

   In MSF, the T is replaced by the length slotframe 1.  String s is

nit: "length of"

   2.  sum the value of L_shift(h,l_bit), R_shift(h,r_bit) and ci

Is this addition performed in "infinite precision" integer arithmetic or
limited to the output width of h, e.g., by modular division?  (It's not
clear to me whether this is the role T plays or not.)
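
To make the question concrete, the sketch below evaluates the sum in
step 2 both ways. It covers step 2 only, not the whole hash. The
function name, the 16-bit width, and the parameter values in the usage
note are assumptions chosen for illustration.

```python
def sum_step(h, ci, l_bit, r_bit, width=None):
    """Compute L_shift(h, l_bit) + R_shift(h, r_bit) + ci.

    width=None: unbounded integer arithmetic.
    width=N: the sum is reduced modulo 2**N, as a fixed-width
    register would do.
    """
    total = (h << l_bit) + (h >> r_bit) + ci
    if width is not None:
        total &= (1 << width) - 1
    return total
```

For h=0xFFFF, ci=0x7F, l_bit=5 and r_bit=2, the unbounded sum needs
more than 16 bits, so the two modes give different values before any
later reduction is applied.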

   8.  assign the result of Step 5 to h

The value from step 5 *is* h, so taken literally this says "assign h to
h" and is not needed.

Martin Vigoureux No Objection

Roman Danyliw (was Discuss) No Objection

Comment (2020-05-08 for -16)
Thanks for addressing my DISCUSS and COMMENTs.

Warren Kumari No Objection

Éric Vyncke No Objection

Comment (2020-03-17 for -12)
Thank you for the work put into this document. 

Please find below some non-blocking COMMENTs and NITs. An answer will be appreciated.

I hope that this helps to improve the document,

Regards,

-éric

== COMMENTS ==

As in Alissa's comment, please use the RFC 8174 boilerplate.

-- Section 3 --
Suggest to remove "The AutoRxCell MUST always remain scheduled after synchronized. *  6P CLEAR MUST NOT erase any autonomous cells." from the bulleted list and create a new paragraph for those 2 lines.
   
-- Section 4 --
The whole section seems to assume that the events will work as expected. But, what if this is not the case? E.g., the JP does not send any reply ?

-- Section 5.3 --
"we necessarily have NumTxAck <= NumTx" is only true if all nodes behave...

"MUST be divided by 2", the example is about 127 divided by 2 giving the unexpected value (to me at least) of 64... The text should clarify how rounding is handled as it is not a plain right shift by 1.

Step 2, is it also applicable to any value of MAX_NUMTX ? Including very small or very large ones ?

== NITS ==

-- section 5.2 --
To be checked by a native speaker but s/can have a node switch parent/can have a node switching parent/ would make the text easier to parse.

-- Section 14 --
Please order the rows of Figure 2.

(Suresh Krishnan; former steering group member) Yes

Yes ( for -11)

(Adam Roach; former steering group member) No Objection

No Objection ( for -12)

(Alissa Cooper; former steering group member) No Objection

No Objection (2020-03-11 for -12)
Please use the RFC 8174 boilerplate rather than the RFC 2119 boilerplate.

(Barry Leiba; former steering group member) No Objection

No Objection (2020-03-11 for -12)
I was going to ask that you expand “DODAG” in first use, because it’s not marked as sufficiently common in the RFC Editor’s abbreviation list.  But, really, I think the better answer is to ask the responsible AD to ask the RFC Editor to put that asterisk on both DAG and DODAG at this point.

(Deborah Brungard; former steering group member) No Objection

No Objection ( for -12)

(Mirja Kühlewind; former steering group member) No Objection

No Objection (2020-03-11 for -12)
I agree with Roman's discuss that the relation to SAX-DASFAA should be clarified and if this is actually needed for interoperability (as stated at some point in the text) it seems this should be part of the body of the document. Or what are the requirements for interoperability? What can be changed in the "example" algorithm and what not?

Two small technical points:
1) Sec 9; mostly double-checking as you probably know better than me:
"6P timeout value is calculated as ((2^MAXBE)-1)*MAXRETRIES*SLOTFRAME_LENGTH"
Often you calculate such a value and then multiply by 2 (or something) to be on the safe side, as there could be e.g. processing delays in the receiving node. I assume the assumption here is that you always need to get the response in the same/after one slot (?). If that is true, I guess the calculation is fine. But wanted to check that there cannot be any additional unknown delays here.

Further, these values come a bit out of nothing. Where are  MAXBE and MAXRETRIES defined? And if you have an exponential backoff that will stop retrying after MAXRETRIES why do you need also a timeout in addition to that?
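
As a worked instance of the quoted formula, the sketch below evaluates
it for one set of inputs. The input values (MAXBE=5, MAXRETRIES=4,
SLOTFRAME_LENGTH=101) and the 10 ms slot duration are assumptions
chosen for illustration, and the result is assumed to be in units of
time slots because SLOTFRAME_LENGTH is a slot count.

```python
def sixp_timeout_slots(maxbe, maxretries, slotframe_length):
    """Evaluate ((2^MAXBE)-1) * MAXRETRIES * SLOTFRAME_LENGTH."""
    return ((2 ** maxbe) - 1) * maxretries * slotframe_length


# Hypothetical inputs: 31 * 4 * 101 slots.
timeout_slots = sixp_timeout_slots(maxbe=5, maxretries=4, slotframe_length=101)

# Converting to seconds requires a slot duration; 10 ms is assumed here.
timeout_seconds = timeout_slots * 0.010
```

The formula bounds the worst-case backoff across all retries and
contains no extra term for processing delay at the receiving node,
which is the margin the comment above asks about.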

2) Sec 16:
"   MSF adapts to traffics containing packets from IP layer.  It is
   possible that the IP packet has a non-zero DSCP (Diffserv Code Point
   [RFC2597]) value in its IPv6 header.  The decision whether to hand
   over that packet to MAC layer to transmit or to drop that packet
   belongs to the upper layer and is out of scope of MSF.  As long as
   the decision is made to hand over to MAC layer to transmit, MSF will
   take that packet into account when adapting to traffic."
Why should a packet be dropped based on its DSCP...? Maybe be a bit more neutral here, like:
"   MSF adapts to traffics containing packets from IP layer.  It is
   possible that the IP packet has a non-zero DSCP (Diffserv Code Point
   [RFC2597]) value in its IPv6 header.  The decision how to handle it
   belongs to the upper layer and is out of scope of MSF. As long as
   a decision is made to hand over to MAC layer to transmit, MSF will
   take that packet into account when adapting to traffic."

Some small editorial nits/comments:
1) Sec 1: 
- Maybe expand RPL on first occurrence.
- s/is called as a "MSF session"/is called a "MSF session"/

2) Sec 2
- s/one of more slotframes/one or more slotframes/

3) Sec 4.4
- Please expand JRC on first occurrence. Maybe add a glossary at the beginning?

4) Sec 5.1.
"   A node implementing MSF MUST implement the behavior described in this
   section."
Not sure if that sentence brings any additional value because that's what specs are for. But I guess it also doesn't hurt.
And respectively I find the statement in 5.3 rather confusing
"   A node implementing MSF SHOULD implement the behavior described in
   this section.  The "MUST" statements in this section hence only apply
   if the node implements schedule collision handling."
I'm not fully sure what this even means now. Can you explain? Can you maybe rather provide some text to explain when it could/MAY be appropriate to not implement it?

5) Sec 16:
"The implementation at IPv6 layer
   SHOULD ensure that this join traffic is rate-limited before it is
   passed to 6top sublayer where MSF can observe it. "
Maybe be less indirect here:
"The implementation at IPv6 layer
   SHOULD rate-limit join traffic before it is
   passed to 6top sublayer where MSF can observe it."

Also this wording is a bit unclear:
" How this rate limit is set is out of scope of MSF."
Maybe
" How this rate limit is implemented is out of scope of MSF."

6) "Appendix A.  Contributors" -> Usually Contributors is an own section in the body of the document and not part of the appendix but I'm sure the RFC editor will advise you correctly.