
Benchmarking Methodology for EVPN and PBB-EVPN
draft-ietf-bmwg-evpntest-11

Discuss

Erik Kline
Murray Kucherawy
Zaheduzzaman Sarker
Éric Vyncke

Yes

Warren Kumari

No Objection

Roman Danyliw

No Record

Deb Cooley
Francesca Palombini
Gunter Van de Velde
Jim Guichard
John Scudder
Mahesh Jethanandani
Orie Steele
Paul Wouters

Summary: Has 4 DISCUSSes. Has enough positions to pass once DISCUSS positions are resolved.

Erik Kline
Discuss
Discuss (2021-06-29 for -09) Sent
[S3.9]

* I think this should probably be easy to clear up, likely by clarifying
  the scope of the test.  The general point I'd like to discuss is that ND
  machinery has many more test cases than ARP REPLYs do, and it would be
  good to clarify what is to be tested.  For that matter, ARP has fun cases
  too.

  The test in S3.9 seems to be for L2 unicast traffic ("[s]end ... to the
  target IRB address"), but should there be tests for learning in other
  scenarios?

    (1) L2 broadcast Gratuitous ARP REQUESTs (GARP)?

    (2) L3 Multicast NAs to all-nodes (Solicited flag = 0)

    (3) L3 Multicast NAs to all-routers (Solicited flag = 0)

  My guess is it's simplest to craft text to just say the test covers
  unicast ARP REPLYs and unicast ND NAs, but the option exists to expand
  the test matrix as well, I suppose.
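  To make the scoping choice concrete, here is a small Python sketch that lists the candidate cases as a test matrix. The field names and the `in_scope` helper are hypothetical and do not come from the draft. ff02::1 and ff02::2 are the IPv6 link-local all-nodes and all-routers multicast addresses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LearningCase:
    """One hypothetical row of an ARP/ND learning test matrix."""
    family: str                # "IPv4" (ARP) or "IPv6" (ND)
    message: str               # packet sent by the traffic generator
    destination: str           # how the packet is delivered
    solicited: Optional[bool]  # ND Solicited flag; None = not stated / not ND


CASES = [
    # Scope the S3.9 wording appears to cover today (unicast to the IRB).
    LearningCase("IPv4", "ARP REPLY", "unicast to IRB address", None),
    LearningCase("IPv6", "Neighbor Advertisement", "unicast to IRB address", None),
    # Additional scenarios (1)-(3) raised above.
    LearningCase("IPv4", "Gratuitous ARP REQUEST", "L2 broadcast", None),
    LearningCase("IPv6", "Neighbor Advertisement", "ff02::1 (all-nodes)", False),
    LearningCase("IPv6", "Neighbor Advertisement", "ff02::2 (all-routers)", False),
]


def in_scope(case: LearningCase, unicast_only: bool = True) -> bool:
    """The narrow scope keeps only unicast cases; the expanded matrix keeps all."""
    return not unicast_only or case.destination.startswith("unicast")


for case in CASES:
    print(case.family, case.message, "->", case.destination,
          "(narrow scope)" if in_scope(case) else "(expanded matrix only)")
```

  With `unicast_only=True`, only the two unicast rows remain. That matches the simpler "unicast ARP REPLYs and unicast ND NAs" wording suggested above.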
Comment (2021-06-29 for -09) Sent
[S3.3] [question]

* Does "Fail the DUT-CE link" mean causing a link failure that can be detected
  by the DUT (e.g., a loss of PHY signal or some other means of detecting that a
  directly-connected cable was removed)?

[S3.9] [question]

* What does "Send X arp/neighbour discovery(ND)" mean, exactly?

  If these are meant to be ARP REPLY/Neighbor Advertisements (NAs) then
  being explicit about that seems helpful.

  Or are these ARP REQUESTs and Neighbor Solicitations (NSs) for the IRB address?

* "Send ... to the target IRB address configured in EVPN instance" seems
  like the ARP REPLY/ND NAs are to be sent unicast.  If that's true, I
  think it would improve clarity to say they're unicast.
Murray Kucherawy
Discuss
Discuss (2021-06-30 for -09) Sent for earlier
[This is largely a repeat of Eric's DISCUSS position, which I noticed only after I wrote mine up.  I'll leave it as-is for the moment.]

The tests in Sections 3.3, 3.4, 4.3, and 4.4 each define two distinct things to be measured, but they appear to refer to themselves using a common name.  Feels like some text copy-paste happened here from previous sections, but wasn't fully developed.  For instance, Section 3.3 says:

   Measure the time taken for flushing these X MAC addresses.  Measure
   the time taken to relearn these X MAC in DUT.  The test is repeated
   for N times and the values are collected.  The flush and the
   relearning time is calculated by averaging the values obtained by N
   samples.  N is an arbitrary number to get a sufficient sample.  The
   time measured for each sample is denoted by T1,T2...Tn.  The
   measurement is carried out using external server which polls the DUT
   using automated scripts which measures the MAC counter value with
   timestamp.

   Flush rate = (T1+T2+..Tn)/N

   Relearning rate = (T1+T2+..Tn)/N

So you end the section appearing to define two distinct things using the same arithmetic over a single time series.  In other words, you call two different time series by the same name and then give an identical formal definition for what are really two different measurements.

It seems to me what you actually meant was for these to be, maybe, "F1+F2+..+Fn" for the flush rate and "R1+R2+..+Rn" for the relearning rate, which makes it clear they are distinct quantities.
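The following minimal Python sketch illustrates the suggested fix. It is not text from the draft. It keeps the flush series F1..Fn and the relearning series R1..Rn separate and averages each one on its own. The sample values are hypothetical.

```python
from statistics import mean


def summarize_runs(flush_times, relearn_times):
    """Average two distinct per-run series: F1..Fn (flush) and R1..Rn (relearn).

    Both inputs are durations in seconds, so the results are mean times,
    not rates.
    """
    if not flush_times or len(flush_times) != len(relearn_times):
        raise ValueError("each of the N runs must yield one F and one R sample")
    return mean(flush_times), mean(relearn_times)


# Hypothetical samples from N = 3 runs, for illustration only.
F = [1.2, 1.0, 1.1]
R = [2.5, 2.7, 2.6]
mean_flush, mean_relearn = summarize_runs(F, R)
print(f"mean flush time {mean_flush:.2f} s, mean relearn time {mean_relearn:.2f} s")
```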
Comment (2021-06-30 for -09) Sent for earlier
I support Martin and Eric's DISCUSS positions about how some of the tests are characterized.

Like Eric, I found the number of typos, capitalization mistakes, and sentence fragments to be distracting.

Though its coverage of the WG process was good, the shepherd writeup declares the document status but does not answer the other questions about it (e.g., "Why is this the proper type of RFC?").  This seems to be a common omission, and I'm starting to wonder if we should just drop it from the template.

I think RFC 7432 and RFC 7623 should be normative, insofar as this document doesn't really make sense unless one understands those first.

Please use the correct boilerplate for BCP 14.  (See RFC 8174, Section 2.)  However, I also note that the only place the key words are used is Section 7, so it occurs to me you could avoid using BCP 14 altogether and basically say the same thing with the same force.

Section 1.2 defines "All-Active Redundancy Mode", but it's not used anywhere in this document other than its own definition.  Similarly, "AA", "ES", "Ethernet Tag", "RT", and "Sub interface" don't appear anywhere else.  "Single-Active Redundancy Mode" could be folded into the definition of "SA", since the latter is actually used elsewhere.
Zaheduzzaman Sarker
Discuss
Discuss (2021-07-01 for -09) Sent
Thanks for the attempt here; defining test cases is tricky.

I think that, to use the test cases in this document effectively, the following need to be correctly specified/addressed -

* All the tests are said to be run N times to collect N samples. Is N the same for every test? What should the default value of N be? How do we decide that there are enough samples? I find it a bit odd that no recommendation is provided here. There should be a minimum and maximum range for N so that we know we have run enough tests to draw conclusions from the results. Is a specific waiting time required between the N runs? This feels very underspecified to me.

* Section 3.10: what are the exit criteria for this test? How do we know the test was successful?

* Section 3.12: I am missing a recommended duration for running the test here. I am supposed to take hourly readings, but for how long should I do that to reach a conclusion?

* Maybe I missed it, but I haven't found any mention of whether non-test traffic is allowed during the test. If it is, what considerations should be taken into account when interpreting the results? If it is not allowed, that should be stated as well.
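Neither the draft nor this ballot proposes a rule for choosing N. The Python sketch below shows one possible stopping rule, offered only as an illustration. It stops collecting samples once the approximate 95% confidence half-width of the mean falls within a chosen fraction of the mean. The thresholds `rel_tol` and `min_n` are assumptions. The normal approximation (z = 1.96) is optimistic for small N, and a Student t critical value would be more conservative.

```python
from statistics import mean, stdev


def enough_samples(samples, rel_tol=0.05, z=1.96, min_n=5):
    """Hypothetical stopping rule for choosing N.

    Returns True once at least `min_n` samples exist and the approximate
    confidence half-width of the mean (normal approximation) is no larger
    than `rel_tol` times the mean.
    """
    n = len(samples)
    if n < min_n:
        return False
    half_width = z * stdev(samples) / n ** 0.5
    return half_width <= rel_tol * mean(samples)


print(enough_samples([1.0, 1.0, 1.0, 1.0, 1.0]))  # identical samples
print(enough_samples([1, 3, 1, 3, 1]))            # widely spread samples
```

Under a rule like this, N depends on how noisy the measurement is rather than being fixed in advance. This bears on the min/max range question above. The waiting time between runs would still need to be specified separately.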

I also support Martin and Eric's discuss.
Comment (2021-07-01 for -09) Sent
Some non-blocking comments -

* Section 2: 
   ** Please expand abbreviations on first use if they are not listed as well known. See the list: https://www.rfc-editor.org/materials/abbrev.expansion.txt 
   ** Why is this called Topology 1? Are more topologies expected? Where are they? It is actually not clear to me what Topology 1 is.

* Section 3.8: What is PPS?

* Section 3.9: How do you measure packet loss? Is this loss measured over the whole run or within a time window?

* Section 4.9: "Then increment the scale of N by 5% of N till the limit is reached." Please clarify whether each increment is always 5% of the initial value of N, or whether N = N + 5% of the current N.
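The two readings of the 5% increment produce different scale sequences. The Python sketch below makes the difference concrete. The function name and the use of integer rounding are assumptions made for illustration.

```python
def scale_steps(n0, limit, compound):
    """List the scale values visited, starting at n0, until `limit` is reached.

    compound=False: each step adds 5% of the initial N0 (fixed step size).
    compound=True:  each step adds 5% of the current N (step size grows).
    Integer arithmetic is used; every step adds at least one entry.
    """
    values = [n0]
    while values[-1] < limit:
        base = values[-1] if compound else n0
        values.append(values[-1] + max(1, base * 5 // 100))
    return values


print(scale_steps(1000, 1200, compound=False))  # [1000, 1050, 1100, 1150, 1200]
print(scale_steps(1000, 1200, compound=True))   # [1000, 1050, 1102, 1157, 1214]
```

The compounding reading overshoots the limit and probes it more coarsely, which is why the reported limit can depend on which reading is used.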

Nits :

* Section 2 : Figure 2 - s/SHPFE3/SHPE3 or define SHPFE3.
Éric Vyncke
Discuss
Discuss (2021-06-30 for -09) Sent
Thank you for the work put into this document. I have some regrets about the number of typos, bad capitalizations, ...

Special thanks to Sarah Banks for her shepherd write-up about the WG process / consensus.

Please find below 3 blocking DISCUSS points (but should be easy to address), some non-blocking COMMENT points (but replies would be appreciated), and some nits.

I hope that this helps to improve the document,

Regards,

-éric


== DISCUSS ==

Most of the tests are labelled 'rate', but what they measure is not a 'rate' but a 'time'. What did I fail to understand?
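To make the distinction concrete, the Python sketch below derives both quantities from one learning run. The timestamps are hypothetical. The elapsed time is measured in seconds. A rate would be a count divided by that time, for example MAC addresses learned per second.

```python
def learning_time_and_rate(x_macs, t_start, t_done):
    """Return (elapsed time in s, learning rate in MAC addresses per s).

    t_start / t_done are hypothetical timestamps of when the first frame was
    sent and when the polled MAC counter first reached x_macs.
    """
    elapsed = t_done - t_start
    if elapsed <= 0:
        raise ValueError("t_done must be later than t_start")
    return elapsed, x_macs / elapsed


print(learning_time_and_rate(10_000, 0.0, 4.0))  # (4.0, 2500.0)
```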

-- Sections 3.3, 3.4, and 4.3 --
Perhaps I failed to understand the purpose of this test, but how can "Flush Rate" be equal to "Relearning Rate"? Should there be different T1..Tn series for flush and relearn?

-- Section 3.9 --
Please s/ip/IPv4/
Comment (2021-06-30 for -09) Sent
== COMMENTS ==

As most of the tests in section 4 are identical to those in section 3, is there a reason for such repetition? I.e., rather than having a single set of tests and mentioning their applicability to the PBB-EVPN scenarios.

-- Abstract --
Suggest expanding PBB, as https://www.rfc-editor.org/materials/abbrev.expansion.txt does not have the compound PBB-EVPN. The same applies in section 1, where the expansion is used but not defined.


-- Section 2 --
Suggest mentioning early that SH = single-homed and MH = multi-homed.

Strange use of 'traffic generator' for a box that also receives traffic. Is there a better word in BMWG?

Unclear from figure 1 whether the link between SHPE3 and traffic generator acts as a "PE-CE link" as this link is the only unqualified one.

-- Section 3.1 --
In "X different source and destination MAC address for one vlan" (besides the singular form of "address"), aren't X different source addresses enough to trigger MAC learning? I.e., there is no need to vary the destination MAC addresses.

Using external scripts to count learned MAC addresses appears very rudimentary and not very accurate...

-- Section 3.2 --
Suggest using the same text or a similar presentation for the measurement part as in section 3.1.

-- Section 3.9 --
As most EVPN deployments can be expected to be dual-stack, I wonder whether the two single-stack measurements are useful, since there could be an interaction between IPv4 and IPv6 learning. I am unsure how to test it though (perhaps 50% IPv4 and 50% IPv6?).

-- Section 3.12 --
If "SOAK" is an acronym (based on the all uppercase), then please expand it, else use "Soak" 

== NITS ==

Please always use the uppercase form "VLAN", and likewise the usual capitalization for 'ipv6', 'arp', 'ip', ... (IPv6, ARP, IP).

Also check punctuation, as punctuation marks should be followed by a space character.

-- Section 5 --
You may want to add a last name to Al in "to thank Al"
Warren Kumari
Yes
Roman Danyliw
No Objection
Comment (2021-06-30 for -09) Sent
Thank you to Robert Sparks for the SECDIR review.

** Section 3.* and Section 4.*  Are there recommended values for X, N, or F?  If this document is intended to help others benchmark EVPNs, it would be useful to describe how to calculate the number of times to run the test or how many MACs to generate to get useful results.

** Section 3 and 4.  If all the metrics use Topology 1, and this is the only topology provided in the document, why repeat it each time in the description of each metric?

** Section 7.  Thanks for adding the text “Security features mentioned in the RFC 7432 will affect the test results” in response to the SECDIR review.  Is it possible to clarify this text?  Which RFC7432 security mechanisms are assumed to be in Topology 1?  Could a summary cross-walk of the RFC7432 reference security mechanisms against their impact on the metrics be easily documented?

** idnits returns:

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords -- however, there's a paragraph with
     a matching beginning. Boilerplate error?

  == Missing Reference: 'RFC7632' is mentioned on line 111, but not defined

  == Unused Reference: 'RFC2544' is defined on line 1248, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2899' is defined on line 1253, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC7623' is defined on line 1268, but no explicit
     reference was found in the text

** Editorial nits:
-- Whole document.  Editorial.  Choose either “VLAN” or “vlan” and use it consistently in the document.

-- Section 2.  Typo. s/support,Interior/support, Interior/

-- Section 2.  Typo. s/parameters.It/parameters. It/

-- Section 3.*. Typo in a few places. s/standrard/standard/g

-- Section 4.11. Typo s/process,CPU/process, CPU/
Deb Cooley
No Record
Francesca Palombini
No Record
Gunter Van de Velde
No Record
Jim Guichard
No Record
John Scudder
No Record
Mahesh Jethanandani
No Record
Orie Steele
No Record
Paul Wouters
No Record
Martin Duke Former IESG member
Discuss
Discuss [Treat as non-blocking comment] (2021-06-28 for -09) Sent
(3.8) (4.8) Why is packet loss measured in time? How is learning 2X MAC addresses relevant to the packet loss measurement at the traffic generator?
How long does the traffic generator have to wait to conclude that the packet is lost?
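One common way to express frame loss as time is a loss-derived figure: lost frames divided by a constant offered rate. This may or may not be what the draft intends. The Python sketch below shows only that arithmetic. It assumes the generator's wait period has already elapsed, so that any frame still missing is counted as lost.

```python
def loss_derived_outage(frames_sent, frames_received, offered_rate_fps):
    """Convert a frame-loss count into an equivalent outage time in seconds.

    Assumes the generator offered a constant rate (frames per second) and that
    frames still missing after the generator's wait period are counted as lost.
    """
    lost = frames_sent - frames_received
    if lost < 0:
        raise ValueError("received more frames than were sent")
    return lost / offered_rate_fps


print(loss_derived_outage(100_000, 99_000, 10_000))  # 0.1
```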

(3.9) Is a single failure to learn an address sufficient to determine that the device has reached capacity? Or could packet loss or some other phenomenon lose some addresses? In other words, be more precise on how polling reveals the capacity.

Is there some lower bound on the time between sending ARP/ND packets and querying the DUT?
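As one illustration of a more precise polling criterion, the hypothetical Python rule below declares a value only after the polled counter has stopped growing for several consecutive polls. It does not declare one at the first address that fails to appear. The threshold `stable_polls` and the polling interval are assumptions.

```python
def capacity_from_polls(counter_readings, stable_polls=3):
    """Hypothetical plateau rule for the learning-capacity test.

    counter_readings are successive polled values of the DUT's learned-entry
    counter. A value is reported only after the counter has stayed unchanged
    for `stable_polls` consecutive polls.
    """
    unchanged = 0
    for prev, cur in zip(counter_readings, counter_readings[1:]):
        unchanged = unchanged + 1 if cur == prev else 0
        if unchanged >= stable_polls:
            return cur
    return None  # no plateau yet: keep polling


print(capacity_from_polls([100, 200, 300, 300, 300, 300]))  # 300
print(capacity_from_polls([100, 200, 300, 300]))            # None
```

A plateau below the number of addresses offered could still reflect packet loss rather than table capacity. Distinguishing the two (e.g., by resending the missing entries) would still need to be specified.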

(3.11, 3.12, 4.10, 4.11) Does the traffic generator send F frames in total or F fps (frames per second)? The spec says both. Are there any constraints on F, perhaps an integer multiple of X?
Alvaro Retana Former IESG member
No Objection
No Objection (2021-06-28 for -09) Sent
The datatracker should be updated to indicate that this document replaces draft-kishjac-bmwg-evpntest.

Please reply to the rtg-dir review: https://mailarchive.ietf.org/arch/msg/rtg-dir/6cK9FKqvIIKEd_f6Ov1bKGfw_3w/#
Martin Vigoureux Former IESG member
No Objection
No Objection (2021-07-01 for -09) Not sent
It's not clear to me what this document is trying to achieve.
It is clear that it defines things to measure, but
- why these and not something else?
- what do the values reflect?
- what can be concluded using these measurements?
- what might affect the results obtained?
Also, all these measurements seem to be done in an idealized situation where the systems are under no stress other than what is needed to take the measurement.



NITS
   All-Active Redundancy Mode: When all PEs attached to an Ethernet
   segment are allowed to forward known unicast traffic to/from that
   Ethernet segment for a given VLAN, then the Ethernet segment is
   defined to be operating in All-Active redundancy mode.

   AA: All Active mode
This doesn't seem to be used

   RT: TrafficGenerator.
This is never used and should be removed


VLAN should be capitalized


s/measured.The/measured. The/
s/support,Interior/support, Interior/
s/parameters.It/parameters. It/
s/standrard/standard/
s/sample.The/sample. The/
s/this documents/this document/
s/routes.But/routes. But/
s/process,CPU/process, CPU/
Robert Wilton Former IESG member
No Objection
No Objection (2021-07-01 for -09) Sent
Hi,

I would like to thank the authors for the work that they have put into this document, and also Sarah for the doc shepherd work that she did on improving the text.

A few nits that I spotted:

Sometimes this doc refers to just "MAC" and sometimes to "MAC address".  The prose would probably flow better if these were always "MAC address".

The requirements boilerplate text in section 1.1 doesn't look quite right and should probably be:

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

In section 1.2:
  IRB: Integrated routing and bridging interface
Normally, I would expect IRB to just be "Integrated Routing and Bridging".  Given this term is only referenced once I would suggest changing the term and moving the "interface" to where the term is used (if needed).

In Section 3.2, it refers to the "data plane", but that should be "control plane".

Regards,
Rob