Benchmarking Methodology for Software-Defined Networking (SDN) Controller Performance
draft-ietf-bmwg-sdn-controller-benchmark-meth-09
Yes
Warren Kumari
No Objection
(Alvaro Retana)
(Deborah Brungard)
(Terry Manderson)
Note: This ballot was opened for revision 08 and is now closed.
Warren Kumari
Yes
Adam Roach Former IESG member
No Objection
(2018-04-19 for -08)
I again share Martin's concerns about the use of the word "standard" in this document's abstract and introduction.
Alissa Cooper Former IESG member
No Objection
(2018-04-18 for -08)
Regarding this text: "The test SHOULD use one of the test setups described in section 3.1 or section 3.2 of this document in combination with Appendix A." Appendix A is titled "Example Test Topology." If it's really an example, then it seems like it should not be normatively required. So either the appendix needs to be re-named, or the normative language needs to be removed. And if it is normatively required, why is it in an appendix? The document would also benefit from describing what the exception cases to the SHOULD are (I guess if the tester doesn't care about having comparable results with other tests?).
Alvaro Retana Former IESG member
No Objection
(for -08)
Benjamin Kaduk Former IESG member
No Objection
(2018-04-19 for -08)
In the Abstract: "This document defines the methodologies for benchmarking control plane performance of SDN controllers." Why "the" methodologies? That seems more authoritative than is appropriate in an Informational document.
Why do we need the test setup diagrams in both the terminology draft and this one? It seems like there is some excess redundancy, here.
In Section 4.1, how can we even have a topology with just one network device? This "at least 1" seems too low. Similarly, how would TP1 and TP2 *not* be connected to the same node if there is only one device?
Thank you for adding consideration to key distribution in Section 4.4, as noted by the secdir review. But insisting on having key distribution done prior to testing gives the impression that keys are distributed once and updated never, which has questionable security properties. Perhaps there is value in doing some testing while rekeying is in progress?
I agree with others that the statistical methodology is not clearly justified, such as the sample size of 10 in Section 4.7 (with no consideration for sample relative variance), use of sample vs. population variance, etc.
It seems like the measurements being described sometimes start the timer at an event at a network element and other times start the timer when a message enters the SDN controller itself (similarly for outgoing messages), which seems to include a different treatment of propagation delays in the network, for different tests. Assuming these differences were made by conscious choice, it might be nice to describe why the network propagation is/is not included for any given measurement.
It looks like the term "Nrxn" is introduced implicitly and the reader is supposed to infer that the 'n' represents a counter, with Nrx1 corresponding to the first measurement, Nrx2 the second, etc. It's probably worth mentioning this explicitly, for all fields that are measured on a per-trial/counter basis.
I'm not sure that the end condition for the test in Section 5.2.2 makes sense.
It seems like the test in Section 5.2.3 should not allow flexibility in "unique source and/or destination address" and rather should specify exactly what happens.
In Section 5.3.1, only considering 2% of asynchronous messages as invalid implies a preconception about what might be the reason for such invalid messages, but that assumption might not hold in the case of an active attack, which may be somewhat different from the pure DoS scenario considered in the following section.
Section 5.4.1 says "with incremental sequence number and source address" -- are both the sequence number and source address incrementing for each packet sent? This could be more clear. It also is a little jarring to refer to "test traffic generator TP2" when TP2 is just receiving traffic and not generating it.
Appendix B.3 indicates that plain TCP or TLS can be used for communications between switch and controller. It seems like this would be a highly relevant test parameter to report with the results for the tests described in this document, since TLS would introduce additional overhead to be quantified!
The figure in Section B.4.5 leaves me a little confused as to what is being measured, if the SDN Application is depicted as just spontaneously installing a flow at some time vaguely related to traffic generation but not dependent on or triggered by the traffic generation.
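Editor's illustration of the sample vs. population variance point raised above. This is a minimal sketch, not part of the review or the draft: the ten trial values are hypothetical, and the coefficient of variation is used here only as one possible reading of "sample relative variance".

```python
import statistics

# Hypothetical results of 10 repeated trials, in milliseconds.
# These numbers are made up for illustration; they are not from the draft.
trials_ms = [102.0, 98.0, 110.0, 95.0, 105.0, 99.0, 101.0, 97.0, 108.0, 100.0]
n = len(trials_ms)

mean_ms = statistics.mean(trials_ms)

# Sample variance: divides the sum of squared deviations by (n - 1).
sample_var = statistics.variance(trials_ms)

# Population variance: divides the same sum by n.
population_var = statistics.pvariance(trials_ms)

# Relative spread (coefficient of variation): sample standard deviation
# divided by the mean. A fixed trial count ignores this quantity.
rel_spread = statistics.stdev(trials_ms) / mean_ms

print(f"n={n} mean={mean_ms:.2f} ms")
print(f"sample variance={sample_var:.3f} population variance={population_var:.3f}")
print(f"relative spread={rel_spread:.3%}")
```

With only 10 trials the two variance estimates differ by a factor of n/(n-1) = 10/9, which is why the choice of divisor is worth stating alongside reported results.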
Deborah Brungard Former IESG member
No Objection
(for -08)
Eric Rescorla Former IESG member
No Objection
(2018-04-17 for -08)
Rich version of this review at: https://mozphab-ietf.devsvcdev.mozaws.net/D3948
COMMENTS
> reported.
>
> 4.7. Test Repeatability
>
> To increase the confidence in measured result, it is recommended
> that each test SHOULD be repeated a minimum of 10 times.
Nit: you might be happier with "RECOMMENDED that each test be repeated ..." Also, where does 10 come from? Generally, the number of trials you need depends on the variance of each trial.
> Test Reporting
>
> Each test has a reporting format that contains some global and
> identical reporting components, and some individual components that
> are specific to individual tests. The following test configuration
> parameters and controller settings parameters MUST be reflected in
This is an odd MUST, as it's not required for interop.
> 5. Stop the trial when the discovered topology information matches
> the deployed network topology, or when the discovered topology
> information return the same details for 3 consecutive queries.
> 6. Record the time last discovery message (Tmn) sent to controller
> from the forwarding plane test emulator interface (I1) when the
> trial completed successfully. (e.g., the topology matches).
How large is the TD usually? How much does 3 seconds compare to that?
>                                          Total Trials
>
>                                          SUM[SQUAREOF(Tri-TDm)]
> Topology Discovery Time Variance (TDv)   ----------------------
>                                          Total Trials -1
>
You probably don't need to specify individual formulas for mean and variance. However, you probably do want to explain why you are using the n-1 sample variance formula.
>
> Measurement:
>
>                                              (R1-T1) + (R2-T2)..(Rn-Tn)
> Asynchronous Message Processing Time Tr1 =  -----------------------
>                                                       Nrx
Incidentally, this formula is the same as \sum_i{R_i} - \sum_i{T_i}
> messages transmitted to the controller.
>
> If this test is repeated with varying number of nodes with same
> topology, the results SHOULD be reported in the form of a graph. The
> X coordinate SHOULD be the Number of nodes (N), the Y coordinate
> SHOULD be the average Asynchronous Message Processing Time.
This is an odd metric because an implementation which handled overload by dropping every other message would look better than one which handled overload by queuing.
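Editor's illustration of the last point above. This is a minimal sketch under stated assumptions, not a model of any real controller: the average is assumed to be taken only over messages that actually received a response, ten messages arrive at time 0, and each takes one time unit to process. The function name and both "controllers" are hypothetical.

```python
def mean_processing_time(sent, received):
    """Mean of (R_i - T_i) over the messages that got a response.

    sent:     dict of message id -> transmit time T_i
    received: dict of message id -> response time R_i (answered messages only)
    """
    deltas = [received[m] - sent[m] for m in received]
    return sum(deltas) / len(deltas)


# Hypothetical overload burst: 10 messages all sent at t = 0.
sent = {i: 0.0 for i in range(10)}

# Controller A queues everything and serves one message per time unit,
# so message i is answered at t = i + 1.
queued = {i: float(i + 1) for i in range(10)}

# Controller B drops every other message and serves the rest back to back.
kept = [i for i in range(10) if i % 2 == 0]
dropping = {m: float(k + 1) for k, m in enumerate(kept)}

mean_queue = mean_processing_time(sent, queued)
mean_drop = mean_processing_time(sent, dropping)
loss_drop = 1 - len(dropping) / len(sent)

print(f"queuing controller : mean = {mean_queue}")
print(f"dropping controller: mean = {mean_drop}, loss = {loss_drop:.0%}")
```

Under these assumptions the dropping controller reports the lower mean while losing half the messages, so a report of the average is easier to interpret when the loss ratio is shown next to it.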
Ignas Bagdonas Former IESG member
No Objection
(2018-04-19 for -08)
The document seems to assume the OpenFlow dataplane abstraction model – which is one of the possible models; the practical applicability of such a model to anything beyond experimental deployments is a completely separate question outside of the scope of this document. The methodology tends to apply to a broader set of central control based systems, and not only to the data plane operations – therefore the document seems to be setting out at least something practically usable for benchmarking of such central control systems. Possibly the document could mention the assumptions it makes about the overall model to which the defined methodology applies.
A nit: s/Khasanov Boris/Boris Khasanov/, unless Boris himself would insist otherwise.
Martin Vigoureux Former IESG member
No Objection
(2018-04-18 for -08)
Hello, I have the same question/comment as on the companion document: I wonder about the use of the term "standard" in the abstract in view of the intended status of the document (Informational). Could the use of this word confuse the reader?
Mirja Kühlewind Former IESG member
No Objection
(2018-04-18 for -08)
Editorial comments:
1) sdn-controller-benchmark-term should probably rather be referred to in the intro (instead of the abstract).
2) Is the test setup needed in both docs (this and sdn-controller-benchmark-term) or would a reference to sdn-controller-benchmark-term maybe be sufficient?
3) Appendix A.1 should probably also be moved to sdn-controller-benchmark-term
Spencer Dawkins Former IESG member
No Objection
(2018-04-16 for -08)
I have a few questions, at the No Objection level ... do the right thing, of course.
I apologize for attempting to play amateur statistician, but it seems to me that this text
   4.7. Test Repeatability
   To increase the confidence in measured result, it is recommended that each test SHOULD be repeated a minimum of 10 times.
is recommending a heuristic, when I'd think that you'd want to repeat a test until the results seem to be converging on some measure of central tendency, given some acceptable margin of error, and this text
   Procedure:
   1. Establish the network connections between controller and network nodes.
   2. Query the controller for the discovered network topology information and compare it with the deployed network topology information.
   3. If the comparison is successful, increase the number of nodes by 1 and repeat the trial. If the comparison is unsuccessful, decrease the number of nodes by 1 and repeat the trial.
   4. Continue the trial until the comparison of step 3 is successful.
   5. Record the number of nodes for the last trial (Ns) where the topology comparison was successful.
seems to beg for a binary search, especially if you're testing whether a controller can support a large number of controllers ...
This text
   Reference Test Setup:
   The test SHOULD use one of the test setups described in section 3.1 or section 3.2 of this document in combination with Appendix A.
or some variation is repeated about 16 times, and I'm not understanding why this is using BCP 14 language, and if BCP 14 language is the right thing to do, I'm not understanding why it's always SHOULD. I get the part that this will help compare results, if two researchers are running the same tests. Is there more to the requirement than that?
In this text,
   Procedure:
   1. Perform the listed tests and launch a DoS attack towards controller while the trial is running.
   Note: DoS attacks can be launched on one of the following interfaces.
   a. Northbound (e.g., Query for flow entries continuously on northbound interface)
   b. Management (e.g., Ping requests to controller's management interface)
   c. Southbound (e.g., TCP SYN messages on southbound interface)
is there a canonical description of "DoS attack" that researchers should be using, in order to compare results? These are just examples, right?
Is the choice of
   [OpenFlow Switch Specification] ONF,"OpenFlow Switch Specification" Version 1.4.0 (Wire Protocol 0x05), October 14, 2013.
intentional? I'm googling that the current version of OpenFlow is 1.5.1, from 2015.
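Editor's illustration of the binary search suggested above for finding the largest node count the controller discovers correctly. This is a minimal sketch, not the draft's procedure: `topology_matches` is a hypothetical callback standing in for "deploy n nodes, query the controller, compare topologies", and the search assumes success is monotonic (if n nodes work, fewer nodes also work).

```python
def max_supported_nodes(topology_matches, upper_bound):
    """Return the largest n in [0, upper_bound] for which topology_matches(n)
    is True, assuming that success at n implies success at every smaller n.
    Returns 0 if no tested node count succeeds."""
    lo, hi = 0, upper_bound
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if topology_matches(mid):
            lo = mid              # mid works: the answer is mid or larger
        else:
            hi = mid - 1          # mid fails: the answer is below mid
    return lo


# Usage with a fake controller that handles up to 737 nodes (made-up limit).
probes = []

def fake_topology_matches(n):
    probes.append(n)              # record each trial that would be run
    return n <= 737

ns = max_supported_nodes(fake_topology_matches, upper_bound=4096)
print(f"Ns = {ns} found in {len(probes)} trials")
```

Stepping by one node per trial needs a number of trials proportional to the distance from the starting point to the limit, whereas this search needs roughly log2(upper_bound) trials. The monotonicity assumption should be checked before relying on it.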
Suresh Krishnan Former IESG member
No Objection
(2018-04-19 for -08)
I share Ignas's concern about this being too tightly associated with the OpenFlow model.
* Section 4.1
   The test cases SHOULD use Leaf-Spine topology with at least 1 Network Device in the topology for benchmarking.
How is it even possible to have a leaf-spine topology with one Network Device?
Terry Manderson Former IESG member
No Objection
(for -08)