Diameter Overload Control Requirements
RFC 7068

Note: This ballot was opened for revision 11 and is now closed.

(Benoît Claise) Yes

(Spencer Dawkins) Yes

Comment (2013-09-25 for -12)
I found this draft to be clear and well-written. 

I have some comments that you might wish to consider, along with any other comments you receive during the approval process.

This draft is distinguishing between overload and network congestion, but I'm seeing the word "congestion" in places that seem to be talking about overload, beginning in the Abstract (where the following text appears), and in the Introduction:

   When a Diameter server or agent becomes overloaded, it needs to be
   able to gracefully reduce its load, typically by informing clients to
   reduce sending traffic for some period of time.  Otherwise, it must
   continue to expend resources parsing and responding to Diameter
   messages, possibly resulting in congestion collapse.  The existing
                                   ^^^^^^^^^^
   Diameter mechanisms are not sufficient for this purpose.  This
   document describes the limitations of the existing mechanisms.
   Requirements for new overload management mechanisms are also
   provided.

Given the distinction this draft is making between overload and network congestion, is using "congestion" in this way helpful?

I'm not smart enough to suggest which term is appropriate in each case, but it might be helpful to do a quick scan and make sure each usage of "congest" has the intended meaning ...

In 1.2.  Causes of Overload

   External resources can include upstream Diameter nodes; for example,
   a Diameter agent can become effectively overloaded if one or more
   upstream nodes are overloaded.  While overload is not the same thing
   as network congestion, network congestion can reduce a Diameter nodes
   ability to process and respond to requests, thus contributing to
   overload.

Section 1.4.  Overload vs. Network Congestion explains the difference, but that's later in the document. Perhaps a forward reference here would be helpful?

In 7.2.  Performance

   REQ 11:  The solution MUST be able to operate in networks of
            different sizes.

it might be helpful to give some orders-of-magnitude clue on what this requirement means at the high end (recognizing that all these networks are growing).

In 7.3.  Heterogeneous Support for Solution

   REQ 17:  In a mixed environment with nodes that support the solution
            and that do not, the solution MUST NOT result in materially
            less useful throughput as would have resulted if the
            solution were not present.  It SHOULD result in less severe
            congestion in this environment.

it wasn't clear to me whether this requirement was talking about materially less throughput during periods of overload, or at any time. REQ 21 is clearly talking about periods of overload, but on REQ 17, I'm guessing.

Should "congestion" be "overload" in the last sentence?

Ignoring the part where we're talking about RFC 2119 language in a requirements document, in 7.4.  Granular Control

   REQ 22:  The solution MUST provide a way for a node to throttle the
            amount of traffic it receives from a peer node.  This
            throttling SHOULD be graded so that it can be applied
            gradually as offered load increases.  Overload is not a
            binary state; there may be degrees of overload.

I'm reading this as saying that a solution that cannot be applied gradually is still acceptable (SHOULDs aren't MUSTs). Is that right?

In 7.6.  Security

   REQ 27:  The solution MUST NOT provide new vulnerabilities to
            malicious attack, or increase the severity of any existing
            vulnerabilities.  This includes vulnerabilities to DoS and
            DDoS attacks as well as replay and man-in-the middle
            attacks.  Note that the Diameter base specification
                      ^^^^
            [RFC6733] lacks end to end security and this must be
            considered.  Note that this requirement was expressed at a
            ^^^^^^^^^^
            high level so as to not preclude any particular solution.
            Is is expected that the solution will address this in more
            detail.

The point between "^"s is explained more clearly in the Security Considerations section. I'd suggest either replacing this sentence with a pointer to Section 9 or (perhaps better) pulling the third paragraph from the Security Considerations section into this requirement.

In 9.5.  Compromised Hosts

   A compromised host that supports the Diameter overload control
   mechanism could be used for information gathering as well as for
   sending malicious information to any Diameter node that would
   normally accept information from it.  While is is beyond the scope of
                                               ^^^^^ perhaps "it is"?

(Jari Arkko) No Objection

(Stewart Bryant) No Objection

(Gonzalo Camarillo) No Objection

(Adrian Farrel) No Objection

Comment (2013-09-26 for -12)
I have no objection to the publication of this document.

If I were to be picky, I would say that the use of "route" (as in "route a
request to a server") has the potential to cause some confusion. As far
as I can see there isn't any network-wide routing going on here and what
happens is that an Agent selects a Server to consume a request and
dispatches the request to that Server. I suppose that is a form of
routing, but unless the Agents can be arranged hierarchically towards
the Server, and unless there is some sort of mesh connectivity between
Agents, this is not really worthy of the term "routing". "Switch" or
"Dispatch" or even "Forward" may be better terms.

But this is a trivial point and you can completely ignore it if you
like.

(Stephen Farrell) No Objection

Comment (2013-09-26 for -12)
Good document, thanks.

My only question was what the wg are going to do if they
need to choose between two solutions, neither of which
meets all the MUST conditions. You don't say here, and
maybe that's for the best, but that is a long list of
requirements.

(Brian Haberman) No Objection

(Barry Leiba) No Objection

Comment (2013-09-25 for -12)
I found this to be a good read, and I learned some things from it.

On seeing Spencer's comments about "overload" vs "congestion", I looked for that, specifically.  To me, it seems that you are using the two as you mean to, and you are making a clean distinction, while recognizing that they relate to each other.  In particular, when you say that server overload can lead to congestion collapse, I think you mean exactly that, and are showing how the layers can interact.  Of course, if I'm reading that wrong, then take this as support for Spencer's comments.  :-)

In Section 1.3, a tiny micronit, but this nit is a pet peeve of mine, so:

   Modern Diameter networks, comprised of application layer multi-node
   deployments of Diameter elements

Correctly used, the whole comprises the parts (it's not "comprised of" them).  Please change "comprised of" to "comprising".

Hey, it shows that I really *read* it, yeh?

In the first sentence of Section 7, I would say "with the goals of addressing the issues described in Section 5"; "improving the issues" sounds odd to this native-English ear.

The requirements appear to be well thought out, and they fit together well.  There's often a gap in that regard in requirements documents, so good work there.

(Ted Lemon) No Objection

(Martin Stiemerling) No Objection

(Sean Turner) No Objection

Comment (2013-09-23 for -12)
Just nits so take 'em or leave 'em:

s1.2: r/it is dependent/it depends
s1.5: r/other protocols/protocol other

(Pete Resnick) No Record

Comment (2013-09-25 for -12)
I will attempt to finish my review before the telechat, but I wanted to send these out just in case I don't. Editorial silliness follows:

Section 1.1: I quixotically suggest the following as a replacement for the first paragraph:

   The uppercased keywords "MUST", "MUST NOT", and "SHOULD" in this
   document are NOT used as defined in [RFC2119]. Instead, they are to
   be interpreted as follows:

   - "MUST" means that the item is an absolute requirement for a
   solution proposed for addressing the problem described in this
   document.
   - "MUST NOT" means that the item is an absolute prohibition for a
   solution proposed for addressing the problem described in this
   document.
   - "SHOULD" means that there may exist valid reasons in particular
   circumstances for the solution not to address the particular item,
   but the full implications must be understood and carefully weighed
   before choosing a different course.
   
Section 1.4 and Section 2: Replace "the authors" with "this document", 3 occurrences total. This is a WG document; referring to "the authors" sounds weird.