
Using the IPv6 Flow Label for Load Balancing in Server Farms
draft-ietf-intarea-flow-label-balancing-03

Document history

Date Rev. By Action
2014-01-14
03 (System) RFC Editor state changed to AUTH48-DONE from AUTH48
2014-01-03
03 (System) RFC Editor state changed to AUTH48 from RFC-EDITOR
2014-01-03
03 (System) RFC Editor state changed to RFC-EDITOR from EDIT
2013-11-18
03 Amy Vezza State changed to RFC Ed Queue from Approved-announcement sent
2013-11-15
03 (System) RFC Editor state changed to EDIT
2013-11-15
03 (System) Announcement was received by RFC Editor
2013-11-15
03 (System) IANA Action state changed to No IC from In Progress
2013-11-15
03 (System) IANA Action state changed to In Progress
2013-11-15
03 Cindy Morgan State changed to Approved-announcement sent from IESG Evaluation::AD Followup
2013-11-15
03 Cindy Morgan IESG has approved the document
2013-11-15
03 Cindy Morgan Closed "Approve" ballot
2013-11-15
03 Cindy Morgan Ballot approval text was generated
2013-11-15
03 Adrian Farrel [Ballot comment]
Thanks for addressing my Discuss
2013-11-15
03 Adrian Farrel [Ballot Position Update] Position for Adrian Farrel has been changed to No Objection from Discuss
2013-11-14
03 Joel Jaeggli [Ballot Position Update] Position for Joel Jaeggli has been changed to No Objection from Discuss
2013-11-04
03 (System) Sub state has been changed to AD Followup from Revised ID Needed
2013-11-04
03 Brian Carpenter IANA Review state changed to Version Changed - Review Needed from IANA OK - No Actions Needed
2013-11-04
03 Brian Carpenter New version available: draft-ietf-intarea-flow-label-balancing-03.txt
2013-11-03
02 Joel Jaeggli
[Ballot discuss]
I'm satisfied with the proposed text changes and will clear when the document is revved.

thanks
joel

this started as a comment. it's bloomed into a discuss because I think there are a lot of issues to be discussed.

(sorry, one minor edit, read the second discuss email rather than the first)

  If the flow label is in fact set to zero, it will not
  affect the information entropy of the IPv6 header.

certainly it will, since your assumption vis-a-vis label balancing is that it does augment (an l3-only hash) or substitute for entropy that might otherwise be found elsewhere, e.g. in the transport header.

Assume the case of load balancing a small number of sources to a single or small number of destination IPs (this is common for API services, for example, or lots of front-end --> back-end application communication).

host 1's first tcp connection to dest 1 with a zero flow label, using a source/dest/flow label xor, hashes to the same host behind dest 1 as host 1's second connection to dest 2 with a zero flow label. That's peachy if that's what you want, but not if you don't.

so as you note if you're using zero you probably want to do something else.
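
The entropy point above can be illustrated with a minimal sketch. It assumes a stateless balancer that hashes {source, destination, flow label} and reduces the result modulo the number of servers. The name `pick_server`, the choice of SHA-256, the farm size of 8, and the documentation-prefix addresses are all illustrative assumptions, not something taken from the draft or the ballot text.

```python
import hashlib
import ipaddress

def pick_server(src: str, dst: str, flow_label: int, n_servers: int) -> int:
    """Stateless choice: hash the 3-tuple, reduce modulo the server count."""
    key = (ipaddress.IPv6Address(src).packed
           + ipaddress.IPv6Address(dst).packed
           + flow_label.to_bytes(3, "big"))  # the 20-bit label fits in 3 bytes
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_servers

SRC, DST = "2001:db8::1", "2001:db8:ffff::80"  # hypothetical client and service

# Every connection carries label 0: the hash input never changes,
# so one server receives all of this host's connections.
zero = {pick_server(SRC, DST, 0, 8) for _ in range(100)}

# A distinct label per connection lets the same host spread over the farm.
varied = {pick_server(SRC, DST, label, 8) for label in range(1, 101)}
```

With a zero label the set `zero` contains a single server index, while `varied` contains several, which is the difference in entropy the comment describes.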
...

these two:

  o  Another method, for HTTP servers, is to operate a layer 7 reverse
      proxy in front of the server farm.  The reverse proxy will present
      a single IP address to the world, communicated to clients by a
      single AAAA record.  For each new client session (an incoming TCP
      connection and HTTP request), it will pick a particular server and
      proxy the session to it.  The act of proxying should be more
      efficient and less resource-intensive than the act of serving the
      required content.  The proxy must retain TCP state and proxy state
      for the duration of the session.  This TCP state could,
      potentially, include the incoming flow label value.

  o  A component of some load balancing systems is an SSL reverse proxy
      farm.  The individual SSL proxies handle all cryptographic aspects
      and exchange unencrypted HTTP with the actual servers.  Thus, from
      the load balancing point of view, this really looks just like a
      server farm, except that it's specialised for HTTPS.  Each proxy
      will retain SSL and TCP and maybe HTTP state for the duration of
      the session, and the TCP state could potentially include the flow
      label.

are simply variations of an application-aware proxy; they could just as easily be imap(s) or smtp(s) or sip. The operative issue is that for the purposes of connection termination they are hosts, not routers, and they have to find the upper layer header (and potentially go as far as application protocol implementation).

...

      In all cases, the layer 3/4 load balancer has to
      recognize incoming packets as belonging to new or existing client
      sessions, and choose the target server or proxy so as to ensure
      persistence.

I get what you're trying to say but I think this is a misstatement, as you go on to deal with stateless load balancing correctly... A stateless L3+L4 load balancer doesn't care whether a connection is new or preexisting; persistence is a product of the values associated with the hash being immutable over the duration for which they are required, and no other condition.

...

1.  Balancers use various techniques to redirect traffic to a
          specific target server.

          - All servers are configured with the same IP address, they
          are all on the same LAN, and the load balancer sends directly
          to their individual MAC addresses.  In this case, return
          packets from the server to the client are sent back without
          passing through the balancer, a technique known as direct
          server return, but we are not concerned here with the return
          packets.

          - Each server has its own IP address, and the balancer uses an
          IP-in-IP tunnel to reach it.

          - Each server has its own IP address, and the balancer
          performs NAPT (network address and port translation) to
          deliver the client's packets to that address.

          The choice between these methods is not affected by use of the
          flow label.

You missed one that is rather common, which is that there are multiple L3 next-hops for the same destination IP (Layer-3 ECMP, the same as is used to hash across router links). That's anycast in its simplest form, and it's a pretty common technique using either bgp or an IGP of your choice.

...

jjaeggli@ca2-b2-re1# run show route 2620:102:8003:211::1/128

inet6.0: 14896 destinations, 59622 routes (14896 active, 0 holddown, 29172 hidden)
+ = Active Route, - = Last Active, * = Both

2620:102:8003:211::1/128
                  *[BGP/170] 17w6d 23:34:12, MED 50, localpref 200
                      AS path: 64999 I, validation-state: unverified
                      to 2620:102:8003:200::13 via ae0.1721
                      to 2620:102:8003:200::14 via ae0.1721
                      to 2620:102:8003:200::15 via ae0.1721
                      to 2620:102:8003:200::16 via ae0.1721
                      to 2620:102:8003:200::18 via ae0.1721
                      to 2620:102:8003:200::19 via ae0.1721
                    > to 2620:102:8003:201::8 via ae1.2721
                      to 2620:102:8003:201::9 via ae1.2721
                      to 2620:102:8003:201::a via ae1.2721
                    [BGP/170] 7w4d 01:55:21, MED 50, localpref 200
                      AS path: 64999 I, validation-state: unverified
                    > to 2620:102:8003:200::13 via ae0.1721
                    [BGP/170] 7w4d 01:55:28, MED 50, localpref 200
                      AS path: 64999 I, validation-state: unverified
                    > to 2620:102:8003:200::14 via ae0.1721
                    [BGP/170] 7w4d 01:55:06, MED 50, localpref 200
                      AS path: 64999 I, validation-state: unverified
                    > to 2620:102:8003:200::15 via ae0.1721
                    [BGP/170] 7w4d 01:55:08, MED 50, localpref 200
                      AS path: 64999 I, validation-state: unverified
                    > to 2620:102:8003:200::16 via ae0.1721
                    [BGP/170] 7w4d 01:55:27, MED 50, localpref 200
                      AS path: 64999 I, validation-state: unverified
                    > to 2620:102:8003:200::18 via ae0.1721
                    [BGP/170] 1d 05:58:20, MED 50, localpref 200
                      AS path: 64999 I, validation-state: unverified
                    > to 2620:102:8003:200::19 via ae0.1721
                    [BGP/170] 17w6d 23:34:38, MED 50, localpref 200
                      AS path: 64999 I, validation-state: unverified
                    > to 2620:102:8003:201::9 via ae1.2721
                    [BGP/170] 17w6d 23:33:43, MED 50, localpref 200
                      AS path: 64999 I, validation-state: unverified
                    > to 2620:102:8003:201::a via ae1.2721
                    [BGP/170] 17w6d 20:20:44, MED 50, localpref 200, from 2620:102:8003::1
                      AS path: 64999 I, validation-state: unverified
                    > to 2620:102:8003:200::16 via ae0.1721

...

      2.  A layer 3/4 balancer must correctly handle Path MTU Discovery
          by forwarding relevant ICMPv6 packets in both directions.
          This too is not affected by use of the flow label.

icmp messages (this applies to v4 and v6) emitted by devices on the path from the server --> client that are sent back to the server IP are really problematic because the probability of them hashing to a different server is rather high; icmp messages emitted towards the client by path elements inclusive of the server ip are generally no problem. There are techniques to address that, e.g. multicasting them towards the servers, but that's rather non-scalable.

One could snarf the flow label and the destination off the offending packet (the one that's going back in the icmp6 type 2 payload to the sender) and use those as the source and flow label for your icmpv6 ptb message, since label-balance would ensure delivery, but that makes diagnostics kind of sketchy.
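
As a rough sketch of the "snarf" step, the function below pulls the addresses and flow label out of the IPv6 header quoted inside an ICMPv6 Packet Too Big (type 2) message. The function name `invoking_flow` is hypothetical, and the sketch assumes the input starts at the ICMPv6 header (type, code, checksum, MTU) with the quoted packet following immediately. Note that the quoted packet travelled server --> client, so the label recovered here is the one set for that direction, which need not match the label the client uses toward the server.

```python
import ipaddress
import struct

def invoking_flow(icmp6: bytes):
    """Return (src, dst, flow_label) of the packet quoted in an ICMPv6 PTB."""
    icmp_type, _code, _cksum, _mtu = struct.unpack("!BBHI", icmp6[:8])
    if icmp_type != 2:
        raise ValueError("not a Packet Too Big message")
    inner = icmp6[8:48]                        # fixed 40-byte IPv6 header
    if len(inner) < 40:
        raise ValueError("quoted IPv6 header is truncated")
    first_word = struct.unpack("!I", inner[:4])[0]
    flow_label = first_word & 0xFFFFF          # low 20 bits of the first word
    src = ipaddress.IPv6Address(inner[8:24])   # sender of the oversized packet
    dst = ipaddress.IPv6Address(inner[24:40])  # its intended receiver
    return src, dst, flow_label
```

Whatever emits or relays the PTB could then feed these values into the same hash the balancer uses, which is what makes delivery to the right server plausible and diagnostics less obvious.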

...

diagram in section 3

        ___|_______DNS-based____________|___
              |    load splitting    |
              |    (if used) occurs  |
              |    here   

dns-based server selection is actually something that occurs on a nameserver (the GTM), the client (e.g. when more than one record is served), or both, e.g. by a GTM serving more than one RR.

mb-aye:~ jjaeggli$ host www.google.com
www.google.com has address 173.194.33.18
www.google.com has address 173.194.33.16
www.google.com has address 173.194.33.20
www.google.com has address 173.194.33.19
www.google.com has address 173.194.33.17
www.google.com has IPv6 address 2607:f8b0:400a:800::1010

....

  However, usage by the proxies seems unlikely to be cost-effective,
  because they must in any case process the application layer header,
  so in this document we focus only on layer 3/4 balancers.

As you note previously, the flow label is in a fixed location in the ip header, so cost isn't really that germane. They do have to look all the way into the application, so it may not be that relevant.

...

  o  We are only concerned with IPv6 traffic in which the flow label
      value has been set at or near the source according to [RFC6437].

I can't see that it matters so long as it doesn't change mid-flow; it's hard to know this with certainty since it's not immutable.

...

section 4

  2-tuple {source address, flow label}

What would be the gain in not using the source/dest/flow label on a stateless LB? you may be asserting that the dest has low entropy and maybe it does if you only have one, but if you have hundreds or thousands it may well.

The destination is required for the forwarding decision; you might as well xor across it too (and traditional layer-3 only load balancing in v4 typically used at least source/dest and generally protocol number). Doing so has no state.
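
A minimal sketch of xor-ing across source, destination and flow label, assuming a 16-bit fold and a modulo reduction onto the next-hop count. The name `xor_fold` and both of those choices are illustrative assumptions; XOR mixes poorly compared with the hashes real forwarding hardware tends to use, so this only shows that including the destination costs no state.

```python
import ipaddress

def xor_fold(src: str, dst: str, flow_label: int, n_next_hops: int) -> int:
    """Fold source, destination and flow label into a next-hop index."""
    mixed = (int(ipaddress.IPv6Address(src))
             ^ int(ipaddress.IPv6Address(dst))
             ^ flow_label)
    folded = 0
    while mixed:                  # XOR the 128-bit value down to 16 bits
        folded ^= mixed & 0xFFFF
        mixed >>= 16
    return folded % n_next_hops
```

Because the result depends only on header fields, two destinations reached by the same source with the same label can land on different next hops without the balancer remembering anything.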

...

      A stateful layer 3/4 load balancer would apply its usual load
      distribution algorithm to the first packet of a session, and store
      the {2-tuple, server} association in a table so that subsequent
      packets belonging to the same session are forwarded to the same
      server.  Thus, for all subsequent packets of the session, it can
      ignore all IPv6 extension headers, which should lead to a
      performance benefit.  Whether this benefit is valuable will depend
      on engineering details of the specific load balancer.

This strikes me as a bit odd. As described it would be trivial to multiplex another connection over this state entry (which might be desirable for some applications) simply by using the same source/dest/flow label, which sounds like a huge DoS risk, e.g. because you now have a fixed state entry pinned to a server so long as you keep sending packets. It might also allow you to bypass controls that you could apply to the first packet of each session, for example by initiating a new connection to a different destination port number. The other question is how do you know when the session ends. Since you're not looking for fin/ack/fin/ack or RST, you're basically stuck keeping state with a timer or this state table will always be full.

This is not as the security considerations section states:

  The flow label does not significantly alter this situation.

with regard to dos/bypass risk. Rather, techniques used today by stateful l3/l4 load balancers to mitigate dos risk statelessly and protect servers from large packet flows (e.g. TCP syn cookies) or to apply gross security controls (e.g. only port 80 connections) become ineffective because the attacker generating valid connections can know in advance what source/dest/flow label(s) will be used to create connections through which packets can be delivered to servers (which is a rather different problem than it being hard to guess what someone else is using).
...

  Since the
  only state to be stored is the 2-tuple and the server identifier,
  storage requirements will be reduced.

and a timer apparently, either the 2 minute one from 3697 if we consider that still in force since 6437 doesn't mention it, or some other one.
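
To make the timer point concrete, here is a minimal sketch of a {2-tuple, server} table whose entries expire only through an idle timeout. The class name `FlowTable`, the round-robin stand-in for the balancer's real distribution algorithm, and the 120-second default (mirroring the two-minute figure mentioned above) are assumptions for illustration only.

```python
import time

class FlowTable:
    """Maps (source, flow_label) to a server; entries expire when idle."""

    def __init__(self, servers, idle_timeout=120.0, clock=time.monotonic):
        self.servers = list(servers)
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.entries = {}         # (src, flow_label) -> (server, last_seen)
        self.next_index = 0       # round-robin stand-in for the real algorithm

    def lookup(self, src, flow_label):
        now = self.clock()
        key = (src, flow_label)
        entry = self.entries.get(key)
        if entry is not None and now - entry[1] <= self.idle_timeout:
            server = entry[0]     # still live: keep the existing pinning
        else:
            server = self.servers[self.next_index % len(self.servers)]
            self.next_index += 1
        self.entries[key] = (server, now)  # every packet refreshes the timer
        return server

    def expire(self):
        """Drop entries idle longer than the timeout; return how many."""
        now = self.clock()
        stale = [k for k, (_, seen) in self.entries.items()
                 if now - seen > self.idle_timeout]
        for k in stale:
            del self.entries[k]
        return len(stale)
```

Since the table never sees the end of a session, a sender that keeps transmitting with the same source and label keeps its entry alive indefinitely, which is the pinning behaviour the discuss text worries about.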
...

  The association between the flow label value and
  the server is stored in a table (often called stick table) so that
  future connections using the same flow label can be sent to the same
  server.

This presumes the reuse of flow labels by a client. While 6437 presumes the existence of stateful flow label usage models, state entries that last a long time relative to the tcp connections that use them are an expensive proposition for a network device.
2013-11-03
02 Joel Jaeggli Ballot discuss text updated for Joel Jaeggli
2013-10-10
02 Cindy Morgan State changed to IESG Evaluation::Revised I-D Needed from IESG Evaluation
2013-10-10
02 Amanda Baber IANA Review state changed to IANA OK - No Actions Needed from Version Changed - Review Needed
2013-10-10
02 Sean Turner [Ballot Position Update] New position, No Objection, has been recorded for Sean Turner
2013-10-10
02 Gonzalo Camarillo [Ballot Position Update] New position, No Objection, has been recorded for Gonzalo Camarillo
2013-10-10
02 Stephen Farrell [Ballot Position Update] New position, No Objection, has been recorded for Stephen Farrell
2013-10-10
02 Joel Jaeggli Ballot discuss text updated for Joel Jaeggli
2013-10-10
02 Joel Jaeggli
[Ballot discuss]
this started as a comment. it's bloomed into a discuss because I think there are a lot of issues to be discussed.

  If the flow label is in fact set to zero, it will not
  affect the information entropy of the IPv6 header.

certainly it will, since your assumption vis-a-vis label balancing is that it does augment (an l3-only hash) or substitute for entropy that might otherwise be found elsewhere, e.g. in the transport header.

Assume the case of load balancing a small number of sources to a single or small number of destination IPs (this is common for API services, for example, or lots of front-end --> back-end application communication).

host 1's first tcp connection to dest 1 with a zero flow label, using a source/dest/flow label xor, hashes to the same host behind dest 1 as host 1's second connection to dest 2 with a zero flow label. That's peachy if that's what you want, but not if you don't.

so as you note if you're using zero you probably want to do something else.
...

these two:

  o  Another method, for HTTP servers, is to operate a layer 7 reverse
      proxy in front of the server farm.  The reverse proxy will present
      a single IP address to the world, communicated to clients by a
      single AAAA record.  For each new client session (an incoming TCP
      connection and HTTP request), it will pick a particular server and
      proxy the session to it.  The act of proxying should be more
      efficient and less resource-intensive than the act of serving the
      required content.  The proxy must retain TCP state and proxy state
      for the duration of the session.  This TCP state could,
      potentially, include the incoming flow label value.

  o  A component of some load balancing systems is an SSL reverse proxy
      farm.  The individual SSL proxies handle all cryptographic aspects
      and exchange unencrypted HTTP with the actual servers.  Thus, from
      the load balancing point of view, this really looks just like a
      server farm, except that it's specialised for HTTPS.  Each proxy
      will retain SSL and TCP and maybe HTTP state for the duration of
      the session, and the TCP state could potentially include the flow
      label.

are simply variations of an application-aware proxy; they could just as easily be imap(s) or smtp(s) or sip. The operative issue is that for the purposes of connection termination they are hosts, not routers, and they have to find the upper layer header (and potentially go as far as application protocol implementation).

...

      In all cases, the layer 3/4 load balancer has to
      recognize incoming packets as belonging to new or existing client
      sessions, and choose the target server or proxy so as to ensure
      persistence.

I get what you're trying to say but I think this is a misstatement, as you go on to deal with stateless load balancing correctly... A stateless L3+L4 load balancer doesn't care whether a connection is new or preexisting; persistence is a product of the values associated with the hash being immutable over the duration for which they are required, and no other condition.

...

1.  Balancers use various techniques to redirect traffic to a
          specific target server.

          - All servers are configured with the same IP address, they
          are all on the same LAN, and the load balancer sends directly
          to their individual MAC addresses.  In this case, return
          packets from the server to the client are sent back without
          passing through the balancer, a technique known as direct
          server return, but we are not concerned here with the return
          packets.

          - Each server has its own IP address, and the balancer uses an
          IP-in-IP tunnel to reach it.

          - Each server has its own IP address, and the balancer
          performs NAPT (network address and port translation) to
          deliver the client's packets to that address.

          The choice between these methods is not affected by use of the
          flow label.

You missed one that is rather common: there are multiple L3 next hops for the same destination IP (layer-3 ECMP, the same as is used to hash across router links). That's anycast in its simplest form, and it's a pretty common technique using either BGP or an IGP of your choice.

...

jjaeggli@ca2-b2-re1# run show route 2620:102:8003:211::1/128

inet6.0: 14896 destinations, 59622 routes (14896 active, 0 holddown, 29172 hidden)
+ = Active Route, - = Last Active, * = Both

2620:102:8003:211::1/128
                  *[BGP/170] 17w6d 23:34:12, MED 50, localpref 200
                      AS path: 64999 I, validation-state: unverified
                      to 2620:102:8003:200::13 via ae0.1721
                      to 2620:102:8003:200::14 via ae0.1721
                      to 2620:102:8003:200::15 via ae0.1721
                      to 2620:102:8003:200::16 via ae0.1721
                      to 2620:102:8003:200::18 via ae0.1721
                      to 2620:102:8003:200::19 via ae0.1721
                    > to 2620:102:8003:201::8 via ae1.2721
                      to 2620:102:8003:201::9 via ae1.2721
                      to 2620:102:8003:201::a via ae1.2721
                    [BGP/170] 7w4d 01:55:21, MED 50, localpref 200
                      AS path: 64999 I, validation-state: unverified
                    > to 2620:102:8003:200::13 via ae0.1721
                    [BGP/170] 7w4d 01:55:28, MED 50, localpref 200
                      AS path: 64999 I, validation-state: unverified
                    > to 2620:102:8003:200::14 via ae0.1721
                    [BGP/170] 7w4d 01:55:06, MED 50, localpref 200
                      AS path: 64999 I, validation-state: unverified
                    > to 2620:102:8003:200::15 via ae0.1721
                    [BGP/170] 7w4d 01:55:08, MED 50, localpref 200
                      AS path: 64999 I, validation-state: unverified
                    > to 2620:102:8003:200::16 via ae0.1721
                    [BGP/170] 7w4d 01:55:27, MED 50, localpref 200
                      AS path: 64999 I, validation-state: unverified
                    > to 2620:102:8003:200::18 via ae0.1721
                    [BGP/170] 1d 05:58:20, MED 50, localpref 200
                      AS path: 64999 I, validation-state: unverified
                    > to 2620:102:8003:200::19 via ae0.1721
                    [BGP/170] 17w6d 23:34:38, MED 50, localpref 200
                      AS path: 64999 I, validation-state: unverified
                    > to 2620:102:8003:201::9 via ae1.2721
                    [BGP/170] 17w6d 23:33:43, MED 50, localpref 200
                      AS path: 64999 I, validation-state: unverified
                    > to 2620:102:8003:201::a via ae1.2721
                    [BGP/170] 17w6d 20:20:44, MED 50, localpref 200, from 2620:102:8003::1
                      AS path: 64999 I, validation-state: unverified
                    > to 2620:102:8003:200::16 via ae0.1721

...

      2.  A layer 3/4 balancer must correctly handle Path MTU Discovery
          by forwarding relevant ICMPv6 packets in both directions.
          This too is not affected by use of the flow label.

ICMP messages (this applies to v4 and v6) emitted by devices on the path from the server --> client and sent back to the server IP are really problematic, because the probability of them hashing to a different server is rather high. ICMP messages emitted towards the client by path elements, inclusive of the server IP, are generally no problem. There are techniques to address that, e.g. multicasting them towards the servers, but that's rather non-scalable.

One could snarf the flow label and the source address off the offending packet (the one that's going back in the ICMPv6 type 2 payload) and use those as the source and flow label for the ICMPv6 PTB message, since label-balance would ensure delivery, but that makes diagnostics kind of sketchy.

...

diagram in section 3

        ___|_______DNS-based____________|___
              |    load splitting    |
              |    (if used) occurs  |
              |    here   

DNS-based server selection is actually something that occurs on a nameserver (the GTM), on the client (e.g. when more than one record is served), or both, e.g. by a GTM serving more than one RR.

mb-aye:~ jjaeggli$ host www.google.com
www.google.com has address 173.194.33.18
www.google.com has address 173.194.33.16
www.google.com has address 173.194.33.20
www.google.com has address 173.194.33.19
www.google.com has address 173.194.33.17
www.google.com has IPv6 address 2607:f8b0:400a:800::1010

....

  However, usage by the proxies seems unlikely to be cost-effective,
  because they must in any case process the application layer header,
  so in this document we focus only on layer 3/4 balancers.

As you note previously, the flow label is in a fixed location in the IP header, so cost isn't really that germane. They do have to look all the way into the application, so it may not be that relevant.
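
The fixed position mentioned above can be shown with a short sketch. The `parse_fixed_header` helper and the sample addresses are hypothetical; the field layout assumed here is the standard 40-byte IPv6 fixed header, in which the first 32-bit word carries the version (4 bits), traffic class (8 bits), and flow label (20 bits), and the source address occupies bytes 8 to 23.

```python
import ipaddress
import struct

IPV6_HEADER_LEN = 40

def parse_fixed_header(packet: bytes):
    """Return (source address, flow label) from the 40-byte IPv6 fixed header."""
    if len(packet) < IPV6_HEADER_LEN:
        raise ValueError("truncated IPv6 header")
    (first_word,) = struct.unpack_from("!I", packet, 0)
    if first_word >> 28 != 6:
        raise ValueError("not an IPv6 packet")
    flow_label = first_word & 0xFFFFF          # low 20 bits of the first word
    src = ipaddress.IPv6Address(packet[8:24])  # source address, bytes 8..23
    return src, flow_label

# Sample header: version 6, traffic class 0, flow label 0xABCDE,
# payload length 0, next header 6 (TCP), hop limit 64.
header = struct.pack("!IHBB", (6 << 28) | 0xABCDE, 0, 6, 64)
header += ipaddress.IPv6Address("2001:db8::1").packed  # source
header += ipaddress.IPv6Address("2001:db8::2").packed  # destination
src, label = parse_fixed_header(header)
```

No extension header is walked and no transport header is located; both values come from constant offsets.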

...

  o  We are only concerned with IPv6 traffic in which the flow label
      value has been set at or near the source according to [RFC6437].

I can't see that it matters so long as it doesn't change midflow; it's hard to know this with certainty since it's not immutable.

...

section 4

  2-tuple {source address, flow label}

What would be the gain in not using the source/dest/flow label on a stateless LB? You may be asserting that the dest has low entropy, and maybe it does if you only have one, but if you have hundreds or thousands it may well not.

The destination is required for the forwarding decision, so you might as well XOR across it too (traditional layer-3-only load balancing in v4 typically used at least source/dest and generally the protocol number). Doing so requires no state.
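
A rough sketch of the XOR suggestion follows. The `fold32`, `key_2tuple`, and `key_3tuple` helpers and the addresses are hypothetical, and a plain XOR fold is a deliberately simple mix chosen to mirror the comment, not a recommended hash. It shows that folding the destination into the key is stateless and lets two destinations with the same source and flow label produce different keys.

```python
import ipaddress

def fold32(addr: str) -> int:
    """XOR the four 32-bit words of an IPv6 address into one 32-bit word."""
    value = int(ipaddress.IPv6Address(addr))
    folded = 0
    for _ in range(4):
        folded ^= value & 0xFFFFFFFF
        value >>= 32
    return folded

def key_2tuple(src: str, flow_label: int) -> int:
    """Key over {source address, flow label} only."""
    return fold32(src) ^ flow_label

def key_3tuple(src: str, dst: str, flow_label: int) -> int:
    """Key over {source address, destination address, flow label}."""
    return fold32(src) ^ fold32(dst) ^ flow_label

src, label = "2001:db8:aaaa::10", 0x54321
k1 = key_3tuple(src, "2001:db8:ffff::1", label)
k2 = key_3tuple(src, "2001:db8:ffff::2", label)
```

With the 2-tuple, both destinations share one key; with the 3-tuple, the keys differ, and no table is needed in either case.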

...

      A stateful layer 3/4 load balancer would apply its usual load
      distribution algorithm to the first packet of a session, and store
      the {2-tuple, server} association in a table so that subsequent
      packets belonging to the same session are forwarded to the same
      server.  Thus, for all subsequent packets of the session, it can
      ignore all IPv6 extension headers, which should lead to a
      performance benefit.  Whether this benefit is valuable will depend
      on engineering details of the specific load balancer.

This strikes me as a bit odd. As described, it would be trivial to multiplex another connection over this state entry (which might be desirable for some applications) simply by using the same source/dest/flow label. That sounds like a huge DoS risk, e.g. because you now have a fixed state entry pinned to a server for as long as you keep sending packets. It might also allow you to bypass controls that you could apply to the first packet of each session, for example by initiating a new connection to a different destination port number. The other question is how you know when the session ends. Since you're not looking for FIN/ACK/FIN/ACK or RST, you're basically stuck keeping state with a timer, or this state table will always be full.

This is not as the security considerations section states:

  The flow label does not significantly alter this situation.

with regard to DoS/bypass risk. Rather, techniques used today by stateful L3/L4 load balancers to mitigate DoS risk statelessly and protect servers from large packet flows (e.g. TCP SYN cookies), or to apply gross security controls (e.g. allowing only port 80 connections), become ineffective, because the attacker generating valid connections can know in advance what source/dest/flow label(s) will be used to create connections through which packets can be delivered to servers (which is a rather different problem than it being hard to guess what someone else is using).
...

  Since the
  only state to be stored is the 2-tuple and the server identifier,
  storage requirements will be reduced.

And a timer, apparently: either the 2-minute one from RFC 3697, if we consider that still in force since RFC 6437 doesn't mention it, or some other one.
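
The need for a timer can be made concrete with a small sketch. The `StickTable` class, its round-robin assignment, and the server names are hypothetical; the 120-second default simply echoes the 2-minute figure mentioned in the comment above. Because no FIN or RST is inspected, an entry can only be retired when it has been idle too long, and any sender that keeps using the same {source address, flow label} keeps the entry pinned.

```python
class StickTable:
    """Maps {source address, flow label} to a server, with an idle timeout."""

    def __init__(self, servers, idle_timeout=120.0):
        self.servers = list(servers)
        self.idle_timeout = idle_timeout
        self.entries = {}      # (src, flow_label) -> (server, last_seen)
        self.next_index = 0    # round-robin cursor for new assignments

    def lookup(self, src, flow_label, now):
        """Return the server for this key, refreshing its idle timer."""
        key = (src, flow_label)
        entry = self.entries.get(key)
        if entry is not None and now - entry[1] <= self.idle_timeout:
            server = entry[0]  # existing entry: stick to the same server
        else:
            server = self.servers[self.next_index % len(self.servers)]
            self.next_index += 1
        self.entries[key] = (server, now)
        return server

    def expire(self, now):
        """Drop entries idle for longer than the timeout; return the count."""
        stale = [key for key, (_, seen) in self.entries.items()
                 if now - seen > self.idle_timeout]
        for key in stale:
            del self.entries[key]
        return len(stale)

table = StickTable(["srv-a", "srv-b"])
a = table.lookup("2001:db8::1", 0x11111, now=0.0)
b = table.lookup("2001:db8::1", 0x11111, now=100.0)  # refreshes the timer
c = table.lookup("2001:db8::1", 0x11111, now=219.0)  # still within 120 s of the last packet
removed = table.expire(now=400.0)                    # idle for 181 s, so it is dropped
d = table.lookup("2001:db8::1", 0x11111, now=400.0)  # treated as a new session
```

Time is passed in explicitly so the behaviour is deterministic; a real device would use its own clock and would also have to bound the table size.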
...

  The association between the flow label value and
  the server is stored in a table (often called stick table) so that
  future connections using the same flow label can be sent to the same
  server.

This presumes the reuse of flow labels by a client. While RFC 6437 presumes the existence of stateful flow label usage models, state entries that last a long time relative to the TCP connections that use them are an expensive proposition for a network device.
2013-10-10
02 Joel Jaeggli [Ballot Position Update] New position, Discuss, has been recorded for Joel Jaeggli
2013-10-09
02 Stewart Bryant [Ballot Position Update] New position, No Objection, has been recorded for Stewart Bryant
2013-10-09
02 Pete Resnick [Ballot Position Update] New position, No Objection, has been recorded for Pete Resnick
2013-10-09
02 Jari Arkko [Ballot Position Update] New position, No Objection, has been recorded for Jari Arkko
2013-10-09
02 Adrian Farrel
[Ballot discuss]
This is a fine document, but I have concluded (see below) that it
is not the document it says it is. That causes me to place a Discuss
that I think can be very easily fixed by some minor changes to the
text...

In Section 1

  Load distribution is a slightly
  more general term than load balancing, but the latter is more
  commonly used.  Both terms refer to mechanisms that distribute the
  workload of a server farm among different servers in order to
  optimize performance.

In the context of server farms, the terms definitely apply as you
describe, but it is not right to say that load balancing means a
mechanism used to distribute the workload of a server farm. Please
reword to not curtail other people's use of the term. It may be enough
to say "Both terms can be used to refer to..." or "In this document, both
terms are used to refer to..." or "In the context of a server farm, both
terms refer to..."

Similarly, Section 3 is headed "Summary of Load Balancing Techniques"
but appears to be about load balancing techniques for server farms.
So maybe that should be rebranded.

Which leads me to ask the main meat of the Discussable point: is the
document title correct? Shouldn't it be "Using the IPv6 Flow Label for
Server Load Balancing in Server Farms"?
2013-10-09
02 Adrian Farrel [Ballot Position Update] New position, Discuss, has been recorded for Adrian Farrel
2013-10-08
02 Benoît Claise
[Ballot comment]


I see
      If the flow label of an incoming packet is non-zero, layer 3/4
      load balancers can use the 2-tuple {source address, flow label} as
      the session key for whatever load distribution algorithm they
      support.

And later on

  The association between the flow label value and
  the server is stored in a table (often called stick table) so that
  future connections using the same flow label can be sent to the same
  server.

Isn't it?
  The association between the source address/flow label value and
  the server is stored in a table (often called stick table) so that
  future connections using the same flow label can be sent to the same
  server.
2013-10-08
02 Benoît Claise [Ballot Position Update] New position, No Objection, has been recorded for Benoit Claise
2013-10-08
02 Barry Leiba
[Ballot comment]
One thought I had on reading this document is that it would seem to make sense as an Applicability Statement, rather than as Informational.  However, the answers to questions 5 and 6 in the shepherd writeup convinced me that Informational is all that's appropriate at this time.
2013-10-08
02 Barry Leiba [Ballot Position Update] New position, No Objection, has been recorded for Barry Leiba
2013-10-08
02 Richard Barnes [Ballot Position Update] New position, No Objection, has been recorded for Richard Barnes
2013-10-08
02 Brian Haberman [Ballot Position Update] New position, Yes, has been recorded for Brian Haberman
2013-10-07
02 Brian Carpenter IANA Review state changed to Version Changed - Review Needed from IANA OK - No Actions Needed
2013-10-07
02 Brian Carpenter New version available: draft-ietf-intarea-flow-label-balancing-02.txt
2013-10-07
01 Spencer Dawkins [Ballot Position Update] New position, No Objection, has been recorded for Spencer Dawkins
2013-10-04
01 Martin Stiemerling
[Ballot comment]
Only one piece of text to comment on:
Section 2., paragraph 4:

>    A careful reading of RFC 6437 shows that for a given source accessing
>    a well-known TCP port at a given destination, the flow label is, in
>    effect, a substitute for the source port number, found at a fixed
>    position in the layer 3 header.

Where do you read this in RFC 6437? The text above sounds a bit mysterious in that respect.
Anyhow, even if RFC 6437 can be read in this way, your text is not correct as it stands. The flow label is in general not a substitute for TCP port number, as the port numbers are used at the end hosts to demultiplex the incoming traffic.

Here is a text proposal from my side to make your point much clearer:

A careful reading of RFC 6437 (according to Section X) shows that for load balancers relying on the flow label, the flow label is a substitute for the source port number, found at a fixed position in the layer 3 header,  for a given source accessing a well-known TCP port at a given destination.
2013-10-04
01 Martin Stiemerling [Ballot Position Update] New position, No Objection, has been recorded for Martin Stiemerling
2013-10-03
01 Ted Lemon State changed to IESG Evaluation from Waiting for Writeup
2013-10-03
01 Ted Lemon Placed on agenda for telechat - 2013-10-10
2013-10-03
01 Ted Lemon Ballot has been issued
2013-10-03
01 Ted Lemon [Ballot Position Update] New position, Yes, has been recorded for Ted Lemon
2013-10-03
01 Ted Lemon Created "Approve" ballot
2013-10-03
01 Ted Lemon Ballot writeup was changed
2013-09-30
01 (System) State changed to Waiting for Writeup from In Last Call (ends 2013-09-30)
2013-09-26
01 Tero Kivinen Request for Last Call review by SECDIR Completed: Ready. Reviewer: David Waltermire.
2013-09-24
01 (System) IANA Review state changed to IANA OK - No Actions Needed from IANA - Review Needed
2013-09-24
01 Amanda Baber
IESG/Authors/WG Chairs:

IANA has reviewed draft-ietf-intarea-flow-label-balancing-01, which is currently in Last Call, and has the following comments:

We understand that this document doesn't require any IANA actions. IANA requests that the IANA Considerations section of the document remain in place upon publication.

If this assessment is not accurate, please respond as soon as possible.
2013-09-19
01 Jean Mahoney Request for Last Call review by GENART is assigned to Ben Campbell
2013-09-19
01 Jean Mahoney Request for Last Call review by GENART is assigned to Ben Campbell
2013-09-19
01 Tero Kivinen Request for Last Call review by SECDIR is assigned to David Waltermire
2013-09-19
01 Tero Kivinen Request for Last Call review by SECDIR is assigned to David Waltermire
2013-09-16
01 Amy Vezza IANA Review state changed to IANA - Review Needed
2013-09-16
01 Amy Vezza
The following Last Call announcement was sent out:

From: The IESG
To: IETF-Announce
CC:
Reply-To: ietf@ietf.org
Sender:
Subject: Last Call:  (Using the IPv6 Flow Label for Server Load Balancing) to Informational RFC


The IESG has received a request from the Internet Area Working Group WG
(intarea) to consider the following document:
- 'Using the IPv6 Flow Label for Server Load Balancing'
  as Informational RFC

The IESG plans to make a decision in the next few weeks, and solicits
final comments on this action. Please send substantive comments to the
ietf@ietf.org mailing lists by 2013-09-30. Exceptionally, comments may be
sent to iesg@ietf.org instead. In either case, please retain the
beginning of the Subject line to allow automated sorting.

Abstract


  This document describes how the IPv6 flow label as currently
  specified can be used to enhance layer 3/4 load distribution and
  balancing for large server farms.




The file can be obtained via
http://datatracker.ietf.org/doc/draft-ietf-intarea-flow-label-balancing/

IESG discussion can be tracked via
http://datatracker.ietf.org/doc/draft-ietf-intarea-flow-label-balancing/ballot/


No IPR declarations have been submitted directly on this I-D.


2013-09-16
01 Amy Vezza State changed to In Last Call from Last Call Requested
2013-09-16
01 Amy Vezza Last call announcement was generated
2013-09-14
01 Ted Lemon Last call was requested
2013-09-14
01 Ted Lemon Ballot approval text was generated
2013-09-14
01 Ted Lemon Ballot writeup was generated
2013-09-14
01 Ted Lemon State changed to Last Call Requested from Publication Requested
2013-09-14
01 Ted Lemon Last call announcement was generated
2013-08-26
01 Cindy Morgan
(1) What type of RFC is being requested (BCP, Proposed Standard,
Internet Standard, Informational, Experimental, or Historic)? Why is
this the proper type of RFC? Is this type of RFC indicated in the title
page header?

Informational. This document does not define any protocols. It
describes the use of the IPv6 flow label field to enhance layer 3/4
load distribution and balancing for large server farms. Hence we
believe that an Informational document is appropriate.

(2) The IESG approval announcement includes a Document Announcement
Write-Up. Please provide such a Document Announcement Write-Up. Recent
examples can be found in the "Action" announcements for approved
documents. The approval announcement contains the following sections:

Technical Summary:

This document describes the use of the IPv6 flow label field to
enhance layer 3/4 load distribution and balancing for large server
farms. The main goal of this proposed approach is to improve the
performance of most types of L3/L4 load balancers, especially for
traffic that includes multiple IPv6 extension headers and for
fragmented packets. The document also includes a brief summary of
commonly used load balancing techniques to put the proposed mechanism
in context.

Working Group Summary:

The working group had active discussion on the draft and the current
text of the draft is representative of the consensus of the working
group.

Document Quality:

The document has received adequate review. The Document Shepherd has
no concerns about the depth or breadth of these reviews. There are no
known implementations of this document.

Personnel:

Who is the Document Shepherd? Who is the Responsible Area Director?

Suresh Krishnan is the document shepherd. Ted Lemon is the
responsible AD.

(3) Briefly describe the review of this document that was performed by
the Document Shepherd. If this version of the document is not ready for
publication, please explain why the document is being forwarded to the IESG.

The document shepherd has reviewed the draft and finds that it is
ready to advance to the IESG. All issues that were raised in the
working group last calls have been addressed.

(4) Does the document Shepherd have any concerns about the depth or
breadth of the reviews that have been performed?

No. The document shepherd has no such concerns.

(5) Do portions of the document need review from a particular or from
broader perspective, e.g., security, operational complexity, AAA, DNS,
DHCP, XML, or internationalization? If so, describe the review that took
place.

Yes. I think the document could benefit from further review from
people with operational expertise (especially people who run huge
server farms with load balancers). I did manage to get a solicited
review from a load balancer vendor.

(6) Describe any specific concerns or issues that the Document Shepherd
has with this document that the Responsible Area Director and/or the
IESG should be aware of? For example, perhaps he or she is uncomfortable
with certain parts of the document, or has concerns whether there really
is a need for it. In any event, if the WG has discussed those issues and
has indicated that it still wishes to advance the document, detail those
concerns here.

We have not had much content provider participation in this work. It
is unclear to the shepherd how acceptable this work is for that
community.

(7) Has each author confirmed that any and all appropriate IPR
disclosures required for full conformance with the provisions of BCP 78
and BCP 79 have already been filed. If not, explain why?

Yes.

(8) Has an IPR disclosure been filed that references this document? If
so, summarize any WG discussion and conclusion regarding the IPR
disclosures.

No.

(9) How solid is the WG consensus behind this document? Does it
represent the strong concurrence of a few individuals, with others being
silent, or does the WG as a whole understand and agree with it?

The WG consensus behind this document has been pretty stable but not
very strong.

(10) Has anyone threatened an appeal or otherwise indicated extreme
discontent? If so, please summarise the areas of conflict in separate
email messages to the Responsible Area Director. (It should be in a
separate email because this questionnaire is publicly available.)

No.

(11) Identify any ID nits the Document Shepherd has found in this
document. (See http://www.ietf.org/tools/idnits/ and the Internet-Drafts
Checklist). Boilerplate checks are not enough; this check needs to be
thorough.

No errors were found on the ID nits check.

(12) Describe how the document meets any required formal review
criteria, such as the MIB Doctor, media type, and URI type reviews.

N/A

(13) Have all references within this document been identified as either
normative or informative?

Yes.

(14) Are there normative references to documents that are not ready for
advancement or are otherwise in an unclear state? If such normative
references exist, what is the plan for their completion?

No.

(15) Are there downward normative references (see RFC 3967)?
If so, list these downward references to support the Area Director in
the Last Call procedure.

No.

(16) Will publication of this document change the status of any existing
RFCs? Are those RFCs listed on the title page header, listed in the
abstract, and discussed in the introduction? If the RFCs are not listed
in the Abstract and Introduction, explain why, and point to the part of
the document where the relationship of this document to the other RFCs
is discussed. If this information is not in the document, explain why
the WG considers it unnecessary.

No.

(17) Describe the Document Shepherd's review of the IANA considerations
section, especially with regard to its consistency with the body of the
document. Confirm that all protocol extensions that the document makes
are associated with the appropriate reservations in IANA registries.
Confirm that any referenced IANA registries have been clearly
identified. Confirm that newly created IANA registries include a
detailed specification of the initial contents for the registry, that
allocations procedures for future registrations are defined, and a
reasonable name for the new registry has been suggested (see RFC 5226).

The document requests no IANA actions.

(18) List any new IANA registries that require Expert Review for future
allocations. Provide any public guidance that the IESG would find useful
in selecting the IANA Experts for these new registries.

N/A

(19) Describe reviews and automated checks performed by the Document
Shepherd to validate sections of the document written in a formal
language, such as XML code, BNF rules, MIB definitions, etc.

N/A
2013-08-26
01 Cindy Morgan Intended Status changed to Informational
2013-08-26
01 Cindy Morgan IESG process started in state Publication Requested
2013-08-26
01 (System) Earlier history may be found in the Comment Log for /doc/draft-carpenter-flow-label-balancing/
2013-08-26
01 Suresh Krishnan Changed consensus to Yes from Unknown
2013-08-26
01 Suresh Krishnan Changed document writeup
2013-08-26
01 Suresh Krishnan Annotation tags Doc Shepherd Follow-up Underway, Other - see Comment Log cleared.
2013-08-26
01 Suresh Krishnan IETF WG state changed to Submitted to IESG for Publication from Waiting for WG Chair Go-Ahead
2013-08-26
01 Suresh Krishnan I have requested the authors to provide information about any existing implementations.
2013-08-26
01 Suresh Krishnan IETF WG state changed to Waiting for WG Chair Go-Ahead from WG Consensus: Waiting for Write-Up
2013-08-26
01 Suresh Krishnan Annotation tag Other - see Comment Log set.
2013-08-26
01 Suresh Krishnan Changed document writeup
2013-08-20
01 Suresh Krishnan IETF WG state changed to WG Consensus: Waiting for Write-Up from WG Document
2013-08-20
01 Suresh Krishnan Annotation tag Doc Shepherd Follow-up Underway set.
2013-05-25
01 Brian Carpenter New version available: draft-ietf-intarea-flow-label-balancing-01.txt
2013-01-22
00 Suresh Krishnan Changed shepherd to Suresh Krishnan
2013-01-15
00 Brian Carpenter New version available: draft-ietf-intarea-flow-label-balancing-00.txt