Using the IPv6 Flow Label for Load Balancing in Server Farms
draft-ietf-intarea-flow-label-balancing-03

Document history

RFC 7098
draft-ietf-intarea-flow-label-balancing

Date	Rev.	By	Action
2014-01-14	03	(System)	RFC Editor state changed to AUTH48-DONE from AUTH48
2014-01-03	03	(System)	RFC Editor state changed to AUTH48 from RFC-EDITOR
2014-01-03	03	(System)	RFC Editor state changed to RFC-EDITOR from EDIT
2013-11-18	03	Amy Vezza	State changed to RFC Ed Queue from Approved-announcement sent
2013-11-15	03	(System)	RFC Editor state changed to EDIT
2013-11-15	03	(System)	Announcement was received by RFC Editor
2013-11-15	03	(System)	IANA Action state changed to No IC from In Progress
2013-11-15	03	(System)	IANA Action state changed to In Progress
2013-11-15	03	Cindy Morgan	State changed to Approved-announcement sent from IESG Evaluation::AD Followup
2013-11-15	03	Cindy Morgan	IESG has approved the document
2013-11-15	03	Cindy Morgan	Closed "Approve" ballot
2013-11-15	03	Cindy Morgan	Ballot approval text was generated
2013-11-15	03	Adrian Farrel	[Ballot comment] Thanks for addressing my Discuss
2013-11-15	03	Adrian Farrel	[Ballot Position Update] Position for Adrian Farrel has been changed to No Objection from Discuss
2013-11-14	03	Joel Jaeggli	[Ballot Position Update] Position for Joel Jaeggli has been changed to No Objection from Discuss
2013-11-04	03	(System)	Sub state has been changed to AD Followup from Revised ID Needed
2013-11-04	03	Brian Carpenter	IANA Review state changed to Version Changed - Review Needed from IANA OK - No Actions Needed
2013-11-04	03	Brian Carpenter	New version available: draft-ietf-intarea-flow-label-balancing-03.txt
2013-11-03	02	Joel Jaeggli	[Ballot discuss]
I'm satisfied with the proposed text changes and will clear when the document is revved.

Thanks,
Joel

This started as a comment. It's bloomed into a discuss because I think there are a lot of issues to be discussed. (Sorry, one minor edit: read the second discuss email rather than the first.)

> If the flow label is in fact set to zero, it will not affect the information entropy of the IPv6 header.

Certainly it will, since your assumption vis-a-vis label balancing is that it does augment (an L3-only hash) or substitute for entropy that might otherwise be found elsewhere, e.g., in the transport header. Assume the case of load balancing a small number of sources to a single or small number of destination IPs (this is common for API services, for example, or lots of front-end --> back-end application communication). Host 1's first TCP connection to dest 1 with a zero flow label, using a {source, dest, flow label} XOR, hashes to the same host behind dest 1 as host 1's second connection to dest 2 with a zero flow label. That's peachy if that's what you want, but not if you don't. So, as you note, if you're using zero you probably want to do something else.

...

Regarding these two:

> o Another method, for HTTP servers, is to operate a layer 7 reverse proxy in front of the server farm. The reverse proxy will present a single IP address to the world, communicated to clients by a single AAAA record. For each new client session (an incoming TCP connection and HTTP request), it will pick a particular server and proxy the session to it. The act of proxying should be more efficient and less resource-intensive than the act of serving the required content. The proxy must retain TCP state and proxy state for the duration of the session. This TCP state could, potentially, include the incoming flow label value.
>
> o A component of some load balancing systems is an SSL reverse proxy farm. The individual SSL proxies handle all cryptographic aspects and exchange unencrypted HTTP with the actual servers. Thus, from the load balancing point of view, this really looks just like a server farm, except that it's specialised for HTTPS. Each proxy will retain SSL and TCP and maybe HTTP state for the duration of the session, and the TCP state could potentially include the flow label.

They are simply variations of an application-aware proxy; they could just as easily be IMAP(S), SMTP(S), or SIP. The operative issue is that, for the purposes of connection termination, they are hosts, not routers, and they have to find the upper-layer header (and potentially go as far as an application protocol implementation).

...

> In all cases, the layer 3/4 load balancer has to recognize incoming packets as belonging to new or existing client sessions, and choose the target server or proxy so as to ensure persistence.

I get what you're trying to say, but I think this is a misstatement, as you go on to deal with stateless load balancing correctly. A stateless L3+L4 load balancer doesn't care whether a connection is new or preexisting; persistence is a product of the values associated with the hash being immutable over the duration for which they are required, and of no other condition.

...

> 1. Balancers use various techniques to redirect traffic to a specific target server.
>
>    - All servers are configured with the same IP address, they are all on the same LAN, and the load balancer sends directly to their individual MAC addresses. In this case, return packets from the server to the client are sent back without passing through the balancer, a technique known as direct server return, but we are not concerned here with the return packets.
>
>    - Each server has its own IP address, and the balancer uses an IP-in-IP tunnel to reach it.
>
>    - Each server has its own IP address, and the balancer performs NAPT (network address and port translation) to deliver the client's packets to that address.
>
>    The choice between these methods is not affected by use of the flow label.

You missed one that is rather common, which is that there are multiple L3 next hops for the same destination IP (Layer-3 ECMP, the same as is used to hash across router links). That's anycast in its simplest form, and it's a pretty common technique using either BGP or an IGP of your choice.

...

    jjaeggli@ca2-b2-re1# run show route 2620:102:8003:211::1/128

    inet6.0: 14896 destinations, 59622 routes (14896 active, 0 holddown, 29172 hidden)
    + = Active Route, - = Last Active, * = Both

    2620:102:8003:211::1/128 *[BGP/170] 17w6d 23:34:12, MED 50, localpref 200
                          AS path: 64999 I, validation-state: unverified
                          to 2620:102:8003:200::13 via ae0.1721
                          to 2620:102:8003:200::14 via ae0.1721
                          to 2620:102:8003:200::15 via ae0.1721
                          to 2620:102:8003:200::16 via ae0.1721
                          to 2620:102:8003:200::18 via ae0.1721
                          to 2620:102:8003:200::19 via ae0.1721
                        > to 2620:102:8003:201::8 via ae1.2721
                          to 2620:102:8003:201::9 via ae1.2721
                          to 2620:102:8003:201::a via ae1.2721
                         [BGP/170] 7w4d 01:55:21, MED 50, localpref 200
                          AS path: 64999 I, validation-state: unverified
                        > to 2620:102:8003:200::13 via ae0.1721
                         [BGP/170] 7w4d 01:55:28, MED 50, localpref 200
                          AS path: 64999 I, validation-state: unverified
                        > to 2620:102:8003:200::14 via ae0.1721
                         [BGP/170] 7w4d 01:55:06, MED 50, localpref 200
                          AS path: 64999 I, validation-state: unverified
                        > to 2620:102:8003:200::15 via ae0.1721
                         [BGP/170] 7w4d 01:55:08, MED 50, localpref 200
                          AS path: 64999 I, validation-state: unverified
                        > to 2620:102:8003:200::16 via ae0.1721
                         [BGP/170] 7w4d 01:55:27, MED 50, localpref 200
                          AS path: 64999 I, validation-state: unverified
                        > to 2620:102:8003:200::18 via ae0.1721
                         [BGP/170] 1d 05:58:20, MED 50, localpref 200
                          AS path: 64999 I, validation-state: unverified
                        > to 2620:102:8003:200::19 via ae0.1721
                         [BGP/170] 17w6d 23:34:38, MED 50, localpref 200
                          AS path: 64999 I, validation-state: unverified
                        > to 2620:102:8003:201::9 via ae1.2721
                         [BGP/170] 17w6d 23:33:43, MED 50, localpref 200
                          AS path: 64999 I, validation-state: unverified
                        > to 2620:102:8003:201::a via ae1.2721
                         [BGP/170] 17w6d 20:20:44, MED 50, localpref 200, from 2620:102:8003::1
                          AS path: 64999 I, validation-state: unverified
                        > to 2620:102:8003:200::16 via ae0.1721

...

> 2. A layer 3/4 balancer must correctly handle Path MTU Discovery by forwarding relevant ICMPv6 packets in both directions. This too is not affected by use of the flow label.

ICMP messages (this applies to v4 and v6) emitted by devices on the path from server --> client that are sent back to the server IP are really problematic, because the probability of them hashing to a different server is rather high; ICMP messages emitted towards the client by path elements, inclusive of the server IP, are generally no problem. There are techniques to address that, e.g., multicasting them towards the servers, but that's rather non-scalable. One could snarf the flow label and the destination off the offending packet (the one that's going back in the ICMPv6 type 2 payload to the sender) and use those as the source and flow label for your ICMPv6 PTB message, since label balancing would ensure delivery, but that makes diagnostics kind of sketchy.

...

Regarding the diagram in Section 3:

> [diagram: "DNS-based load splitting (if used) occurs here"]

DNS-based server selection is actually something that occurs on a nameserver (the GTM), on the client (e.g., when more than one record is served), or both, e.g., by a GTM serving more than one RR.

    mb-aye:~ jjaeggli$ host www.google.com
    www.google.com has address 173.194.33.18
    www.google.com has address 173.194.33.16
    www.google.com has address 173.194.33.20
    www.google.com has address 173.194.33.19
    www.google.com has address 173.194.33.17
    www.google.com has IPv6 address 2607:f8b0:400a:800::1010

...
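The stateless case discussed above (label balancing, or Layer-3 ECMP across next hops for one destination) can be illustrated with a short Python sketch. It is hypothetical, not the draft's algorithm: `pick_server` and the 2001:db8:: addresses are invented for the example, and a truncated SHA-256 digest stands in for the simple XOR mentioned earlier in the discuss. It hashes {source, destination, flow label}, keeps no per-session state, and shows the zero-flow-label point: every connection from one client to one VIP with a zero label lands on the same server, while one distinct nonzero label per connection generally spreads them out.

```python
import hashlib
import ipaddress


def pick_server(src: str, dst: str, flow_label: int, n_servers: int) -> int:
    """Hypothetical stateless selector: hash {src, dst, flow label} onto n_servers.

    No per-session state is kept; persistence holds only as long as these
    three fields stay constant for the life of the flow.
    """
    if not 0 <= flow_label < 2**20:
        raise ValueError("the IPv6 flow label is a 20-bit field")
    key = (
        ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
        + flow_label.to_bytes(3, "big")
    )
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_servers


client = "2001:db8::1"
vip = "2001:db8:ffff::80"

# Zero flow label: 100 TCP connections from different source ports (ports are
# not part of the key) all land on one server behind the VIP.
zero = {pick_server(client, vip, 0, 8) for _port in range(49152, 49252)}
print(len(zero))  # 1

# One distinct nonzero label per connection: the same connections spread out.
spread = {pick_server(client, vip, label, 8) for label in range(1, 101)}
print(len(spread))
```

Including the destination costs nothing in state, which is the reviewer's argument for hashing it alongside the draft's 2-tuple.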
> However, usage by the proxies seems unlikely to be cost-effective, because they must in any case process the application layer header, so in this document we focus only on layer 3/4 balancers.

As you note previously, the flow label is in a fixed location in the IP header, so cost isn't really that germane. They do have to look all the way into the application, so it may not be that relevant.

...

> o We are only concerned with IPv6 traffic in which the flow label value has been set at or near the source according to [RFC6437].

I can't see that it matters so long as it doesn't change midflow, and it's hard to know this with certainty since it's not immutable.

...

Section 4:

> 2-tuple {source address, flow label}

What would be the gain in not using the source/dest/flow label on a stateless LB? You may be asserting that the dest has low entropy, and maybe it does if you only have one, but if you have hundreds or thousands it may well carry useful entropy. The destination is required for the forwarding decision, so you might as well XOR across it too (and traditional layer-3-only load balancing in v4 typically used at least source/dest and generally protocol number). Doing so requires no state.

...

> A stateful layer 3/4 load balancer would apply its usual load distribution algorithm to the first packet of a session, and store the {2-tuple, server} association in a table so that subsequent packets belonging to the same session are forwarded to the same server. Thus, for all subsequent packets of the session, it can ignore all IPv6 extension headers, which should lead to a performance benefit. Whether this benefit is valuable will depend on engineering details of the specific load balancer.

This strikes me as a bit odd. As described, it would be trivial to multiplex another connection over this state entry (which might be desirable for some applications) simply by using the same source/dest/flow label, which sounds like a huge DoS risk, because you now have a fixed state entry pinned to a server so long as you keep sending packets. It might also allow you to bypass controls that you could apply to the first packet of each session, for example by initiating a new connection to a different destination port number. The other question is how you know when the session ends. Since you're not looking for FIN/ACK/FIN/ACK or RST, you're basically stuck keeping state with a timer, or this state table will always be full. With regard to DoS/bypass risk, this is not as the Security Considerations section states:

> The flow label does not significantly alter this situation.

Rather, techniques used today by stateful L3/L4 load balancers to mitigate DoS risk statelessly and protect servers from large packet flows (e.g., TCP SYN cookies), or to apply gross security controls (e.g., only port 80 connections), become ineffective, because an attacker generating valid connections can know in advance what source/dest/flow label(s) will be used to create connections through which packets can be delivered to servers (which is a rather different problem than it being hard to guess what someone else is using).

...

> Since the only state to be stored is the 2-tuple and the server identifier, storage requirements will be reduced.

And a timer, apparently: either the 2-minute one from RFC 3697, if we consider that still in force since RFC 6437 doesn't mention it, or some other one.

...

> The association between the flow label value and the server is stored in a table (often called stick table) so that future connections using the same flow label can be sent to the same server.

This presumes the reuse of flow labels by a client. While RFC 6437 presumes the existence of stateful flow label usage models, state entries that last a long time relative to the TCP connections that use them are an expensive proposition for a network device.
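The stateful variant quoted above, together with the timer question, can be sketched as a table keyed on the draft's 2-tuple {source address, flow label}. This is a hypothetical illustration: `StickTable`, the least-loaded placement used as a stand-in for "its usual load distribution algorithm", and the 120-second idle timeout (mirroring the 2-minute figure the discuss attributes to RFC 3697) are all assumptions, not taken from the draft. It shows both concerns raised: any packet with a known key refreshes the entry and follows it to the pinned server, and once an entry idles out, the same key can be reassigned to a different server.

```python
import time
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, int]  # (source address, flow label): the draft's 2-tuple


class StickTable:
    """Hypothetical stateful balancer table: {2-tuple -> server} with idle expiry."""

    def __init__(self, servers: List[str], idle_timeout: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.servers = list(servers)
        self.idle_timeout = idle_timeout  # assumed 2-minute default
        self.clock = clock
        self.entries: Dict[Key, Tuple[str, float]] = {}  # key -> (server, last seen)

    def _expire(self, now: float) -> None:
        # No FIN/RST tracking, so an idle timer is the only way entries leave.
        stale = [k for k, (_, seen) in self.entries.items()
                 if now - seen >= self.idle_timeout]
        for k in stale:
            del self.entries[k]

    def _least_loaded(self) -> str:
        # Stand-in for "its usual load distribution algorithm".
        counts = {s: 0 for s in self.servers}
        for server, _ in self.entries.values():
            counts[server] += 1
        return min(self.servers, key=lambda s: counts[s])

    def forward(self, src: str, flow_label: int) -> str:
        now = self.clock()
        self._expire(now)
        key = (src, flow_label)
        entry = self.entries.get(key)
        server = entry[0] if entry else self._least_loaded()
        # Any packet carrying a known key refreshes the entry and follows it,
        # whatever transport connection it actually belongs to.
        self.entries[key] = (server, now)
        return server


# Usage with a fake clock so expiry is deterministic.
fake_now = [0.0]
table = StickTable(["s1", "s2"], idle_timeout=120.0, clock=lambda: fake_now[0])
a = table.forward("2001:db8::1", 0x12345)      # new entry -> "s1" (tie, first wins)
b = table.forward("2001:db8::2", 0x6789A)      # new entry -> "s2" (least loaded)
fake_now[0] = 60.0
again = table.forward("2001:db8::1", 0x12345)  # entry still live -> "s1"
fake_now[0] = 200.0
later = table.forward("2001:db8::2", 0x6789A)  # both entries expired -> reassigned to "s1"
print(a, b, again, later)  # s1 s2 s1 s1
```

Because entries leave only through the idle timer, table size is bounded by how many distinct {source, flow label} pairs arrive within one timeout, which is the storage cost the last comment refers to.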
2013-11-03	02	Joel Jaeggli	Ballot discuss text updated for Joel Jaeggli
2013-10-10	02	Cindy Morgan	State changed to IESG Evaluation::Revised I-D Needed from IESG Evaluation
2013-10-10	02	Amanda Baber	IANA Review state changed to IANA OK - No Actions Needed from Version Changed - Review Needed
2013-10-10	02	Sean Turner	[Ballot Position Update] New position, No Objection, has been recorded for Sean Turner
2013-10-10	02	Gonzalo Camarillo	[Ballot Position Update] New position, No Objection, has been recorded for Gonzalo Camarillo
2013-10-10	02	Stephen Farrell	[Ballot Position Update] New position, No Objection, has been recorded for Stephen Farrell
2013-10-10	02	Joel Jaeggli	[Ballot discuss] This started as a comment. It's bloomed into a discuss because I think there are a lot of issues to be discussed. (Sorry, one …
2013-10-10	02	Joel Jaeggli	Ballot discuss text updated for Joel Jaeggli
2013-10-10	02	Joel Jaeggli	[Ballot discuss] this started as a comment. it's bloomed into a discuss because I think there's a lot of issues to be dicussed. If … [Ballot discuss] this started as a comment. it's bloomed into a discuss because I think there's a lot of issues to be dicussed. If the flow label is in fact set to zero, it will not affect the information entropy of the IPv6 header. certainly it will, since your assumption vis-a-vis label balancing is that it does augment (an l3 only hash) or substitute for entropy that might otherwise be found elsewhere, e.g. in the transport header. Assume the case of load balancing a small number of sources to a single or small number of destination IPs (this is common for API services for example or lots of front-end --> back-backend application communication). host 1 first tcp connection to dest 1 with a zero flow label using a source, dest, flow label, xor hashes to the same host behind dest 1 as host 1 second connection to dest 2 with a zero flow label. That's peachy if that's what you want, but not if you don't. so as you note if you're using zero you probably want to do something else. ... these two: o Another method, for HTTP servers, is to operate a layer 7 reverse proxy in front of the server farm. The reverse proxy will present a single IP address to the world, communicated to clients by a single AAAA record. For each new client session (an incoming TCP connection and HTTP request), it will pick a particular server and proxy the session to it. The act of proxying should be more efficient and less resource-intensive than the act of serving the required content. The proxy must retain TCP state and proxy state for the duration of the session. This TCP state could, potentially, include the incoming flow label value. o A component of some load balancing systems is an SSL reverse proxy farm. The individual SSL proxies handle all cryptographic aspects and exchange unencrypted HTTP with the actual servers. 
Thus, from the load balancing point of view, this really looks just like a server farm, except that it's specialised for HTTPS. Each proxy will retain SSL and TCP and maybe HTTP state for the duration of the session, and the TCP state could potentially include the flow label. are simply variations of application aware proxy they could just as easy be imap(s) or smtp(s) or sip. The operative issue being that for the purposes of connection termination they are hosts not routers and they have to find the upper layer header (and potentially go as far as application protocol implementation). ... In all cases, the layer 3/4 load balancer has to recognize incoming packets as belonging to new or existing client sessions, and choose the target server or proxy so as to ensure persistence. I get what you're trying to say but I think this is a a mis-statement. As you go on to deal with stateless load balancing correctly... A stateless L3+L4 load balancer doesn't care whether a connection is new or preexisting, persistence is product of the values associated with the hash being immutable over the duration for which they are required and no other condition. ... 1. Balancers use various techniques to redirect traffic to a specific target server. - All servers are configured with the same IP address, they are all on the same LAN, and the load balancer sends directly to their individual MAC addresses. In this case, return packets from the server to the client are sent back without passing through the balancer, a technique known as direct server return, but we are not concerned here with the return packets. - Each server has its own IP address, and the balancer uses an IP-in-IP tunnel to reach it. - Each server has its own IP address, and the balancer performs NAPT (network address and port translation) to deliver the client's packets to that address. The choice between these methods is not affected by use of the flow label. 
You missed one that is rather common which is that there are multiple L3 next-hops for the same destination IP (Layer-3 ECMP the same as is used to hash across router links). that's anycast in it's simplest form, and it's a pretty common technique using either bgp or an IGP of your choice. ... jjaeggli@ca2-b2-re1# run show route 2620:102:8003:211::1/128 inet6.0: 14896 destinations, 59622 routes (14896 active, 0 holddown, 29172 hidden) + = Active Route, - = Last Active, * = Both 2620:102:8003:211::1/128 *[BGP/170] 17w6d 23:34:12, MED 50, localpref 200 AS path: 64999 I, validation-state: unverified to 2620:102:8003:200::13 via ae0.1721 to 2620:102:8003:200::14 via ae0.1721 to 2620:102:8003:200::15 via ae0.1721 to 2620:102:8003:200::16 via ae0.1721 to 2620:102:8003:200::18 via ae0.1721 to 2620:102:8003:200::19 via ae0.1721 > to 2620:102:8003:201::8 via ae1.2721 to 2620:102:8003:201::9 via ae1.2721 to 2620:102:8003:201::a via ae1.2721 [BGP/170] 7w4d 01:55:21, MED 50, localpref 200 AS path: 64999 I, validation-state: unverified > to 2620:102:8003:200::13 via ae0.1721 [BGP/170] 7w4d 01:55:28, MED 50, localpref 200 AS path: 64999 I, validation-state: unverified > to 2620:102:8003:200::14 via ae0.1721 [BGP/170] 7w4d 01:55:06, MED 50, localpref 200 AS path: 64999 I, validation-state: unverified > to 2620:102:8003:200::15 via ae0.1721 [BGP/170] 7w4d 01:55:08, MED 50, localpref 200 AS path: 64999 I, validation-state: unverified > to 2620:102:8003:200::16 via ae0.1721 [BGP/170] 7w4d 01:55:27, MED 50, localpref 200 AS path: 64999 I, validation-state: unverified > to 2620:102:8003:200::18 via ae0.1721 [BGP/170] 1d 05:58:20, MED 50, localpref 200 AS path: 64999 I, validation-state: unverified > to 2620:102:8003:200::19 via ae0.1721 [BGP/170] 17w6d 23:34:38, MED 50, localpref 200 AS path: 64999 I, validation-state: unverified > to 2620:102:8003:201::9 via ae1.2721 [BGP/170] 17w6d 23:33:43, MED 50, localpref 200 AS path: 64999 I, validation-state: unverified > to 
2620:102:8003:201::a via ae1.2721 [BGP/170] 17w6d 20:20:44, MED 50, localpref 200, from 2620:102:8003::1 AS path: 64999 I, validation-state: unverified > to 2620:102:8003:200::16 via ae0.1721 ... 2. A layer 3/4 balancer must correctly handle Path MTU Discovery by forwarding relevant ICMPv6 packets in both directions. This too is not affected by use of the flow label. icmp messages (this applies to v4 and v6) emitted by devices on the path from the server --> client that are sent back to the server IP are really problematic because the probability of them hashing to a different server is rather high, icmp messages emitted towards the client by path elements inclusive of the server ip are generally no problem. There are techniques to address that e.g. multicasting them towards the servers but that's rather non-scalable. One could snarf the flow label and the source address off the offending packet (the one that's going back in the icmp6 type 2 payload) and use those as the source and flow label for your icmpv6 ptb message since label-balance would insure delivery but that makes diagnostics kind of sketchy. ... diagram in section 3 ___\|_______DNS-based____________\|___ \| load splitting \| \| (if used) occurs \| \| here dns based sever selection is actually something that occurs on a nameserver (the GTM) the client (e.g. when more than one record is served) or both e.g. by a GTM serving more that one RR. mb-aye:~ jjaeggli$ host www.google.com www.google.com has address 173.194.33.18 www.google.com has address 173.194.33.16 www.google.com has address 173.194.33.20 www.google.com has address 173.194.33.19 www.google.com has address 173.194.33.17 www.google.com has IPv6 address 2607:f8b0:400a:800::1010 .... However, usage by the proxies seems unlikely to be cost-effective, because they must in any case process the application layer header, so in this document we focus only on layer 3/4 balancers. 
As you noted previously, the flow label is in a fixed location in the IP header, so cost isn't really that germane. They do have to look all the way into the application, so it may not be that relevant. ... o We are only concerned with IPv6 traffic in which the flow label value has been set at or near the source according to [RFC6437]. I can't see that it matters so long as it doesn't change mid-flow; it's hard to know this with certainty, since it's not immutable. ... Section 4: 2-tuple {source address, flow label}. What would be the gain in not using the source/dest/flow label on a stateless LB? You may be asserting that the dest has low entropy, and maybe it does if you only have one, but if you have hundreds or thousands it may well carry useful entropy. The destination is required for the forwarding decision, so you might as well XOR across it too (and traditional layer-3-only load balancing in v4 typically used at least source/dest and generally protocol number). Doing so requires no state. ... A stateful layer 3/4 load balancer would apply its usual load distribution algorithm to the first packet of a session, and store the {2-tuple, server} association in a table so that subsequent packets belonging to the same session are forwarded to the same server. Thus, for all subsequent packets of the session, it can ignore all IPv6 extension headers, which should lead to a performance benefit. Whether this benefit is valuable will depend on engineering details of the specific load balancer. This strikes me as a bit odd: as described, it would be trivial to multiplex another connection over this state entry (which might be desirable for some applications) simply by using the same source/dest/flow label, which sounds like a huge DoS risk, e.g. because you now have a fixed state entry pinned to a server for as long as you keep sending packets. 
It might also allow you to bypass controls that you could apply to the first packet of each session, for example initiating a new connection to a different destination port number. The other question is how you know when the session ends. Since you're not looking for FIN/ACK/FIN/ACK or RST, you're basically stuck keeping state with a timer, or this state table will always be full. With regard to DoS/bypass risk, this is not as the Security Considerations section states: "The flow label does not significantly alter this situation." Rather, techniques used today by stateful L3/L4 load balancers to mitigate DoS risk statelessly, protecting servers from large packet flows (e.g. TCP SYN cookies), or to apply gross security controls (e.g. only port 80 connections) become ineffective, because an attacker generating valid connections can know in advance which source/dest/flow label(s) will be used to create connections through which packets can be delivered to servers (which is a rather different problem than it being hard to guess what someone else is using). ... Since the only state to be stored is the 2-tuple and the server identifier, storage requirements will be reduced. And a timer, apparently: either the 2-minute one from RFC 3697, if we consider that still in force since RFC 6437 doesn't mention it, or some other one. ... The association between the flow label value and the server is stored in a table (often called stick table) so that future connections using the same flow label can be sent to the same server. This presumes the reuse of flow labels by a client. While RFC 6437 presumes the existence of stateful flow label usage models, state entries that last a long time relative to the TCP connections that use them are an expensive proposition for a network device.
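The stateless choice debated above, hashing only the 2-tuple {source address, flow label} versus folding the destination in as well, can be sketched in Python. This is a minimal illustration under stated assumptions, not the draft's algorithm or any vendor's: the function name `pick_server`, the use of SHA-256 as a stand-in for whatever hash or XOR a real balancer applies, and the modulo mapping onto a server list are all hypothetical.

```python
import hashlib
import ipaddress
from typing import List, Optional

FLOW_LABEL_MAX = (1 << 20) - 1  # the IPv6 flow label is a 20-bit field


def pick_server(src: str, flow_label: int, servers: List[str],
                dst: Optional[str] = None) -> str:
    """Hypothetical stateless selector: hash the session key, map it onto the pool.

    With dst=None the key is the 2-tuple {source address, flow label};
    otherwise the destination address is folded in as a third element.
    """
    if not 0 <= flow_label <= FLOW_LABEL_MAX:
        raise ValueError("flow label must fit in 20 bits")
    key = ipaddress.IPv6Address(src).packed + flow_label.to_bytes(3, "big")
    if dst is not None:
        # The destination is needed for forwarding anyway, so hashing it costs no state.
        key += ipaddress.IPv6Address(dst).packed
    digest = hashlib.sha256(key).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["srv-a", "srv-b", "srv-c", "srv-d"]
# 2-tuple with a zero flow label: the destination is not part of the key,
# so every connection from this source maps to one server.
print(pick_server("2001:db8::1", 0, servers))
# 3-tuple: different destinations produce different keys and may spread out.
print(pick_server("2001:db8::1", 0, servers, dst="2001:db8:100::1"),
      pick_server("2001:db8::1", 0, servers, dst="2001:db8:100::2"))
```

Neither variant keeps any per-session state; the only difference is whether the destination contributes entropy to the key, which matters most when flow labels are zero.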
2013-10-10	02	Joel Jaeggli	[Ballot Position Update] New position, Discuss, has been recorded for Joel Jaeggli
2013-10-09	02	Stewart Bryant	[Ballot Position Update] New position, No Objection, has been recorded for Stewart Bryant
2013-10-09	02	Pete Resnick	[Ballot Position Update] New position, No Objection, has been recorded for Pete Resnick
2013-10-09	02	Jari Arkko	[Ballot Position Update] New position, No Objection, has been recorded for Jari Arkko
2013-10-09	02	Adrian Farrel	[Ballot discuss] This is a fine document, but I have concluded (see below) that it is not the document it says it is. That causes me to place a Discuss that I think can be very easily fixed by some minor changes to the text... In Section 1: "Load distribution is a slightly more general term than load balancing, but the latter is more commonly used. Both terms refer to mechanisms that distribute the workload of a server farm among different servers in order to optimize performance." In the context of server farms, the terms definitely apply as you describe, but it is not right to say that load balancing means a mechanism used to distribute the workload of a server farm. Please reword to not curtail other people's use of the term. It may be enough to say "Both terms can be used to refer to..." or "In this document, both terms are used to refer to..." or "In the context of a server farm, both terms refer to..." Similarly, Section 3 is headed "Summary of Load Balancing Techniques" but appears to be about load balancing techniques for server farms, so maybe that should be rebranded. Which leads me to ask the main meat of the Discussable point: is the document title correct? Shouldn't it be "Using the IPv6 Flow Label for Server Load Balancing in Server Farms"?
2013-10-09	02	Adrian Farrel	[Ballot Position Update] New position, Discuss, has been recorded for Adrian Farrel
2013-10-08	02	Benoît Claise	[Ballot comment] I see: "If the flow label of an incoming packet is non-zero, layer 3/4 load balancers can use the 2-tuple {source address, flow label} as the session key for whatever load distribution algorithm they support." And later on: "The association between the flow label value and the server is stored in a table (often called stick table) so that future connections using the same flow label can be sent to the same server." Shouldn't it be: "The association between the source address/flow label value and the server is stored in a table (often called stick table) so that future connections using the same flow label can be sent to the same server."?
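Benoît Claise's suggested wording, keying the stick table on the source address and flow label together, combined with the idle timer that Joel Jaeggli notes a stateful balancer would need, might look like the following Python sketch. The class `StickTable`, the round-robin stand-in for "its usual load distribution algorithm", and the 120-second default (chosen only because the two-minute figure from RFC 3697 came up in the discussion) are assumptions, not part of the draft.

```python
import itertools
import time
from typing import Callable, Dict, Iterable, Tuple

Key = Tuple[str, int]  # (source address, flow label): the 2-tuple session key


class StickTable:
    """Hypothetical stateful balancer table: {(src, flow label) -> server}, with idle expiry."""

    def __init__(self, servers: Iterable[str], idle_timeout: float = 120.0,
                 clock: Callable[[], float] = time.monotonic):
        self._next_server = itertools.cycle(list(servers))  # stand-in distribution algorithm
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._entries: Dict[Key, Tuple[str, float]] = {}  # key -> (server, last seen)

    def lookup(self, src: str, flow_label: int) -> str:
        now = self._clock()
        key = (src, flow_label)
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] <= self._idle_timeout:
            server = entry[0]  # existing session: stick to the same server
        else:
            server = next(self._next_server)  # first packet, or the entry went idle
        self._entries[key] = (server, now)  # every packet refreshes the entry
        return server

    def expire(self) -> int:
        """Remove entries idle longer than the timeout; return how many were removed."""
        now = self._clock()
        stale = [k for k, (_, seen) in self._entries.items()
                 if now - seen > self._idle_timeout]
        for k in stale:
            del self._entries[k]
        return len(stale)


# Usage with a controllable clock:
fake_now = [0.0]
table = StickTable(["srv-a", "srv-b"], clock=lambda: fake_now[0])
print(table.lookup("2001:db8::1", 0x1234))  # new key: first server in the rotation
print(table.lookup("2001:db8::2", 0x1234))  # same label, other source: separate entry
```

Because the source address is part of the key, two clients that happen to pick the same flow label get independent entries. Note also that every packet carrying a known key refreshes its entry, which is the pinning behaviour raised above as a DoS concern; the timer only helps once a key goes quiet.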
2013-10-08	02	Benoît Claise	[Ballot Position Update] New position, No Objection, has been recorded for Benoit Claise
2013-10-08	02	Barry Leiba	[Ballot comment] One thought I had on reading this document is that it would seem to make sense as an Applicability Statement, rather than as Informational. However, the answers to questions 5 and 6 in the shepherd writeup convinced me that Informational is all that's appropriate at this time.
2013-10-08	02	Barry Leiba	[Ballot Position Update] New position, No Objection, has been recorded for Barry Leiba
2013-10-08	02	Richard Barnes	[Ballot Position Update] New position, No Objection, has been recorded for Richard Barnes
2013-10-08	02	Brian Haberman	[Ballot Position Update] New position, Yes, has been recorded for Brian Haberman
2013-10-07	02	Brian Carpenter	IANA Review state changed to Version Changed - Review Needed from IANA OK - No Actions Needed
2013-10-07	02	Brian Carpenter	New version available: draft-ietf-intarea-flow-label-balancing-02.txt
2013-10-07	01	Spencer Dawkins	[Ballot Position Update] New position, No Objection, has been recorded for Spencer Dawkins
2013-10-04	01	Martin Stiemerling	[Ballot comment] Only one piece of text to comment on: Section 2., paragraph 4: > A careful reading of RFC 6437 shows that for a given source accessing > a well-known TCP port at a given destination, the flow label is, in > effect, a substitute for the source port number, found at a fixed > position in the layer 3 header. Where do you read this in RFC 6437? The text above sounds a bit mysterious in that respect. Anyhow, even if RFC 6437 can be read in this way, your text is not correct as it stands. The flow label is in general not a substitute for the TCP port numbers, as the port numbers are used at the end hosts to demultiplex the incoming traffic. Here is a text proposal from my side to make your point much clearer: A careful reading of RFC 6437 (according to Section X) shows that for load balancers relying on the flow label, the flow label is a substitute for the source port number, found at a fixed position in the layer 3 header, for a given source accessing a well-known TCP port at a given destination.
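The point that the flow label is "found at a fixed position in the layer 3 header" is what lets a balancer build its session key without walking extension headers or reaching the transport header. A minimal Python sketch, assuming a raw IPv6 packet held in a `bytes` object; the function name `flow_key` is hypothetical. The offsets follow the fixed 40-byte IPv6 header (RFC 2460, since obsoleted by RFC 8200 with the same layout): the flow label is the low 20 bits of the first 32-bit word, and the source address occupies bytes 8 to 23.

```python
import ipaddress
import struct
from typing import Tuple


def flow_key(packet: bytes) -> Tuple[str, int]:
    """Return (source address, flow label) read only from the fixed IPv6 header.

    Anything after the first 40 bytes (extension headers, fragments,
    transport headers) is never inspected.
    """
    if len(packet) < 40:
        raise ValueError("truncated IPv6 header")
    (first_word,) = struct.unpack("!I", packet[:4])  # version | traffic class | flow label
    if first_word >> 28 != 6:
        raise ValueError("not an IPv6 packet")
    flow_label = first_word & 0xFFFFF                 # low 20 bits
    src = ipaddress.IPv6Address(packet[8:24])         # bytes 8..23: source address
    return str(src), flow_label


# Build a synthetic header: version 6, traffic class 0xAB, flow label 0x12345,
# payload length 0, next header 59 (No Next Header), hop limit 64.
header = (struct.pack("!IHBB", (6 << 28) | (0xAB << 20) | 0x12345, 0, 59, 64)
          + ipaddress.IPv6Address("2001:db8::1").packed
          + ipaddress.IPv6Address("2001:db8::2").packed)
print(flow_key(header))
```

The traffic class bits sit between the version and the flow label, so masking with `0xFFFFF` is what keeps them out of the key.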
2013-10-04	01	Martin Stiemerling	[Ballot Position Update] New position, No Objection, has been recorded for Martin Stiemerling
2013-10-03	01	Ted Lemon	State changed to IESG Evaluation from Waiting for Writeup
2013-10-03	01	Ted Lemon	Placed on agenda for telechat - 2013-10-10
2013-10-03	01	Ted Lemon	Ballot has been issued
2013-10-03	01	Ted Lemon	[Ballot Position Update] New position, Yes, has been recorded for Ted Lemon
2013-10-03	01	Ted Lemon	Created "Approve" ballot
2013-10-03	01	Ted Lemon	Ballot writeup was changed
2013-09-30	01	(System)	State changed to Waiting for Writeup from In Last Call (ends 2013-09-30)
2013-09-26	01	Tero Kivinen	Request for Last Call review by SECDIR Completed: Ready. Reviewer: David Waltermire.
2013-09-24	01	(System)	IANA Review state changed to IANA OK - No Actions Needed from IANA - Review Needed
2013-09-24	01	Amanda Baber	IESG/Authors/WG Chairs: IANA has reviewed draft-ietf-intarea-flow-label-balancing-01, which is currently in Last Call, and has the following comments: We understand that this document doesn't require any IANA actions. IANA requests that the IANA Considerations section of the document remain in place upon publication. If this assessment is not accurate, please respond as soon as possible.
2013-09-19	01	Jean Mahoney	Request for Last Call review by GENART is assigned to Ben Campbell
2013-09-19	01	Tero Kivinen	Request for Last Call review by SECDIR is assigned to David Waltermire
2013-09-16	01	Amy Vezza	IANA Review state changed to IANA - Review Needed
2013-09-16	01	Amy Vezza	The following Last Call announcement was sent out: From: The IESG To: IETF-Announce CC: Reply-To: ietf@ietf.org Sender: Subject: Last Call: (Using the IPv6 Flow Label for Server Load Balancing) to Informational RFC The IESG has received a request from the Internet Area Working Group WG (intarea) to consider the following document: - 'Using the IPv6 Flow Label for Server Load Balancing' as Informational RFC The IESG plans to make a decision in the next few weeks, and solicits final comments on this action. Please send substantive comments to the ietf@ietf.org mailing lists by 2013-09-30. Exceptionally, comments may be sent to iesg@ietf.org instead. In either case, please retain the beginning of the Subject line to allow automated sorting. Abstract This document describes how the IPv6 flow label as currently specified can be used to enhance layer 3/4 load distribution and balancing for large server farms. The file can be obtained via http://datatracker.ietf.org/doc/draft-ietf-intarea-flow-label-balancing/ IESG discussion can be tracked via http://datatracker.ietf.org/doc/draft-ietf-intarea-flow-label-balancing/ballot/ No IPR declarations have been submitted directly on this I-D.
2013-09-16	01	Amy Vezza	State changed to In Last Call from Last Call Requested
2013-09-16	01	Amy Vezza	Last call announcement was generated
2013-09-14	01	Ted Lemon	Last call was requested
2013-09-14	01	Ted Lemon	Ballot approval text was generated
2013-09-14	01	Ted Lemon	Ballot writeup was generated
2013-09-14	01	Ted Lemon	State changed to Last Call Requested from Publication Requested
2013-09-14	01	Ted Lemon	Last call announcement was generated
2013-08-26	01	Cindy Morgan	(1) What type of RFC is being requested (BCP, Proposed Standard, Internet Standard, Informational, Experimental, or Historic)? Why is this the proper type of RFC? Is this type of RFC indicated in the title page header? Informational. This document does not define any protocols. It describes the use of the IPv6 flow label field to enhance layer 3/4 load distribution and balancing for large server farms. Hence we believe that an Informational document is appropriate. (2) The IESG approval announcement includes a Document Announcement Write-Up. Please provide such a Document Announcement Write-Up. Recent examples can be found in the "Action" announcements for approved documents. The approval announcement contains the following sections: Technical Summary: This document describes the use of the IPv6 flow label field to enhance layer 3/4 load distribution and balancing for large server farms. The main goal of this proposed approach is to improve the performance of most types of L3/L4 load balancers, especially for traffic that includes multiple IPv6 extension headers and for fragmented packets. The document also includes a brief summary of commonly used load balancing techniques to put the proposed mechanism in context. Working Group Summary: The working group had active discussion on the draft and the current text of the draft is representative of the consensus of the working group. Document Quality: The document has received adequate review. The Document Shepherd has no concerns about the depth or breadth of these reviews. There are no known implementations of this document. Personnel: Who is the Document Shepherd? Who is the Responsible Area Director? Suresh Krishnan is the document shepherd. Ted Lemon is the responsible AD. 
(3) Briefly describe the review of this document that was performed by the Document Shepherd. If this version of the document is not ready for publication, please explain why the document is being forwarded to the IESG. The document shepherd has reviewed the draft and finds that it is ready to advance to the IESG. All issues that were raised in the working group last calls have been addressed. (4) Does the document Shepherd have any concerns about the depth or breadth of the reviews that have been performed? No. The document shepherd has no such concerns. (5) Do portions of the document need review from a particular or from broader perspective, e.g., security, operational complexity, AAA, DNS, DHCP, XML, or internationalization? If so, describe the review that took place. Yes. I think the document could benefit from further review from people with operational expertise (especially people who run huge server farms with load balancers). I did manage to get a solicited review from a load balancer vendor. (6) Describe any specific concerns or issues that the Document Shepherd has with this document that the Responsible Area Director and/or the IESG should be aware of? For example, perhaps he or she is uncomfortable with certain parts of the document, or has concerns whether there really is a need for it. In any event, if the WG has discussed those issues and has indicated that it still wishes to advance the document, detail those concerns here. We have not had much content provider participation in this work. It is unclear to the shepherd how acceptable this work is for that community. (7) Has each author confirmed that any and all appropriate IPR disclosures required for full conformance with the provisions of BCP 78 and BCP 79 have already been filed. If not, explain why? Yes. (8) Has an IPR disclosure been filed that references this document? If so, summarize any WG discussion and conclusion regarding the IPR disclosures. No. 
(9) How solid is the WG consensus behind this document? Does it represent the strong concurrence of a few individuals, with others being silent, or does the WG as a whole understand and agree with it? The WG consensus behind this document has been pretty stable but not very strong. (10) Has anyone threatened an appeal or otherwise indicated extreme discontent? If so, please summarise the areas of conflict in separate email messages to the Responsible Area Director. (It should be in a separate email because this questionnaire is publicly available.) No. (11) Identify any ID nits the Document Shepherd has found in this document. (See http://www.ietf.org/tools/idnits/ and the Internet-Drafts Checklist). Boilerplate checks are not enough; this check needs to be thorough. No errors were found on the ID nits check. (12) Describe how the document meets any required formal review criteria, such as the MIB Doctor, media type, and URI type reviews. N/A (13) Have all references within this document been identified as either normative or informative? Yes. (14) Are there normative references to documents that are not ready for advancement or are otherwise in an unclear state? If such normative references exist, what is the plan for their completion? No. (15) Are there downward normative references (see RFC 3967)? If so, list these downward references to support the Area Director in the Last Call procedure. No. (16) Will publication of this document change the status of any existing RFCs? Are those RFCs listed on the title page header, listed in the abstract, and discussed in the introduction? If the RFCs are not listed in the Abstract and Introduction, explain why, and point to the part of the document where the relationship of this document to the other RFCs is discussed. If this information is not in the document, explain why the WG considers it unnecessary. No. 
(17) Describe the Document Shepherd's review of the IANA considerations section, especially with regard to its consistency with the body of the document. Confirm that all protocol extensions that the document makes are associated with the appropriate reservations in IANA registries. Confirm that any referenced IANA registries have been clearly identified. Confirm that newly created IANA registries include a detailed specification of the initial contents for the registry, that allocations procedures for future registrations are defined, and a reasonable name for the new registry has been suggested (see RFC 5226). The document requests no IANA actions. (18) List any new IANA registries that require Expert Review for future allocations. Provide any public guidance that the IESG would find useful in selecting the IANA Experts for these new registries. N/A (19) Describe reviews and automated checks performed by the Document Shepherd to validate sections of the document written in a formal language, such as XML code, BNF rules, MIB definitions, etc. N/A
2013-08-26	01	Cindy Morgan	Intended Status changed to Informational
2013-08-26	01	Cindy Morgan	IESG process started in state Publication Requested
2013-08-26	01	(System)	Earlier history may be found in the Comment Log for /doc/draft-carpenter-flow-label-balancing/
2013-08-26	01	Suresh Krishnan	Changed consensus to Yes from Unknown
2013-08-26	01	Suresh Krishnan	Changed document writeup
2013-08-26	01	Suresh Krishnan	Annotation tags Doc Shepherd Follow-up Underway, Other - see Comment Log cleared.
2013-08-26	01	Suresh Krishnan	IETF WG state changed to Submitted to IESG for Publication from Waiting for WG Chair Go-Ahead
2013-08-26	01	Suresh Krishnan	I have requested the authors to provide information about any existing implementations.
2013-08-26	01	Suresh Krishnan	IETF WG state changed to Waiting for WG Chair Go-Ahead from WG Consensus: Waiting for Write-Up
2013-08-26	01	Suresh Krishnan	Annotation tag Other - see Comment Log set.
2013-08-26	01	Suresh Krishnan	Changed document writeup
2013-08-20	01	Suresh Krishnan	IETF WG state changed to WG Consensus: Waiting for Write-Up from WG Document
2013-08-20	01	Suresh Krishnan	Annotation tag Doc Shepherd Follow-up Underway set.
2013-05-25	01	Brian Carpenter	New version available: draft-ietf-intarea-flow-label-balancing-01.txt
2013-01-22	00	Suresh Krishnan	Changed shepherd to Suresh Krishnan
2013-01-15	00	Brian Carpenter	New version available: draft-ietf-intarea-flow-label-balancing-00.txt