TCP Maintenance and Minor                                M. Jethanandani
Extensions                                                 Cisco Systems
Internet-Draft                                                M. Bashyam
Intended status: Informational                      Ocarina Systems, Inc
Expires: April 19, 2008                                 October 17, 2007


                    TCP Robustness in Persist Condition
                    draft-mahesh-persist-timeout-02

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 19, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This document describes how a connection can remain infinitely in
   persist condition, and its Denial of Service (DoS) implication on the
   system, if there is no mechanism to recover from this anomaly.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",



Jethanandani & Bashyam   Expires April 19, 2008                 [Page 1]


Internet-Draft     TCP Robustness in Persist Condition      October 2007


   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Denial of Service Experimentation  . . . . . . . . . . . . . .  4
   3.  Solution . . . . . . . . . . . . . . . . . . . . . . . . . . .  6
   4.  Role of Application  . . . . . . . . . . . . . . . . . . . . .  8
   5.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . .  8
   6.  Security Considerations  . . . . . . . . . . . . . . . . . . .  9
   7.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . .  9
   8.  References . . . . . . . . . . . . . . . . . . . . . . . . . .  9
     8.1.  Normative References . . . . . . . . . . . . . . . . . . .  9
     8.2.  Informative References . . . . . . . . . . . . . . . . . .  9
   Appendix A.  An Appendix . . . . . . . . . . . . . . . . . . . . .  9
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . .  9
   Intellectual Property and Copyright Statements . . . . . . . . . . 11
































Jethanandani & Bashyam   Expires April 19, 2008                 [Page 2]


Internet-Draft     TCP Robustness in Persist Condition      October 2007


1.  Introduction

   RFC 1122 [RFC1122] Section 4.2.2.17, page 92 says that: A TCP MAY
   keep its offered receive window closed indefinitely.  As long as the
   receiving TCP continues to send acknowledgments in response to the
   probe segments, the sending TCP MUST allow the connection to stay
   open.

   The RFC goes on to say that it is important to remember that ACK
   (acknowledgement) segments that contain no data are not reliably
   transmitted by TCP.  Therefore zero window probing SHOULD be
   supported to prevent a connection from hanging forever if ACK
   segments that re-opens the window is lost.

   While the RFC is clear why the sender needs to continue to probe the
   receiver, it is not clear why this process needs to be indefinite,
   particularly if the receiver continually responds with a ACK and a
   window of zero.  This draft documents a negative consequence of this
   indefinite attempt by the sender to probe for the receiver's offered
   window.

   One negative consequence of this indefinite attempt is that it makes
   the sender vulnerable to a connection and send buffer exhaustion
   attack by one or more malicious receivers.  This leads to a Denial of
   Service (DoS) where legitimate connections stop getting established
   and well behaved already established connections stop making progress
   in terms of data transmission.

   Having the sender accumulate buffers and connection table entries
   when the receiver has deliberately and maliciously closed the window
   can ultimately lead to resource exhaustion on the sender.  This
   particular dependence on the receiver to open its zero window can be
   easily exploited by a malicious receiver to launch a DoS attack
   against the sender.

   The condition where the sender has at least one buffer in the send
   queue is referred to as persist condition.  In this condition the
   sender is waiting indefinitely for the receiver to open up its
   window.

   Resources that are compromised due to this sender behavior include
   connections and send buffers, since both of these are finite pools in
   any server.

   The problem is applicable to TCP and TCP derived transport protocol
   like SCTP.

   We have done some experimention to demonstrate this problem and



Jethanandani & Bashyam   Expires April 19, 2008                 [Page 3]


Internet-Draft     TCP Robustness in Persist Condition      October 2007


   looked at how many servers on the Internet are susceptible to it.
   The rest of the draft will detail the experiment, suggest how the
   problem needs to be addressed, why we believe it is the right
   solution and what role application can play in solving this problem.

   For TCP to persist indefinitely makes the end point vulnerable to a
   DoS attack.  We therefore clarify the purpose of zero window as
   described in RFC 1122 and suggest that TCP end point SHOULD NOT keep
   a connection in persist condition for an indefinite amount of time.

   In most implementations, TCP runs in kernel mode as part of the
   operating system.  In this mode the operating system may share the
   same address space as TCP.  For the purposes of discussion, this
   draft considers TCP protocol implementation to be a separate module
   responsible for all resources such as buffers and connection control
   blocks that it borrows from the operating system.  The operating
   system can enforce the maximum number of buffers it is willing to
   give to TCP but beyond that it lets TCP decide how to manage them.


2.  Denial of Service Experimentation

   The effect of the receiver that stops reading data is that the sender
   continues to send data till the receiver advertised window goes to
   zero at which time the connection enters persist condition.  Since
   the sender has more buffers with data for the client, it will
   continue to probe the receiver.  If the sender is servicing several
   such clients the effect compounds itself to the extent that the
   sender runs out of buffers and/or connection resources.  The sender
   at this point cannot service new legitimate connections and even the
   existing connections start seeing degraded service.  Further, each
   connection reserves a connection control block, which are of a finite
   amount.  Several connections in persist condition can exhaust the
   connection control block pool.

   To demonstrate the problem we wrote a user level program that puts
   TCP connections on the HTTP server in persist condition.  The client
   can run on any machine and does not require a change in the kernel or
   the operating system.

   The client opens a TCP connection to the HTTP server with a
   advertised MSS of 1460.  It then sends a GET request for a large
   page.  The page size is large enough to ensure that the connections
   send buffer always has more data than receivers maximum advertised
   window.  Once the window has been opened, the client application
   stops reading data resulting in TCP closing the window and
   advertising zero window towards the sender.  For each request of a
   multi-megabyte response, the connection can result in the sender



Jethanandani & Bashyam   Expires April 19, 2008                 [Page 4]


Internet-Draft     TCP Robustness in Persist Condition      October 2007


   holding on to all the requested data minus the receivers advertised
   window, in its send queue.  If the receiver never closes the
   connection, the server will continue to hold that data indefinitely
   in its send queue.

   The same program was then run from each client with it opening one
   thousand connections towards the HTTP server.  This was run from
   several different machines with the result that now the server was
   holding onto several thousand connections, each with more than one
   megabyte worth of data on the send queue.

   After verifying this behavior in the laboratory against both a Apache
   and a IIS server, we then proceeded to test HTTP servers on the
   Internet.  To verify this behavior we needed to open only few
   connections towards the servers.  We chose three well known sites,
   identified here as Site A, Site B and Site C for our test.  We then
   ran a network analyzer on the client machine to monitor the behavior
   of the connection.  These were our observations.

   Connections to Site A went into ESTABLISHED state and after receiving
   receivers advertised window worth of data went into persist
   condition.  The connection persisted in this mode for approximately
   11 minutes and was then RST by the server.

   Connections to Site B went and stayed in ESTABLISHED state.  They
   stayed in that state as long as the client kept the connection open.
   The server in this case was Apache version 2.0.  The size of the file
   requested was 12.12M. The client received 200K worth of data and the
   rest of the data was either queued on the send queue or in
   application.

   Connection to Site C went into and stayed in ESTABLISHED state.  They
   too stayed in that state as long as the client kept the connection
   open, which was as long as five days.  The server in this case was a
   IIS server version 6.0.  The size of the requested page was 1.09M (a
   pdf file).  The client had received 200K worth of data and the rest
   of the data was either queued on the send queue or in application.

   As can be seen from the experimentation the behavior of TCP varied
   greatly between different sites.  Site A appears to implement a User
   Time Out (UTO) or application timeout on their connections.  That
   allowed it to clear the connections.  However, once it was known what
   the fixed timeout was, it was easy to modify the client program to
   open another set of connections after the timeout.  We discuss the
   role of application and the use of UTO in a later section.  It was
   difficult to establish how much data was sitting on the send queue of
   each one of these public servers as that depends on send socket
   buffer size and how much data was written by the application.



Jethanandani & Bashyam   Expires April 19, 2008                 [Page 5]


Internet-Draft     TCP Robustness in Persist Condition      October 2007


   Please note that it is not required for the client to issue a request
   for a large page or for the server to open its window completely to
   reproduce the DoS scenario.  A page size larger than the advertised
   window size is enough.  We decided to do it with a larger response
   because it enabled us to reproduce the problem with fewer number of
   connections and client machines.

   Persist condition clearly has a more significant impact on servers
   that deal with a large number of connections (e.g. 200-300K
   connections), than on end workstations that might deal with a few
   connections at a time.  This is because the server has a finite
   number of buffers for a larger pool of connections.  With dynamic
   allocation of buffers, each connection is given resources as it needs
   them.  A high water mark set on each connection prevents the number
   of enqueued buffers exceeding that mark till such time that the
   number of buffers fall below a low water mark.  However, that in
   itself does not solve the problem as the high water mark is more than
   the advertised window size.


3.  Solution

   The current behavior of the connection in persist condition SHALL
   continue to exist as the default behavior.  The solution proposed
   will control the amount of time a TCP sender will spend in persist
   condition waiting for receiver to open its window.  Outlined are some
   of the ways that this can be achieved.  Default values are suggested
   values and the implementor is free to choose their own value.

   If the administrator of the system decides to use the proposed
   solution, they will need to enable it explicitly.  Optionally, the
   administrator can configure a minimum and maximum threshold values
   for connections and buffer resources for the total pool.  Default
   values of 60 and 80% of the total pool for minimum and maximum
   respectively are assumed.

   While implementing the solution it is important to make sure that
   legitimate and well behaved receivers are not penalized for offering
   zero or reduced window.  Hence the solution needs to be robust.  It
   is also important that the solution be adaptive.  While resources are
   plenty, connections are allowed to spend more time in persist
   condition.  However, as resources become scarce the connections are
   aborted sooner.

   A fixed timeout value is not a effective solution.  Malicious clients
   can discover the timeout value and can (re)launch an attack after the
   fixed timeout period.




Jethanandani & Bashyam   Expires April 19, 2008                 [Page 6]


Internet-Draft     TCP Robustness in Persist Condition      October 2007


   If the solution is enabled, the global persist-condition-expiry -time
   value will be set to infinity (or a very large value).  Thereafter it
   will adapted based on system resources availability.  The persist-
   condition-expiry-time is bounded above by the default value of 60
   seconds and a minimum value of five seconds (or minimum persist
   timeout).  The administrator has the option to change the default
   value.  To prevent wild fluctuations in this timeout value, the time
   will be recomputed only when resources change by at least 1%.  If the
   total pool of resources is less than minimum threshold, the persist-
   condition-expiry-time value is set to infinity (a very large value).
   If the resource utilization increases to being between minimum and
   maximum, then persist-condition-expirty-time is first set to the
   default value and thereafter decreased additively by two seconds.  If
   resources exceed the maximum, the persist-condition-expiry-time is
   decreased multiplicatively by a factor or two.  If the resource
   utilization starts to decrease then persist-condition-expirty-time is
   increased additively by four seconds.  If the utilization falls below
   minimum, the time is set to infinity.

   The solution focuses on figuring on how to keep track of connections
   in persist condition.  The configured option of persist-condition-
   expiry-time implies how long the connection will be allowed to stay
   in persist condition.  When the connection enters persist condition,
   i.e. the receiver advertises a window of zero, the value of current
   time - now, is saved in the connection entry.  This entry is called
   persist-condition-entry-time.  In addition, the sequence number on
   the connection is stored as persist-condition-sequence-number.
   Thereafter every time the persist timer expires or when an ACK is
   received that continues to advertise zero window, a check is done to
   make sure that the difference between current time and persist-
   condition-entry-time is not more than persist-condition-expiry-time.
   If it is then the connection is aborted and the connection resources
   are reclaimed.

   The receiver's silly window avoidance mechanism will make sure that
   the receiver cannot read a small amount of data and fool the sender
   into taking it out of persist condition.

   For the solution to be robust, it is also important to determine
   which connection among the set of connections in persist condition is
   selected to be terminated.  To implement this effectively, we
   maintain two priority queues of connections in persist condition, one
   based on the amount of data in the send queue and another based on
   the persist-condition-entry-time, i.e. when the connection entered
   persist condition.

   Whenever a buffer resource is required and the resource utilization
   is more than the maximum, the connection with the highest amount of



Jethanandani & Bashyam   Expires April 19, 2008                 [Page 7]


Internet-Draft     TCP Robustness in Persist Condition      October 2007


   data in the send queue is dropped, and its buffers recycled.
   Whenever a connection resource is required and the connection
   utilization is higher than the maximum, the connection with the
   oldest persist-condition-entry-time is selected and dropped.  This
   achieves fairness by penalizing the connection which are consuming
   the most resources.


4.  Role of Application

   Applications are agnostic to why TCP connections are not making
   progress in terms of data transmission.  TCP connections may not be
   able to transmit data for a variety of reasons.  Today TCP does not
   provide an indication of the progress of the connection explicitly.
   It is up to the application to conclude based on an examination of
   the send queue backlog or implement a UTO as defined in RFC 793
   [RFC0793].  A lot of commonly used applications do not implement the
   UTO scheme, e.g.  World Wide Web (WWW).  Even if the application did
   implement a UTO scheme, all applications running the system need to
   have implemented the UTO for the solution to be effective.  A single
   application that has not implemented the UTO can cause the entire
   system to be impacted negatively.

   There are cases where the system is application agnostic.  A classic
   case of this is a TCP proxy.  In that particular case, there is no
   end application that can be informed of the state of the connection
   for the application to take action.

   Resources like TCP buffers are system wide resources and are not tied
   to any particular application.  TCP needs to be able to monitor
   resource usage system wide when connections are in persist condition.
   The application does not have the connection's sender state knowledge
   to implement a robust and adaptive solution such as the one outlined
   here.

   Applications can assist TCP's role in solving this problem.  They can
   register for an event notification when the TCP connection enters or
   exits persist condition.  They can use the notification mechanism to
   implement their own scheme of deciding which persist connections to
   clear.  They can also suggest timeout or retry values to TCP.


5.  IANA Considerations

   This document makes no request of IANA.






Jethanandani & Bashyam   Expires April 19, 2008                 [Page 8]


Internet-Draft     TCP Robustness in Persist Condition      October 2007


6.  Security Considerations

   This document discusses one security consideration.  That is the
   possible DoS attacks discussed in Section 2.


7.  Acknowledgements

   Thanks to Anantha Ramaiah who spent countless hours reviewing,
   commenting and proposing changes to the draft.  Ted Faber helped us
   in clarifying the objective of this RFC.  Thanks also to Fred Baker
   and Elliot Lear for providing their feedback on the draft.

   Our thanks to Nanda Bhajana who helped arrange the test setup to be
   able to reproduce the DoS scenario.


8.  References

8.1.  Normative References

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2.  Informative References


Appendix A.  An Appendix

















Jethanandani & Bashyam   Expires April 19, 2008                 [Page 9]


Internet-Draft     TCP Robustness in Persist Condition      October 2007


Authors' Addresses

   Mahesh Jethanandani
   Cisco Systems
   170 West Tasman Drive
   San Jose, California  95134
   USA

   Phone: +1-408-527-8230
   Fax:   +1-408-527-0147
   Email: mahesh@cisco.com
   URI:   www.cisco.com


   Murali Bashyam
   Ocarina Systems, Inc
   Fremont, CA
   USA

   Phone:
   Fax:
   Email: mbashyam@ocarinatech.com
   URI:




























Jethanandani & Bashyam   Expires April 19, 2008                [Page 10]


Internet-Draft     TCP Robustness in Persist Condition      October 2007


Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).





Jethanandani & Bashyam   Expires April 19, 2008                [Page 11]