TSVWG                                                          Y. Zhuang
Internet-Draft                                                  B. Zhang
Intended status: Informational                                    H. Pan
Expires: April 20, 2020                    Huawei Technologies Co., Ltd.
                                                        October 18, 2019

  Artificial Intelligence (AI) based ECN adaptive reconfiguration for
                          datacenter networks
                  draft-zhuang-tsvwg-ai-ecn-for-dcn-00

Abstract

   This document describes an artificial intelligence (AI) based method
   for adaptively reconfiguring Explicit Congestion Notification (ECN)
   parameters in datacenter networks.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 20, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Zhuang, et al.           Expires April 20, 2020                 [Page 1]
Internet-Draft       AI ECN adaptive reconfiguration        October 2019

Table of Contents

   1.  Introduction
     1.1.  Background
     1.2.  Intent
     1.3.  Terminology
   2.  Architecture of the AI ECN datacenter networks
   3.  Scene-based ECN adaptive reconfiguration with AI
     3.1.  Scene Training
     3.2.  Scene Identification and ECN Adaptive Reconfiguration
   4.  Data collection and AI ECN adaptive reconfiguration
     4.1.  Data collection
     4.2.  ECN adaptive Reconfiguration
   5.  Security Considerations
   6.  Manageability Consideration
   7.  IANA Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Acknowledgements
   Authors' Addresses

1.  Introduction

1.1.  Background

   Explicit Congestion Notification (ECN), defined for IP in [RFC3168],
   allows congestion to be signaled before packets are dropped.  This
   reduces application latency because fewer dropped packets need to be
   retransmitted.  MPLS also supports explicit congestion marking (see
   RFC 5129).  For tunneling, [RFC6040] defines how the ECN field is
   constructed in the case of IP-in-IP tunnels.

   Meanwhile, upper-layer transport protocols such as TCP [RFC3168],
   DCCP [RFC4341][RFC4342][RFC5632], and RTP over UDP [RFC6679] have
   been specified with ECN support.

   With ECN marking, active queue management (AQM) can indicate
   congestion on a device without packet loss, rather than by dropping
   packets, which triggers retransmission and increases latency.  AQM in
   network devices signals congestion-controlled transports so that they
   limit the queue length in the buffer and thereby reduce traffic
   latency.  Random Early Detection (RED), described in [RFC2309], is
   one of the AQM algorithms recommended for implementation in routers.
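
   To illustrate how RED-style ECN marking depends on its thresholds,
   the following sketch computes a marking probability from the average
   queue length.  It is a simplified illustration only: the function
   name, the parameter names (min_th, max_th, max_p), and the example
   values are assumptions made here and are not defined by this
   document.

```python
def red_mark_probability(avg_queue, min_th, max_th, max_p):
    """Probability of ECN-marking a packet (simplified RED curve).

    Below min_th nothing is marked; between min_th and max_th the
    probability rises linearly from 0 to max_p; at or above max_th
    every packet is marked.
    """
    if not 0 <= min_th < max_th:
        raise ValueError("require 0 <= min_th < max_th")
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Hypothetical thresholds in kilobytes, chosen only for illustration.
for queue_kb in (10, 50, 90):
    print(queue_kb, red_mark_probability(queue_kb, 20, 80, 0.1))
```

   The two thresholds and the maximum probability are the values that
   an adaptive scheme would need to tune for each traffic pattern.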

   As stated in [RFC7567], RED can be an effective algorithm when its
   parameters are chosen properly.  However, dynamically predicting the
   set of parameters (minimum threshold and maximum threshold) is
   difficult.  As a result, its present use in the Internet is limited.
   Other AQM algorithms have also been developed, but finding proper
   parameters for given application traffic remains difficult, and
   poorly chosen parameters degrade network performance.

   In data center networks, traffic patterns change as applications
   such as storage and high performance computing are deployed and as
   their traffic varies, which makes the network more dynamic.  At the
   same time, such applications have stricter requirements for high
   throughput and ultra-low latency.  In this environment, finding a
   single static ECN configuration that suits all traffic at all times
   is challenging.

   This document therefore describes a way to adaptively reconfigure
   ECN parameters in a running data center network by using AI
   technologies.

1.2.  Intent

   Our intent is to find proper ECN parameters through adaptive
   reconfiguration based on artificial intelligence technologies, so
   that a running data center network tunes itself, accommodates
   changes in network resources, and improves its performance.

   We also offer this as a starting point for using AI technologies to
   find adaptive parameters for other algorithms and network
   reconfigurations.  We do not change the way ECN works as defined in
   [RFC3168]; this document only describes how ECN parameters can be
   adaptively reconfigured in a dynamic data center network
   environment.

1.3.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Architecture of the AI ECN datacenter networks

   The following is a simple two-layer data center network architecture
   with an analyzer that performs AI ECN adaptive reconfiguration as
   network traffic changes.

     +------------------------------------------------------+
     |                     Analyzer                         |
     +-.-----.-------------.-------.--------------.-----.---+
       .     .             .       .              .     .
       .     .             .       .              .     .
       . +---.-----------+ .       .  +-----------.---+ .
       . |     Spine     | .       .  |     Spine     | .
       . ++--+--+----+---+ .       .  +-+-+-+----+----+ .
       .  |  |  +----------.-------.---------------+    .
       .  |  +-------------.-------.-+  | | |    | |    .
       .  |          |  +--.-------.--------+    | |    .
       .  |  +-------------.-------.------+      | |    .
      +---+--+-+    ++--+--.-+    +.-+--+--+    ++-+----.+
      |        |    |        |    |        |    |        |
      |  Leaf  |    |  Leaf  |    |  Leaf  |    |  Leaf  |
      ++------++    ++------++    ++------++    ++------++
       |      |      |      |      |      |      |      |
       |      |      |      |      |      |      |      |
      +++    +++    +++    +++    +++    +++    +++    +++
      |S| ...|S|    |S| ...|S|    |S| ...|S|    |S| ...|S|
      +-+    +-+    +-+    +-+    +-+    +-+    +-+    +-+

      ........  information collecting path

      --------  data path

      Figure 1. The architecture of a 2-layer data center network

   The analyzer can be integrated with a spine switch or deployed as an
   independent device; the choice is left to the implementation.  In
   this design, the analyzer periodically collects device information
   and infers proper parameters for ECN adaptive reconfiguration.

3.  Scene-based ECN adaptive reconfiguration with AI

   The idea of AI ECN in this document is to identify the "scene" of the
   current network at a given time, based on information collected over
   a period.  A scene can also be regarded as a network traffic
   pattern.  The identified scene is one of a set of scenes that were
   collected and learned, during a training process, from datacenter
   networks running traffic of various applications.  The ECN settings
   of these scenes are decided based on human experience.  The ECN
   parameters of the current network can then be tuned to the settings
   of the identified scene.  This adaptive reconfiguration process runs
   periodically to accommodate changes in the running network
   environment caused by traffic changes.

3.1.  Scene Training

   Scene training is the first process in the procedure.  It consists of
   two steps.  First, construct typical scenes and generate a learning
   model that identifies these scenes based on a set of network
   performance indicators.  Second, provide proper ECN settings for
   these typical scenes based on human experience.

   In the first step, the network operator might need to select, based
   on experience, some typical applications and combinations of traffic
   to be used as the typical training scenes.  For these typical
   scenes, a learning algorithm (for example, a neural network) is run
   to learn the characteristics of the scenes from periodically
   collected network performance indicators.

   The selected network performance indicators can be, for example, a
   device's port bandwidth and queue size, which might be related to
   the applications and traffic in the network.
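
   The first training step can be sketched as follows.  This is a
   minimal illustration under stated assumptions, not the method of
   this document: a nearest-centroid classifier stands in for the
   learning algorithm (the text above mentions a neural network as one
   option), each sample is a hypothetical pair of normalized indicators
   (port bandwidth utilization, queue occupancy), and the scene labels
   "storage" and "hpc" are invented for the example.

```python
import math
from collections import defaultdict


def train_scene_model(samples):
    """Learn one centroid per scene from (indicators, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for vector, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        for i, value in enumerate(vector):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in totals]
        for label, totals in sums.items()
    }


def identify_scene(model, vector):
    """Return the scene whose centroid is closest to the indicators."""
    return min(model, key=lambda label: math.dist(model[label], vector))


# Hypothetical training data: (bandwidth utilization, queue occupancy).
training = [
    ((0.9, 0.7), "storage"), ((0.8, 0.6), "storage"),
    ((0.3, 0.1), "hpc"), ((0.2, 0.2), "hpc"),
]
model = train_scene_model(training)
print(identify_scene(model, (0.75, 0.5)))
```

   Any classifier that maps a vector of indicators to a scene label
   could take the place of the centroid model in this sketch.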

   In the second step, the experience of network administrators can be
   used to provide proper ECN configurations for these typical scenes.
   AI technologies can also be used to enrich the scene sets based on
   this human experience, which is left for implementation.

3.2.  Scene Identification and ECN Adaptive Reconfiguration

   In the operational network, the analyzer periodically collects the
   selected network performance indicators from network nodes.  This
   information is used as input to the pre-learnt model, which outputs
   the identified scene.  The ECN settings of network devices are then
   reconfigured to the parameters of the identified scene.  This
   process repeats periodically.

   The length of the adaptive cycle can be decided according to
   experience, or it can be a result of the training process defined in
   Section 3.1.
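
   One adaptive cycle of the procedure in this section might look like
   the sketch below.  It is illustrative only: the function
   adaptive_cycle, its callback parameters, the scene names, and the
   ECN parameter values are all assumptions made for this example.
   The collection, identification, and configuration steps are passed
   in as callables so that the sketch does not depend on any particular
   protocol or model.

```python
def adaptive_cycle(collect, identify, scene_settings, apply_settings,
                   current_scene=None):
    """Run one adaptive cycle; return the scene now in effect."""
    indicators = collect()              # read performance indicators
    scene = identify(indicators)        # query the pre-learnt model
    if scene == current_scene or scene not in scene_settings:
        return current_scene            # nothing to change
    apply_settings(scene_settings[scene])
    return scene


# Hypothetical scenes and ECN parameters (not from this document).
SCENE_SETTINGS = {
    "storage": {"min_th_kb": 100, "max_th_kb": 400, "max_p": 0.2},
    "hpc": {"min_th_kb": 20, "max_th_kb": 80, "max_p": 0.1},
}

applied = []                            # stands in for the devices
readings = iter([(0.9, 0.7), (0.9, 0.6), (0.2, 0.1)])
scene = None
for _ in range(3):                      # three adaptive cycles
    scene = adaptive_cycle(
        collect=lambda: next(readings),
        identify=lambda v: "storage" if v[1] > 0.4 else "hpc",
        scene_settings=SCENE_SETTINGS,
        apply_settings=applied.append,
        current_scene=scene,
    )
print(scene, applied)
```

   In a deployment, the loop would be driven by a timer whose interval
   is the adaptive cycle discussed above.  Skipping reconfiguration
   when the scene is unchanged is a design choice of this sketch.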

4.  Data collection and AI ECN adaptive reconfiguration

4.1.  Data collection

   In both the training and the adaptive reconfiguration processes, the
   analyzer needs to collect information about the network, i.e., a set
   of network performance indicators.

   The data can be collected using gRPC, YANG-Push, or other
   protocols.
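
   Independent of the transport used, the collected readings need to be
   reduced to a set of indicators per collection period.  The sketch
   below shows one possible way to do this.  The IndicatorSample
   schema, its field names, and the chosen summary statistics are
   assumptions for illustration; this document does not define a data
   model for the indicators.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class IndicatorSample:
    """One reading reported by a network node (hypothetical schema)."""
    node: str
    port: str
    bandwidth_util: float   # fraction of port bandwidth in use, 0..1
    queue_bytes: int        # instantaneous queue size


def summarize(samples):
    """Reduce one collection period to per-port summary indicators."""
    grouped = {}
    for sample in samples:
        port_id = (sample.node, sample.port)
        grouped.setdefault(port_id, []).append(sample)
    return {
        port_id: {
            "avg_bandwidth_util": mean(s.bandwidth_util for s in group),
            "max_queue_bytes": max(s.queue_bytes for s in group),
        }
        for port_id, group in grouped.items()
    }


period = [
    IndicatorSample("leaf1", "eth1", 0.25, 1000),
    IndicatorSample("leaf1", "eth1", 0.75, 3000),
    IndicatorSample("leaf2", "eth1", 0.50, 2000),
]
summary = summarize(period)
print(summary[("leaf1", "eth1")])
```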

4.2.  ECN adaptive Reconfiguration

   The adaptive reconfiguration of ECN in a running network environment
   can be achieved by management-plane protocols such as NETCONF.
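
   As a rough illustration, the sketch below builds a NETCONF
   <edit-config> request that carries new ECN thresholds for one queue.
   The <rpc>, <edit-config>, <target>, <running>, and <config> elements
   and their namespace come from the NETCONF base protocol.  Everything
   inside <config> is hypothetical: the namespace
   "urn:example:ecn-config" and the element names are placeholders,
   since this document does not define a data model for ECN settings.
   Devices may also require a datastore other than <running>.

```python
import xml.etree.ElementTree as ET

NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
ECN_NS = "urn:example:ecn-config"       # hypothetical data model


def build_edit_config(queue, min_th, max_th, max_p, message_id="1"):
    """Return an <edit-config> request as an XML string."""
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": message_id})
    edit = ET.SubElement(rpc, f"{{{NETCONF_NS}}}edit-config")
    target = ET.SubElement(edit, f"{{{NETCONF_NS}}}target")
    ET.SubElement(target, f"{{{NETCONF_NS}}}running")
    config = ET.SubElement(edit, f"{{{NETCONF_NS}}}config")
    # Hypothetical payload: one queue with its ECN marking parameters.
    ecn = ET.SubElement(config, f"{{{ECN_NS}}}ecn")
    entry = ET.SubElement(ecn, f"{{{ECN_NS}}}queue")
    fields = (
        ("name", queue),
        ("min-threshold", min_th),
        ("max-threshold", max_th),
        ("max-probability", max_p),
    )
    for tag, value in fields:
        ET.SubElement(entry, f"{{{ECN_NS}}}{tag}").text = str(value)
    return ET.tostring(rpc, encoding="unicode")


request = build_edit_config("q3", 100, 400, 0.2)
print(request)
```

   Sending the request requires a NETCONF session to the device, which
   is outside the scope of this sketch.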

5.  Security Considerations

   TBD

6.  Manageability Consideration

   TBD

7.  IANA Considerations

   This document has no IANA actions.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

8.2.  Informative References

   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
              S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
              Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
              S., Wroclawski, J., and L. Zhang, "Recommendations on
              Queue Management and Congestion Avoidance in the
              Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998,
              <https://www.rfc-editor.org/info/rfc2309>.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001,
              <https://www.rfc-editor.org/info/rfc3168>.

   [RFC4341]  Floyd, S. and E. Kohler, "Profile for Datagram Congestion
              Control Protocol (DCCP) Congestion Control ID 2: TCP-like
              Congestion Control", RFC 4341, DOI 10.17487/RFC4341, March
              2006, <https://www.rfc-editor.org/info/rfc4341>.

   [RFC4342]  Floyd, S., Kohler, E., and J. Padhye, "Profile for
              Datagram Congestion Control Protocol (DCCP) Congestion
              Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342,
              DOI 10.17487/RFC4342, March 2006,
              <https://www.rfc-editor.org/info/rfc4342>.

   [RFC5632]  Griffiths, C., Livingood, J., Popkin, L., Woundy, R., and
              Y. Yang, "Comcast's ISP Experiences in a Proactive Network
              Provider Participation for P2P (P4P) Technical Trial",
              RFC 5632, DOI 10.17487/RFC5632, September 2009,
              <https://www.rfc-editor.org/info/rfc5632>.

   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
              Notification", RFC 6040, DOI 10.17487/RFC6040, November
              2010, <https://www.rfc-editor.org/info/rfc6040>.

   [RFC6679]  Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P.,
              and K. Carlberg, "Explicit Congestion Notification (ECN)
              for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August
              2012, <https://www.rfc-editor.org/info/rfc6679>.

   [RFC7567]  Baker, F., Ed. and G. Fairhurst, Ed., "IETF
              Recommendations Regarding Active Queue Management",
              BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015,
              <https://www.rfc-editor.org/info/rfc7567>.

Acknowledgements

   We would like to thank the following persons for their great efforts
   and contributions to the work: Huafeng Wen, Binghui Wu, Weiqin Kong,
   Ke Meng, Xitong Jia, Liang Shan, Siyu Yan, Weishan Deng, Boding Wang,
   Jungan Yan, Haonan Ye and Liang Zhang.

Authors' Addresses

   Yan Zhuang
   Huawei Technologies Co., Ltd.

   Email: zhuangyan.zhuang@huawei.com

   Bai Zhang
   Huawei Technologies Co., Ltd.

   Email: white.zhangbai@huawei.com

   Haotao Pan
   Huawei Technologies Co., Ltd.

   Email: panhaotao@huawei.com
