Reinforcement Learning-Based Virtual Network Embedding: Problem Statement
draft-ihsan-nmrg-rl-vne-ps-00

Document Type Active Internet-Draft (individual)
Authors Ihsan Ullah, Youn-Hee Han, TaeYeon Kim
Last updated 2021-06-14
On Agenda nmrg at IETF-111
Internet Engineering Task Force                                 I. Ullah
Internet-Draft                                                  Y-H. Han
Intended status: Informational                                 KOREATECH
Expires: 16 December 2021                                        TY. Kim
                                                                    ETRI
                                                            14 June 2021

    Reinforcement Learning-Based Virtual Network Embedding: Problem
                               Statement
                     draft-ihsan-nmrg-rl-vne-ps-00

Abstract

   In network virtualization (NV) technology, Virtual Network Embedding
   (VNE) is the problem of mapping a virtual network onto the substrate
   network.  It has a great impact on the performance of virtual
   networks and on the resource utilization of the substrate network.
   An efficient embedding strategy can maximize the acceptance ratio of
   virtual networks and thereby increase the revenue of Internet
   service providers.  Several works have appeared on the design of VNE
   solutions; however, VNE remains a challenging issue for researchers.
   Reinforcement learning (RL) can play a vital role in making VNE
   solutions more intelligent and efficient.  Moreover, RL has been
   combined with deep learning techniques to develop adaptive models
   with effective strategies for various complex problems.  In RL, an
   agent learns desired behaviors (e.g., optimal VNE strategies), and
   once training is complete, it can embed a virtual network onto the
   substrate network quickly and efficiently.  RL can reduce the
   complexity of the VNE method; however, it is difficult to apply RL
   techniques directly to VNE problems, and more research is needed.
   This document presents a problem statement to motivate the research
   community to solve the VNE problem using reinforcement learning.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

Ullah, et al.           Expires 16 December 2021                [Page 1]
Internet-Draft     ML-based Virtual Network Embedding          June 2021

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 16 December 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction and Scope
   2.  Reinforcement Learning-based VNE Solutions
   3.  Terminology
   4.  Problem Space
     4.1.  State Representation
     4.2.  Action Space
     4.3.  Reward Description
     4.4.  Policy and RL methods
     4.5.  Training Environment
     4.6.  Sim2Real Gap
     4.7.  Generalization
   5.  IANA Considerations
   6.  Security Considerations
   7.  Informative References
   Authors' Addresses

1.  Introduction and Scope

   Recently, network virtualization (NV) technology has received a lot
   of attention from academia and industry.  It allows multiple
   heterogeneous virtual networks (VNs) to share resources on the same
   substrate network (SN) [RFC7364], [ASNVT2020].  The current large,
   fixed substrate network architecture is no longer efficient and is
   not extensible due to network ossification.  To overcome these
   limitations, traditional Internet Service Providers (ISPs) are
   divided into two independent parts that work together.  One is the
   Service Providers (SPs), who create and own a number of VNs, and the
   other is the Infrastructure Providers (InPs), who own the SN devices
   and links as underlying resources.  SPs construct customized Virtual
   Network Requests (VNRs) and lease resources from InPs based on those
   requests.  In addition, two types of mediators can enter the
   industry domain for better coordination between SPs and InPs.  One
   is the Virtual Network Providers (VNPs), who assemble and coordinate
   diverse virtual resources from one or more InPs; the other is the
   Virtual Network Operators (VNOs), who create, manage, and operate
   VNs according to the demands of the SPs.  VNPs and VNOs can enable
   efficient use of the physical network and increase the commercial
   revenue of both SPs and InPs.  NV can increase network agility,
   flexibility, and scalability while creating significant cost
   savings.  Greater network workload mobility, increased availability
   of network resources with good performance, and automated operations
   are all benefits of NV.

   Virtual Network Embedding (VNE) [VNESURV2013] is the problem of
   mapping a virtual network onto the substrate network, and it is one
   of the main problems in NV.  A VNE method has two main parts: node
   embedding, in which the virtual nodes of a VN are mapped to SN
   nodes, and link embedding, in which the virtual links between
   virtual nodes are mapped to physical paths in the substrate network.
   The VNE problem has been proven to be NP-hard, and both node and
   link embedding remain challenging for researchers.  Virtual nodes
   and links should be embedded efficiently into a given SN so that
   more VNRs can be accepted at minimum cost.  A large distance between
   the substrate nodes that host the virtual nodes of a VNR contributes
   significantly to link embedding failures and causes the rejection of
   VNRs.  Hence, an efficient and intelligent technique is required for
   the VNE problem to reduce VNR rejections [ENViNE2021].  From the
   perspective of the InPs, an efficient VNE method performs better
   mainly in terms of revenue, acceptance ratio, and revenue-to-cost
   ratio.

   Figure 1 shows an example in which two virtual network requests,
   VNR1 and VNR2, are embedded in a given substrate network.  VNR1
   contains three virtual nodes (a, b, and c) with CPU demands of 15,
   30, and 10, respectively, and the virtual links a-b, b-c, and c-a
   with bandwidth demands of 15, 20, and 35, respectively.  Similarly,
   VNR2 contains virtual nodes and links with their own CPU and
   bandwidth demands.  The purpose of a VNE method is to map the
   virtual nodes and links of the VNRs to the physical nodes and links
   of the given substrate network, as shown in Figure 1 [ENViNE2021].

           +----+                +----+         +----+          +----+
           | a  |                | d  |         | e  |          | f  |
           | 15 |                | 25 |__ _25___| 30 |__ _35_ __| 45 |
           +----+                +----+         +----+          +----+
          /      \                \                                 /
        15        35               30                              20
        /          \                \                             /
  +----+            +----+           +----+                 +----+
  | b  |            | c  |           | g  |                 | h  |
  | 30 |__ _20_ __ _| 10 |           | 15 |__ _ __10__ __ __| 35 |
  +----+            +----+           +----+                 +----+

           (VNR1)                                 (VNR2)
             ||   Embedding                         ||    Embedding
             VV                                     VV

        +----+              +----+       +----+                  +----+
 .......| a  |......35......| c  |       | d  |........25........| e  |
:  _____| 15 |              | 10 |_______| 25 |          ________| 30 |
: |     +----+              +----+       +----+         |        +----+
: |   A      |                | :   B      | :          |   C      |  :
: |   50     |__ ___50__ __ __| :   60     |_:_ __30 _ _|   40     |  :
: +__________+                +_:_________+  :          +__________+  :
:      |                        :     |      :                |       :
15     |                        :     |      :                |      35
:     40                       20     60     :               50       :
:      |                        :     |     30                |       :
:      |                       _:_____|_     :                |       :
+----:..............20........|.:       |    :                |   +----+
| b  | |   +----+.....30......|.........|....:                |   | f  |
| 30 |_|___| g  |             |       +----+                __|___| 45 |
+----+     | 15 |.....10......|.......| h  |........20.....|......+----+
 |   D     +____+             |    E  | 35 |               |     F    |
 |   50     |__ __ __ 70 _____|    40 +____+ ___ __ 50_ ___|     60   |
 +__________+                 +_________+                  +__________+

   Figure 1: Substrate network with embedded virtual networks VNR1
                               and VNR2
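
   The node and link constraints behind such an embedding can be
   sketched in a few lines of Python.  This is only an illustration:
   the Graph class, the is_feasible helper, and the three-node
   substrate are assumptions made for this example (the substrate is
   not the one drawn in Figure 1); only the demands of VNR1 are taken
   from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Undirected graph: a CPU value per node, a bandwidth value per link."""
    cpu: dict                                 # node -> CPU (demand or capacity)
    bw: dict = field(default_factory=dict)    # frozenset({u, v}) -> bandwidth

    def add_link(self, u, v, bandwidth):
        self.bw[frozenset((u, v))] = bandwidth


# VNR1 with the demands given in the text for Figure 1.
vnr1 = Graph(cpu={"a": 15, "b": 30, "c": 10})
vnr1.add_link("a", "b", 15)
vnr1.add_link("b", "c", 20)
vnr1.add_link("c", "a", 35)

# Hypothetical substrate (illustrative capacities, not read from Figure 1).
sn = Graph(cpu={"A": 50, "B": 60, "D": 50})
sn.add_link("A", "B", 50)
sn.add_link("A", "D", 40)
sn.add_link("B", "D", 60)


def is_feasible(vnr, sn, node_map, link_map):
    """Check the CPU and bandwidth constraints of a candidate embedding.

    node_map: virtual node -> substrate node
    link_map: virtual link (frozenset) -> substrate path (list of nodes)
    """
    cpu_used = {}
    for v, s in node_map.items():
        cpu_used[s] = cpu_used.get(s, 0) + vnr.cpu[v]
    if any(cpu_used[s] > sn.cpu[s] for s in cpu_used):
        return False

    bw_used = {}
    for vlink, path in link_map.items():
        # The path must join the hosts of the virtual link's two endpoints.
        if {path[0], path[-1]} != {node_map[v] for v in vlink}:
            return False
        for u, w in zip(path, path[1:]):
            edge = frozenset((u, w))
            if edge not in sn.bw:
                return False
            bw_used[edge] = bw_used.get(edge, 0) + vnr.bw[vlink]
    return all(bw_used[e] <= sn.bw[e] for e in bw_used)


node_map = {"a": "A", "b": "D", "c": "B"}
link_map = {
    frozenset(("a", "b")): ["A", "D"],
    frozenset(("b", "c")): ["D", "B"],
    frozenset(("c", "a")): ["B", "A"],
}
print(is_feasible(vnr1, sn, node_map, link_map))
```

   A VNR is accepted only if such a check succeeds for every virtual
   node and link; otherwise it is rejected.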

   Recently, artificial intelligence and machine learning technologies
   have been widely used to solve networking problems [SUR2018],
   [MLCNM2018], [MVNNML2021].  There has been a surge in research
   efforts, especially on reinforcement learning (RL), which has
   contributed to many complex tasks, e.g., video games and autonomous
   driving.  The main goal of RL is to learn better policies for
   sequential decision-making problems (e.g., VNE) and to solve them
   efficiently.  Several works have appeared on the design of VNE
   solutions using RL, which focuses on how to interact with the
   environment to achieve the maximum cumulative return [VNEQS2021],
   [NRRL2020], [MVNE2020], [CDVNE2020], [PPRL2020], [RLVNEWSN2020],
   [QLDC2019], [VNFFG2020], [VNEGCN2020], [NFVDeep2019], [DeepViNE2019],
   [VNETD2019], [RDAM2018], [MOQL2018], [ZTORCH2018], [NeuroViNE2018],
   [QVNE2020].  This document outlines the problems encountered when
   designing and applying RL-based VNE solutions.  Section 2 describes
   how to design RL-based VNE solutions.  Section 3 gives terminology,
   and Section 4 describes the problem space in detail.

2.  Reinforcement Learning-based VNE Solutions

   As discussed above, RL has been studied in various fields (such as
   games, control systems, operations research, information theory,
   multi-agent systems, and network systems) and, in some tasks, has
   shown better performance than humans.  Unlike supervised learning,
   RL trains a policy model from rewards received through interaction
   with the environment, without labeled training data.

   Recently, there have been several attempts to solve VNE problems
   using RL.  When RL-based methods are applied to VNE problems, the RL
   agent learns automatically, without human intervention, through
   interaction with the environment.  Once the agent has completed the
   learning process, it can generate the most appropriate embedding
   decision (action) based on the state of the network.  For each
   embedding action, the agent receives a reward from the environment
   and uses it to adaptively train its policy for future actions.  The
   RL agent obtains an optimized model based on the reward function
   defined according to each objective (revenue, cost, revenue-to-cost
   ratio, and acceptance ratio).  The resulting RL policy model
   provides a VNE strategy suited to the objective of the network
   operator.  However, designing and applying RL techniques directly to
   VNE problems is not trivial and faces several challenges.  This
   document describes these problems.

   Figure 2 shows an RL-based VNE solution in detail.  The solution is
   divided into a training process and an inference process.  In the
   training process, state information is composed from various
   substrate networks and VNRs (the environment) and is converted,
   through feature extraction, into suitable inputs for the RL model.
   The model updater then updates the RL model using the extracted
   features of the state and the reward.  In the inference process, the
   trained RL model provides embedding results to the operating network
   in real time.

  RL Model Training Process
  +--------------------------------------------------------------------+
  | Training Environment                                               |
  | +-------------------+         RL-based VNE Agent                   |
  | | +---------+       |         +----------------------------------+ |
  | | | +---------+     |         |                   Action         | |
  | | | | +----------+  |<----------------------------------+        | |
  | | + | | Substrate|  |         |                         |        | |
  | |   | | Networks |  |         |  +----------+      +----------+  | |
  | |   + +----------+  |  State  |  | Feature  |      |    RL    |  | |
  | |                   |----------->|Extraction|----->|   Model  |  | |
  | | +--------+        |         |  +----------+      | (Policy) |  | |
  | | | +---------+     |         |       |            +----------+  | |
  | | + | +---------+   |         |       |   +---------+     A      | |
  | |   + |  VNRs   |   | Reward  |       +-->|  Model  |     |      | |
  | |     +---------+   |-------------------->| Updater |-----+      | |
  | +-------------------+         |           +---------+            | |
  |                               +----------------------------------+ |
  +--------------------------------------------------------------------+
                                    |
  Inference Process                 |
  +---------------------------------V----------------------------------+
  |                         + - - - - - - - +                          |
  | Operating Network       |   RL Model    |    Trained RL Model      |
  | (Inference Environment) |   Training    |------------------+       |
  | +-------------------+   |   Process     |                  |       |
  | |   +-----------+   |   + - - - - - - - +                  |       |
  | |   |           |   |         RL-based VNE Agent           |       |
  | |   | Substrate |   |         +----------------------------|-----+ |
  | |   |  Network  |   |         |                   Action   |     | |
  | |   |           |   |<---------------------------------+   |     | |
  | |   +-----------+   |         |                        |   V     | |
  | | +---------+       |         |  +------------+     +---------+  | |
  | | | +---------+     | State   |  |  Feature   |     | Trained |  | |
  | | + | +----------+  |----------->| Extraction |---->|   RL    |  | |
  | |   + |   VNRs   |  |         |  +------------+     |  Model  |  | |
  | |     +----------+  |         |                     +---------+  | |
  | +-------------------+         +----------------------------------+ |
  +--------------------------------------------------------------------+

            Figure 2: Two processes for RL-based VNE solutions
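
   The two processes of Figure 2 can be illustrated with a deliberately
   tiny example.  The sketch below is not a VNE solver: ToyVNEEnv is a
   hypothetical one-step environment (place one virtual node, reward +1
   if its CPU demand fits and -1 otherwise), and the "RL model" is a
   plain table updated with an epsilon-greedy rule.  All names and
   parameter values are assumptions made for illustration.

```python
import random


class ToyVNEEnv:
    """Hypothetical one-step environment: place a single virtual node."""

    def __init__(self, capacities, demands, seed=0):
        self.capacities = capacities    # CPU capacity of each substrate node
        self.demands = demands          # possible CPU demands of a request
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        # State: index of the CPU demand of the arriving request.
        self.state = self.rng.randrange(len(self.demands))
        return self.state

    def step(self, action):
        # Reward: +1 if the chosen substrate node can host the demand.
        fits = self.demands[self.state] <= self.capacities[action]
        return 1.0 if fits else -1.0


def train(env, episodes=2000, epsilon=0.1, lr=0.1, seed=1):
    """Training process: interact, receive rewards, update the model."""
    rng = random.Random(seed)
    n_states, n_actions = len(env.demands), len(env.capacities)
    q = [[0.0] * n_actions for _ in range(n_states)]    # the "RL model"
    for _ in range(episodes):
        s = env.reset()
        if rng.random() < epsilon:                      # explore
            a = rng.randrange(n_actions)
        else:                                           # exploit
            a = max(range(n_actions), key=lambda i: q[s][i])
        r = env.step(a)
        q[s][a] += lr * (r - q[s][a])                   # model updater
    return q


def infer(q, state):
    """Inference process: the trained model is only read, never updated."""
    return max(range(len(q[state])), key=lambda i: q[state][i])


env = ToyVNEEnv(capacities=[10, 40], demands=[5, 30])
model = train(env)
print(infer(model, state=1))    # substrate node chosen for demand 30
```

   In a realistic setting the table would be replaced by a learned
   policy model and the state by extracted features, but the split
   between a model that is updated (training) and a model that is only
   queried (inference) stays the same.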

3.  Terminology

   Network Virtualization
      Network virtualization is the process of combining hardware and
      software network resources and network functionality into a
      single, software-based administrative entity, a virtual network
      [RFC7364].

   Virtual Network Embedding (VNE)
      Virtual Network Embedding (VNE) [VNESURV2013] is one of the main
      techniques used to map a virtual network to the substrate network.

   Substrate Network (SN)
      The underlying physical network which contains the resources such
      as CPU and bandwidth for virtual networks is called substrate
      network.

   Virtual Network Request (VNR)
      A Virtual Network Request is a single, complete request for a
      virtual network, containing virtual nodes and virtual links.

   Agent
      In RL, an agent is the component that makes decisions and takes
      actions (i.e., embedding decisions).

   State
      State is a representation (e.g., remaining SN capacity and
      requested VN resource) of the current environment, and it tells
      the agent what situation it is in currently.

   Action
      Actions (i.e., node and link embeddings) are the behaviors an RL
      agent can perform to change the state of the environment.

   Policy
      A policy defines an agent's way of behaving at a given time.  It
      is a mapping from perceived states of environment to actions to be
      taken when in those states.  It is usually implemented as a deep
      learning model because the state and action spaces are too large
      to be completely known.

   Reward
      A reward is the feedback that the environment provides to the
      agent for taking actions that lead to good outcomes (e.g.,
      achieving the objective of the network operator).

   Environment
      An environment is the agent's world in which it lives and
      interacts.  The agent can interact with the environment by
      performing some action but cannot influence the rules of the
      environment by those actions.

4.  Problem Space

   RL contains three main components: state representation, action
   space, and reward description.  For solving a VNE problem, we need to
   consider how to design the three main RL components.  In addition, a
   specific RL method, training environment, sim2real gap, and
   generalization are also important issues that should be considered
   and addressed.  We will describe each one in detail as follows.

4.1.  State Representation

   How the VNE problem is observed is crucial for an RL agent to
   establish a thorough knowledge of the network status and to generate
   efficient embedding decisions.  Therefore, it is essential first to
   design the state representation that serves as the input to the
   agent.  The state representation is the information that an agent
   receives from the environment, and it consists of a set of values
   representing the current situation in the environment.  Based on the
   state representation, the RL agent selects the most appropriate
   action through its policy model.  In the VNE problem, an RL agent
   needs to know the overall SN entities and their current status in
   order to use the resources of the nodes and edges of the substrate
   network.  It must also know the requirements of the VNR.

   Therefore, in the VNE problem, the state should usually represent
   the current resource status of the nodes and edges of the substrate
   network (e.g., CPU, memory, storage, bandwidth, delay, and loss
   rate) and the requirements of the virtual nodes and links of the
   VNR.  The collected status information is either used directly as
   raw input or first refined through a feature extraction process and
   then used as input for the RL agent.  The state representation may
   vary depending on the operator's objective and VNE strategy.  The
   choice of feature extraction and representation greatly affects the
   performance of the agent.
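
   As a minimal sketch of such a feature extraction step, the function
   below turns raw status information into a flat, normalized state
   vector.  The chosen features (remaining CPU ratio and remaining
   adjacent-bandwidth ratio per substrate node, plus the normalized
   demands of the virtual node to be placed) are assumptions made for
   illustration; other feature sets are equally possible.

```python
def extract_state(cpu_free, cpu_cap, adj_bw_free, adj_bw_cap, v_cpu, v_bw):
    """Build a state vector for placing one virtual node.

    cpu_free / cpu_cap:       substrate node -> remaining / total CPU
    adj_bw_free / adj_bw_cap: substrate node -> remaining / total bandwidth
                              summed over the node's adjacent links
    v_cpu: CPU demand of the virtual node to place
    v_bw:  sum of the bandwidth demands of its virtual links
    """
    nodes = sorted(cpu_cap)               # fixed order -> fixed vector layout
    max_cpu = max(cpu_cap.values())
    max_bw = max(adj_bw_cap.values())
    state = []
    for n in nodes:
        state.append(cpu_free[n] / cpu_cap[n])          # remaining CPU ratio
        state.append(adj_bw_free[n] / adj_bw_cap[n])    # remaining BW ratio
    state.append(v_cpu / max_cpu)         # normalized CPU demand
    state.append(v_bw / max_bw)           # normalized bandwidth demand
    return state


# Virtual node "a" of VNR1: CPU 15, links a-b (15) and c-a (35).
state = extract_state(
    cpu_free={"A": 25, "B": 60},
    cpu_cap={"A": 50, "B": 60},
    adj_bw_free={"A": 45, "B": 110},
    adj_bw_cap={"A": 90, "B": 110},
    v_cpu=15,
    v_bw=50,
)
print(state)
```

   Normalizing every value to a comparable range is a common design
   choice because it lets one policy model handle substrates with
   different absolute capacities.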

4.2.  Action Space

   In RL, an action represents a decision that an RL agent can take
   based on the current state representation.  The set of all possible
   actions is called the action space.  In VNE problems, actions are
   generally divided into node embedding and link embedding.  The
   action for node embedding determines which SN nodes the VNR's
   virtual nodes are assigned to.

   For link embedding, the action selects the substrate paths between
   the substrate nodes chosen by the node embedding.  If the policy
   model of the RL agent is well trained, it will select the embedding
   that maximizes the reward appropriate for the operator's objectives.
   The output actions generated by the agent indicate the adjustment of
   allocated resources.
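
   The two kinds of action can be sketched as follows.  The helper
   names are hypothetical.  Two assumptions are made for illustration:
   virtual nodes of the same VNR are placed on distinct substrate
   nodes, and link embedding uses a minimum-hop path found by
   breadth-first search over links with enough free bandwidth.  An RL
   agent could equally learn the path selection itself.

```python
from collections import deque


def feasible_node_actions(cpu_free, demand, used):
    """Substrate nodes that can host a virtual node with this CPU demand."""
    return [n for n in sorted(cpu_free)
            if cpu_free[n] >= demand and n not in used]


def shortest_feasible_path(adj, bw_free, src, dst, demand):
    """Minimum-hop path from src to dst using only links whose free
    bandwidth covers the demand; returns None if no such path exists."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:        # walk back to the source
                path.append(u)
                u = parent[u]
            return path[::-1]
        for w in adj[u]:
            if w not in parent and bw_free[frozenset((u, w))] >= demand:
                parent[w] = u
                queue.append(w)
    return None


# Hypothetical substrate with one nearly exhausted link (A-B).
adj = {"A": ["B", "D"], "B": ["A", "D"], "D": ["A", "B"]}
bw_free = {
    frozenset(("A", "B")): 10,
    frozenset(("A", "D")): 40,
    frozenset(("B", "D")): 60,
}

print(feasible_node_actions({"A": 50, "B": 5, "D": 50}, 15, used={"A"}))
print(shortest_feasible_path(adj, bw_free, "A", "B", demand=35))
```

   In the second call the direct link A-B lacks bandwidth, so the
   virtual link is mapped onto the two-hop path through D, which shows
   why one virtual link may consume resources on several substrate
   links.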

   Note that, at each time step, an RL method may decide to 1) embed
   each virtual node onto a substrate node and then embed each virtual
   link onto a substrate path separately, or 2) embed the whole given
   VNR onto substrate nodes and links in the SN at once.  In the former
   case, at every single step, a learning agent focuses on exactly one
   virtual node from the current VNR and selects a substrate node to
   host it.  Link embedding is then performed separately in the same
   time step.  To solve the VNE problem efficiently, the mappings of
   virtual nodes and links are considered together, although they are
   performed separately.  Link mapping is considered more complex than
   node mapping because a virtual link can be mapped onto physical
   paths with different numbers of hops.  In the latter case, at every
   single step, a learning agent tries to embed the whole given VNR,
   i.e., all virtual nodes and links in the VNR, onto a subset of SN
   components.  Whole-VNR embedding should be handled as a graph
   embedding, so the action space is huge and the design of the RL
   method is usually more difficult than for per-node and per-link
   embedding.
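
   The difference in action space size between the two cases can be
   made concrete with a small calculation.  The sketch assumes that the
   virtual nodes of one VNR must be placed on distinct substrate nodes
   and counts node placements only; the choice of substrate paths for
   the virtual links would enlarge the whole-VNR action space further.

```python
import math


def per_step_node_actions(n_substrate):
    """Case 1: one virtual node per step, so one choice among SN nodes."""
    return n_substrate


def whole_vnr_node_actions(n_substrate, n_virtual):
    """Case 2: all virtual nodes at once, i.e. ordered selections of
    n_virtual distinct substrate nodes (requires Python 3.8+)."""
    return math.perm(n_substrate, n_virtual)


# Substrate with 6 nodes and a VNR with 3 virtual nodes.
print(per_step_node_actions(6), whole_vnr_node_actions(6, 3))

# A larger, hypothetical case: 100 substrate nodes, 10 virtual nodes.
print(per_step_node_actions(100), whole_vnr_node_actions(100, 10))
```

   The per-step action space grows linearly with the substrate size,
   whereas the whole-VNR action space grows combinatorially with the
   VNR size.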

4.3.  Reward Description

   Designing rewards is an important issue for an RL method.  In
   general, the reward is the benefit that an RL agent receives after
   performing its chosen action.  A reward is an immediate value that
   evaluates only the current state and action.  The value of the
   reward depends on the success or failure of each step.  In order to
   select the action that gives the best results in the long run, an RL
   agent needs to select the action with the highest expected
   cumulative reward.
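
   The relation between immediate and cumulative reward can be written
   in a few lines.  The sketch uses the discounted sum that is common
   in RL; the discount factor gamma is an assumption introduced here
   and is not prescribed by this document.

```python
def discounted_return(rewards, gamma=0.99):
    """Cumulative reward r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):     # accumulate from the last step backwards
        g = r + gamma * g
    return g


# Three successful embedding steps, each with immediate reward 1.
print(discounted_return([1, 1, 1], gamma=0.5))

# A rejection (-1) followed by two successes.
print(discounted_return([-1, 1, 1], gamma=0.5))
```

   With gamma below 1, rewards received sooner weigh more than rewards
   received later, so an action is judged by what it enables in later
   steps as well as by its own immediate reward.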

   The reward is calculated by a reward function according to the
   objective of the environment, and even in the same environment, it
   may differ depending on the operator's objective.  Based on the
   given reward, the agent can evaluate the effectiveness of its
   actions and improve its policy.  Hence, the reward function plays an
   important role in the training process of RL.  In the VNE problem,
   the overall objectives are to reduce VNR rejections, embed VNRs with
   minimum cost, maximize the revenue, and increase the utilization of
   physical resources.  The reward function should be designed to
   achieve one or more of these objectives.  Each objective and its
   corresponding reward design are outlined as follows:

   Revenue
      Revenue is the sum of the virtual resources requested by the VN
      and is calculated to determine the total cost of the requested
      resources.  Typically, a successful action (e.g., a VNR is
      embedded without violating any constraint) yields a positive
      reward and increases the revenue.  Conversely, a failed action
      (e.g., a VNR is rejected) yields a negative reward and decreases
      the revenue.

   Acceptance Ratio
      The acceptance ratio is the number of successfully embedded
      virtual network requests divided by the total number of virtual
      network requests.  To achieve a high acceptance ratio, the agent
      tries to embed as many VNRs as possible and thereby obtain a good
      reward.  The reward is usually proportional to the acceptance
      ratio.

   Revenue-to-cost ratio
      To balance and compare the cost of the resources used for
      embedding a VNR, the revenue is divided by the cost.  The
      revenue-to-cost ratio allows embedding methods to be compared
      with respect to the cost and revenue of their embedding results.
      Since most VNOs are primarily interested in this objective, the
      reward function should be related to this performance metric.
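
   The three metrics can be sketched as follows, using VNR1 from
   Figure 1.  Revenue follows the definition above.  The cost formula
   (CPU demand plus bandwidth demand multiplied by the number of
   substrate hops) is an assumption borrowed from common practice in
   the VNE literature, and the reward function at the end is one
   hypothetical design among many.

```python
def revenue(vnr_cpu, vnr_bw):
    """Sum of the CPU and bandwidth resources requested by the VNR."""
    return sum(vnr_cpu.values()) + sum(vnr_bw.values())


def cost(vnr_cpu, vnr_bw, link_map):
    """Assumed cost model: a virtual link consumes its bandwidth on
    every substrate link (hop) of the path it is mapped to."""
    link_cost = sum(bw * (len(link_map[vl]) - 1) for vl, bw in vnr_bw.items())
    return sum(vnr_cpu.values()) + link_cost


def acceptance_ratio(accepted, total):
    return accepted / total if total else 0.0


def reward(accepted, rev, cst):
    """Hypothetical reward: revenue-to-cost ratio, or -1 on rejection."""
    return rev / cst if accepted else -1.0


vnr_cpu = {"a": 15, "b": 30, "c": 10}
vnr_bw = {("a", "b"): 15, ("b", "c"): 20, ("c", "a"): 35}
# Hypothetical embedding: link c-a needs two hops, the others one hop.
link_map = {
    ("a", "b"): ["A", "D"],
    ("b", "c"): ["D", "B"],
    ("c", "a"): ["B", "D", "A"],
}

rev = revenue(vnr_cpu, vnr_bw)
cst = cost(vnr_cpu, vnr_bw, link_map)
print(rev, cst, reward(True, rev, cst), acceptance_ratio(3, 4))
```

   Under this cost model the ratio is at most 1 and reaches 1 only
   when every virtual link is mapped onto a single substrate link, so
   longer substrate paths directly lower the reward.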

4.4.  Policy and RL methods

   The policy is the strategy that the agent employs to determine the
   next action based on the current state.  It maps states to the
   actions that promise the highest reward.  Therefore, an RL agent
   updates its policy repeatedly in the learning phase to maximize the
   expected cumulative reward.  Unlike supervised learning, in which
   each sample has a corresponding label indicating the preferred
   output of the learning model, an RL agent relies on reward signals
   to evaluate the effectiveness of its actions and further improve the
   policy.  From the perspective of RL, the goal of VNE is to find an
   optimal policy to embed a VNR onto the given SN in any state at any
   time.

   RL methods can be classified as on-policy or off-policy.  In
   on-policy RL methods, the (behavior) policy used in the exploration
   step to select an action and the policy being learned are the same.
   On-policy methods work with a single policy and require all
   observations (state, action, reward, next state) to have been
   generated using that policy.  Representative on-policy methods
   include A2C, A3C, TRPO, and PPO.  Off-policy RL methods, on the
   other hand, work with two policies: the policy being learned, called
   the target policy, and the policy being followed to generate the
   observations, called the behavior policy.  In off-policy RL methods,
   the target policy and the behavior policy are not necessarily the
   same.  Because the two are separated, exploratory policies can be
   used to collect experience.  In the VNE problem, diverse experience
   can be accumulated by extracting embedding results using various
   behavior policies.  Representative off-policy methods include
   Q-learning, DQN, DDPG, and SAC.

   RL methods can also be classified as model-based or model-free.  In
   model-based RL methods, an RL agent learns its optimal behavior
   indirectly: by taking actions and observing the outcomes, which
   include the next state and the immediate reward, it learns a model
   of the environment.  The model predicts the outcomes of actions and
   is used instead of, or in addition to, interaction with the
   environment to learn optimal policies.  This becomes impractical,
   however, when the state and action spaces are large.  Unlike
   model-based methods, model-free RL methods learn directly by trial
   and error with the environment and do not require the relatively
   large memory that a model needs.  Since data efficiency and safety
   are very important in VNE problems, the use of model-based methods
   can be actively considered.  However, since it is not easy to build
   a good model that mimics a real network environment, a model-free RL
   method may be more suitable for VNE problems.  In conclusion,
   selecting a good RL method plays an important role in solving the
   VNE problem, and VNE performance metrics vary depending on the
   selected RL method.
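
   The on-policy/off-policy distinction shows up directly in the
   update rule.  The sketch below contrasts the tabular Q-learning
   update (off-policy, named above) with the SARSA update, a classic
   on-policy method that is used here only for comparison and is not
   mentioned in this document.  The table layout and the values of
   alpha and gamma are assumptions made for illustration.

```python
def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best action in s_next (the target
    policy), whatever the behavior policy will actually do there."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action a_next that the same policy
    actually selected in s_next."""
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


# Two states, two actions; state 1 already has value estimates.
q_off = [[0.0, 0.0], [1.0, 2.0]]
q_on = [[0.0, 0.0], [1.0, 2.0]]

q_learning_update(q_off, s=0, a=0, r=1.0, s_next=1)
sarsa_update(q_on, s=0, a=0, r=1.0, s_next=1, a_next=0)   # exploratory a_next

print(q_off[0][0], q_on[0][0])
```

   Given the same transition, Q-learning ignores the exploratory next
   action, which is why it can learn from experience collected by
   other behavior policies, while the on-policy update depends on it.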

4.5.  Training Environment

   Simulation is the use of software to imitate an interacting
   environment that is difficult to execute and test in practice.  An
   RL method learns by iteratively interacting with the environment.
   In a real environment, however, factors such as failures and
   component consumption exist.  Therefore, it is necessary to learn
   through a simulation that imitates the real environment.  To solve
   the VNE problem, a network simulator that resembles the real
   environment is required, because it is difficult to experiment
   repeatedly on real networks with an RL method, and it is very
   challenging to apply an RL method directly to real-world
   environments.  The network simulation environment should provide a
   general SN environment and the VNRs required by the operator.  The
   SN has nodes and links between nodes, each with a capacity such as
   CPU or bandwidth.  A VNR has the virtual nodes and links required by
   the operator, each with its own requirements.
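
   A minimal generator for such a simulation environment might look
   like the following.  All sizes, probabilities, and value ranges are
   arbitrary assumptions for illustration, and random_graph and
   make_training_env are hypothetical helper names.

```python
import random


def random_graph(n_nodes, link_prob, cpu_range, bw_range, rng):
    """Random connected graph: a chain guarantees connectivity, and
    extra links are added with probability link_prob."""
    cpu = {i: rng.randint(*cpu_range) for i in range(n_nodes)}
    bw = {}
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if j == i + 1 or rng.random() < link_prob:
                bw[(i, j)] = rng.randint(*bw_range)
    return cpu, bw


def make_training_env(seed=0, n_vnrs=5):
    """One substrate network (capacities) and a list of VNRs (demands)."""
    rng = random.Random(seed)       # seeded for reproducible experiments
    sn = random_graph(20, 0.2, (50, 100), (50, 100), rng)
    vnrs = [
        random_graph(rng.randint(2, 5), 0.5, (1, 30), (1, 30), rng)
        for _ in range(n_vnrs)
    ]
    return sn, vnrs


sn, vnrs = make_training_env(seed=42)
sn_cpu, sn_bw = sn
print(len(sn_cpu), len(sn_bw), [len(cpu) for cpu, _ in vnrs])
```

   Seeding the generator keeps training runs repeatable, which is hard
   to achieve when experimenting on a real network.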

4.6.  Sim2Real Gap

   An RL method iteratively learns through a simulation environment to
   train a model of the desired policy.  The trained model is then
   applied to the real environment and/or tuned more for adapting to the
   real one.  However, when the trained model is applied in the
   simulation to the real environment, sim2real gap problem arises.
   Obviously, the simulation environment does not match perfectly to the
   real environment which mostly fails in the tuning process and gives
   poor performance in the model because of the Sim2Real gap.  The
   sim2real gap is caused by the difference between the simulation and
   the real environment.

   The gap exists because the simulation environment cannot perfectly
   reproduce the real environment, which contains many variables.  In
   a real network environment for VNE, the SN's nodes and links may
   fail due to external factors, or capacities such as CPU may change
   suddenly.  To address this problem, the simulation environment
   should be made more robust or the trained RL model should
   generalize better.  To reduce the gap between simulated and real
   network environments, the model should be trained on a large and
   representative set of VNRs, and the agent should keep learning
   rather than depend only on what it memorized previously.
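
   One common way to make a simulation more robust is to randomize it
   between training episodes (often called domain randomization).
   This document does not prescribe that technique; the sketch below
   only illustrates how the node and link failures and sudden capacity
   changes mentioned above might be injected.  The function name
   perturb_substrate() and the parameters fail_prob and jitter are
   hypothetical.

```python
import random


def perturb_substrate(cpu, bandwidth, rng, fail_prob=0.05, jitter=0.2):
    """Return a randomized copy of the substrate for one episode.

    cpu:       dict node -> CPU capacity
    bandwidth: dict (u, v) -> link bandwidth
    fail_prob and jitter are hypothetical tuning knobs.
    """
    # Fail each node independently with probability fail_prob.
    alive = {n for n in cpu if rng.random() >= fail_prob}
    # Scale surviving capacities by a factor in [1-jitter, 1+jitter].
    new_cpu = {n: cpu[n] * rng.uniform(1 - jitter, 1 + jitter)
               for n in sorted(alive)}
    new_bw = {}
    for (u, v), bw in bandwidth.items():
        # A link survives only if both endpoints survive and the
        # link itself does not fail.
        if u in alive and v in alive and rng.random() >= fail_prob:
            new_bw[(u, v)] = bw * rng.uniform(1 - jitter, 1 + jitter)
    return new_cpu, new_bw


base_cpu = {0: 100, 1: 80, 2: 60}
base_bw = {(0, 1): 10, (1, 2): 20}
episode_cpu, episode_bw = perturb_substrate(
    base_cpu, base_bw, random.Random(42))
print(episode_cpu, episode_bw)
```

   Calling such a function at the start of each episode exposes the
   agent to many substrate variants instead of a single fixed one.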

4.7.  Generalization

   Generalization refers to the trained model's ability to adapt
   properly to previously unseen observations.  An RL method tries to
   learn a model that optimizes some objective so that it performs
   well on data never seen during training.  For VNE problems,
   generalization measures how well the agent's policy model performs
   on unseen VNRs.  The RL agent must not merely memorize the
   variations of the VNRs seen so far but also learn and explore
   further possible variations.  It is important to have good training
   data with sufficient variance across VNRs and to train the model on
   as wide a range of VNRs as possible.
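
   A simple way to quantify generalization is to compare the
   acceptance ratio on VNRs drawn from the training distribution with
   the ratio on VNRs drawn from a different distribution.  The sketch
   below assumes a VNR reduced to per-node CPU demands and uses a
   hypothetical greedy_policy() as a stand-in for a trained agent; the
   distributions and sizes are arbitrary example values.

```python
import random


def sample_vnr(rng, max_nodes, max_cpu):
    """A VNR reduced to a list of per-node CPU demands (no links)."""
    size = rng.randint(2, max_nodes)
    return [rng.randint(1, max_cpu) for _ in range(size)]


def greedy_policy(sn_cpu, vnr):
    """Stand-in for a trained agent: map each virtual node to the
    unused substrate node with the most CPU; None means reject."""
    free = dict(sn_cpu)
    mapping = {}
    for vnode, demand in enumerate(vnr):
        # Each substrate node hosts at most one virtual node per VNR.
        options = [n for n in free if n not in mapping.values()]
        if not options:
            return None
        host = max(options, key=free.get)
        if free[host] < demand:
            return None
        free[host] -= demand
        mapping[vnode] = host
    return mapping


def acceptance_ratio(policy, sn_cpu, vnrs):
    """Fraction of VNRs embedded, each against a fresh substrate."""
    accepted = sum(policy(sn_cpu, vnr) is not None for vnr in vnrs)
    return accepted / len(vnrs)


rng = random.Random(1)
sn_cpu = {n: 50 for n in range(6)}
seen = [sample_vnr(rng, max_nodes=4, max_cpu=40) for _ in range(200)]
unseen = [sample_vnr(rng, max_nodes=8, max_cpu=70) for _ in range(200)]
gap = (acceptance_ratio(greedy_policy, sn_cpu, seen)
       - acceptance_ratio(greedy_policy, sn_cpu, unseen))
print(f"generalization gap: {gap:.2f}")
```

   A large gap between the two ratios suggests that the policy has
   memorized the training VNRs rather than learned a strategy that
   carries over to new requests.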

5.  IANA Considerations

   This memo includes no request to IANA.

6.  Security Considerations

   This is an Informational draft that details research challenges.  It
   does not introduce any security threat.

7.  Informative References

   [ASNVT2020]
              Sharif, Kashif., Li, Fan., Latif, Zohaib., Karim, MM., and
              Sujit. Biswas, "A Survey of Network Virtualization
              Techniques for Internet of Things Using SDN and NFV",
              DOI 10.1145/3379444, April 2020,
              <https://doi.org/10.1145/3379444>.

   [CDVNE2020]
              "A Continuous-Decision Virtual Network Embedding Scheme
              Relying on Reinforcement Learning",
              DOI 10.1109/TNSM.2020.2971543, February 2020,
              <https://ieeexplore.ieee.org/document/8982091>.

   [DeepViNE2019]
              Dolati, M., Hassanpour, S. B., Ghaderi, M., and A.
              Khonsari, "DeepViNE: Virtual Network Embedding with Deep
              Reinforcement Learning", DOI 10.1109/INFCOMW.2019.8845171,
              September 2019,
              <https://ieeexplore.ieee.org/document/8845171>.

   [ENViNE2021]
              Ullah, Ihsan., Lim, Hyun-Kyo., and Youn-Hee. Han, "Ego
              Network-Based Virtual Network Embedding Scheme for Revenue
              Maximization", DOI 10.1109/ICAIIC51459.2021.9415185, April
              2021, <https://ieeexplore.ieee.org/document/9415185>.

   [MLCNM2018]
              Ayoubi, Sara., Limam, Noura., Salahuddin, Mohammad.,
              Shahriar, Nashid., Boutaba, Raouf., Estrada-Solano,
              Felipe., and Oscar. M. Caicedo, "Machine Learning for
              Cognitive Network Management",
              DOI 10.1109/MCOM.2018.1700560, January 2018,
              <https://ieeexplore.ieee.org/document/8255757>.

   [MOQL2018] "Multi-Objective Virtual Network Embedding Algorithm Based
              on Q-learning and Curiosity-Driven",
              DOI 10.1186/s13638-018-1170-x, June 2018, <https://jwcn-
              eurasipjournals.springeropen.com/articles/10.1186/
              s13638-018-1170-x>.

   [MVNE2020] "Modeling on Virtual Network Embedding using Reinforcement
              Learning", DOI 10.1002/cpe.6020, September 2020,
              <https://doi.org/10.1002/cpe.6020>.

   [MVNNML2021]
              Boutaba, Raouf., Shahriar, Nashid., Salahuddin, Mohammad.,
              and Noura. Limam, "Managing Virtualized Networks and
              Services with Machine Learning", January 2021,
              <https://www.semanticscholar.org/paper/Managing-
              Virtualized-Networks-and-Services-with-Boutaba-
              Shahriar/48b8fc73c1609d4632d7db5e67e373a62a3cc1f6>.

   [NeuroViNE2018]
              "NeuroViNE: A Neural Preprocessor for Your Virtual Network
              Embedding Algorithm", DOI 10.1109/INFOCOM.2018.8486263,
              June 2018, <https://ieeexplore.ieee.org/document/8486263>.

   [NFVDeep2019]
              Xiao, Y., Zhang, Q., Liu, F., Wang, J., Zhao, M., Zhang,
              Z., and J. Zhang, "NFVdeep: Adaptive Online Service
              Function Chain Deployment with Deep Reinforcement
              Learning", DOI 10.1145/3326285.3329056, June 2019,
              <https://doi.org/10.1145/3326285.3329056>.

   [NRRL2020] "Network Resource Allocation Strategy Based on Deep
              Reinforcement Learning", DOI 10.1109/OJCS.2020.3000330,
              June 2020, <https://ieeexplore.ieee.org/document/9109671>.

   [PPRL2020] "A Privacy-Preserving Reinforcement Learning Algorithm for
              Multi-Domain Virtual Network Embedding",
              DOI 10.1109/TNSM.2020.2971543, September 2020,
              <https://ieeexplore.ieee.org/document/8982091>.

   [QLDC2019] "A Q-Learning-Based Approach for Virtual Network Embedding
              in Data Center", DOI 10.1007/s00521-019-04376-6, July
              2019, <https://link.springer.com/article/10.1007/
              s00521-019-04376-6>.

   [QVNE2020] Yuan, Y., Tian, Z., Wang, C., Zheng, F., and Y. Lv, "A Q-
              learning-Based Approach for Virtual Network Embedding in
              Data Center", DOI 10.1007/s00521-019-04376-6, July 2020,
              <https://link.springer.com/article/10.1007/
              s00521-019-04376-6>.

   [RDAM2018] "RDAM: A Reinforcement Learning Based Dynamic Attribute
              Matrix Representation for Virtual Network Embedding",
              DOI 10.1109/TETC.2018.2871549, September 2018,
              <https://ieeexplore.ieee.org/document/8469054>.

   [RFC7364]  Narten, T., Ed., Gray, E., Ed., Black, D., Fang, L.,
              Kreeger, L., and M. Napierala, "Problem Statement:
              Overlays for Network Virtualization", RFC 7364,
              DOI 10.17487/RFC7364, October 2014,
              <https://datatracker.ietf.org/doc/rfc7364/>.

   [RLVNEWSN2020]
              "Reinforcement Learning for Virtual Network Embedding in
              Wireless Sensor Networks",
              DOI 10.1109/WiMob50308.2020.9253442, October 2020,
              <https://ieeexplore.ieee.org/document/9253442>.

   [SUR2018]  Boutaba, Raouf., Salahuddin, Mohammad., Limam, Noura.,
              Ayoubi, Sara., Shahriar, Nashid., Estrada-Solano, Felipe.,
              and Oscar. M. Caicedo, "A Comprehensive Survey on Machine
              Learning for Networking: Evolution, Applications and
              Research Opportunities", DOI 10.1186/s13174-018-0087-2,
              June 2018, <https://link.springer.com/article/10.1186/
              s13174-018-0087-2>.

   [VNEGCN2020]
              Yan, Z., Ge, J., Wu, Y., Li, L., and T. Li, "Automatic
              Virtual Network Embedding: A Deep Reinforcement Learning
              Approach With Graph Convolutional Networks",
              DOI 10.1109/JSAC.2020.2986662, April 2020,
              <https://ieeexplore.ieee.org/document/9060910>.

   [VNEQS2021]
              Wang, Chao., Batth, Ranbir Singh., Zhang, Peiying., Aujla,
              Gagangeet., Duan, Youxiang., and Lihua. Ren, "VNE Solution
              for Network Differentiated QoS and Security Requirements:
              From the Perspective of Deep Reinforcement Learning",
              DOI 10.1007/s00607-020-00883-w, January 2021,
              <https://link.springer.com/article/10.1007/
              s00607-020-00883-w>.

   [VNESURV2013]
              Fischer, Andreas., Botero, Juan Felipe., Till Beck,
              Michael., De Meer, Hermann., and Xavier. Hesselbach,
              "Virtual Network Embedding: A Survey",
              DOI 10.1109/SURV.2013.013013.00155, 2013,
              <https://doi.org/10.1109/SURV.2013.013013.00155>.

   [VNETD2019]
              Wang, S., Bi, J., Vasilakos, A. V., and Q. Fan, "VNE-TD: A
              Virtual Network Embedding Algorithm Based on Temporal-
              Difference Learning", DOI 10.1016/j.comnet.2019.05.004,
              October 2019,
              <https://doi.org/10.1016/j.comnet.2019.05.004>.

   [VNFFG2020]
              Anh Quang, P.T., Hadjadj-Aoul, Y., and A. Outtagarts,
              "Evolutionary Actor-Multi-Critic Model for VNF-FG
              Embedding", DOI 10.1109/CCNC46108.2020.9045434, January
              2020, <https://ieeexplore.ieee.org/document/9045434>.

   [ZTORCH2018]
              Sciancalepore, V., Chen, X., Yousaf, F. Z., and X. Costa-
              Perez, "Z-TORCH: An Automated NFV Orchestration and
              Monitoring Solution", DOI 10.1109/TNSM.2018.2867827,
              August 2018,
              <https://ieeexplore.ieee.org/document/8450000>.

Authors' Addresses

   Ihsan Ullah
   KOREATECH
   1600, Chungjeol-ro, Byeongcheon-myeon, Dongnam-gu
   Cheonan
   Chungcheongnam-do
   31253
   Republic of Korea

   Email: ihsan@koreatech.ac.kr

   Youn-Hee Han
   KOREATECH
   1600, Chungjeol-ro, Byeongcheon-myeon, Dongnam-gu
   Cheonan
   Chungcheongnam-do
   31253
   Republic of Korea

   Email: yhhan@koreatech.ac.kr

   TaeYeon Kim
   ETRI
   218 Gajeong-ro, Yuseong-gu
   Daejeon
   34129
   Republic of Korea

   Email: tykim@etri.re.kr
