Workgroup:
Internet Engineering Task Force
Internet-Draft:
draft-ihsan-nmrg-ml-vne-00
Published:
Intended Status:
Informational
Expires:
16 December 2021
Authors:
I. Ullah
KOREATECH
Y-H. Han
KOREATECH
TY. Kim
ETRI

Reinforcement Learning-Based Virtual Network Embedding: Problem Statement

Abstract

In network virtualization (NV) technology, Virtual Network Embedding (VNE) is the problem of mapping a virtual network onto a substrate network. It has a great impact on the performance of the virtual networks and on the resource utilization of the substrate network. An efficient embedding strategy can maximize the acceptance ratio of virtual networks and thereby increase the revenue of the Internet service provider. Several works have appeared on the design of VNE solutions; however, it remains a challenging issue for researchers. To solve the VNE problem, reinforcement learning (RL) can play a vital role in making VNE more intelligent and efficient. Moreover, RL has been merged with deep learning techniques to develop adaptive models with effective strategies for various complex problems. In RL, an agent can learn desired behaviors (e.g., optimal VNE strategies), and once training is complete, it can embed a virtual network onto the substrate network very quickly and efficiently. RL can reduce the complexity of the VNE method; however, applying RL techniques directly to VNE problems is difficult and needs more research. In this document, we present a problem statement to motivate the research community to solve the VNE problem using reinforcement learning.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 16 December 2021.

1. Introduction and Scope

Recently, network virtualization (NV) technology has received a lot of attention from academia and industry. It allows multiple heterogeneous virtual networks to share resources on the same substrate network (SN) [RFC7364], [ASNVT2020]. The current large, fixed substrate network architecture is no longer efficient or extendable due to network ossification. To overcome these limitations, the role of a traditional Internet Service Provider (ISP) is divided into two independent parts that work together. One is the Service Provider (SP), which creates and owns a number of virtual networks (VNs), and the other is the Infrastructure Provider (InP), which owns the SN devices and links as the underlying resources. SPs generate and construct customized Virtual Network Requests (VNRs) and lease resources from InPs based on those requests. In addition, two types of mediators can enter this industry domain for better coordination of SPs and InPs. One is the Virtual Network Provider (VNP), which assembles and coordinates diverse virtual resources from one or more InPs; the other is the Virtual Network Operator (VNO), which creates, manages, and operates VNs according to the demands of the SPs. VNPs and VNOs could enable efficient use of the physical network and increase the commercial revenue of both SPs and InPs. NV can increase network agility, flexibility, and scalability while creating significant cost savings. Greater network workload mobility, increased availability of network resources with good performance, and automated operations are all benefits of NV.

Virtual Network Embedding (VNE) [VNESURV2013] is the problem of mapping a virtual network onto the substrate network. The VNE method has two main parts: node embedding, where the virtual nodes of a VN have to be mapped to SN nodes, and link embedding, where the virtual links between the virtual nodes have to be mapped to physical paths in the substrate network. VNE has been proven to be NP-hard, and both node and link embedding have become challenging for researchers. Virtual nodes and links should be embedded efficiently into a given SN so that more VNRs can be accepted at minimum cost. The distance between the virtual nodes mapped in a given SN contributes significantly to link embedding failures and causes the rejection of VNRs. Hence, an efficient and intelligent technique is required for the VNE problem to reduce VNR rejection [ENViNE2021]. From the perspective of InPs, an efficient VNE method performs better mostly in terms of revenue, acceptance ratio, and revenue-to-cost ratio.

Figure 1 shows an example of two virtual network requests, VNR1 and VNR2, to be embedded in the given substrate network. VNR1 contains three virtual nodes (a, b, and c) with CPU demands (15, 30, and 10), respectively, and links between the virtual nodes (a-b, b-c, and c-a) with bandwidth demands (15, 20, and 35), respectively. Similarly, VNR2 contains virtual nodes and links with their CPU and bandwidth demands. The purpose of the VNE method is to map the virtual nodes and links of the VNRs onto the physical nodes and links of the given substrate network, as shown in Figure 1 [ENViNE2021]. An illustrative data-structure sketch of VNR1 is given after the figure.

           +----+                +----+         +----+          +----+
           | a  |                | d  |         | e  |          | f  |
           | 15 |                | 25 |__ _25___| 30 |__ _35_ __| 45 |
           +----+                +----+         +----+          +----+
          /      \                \                                 /
        15        35               30                              20
        /          \                \                             /
  +----+            +----+           +----+                 +----+
  | b  |            | c  |           | g  |                 | h  |
  | 30 |__ _20_ __ _| 10 |           | 15 |__ _ __10__ __ __| 35 |
  +----+            +----+           +----+                 +----+

           (VNR1)                                 (VNR2)
             ||   Embedding                         ||    Embedding
             VV                                     VV

        +----+              +----+       +----+                  +----+
 .......| a  |......35......| c  |       | d  |........25........| e  |
:  _____| 15 |              | 10 |_______| 25 |          ________| 30 |
: |     +----+              +----+       +----+         |        +----+
: |   A      |                | :   B      | :          |   C      |  :
: |   50     |__ ___50__ __ __| :   60     |_:_ __30 _ _|   40     |  :
: +__________+                +_:_________+  :          +__________+  :
:      |                        :     |      :                |       :
15     |                        :     |      :                |      35
:     40                       20     60     :               50       :
:      |                        :     |     30                |       :
:      |                       _:_____|_     :                |       :
+----:..............20........|.:       |    :                |   +----+
| b  | |   +----+.....30......|.........|....:                |   | f  |
| 30 |_|___| g  |             |       +----+                __|___| 45 |
+----+     | 15 |.....10......|.......| h  |........20.....|......+----+
 |   D     +____+             |    E  | 35 |               |     F    |
 |   50     |__ __ __ 70 _____|    40 +____+ ___ __ 50_ ___|     60   |
 +__________+                 +_________+                  +__________+

Figure 1: Substrate network with embedded virtual network, VNR1 and VNR2
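
As an illustration only, VNR1 of Figure 1 could be represented in code roughly as follows. This is a minimal sketch using hypothetical Python data structures; it is not part of any specific VNE implementation.

   # Hypothetical Python representation of VNR1 from Figure 1.
   # Virtual nodes carry CPU demands; virtual links carry bandwidth demands.
   vnr1 = {
       "nodes": {"a": {"cpu": 15}, "b": {"cpu": 30}, "c": {"cpu": 10}},
       "links": {("a", "b"): {"bw": 15},
                 ("b", "c"): {"bw": 20},
                 ("c", "a"): {"bw": 35}},
   }

   # The substrate network can be represented the same way, with its
   # remaining (rather than requested) node and link capacities.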

Recently, artificial intelligence and machine learning technologies have been widely used to solve networking problems [SUR2018], [MLCNM2018], [MVNNML2021]. There has been a surge in research efforts, especially in reinforcement learning (RL), which has contributed to many complex tasks, e.g., video games and autonomous driving. The main goal of RL is to learn better policies for sequential decision-making problems (e.g., VNE) and to solve them very efficiently. Several works have appeared on the design of VNE solutions using RL, which focus on how to interact with the environment to achieve the maximum cumulative return [VNEQS2021], [NRRL2020], [MVNE2020], [CDVNE2020], [PPRL2020], [RLVNEWSN2020], [QLDC2019], [VNFFG2020], [VNEGCN2020], [NFVDeep2019], [DeepViNE2019], [VNETD2019], [RDAM2018], [MOQL2018], [ZTORCH2018], [NeuroViNE2018], [QVNE2020]. This document outlines the problems encountered when designing and applying RL-based VNE solutions. Section 2 describes how to design RL-based VNE solutions, Section 3 gives terminology, and Section 4 describes the problem space in detail.

2. Reinforcement Learning-based VNE Solutions

As discussed above, RL has been studied in various fields (such as games, control systems, operations research, information theory, multi-agent systems, and network systems) and in some domains shows better performance than humans. Unlike supervised learning, RL trains a policy model by receiving rewards through interaction with the environment, without labeled training data.

Recently, there have been several attempts to solve VNE problems using RL. When RL-based methods are applied to VNE problems, the RL agent learns automatically, without human intervention, through interaction with the environment. Once the agent has completed the learning process, it can generate the most appropriate embedding decision (action) based on the state of the network. Based on the embedding action, the agent gets a reward from the environment, which it uses to adaptively train its policy for future actions. The RL agent obtains the most optimized model based on a reward function defined according to each objective (revenue, cost, revenue-to-cost ratio, and acceptance ratio). The optimal RL policy model provides the appropriate VNE strategy according to the objective of the network operator. Designing and applying RL techniques directly to VNE problems is not yet trivial and may face several challenges. This document describes those problems.

Figure 2 shows a virtual network embedding solution based on an RL method. The RL workflow is divided into a training process and an inference process. In the training process, state information is composed of various substrate networks and VNRs (the environment) and is turned into suitable input for the RL model through feature extraction. After that, the RL model is updated by the model updater using the feature-extracted state and the reward. In the inference process, the trained RL model provides the embedding result to the operating network in real time.

The following figure shows the details of the RL-based VNE solution; an illustrative training-loop sketch follows the figure.

RL Model Training Process
+--------------------------------------------------------------------+
| Training Environment                                               |
| +-------------------+         RL-based VNE Agent                   |
| | +---------+       |         +----------------------------------+ |
| | | +---------+     |         |                   Action         | |
| | | | +----------+  |<----------------------------------+        | |
| | + | | Substrate|  |         |                         |        | |
| |   | | Networks |  |         |  +----------+      +----------+  | |
| |   + +----------+  |  State  |  | Feature  |      |    RL    |  | |
| |                   |----------->|Extraction|----->|   Model  |  | |
| | +--------+        |         |  +----------+      | (Policy) |  | |
| | | +---------+     |         |       |            +----------+  | |
| | + | +---------+   |         |       |   +---------+     A      | |
| |   + |  VNRs   |   | Reward  |       +-->|  Model  |     |      | |
| |     +---------+   |-------------------->| Updater |-----+      | |
| +-------------------+         |           +---------+            | |
|                               +----------------------------------+ |
+--------------------------------------------------------------------+
                                  |
Inference Process                 |
+---------------------------------V----------------------------------+
|                         + - - - - - - - +                          |
| Operating Network       |   RL Model    |    Trained RL Model      |
| (Inference Environment) |   Training    |------------------+       |
| +-------------------+   |   Process     |                  |       |
| |   +-----------+   |   + - - - - - - - +                  |       |
| |   |           |   |         RL-based VNE Agent           |       |
| |   | Substrate |   |         +----------------------------|-----+ |
| |   |  Network  |   |         |                   Action   |     | |
| |   |           |   |<---------------------------------+   |     | |
| |   +-----------+   |         |                        |   V     | |
| | +---------+       |         |  +------------+     +---------+  | |
| | | +---------+     | State   |  |  Feature   |     | Trained |  | |
| | + | +----------+  |----------->| Extraction |---->|   RL    |  | |
| |   + |   VNRs   |  |         |  +------------+     |  Model  |  | |
| |     +----------+  |         |                     +---------+  | |
| +-------------------+         +----------------------------------+ |
+--------------------------------------------------------------------+
Figure 2: Two processes for RL-based VNE solutions
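
A minimal sketch of the training process depicted in Figure 2 is shown below. The env, extract_features, and agent objects are hypothetical placeholders supplied by the caller; this illustrates the process only and is not a reference implementation.

   # A minimal sketch of the training loop in Figure 2.  The env,
   # extract_features, and agent arguments are hypothetical objects.
   def train(env, extract_features, agent, num_episodes=1000):
       for episode in range(num_episodes):
           state = env.reset()                  # fresh SN snapshot and VNRs
           done = False
           while not done:
               features = extract_features(state)           # feature extraction
               action = agent.select_action(features)       # embedding decision
               next_state, reward, done = env.step(action)  # apply embedding
               agent.update(features, action, reward,
                            extract_features(next_state))   # model updater
               state = next_state
       return agent   # the trained policy is then used for inference

In the inference process, only the trained policy (here, agent.select_action) is used to produce embedding decisions for the operating network.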

3. Terminology

Network Virtualization
Network virtualization is the process of combining hardware and software network resources and network functionality into a single, software-based administrative entity, a virtual network [RFC7364].
Virtual Network Embedding (VNE)
Virtual Network Embedding (VNE) [VNESURV2013] is one of the main techniques used to map a virtual network to the substrate network.
Substrate Network (SN)
The underlying physical network, which contains the resources (such as CPU and bandwidth) for virtual networks, is called the substrate network.
Virtual Network Request (VNR)
A Virtual Network Request is a single, complete virtual network request containing virtual nodes and virtual links.
Agent
In RL, an agent is the component that makes decisions and takes actions (i.e., embedding decisions).
State
State is a representation (e.g., remaining SN capacity and requested VN resource) of the current environment, and it tells the agent what situation it is in currently.
Action
Actions (i.e., node and link embeddings) are the behaviors an RL agent can perform to change the state of the environment.
Policy
A policy defines an agent's way of behaving at a given time. It is a mapping from perceived states of environment to actions to be taken when in those states. It is usually implemented as a deep learning model because the state and action spaces are too large to be completely known.
Reward
A reward is the feedback provided to the agent for taking actions that lead to good outcomes (e.g., achieving the objective of the network operator).
Environment
An environment is the agent's world in which it lives and interacts. The agent can interact with the environment by performing some action but cannot influence the rules of the environment by those actions.

4. Problem Space

RL contains three main components: state representation, action space, and reward description. To solve a VNE problem, we need to consider how to design these three main RL components. In addition, the specific RL method, the training environment, the sim2real gap, and generalization are also important issues that should be considered and addressed. We describe each of them in detail as follows.

4.1. State Representation

The way the RL agent understands and observes the VNE problem is crucial for it to establish thorough knowledge of the network status and to generate efficient embedding decisions. Therefore, it is essential to first design the state representation that serves as the input to the agent. The state representation is the information that an agent can receive from the environment, and it consists of a set of values representing the current situation in the environment. Based on the state representation, the RL agent selects the most appropriate action through its policy model. In the VNE problem, an RL agent needs to know the overall SN entities and their current status in order to use the resources of the nodes and edges of the substrate network. It must also know the requirements of the VNR.

Therefore, in the VNE problem, the state usually represents the current resource state of the nodes and edges of the substrate network (i.e., CPU, memory, storage, bandwidth, delay, loss rate, etc.) and the requirements of the virtual nodes and links of the VNR. The collected status information is used as raw input, or refined status information obtained through a feature extraction process is used as input for the RL agent. The state representation may vary depending on the operator's objective and VNE strategy. The method of feature extraction and representation greatly affects the performance of the agent.
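
For illustration, a flat state vector could be assembled roughly as follows. This is a sketch that assumes the dictionary-based SN and VNR structures sketched after Figure 1; the chosen features are examples, not a prescribed representation.

   # A sketch of a flat state vector, assuming the dictionary-based SN
   # and VNR structures sketched after Figure 1 (illustrative only).
   def build_state(sn, vnr):
       sn_node_cpu  = [sn["nodes"][n]["cpu"]  for n in sorted(sn["nodes"])]
       sn_link_bw   = [sn["links"][l]["bw"]   for l in sorted(sn["links"])]
       vnr_node_cpu = [vnr["nodes"][n]["cpu"] for n in sorted(vnr["nodes"])]
       vnr_link_bw  = [vnr["links"][l]["bw"]  for l in sorted(vnr["links"])]
       # Remaining SN capacities followed by the VNR's requirements.
       return sn_node_cpu + sn_link_bw + vnr_node_cpu + vnr_link_bw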

4.2. Action Space

In RL, an action represents a decision that an RL agent can take based on the current state representation. The set of all possible actions is called the action space. In VNE problems, actions are generally divided into node embedding and link embedding. The action for node embedding specifies to which nodes in the SN the VNR's virtual nodes are assigned.

Also, for link embedding, the action represents the paths selected between the substrate network nodes chosen by the node embedding result. If the policy model of the RL agent is well trained, it will select the embedding result that maximizes the reward appropriate for the operator's objectives. The output actions generated by the agent indicate the adjustment of allocated resources.

Note that, at each time step, an RL method may decide to 1) embed each virtual node onto a substrate node and then embed each virtual link onto a substrate path separately, or 2) embed the whole given VNR onto substrate nodes and links in the SN at once. In the former case, at every single step, the learning agent focuses on exactly one virtual node from the current VNR and produces a substrate node to host that virtual node. Link embedding is then performed separately in the same time step. To solve the VNE problem efficiently, the mapping of virtual nodes and links is considered together, although the mappings are performed separately. Link mapping is considered more complex than node mapping, because a virtual link can be mapped onto a physical path with a different number of hops. In the latter case, at every single step, the learning agent tries to embed the whole given VNR, i.e., all virtual nodes and links in the given VNR, onto a subset of SN components. The whole-VNR embedding has to be handled as a graph embedding, so the action space is huge and the design of the RL method is usually more difficult than with per-node and per-link embedding.
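
As a sketch of the per-node action in the former case (again assuming the hypothetical data structures used earlier), an action can simply be the substrate node chosen to host the current virtual node, with infeasible nodes masked out:

   import random

   # A per-step node-embedding action: pick one substrate node for the
   # current virtual node, masking out nodes without enough remaining CPU.
   def feasible_actions(sn, cpu_demand):
       return [n for n, attr in sn["nodes"].items() if attr["cpu"] >= cpu_demand]

   def sample_action(sn, cpu_demand):
       # Illustrative random policy; a trained policy model would replace this.
       candidates = feasible_actions(sn, cpu_demand)
       if not candidates:
           return None       # no feasible substrate node: the VNR is rejected
       return random.choice(candidates)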

4.3. Reward Description

Designing rewards is an important issue for an RL method. In general, the reward is the benefit that an RL agent receives when performing its chosen action. The reward is an immediate value that evaluates only the current state and action, and its value depends on the success or failure of each step. In order to obtain the best results in the long run, an RL agent needs to select the actions with the highest cumulative reward.

The reward is calculated through a reward function according to the objective of the environment, and even in the same environment it may differ depending on the operator's objective. Based on the given reward, the agent can evaluate the effectiveness of its actions and improve its policy. Hence, the reward function plays an important role in the training process of RL. In the VNE problem, the overall objectives are to reduce VNR rejection, embed VNRs at minimum cost, maximize the revenue, and increase the utilization of physical resources. The reward function should be designed to achieve one or more of these objectives. Each objective and its corresponding reward design are outlined as follows, with an illustrative reward sketch after the list:

Revenue
Revenue is the sum of the virtual resources requested by the VN, calculated from the requested node and link resources. Typically, a successful action (e.g., the VNR is embedded without violation) is treated as a good reward, which also increases the revenue. Conversely, a failed action (e.g., the VNR is rejected) means that the agent receives a negative reward and the revenue decreases.
Acceptance Ratio
The acceptance ratio is the number of successfully embedded virtual network requests divided by the total number of virtual network requests. To achieve a high acceptance ratio, the agent tries to embed as many VNRs as possible and thereby obtain a good reward; the cumulative reward is usually proportional to the acceptance ratio.
Revenue-to-cost ratio
To balance and compare the cost of the resources used for embedding a VNR, the revenue is divided by the cost. The revenue-to-cost ratio compares embedding methods with respect to their embedding results in terms of cost and revenue. Since most VNOs are particularly interested in this objective, a reward function should be related to this performance metric.
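
The sketch below shows one way these metrics could be turned into a reward signal, assuming the data structures used in the earlier sketches. The hop-count cost model, the hypothetical embedding["paths"] structure, and the rejection penalty of -1.0 are illustrative assumptions, not a standard definition.

   # Illustrative reward based on revenue and cost (assumptions noted above).
   def revenue(vnr):
       cpu = sum(a["cpu"] for a in vnr["nodes"].values())
       bw  = sum(a["bw"]  for a in vnr["links"].values())
       return cpu + bw

   def cost(vnr, embedding):
       # embedding["paths"] maps each virtual link to its substrate path
       # (a list of substrate links), so bandwidth is paid once per hop.
       cpu = sum(a["cpu"] for a in vnr["nodes"].values())
       bw  = sum(vnr["links"][vl]["bw"] * len(embedding["paths"][vl])
                 for vl in vnr["links"])
       return cpu + bw

   def reward(vnr, embedding, accepted):
       if not accepted:
           return -1.0                                       # rejection penalty
       return revenue(vnr) / max(cost(vnr, embedding), 1)    # revenue-to-cost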

4.4. Policy and RL methods

The policy is the strategy that the agent employs to determine the next action based on the current state. It maps states to the actions that promise the highest reward. Therefore, an RL agent updates its policy repeatedly in the learning phase to maximize the expected cumulative reward. Unlike supervised learning, in which each sample has a corresponding label indicating the preferred output of the learning model, an RL agent relies on reward signals to evaluate the effectiveness of actions and further improve the policy. From the perspective of RL, the goal of VNE is to find an optimal policy to embed a VNR onto the given SN in any state at any time. There are two types of RL methods: on-policy and off-policy. In on-policy RL methods, the (behaviour) policy used in the exploration step to select an action and the policy being learned are the same. On-policy methods work with a single policy and require that any observations (state, action, reward, next state) have been generated using that policy. Representative on-policy methods include A2C, A3C, TRPO, and PPO. On the other hand, off-policy RL methods work with two policies.

These are the policy being learned, called the target policy, and the policy being followed that generates the observations, called the behaviour policy. In off-policy RL methods, the learning policy and the behaviour policy are not necessarily the same. This allows the use of exploratory policies for collecting experience, since the learning and behaviour policies are separated. In the VNE problem, various experiences can be accumulated by extracting embedding results using various behaviour policies. Representative off-policy methods include Q-learning, DQN, DDPG, and SAC. There is another classification of RL methods: model-based and model-free. In model-based RL methods, an RL agent learns its optimal behavior indirectly by learning a model of the environment, taking actions and observing the outcomes, which include the next state and the immediate reward.

The model predicts the outcomes of actions and is used instead of the environment, or in addition to interaction with it, to learn optimal policies. This becomes impractical, however, when the state and action spaces are large. Unlike model-based methods, model-free RL methods learn directly by trial and error with the environment and do not require the relatively large memory needed for a model. Since data efficiency and safety are very important even in VNE problems, the use of model-based methods can be actively considered. However, since it is not easy to build a good model that mimics a real network environment, a model-free RL method may be more suitable for VNE problems. In conclusion, a good choice of RL method plays an important role in solving the VNE problem, and the VNE performance metrics vary depending on the selected RL method.
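
As a minimal example of an off-policy, model-free method mentioned above, a tabular Q-learning update for a (state, action) pair could look as follows. This is purely illustrative; real VNE state spaces are far too large for a table and would use a function approximator such as a DQN instead.

   from collections import defaultdict

   Q = defaultdict(float)            # Q[(state, action)] -> estimated return
   alpha, gamma = 0.1, 0.95          # learning rate and discount factor

   def q_update(state, action, reward, next_state, next_actions):
       # Off-policy: bootstrap from the greedy action in the next state,
       # regardless of which behaviour policy generated the experience.
       best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
       Q[(state, action)] += alpha * (reward + gamma * best_next
                                      - Q[(state, action)])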

4.5. Training Environment

Simulation is the use of software to imitate an interacting environment that is difficult to actually execute and test. An RL method learns by iteratively interacting with the environment. However, in a real environment, variables such as failures and component wear exist. Therefore, it is necessary to learn through a simulation that imitates the real environment. To solve the VNE problem, we need to use a network simulator similar to the real environment, because it is difficult to repeatedly experiment on real network environments with an RL method, and it is very challenging to apply an RL method directly to real-world environments. When solving VNE problems, a network simulation environment similar to a real network is therefore required. The network simulation environment should have a general SN environment and the VNRs required by the operator. The SN has nodes and links between nodes, and each has capacities such as CPU and bandwidth. In the case of a VNR, there are the virtual nodes and links required by the operator, and each must have its own requirements.
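
A skeleton of such a simulation environment, with hypothetical names and a reset/step interface, might look as follows. This is a sketch only and is not tied to any particular simulator.

   # Skeleton of a VNE training environment with a reset/step interface.
   # The class and method names are illustrative, not tied to any simulator.
   class VNESimEnv:
       def __init__(self, substrate, vnr_generator):
           self.substrate = substrate            # SN with CPU/bandwidth capacities
           self.vnr_generator = vnr_generator    # callable producing random VNRs
           self.vnr = None

       def reset(self):
           self.vnr = self.vnr_generator()       # draw the next VNR to embed
           return (self.substrate, self.vnr)     # initial state

       def step(self, action):
           # Apply the embedding action, update remaining capacities, and
           # compute the reward; the details are omitted in this sketch.
           reward, done = 0.0, True
           return (self.substrate, self.vnr), reward, done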

4.6. Sim2Real Gap

An RL method iteratively learns through a simulation environment to train a model of the desired policy. The trained model is then applied to the real environment and/or tuned further to adapt to it. However, when the model trained in the simulation is applied to the real environment, the sim2real gap problem arises. Obviously, the simulation environment does not match the real environment perfectly, so the tuning process often fails and the model gives poor performance because of the sim2real gap. The sim2real gap is caused by the difference between the simulation and the real environment.

This is because the simulation environment cannot perfectly simulate the real environment, and there are many variables in the real environment. In a real network environment for VNE, the SN's nodes and links may fail due to external factors, or capacities such as CPU may change suddenly. To solve this problem, the simulation environment should be made more robust or the trained RL model should be generalized. To reduce the gap between the simulated and real network environments, we need to train the model with a large and diverse set of VNRs and keep the agent learning, so that it does not depend only on memorization of previous experience.
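
One illustrative option for making the simulation more robust, mentioned here only as an example, is to randomly perturb the simulated SN during training (a form of domain randomization). The function below is a hypothetical sketch using the data structures from the earlier sketches.

   import random

   # Randomly fail links and jitter CPU capacities to mimic real variability
   # (illustrative domain randomization; parameters are arbitrary examples).
   def perturb_substrate(sn, failure_prob=0.05, cpu_jitter=0.1):
       for link in list(sn["links"]):
           if random.random() < failure_prob:
               del sn["links"][link]             # simulated link failure
       for attr in sn["nodes"].values():
           attr["cpu"] *= 1.0 + random.uniform(-cpu_jitter, cpu_jitter)
       return sn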

4.7. Generalization

Generalization refers to the trained model's ability to adapt properly to previously unseen observations. An RL method tries to learn a model that optimizes some objective, with the purpose of performing well on data that the model has never seen during training. In terms of VNE problems, generalization is a measure of how well the agent's policy model performs on unseen VNRs. The RL agent should not only memorize all previously seen VNR variations but also learn and explore further possible variations. It is important to have good and efficient training data covering a wide variance of VNRs and to train the model with as many different VNRs as practical.
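
Generalization can be checked by evaluating the trained policy on VNRs that were never seen during training, for example by measuring the acceptance ratio on a held-out set. The sketch below reuses the hypothetical env and agent objects from the earlier sketches and assumes a positive reward indicates acceptance.

   # Acceptance ratio of the trained agent on previously unseen VNRs
   # (illustrative; env and agent are the hypothetical objects used above).
   def evaluate(agent, env, num_unseen_vnrs=100):
       accepted = 0
       for _ in range(num_unseen_vnrs):
           state = env.reset()                   # env draws a fresh, unseen VNR
           action = agent.select_action(state)
           _, reward, done = env.step(action)
           if reward > 0:                        # positive reward == accepted here
               accepted += 1
       return accepted / num_unseen_vnrs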

5. IANA Considerations

This memo includes no request to IANA.

6. Security Considerations

This is an Informational draft that details research challenges. It does not introduce any security threat.

7. Informative References

[ASNVT2020]
Sharif, Kashif., Li, Fan., Latif, Zohaib., Karim, MM., and Sujit. Biswas, "A Survey of Network Virtualization Techniques for Internet of Things Using SDN and NFV", DOI 10.1145/3379444, , <https://doi.org/10.1145/3379444>.
[CDVNE2020]
"A Continuous-Decision Virtual Network Embedding Scheme Relying on Reinforcement Learning", DOI 10.1109/TNSM.2020.2971543, , <https://ieeexplore.ieee.org/document/8982091>.
[DeepViNE2019]
Dolati, M., Hassanpour, S. B., Ghaderi, M., and A. Khonsari, "DeepViNE: Virtual Network Embedding with Deep Reinforcement Learning", DOI 10.1109/INFCOMW.2019.8845171, , <https://ieeexplore.ieee.org/document/8845171>.
[ENViNE2021]
ULLAH, IHSAN., Lim, Hyun-Kyo., and Youn-Hee. Han, "Ego Network-Based Virtual Network Embedding Scheme for Revenue Maximization", DOI 10.1109/ICAIIC51459.2021.9415185, , <https://ieeexplore.ieee.org/document/9415185>.
[MLCNM2018]
Ayoubi, Sara., Limam, Noura., Salahuddin, Mohammad., Shahriar, Nashid., Boutaba, Raouf., Estrada-Solano, Felipe., and Oscar. M. Caicedo, "Machine Learning for Cognitive Network Management", DOI 10.1109/MCOM.2018.1700560, , <https://ieeexplore.ieee.org/document/8255757>.
[MOQL2018]
"Multi-Objective Virtual Network Embedding Algorithm Based on Q-learning and Curiosity-Driven", DOI 10.1109/TETC.2018.2871549, , <https://jwcn-eurasipjournals.springeropen.com/articles/10.1186/s13638-018-1170-x>.
[MVNE2020]
"Modeling on Virtual Network Embedding using Reinforcement Learning", DOI 10.1002/cpe.6020, , <https://doi.org/10.1002/cpe.6020>.
[MVNNML2021]
Boutaba, Raouf., Shahriar, Nashid., Salahuddin, Mohammad A., and Noura. Limam, "Managing Virtualized Networks and Services with Machine Learning", , <https://www.semanticscholar.org/paper/Managing-Virtualized-Networks-and-Services-with-Boutaba-Shahriar/48b8fc73c1609d4632d7db5e67e373a62a3cc1f6>.
[NeuroViNE2018]
"NeuroViNE: A Neural Preprocessor for Your Virtual Network Embedding Algorithm", DOI 10.1109/INFOCOM.2018.8486263, , <https://ieeexplore.ieee.org/document/8486263>.
[NFVDeep2019]
Xiao, Y., Zhang, Q., Liu, F., Wang, J., Zhao, M., Zhang, Z., and J. Zhang, "NFVdeep: Adaptive Online Service Function Chain Deployment with Deep Reinforcement Learning", DOI 10.1145/3326285.3329056, , <https://doi.org/10.1145/3326285.3329056>.
[NRRL2020]
"Network Resource Allocation Strategy Based on Deep Reinforcement Learning", DOI 10.1109/OJCS.2020.3000330, , <https://ieeexplore.ieee.org/document/9109671>.
[PPRL2020]
"A Privacy-Preserving Reinforcement Learning Algorithm for Multi-Domain Virtual Network Embedding", DOI 10.1109/TNSM.2020.2971543, , <https://ieeexplore.ieee.org/document/8982091>.
[QLDC2019]
"A Q-Learning-Based Approach for Virtual Network Embedding in Data Center", DOI 10.1007/s00521-019-04376, , <https://link.springer.com/article/10.1007/s00521-019-04376-6>.
[QVNE2020]
Yuan, Y., Tian, Z., Wang, C., Zheng, F., and Y. Lv, "A Q-learning-Based Approach for Virtual Network Embedding in Data Center", DOI 10.1007/s00521-019-04376-6, , <https://link.springer.com/article/10.1007/s00521-019-04376-6>.
[RDAM2018]
"RDAM: A Reinforcement Learning Based Dynamic Attribute Matrix Representation for Virtual Network Embedding", DOI 10.1109/TETC.2018.2871549, , <https://ieeexplore.ieee.org/document/8469054>.
[RFC7364]
Narten, T., Ed., Gray, E., Ed., Black, D., Fang, L., Kreeger, L., and M. Napierala, "Problem Statement: Overlays for Network Virtualization", RFC 7364, DOI 10.17487/RFC7364, October 2014, <https://datatracker.ietf.org/doc/rfc7364/>.
[RLVNEWSN2020]
"Reinforcement Learning for Virtual Network Embedding in Wireless Sensor Networks", DOI 10.1109/WiMob50308.2020.9253442, , <https://ieeexplore.ieee.org/document/9253442>.
[SUR2018]
Boutaba, Raouf., Salahuddin, Mohammad., Limam, Noura., Ayoubi, Sara., Shahriar, Nashid., Estrada-Solano, Felipe., and Oscar. M. Caicedo, "A Comprehensive survey on Machine Learning for Networking: Evolution, Applications and Research Opportunities", DOI 10.1186/s13174-018-0087-2, , <https://link.springer.com/article/10.1186/s13174-018-0087-2>.
[VNEGCN2020]
Yan, Z., Ge, J., Wu, Y., Li, L., and T. Li, "Automatic Virtual Network Embedding: A Deep Reinforcement Learning Approach With Graph Convolutional Networks", DOI 10.1109/JSAC.2020.2986662, , <https://ieeexplore.ieee.org/document/9060910>.
[VNEQS2021]
Wang, Chao., Batth, Ranbir Singh., Zhang, Peiying., Aujla, Gagangeet., Duan, Youxiang., and Lihua. Ren, "VNE Solution for Network Differentiated QoS and Security Requirements: From the Perspective of Deep Reinforcement Learning", DOI 10.1007/s00607-020-00883-w, , <https://link.springer.com/article/10.1007/s00607-020-00883-w>.
[VNESURV2013]
Fischer, Andreas., Botero, Juan Felipe., Till Beck, Michael., De Meer, Hermann., and Xavier. Hesselbach, "Virtual Network Embedding: A Survey", DOI 10.1109/SURV.2013.013013.00155, , <https://doi.org/10.1109/SURV.2013.013013.00155>.
[VNETD2019]
Wang, S., Bi, J., V.Vasilakos, A., and Q. Fan, "VNE-TD: A Virtual Network Embedding Algorithm Based on Temporal-Difference Learning", DOI 10.1016/j.comnet.2019.05.004, , <https://doi.org/10.1016/j.comnet.2019.05.004>.
[VNFFG2020]
Anh Quang, P.T., Hadjadj-Aoul, Y., and A. Outtagarts, "Evolutionary Actor-Multi-Critic Model for VNF-FG Embedding", DOI 10.1109/CCNC46108.2020.9045434, , <https://ieeexplore.ieee.org/document/9045434>.
[ZTORCH2018]
Sciancalepore, V., Chen, X., Yousaf, F. Z., and X. Costa-Perez, "Z-TORCH: An Automated NFV Orchestration and Monitoring Solution", DOI 10.1109/TNSM.2018.2867827, , <https://ieeexplore.ieee.org/document/8450000>.

Authors' Addresses

Ihsan Ullah
KOREATECH
1600, Chungjeol-ro, Byeongcheon-myeon, Dongnam-gu
Cheonan
Chungcheongnam-do
31253
Republic of Korea
Youn-Hee Han
KOREATECH
1600, Chungjeol-ro, Byeongcheon-myeon, Dongnam-gu
Cheonan
Chungcheongnam-do
31253
Republic of Korea
TaeYeon Kim
ETRI
218 Gajeong-ro, Yuseong-gu
Daejeon
34129
Republic of Korea