IoT Operations Working Group                             F. Foukalas
Internet-Draft                                           A. Tziouvaras
Intended status: Draft Standard                          September 27, 2021
Expires: September, 2022                                  


Status of this Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF).  Note that other groups may also distribute
working documents as Internet-Drafts.  The list of current Internet-
Drafts is at

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on September 22, 2022.

Copyright Notice

Copyright (c) 2021 IETF Trust and the persons identified as the 
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document.


Abstract

The next-generation Internet requires decentralized and distributed
intelligence in order to offer a new type of experience that serves
the user's interests. Such new services will be enabled by deploying
intelligence over a large number of IoT devices in the form of a
distributed protocol. Such a protocol will orchestrate the machine
learning (ML) application in order to train on the aggregated data
available from the IoT devices. Training is not an easy task in such
a distributed environment, where the number of connected IoT devices
will scale up and the needs for both interoperability and computing
are high. This draft addresses both issues by combining two emerging
technologies known as edge AI and fog computing. The protocol
procedures aggregate the data collected by the IoT devices into a
fog node and apply edge AI for data analysis at the edge of the
infrastructure. The analysis of the IoT requirements resulted in an
end-to-end ML protocol specification, which is presented throughout
this draft.

Table of Contents

1. Introduction
2. Background and terminology
3. Edge computing architecture
4. Protocol stages
4.1. Initial configuration
4.2. FL training
4.3. Cloud update
5. Security Considerations
6. IANA Considerations
7. Conclusions
8. References
8.1. Normative References
9. Acknowledgments

1. Introduction

There is an evident need to address several challenges in order to
offer robust IoT services by integrating Edge computing with IoT,
known as IoT edge computing. The concept of IoT edge computing has
not yet been specified in detail, although two recent drafts have
already described some aspects of such an Internet architecture.
Such an architecture is considerably more useful when distributed
machine learning is deployed in the future Internet, where edge
artificial intelligence will play an important role. Towards this
end, this draft first provides the IoT edge computing architecture,
which includes the elements necessary to deploy distributed machine
learning. Second, three stages of such distributed intelligence are
described as protocol procedures: initialization, learning and
cloud update. Details are given for all the protocol procedures of
distributed machine learning for IoT edge computing.

2. Background and terminology 

Below we list a number of terms related to the distributed
machine learning solution:

End devices: End devices [1] are IoT devices that collect
data while also having computing and networking capabilities.
An end device can be any type of device that can connect to the
Edge gateway and that hosts sensors for data collection.

Edge gateway: The Edge gateway is a server located at the edge
of the network [1]. It provides large computational and networking
capabilities and coordinates the FL process. The Edge gateway
relieves traffic on the network backhaul, as the end devices
connect to the Edge instead of the cloud.

Cloud: The cloud provides very large computational capabilities [1]
and is geographically located far from the end devices. It is
accessible to the Edge gateway and remains agnostic of the number
and type of participating end devices. As a result, the cloud does
not have an active role in the FL training process.

Federated learning (FL): FL is a distributed ML technique in which
a large number of end devices train their ML models locally without
communicating with each other. The locally trained models are
dispatched to the Edge gateway, which aggregates the collected
models into one global model. The global model is then broadcast to
the end devices so that the next training round can begin. During
the FL process, the end devices do not share their local data; only
the trained models are sent to the Edge gateway.

Constrained Application Protocol (CoAP): CoAP is a UDP-based
communication protocol that supports lightweight communication
between two entities [RFC 7252]. CoAP is well suited to devices with
limited computational capabilities, as it does not require a full
protocol stack to operate. CoAP defines the following message types:
Confirmable (CON) messages, Non-confirmable (NON) messages,
Acknowledgement (ACK) messages and Reset (RST) messages.
CON messages provide reliable delivery: a Confirmable message is
retransmitted using a default timeout and exponential back-off
between retransmissions, until the recipient sends an
Acknowledgement message (ACK) with the same Message ID. When a
recipient is not able to process a Confirmable message, it replies
with a Reset message (RST) instead of an Acknowledgement. NON
messages do not require reliable transmission. They are not
acknowledged, but still carry a Message ID for duplicate detection.
When a recipient is not able to process a Non-confirmable message,
it may reply with a Reset message (RST).
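
As a minimal illustration of the four message types, the Python
sketch below packs and parses the fixed 4-byte CoAP header defined
in RFC 7252 (version, type, token length, code and Message ID).
The names MsgType, pack_header and unpack_header are illustrative
assumptions and do not belong to any CoAP library; options and
payload are omitted.

```python
import struct
from enum import IntEnum

COAP_VERSION = 1  # the only version defined by RFC 7252


class MsgType(IntEnum):
    """The 2-bit Type field of the CoAP header (RFC 7252)."""
    CON = 0  # Confirmable
    NON = 1  # Non-confirmable
    ACK = 2  # Acknowledgement
    RST = 3  # Reset


def pack_header(msg_type, code, message_id, token=b""):
    """Build the 4-byte fixed header followed by the token (0-8 bytes)."""
    if len(token) > 8:
        raise ValueError("CoAP token length must be 0..8 bytes")
    # First byte: Ver (2 bits) | Type (2 bits) | Token Length (4 bits)
    first = (COAP_VERSION << 6) | (int(msg_type) << 4) | len(token)
    # Code is 1 byte, Message ID is 2 bytes in network byte order
    return struct.pack("!BBH", first, code, message_id) + token


def unpack_header(data):
    """Return (version, type, code, message_id, token) from raw bytes."""
    first, code, message_id = struct.unpack("!BBH", data[:4])
    token_length = first & 0x0F
    token = data[4:4 + token_length]
    return first >> 6, MsgType((first >> 4) & 0x03), code, message_id, token
```

For example, a NON request and the RST that a recipient may send in
reply share the same Message ID, which is how the sender matches
the two: pack_header(MsgType.NON, 0x02, 0x1234) and
pack_header(MsgType.RST, 0x00, 0x1234).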

3. Edge computing architecture

Fig. 1 below depicts the IoT architecture we employ, where the three
main entities are the end devices, the edge gateway and the cloud
server. Below we describe the functionality of each entity and how
it interacts with the others.

End devices: End devices can be classified into constrained and 
non-constrained according to the processing capabilities they 
employ. Previous work in [2] classifies the end devices into the 
following categories:

Class 0 (C0): This class contains sensor-like devices. Although 
they may answer keep-alive signals and send basic indications, 
they most likely do not have the resources to securely 
communicate with the Internet directly (larger devices act as 
proxies, gateways, or servers) and cannot be secured or managed 
comprehensively in the traditional sense.

Class 1 (C1): Such devices are quite constrained in code space 
and processing capabilities and cannot easily talk to other 
Internet nodes nor employ a full protocol stack. Thus they 
are considered ideal for the Constrained Application Protocol 
(CoAP) over UDP.

Class 2 (C2): C2 devices are less constrained and capable of 
supporting most of the same protocol stacks as servers and 
laptop computers.

Other (C3): Devices with capabilities significantly beyond that 
of Class 2 are left uncategorized (Others). They may still be 
constrained by a limited energy supply, but can largely use 
existing protocols unchanged.

In our IoT architecture, cameras serve as C1 devices and mobile
phones as C2 or Other (C3) devices. Each device stores a local
dataset independently of the others and has no access to the
datasets of the other devices. End devices are also responsible for
training their local ML model and for reporting the trained model to
the edge gateway for the aggregation process.

Edge gateway: The edge gateway is responsible for collecting the
locally trained models from the end devices and for aggregating
them into a global model. Further, the edge gateway is
responsible for dispatching the trained model to the cloud in
order to make it available to the developers. To support these
services, the edge gateway employs the following controllers and
interfaces:

Southbound controller: The southbound interface is responsible
for handling the communication between the edge gateway and
the end devices [5]. The southbound controller also performs
the resource discovery, resource authentication, device
configuration and global model dispatch tasks. The resource
discovery process detects and identifies the devices that
participate in the FL training and establishes a communication
link between the edge and each device. The resource
authentication process authenticates the end devices by matching
each device's unique ID against a trusted ID list that is stored
at the edge. The device configuration process broadcasts the ML
model hyperparameters to the participating end devices. Finally,
the global model dispatch operation broadcasts the aggregated
global model to the trusted connected devices.

Central controller: The Central controller is the core component
of Network Artificial Intelligence and can be thought of as the
"Network Brain" [4]. It carries out the FL aggregation process and
is responsible for stopping the FL process when the model
converges. It also performs the data sharing, global model
training, global model aggregation and device scheduling
functions.

Northbound interface: The northbound interface is provided by a
gateway component to a remote network [5], e.g. a cloud, home
or enterprise network. The northbound interface is a data plane
interface that manages the communication of the edge gateway with
the cloud. Under this premise, the northbound interface is
responsible for the model sharing and model publish functions.
Model sharing is the function by which the edge is authenticated
by the cloud as a trusted party and thus gains the right to upload
the trained FL model to the cloud. Model publish is the process of
uploading the trained model to the cloud in order to make it
available to the developers.

Cloud server: The Cloud server may provide virtually unlimited
storage and processing power [3].  The reliance of IoT on
back-end cloud computing brings additional advantages such
as flexibility and efficiency.  The cloud hosts the trained FL
model, which can be used by developers for AR

FL model: The FL model should operate separately from the dataset
used for the training process. In this sense, the ML model
architecture and the dataset type may change without affecting the
overall FL training process. This interoperability is ensured because
we design the FL process independently of the web protocol; thus,
the end device-edge communication is not affected by any changes in
the IoT architecture. Further, the dataset of each device is stored
locally and interacts only with the local FL model, while the edge
does not have any access to it. As a result, the FL training
functionality is not affected by the dataset type or size, or by the
FL model architecture.

|                                                                  |
| +------------------------+                                       |
| | End devices            |                                       |
| | * Data collection      |                                       |
| | * Reporting            |                                       |
| | * Local model training |                                       |
| +------------------------+                                       |
|       | FL training                                              |
|       |                                                          |
| +---------------------------------------------------------------+|                                               
| | Edge gateway                                                  ||
| |                                                               ||
| | +------------------+  +----------------+  +-----------------+ ||
| | | Southbound       |  | Central        |  | Northbound      | ||
| | | interface        |  | controller     |  | interface       | ||
| | |                  |  |                |  |                 | ||
| | | * Resource       |  | * Device       |  | * Model sharing | ||
| | |   discovery      |  |   scheduling   |  | * Model publish | || 
| | | * Resource       |  | * Global model |  +-----------------+ ||
| | |   authentication |  |   aggregation  |                      ||
| | | * Device         |  +----------------+                      ||
| | |   configuration  |                                          ||
| | | * Global model   |                                          ||
| | |   dispatch       |                                          ||
| | +------------------+                                          ||
| |                                                               ||                                                               
| +---------------------------------------------------------------+|
|      |                                                           |
|      | Model to cloud                                            |                                                                                                                     
| +---------------+                                                |
| | Cloud server  |                                                |
| |               |                                                |
| | * Store model |                                                |
| +---------------+                                                |
|                                                                  |
Figure 1: Protocol architecture

4. Protocol stages

In this section we describe the stages which are used by the Edge 
computing protocol to perform the FL process.

4.1. Initial configuration

Fig. 2 below depicts the initial configuration stage of the Edge
IoT protocol using CoAP. The initial configuration stage provides
the functionalities necessary for establishing the IoT-edge gateway
communication link and for identifying the end devices that will
participate in the training process. These functionalities are as
follows:

1. Resource discovery: The end devices are discovered by the edge
and employ CoAP to inform the edge gateway about their
computational capabilities. More specifically, each end device sends
a NON message to the edge containing the resource type of the
device, i.e. C0, C1, C2 or C3. Since NON messages are not
acknowledged, the edge replies with an RST message only in case of
a transmission error. The edge then decides which device types may
participate in the training process and sends back a NON message
containing the resource discovery decision to the corresponding
devices.

2. Resource authentication: The end devices are authenticated by the
edge as trusted parties and are then allowed to participate in the
training process. Unauthenticated devices cannot participate in the
training. To this end, each previously discovered end device sends
a NON message to the edge containing its ID. The edge informs a
device that it failed to receive the corresponding ID by dispatching
an RST message. Once the edge has collected the IDs of all devices,
it performs the device authentication process, which designates the
end devices that will participate in the FL process. Finally, each
device is informed about the edge decision by a NON message that
contains the authentication outcome. Only authenticated end devices
are eligible to participate in the FL training.

3. Device scheduling: The edge gateway selects how many of the
authenticated end devices will participate in the training and
informs them of its decision by dispatching a NON message containing
this information to each authenticated device. A device sends back
an RST response in case of transmission failure, which causes the
edge to retransmit the message. After successful transmission of
the original NON message, the eligible devices proceed to the
device configuration phase.

4. Device configuration: The edge gateway employs CoAP to broadcast
the FL model hyperparameters to the end devices so that they can
properly configure their local models. To this end, the end devices
first dispatch a NON message informing the edge about their
available computational resources. The edge sends back an RST
response in case of a transmission error, or no message in case of
successful delivery. The edge then processes the obtained
information and designates the model architecture and ML parameters
that will be used for the FL process. Finally, it broadcasts these
decisions back to the end devices through a NON message, and all
eligible devices enter the training phase.

After the initial configuration process completes, the Edge IoT protocol 
continues to the FL training stage.

|  +-------------+                 +--------------+                |
|  | End devices |                 | Edge gateway |                |     
|  +-------------+                 +--------------+                |   
|         |   Non message {Resource type}  |                       |
|         |------------------------------->|                       |
|         |                                |                       |
|         |                      +------------------+              |
|         |                      |Resource discovery|              |
|         |                      +------------------+              |
|         |                                |                       |
|         |   Non message {discovery}      |                       |
|         |<-------------------------------|                       |      
|         |   Non message {Device ID}      |                       |
|         |------------------------------->|                       |
|         |                                |                       |
|         |                    +-----------------------+           |
|         |                    |Resource Authentication|           |
|         |                    +-----------------------+           |
|         |                                |                       |
|         |   Non message {Authentication} |                       |
|         |<-------------------------------|                       |         
|         |                       +-----------------+              |
|         |                       |Device scheduling|              |
|         |                       +-----------------+              |
|         |                                |                       |
|         |  Non message {Scheduling info.}|                       |
|         |<-------------------------------|                       |
|         |                                |                       |
|         |   Non message {Avl. Resources} |                       |
|         |------------------------------->|                       |
|         |                                |                       |
|         |                        +----------------+              |
|         |                        |FL configuration|              |
|         |                        +----------------+              |
|         |                                |                       |
|         |   Non message {Hyperparameters}|                       |
|         |<-------------------------------|                       |         
|         |                                |                       |
Figure 2: Protocol initial configuration stage.
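
The edge-side decisions of the initial configuration stage can be
sketched in Python as a chain of four filters. This is a minimal
sketch under stated assumptions, not the draft's implementation:
the set of eligible device classes, the trusted ID list, the
hyperparameter names and values, and the rule "pick the first N
devices by ID" are all hypothetical, and the CoAP message exchange
itself is left out.

```python
from dataclasses import dataclass
from enum import Enum


class DeviceClass(Enum):
    C0 = "C0"
    C1 = "C1"
    C2 = "C2"
    C3 = "C3"


@dataclass(frozen=True)
class Device:
    device_id: str
    device_class: DeviceClass


# Hypothetical policy values; the draft does not fix any of them.
ELIGIBLE_CLASSES = {DeviceClass.C1, DeviceClass.C2, DeviceClass.C3}
TRUSTED_IDS = {"cam-01", "phone-07", "phone-09"}
HYPERPARAMETERS = {"learning_rate": 0.01, "local_epochs": 1}


def discover(devices):
    """Step 1: keep devices whose reported resource type is eligible."""
    return [d for d in devices if d.device_class in ELIGIBLE_CLASSES]


def authenticate(devices):
    """Step 2: keep devices whose ID is on the edge's trusted ID list."""
    return [d for d in devices if d.device_id in TRUSTED_IDS]


def schedule(devices, max_participants):
    """Step 3: select at most max_participants devices (assumed rule)."""
    return sorted(devices, key=lambda d: d.device_id)[:max_participants]


def configure(devices):
    """Step 4: map each scheduled device ID to its hyperparameters."""
    return {d.device_id: dict(HYPERPARAMETERS) for d in devices}


def initial_configuration(devices, max_participants):
    """Run the four steps in the order described in Section 4.1."""
    eligible = authenticate(discover(devices))
    return configure(schedule(eligible, max_participants))
```

Each function corresponds to one decision box on the edge gateway
side of Fig. 2; the return value of initial_configuration stands in
for the payloads of the final {Hyperparameters} messages.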

4.2. FL training

The FL training stage is the stage in which the actual FL takes
place. Fig. 3 depicts the functionalities we employ in order to
support the FL process. These functionalities are as follows:

1. Local model training: The end devices that are eligible to
participate in the FL training send a NON message to request the ML
model from the edge. The edge responds with an RST message if
necessary, to trigger retransmission of the original NON message.
The edge then dispatches the global model to the end devices, again
using the NON message format. A device responds with an RST message
if the transmission resulted in errors, in which case the edge
retransmits the NON message to that device. Afterwards, each device
trains the model locally using its local dataset.

2. Device reporting: Once a device completes the local model
training, it dispatches its model to the edge gateway through the
device reporting process. Due to the constrained nature of the
participating devices, the end device-edge communication uses the
NON message format. To this end, the devices dispatch their IDs and
their locally trained models to the edge via NON messages, which are
not followed by an ACK from the server side. Instead, if the Edge
fails to obtain a local model, it notifies the corresponding end
device with an RST reply, which triggers retransmission of the
original NON message to the Edge. After the edge obtains every local
model, it conducts the global model aggregation process and produces
one global model, which is broadcast back to the devices. The FL
training process is repeated until the predefined number of FL
rounds is reached.

After the FL training completes, the edge computing protocol enters 
the cloud update stage.

|  +-------------+                 +--------------+                |
|  | End devices |                 | Edge gateway |                |     
|  +-------------+                 +--------------+                |   
|         |   Non message {Model request}  |                       |
|         |------------------------------->|                       |
|         |                                |                       |
|         |   Non message {Global model}   |                       |
|         |<-------------------------------|                       |      
|   +------------+                         |                       |
|   | Local model|                         |                       |
|   |  training  |                         |                       |
|   +------------+                         |                       |
|         |  Non message {Local model}     |                       |
|         |------------------------------->|                       |
|         |                                |                       |
|         |                        +------------------------+      |
|         |                        |Global model aggregation|      |
|         |                        +------------------------+      |
|         |   Non message {Model request}  |                       |
|         |------------------------------->|                       |
|         |                                |                       |
|         |   Non message {Global model}   |                       |
|         |<-------------------------------|                       |     
|   +------------+                         |                       |
|   | Local model|                         |                       |
|   |  training  |                         |                       |
|   +------------+                         |                       |
|         |                                |                       |
|         |                                |                       |
|                                                                  |
Figure 3: Protocol training stage.
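
The round structure of Fig. 3 can be sketched in Python as follows.
The draft does not specify the aggregation rule or the local
training algorithm, so this sketch assumes a plain element-wise
average of the reported models and uses a placeholder local update;
local_train, aggregate and run_fl are hypothetical names. Note that
the local samples are only read inside local_train, so only model
weights reach the aggregation step.

```python
from typing import Dict, List

Weights = List[float]


def local_train(global_model: Weights, local_data: List[Weights],
                lr: float = 0.5) -> Weights:
    """Placeholder local training (assumption): move each weight
    toward the mean of the device's local samples."""
    n = len(local_data)
    means = [sum(sample[i] for sample in local_data) / n
             for i in range(len(global_model))]
    return [w + lr * (m - w) for w, m in zip(global_model, means)]


def aggregate(local_models: Dict[str, Weights]) -> Weights:
    """Element-wise average of the reported local models
    (assumed aggregation rule)."""
    models = list(local_models.values())
    return [sum(ws) / len(models) for ws in zip(*models)]


def run_fl(global_model: Weights, datasets: Dict[str, List[Weights]],
           rounds: int) -> Weights:
    """Repeat dispatch, local training, reporting and aggregation
    for a predefined number of rounds."""
    for _ in range(rounds):
        # Device reporting: device ID mapped to its locally trained model
        reports = {device_id: local_train(global_model, data)
                   for device_id, data in datasets.items()}
        global_model = aggregate(reports)
    return global_model
```

A deployment that weights devices by dataset size, or that stops on
convergence rather than after a fixed number of rounds, would only
need to change aggregate and the loop condition in run_fl.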

4.3. Cloud update

Fig. 4 below depicts the cloud update stage of the Edge computing 
protocol which is invoked after the FL training completes. 
Cloud update consists of the following functionalities:

1. Model sharing: The edge gateway informs the cloud of its
intention to upload the trained FL model. The cloud then
authenticates the edge and decides whether it can be considered a
trusted party. When the model sharing process completes
successfully, the edge is authenticated and can proceed to the
model publish functionality. Since no IoT devices participate in
this exchange, we use the more reliable CON message format instead
of relying on NON messages. To this end, the edge dispatches a CON
message containing its ID to the cloud to inform it that the FL
process has been completed. The cloud responds with an ACK or RST
reply that indicates whether the initial request was successfully
delivered. The cloud then performs the edge authorization procedure
according to the received ID and sends a CON message to the edge
that contains the authorization result.

2. Model publish: The edge sends the trained model and the model
version to the cloud through a CON message. The edge then waits for
an ACK or RST reply, depending on the success of the transmission.
If the model is transmitted without errors, the cloud responds with
an ACK message. Otherwise, transmission errors result in an RST
reply from the cloud, which triggers a retransmission from the edge.
When the cloud has successfully obtained the trained ML model, it
stores it and makes it available to the users.

|  +-------------+                      +-----+                    |
|  |Edge gateway |                      |Cloud|                    |     
|  +-------------+                      +-----+                    |  
|         |     CON message {Edge ID}      |                       |
|         |------------------------------->|                       |
|         |                                |                       |
|         |         ACK/RST reply          |                       |
|         |<-------------------------------|                       |
|         |                         +--------------+               |
|         |                         |Authentication|               | 
|         |                         +--------------+               |
|         |  CON message {authorization}   |                       |
|         |<-------------------------------|                       |
|         |                                |                       |
|         |         ACK/RST reply          |                       |
|         |------------------------------->|                       |
|         |                                |                       |
|         | CON message {Model, version}   |                       |
|         |------------------------------->|                       |
|         |                                |                       |
|         |         ACK/RST reply          |                       |
|         |<-------------------------------|                       |
|         |                           +-----------+                |
|         |                           |Model store|                | 
|         |                           +-----------+                |
|         |                                |                       |
|         |                                |                       |
Figure 4: Protocol cloud update stage.
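
Because the cloud update stage relies on CON messages, the sender
side also has to handle the case where no reply arrives at all. In
RFC 7252 this is covered by timeout-driven retransmission with
exponential back-off. The Python sketch below computes that timeout
schedule using the RFC 7252 default transmission parameters
(ACK_TIMEOUT = 2 s, ACK_RANDOM_FACTOR = 1.5, MAX_RETRANSMIT = 4).
The transmit callback is a hypothetical stand-in for the actual
socket send and wait, and send_confirmable is an illustrative name
rather than a library function.

```python
import random

# Default transmission parameters from RFC 7252
ACK_TIMEOUT = 2.0        # seconds
ACK_RANDOM_FACTOR = 1.5
MAX_RETRANSMIT = 4


def initial_timeout(rng=random):
    """Pick the first timeout uniformly in
    [ACK_TIMEOUT, ACK_TIMEOUT * ACK_RANDOM_FACTOR]."""
    return rng.uniform(ACK_TIMEOUT, ACK_TIMEOUT * ACK_RANDOM_FACTOR)


def timeout_schedule(first_timeout):
    """Timeouts waited after the initial transmission and after each
    of the MAX_RETRANSMIT retransmissions; each one doubles."""
    return [first_timeout * (2 ** i) for i in range(MAX_RETRANSMIT + 1)]


def send_confirmable(transmit, first_timeout):
    """Send a CON message until a reply arrives or attempts run out.

    transmit(timeout) is a hypothetical callback that sends the
    message once, waits up to `timeout` seconds, and returns "ACK",
    "RST", or None if nothing was received.
    Returns (reply, number_of_retransmissions).
    """
    for retransmissions, timeout in enumerate(timeout_schedule(first_timeout)):
        reply = transmit(timeout)
        if reply is not None:
            return reply, retransmissions
    return None, MAX_RETRANSMIT
```

With the largest possible first timeout of 3 s, the schedule sums to
93 s, which matches the MAX_TRANSMIT_WAIT value derived in RFC 7252.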

5. Security Considerations

The FL training process is considered a difficult task, as the
achievable accuracy of the model is affected by the characteristics
of the local datasets. Local datasets are the data collected by the
end devices and stored locally on each device. In order to ensure
data privacy, no data exchange takes place between the end devices
or between the end devices and the Edge gateway. The Edge gateway
aggregates the local models without using any local dataset
information, so the data privacy of each end device is preserved.
Regarding data security, the end device-Edge gateway communication
can be encrypted using any existing encryption technique, such as
AES. Such an encryption mechanism can be applied either to data
sharing between the end devices and the Edge or to the messages
exchanged between those entities, similarly to [6]. The encryption
mechanism can be applied directly to the transmitted CoAP messages,
provided that a decryption process is deployed on the receiver
side. Nonetheless, the implementation and deployment of such a
technique is outside the scope of this work.

6. IANA Considerations

There are no IANA considerations related to this document.

7. Conclusions

In this draft we present an FL protocol suitable for distributed
ML in an IoT network. We provide a functional architecture that
consists of a number of end devices, an edge gateway and a cloud
server. In order to support the FL training process, we provide
three distinct protocol stages that coordinate the distributed
learning process: the initial configuration, the FL training and
the cloud update stages, each of which provides the necessary
functionalities to the FL process. The FL training process is
conducted using the CoAP communication protocol and takes place
between the end devices and the edge gateway. After the training
finishes, the trained FL model is stored in the cloud and made
accessible to the users.

8. References

8.1. Normative References

[1] IoT Edge Computing Challenges and Functions, IETF draft,
Jul. 2020.
[2] F. Pisani, F. M. C. de Oliveira, E. S. Gama, R. Immich, L. F.
Bittencourt, E. Borin, "Fog Computing on Constrained Devices:
Paving the Way for the Future IoT," in arXiv:, Mar. 2019.
[3] Distributed fault management for IoT Networks, IETF draft,
Dec. 2018.
[4] IoT Edge Computing: Initiatives, Projects and Products,
IETF draft,
-edge-computing-background-00, May 2020.
[5] IETF iot-edge-computing draft, Weblink: https://www.potaroo.
[6] M. A. Rahman, M. S. Hossain, M. S. Islam, N. A. Alrajeh
and G. Muhammad, "Secure and Provenance Enhanced Internet
of Health Things Framework: A Blockchain Managed Federated
Learning Approach," in IEEE Access, vol. 8, pp.
205071-205087, Nov. 2020.

8.2. Non-normative References

[RFC 7252] The Constrained Application Protocol (CoAP), RFC 7252,
Jun. 2014.

9. Acknowledgments

This work has been funded by the NGI TRUST 3rd Open Call with
reference number 2019003.

Copyright (c) 2021 IETF Trust and the persons identified
as authors of the code. All rights reserved.

Redistribution and use in source and binary forms, with or
without modification, are permitted provided that the following
conditions are met:

Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials provided
with the distribution.

Neither the name of Internet Society, IETF or IETF Trust, nor
the names of specific contributors, may be used to endorse or
promote products derived from this software without specific
prior written permission.

Authors' Addresses

Fotis Foukalas
Cognitive Innovations
Kifisias 125-127, 11524, Athens, Greece

Athanasios Tziouvaras
Cognitive Innovations
Kifisias 125-127, 11524, Athens, Greece