
Protocol for Evaluating Reinforcement Learning Environments in Real Time
draft-perlert-wg-00

Document type: Expired Internet-Draft (individual), expired and archived
Author: Ruben Montero
Last updated: 2021-02-14 (latest revision 2020-08-13)
RFC stream: (None)
Intended RFC status: (None)
Stream state: (No stream defined)
Consensus boilerplate: Unknown
RFC Editor Note: (None)
IESG state: Expired
Telechat date: (None)
Responsible AD: (None)
Send notices to: (None)

This Internet-Draft is no longer active.

Abstract

This document defines a simple UDP protocol for communication between a server that simulates a reinforcement learning environment and a client that observes it and responds with actions. Reinforcement learning problems are usually defined within the scope of a Markov Decision Process (MDP), in which an agent sends an action, drawn from an action space, to an environment. The environment acts as a black box that returns an observation and a reward to the agent, whose goal is to maximize the total reward obtained.

Although the problem statement is easy to understand, there are no conventions for how a reinforcement learning simulation should communicate with a client agent, either on a local network or over the Internet. Establishing such a convention can be especially useful for multiagent support and analysis.

The protocol defined in this document, PERLERT, assumes that server and client have already shared certain information through another channel, such as a web page served over HTTP. For example, the client must know a port number and an instance number before it can participate in a simulation run on a server. Also, although it is often desirable to receive the full feedback from the environment, PERLERT focuses on real-time interaction, in which human agents can interact with AI agents, even if that means some information is lost to network packet loss.
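The interaction described in the abstract can be illustrated with a minimal Python sketch. It is not the PERLERT wire format, which the abstract does not specify: the packet layouts (`ACTION_FMT`, `FEEDBACK_FMT`), the step counter, the toy environment, and all function names are assumptions made for illustration only. The sketch shows a client that already knows a port number and an instance number, sends actions as UDP datagrams, and moves on when feedback does not arrive in time rather than blocking.

```python
import socket
import struct
import threading

# Hypothetical packet layouts (assumptions, not taken from the draft).
# "!" selects network byte order.
ACTION_FMT = "!IIi"     # instance number, step counter, action
FEEDBACK_FMT = "!IIff"  # instance number, step counter, observation, reward


def encode_action(instance, step, action):
    return struct.pack(ACTION_FMT, instance, step, action)


def decode_action(data):
    return struct.unpack(ACTION_FMT, data)


def encode_feedback(instance, step, observation, reward):
    return struct.pack(FEEDBACK_FMT, instance, step, observation, reward)


def decode_feedback(data):
    return struct.unpack(FEEDBACK_FMT, data)


def serve(sock, instance, n_packets):
    """Toy environment: the state is the running sum of actions and the
    reward is -|state|, so the agent is rewarded for staying near zero."""
    state = 0
    for _ in range(n_packets):
        try:
            data, addr = sock.recvfrom(64)
        except socket.timeout:
            return  # no more traffic; stop serving
        inst, step, action = decode_action(data)
        if inst != instance:
            continue  # datagram addressed to a different instance
        state += action
        reply = encode_feedback(inst, step, float(state), -abs(float(state)))
        sock.sendto(reply, addr)


def run_client(port, instance, actions, timeout=0.5):
    """Send one action per step; skip any step whose feedback is lost."""
    total = 0.0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for step, action in enumerate(actions):
            sock.sendto(encode_action(instance, step, action),
                        ("127.0.0.1", port))
            try:
                data, _ = sock.recvfrom(64)
            except socket.timeout:
                continue  # feedback lost: keep going in real time
            _, reply_step, _observation, reward = decode_feedback(data)
            if reply_step == step:  # ignore stale replies from earlier steps
                total += reward
    return total


def demo():
    """Run the toy server and client over the loopback interface."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.settimeout(1.0)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve, args=(server, 7, 3), daemon=True)
    worker.start()
    total = run_client(port, 7, [1, -1, 2])
    worker.join(timeout=2.0)
    server.close()
    return total
```

Calling `demo()` runs both ends on the loopback interface. The client-side timeout reflects the trade-off stated in the abstract: a lost observation or reward costs the agent that step's feedback, but the interaction keeps its real-time pace.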

Authors

Ruben Montero

(Note: The e-mail addresses provided for the authors of this Internet-Draft may no longer be valid.)