Requirements for Distributed Control of Automatic Speech Recognition (ASR), Speaker Identification/Speaker Verification (SI/SV), and Text-to-Speech (TTS) Resources
RFC 4313
Document | Type | RFC - Informational (December 2005; No errata) | |
---|---|---|---|
Author | David Oran | ||
Last updated | 2015-10-14 | ||
Stream | Internet Engineering Task Force (IETF) | ||
Stream | WG state | (None) | |
Document shepherd | No shepherd assigned | ||
IESG | IESG state | RFC 4313 (Informational) | |
Action Holders | (None) | ||
Consensus Boilerplate | Unknown | ||
Telechat date | |||
Responsible AD | Jon Peterson | ||
Send notices to | (None) |
Network Working Group                                            D. Oran
Request for Comments: 4313                           Cisco Systems, Inc.
Category: Informational                                    December 2005

Requirements for Distributed Control of Automatic Speech Recognition (ASR), Speaker Identification/Speaker Verification (SI/SV), and Text-to-Speech (TTS) Resources

Status of this Memo

   This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This document outlines the needs and requirements for a protocol to control distributed speech processing of audio streams. By speech processing, this document specifically means automatic speech recognition (ASR), speaker recognition -- which includes both speaker identification (SI) and speaker verification (SV) -- and text-to-speech (TTS). Other IETF protocols, such as SIP and Real Time Streaming Protocol (RTSP), address rendezvous and control for generalized media streams. However, speech processing presents additional requirements that none of the extant IETF protocols address.

Table of Contents

   1. Introduction
      1.1. Document Conventions
   2. SPEECHSC Framework
      2.1. TTS Example
      2.2. Automatic Speech Recognition Example
      2.3. Speaker Identification example
   3. General Requirements
      3.1. Reuse Existing Protocols
      3.2. Maintain Existing Protocol Integrity
      3.3. Avoid Duplicating Existing Protocols
      3.4. Efficiency
      3.5. Invocation of Services
      3.6. Location and Load Balancing
      3.7. Multiple Services
      3.8. Multiple Media Sessions
      3.9. Users with Disabilities
      3.10. Identification of Process That Produced Media or Control Output
   4. TTS Requirements
      4.1. Requesting Text Playback
      4.2. Text Formats
         4.2.1. Plain Text
         4.2.2. SSML
         4.2.3. Text in Control Channel
         4.2.4. Document Type Indication
      4.3. Control Channel
      4.4. Media Origination/Termination by Control Elements
      4.5. Playback Controls
      4.6. Session Parameters
      4.7. Speech Markers
   5. ASR Requirements
      5.1. Requesting Automatic Speech Recognition
      5.2. XML
      5.3. Grammar Requirements
         5.3.1. Grammar Specification
         5.3.2. Explicit Indication of Grammar Format
         5.3.3. Grammar Sharing
      5.4. Session Parameters
      5.5. Input Capture
   6. Speaker Identification and Verification Requirements
      6.1. Requesting SI/SV
      6.2. Identifiers for SI/SV
      6.3. State for Multiple Utterances
      6.4. Input Capture
      6.5. SI/SV Functional Extensibility
   7. Duplexing and Parallel Operation Requirements
      7.1. Full Duplex Operation
      7.2. Multiple Services in Parallel