Representation of Non-ASCII Text in Internet Message Headers
RFC 1342

Document Type RFC - Proposed Standard (June 1992; No errata)
Obsoleted by RFC 1522
Last updated 2013-03-02
Stream IETF
Network Working Group                                           K. Moore
Request for Comments: 1342                       University of Tennessee
                                                               June 1992

      Representation of Non-ASCII Text in Internet Message Headers

Status of this Memo

   This RFC specifies an IAB standards track protocol for the Internet
   community, and requests discussion and suggestions for improvements.
   Please refer to the current edition of the "IAB Official Protocol
   Standards" for the standardization state and status of this protocol.
   Distribution of this memo is unlimited.

Abstract

   This memo describes an extension to the message format defined in [1]
   (known to the IETF Mail Extensions Working Group as "RFC 1341"), to
   allow the representation of character sets other than ASCII in RFC
   822 message headers.  The extensions described were designed to be
   highly compatible with existing Internet mail handling software, and
   to be easily implemented in mail readers that support RFC 1341.

Introduction

   RFC 1341 describes a mechanism for denoting textual body parts which
   are coded in various character sets, as well as methods for encoding
   such body parts as sequences of printable ASCII characters.  This
   memo describes similar techniques to allow the encoding of non-ASCII
   text in various portions of an RFC 822 [2] message header, in a manner
   which is unlikely to confuse existing message handling software.

   Like the encoding techniques described in RFC 1341, the techniques
   outlined here were designed to allow the use of non-ASCII characters
   in message headers in a way which is unlikely to be disturbed by the
   quirks of existing Internet mail handling programs.  In particular,
   some mail relaying programs are known to (a) delete some message
   header fields while retaining others, (b) rearrange the order of
   addresses in To or Cc fields, (c) rearrange the (vertical) order of
   header fields, and/or (d) "wrap" message headers at different places
   than those in the original message.  In addition, some mail reading
   programs are known to have difficulty correctly parsing message
   headers which, while legal according to RFC 822, make use of
   backslash-quoting to "hide" special characters such as "<", ",", or
   ":", or which exploit other infrequently-used features of that
   specification.

Moore                                                           [Page 1]
RFC 1342                 Non-ASCII Mail Headers                June 1992

   While it is unfortunate that these programs do not correctly
   interpret RFC 822 headers, to "break" these programs would cause
   severe operational problems for the Internet mail system.  The
   extensions described in this memo therefore do not rely on little-
   used features of RFC 822.  Instead, certain sequences of "ordinary"
   printable ASCII characters (which are assumed to be unlikely to
   otherwise appear in message headers) are reserved for use as encoded
   data.  The characters used in these encodings are restricted to those
   which do not have special meanings in the context in which the
   encoded text appears.

Encodings

   An "encoded-word" is a sequence of printable ASCII characters that
   begins with "=?", ends with "?=", and has two "?"s in between.  It
   specifies a character set and an encoding method, and also includes
   the original text encoded as ASCII characters, according to the rules
   for that encoding method.

   A mail composer that implements this specification will provide a
   means of inputting non-ASCII text in header fields, but will translate
   these fields (or appropriate portions of these fields) into encoded-
   words before inserting them into the message header.

   A mail reader that implements this specification will recognize
   encoded-words when they appear in certain portions of the message
   header.  Instead of displaying the encoded-word "as is", it will
   reverse the encoding and display the original text in the designated
   character set.
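
   The composer and reader roles described above can be sketched with
   Python's standard email.header module.  This is only an
   illustration, not part of this memo: that module targets RFC 2047,
   a later revision of this specification, and the function names
   compose_header_value and display_header_value, the sample text, and
   the default charset are assumptions made for the example.

```python
from email.header import Header, decode_header, make_header


def compose_header_value(text: str, charset: str = "iso-8859-1") -> str:
    """Composer side: translate non-ASCII text into encoded-word form."""
    # Header.encode() emits printable ASCII; non-ASCII text is wrapped
    # as =?charset?encoding?encoded-text?=
    return Header(text, charset).encode()


def display_header_value(raw: str) -> str:
    """Reader side: reverse the encoding and return the original text."""
    # decode_header() yields (bytes-or-str, charset) pairs; make_header()
    # converts them back into a Header whose str() is the decoded text.
    return str(make_header(decode_header(raw)))


# Illustrative round trip ("Andr\u00e9" is sample text, not from the memo).
wire = compose_header_value("Andr\u00e9")
shown = display_header_value(wire)
```

   Here "wire" holds only ASCII characters and is what would be placed
   in the message header, while "shown" is the text a conforming
   reader would display.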

   An "encoded-word" is more precisely defined by the following EBNF
   grammar, using the notation of RFC 822:

   encoded-word = "=" "?" charset "?" encoding "?" encoded-text "?" "="

   charset = token    ; legal charsets defined by RFC 1341

   encoding = token   ; Either "B" or "Q"

   token = 1*<Any CHAR except SPACE, CTLs, and tspecials>

   tspecials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "\" /
               <"> / "/" / "[" / "]" / "?" / "." / "="

   encoded-text = 1*<Any printable ASCII character other than "?"
                     or SPACE>
                  ; (but see "Use of encoded-words in message
                  ; headers", below)
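
   As a rough illustration of the grammar above, the following Python
   sketch checks whether a string is a well-formed encoded-word and
   splits it into its parts.  The names EncodedWord and
   parse_encoded_word are hypothetical.  Two assumptions are made:
   the encoding letter is matched case-insensitively, and the check
   also applies the 75-character limit that the memo states just after
   the grammar.  No attempt is made to decode the text.

```python
import re
from typing import NamedTuple, Optional

# tspecials from the grammar; SPACE and CTLs fall outside the
# printable range (33..126) used to build the token class.
_TSPECIALS = r'()<>@,;:\"/[]?.='

# token = 1*<Any CHAR except SPACE, CTLs, and tspecials>
_TOKEN = "[" + "".join(
    re.escape(chr(c)) for c in range(33, 127) if chr(c) not in _TSPECIALS
) + "]+"

# encoded-text = 1*<Any printable ASCII character other than "?" or SPACE>
_ENCODED_TEXT = r"[!->@-~]+"

_ENCODED_WORD = re.compile(
    rf"=\?(?P<charset>{_TOKEN})\?(?P<encoding>{_TOKEN})\?(?P<text>{_ENCODED_TEXT})\?="
)

MAX_ENCODED_WORD_LEN = 75  # limit stated by the memo, delimiters included


class EncodedWord(NamedTuple):
    charset: str
    encoding: str
    encoded_text: str


def parse_encoded_word(word: str) -> Optional[EncodedWord]:
    """Return the parts of a well-formed encoded-word, or None."""
    if len(word) > MAX_ENCODED_WORD_LEN:
        return None
    m = _ENCODED_WORD.fullmatch(word)
    if m is None:
        return None
    # Assumption: "B" and "Q" are compared case-insensitively.
    if m.group("encoding").upper() not in ("B", "Q"):
        return None
    return EncodedWord(m.group("charset"), m.group("encoding"), m.group("text"))
```

   Because "?" is excluded from charset, encoding, and encoded-text
   alike, the four "?" delimiters split the word unambiguously, so a
   single regular expression suffices.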

   An encoded-word may not be more than 75 characters long, including
   charset, encoding, encoded-text, and delimiters.  If it is desirable