Character Mnemonics and Character Sets
Network Working Group K. Simonsen
Request for Comments: 1345 Rationel Almen Planlaegning
Character Mnemonics & Character Sets
Status of this Memo
This memo provides information for the Internet community. It does
not specify an Internet standard. Distribution of this memo is
unlimited.
Abstract
This memo lists a selection of characters and their presence in some
coded character sets. To facilitate the tabulation of the coded
character sets, an unambiguous mnemonic is used for each character, and
a format for tabulating the coded character sets is defined. The coded
character sets are given names for easy reference. A family of coded
character sets, called the mnemonic character sets, is described, as is
conversion between these coded character sets without information loss.
The character set names are registered with the Internet Assigned
Numbers Authority (IANA). Additional character sets not described in
this memo should be registered with the IANA. This memo may be
updated periodically, or additional specifications may be published,
to reflect other coded character sets.
Please send any comments, including comments about the accuracy of the
tables, to the author, Keld Simonsen <Keld.Simonsen@dkuug.dk>.
1. INTRODUCTION
With the growing internationalization of the Internet, support for
many coded character sets is required. This memo intends to document
precisely the mapping between all characters and their corresponding
coded representations in various coded character sets, and to give
names to these coded character sets so that they can be referenced
unambiguously in Internet standards.
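To make the idea of a precise character-to-code mapping concrete, the following Python sketch tabulates two coded character sets against mnemonics and converts between them by going through the mnemonic. The table names, the tiny excerpts, and the assumption that `A`, `a` and `e'` are the mnemonics for A, a and LATIN SMALL LETTER E WITH ACUTE are illustrative only; this is not the memo's table format. The code positions themselves (0xE9 in ISO 8859-1, 0x82 in PC codepage 850) are the standard ones.

```python
# Hypothetical excerpts: code -> mnemonic for two coded character sets.
# Only a few entries are shown; real tables would cover the whole set.
LATIN1_EXCERPT = {0x41: "A", 0x61: "a", 0xE9: "e'"}   # ISO 8859-1
CP850_EXCERPT = {0x41: "A", 0x61: "a", 0x82: "e'"}    # PC codepage 850


def invert(table: dict) -> dict:
    """Turn a code -> mnemonic table into a mnemonic -> code table."""
    return {mnemonic: code for code, mnemonic in table.items()}


def convert(data: bytes, source: dict, target: dict) -> bytes:
    """Convert bytes via mnemonics; fail rather than lose a character."""
    to_code = invert(target)
    out = bytearray()
    for byte in data:
        mnemonic = source[byte]  # raises KeyError if the byte is not tabulated
        if mnemonic not in to_code:
            raise ValueError(f"{mnemonic!r} has no code in the target set")
        out.append(to_code[mnemonic])
    return bytes(out)


print(convert(b"\x41\xe9", LATIN1_EXCERPT, CP850_EXCERPT).hex())  # 4182
```

Using the mnemonic as the pivot means each coded character set needs only one table, rather than one table per pair of sets, and a missing mnemonic in the target table surfaces as an error instead of silent substitution.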
This memo does not indicate anything about the validity of using
these specifications in any Internet standard, so you should consult
each individual Internet standard to see which coded character sets
and names are allowed there.
Unambiguous character mnemonics are specified; they provide a
practical way of identifying a character without reference to a
coded character set or to the character's code in that set. The
mnemonics are written in a minimal set of characters, namely the
83 invariant graphic characters of ISO 646, which form a kind of
greatest common subset of the majority of coded
Simonsen [Page 1]
RFC 1345 Character Mnemonics & Character Sets June 1992
character sets, including ASCII, the national variants of the ISO 646
7-bit character set, and various EBCDICs. In addition, the numeric
value of the coded representation of each of these characters is the
same in all coded character sets compatible with ISO standards. All
of them except two, EXCLAMATION MARK and QUOTATION MARK, also have the
same coded representation in all variants of EBCDIC. This minimal
set of characters is called the reference character set in this memo.
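The reference character set can be reconstructed as a Python sketch, assuming that the 83 characters are the 95 printable ASCII characters (SPACE included) minus the 12 positions ISO 646 leaves open for national variants; the assertion checks that this assumption yields exactly 83. The sketch also compares two EBCDIC code pages that happen to ship as Python codecs, `cp037` and `cp500`. Checking these two says nothing about every EBCDIC variant, and the helper names are hypothetical.

```python
# ISO 646 leaves these 12 positions open for national variants:
# 0x23 # 0x24 $ 0x40 @ 0x5B [ 0x5C \ 0x5D ] 0x5E ^ 0x60 ` 0x7B { 0x7C | 0x7D } 0x7E ~
NATIONAL_VARIANT_POSITIONS = frozenset("#$@[\\]^`{|}~")

# Assumption: printable ASCII 0x20..0x7E (SPACE included) minus the
# variant positions gives the 83-character reference set.
REFERENCE_SET = (
    frozenset(chr(code) for code in range(0x20, 0x7F)) - NATIONAL_VARIANT_POSITIONS
)
assert len(REFERENCE_SET) == 83


def in_reference_set(text: str) -> bool:
    """True if every character of text belongs to the reference set."""
    return all(ch in REFERENCE_SET for ch in text)


def ebcdic_differences(codec_a: str, codec_b: str) -> dict:
    """Reference-set characters whose encoding differs between two codecs."""
    diffs = {}
    for ch in sorted(REFERENCE_SET):
        a, b = ch.encode(codec_a), ch.encode(codec_b)
        if a != b:
            diffs[ch] = (a.hex(), b.hex())
    return diffs


print(in_reference_set("a'"))        # True
print(in_reference_set("C++ & C#"))  # False: "#" is a variant position
print(ebcdic_differences("cp037", "cp500"))
```

The comparison is consistent with the memo's remark: any differences it reports between these two code pages should be confined to EXCLAMATION MARK and QUOTATION MARK.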
The mnemonics can be used in Internet standards for easy and
unambiguous reference, and they can also serve as a fallback
representation in various Internet specifications.
The coded character sets covered include all parts of ISO 8859, ISO
6937-2 and all ISO 646 conforming coded character sets in the ISO
character set registry managed by ECMA according to ISO 2375. Almost
all graphic coded character sets in the ECMA registry (1) are
covered. The graphic coded character sets not included are registry
numbers 31, 38, 39, 53, 59, 68, 71, 72, 129 and 137. In addition,
many vendor-defined character sets are covered, including PC
codepages (4), (7), (8), many EBCDIC character sets (4), (5), (6) and
HP, DEC and Apple character sets (8), (9), (10), (13), (14). The
East-Asian 16-bit character sets from the ECMA registry are also
included in this memo.
2. CHARACTER MNEMONICS
2.1 General Syntax
The character mnemonics are taken from the ISO committee draft (CD)
of the POSIX.2 standard (3). They are classified into two groups:
1. A group with two-character mnemonics
- Primarily intended for alphabetic scripts like Latin, Greek,
Cyrillic, Hebrew and Arabic, and special characters.
2. A group with variable-length mnemonics
- Primarily intended for non-alphabetic scripts like Japanese and
Chinese, but also used for some accented letters and special
characters.
In the two-character mnemonics, all invariant graphic characters in
the ISO 646 character codes except "&" are used, i.e. the following:
! " % ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _
a b c d e f g h i j k l m n o p q r s t u v w x y z
The character "_" is not used as the first character.
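The syntax rules for two-character mnemonics can be checked mechanically. The following Python sketch, with a hypothetical function name, accepts exactly two characters from the list above and rejects "_" in first position. It only checks syntax, not whether a given pair is actually assigned to a character in the memo's tables.

```python
import string

# The characters listed above for two-character mnemonics:
# the ISO 646 invariant graphic characters except "&" (SPACE is not listed).
MNEMONIC_CHARS = frozenset(
    "!\"%'()*+,-./"          # punctuation and symbols
    + string.digits           # 0-9
    + ":;<=>?"
    + string.ascii_uppercase  # A-Z
    + "_"
    + string.ascii_lowercase  # a-z
)


def is_two_char_mnemonic(text: str) -> bool:
    """Check the syntactic rules for a two-character mnemonic."""
    if len(text) != 2:
        return False
    if text[0] == "_":  # "_" is not used as the first character
        return False
    return all(ch in MNEMONIC_CHARS for ch in text)


for candidate in ["a'", "e:", "_a", "a&", "a#", "abc"]:
    print(candidate, is_two_char_mnemonic(candidate))
```

The set built this way has 81 members: the 83 invariant characters minus "&" and SPACE.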