Japanese Character Encoding for Internet Messages
RFC 1468
Network Working Group                                           J. Murai
Request for Comments: 1468                               Keio University
                                                              M. Crispin
                                                       Panda Programming
                                                         E. van der Poel
                                                               June 1993
Japanese Character Encoding for Internet Messages
Status of this Memo
This memo provides information for the Internet community. It does
not specify an Internet standard. Distribution of this memo is
unlimited.
Introduction
This document describes the encoding used in electronic mail [RFC822]
and network news [RFC1036] messages in several Japanese networks. It
was first specified by and used in JUNET [JUNET]. The encoding is now
also widely used in Japanese IP communities.
The name given to this encoding is "ISO-2022-JP", which is intended
to be used in the "charset" parameter field of MIME headers (see
[MIME1] and [MIME2]).
Description
The text starts in ASCII [ASCII], and switches to Japanese characters
through an escape sequence. For example, the escape sequence ESC $ B
(three bytes, hexadecimal values: 1B 24 42) indicates that the bytes
following this escape sequence are Japanese characters, which are
encoded in two bytes each. To switch back to ASCII, the escape
sequence ESC ( B is used.
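The switching behaviour described above can be observed with a short Python sketch. It assumes that Python's built-in "iso2022_jp" codec is an acceptable stand-in for an ISO-2022-JP encoder; the sample string and the variable names are illustrative only and are not part of this memo.

```python
# Illustrative sketch: uses Python's built-in "iso2022_jp" codec.
ESC = b"\x1b"

# "Hello " followed by three Kanji, written with \u escapes so that this
# source text itself stays in ASCII.
text = "Hello \u65e5\u672c\u8a9e"
encoded = text.encode("iso2022_jp")

# The ASCII prefix is emitted unchanged; the first Kanji is preceded by
# the three-byte sequence ESC $ B (1B 24 42).
starts_with_switch = encoded.startswith(b"Hello " + ESC + b"$B")

# The encoder switches back to ASCII with ESC ( B (1B 28 42) at the end.
ends_in_ascii = encoded.endswith(ESC + b"(B")

# Everything between the two escape sequences is two bytes per character.
kanji_bytes = encoded[len(b"Hello ") + 3 : -3]

print(encoded)
print(starts_with_switch, ends_in_ascii, len(kanji_bytes))
```

Decoding the result with the same codec should give back the original string.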
The following table gives the escape sequences and the character sets
used in ISO-2022-JP messages. The ISOREG number is the registration
number in ISO's registry [ISOREG].
Esc Seq    Character Set                  ISOREG
ESC ( B    ASCII                             6
ESC ( J    JIS X 0201-1976 ("Roman" set)    14
ESC $ @    JIS X 0208-1978                  42
ESC $ B    JIS X 0208-1983                  87
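As a rough illustration, the table can be expressed as a lookup keyed on the two bytes that follow ESC. The names DESIGNATIONS and charsets_used are hypothetical and chosen only for this sketch, which reports the character sets a byte string switches through.

```python
ESC = 0x1B

# The four escape sequences from the table, keyed on the two bytes that
# follow ESC. Each value is (character set name, ISOREG number).
DESIGNATIONS = {
    b"(B": ("ASCII", 6),
    b"(J": ('JIS X 0201-1976 ("Roman" set)', 14),
    b"$@": ("JIS X 0208-1978", 42),
    b"$B": ("JIS X 0208-1983", 87),
}


def charsets_used(data: bytes) -> list:
    """Return the character sets in the order the text switches to them.

    The list always begins with "ASCII" because the text starts in ASCII.
    """
    names = ["ASCII"]
    i = 0
    while i < len(data):
        if data[i] == ESC:
            key = data[i + 1 : i + 3]
            if key not in DESIGNATIONS:
                raise ValueError("escape sequence not in the table: %r" % key)
            names.append(DESIGNATIONS[key][0])
            i += 3  # skip ESC and the two bytes that follow it
        else:
            i += 1
    return names


print(charsets_used(b"abc\x1b$BF|\x1b(Bdef"))
```

Scanning byte by byte for ESC is safe here because ESC (decimal 27) lies outside the range 33 to 126 used by the bytes of a two-byte character.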
Note that JIS X 0208 was called JIS C 6226 until the name was changed
on March 1st, 1987. Likewise, JIS C 6220 was renamed JIS X 0201.
The "Roman" character set of JIS X 0201 [JISX0201] is identical to
ASCII except for backslash (\) and tilde (~). The backslash is
replaced by the Yen sign, and the tilde is replaced by overline. This
set is Japan's national variant of ISO 646 [ISO646].
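The two differences can be made concrete with a small decoder for the "Roman" set. This is a sketch only: the function name is hypothetical, and the choice of Unicode U+00A5 (YEN SIGN) and U+203E (OVERLINE) as targets is an assumption of this example, not something this memo specifies.

```python
def jis_roman_to_unicode(data: bytes) -> str:
    """Decode bytes in the JIS X 0201 "Roman" set to a Python string.

    Every 7-bit value maps as in ASCII except 0x5C and 0x7E.
    The Unicode targets are assumptions made for this sketch.
    """
    chars = []
    for b in data:
        if b > 0x7F:
            raise ValueError("not a 7-bit value: 0x%02X" % b)
        if b == 0x5C:
            chars.append("\u00a5")  # Yen sign replaces backslash
        elif b == 0x7E:
            chars.append("\u203e")  # overline replaces tilde
        else:
            chars.append(chr(b))
    return "".join(chars)


# The same bytes read as ASCII would be a backslash, "100" and a tilde.
print(jis_roman_to_unicode(b"\x5c100\x7e") == "\u00a5100\u203e")
```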
The JIS X 0208 [JISX0208] character sets consist of Kanji, Hiragana,
Katakana and some other symbols and characters. Each character takes
up two bytes.
For further details about the JIS Japanese national character set
standards, refer to [JISX0201] and [JISX0208]. For further
information about the escape sequences, see [ISO2022] and [ISOREG].
If there are JIS X 0208 characters on a line, there must be a switch
to ASCII or to the "Roman" set of JIS X 0201 before the end of the
line (i.e., before the CRLF). This means that the next line starts in
the character set that was switched to before the end of the previous
line.
Also, the text must end in ASCII.
Other restrictions are given in the Formal Syntax below.
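The two rules above (no JIS X 0208 segment left open at the end of a line, and text ending in ASCII) could be checked along the following lines. This is a minimal sketch with hypothetical names; it tracks only the current character set and does not attempt to enforce the rest of the Formal Syntax.

```python
SINGLE_BYTE = {b"(B": "ascii", b"(J": "roman"}
DOUBLE_BYTE = {b"$@", b"$B"}


def check_line_rules(body: bytes) -> list:
    """Return a list of problems; an empty list means both rules hold."""
    problems = []
    state = "ascii"  # the text starts in ASCII
    for number, line in enumerate(body.split(b"\r\n"), start=1):
        i = 0
        while i < len(line):
            if line[i] == 0x1B:
                key = line[i + 1 : i + 3]
                if key in SINGLE_BYTE:
                    state = SINGLE_BYTE[key]
                elif key in DOUBLE_BYTE:
                    state = "double"
                else:
                    problems.append("line %d: unknown escape sequence" % number)
                i += 3
            else:
                i += 1
        # Rule 1: a line must not end while JIS X 0208 is still selected.
        if state == "double":
            problems.append(
                "line %d: line ends inside a JIS X 0208 segment" % number
            )
    # Rule 2: the text as a whole must end in ASCII (not the "Roman" set).
    if state != "ascii":
        problems.append("text does not end in ASCII")
    return problems


print(check_line_rules(b"abc \x1b$BF|\x1b(B def\r\nnext line"))
```

Note that the state is carried across CRLF on purpose: a line that ends in the "Roman" set leaves the next line starting in the "Roman" set.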
Formal Syntax
The notational conventions used here are identical to those used in
RFC 822 [RFC822].
The * (asterisk) convention is as follows:
l*m something
meaning at least l and at most m somethings, with l and m taking
default values of 0 and infinity, respectively.
message             = headers 1*( CRLF *single-byte-char *segment
                      single-byte-seq *single-byte-char )
                                        ; see also [MIME1] "body-part"
                                        ; note: must end in ASCII
headers             = <see [RFC822] "fields" and [MIME1] "body-part">
segment             = single-byte-segment / double-byte-segment
single-byte-segment = single-byte-seq 1*single-byte-char
double-byte-segment = double-byte-seq 1*( one-of-94 one-of-94 )
single-byte-seq     = ESC "(" ( "B" / "J" )
double-byte-seq     = ESC "$" ( "@" / "B" )
CRLF                = CR LF
                                                 ; ( Octal, Decimal.)
ESC                 = <ISO 2022 ESC, escape>     ; (    33,      27.)
SI                  = <ISO 2022 SI, shift-in>    ; (    17,      15.)
SO                  = <ISO 2022 SO, shift-out>   ; (    16,      14.)
CR                  = <ASCII CR, carriage return>; (    15,      13.)
LF                  = <ASCII LF, linefeed>       ; (    12,      10.)
one-of-94           = <any one of 94 values>     ; (41-176, 33.-126.)
7BIT                = <any 7-bit value>          ; ( 0-177,  0.-127.)
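To illustrate the double-byte-segment and one-of-94 rules, the following sketch splits the bytes that follow a double-byte-seq into two-byte pairs and rejects anything outside the decimal range 33 to 126. The function name is hypothetical, and the payload is assumed to have already been separated from its escape sequences.

```python
def split_double_byte(payload: bytes) -> list:
    """Split the payload of a double-byte-segment into (byte, byte) pairs.

    Follows "1*( one-of-94 one-of-94 )": at least one pair, an even
    number of bytes, and every byte in the range 33..126 decimal.
    """
    if len(payload) == 0:
        raise ValueError("a double-byte-segment needs at least one character")
    if len(payload) % 2 != 0:
        raise ValueError("odd number of bytes in double-byte-segment")
    for b in payload:
        if not 33 <= b <= 126:
            raise ValueError("byte %d is not one-of-94" % b)
    return [(payload[i], payload[i + 1]) for i in range(0, len(payload), 2)]


# Four bytes form two characters.
print(split_double_byte(b"F|K\\"))
```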