Internationalization of the Hypertext Markup Language
RFC 2070
| Section | Field | Value |
|---|---|---|
| Document | Type | RFC - Historic (January 1997; no errata); obsoleted by RFC 2854; was draft-ietf-html-i18n (html WG) |
| | Authors | François Yergeau, Glenn Adams, Martin Dürst, Gavin Nicol |
| | Last updated | 2013-03-02 |
| | Stream | IETF |
| Stream | WG state | WG Document |
| | Document shepherd | No shepherd assigned |
| IESG | IESG state | RFC 2070 (Historic) |
| | Consensus Boilerplate | Unknown |
| | Responsible AD | (None) |
| | Send notices to | (None) |
Network Working Group                                         F. Yergeau
Request for Comments: 2070                             Alis Technologies
Category: Standards Track                                       G. Nicol
                                            Electronic Book Technologies
                                                                G. Adams
                                                                Spyglass
                                                               M. Duerst
                                                    University of Zurich
                                                            January 1997

         Internationalization of the Hypertext Markup Language

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

Abstract

   The Hypertext Markup Language (HTML) is a markup language used to
   create hypertext documents that are platform independent. Initially,
   the application of HTML on the World Wide Web was seriously
   restricted by its reliance on the ISO-8859-1 coded character set,
   which is appropriate only for Western European languages. Despite
   this restriction, HTML has been widely used with other languages,
   using other coded character sets or character encodings, at the
   expense of interoperability.

   This document is meant to address the issue of the
   internationalization (i18n, i followed by 18 letters followed by n)
   of HTML by extending the specification of HTML and giving additional
   recommendations for proper internationalization support. A foremost
   consideration is to make sure that HTML remains a valid application
   of SGML, while enabling its use with all languages of the world.

Table of Contents

   1. Introduction
     1.1. Scope
     1.2. Conformance
   2. The document character set
     2.1. Reference processing model
     2.2. The document character set
     2.3. Undisplayable characters
   3. The LANG attribute
   4. Additional entities, attributes and elements
     4.1. Full Latin-1 entity set
     4.2. Markup for language-dependent presentation
   5. Forms
     5.1. DTD additions
     5.2. Form submission
   6. External character encoding issues
   7. HTML public text
     7.1. HTML DTD
     7.2. SGML declaration for HTML
     7.3. ISO Latin 1 character entity set
   8. Security Considerations
   Bibliography
   Authors' Addresses

1. Introduction

   The Hypertext Markup Language (HTML) is a markup language used to
   create hypertext documents that are platform independent. Initially,
   the application of HTML on the World Wide Web was seriously
   restricted by its reliance on the ISO-8859-1 coded character set,
   which is appropriate only for Western European languages. Despite
   this restriction, HTML has been widely used with other languages,
   using other coded character sets or character encodings, through
   various ad hoc extensions to the language [TAKADA].

   This document is meant to address the issue of the
   internationalization of HTML by extending the specification of HTML
   and giving additional recommendations for proper
   internationalization support. It is in good part based on a paper by
   one of the authors on multilingualism on the WWW [NICOL].

   A foremost consideration is to make sure that HTML remains a valid
   application of SGML, while enabling its use with all languages of
   the world. The specific issues addressed are the SGML document
   character set to be used for HTML, the proper treatment of the
   charset parameter associated with the "text/html" content type and
   the specification of