RE: latin1 decoder implementation

From: Whistler, Ken <ken.whistler_at_sap.com>
Date: Fri, 16 Nov 2012 22:42:22 +0000

An IANA-registered character *map* is a very different animal from a character encoding standard per se.

The actual character encoding standard, ISO/IEC 8859-1:1998, does not define the C0 and C1 control codes (and never will). That was what I was quoting from.

A mapping table, on the other hand, needs to map the control codes as well as the graphic characters, and that is what ISO_8859-1:1987 does.

Note the same behavior for other mapping tables, including the ISO 8859 mapping table posted on the Unicode website:

http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT
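As a rough illustration of what such a full mapping table implies for a decoder, here is a minimal Python sketch. It assumes the table is the identity map from byte value N to U+00NN for all 256 values, controls included; the function name decode_latin1 is made up for this example and is not taken from any implementation discussed in this thread.

```python
def decode_latin1(data: bytes) -> str:
    """Decode bytes with a full 256-entry map: byte value N -> code point U+00NN.

    Assumption: the C0 (0x00-0x1F), DEL (0x7F) and C1 (0x80-0x9F) positions
    are mapped to the same-numbered Unicode control code points, as a
    mapping table does, rather than being left undefined.
    """
    return "".join(chr(b) for b in data)


# Sample covering a C0 control, an ASCII letter, a C1 control,
# NO-BREAK SPACE, and an accented letter from the upper graphic range.
sample = bytes([0x09, 0x41, 0x85, 0xA0, 0xE9])
decoded = decode_latin1(sample)
code_points = [ord(ch) for ch in decoded]
```

Under that assumption no byte value can fail to decode, which is the practical difference between implementing the map and implementing only the graphic characters the standard itself defines.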

--Ken

I actually did quote that, to no avail.

This seems to be the missing information, though (from the Wikipedia ISO/IEC 8859-1 article, http://en.wikipedia.org/wiki/ISO/IEC_8859-1):

> In 1992, the IANA registered the character map ISO_8859-1:1987, more commonly known by its preferred MIME name of ISO-8859-1 (note the extra hyphen over ISO 8859-1), a superset of ISO 8859-1, for use on the Internet. This map assigns the C0 and C1 control characters to the unassigned code values thus provides for 256 characters via every possible 8-bit value.

To me this means that the blanks in the "codepage layout" diagram are quite misleading and should be filled in.
Received on Fri Nov 16 2012 - 16:43:14 CST
