From: Kenneth Whistler (email@example.com)
Date: Mon Oct 07 2002 - 20:15:09 EDT
Elliotte Harold asked:
> The Unicode data files at
> http://www.unicode.org/Public/MAPPINGS/ISO8859/ do not include a mapping
> for ISO-8859-11, Thai. Is there any particular reason for this?
Just that nobody got around to submitting and posting one.
Since there was a lot of discussion about this over the weekend,
I took it upon myself to create and post one in the same format
as the other ISO8859 tables.
Let me know if anybody spots any problems in the table -- but
it really is pretty straightforward, as others noted: TIS 620-2533 (1990)
with one addition: 0xA0 NO-BREAK SPACE.
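For anyone who wants to cross-check the table, the upper half can be regenerated in a few lines. This is only a sketch: it relies on the fact that TIS 620 byte 0xA0+n corresponds to U+0E00+n for the assigned positions, and the tab-separated row layout is my approximation of the other ISO8859 tables, not a copy of the posted file. The function names are made up for illustration.

```python
import unicodedata

def iso_8859_11_upper_half():
    """Map bytes 0xA0..0xFF to code points, skipping unassigned positions."""
    table = {0xA0: 0x00A0}  # the one addition over TIS 620-2533: NO-BREAK SPACE
    # Assigned Thai positions are 0xA1..0xDA and 0xDF..0xFB.
    for byte in list(range(0xA1, 0xDB)) + list(range(0xDF, 0xFC)):
        table[byte] = 0x0E00 + (byte - 0xA0)
    return table

def format_rows(table):
    """Render rows as: byte, code point, '#', character name (assumed layout)."""
    return [
        f"0x{byte:02X}\t0x{cp:04X}\t#\t{unicodedata.name(chr(cp))}"
        for byte, cp in sorted(table.items())
    ]

table = iso_8859_11_upper_half()
rows = format_rows(table)
print("\n".join(rows[:3]))
```

The lower half (0x00..0x9F) is left out here, since it is not where the Thai-specific questions arise.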
Doug dug out:
> These 9 code positions (0xA0, 0xDB..0xDE, 0xFC..0xFF) appear to be
> undefined in TIS 620.2533. Reference  below does show a "word
> separator character" at 0xDC, which I interpret as U+200B ZERO WIDTH
> SPACE, but the other positions are still undefined.
Reference  is online Tru64 Unix documentation about its Thai support,
which claims that:
"- No-Break space. The character code is A0.
- Word separator. The word separator defined in TIS 620-2533."
This despite the fact that the table shown there has no no-break space
at A0 (and TIS 620-2533 (1990) does not have one either), and that
0xDC is undefined in TIS 620-2533, even though the table in the
Tru64 Unix documentation shows "word sep." there.
The table is labelled the "TACTIS Codeset" for "Thai API Consortium/
Thai Industrial Standard." I surmise that this is some vendor
extension to the actual TIS 620-2533 (1990). The actual standard
states clearly (in Thai) that 0x80..0xA0, 0xDB..0xDE, and 0xFC..0xFF
are reserved (unassigned), and the tables in the standard match that.
So there may be some implementation practice that uses 0xDC for
U+200B ZERO WIDTH SPACE in Thai code pages, but that is not
part of either TIS 620-2533 (1990) or ISO 8859-11:2001.
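One way to see the difference between the two standards in practice is to probe an implementation's decoders byte by byte. The sketch below assumes that CPython's bundled `tis_620` and `iso8859_11` codecs follow the tables discussed in this thread; that is an assumption about one implementation, not a statement about the standards themselves. The helper name is hypothetical.

```python
def undefined_bytes(codec, lo=0xA0, hi=0xFF):
    """Return the byte values in [lo, hi] that the named codec cannot decode."""
    missing = []
    for b in range(lo, hi + 1):
        try:
            bytes([b]).decode(codec)
        except UnicodeDecodeError:
            missing.append(b)
    return missing

tis = undefined_bytes("tis_620")
iso = undefined_bytes("iso8859_11")

# Positions undefined in the TIS 620 codec but defined in ISO 8859-11.
only_in_tis = sorted(set(tis) - set(iso))

print("tis_620 undefined:   ", [f"0x{b:02X}" for b in tis])
print("iso8859_11 undefined:", [f"0x{b:02X}" for b in iso])
print("difference:          ", [f"0x{b:02X}" for b in only_in_tis])
```

If the codecs match the standards, the only difference should be 0xA0, and 0xDC should be rejected by both.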