From: Chris Pratley (email@example.com)
Date: Mon Apr 14 2003 - 04:10:31 EDT
In case I wasn't clear, the mappings of EUDC to PUA are determined by Windows, not "grabbed" by Word. Also, the requirement to provide a stable mapping of EUDC to PUA is not a whim of Windows, it is a fact of life (OK, maybe just a fact of encoding roundtrip :-)).
EUDC are widely used throughout Asia - the mappings cannot be changed arbitrarily or documents could not be exchanged. We are stuck with them, although whatever the mappings were, we'd be stuck with those instead. EUDC cannot be avoided or abandoned anytime soon. They play a valuable role. As you probably know, EUDC were (are) used to handle characters not encoded by their native code pages, often not even Unicode, although most of the characters they are used for are probably now encoded in plane 2 at least. Essentially they are the PUA of the code pages. So mapping them to PUA is correct.
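To make the roundtrip point concrete, here is a minimal Python sketch of a fixed, reversible EUDC-to-PUA mapping. The linear formula is an assumption on my part, based on my understanding of how Windows code page 932 (Shift-JIS) lays its user-defined rows (lead bytes 0xF0-0xF9) onto the PUA starting at U+E000. The function names are hypothetical. Check the MSDN tables before relying on any of it.

```python
PUA_START = 0xE000
LEAD_FIRST, LEAD_LAST = 0xF0, 0xF9   # assumed user-defined lead bytes in cp932
TRAILS_PER_LEAD = 188                # trail bytes 0x40-0x7E and 0x80-0xFC


def eudc_to_pua(lead, trail):
    """Map an assumed cp932 EUDC byte pair to a PUA code point."""
    if not LEAD_FIRST <= lead <= LEAD_LAST:
        raise ValueError("lead byte is outside the assumed EUDC rows")
    if 0x40 <= trail <= 0x7E:
        offset = trail - 0x40
    elif 0x80 <= trail <= 0xFC:
        offset = trail - 0x41        # 0x7F is not a valid trail byte
    else:
        raise ValueError("invalid trail byte")
    return PUA_START + TRAILS_PER_LEAD * (lead - LEAD_FIRST) + offset


def pua_to_eudc(cp):
    """Inverse mapping: PUA code point back to the same byte pair."""
    index = cp - PUA_START
    if not 0 <= index < 10 * TRAILS_PER_LEAD:
        raise ValueError("code point is outside the mapped PUA range")
    lead, offset = divmod(index, TRAILS_PER_LEAD)
    trail = offset + 0x40 if offset < 63 else offset + 0x41
    return LEAD_FIRST + lead, trail
```

Because the mapping is one-to-one, pua_to_eudc(eudc_to_pua(lead, trail)) returns the original byte pair. If either side changed the table, documents already saved with the old values would stop roundtripping.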
The mappings of EUDC to PUA are documented on MSDN. There may be a larger range reserved in Word than strictly necessary - I'll check.
I understand that overlapping uses of the PUA are frustrating - that is why characters should be encoded whenever possible - although merely encoding the character does not free up the overload on the PUA since legacy data will be with us for many years hence (forever, if some archivists had their way). Please understand that as an implementer, managing to deliver a useful product to all potential users is quite a challenge given the amount of legacy data out there which is expected to work transparently.
From: firstname.lastname@example.org on behalf of Christopher John Fynn
Sent: Fri 4/11/2003 11:09 PM
Subject: Re: Variant Glyph Display
"Chris Pratley" <email@example.com> wrote:
> Part of the PUA (starting at the bottom) is used by Word to handle
> EUDC characters from standards such as Big5, Shift-JIS etc when
> data from those charsets are mapped to Unicode.
It might be helpful if the ranges of the PUA which have been grabbed like this by MS Word were documented somewhere where the information is easy to find. While I'd hate to see this become some kind of de facto encoding, MS Word is a rather ubiquitous application and surely you don't want others, who may have perfectly valid reasons for using PUA code points, to have problems with MS Word for no apparent reason.
BTW I thought "corporate" use of the PUA was supposed to start at the top (U+F8FF) and work downward and "end user" use was supposed to start at the bottom (U+E000) and work up. Is there a reason why this wasn't followed in this case?
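For reference, the BMP Private Use Area runs from U+E000 to U+F8FF. A small Python sketch (the function name is hypothetical) that reports how far a code point sits from each end of that range, which is a quick way to see whether an allocation was made from the bottom up or from the top down:

```python
BMP_PUA_FIRST = 0xE000
BMP_PUA_LAST = 0xF8FF


def bmp_pua_position(cp):
    """Return (distance from bottom, distance from top), or None if outside the BMP PUA."""
    if not BMP_PUA_FIRST <= cp <= BMP_PUA_LAST:
        return None
    return cp - BMP_PUA_FIRST, BMP_PUA_LAST - cp
```

A code point with a small first value was allocated near the bottom of the range, as the EUDC mappings are described as being in the message quoted above.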