Re: Unused code positions and mapping to Unicode

From: Edward Cherlin (
Date: Sun Aug 08 1999 - 18:18:22 EDT

At 14:28 -0700 8/6/1999, wrote:
>When the vendor later defines a formerly undefined code position, there
>is no feasible alternative to updating your table(s). Once someone starts
>using the newly defined code point, you must map it correctly.
>We have seen a case of this fairly recently: Until a year or so ago, x80, x8E
>and x9E were undefined in CP1252, but now they are defined to map to U+20AC,
>U+017D and U+017E. I know a *lot* of people for whom this created problems,
>but as Ken has suggested, there really is no option but to acknowledge these
>and adapt accordingly.
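
To make the CP1252 case concrete, here is a minimal sketch of a table update, assuming a dict-based mapping. The names OLD_OVERRIDES, CP1252_ADDITIONS and decode_byte are hypothetical; only the three code positions and their Unicode values come from the quoted message. The last line cross-checks against Python's built-in cp1252 codec.

```python
# Hypothetical table layout: byte value -> Unicode character.
# Before the vendor change, x80, x8E and x9E had no entry.
OLD_OVERRIDES = {}

# The three positions named in the quoted message.
CP1252_ADDITIONS = {
    0x80: "\u20ac",  # EURO SIGN
    0x8E: "\u017d",  # LATIN CAPITAL LETTER Z WITH CARON
    0x9E: "\u017e",  # LATIN SMALL LETTER Z WITH CARON
}

# Updating the table is just a merge of the old entries with the new ones.
NEW_OVERRIDES = {**OLD_OVERRIDES, **CP1252_ADDITIONS}


def decode_byte(b, table, fallback="\ufffd"):
    """Map one byte through a table; undefined positions give U+FFFD."""
    return table.get(b, fallback)


# Python's own cp1252 codec already carries the updated mappings.
builtin = bytes([0x80, 0x8E, 0x9E]).decode("cp1252")
```

With the old table, decode_byte(0x80, OLD_OVERRIDES) falls back to U+FFFD; with the new one it yields U+20AC, which is the behaviour change that caught people out.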

[sigh] Wouldn't it be nice if

[1] owners always put new version numbers on code sets when changing the

[2] software used any available version data as part of file format


(I know that this opens new cans of worms.)

Does anyone feel the need for a conversion utility that would have all the
known details of such things in a database with both user control and good
default heuristics? Is this an idea worth money to the right people? I know
that there are character set detection functions in some Internet software.
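
A rough sketch of what I have in mind, assuming a small in-memory "database" keyed by code set name and version. Everything here is hypothetical: the names TABLES, LATEST and convert, and the version labels "v1" and "v2", are invented for illustration. Only the three CP1252 mappings come from this thread. The caller may name a version (user control); otherwise the newest known table is used (the default heuristic).

```python
from typing import Dict, Optional, Tuple

# Hypothetical database: (code set, version label) -> byte-to-character table
# for the positions under discussion. The labels are made up for this sketch.
TABLES: Dict[Tuple[str, str], Dict[int, str]] = {
    ("cp1252", "v1"): {},  # x80, x8E, x9E still undefined
    ("cp1252", "v2"): {0x80: "\u20ac", 0x8E: "\u017d", 0x9E: "\u017e"},
}

# Default heuristic: with no version data, assume the newest known table.
LATEST: Dict[str, str] = {"cp1252": "v2"}


def convert(data: bytes, charset: str, version: Optional[str] = None) -> str:
    """Convert bytes to text using the chosen version of a code set."""
    chosen = version if version is not None else LATEST[charset]
    table = TABLES[(charset, chosen)]
    out = []
    for b in data:
        if b in table:
            out.append(table[b])
        elif b < 0x80:
            out.append(chr(b))  # ASCII range passes through unchanged
        else:
            # Simplification: this sketch only knows the three positions
            # above, so every other high byte becomes U+FFFD.
            out.append("\ufffd")
    return "".join(out)
```

For example, convert(b"\x80 5", "cp1252") uses the newest table and maps x80 to U+20AC, while convert(b"\x80 5", "cp1252", version="v1") treats it as undefined. A real utility would need full tables and some way to record which version a file was written against.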

Edward Cherlin
"It isn't what you don't know that hurts you, it's
what you know that ain't so."--Mark Twain, or else
some other prominent 19th century humorist and wit
