From: Philippe Verdy (email@example.com)
Date: Fri Aug 20 2004 - 04:10:21 CDT
From: "Michael Everson" <firstname.lastname@example.org>
> At 09:56 +0200 2004-08-20, Philippe Verdy wrote:
> >From: "Michael Everson" <email@example.com>
> > > Our contribution was intended to weigh the impact on existing text.
> >Maybe a small correction here:
> >... the impact on existing text already coded with Unicode.
> NO, Philippe, we were counting entities, not their encoding.
Entities are coded, no?
So there's an encoding for them that allows them to be differentiated. If the
origin charset makes no such differentiation either, then those texts do
not encode the difference, and there's no reencoding cost for them, as these
texts already map correctly to Unicode.
If entities are counted using some disambiguating dictionary or lexical
analysis, then it is that dictionary, or the rules in the lexical analyzer,
that fix the encoding and, when used in combination with the
origin text, reveal the true identity of the character entities. In that case
no reencoding is needed either: the same lexical analyzer rules or
dictionary will just need to emit the new code instead of generating the
same VAV, HOLAM code point sequence in the generated Unicode text.
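As a rough illustration of that point, here is a minimal Python sketch of a
dictionary-driven converter. Everything in it is an assumption made for the
example: the function name convert, the dictionary contents, the sample
strings, and above all NEW_CODE, which is a private-use placeholder standing in
for whatever code would be assigned. The source text is the same in both runs;
only the converter's output rule changes.

```python
VAV = "\u05D5"       # U+05D5 HEBREW LETTER VAV
HOLAM = "\u05B9"     # U+05B9 HEBREW POINT HOLAM
ALEF = "\u05D0"      # U+05D0 HEBREW LETTER ALEF, used only to build sample strings

# ASSUMPTION: stand-in for "the new code". A private-use code point is used
# here as a placeholder; it is not an actual Unicode assignment.
NEW_CODE = "\uE000"


def convert(words, distinct_entries, use_new_code):
    """Return the words, re-marking dictionary entries when use_new_code is set.

    distinct_entries is a hypothetical disambiguating dictionary: the set of
    words whose VAV, HOLAM sequence stands for the entity to be told apart.
    """
    result = []
    for word in words:
        if use_new_code and word in distinct_entries:
            # Same rule as before; only the emitted sequence differs.
            word = word.replace(VAV + HOLAM, VAV + NEW_CODE)
        result.append(word)
    return result


# Placeholder strings, not real vocabulary.
sample = [VAV + HOLAM + ALEF, ALEF + VAV + HOLAM]
dictionary = {sample[0]}

old_output = convert(sample, dictionary, use_new_code=False)
new_output = convert(sample, dictionary, use_new_code=True)
```

With use_new_code off, the converter reproduces the existing VAV, HOLAM
sequence everywhere; with it on, only the dictionary entries change. The
stored text is never edited by hand, which is the sense in which the cost
sits in the tool and not in the text.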
There's no reencoding issue with newly encoded texts, created with an editor
(or keyboard driver, input method, or other tool) that is used for the
purpose of encoding the difference. The cost is only educational (training)
for users of these updated tools to create their new texts, but this cost is
not a reencoding cost.
I don't understand where the reencoding cost issue is for legacy texts
(already encoded or facsimiles). The cost is not in the texts themselves but
in the tools used to convert them to a newer, less ambiguous version of
Unicode, and in the software used to edit or render them (new fonts, a new
renderer version such as the Uniscribe engine on Windows, updated databases of
character properties): all these are costs that already exist each time new
characters of any script, old or new, are added to Unicode. And each time
the user is given the choice to upgrade or not.
This archive was generated by hypermail 2.1.5 : Fri Aug 20 2004 - 04:12:02 CDT