Re: [hebrew] Re: Disunification costs

From: Philippe Verdy (
Date: Fri Aug 20 2004 - 04:10:21 CDT

    From: "Michael Everson" <>
    > At 09:56 +0200 2004-08-20, Philippe Verdy wrote:
    > >From: "Michael Everson" <>
    > > > Our contribution was intended to weigh the impact on existing text.
    > >
    > >May be a small correction here:
    > >... the impact on existing text already coded with Unicode.
    > NO, Philippe, we were counting entities, not their encoding.

    Entities are coded, no?
    So there is an encoding for them that lets them be told apart. If the
    original charset does not differentiate them either, then those texts do
    not encode the difference, and there is no reencoding cost for them: they
    already map correctly to Unicode.

    If entities are counted using a disambiguating dictionary or lexical
    analysis, then it is that dictionary, or the rules in the lexical analyzer,
    that fixes the encoding: applied to the original text, they reveal the true
    identity of each character entity. In that case no reencoding is needed
    either: the same lexical-analyzer rules or dictionary will simply emit the
    new code instead of generating the VAV,HOLAM code point sequence in the
    Unicode output.

    There is no reencoding issue with newly encoded texts created with an
    editor (or keyboard driver, input method, etc.) that is designed to encode
    the difference. The only cost is educational: training the users of these
    updated tools to create their new texts. That is not a reencoding cost.

    I don't see where the reencoding cost lies for legacy texts (already
    encoded, or facsimiles). The cost is not in the texts themselves but in the
    tools used to convert them to a newer, less ambiguous version of Unicode,
    and in the software used to edit or render them (new fonts, a new renderer
    version such as the Uniscribe engine on Windows, updated character property
    databases). All of these costs already arise each time new characters of
    any script, old or new, are added to Unicode, and each time users are given
    the choice of whether to upgrade.
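    To weigh that tool cost, a converter first has to find the positions it
    cannot decide on its own. A minimal Python sketch, assuming a hypothetical
    set `known_words` of legacy spellings whose VAV,HOLAM reading (holam male
    or consonantal vav) a dictionary already records; the function name and
    the whitespace-only word splitting are illustrative assumptions.

```python
from collections import Counter

VAV_HOLAM = "\u05D5\u05B9"  # legacy sequence: VAV followed by HOLAM


def triage(text: str, known_words: set[str]) -> Counter:
    """Count VAV+HOLAM occurrences, split by whether the dictionary covers the word."""
    counts: Counter = Counter()
    for word in text.split():  # ASSUMPTION: whitespace-delimited words
        n = word.count(VAV_HOLAM)
        if n:
            key = "resolved" if word in known_words else "needs_review"
            counts[key] += n
    return counts


if __name__ == "__main__":
    mitsvot = "\u05DE\u05B4\u05E6\u05B0\u05D5\u05B9\u05EA"
    shalom = "\u05E9\u05B8\u05DC\u05D5\u05B9\u05DD"
    print(triage(shalom + " " + mitsvot, {mitsvot}))
```

    Only the "needs_review" count represents human work; everything the
    dictionary resolves is converted mechanically by the tool.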
