Re: collation (was: Phoenician)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat May 08 2004 - 16:27:40 CDT


    From: "E. Keown" <k_isoetc@yahoo.com>

    > Thank you Philippe for taking the time to explain. I
    > originally wanted to be a digital lexicographer, so I
    > am interested in perfect collation.

    You're welcome! I hope I have been useful in explaining the basic concepts. In
    fact, the Unicode Collation Algorithm is a bit more complex, because it takes
    into account more subtle features needed to cover various languages. My
    examples were very simplified compared to what you can do with Unicode collation.

    > I assume that Philippe's 'DUCET' and Michael Everson's
    > "default template" refer to the same item. And
    > Unicode-compliant software will support DUCET.

    "DUCET" (the Default Unicode Collation Element Table) is referenced in the
    Unicode standard that documents collation. It is a prebuilt table of collation
    "weights" (the term used for the comparable numeric values that allow matching
    and ordering of characters and strings), computed according to what is really
    a standardized (but tailorable) default collation order, plus some arbitrary
    numeric thresholds and arbitrary "gap" values (to simplify some implementations
    of tailoring, so that inserting new entries does not require renumbering the
    existing weights).
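    To make the idea of multi-level weights and gaps concrete, here is a minimal
    sketch in Python. The table TOY_TABLE and the helpers sort_key and tailor_after
    are hypothetical illustrations with made-up numbers, NOT real DUCET values or
    any library's API; the real algorithm also handles contractions, expansions,
    ignorable characters and normalization, which are left out here. Each
    character gets (primary, secondary, tertiary) weights, all primaries of a
    string are compared before any secondary, and primaries are spaced 0x10
    apart so a tailoring can slot a new letter into the gap.

```python
# Hypothetical toy weights, NOT taken from the real DUCET.
# Each character maps to (primary, secondary, tertiary) weights.
# Primary weights are spaced 0x10 apart, leaving gaps for tailorings.
TOY_TABLE = {
    "a": (0x10, 0x05, 0x05),
    "A": (0x10, 0x05, 0x08),  # same primary as "a": differs only in case (tertiary)
    "b": (0x20, 0x05, 0x05),
    "c": (0x30, 0x05, 0x05),
    "d": (0x40, 0x05, 0x05),
}


def sort_key(s, table):
    """Compare all primary weights first, then secondaries, then tertiaries.

    One tuple per level compares like a UCA-style sort key in which the
    levels are separated by a value lower than any real weight.
    """
    return tuple(tuple(table[ch][level] for ch in s) for level in range(3))


def tailor_after(table, anchor, new_char):
    """Return a copy of table with new_char sorted just after anchor.

    The new primary weight falls in the gap after anchor's weight (assuming
    that gap is not used up yet), so no existing entry is renumbered.
    """
    primary, secondary, tertiary = table[anchor]
    tailored = dict(table)
    tailored[new_char] = (primary + 1, secondary, tertiary)
    return tailored


words = ["ac", "Ab", "ab"]
print(sorted(words))  # plain code-point order: ['Ab', 'ab', 'ac']
print(sorted(words, key=lambda s: sort_key(s, TOY_TABLE)))  # ['ab', 'Ab', 'ac']

# Esperanto-style tailoring: "ĉ" is a separate letter placed after "c".
eo = tailor_after(TOY_TABLE, "c", "ĉ")
print(sorted(["d", "ĉ", "c", "b"], key=lambda s: sort_key(s, eo)))  # ['b', 'c', 'ĉ', 'd']
```

    Note how case only breaks ties: "Ab" sorts between "ab" and "ac" because the
    primary level decides first, whereas plain code-point order puts every
    uppercase letter before every lowercase one.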

    A fully Unicode-compliant collation implementation based on the DUCET is not
    required to use the same numeric weights; it only has to preserve their
    relative order and their composition into levels.
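    A small Python sketch can show why only the relative order matters. The
    GAPPED table and the renumber and sort_key helpers below are hypothetical
    (made-up weights, not real DUCET data): renumber replaces the weights at each
    level with dense ranks 1, 2, 3, ..., which changes every number but keeps the
    order within each level, and sorting with either table gives the same result.

```python
# Hypothetical weight table with gaps, NOT real DUCET values:
# (primary, secondary, tertiary) per character.
GAPPED = {
    "a": (0x10, 0x05, 0x05),
    "A": (0x10, 0x05, 0x08),
    "b": (0x20, 0x05, 0x05),
    "c": (0x30, 0x05, 0x05),
}


def renumber(table):
    """Replace each level's weights by dense ranks 1, 2, 3, ...

    The numbers change, but the relative order at every level (and the
    three-level structure) is preserved.
    """
    ranks = []
    for level in range(3):
        distinct = sorted({w[level] for w in table.values()})
        ranks.append({w: i + 1 for i, w in enumerate(distinct)})
    return {ch: tuple(ranks[lv][w[lv]] for lv in range(3)) for ch, w in table.items()}


def sort_key(s, table):
    """Primary weights of the whole string first, then secondary, then tertiary."""
    return tuple(tuple(table[ch][lv] for ch in s) for lv in range(3))


COMPACT = renumber(GAPPED)
words = ["cab", "Ab", "ab", "ba", "a"]
by_gapped = sorted(words, key=lambda s: sort_key(s, GAPPED))
by_compact = sorted(words, key=lambda s: sort_key(s, COMPACT))
print(COMPACT["A"])  # (1, 1, 2)
print(by_gapped == by_compact, by_compact)  # True ['a', 'ab', 'Ab', 'ba', 'cab']
```

    Because each level is remapped by a strictly increasing function, every
    level-by-level comparison gives the same answer, so an implementation is
    free to choose compact weights (or leave gaps) as it sees fit.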

    The introductory message described what can be done, but the UTS document
    (UTS #10, the Unicode Collation Algorithm) describes it in more detail.



    This archive was generated by hypermail 2.1.5 : Sat May 08 2004 - 16:28:06 CDT