Re: various ways of making a specific character

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Thu May 24 2007 - 11:12:48 CDT


    Hello Agnieszka Kasprzyk,

    Jukka K. Korpela wrote:
    > Canonical equivalence is not the same as identity.
    ...
    > For example, [...] a program often uses a particular glyph for a
    > precomposed character but handles a decomposed form by displaying the
    > base character and positioning the diacritic somehow (generally with
    > poorer results than the precomposed glyph).

    For your application, the real problems lurk in searching, comparing,
    and sorting operations. I guess a less-than-optimally placed diacritic
    in a library catalogue would mostly go unnoticed.
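    To illustrate the searching problem, here is a minimal Python sketch. It
    assumes the catalogue is just a list of strings; the entries and the name
    catalogue_search are hypothetical. Both the query and each entry are
    normalized to NFC with the standard unicodedata module before the
    substring test, so a query typed with combining marks still finds a
    precomposed entry, and vice versa.

```python
import unicodedata


def nfc(text: str) -> str:
    """Return the NFC (canonically composed) form of text."""
    return unicodedata.normalize("NFC", text)


def catalogue_search(entries: list[str], query: str) -> list[str]:
    """Return the entries containing query, ignoring canonical differences.

    Hypothetical helper: both sides are normalized to NFC before the plain
    substring test, so precomposed and decomposed spellings match each other.
    """
    needle = nfc(query)
    return [entry for entry in entries if needle in nfc(entry)]


# Hypothetical catalogue: one entry stored precomposed, one decomposed.
catalogue = [
    "\u1e6dika",    # U+1E6D t with dot below, precomposed
    "s\u0323astra",  # s + U+0323 combining dot below, decomposed
]

print(catalogue_search(catalogue, "t\u0323ik"))  # decomposed query
print(catalogue_search(catalogue, "\u1e63a"))    # precomposed query
```

    A substring test on normalized text is still only an approximation of a
    proper Unicode-aware search. For example, a query consisting of U+1E63
    alone will not be found inside U+1E69, because NFC composes s + dot below
    + dot above into that single character.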

    Bottom line:
    - Either make sure that your software indeed treats canonically equivalent
       sequences as equivalent in the operations outlined above;
    - or standardize your input on one of the equivalent patterns.
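    Both options can be sketched with Python's standard unicodedata module.
    This is a minimal illustration, not a complete solution: canonically_equal
    and standardize are hypothetical helper names, and choosing NFC as the
    stored form is an assumption (any single normalization form would do).

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Option 1: compare strings by their NFD forms.

    Canonically equivalent sequences have identical NFD forms.
    """
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def standardize(text: str) -> str:
    """Option 2: store every input in one normalization form (NFC assumed)."""
    return unicodedata.normalize("NFC", text)


# Four canonically equivalent spellings of t with dot below and dot above.
variants = [
    "t\u0323\u0307",  # t + combining dot below + combining dot above
    "t\u0307\u0323",  # t + combining dot above + combining dot below
    "\u1e6b\u0323",   # U+1E6B t with dot above + combining dot below
    "\u1e6d\u0307",   # U+1E6D t with dot below + combining dot above
]

# A naive code-point comparison treats them as four different strings ...
naive_distinct = len(set(variants))
# ... whereas either option collapses them into one value.
all_equal = all(canonically_equal(variants[0], v) for v in variants)
stored = sorted({standardize(v) for v in variants})

print(naive_distinct, all_equal, len(stored))
```

    Normalization only guarantees that equivalent spellings compare (and
    therefore sort) as the same string; a culturally correct sort order still
    needs a collation algorithm on top of it.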

    > It seems natural to use the form
    > b) letter t/s with dot below (U+1E6D/U+1E63)+ combining dot above (U+0307)
    > as the canonical format,

    So if you have to prescribe the form of the input,
    and if the input methods used allow for this variant,
    then prescribe it accordingly.
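    One caveat about form b): it coincides with NFC for t, but not for s.
    Unicode has no precomposed t with both dots, so NFC yields U+1E6D + U+0307.
    However, it does have U+1E69 LATIN SMALL LETTER S WITH DOT BELOW AND DOT
    ABOVE (and U+1E68 for the capital), so NFC turns U+1E63 + U+0307 into
    U+1E69. The minimal Python sketch below assumes you want form b) for both
    letters. It applies NFC and then splits those two precomposed s characters
    back apart. The helper name to_form_b is hypothetical, and only t/T and
    s/S are handled.

```python
import unicodedata

COMBINING_DOT_ABOVE = "\u0307"

# Precomposed "s with dot below and dot above" -> "s with dot below" + U+0307.
SPLIT_S = {
    "\u1e69": "\u1e63" + COMBINING_DOT_ABOVE,  # small s
    "\u1e68": "\u1e62" + COMBINING_DOT_ABOVE,  # capital S
}


def to_form_b(text: str) -> str:
    """Hypothetical helper: rewrite t/s with both dots into form b).

    NFC already yields U+1E6D/U+1E6C + U+0307 for t/T, because Unicode has
    no precomposed t with both dots. For s/S, NFC yields U+1E69/U+1E68,
    which are split back into U+1E63/U+1E62 + U+0307 here.
    """
    composed = unicodedata.normalize("NFC", text)
    for precomposed, form_b in SPLIT_S.items():
        composed = composed.replace(precomposed, form_b)
    return composed


# Show the resulting code points for a few input spellings.
for raw in ["t\u0307\u0323", "\u1e6b\u0323", "s\u0307\u0323", "\u1e69"]:
    print([f"U+{ord(c):04X}" for c in to_form_b(raw)])
```

    Because the result for s is not in NFC, any later step that normalizes to
    NFC (an editor, a database, a web form) would silently turn it back into
    U+1E69. That is worth weighing before prescribing form b) for s.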

    Another bottom line:
    You should probably also get acquainted with
    - <http://www.unicode.org/faq/normalization.html>
    - <http://www.unicode.org/faq/collation.html>.

    Best wishes,
       Otto Stolz



    This archive was generated by hypermail 2.1.5 : Thu May 24 2007 - 11:14:15 CDT