From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Thu May 24 2007 - 11:12:48 CDT
Hello Agnieszka Kasprzyk,
Jukka K. Korpela wrote:
> Canonical equivalence is not the same as identity.
...
> For example, [...] a program often uses a particular glyph for a
> precomposed character but handles a decomposed form by displaying the
> base character and positioning the diacritic somehow (generally with
> poorer results than the precomposed glyph).
For your application, the real problems lurk in searching, comparing,
and sorting operations. I guess that a less-than-optimally placed diacritic
in a library catalogue would mostly go unnoticed.
Bottom line:
- Either make sure that your software indeed treats canonically equivalent
sequences as equivalent, in the operations outlined above;
- or standardize your input on one of the equivalent patterns.
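To illustrate the first option, here is a minimal sketch in Python. It assumes
that the standard unicodedata module is available to your software; the
function names are made up for this example. Two strings are treated as equal
if their NFD forms are identical.

```python
import unicodedata

def canonical_key(text: str) -> str:
    # Hypothetical helper: map every canonically equivalent spelling
    # to one and the same code point sequence (here: NFD).
    return unicodedata.normalize("NFD", text)

def canonically_equal(a: str, b: str) -> bool:
    # Compare the normalized forms rather than the raw code points.
    return canonical_key(a) == canonical_key(b)

# Two spellings of "t with dot below and dot above":
with_dot_above_first = "\u1E6B\u0323"   # U+1E6B + combining dot below
with_dot_below_first = "\u1E6D\u0307"   # U+1E6D + combining dot above

print(with_dot_above_first == with_dot_below_first)                  # raw comparison
print(canonically_equal(with_dot_above_first, with_dot_below_first))  # normalized comparison
```

The same key can be used when building a search index. Note that it only
removes the differences between equivalent spellings; it does not yield a
linguistically sensible sort order, which is the business of collation.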
> It seems natural to use the form
> b) letter t/s with dot below (U+1E6D/U+1E63)+ combining dot above (U+0307)
> as the canonical format,
So if you have to prescribe the form of the input,
and if the input methods used allow for this variant,
then prescribe it thus.
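If you cannot rely on every input method to deliver the prescribed form, the
software can also bring the input into one form when it is stored. A minimal
sketch in Python, assuming that NFC is the form you settle on (for the letter
t, NFC coincides with variant b quoted above); the name standardize_input is
hypothetical.

```python
import unicodedata

def standardize_input(text: str) -> str:
    # Assumption: the catalogue stores everything in NFC.
    return unicodedata.normalize("NFC", text)

# Four ways a user might enter "t with dot below and dot above":
variants = [
    "t\u0323\u0307",   # t + combining dot below + combining dot above
    "t\u0307\u0323",   # t + combining dot above + combining dot below
    "\u1E6B\u0323",    # U+1E6B (t with dot above) + combining dot below
    "\u1E6D\u0307",    # U+1E6D (t with dot below) + combining dot above
]

# All variants collapse to a single stored form.
stored = {standardize_input(v) for v in variants}
for form in stored:
    print(" ".join(f"U+{ord(ch):04X}" for ch in form))
```

Applied once at data entry, this makes plain code point comparison of the
stored records sufficient, as long as search terms pass through the same
function.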
Another bottom line:
You should probably also get acquainted with
- <http://www.unicode.org/faq/normalization.html>
- <http://www.unicode.org/faq/collation.html>.
Best wishes,
Otto Stolz