Re: Compatibility decomposition for Hebrew and Greek final letters

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Fri, 20 Feb 2015 04:47:52 +0100

2015-02-19 21:17 GMT+01:00 Eli Zaretskii <eliz_at_gnu.org>:

> > From: Philippe Verdy <verdy_p_at_wanadoo.fr>
> > Date: Thu, 19 Feb 2015 20:31:07 +0100
> > Cc: Julian Bradfield <jcb+unicode_at_inf.ed.ac.uk>,
> > unicode Unicode Discussion <unicode_at_unicode.org>
> >
> > The decompositions are not needed for plain text searches, that can use the
> > collation data (with the collation data, you can unify at the primary level
> > differences such as capitalisation and ignore diacritics, or transform some
> > base groups of letters into a single entry, or make some significant primary
> > difference when there are diacritics (for example in German equating 'ae' and
> > 'ä' at the primary level)).
>
> Sorry, I disagree. First, collation data is overkill for search,
> since the order information is not required, so the weights are simply
> wasting storage. Second, people do want to find, e.g., "²" when they
> search for "2" etc. I'm not saying that they _always_ want that, but
> sometimes they do. There's no reason a sophisticated text editor
> shouldn't support such a feature, under user control.
>

The weights and the collation strings do not need to be stored. Even
database engines and plain-text search engines on the web now provide
collation algorithms for searching or sorting data, so you don't need to
store them in your tables. It is not overkill: good implementations of
collation are effectively used in high-performance database servers (and
many users of these databases do not realize that collation is being
used).
There are also good text editors that implement collation-based searches.
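
To illustrate the point that nothing has to be stored, here is a minimal
sketch of a search that computes its comparison keys on the fly. Note the
assumptions: the Python standard library has no collation (UCA)
implementation, so this sketch substitutes Unicode normalization plus case
folding for real primary-level collation weights. The names fold_key and
find_all are hypothetical, and a real implementation would use tailored
collation data instead.

```python
import unicodedata


def fold_key(text: str) -> str:
    """Rough stand-in for a primary-strength key (NOT real UCA weights).

    Assumption: compatibility decomposition, removal of combining marks,
    and case folding approximate "ignore case, diacritics, and variants".
    """
    decomposed = unicodedata.normalize("NFKD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


def find_all(haystack: str, needle: str) -> list[int]:
    """Return start offsets in the original haystack of loose matches.

    Keys are computed while scanning; no per-character weights are kept
    alongside the text.
    """
    target = fold_key(needle)
    if not target:
        return []
    hits = []
    for start in range(len(haystack)):
        # A match should not begin in the middle of a combining sequence.
        if unicodedata.combining(haystack[start]):
            continue
        for end in range(start + 1, len(haystack) + 1):
            key = fold_key(haystack[start:end])
            if key == target:
                hits.append(start)
                break
            if not target.startswith(key):
                break  # this start position can no longer match
    return hits


print(find_all("x² + 2x", "2"))
print(find_all("Résumé and RESUME", "resume"))
```

The scan is deliberately naive (quadratic in the worst case); it is only
meant to show that the folded form exists during the comparison and is
then discarded.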

Received on Thu Feb 19 2015 - 21:49:43 CST
