Re: Compatibility decomposition for Hebrew and Greek final letters from Martin J. Dürst on 2015-02-19 (Unicode Mail List Archive)

From: Martin J. Dürst <duerst_at_it.aoyama.ac.jp>
Date: Fri, 20 Feb 2015 11:50:17 +0900

On 2015/02/20 05:17, Eli Zaretskii wrote:
>> From: Philippe Verdy <verdy_p_at_wanadoo.fr>
>> Date: Thu, 19 Feb 2015 20:31:07 +0100
>> Cc: Julian Bradfield <jcb+unicode_at_inf.ed.ac.uk>,
>> unicode Unicode Discussion <unicode_at_unicode.org>
>>
>> The decompositions are not needed for plain text searches, that can use the
>> collation data (with the collation data, you can unify at the primary level
>> differences such as capitalisation and ignore diacritics, or transform some
>> base groups of letters into a single entry, or make some significant primary
>> difference when there are diacritics (for example in German equating 'ae' and
>> 'ä' at the primary level)).
>
> Sorry, I disagree. First, collation data is overkill for search,
> since the order information is not required, so the weights are simply
> wasting storage. Second, people do want to find, e.g., "²" when they
> search for "2" etc. I'm not saying that they _always_ want that, but
> sometimes they do. There's no reason a sophisticated text editor
> shouldn't support such a feature, under user control.

Well, for cased scripts, search is usually case-insensitive, but case
conversions aren't given by compatibility decompositions.
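
To illustrate the point, here is a minimal Python sketch, not anything taken from the Unicode data files themselves. It uses the standard `unicodedata` module; the helper names `compat_fold` and `search_key` are made up for this example. Compatibility decomposition (NFKD) maps "²" to "2" but leaves case alone, so a search that wants both has to add case folding as a separate step.

```python
import unicodedata


def compat_fold(text: str) -> str:
    """Apply compatibility decomposition (NFKD) only; case is untouched."""
    return unicodedata.normalize("NFKD", text)


def search_key(text: str) -> str:
    """Hypothetical search key: NFKD followed by Unicode case folding."""
    return unicodedata.normalize("NFKD", text).casefold()


if __name__ == "__main__":
    # NFKD alone unifies the superscript digit but not the letter case.
    print(compat_fold("X²") == compat_fold("x2"))
    # Adding case folding unifies both.
    print(search_key("X²") == search_key("x2"))
```

Case folding also happens to cover the Greek final sigma, since "ς" and "σ" fold to the same character, but nothing comparable exists for the Hebrew final letters.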

If the question isn't "Why are there equivalences useful for search that
are not covered by compatibility decompositions?", but "Why doesn't
Unicode provide some data for final/non-final Hebrew letter
correspondence?", maybe the answer is that it hasn't been seen as a need
up to now because it's so easy to figure out.
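
As a rough illustration of how little is involved, the five final/non-final pairs can be written out by hand. The sketch below is an assumption about how an application might do it, not data published by Unicode; the names `FINAL_TO_NONFINAL` and `fold_hebrew_finals` are invented for the example.

```python
import unicodedata

# Hand-written table of Hebrew final forms and their non-final counterparts.
# This table is an assumption of this sketch, not a Unicode property.
FINAL_TO_NONFINAL = {
    "\u05DA": "\u05DB",  # FINAL KAF   -> KAF
    "\u05DD": "\u05DE",  # FINAL MEM   -> MEM
    "\u05DF": "\u05E0",  # FINAL NUN   -> NUN
    "\u05E3": "\u05E4",  # FINAL PE    -> PE
    "\u05E5": "\u05E6",  # FINAL TSADI -> TSADI
}

_TABLE = str.maketrans(FINAL_TO_NONFINAL)


def fold_hebrew_finals(text: str) -> str:
    """Replace each Hebrew final letter with its non-final form."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    # The character names confirm each pairing: dropping "FINAL " from the
    # name of the final form gives the name of the non-final form.
    for final, nonfinal in FINAL_TO_NONFINAL.items():
        print(unicodedata.name(final), "->", unicodedata.name(nonfinal))
```

A search feature could apply `fold_hebrew_finals` to both the query and the text before comparing them, under user control as suggested earlier in the thread.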

Regards, Martin.

Received on Thu Feb 19 2015 - 20:52:14 CST