Re: Character folding in text editors

From: Mark Davis ☕️
Date: Sun, 21 Feb 2016 11:47:28 +0100

On Sat, Feb 20, 2016 at 11:10 PM, Asmus Freytag (t) wrote:

> Unicode, even CLDR, doesn't have nearly enough data for the purpose.
> (and as a corollary of what Elias points out, it's likely to annoy users
> of every language, in that it would fold essential and non-essential
> distinctions indiscriminately).
> I've been working on this problem in the context of international
> top-level domain names, where the aim of the project is to identify labels
> that are seen as "the same" by users of a given script (but, in cases of
> identical appearance, we also include those seen as identical by users
> across scripts).
> None of the working groups in this project has felt like turning to CLDR
> for this purpose, and so far, each has approached the issue in a way that
> is not linked to sorting.
> Finally, none has seen folding of diacritics as useful; in the case of
> Arabic, however, optional combining marks are simply not supported (so
> as to avoid having to define a folding).

It depends on what the folding is being used for: there are many different
purposes. For some purposes, the goal of "is seen as the same" is
appropriate, while for others a broader scope is appropriate—typically
because someone wants a quick filter to get to a relatively small set of
strings which can then be processed in a more CPU-intensive fashion.

In any case, one can only get an approximation; the question is whether
that approximation is sufficient for the task at hand.
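
To make the "quick filter" use concrete, here is a minimal sketch in Python.
It assumes one particular broad folding (case folding plus removal of
combining marks after NFKD decomposition), which is only one of many possible
choices and deliberately over-folds. The function names (fold, build_index,
candidates, exact_matches) are hypothetical, and the final NFC comparison
merely stands in for whatever more CPU-intensive processing the task needs.

```python
import unicodedata


def fold(text: str) -> str:
    """Broad, approximate folding: case-fold, decompose, drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(strings):
    """Group strings by folded key so lookups touch only a small candidate set."""
    index = {}
    for s in strings:
        index.setdefault(fold(s), []).append(s)
    return index


def candidates(index, query):
    """Cheap first pass: every string that shares the query's folded key."""
    return index.get(fold(query), [])


def exact_matches(index, query):
    """Second pass (placeholder for costlier processing): canonical equality."""
    target = unicodedata.normalize("NFC", query)
    return [
        s for s in candidates(index, query)
        if unicodedata.normalize("NFC", s) == target
    ]


index = build_index(["résumé", "Resume", "resume", "rèsumé", "result"])
print(candidates(index, "resume"))
print(exact_matches(index, "résumé"))
```

The first pass treats all four "resume" variants as the same, which is too
coarse for a "seen as the same by users" test in most languages but is
adequate as a prefilter; the second pass then narrows the small candidate set.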

Received on Sun Feb 21 2016 - 04:49:36 CST