Re: Character folding in text editors

From: Eli Zaretskii <eliz_at_gnu.org>
Date: Sun, 21 Feb 2016 18:21:24 +0200

> From: "Doug Ewell" <doug_at_ewellic.org>
> Date: Sat, 20 Feb 2016 14:43:15 -0700
>
> > What about language-independent character-folding: where in the
> > Unicode database is the data for that?
>
> The OP kind of alluded to that: there is no such thing really as
> language-independent character folding.

Emacs is currently looking for a useful approximation, given that the
language of the text is in general unknown. The folding can be
toggled off (either as a global default, or for the current search),
for those use cases where it is undesirable or gets in the way.

> About the closest approximation you can get using Unicode data alone
> (not CLDR) is to normalize to NFD, then ignore the combining diacritics.

This is what Emacs currently does, IIUC what you say. The NFD
normalization uses the decomposition data included with
UnicodeData.txt. Is this what you mean?
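For illustration, here is a minimal Python sketch of the "normalize to NFD, then ignore the combining diacritics" approach. It is not Emacs's code. The names fold and matches are hypothetical. "Combining diacritic" is approximated as any character with a nonzero canonical combining class. The fold_chars flag stands in for the ability to turn folding off.

```python
import unicodedata


def fold(text: str) -> str:
    """Hypothetical fold: decompose to NFD, then drop combining marks.

    Assumption: a combining mark is any character whose canonical combining
    class (as returned by unicodedata.combining) is nonzero.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(needle: str, haystack: str, fold_chars: bool = True) -> bool:
    """Hypothetical substring test; fold_chars=False turns folding off."""
    if fold_chars:
        return fold(needle) in fold(haystack)
    return needle in haystack


print(fold("caf\u00e9"), fold("a\u00f1o"), fold("\u0142\u00f3d\u017a"), fold("\u00f8"))
print(matches("cafe", "un caf\u00e9 noir"))  # precomposed é (U+00E9)
print(matches("cafe", "un caf\u00e9 noir", fold_chars=False))
print(matches("o", "\u00f8"))
```

This prints `cafe ano łodz ø`, then `True`, `False`, `False`. Note that ø and ł pass through unchanged, because NFD gives them nothing to strip. Also note that ñ folds to n, which is exactly the language-dependent behavior discussed below.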

> But that still doesn't work for a character like ø, which doesn't
> decompose to o + anything

Why doesn't it, btw? Same question about ł.

I've heard an opinion that UnicodeData.txt only includes
decompositions when the combining mark's glyph does not overlap that of
the base character. Is that correct?
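You can check what UnicodeData.txt actually records for these characters with a short Python sketch. The helper describe is hypothetical. It prints the decomposition field through unicodedata.decomposition, which reflects the Unicode version bundled with the interpreter. The sketch only shows the data and says nothing about why a mapping is absent.

```python
import unicodedata


def describe(ch: str) -> str:
    """Hypothetical helper: report the decomposition field recorded for ch."""
    mapping = unicodedata.decomposition(ch)  # empty string when the field is empty
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}: {mapping or '(no decomposition)'}"


for ch in "\u00f1\u00f6\u00f8\u0142":  # ñ ö ø ł
    print(describe(ch))
```

ñ maps to `006E 0303` and ö to `006F 0308`. For ø (U+00F8) and ł (U+0142), the sketch reports `(no decomposition)`.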

> and more importantly, it still won't meet expectations because of
> the n/ñ and o/ö/ø language-dependency problems.

Given that the feature can be turned off easily, do you think that it
will nonetheless be useful, even though language-dependent parts are
not available?
Received on Sun Feb 21 2016 - 10:22:57 CST