Re: Character folding in text editors

From: Janusz S. Bień
Date: Sat, 20 Feb 2016 18:11:03 +0100

Quote - Elias Mårtenson (Sat 20 Feb 2016 11:23:13 AM CET):

> Hello Unicode,
> I have been involved in a rather long discussion on the Emacs-devel mailing
> list[1] concerning the right way to do character folding and we've reached
> a point where input from Unicode experts would be welcome.
> The problem is the implementation of equivalence when searching for
> characters. For example, if I have a buffer containing the following
> characters (both using the precomposed and canonical forms):
> o ö ø ó n ñ
> The character folding feature in Emacs allows a search for "o" to match some
> or even all of these characters. The discussion on the mailing list has
> circulated around both the fact that the correct behaviour here is
> locale-dependent, and also on the correct way to implement this matching
> absent any locale-specific exceptions.

What about just using POSIX equivalence classes in regular expressions?


A POSIX locale can define character equivalence classes, which indicate
that certain characters should be treated as identical for sorting. In
French, for example, accents are ignored when ordering words: élève
comes before être, which comes before événement. é, è and ê all sort the
same as e, but l comes before t, which comes before v. With the locale
set to French, a POSIX-compliant regular expression engine matches e,
é, è and ê when you use the equivalence class [=e=] in the bracket
expression [[=e=]].
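
To make the idea concrete, here is a minimal sketch in Python. Python's
re module does not implement POSIX equivalence classes, so the sketch
only approximates [[=e=]]: it groups characters by the base letter left
after Unicode canonical decomposition (NFD), not by locale collation
data. The helper names base_letter and equivalence_class and the
ALPHABET string are hypothetical, chosen for illustration only.

```python
import re
import unicodedata


def base_letter(ch: str) -> str:
    """Decompose ch canonically (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def equivalence_class(base: str, alphabet: str) -> str:
    """Build a bracket expression from the characters in alphabet whose
    base letter is `base`. This approximates [[=base=]] using Unicode
    decomposition only; no locale data is consulted (an assumption of
    this sketch, not how a POSIX engine works)."""
    precomposed = unicodedata.normalize("NFC", alphabet)
    members = sorted({ch for ch in precomposed if base_letter(ch) == base})
    return "[" + "".join(re.escape(ch) for ch in members) + "]"


# Hypothetical sample alphabet: the characters mentioned in this thread.
ALPHABET = "eéèêoöøónñ"

e_class = equivalence_class("e", ALPHABET)
o_class = equivalence_class("o", ALPHABET)

text = unicodedata.normalize("NFC", "élève être événement")
print(re.findall(e_class, text))
print(re.fullmatch(o_class, "ø") is not None)
```

Note that ø has no canonical decomposition, so this approach does not
fold it to o, whereas ö and ó are folded. Whether ø (or ö) should match
o is exactly the kind of locale-dependent decision that a locale's
equivalence classes would have to settle.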


(an Emacs user)

Prof. dr hab. Janusz S. Bień -  Uniwersytet Warszawski (Katedra  
Lingwistyki Formalnej)
Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics Department)
Received on Sat Feb 20 2016 - 11:12:00 CST
