Re: precomposed polytonic Greek characters with macrons and other diacritics

From: Markus Scherer
Date: Mon, 8 Feb 2016 11:10:20 -0800

On Mon, Feb 8, 2016 at 10:47 AM, James Tauber wrote:

> Even with all this, though, my own work includes accentuation and
> syllabification algorithms, all of which are made more cumbersome by the
> lack of precomposed characters indicating vowel length. I'm currently
> leaning towards adding a layer of "character" processing on top of Python
> 3's otherwise decent support that effectively treats the relevant character
> sequences as single characters even if they aren't (and can't be)
> precomposed.

I suggest you normalize the text (NFC or NFD), and then look for "grapheme
cluster" boundaries.

In C++ and Java, you could use an ICU BreakIterator for the latter.
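
In Python 3, a rough sketch of the same two steps might look like the
following. It uses only the standard unicodedata module, and the helper name
clusters() is made up for illustration. Note that it is a simplification, not
the full grapheme cluster algorithm: it only attaches combining marks
(general category M*) to the preceding character, which is an assumption that
should be adequate for polytonic Greek vowels with macrons, breves, accents,
and breathings.

```python
import unicodedata


def clusters(text, form="NFD"):
    """Normalize text, then group each base character with its marks.

    Simplified stand-in for grapheme cluster segmentation: a character
    whose general category starts with "M" (a combining mark) is glued
    onto the previous unit; anything else starts a new unit.
    """
    normalized = unicodedata.normalize(form, text)
    units = []
    for ch in normalized:
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch  # combining mark: extend the current unit
        else:
            units.append(ch)  # base character: start a new unit
    return units


# alpha + combining macron + combining acute, followed by nu.
# There is no single precomposed code point for the first three.
word = "\u03b1\u0304\u0301\u03bd"
units = clusters(word)
print(len(word), len(units))
```

Each element of units can then be treated as one "character" by
accentuation or syllabification code, whichever normalization form is
chosen. For full grapheme cluster boundaries, the third-party regex module's
\X pattern is one option.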

Received on Mon Feb 08 2016 - 13:11:39 CST
