Re: New Canonical Decompositions to Non-Starters

From: Philippe Verdy <>
Date: Sun, 17 Feb 2013 23:06:21 +0100

2013/2/17 Richard Wordingham <>:
> No. I am trying to confirm that there will never be any character but
> U+0344, U+0F73, U+0F75 and U+0F81 that has a non-singleton canonical
> decomposition to non-starters. The only way I can see for that to
> happen is a decomposition via one of U+0F73, U+0F75 and U+0F81 such as
> from U+E4567 to <U+0F73, U+E4568>, and I cannot see where this is
> prohibited.
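
The quoted claim can be checked against whatever version of the Unicode
Character Database a given runtime ships. The following is a minimal
sketch, assuming Python's standard unicodedata module; the function name
nonstarter_decompositions is only illustrative. It lists every code
point whose canonical decomposition mapping has two or more elements,
all of them with a non-zero canonical combining class. It says nothing
about what future versions of the standard may add.

```python
import sys
import unicodedata


def nonstarter_decompositions():
    """Return code points whose canonical decomposition mapping is a
    non-singleton sequence made up entirely of non-starters (ccc != 0)."""
    found = []
    for cp in range(sys.maxunicode + 1):
        mapping = unicodedata.decomposition(chr(cp))
        # Skip characters with no mapping and compatibility mappings,
        # which are tagged like "<compat> 0020 0308".
        if not mapping or mapping.startswith("<"):
            continue
        parts = [chr(int(field, 16)) for field in mapping.split()]
        if len(parts) < 2:
            continue  # singleton decomposition
        if all(unicodedata.combining(p) != 0 for p in parts):
            found.append(cp)
    return found


if __name__ == "__main__":
    for cp in nonstarter_decompositions():
        print("U+%04X %s" % (cp, unicodedata.name(chr(cp), "<unnamed>")))
```

With the data bundled in current Python releases this should report
exactly the four characters named in the quote: U+0344, U+0F73, U+0F75
and U+0F81.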

For practical purposes, I see little interest in adding new canonical
decompositions without a non-starter in first position.

But this does not exclude further encodings of named sequences, which
may be useful notably in Indic scripts (or Semitic scripts) for
specially designated vowel aggregates that actually have a specific
role in some language, or that are needed for some notations using
multiple combining marks and CGJ (or even some other format controls).
In that case we are not limited to pairs.

But I wouldn't call these "decompositions"; they are just semantically
significant groupings that escape the default character encoding model
or that cannot work with the default grapheme cluster boundaries. My
bet is that those encoded "named sequences" would be language-specific
rather than script-specific (and not applicable to all other languages
using the same script).
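
To illustrate the difference between a named sequence and a
decomposition, here is a small sketch, assuming a Python version (3.3 or
later) whose unicodedata.lookup() resolves names from NamedSequences.txt.
The helper name resolve_named_sequence is hypothetical. A named sequence
only gives a name to a string of code points; it adds no character and
no decomposition mapping, so normalization leaves the string as it is.

```python
import unicodedata


def resolve_named_sequence(name):
    """Return the string of code points registered under a named
    sequence (or the single character, if the name is a character name)."""
    return unicodedata.lookup(name)


if __name__ == "__main__":
    seq = resolve_named_sequence("LATIN SMALL LETTER R WITH TILDE")
    # The sequence is <U+0072, U+0303>; there is no precomposed character.
    print(["U+%04X" % ord(c) for c in seq])
    # NFC has nothing to compose it into, so the string is unchanged.
    print(unicodedata.normalize("NFC", seq) == seq)
```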
