From: Philippe Verdy <>
Date: Wed, 2 Jul 2014 20:44:38 +0200

Aren't we in a case where the distinction (supposed to be guessed from
context) would be needed only to facilitate contextual analysis of text
(such as counting syllables, transforming the text to count them in a later
process, or searching text phonologically), even if the rendered glyph does
not really need the distinction?
Anyway, we have two variants of the breve: one with "rounded bowls" and a
"lunar" one. In the example given, the word-initial form used the lunar
breve, and the word-medial and word-final forms used the breve with rounded
bowls (a sort of mix between a breve and a diaeresis).

I still don't understand which one is for the short /i/ and which one is
for /j/. Nor do I see how the long /ii/ is then represented, or how you
would represent and distinguish a short /ji/, a long /jii/ (as in the
English/French language name "Yi"), or /iji/ (as in the French town, and the
cream, "Chantilly"). And how would you represent the diphthong /aj/, and
distinguish /a·i/ (two syllables, as in the French verb "haïr") from /aji/
(as in the French noun "taillis") and from /aj/ (as in the French adjective
"thaï" or the verb/noun "taille")?

All I know is that Cyrillic has dedicated letters for the common syllables
/ja/, /je/, /ju/ (inherited from old ligatures), and languages using
Cyrillic vary in how they use them. Consider also the different Ukrainian
letter for /i/, its use of the diaeresis in some cases, and the ambiguities
in writing borrowed foreign words, notably trademarks like "Wikimedia",
whose spelling varies with each author's interpretation of the phonology.

2014-07-02 20:13 GMT+02:00 Jukka K. Korpela <>:

> 2014-07-02 20:34, Philippe Verdy wrote:
>> CGJ would be better used to prevent canonical compositions but it won't
>> normally carry distinctive semantics.
> In the question, a visual difference was desired. The Unicode FAQ says:
> “The semantics of CGJ are such that it should impact only searching and
> sorting, for systems which have been tailored to distinguish it, while
> being otherwise ignored in interpretation. The CGJ character was encoded
> with this purpose in mind.”
> So CGJ is to be used when you specifically want the same rendering but
> wish to make a distinction in processing.
> Yucca
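To make the distinction in the quoted FAQ concrete, here is a minimal Python sketch using only the standard unicodedata module. It shows that inserting CGJ (U+034F, canonical combining class 0) between "i" and U+0306 COMBINING BREVE stops NFC from composing them into U+012D, although a renderer would normally be expected to display both sequences alike. The functions search_key and tailored_key are hypothetical illustrations, not part of any library: one models an untailored search that ignores CGJ, the other a system tailored to distinguish it.

```python
import unicodedata

CGJ = "\u034F"    # COMBINING GRAPHEME JOINER (ccc = 0)
BREVE = "\u0306"  # COMBINING BREVE

plain = "i" + BREVE          # i + combining breve
blocked = "i" + CGJ + BREVE  # CGJ inserted between the base and the breve

nfc_plain = unicodedata.normalize("NFC", plain)
nfc_blocked = unicodedata.normalize("NFC", blocked)

# NFC composes i + breve into U+012D LATIN SMALL LETTER I WITH BREVE...
print(nfc_plain == "\u012D")
# ...but CGJ (a starter) blocks the composition, so the sequence is unchanged.
print(nfc_blocked == blocked)


def search_key(s: str) -> str:
    """Hypothetical untailored key: ignore CGJ, then normalize to NFC."""
    return unicodedata.normalize("NFC", s.replace(CGJ, ""))


def tailored_key(s: str) -> str:
    """Hypothetical tailored key: keep CGJ so both spellings stay distinct."""
    return unicodedata.normalize("NFC", s)


print(search_key(plain) == search_key(blocked))      # same key: CGJ ignored
print(tailored_key(plain) == tailored_key(blocked))  # different keys
```

Stripping CGJ before comparing mirrors the FAQ's "otherwise ignored in interpretation"; a tailored system simply keeps it, which is why CGJ gives a processing distinction rather than a visual one.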
