Re: Fallback for Sinhala Consonant Clusters

From: Harshula via Unicode
Date: Mon, 15 Oct 2018 01:55:24 +1100

Hi Richard,

1) From a pronunciation perspective, your first and third examples will
be similar. Your second example will be pronounced very differently. I
did some quick testing on Linux and reproduced the behaviour that you

2) Going back more than a decade, the state tables used by some
layout/shaping engines used the same 'virama' rules for North Indian
scripts and Sinhala. This resulted in undesirable *implicit* conjuncts
being created for Sinhala consonant clusters. That then resulted in
undesirable positioning of dependent vowels. e.g.

3) However, what you have observed is an issue with *explicit* conjunct
creation. After the segmentation is completed, the layout/shaping engine
needs to first check if there is a corresponding lookup for the explicit
conjunct. If there is not, it needs to remove the ZWJ and redo the
segmentation and lookup(s). Perhaps that is not happening in HarfBuzz.
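
The fallback described in point 3 could be sketched roughly as below. This
is only an illustration in Python, not HarfBuzz's actual logic: the function
name resolve_explicit_conjunct and the font_conjuncts set (a stand-in for a
font's conjunct lookups, keyed by consonant pair) are hypothetical, and only
the <consonant, al-lakuna, ZWJ, consonant> form is handled.

```python
ZWJ = "\u200d"        # ZERO WIDTH JOINER
AL_LAKUNA = "\u0dca"  # SINHALA SIGN AL-LAKUNA


def resolve_explicit_conjunct(cluster, font_conjuncts):
    """Return (text_to_shape, used_conjunct) for one cluster.

    font_conjuncts is a hypothetical set of (first, second) consonant
    pairs for which the font is assumed to provide a conjunct lookup.
    """
    marker = AL_LAKUNA + ZWJ
    i = cluster.find(marker)
    # No explicit conjunct request, or nothing on one side of it.
    if i <= 0 or i + len(marker) >= len(cluster):
        return cluster, False
    pair = (cluster[i - 1], cluster[i + len(marker)])
    if pair in font_conjuncts:
        # The font can form the conjunct: shape the text unchanged.
        return cluster, True
    # No lookup: drop the ZWJ so that segmentation and lookups can be
    # redone on <consonant, al-lakuna, consonant, ...>.
    return cluster.replace(marker, AL_LAKUNA, 1), False


ndhe = "\u0db1\u0dca\u200d\u0db0\u0dd9"          # Pali 'ndhe' with ZWJ
fallback, used = resolve_explicit_conjunct(ndhe, set())
```

With an empty font_conjuncts set, the returned text is the sequence without
the ZWJ, which is what would then be segmented and shaped again.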

4) I've been out of the loop for many years, so I have CC'd Ruvan &
Harsha who may already be aware of what you have observed.


On 14/10/18 11:02 am, Richard Wordingham via Unicode wrote:
> Are there fallback rules for Sinhala consonant clusters? There are
> fallback rules for Devanagari, but I'm not sure if they read across.
> The problem I am seeing is that the Pali syllable 'ndhe' න්‍ධෙ <U+0DB1
> NAYANNA, U+0DCA AL-LAKUNA, U+200D ZWJ, U+0DB0 MAHAAPRAANA DAYANNA, U+0DD9
> KOMBUVA> is being rendered identically to a hypothetical Sinhalese
> 'nēdha' නේධ <U+0DB1, U+0DDA DIGA KOMBUVA, U+0DB0>, which in NFD is
> <U+0DB1, U+0DD9, U+0DCA, U+0DB0>, when I use a font that lacks the
> conjunct. (Most fonts lack the conjunct.) The Devanagari rules and my
> preference would lead to a fallback rendering as න්ධෙ (Sinhalese
> 'ndhe'), which is encoded as <U+0DB1 NAYANNA, U+0DCA AL-LAKUNA, U+0DB0
> MAHAAPRAANA DAYANNA, U+0DD9 KOMBUVA>. Is the rendering I am getting
> technically wrong, or is it merely undesirable?
>
> The ambiguity arises in part because, like the Brahmi script, the
> Sinhala script uses its virama character as a vowel length indicator.
>
> Missing touching consonants are being rendered almost as though there
> were no ZWJ, but the combination of consonant and al-lakuna is being
> rendered badly.
>
> Richard.
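
For reference, the code point sequences in the quoted message can be checked
with Python's standard unicodedata module. The sketch below is illustrative
only; the helper name code_points and the variable names are made up here.
It confirms the NFD form of 'nēdha' and shows that the ZWJ-less 'ndhe'
contains the same four characters in a different order, which is where the
ambiguity comes from.

```python
import unicodedata


def code_points(text):
    """List the code points of text in U+XXXX notation."""
    return ["U+%04X" % ord(ch) for ch in text]


nedha = "\u0db1\u0dda\u0db0"                 # <U+0DB1, U+0DDA, U+0DB0>
nedha_nfd = unicodedata.normalize("NFD", nedha)

ndhe_zwj = "\u0db1\u0dca\u200d\u0db0\u0dd9"  # Pali 'ndhe' with ZWJ
ndhe_plain = ndhe_zwj.replace("\u200d", "")  # Sinhalese 'ndhe', no ZWJ

print(code_points(nedha_nfd))
print(code_points(ndhe_plain))
# Same characters, different order:
print(sorted(nedha_nfd) == sorted(ndhe_plain))
```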
Received on Sun Oct 14 2018 - 09:55:54 CDT