Re: Prepending vowel exception in Lontara/Buginese script ? from Philippe Verdy on 2011-07-24 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Sun, 24 Jul 2011 23:48:17 +0200

OK. So Chrome is using the Unicode 6.0 character properties and
effectively expects that the prepended vowel will be encoded after the
base letter (or space or dotted circle symbol).

But as the diacritic vowels -e and -u in Buginese are expected to be
spacing in Buginese fonts, the mark-to-base positioning is not used in
Buginese fonts, and they correctly define a non-zero spacing width.

This means that effectively the renderer needs to take into account
the expected reordering of glyphs to apply the prepending on vowel -e.
As far as I have seen, the Windows 7 builtin text renderer does not do
that for Buginese, and Chrome's builtin renderer does not do it either
(is it Pango ?).
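
To make the expected reordering concrete, here is a minimal Python
sketch of the logical-to-visual step for vowel -e (U+1A19). It works on
code points rather than on glyphs, which a real renderer would not do.
It assumes that the Buginese consonants are U+1A00..U+1A16 and that the
vowel immediately follows its base. The function names are made up for
the illustration.

```python
# Illustrative sketch only: reorders code points, not glyphs.
BUGINESE_VOWEL_E = "\u1A19"  # BUGINESE VOWEL SIGN E


def is_buginese_consonant(ch):
    """Assumption: consonants occupy U+1A00..U+1A16 in the Buginese block."""
    return "\u1A00" <= ch <= "\u1A16"


def logical_to_visual(text):
    """Move each vowel -e in front of the consonant it logically follows."""
    out = []
    for ch in text:
        if ch == BUGINESE_VOWEL_E and out and is_buginese_consonant(out[-1]):
            base = out.pop()
            out.append(ch)    # vowel sign is displayed first (on the left)
            out.append(base)  # then the base consonant
        else:
            out.append(ch)    # everything else keeps its position
    return "".join(out)
```

A vowel -e that follows a space or any other non-consonant is left in
place, which roughly matches the dotted-circle case mentioned above.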

In other words, we need a bug fix in text renderers for the support of
Buginese, even though it has been encoded for a long time now.

Hmmmm.... This means that there's no software support for the script
for now. And this may explain why Buginese texts have been encoded for
now in such a way that they expect the same exception to the logical
order as in Thai, Lao and Tai Viet, i.e. these texts are encoded using
the visual order...
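
A rough way to spot such visually ordered texts could look like the
Python sketch below. It is only a heuristic under stated assumptions
(consonants at U+1A00..U+1A16, hypothetical function names): it flags a
vowel -e that has no consonant before it but has one right after it. A
vowel -e sitting between two consonants stays ambiguous and is not
flagged.

```python
VOWEL_E = "\u1A19"  # BUGINESE VOWEL SIGN E


def is_consonant(ch):
    """Assumption: consonants occupy U+1A00..U+1A16 in the Buginese block."""
    return "\u1A00" <= ch <= "\u1A16"


def visual_order_suspects(text):
    """Return the indexes of vowel -e that look visually ordered."""
    hits = []
    for i, ch in enumerate(text):
        if ch != VOWEL_E:
            continue
        has_base_before = i > 0 and is_consonant(text[i - 1])
        has_consonant_after = i + 1 < len(text) and is_consonant(text[i + 1])
        # No base before, but a consonant after: likely typed in visual order.
        if not has_base_before and has_consonant_after:
            hits.append(i)
    return hits
```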

What would be the behavior of a font that would use GSUB entries (or
ligatures) in a feature to implement the reordering that NO renderer
currently implements for Buginese ? What will happen later if the
renderer does implement it ? Shouldn't we define this substitution in
a feature tagged with the "bugi" script ID, that future renderers will
simply ignore if they implement the reordering themselves ?

Does the OpenType specification allow specifying a temporary override
for the missing renderer reordering capabilities ? I.e. Can we tag the
defined feature to be specifically ignored by renderers implementing
the reordering themselves ? Or at least say that the feature will
override the renderer's builtin feature, so that both reorderings
won't be used simultaneously (in the font feature, and in the renderer
itself) ? Shouldn't the OpenType specification define such a thing to
allow a smooth transition and compatibility of fonts made for compliant
and non-compliant renderers ?

Note: The Microsoft Font Validator (found on the Microsoft Typography
website, section for Downloadable Tools) still does not recognize bit
96 of the ulUnicodeRange field, officially defined for the Buginese
block range (U+1A00..U+1A1F), and reports an error if this bit is set.
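
For reference, ulUnicodeRange is stored in the OS/2 table as four
32-bit fields (ulUnicodeRange1 to ulUnicodeRange4), so bit 96 is the
lowest bit of ulUnicodeRange4. A minimal Python sketch of the test a
tool could make, with a hypothetical helper name and made-up field
values:

```python
BUGINESE_BIT = 96  # ulUnicodeRange bit assigned to U+1A00..U+1A1F


def unicode_range_bit_is_set(ul_unicode_ranges, bit):
    """ul_unicode_ranges: the four 32-bit values ulUnicodeRange1..4, in order."""
    word_index, bit_in_word = divmod(bit, 32)  # 96 -> word 3, bit 0
    return bool((ul_unicode_ranges[word_index] >> bit_in_word) & 1)


# Made-up example: only the lowest bit of ulUnicodeRange4 is set.
example_ranges = (0x00000000, 0x00000000, 0x00000000, 0x00000001)
buginese_declared = unicode_range_bit_is_set(example_ranges, BUGINESE_BIT)
```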

And the Fonts folder in Windows 7 Explorer does not say that the font
effectively supports Buginese (a Buginese font says that it supports
no script at all, even if all code points assigned in the Buginese
block are mapped, and bit 96 is set in Unicode Ranges of the header).
Apparently, this Microsoft Font Validator, as well as the Windows
Explorer extension for the Fonts folder, do not match the current
OpenType specification published by... Microsoft, but only the much
older specifications currently implemented in Windows (as if the bit
were still reserved).

This is the case for all ulUnicodeRange bits defined now after bit
number 87, i.e. the Deseret block of the UCS, meaning that the
validator and the Windows 7 text renderer and Fonts Explorer are still
only based on the (now very old) Unicode 4.0 of... 2003 (with the
Deseret additions) or even before in 2001 with Unicode 3.1 only. Who's
late ?

Another bug of the font validator (I don't know where to report it,
because the Microsoft page does not contain any link to post comments):
it generates exceptions when parsing floating-point numbers in strings
found in the font header, when running on a French version of Windows
(apparently, it chokes on the full stop found in a version number, and
expects a comma because it does not properly set the US locale and
uses the current user locale...). As a workaround, I have to start the
validator from an Explorer window only after I have set the user locale
to US English (with the language bar).
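
The failure mode can be simulated with a toy parser in Python. This is
only a sketch of the general problem, not the validator's actual code:
parse_decimal is a hypothetical function that accepts a single decimal
separator, the way a locale-aware parser would, and the version string
is a made-up example.

```python
def parse_decimal(text, decimal_sep):
    """Toy locale-aware parser: only decimal_sep is accepted as separator."""
    whole, sep, frac = text.partition(decimal_sep)
    if not whole.isdigit() or (sep and not frac.isdigit()):
        raise ValueError("cannot parse %r with separator %r" % (text, decimal_sep))
    value = float(int(whole))
    if sep:
        value += int(frac) / 10 ** len(frac)
    return value


version_text = "1.50"  # made-up version number, always written with a full stop

# Invariant parsing: always use the full stop, whatever the user locale is.
version = parse_decimal(version_text, ".")

# What apparently happens on a French system: the comma is expected instead.
try:
    parse_decimal(version_text, ",")
    french_parse_failed = False
except ValueError:
    french_parse_failed = True
```

The point of the sketch is that the separator used for parsing font
data should be fixed by the file format, not taken from the user's
regional settings.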

-- Philippe.

2011/7/24 Peter Constable <petercon_at_microsoft.com>:
> In the OpenType model, a distinction is made between font-specific behaviours and font-neutral script behaviours. OpenType Layout tables were designed to deal with only font-specific details, leaving it to OTL client software to handle anything that is font-neutral.
>
> Re-ordering of prepended Buginese vowel /e/ is a font-neutral behaviour. More generally, re-ordering in Brahmi-derived scripts is considered a font-neutral behaviour, and OpenType Layout does not include means to describe the re-ordering of characters. (You could fake things out by creating ligature glyphs for entire syllables, but that isn't generally recommended.)
>
> So, if you're not seeing Buginese script text rendering as expected specifically wrt the re-ordering issue, that's an issue with the rendering software--a bug if the software claims to support Buginese, a limitation if it doesn't.
>
>
> Peter
>
>
> -----Original Message-----
> From: unicode-bounce_at_unicode.org [mailto:unicode-bounce_at_unicode.org] On Behalf Of verdy_p
> Sent: Saturday, July 23, 2011 9:13 AM
> To: Unicode Mailing List
> Subject: Prepending vowel exception in Lontara/Buginese script ?
>
> If I look in the Unicode 6.0 charts for the Buginese script, I see that vowel /e/ (U+1A19) is prepended visually on the left of the base consonant to which it applies. This should mean that the vowel has to be encoded logically in texts AFTER the base consonant to which it applies.
>
> However, I have tested all fonts available on the web for this script, and none of them contain the necessary OpenType substitution feature needed to make the logical-to-visual reordering.
>
> Is this a bug of these fonts (most of them are TrueType only, not OpenType with a reordering feature like those used in other Indic scripts, but built like basic TrueType fonts for Thai, Lao and Tai Viet scripts, that are the only scripts for which Unicode has defined the "Prepended Vowel" exception)?
>
> Or is it a bug/limitation of text renderers ?
>
> I note for example that Chrome correctly uses Unicode 6.0 default grapheme cluster boundaries, when editing and selecting in Lontara text (written in Buginese or Makassar languages), so that the vowel will be selected/deleted logically along with the base character encoded before it (for example a space or punctuation, or even an HTML syntax character). But if I use this browser to display Lontara text, the vowel /e/ is still shown with the diacritic on the right of the base consonant (or dotted circle symbol), meaning that the text is garbled when I use any one of those available fonts.
>
> All texts in Makassar or Buginese I have found, encoded in Unicode, seem to assume the visual order (i.e. the same "prepended vowel" exception as in Thai and Lao).
> Given the geographical area where the Lontara script is mostly used (Indonesia and Thailand), it seems quite logical that text authors assumed this exception to the logical encoding order.
>
> What can be done? Should the fonts be corrected to include the OpenType feature, or should Unicode be modified to include the "prepended vowel" exception also for Buginese, and so the default grapheme boundaries modified as well, and the Unicode 6.0 chart modified too for U+1A19 ?
>
> -- Philippe.
>
>
>
Received on Sun Jul 24 2011 - 16:54:15 CDT
