Re: Prepending vowel exception in Lontara/Buginese script ?

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Mon, 25 Jul 2011 21:27:44 +0200

2011/7/25 Peter Constable <petercon_at_microsoft.com>:
> From: verdyp_at_gmail.com [mailto:verdyp_at_gmail.com] On Behalf Of Philippe Verdy
>
>> What would be the behavior of a font that would use GSUB entries (or
>> ligatures) in a feature to implement the reordering that NO renderer
>> currently implements for Buginese ? What will happen later if the
>> renderer does implement it ?
>
> Your question is no coherent: OpenType features cannot be used to trigger re-ordering.

Hmmm... Your reply is also incoherent:

(1) Many registered OpenType features actually perform contextual
reordering in Indic scripts, including cases where it is mandatory
for the script (for example the repha forms of ra, moving ra to a
later position after another base consonant so that it is shown on
the next vowel, or other exceptions needed in Khmer, Lao, ...).

(2) These features were even registered by Microsoft.

(3) Some of them are for pre-base reordering; others contain exceptions
to the usually "mandatory" pre-base order, changing it to a post-base
form in some other contexts.

>> Does the OpenType specification allow specifying a temporary override
>> for the missing renderer reordering capabilities ?
>
> No, and I don't see how that would make any sense: if a rendering system support Buginese script, then it supports it and does the reordering necessary. It either supports it or it doesn't.

What I asked is whether it is possible to have another feature, tagged
with the Buginese script, that would be triggered and enabled by
default (and applied before the nukta feature and other similar
features such as repha forms), unless the renderer knows that it
itself supports the reordering of prepended vowels for the Buginese
script (in which case that feature would be ignored).

This is what I would call a smooth transition: existing renderers
would work with a font presenting that feature, and future renderers
that perform the necessary reordering would ignore it and would not
even require a Buginese font to contain this feature.
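
To make the reordering in question concrete, here is a minimal sketch
in Python of the result that either such a fallback feature or the
renderer itself would have to produce. It is only an illustration, not
an OpenType mechanism: the function name is hypothetical, and it
assumes that the only prepended sign is U+1A19 (BUGINESE VOWEL SIGN E)
and that it directly follows a Buginese letter (U+1A00..U+1A16) in
logical order.

```python
# Assumption: Buginese letters KA..HA occupy U+1A00..U+1A16.
BUGINESE_LETTERS = range(0x1A00, 0x1A17)
# Assumption: U+1A19 is the only vowel sign displayed before its base.
VOWEL_SIGN_E = "\u1A19"


def to_visual_order(text):
    """Hypothetical helper: move each vowel sign E in front of the
    Buginese letter that precedes it in logical order."""
    out = []
    for ch in text:
        if ch == VOWEL_SIGN_E and out and ord(out[-1]) in BUGINESE_LETTERS:
            base = out.pop()
            out.append(ch)    # the vowel sign is displayed first
            out.append(base)  # then the base consonant
        else:
            out.append(ch)
    return "".join(out)


logical = "\u1A00\u1A19"              # ka + vowel sign e (logical order)
visual = to_visual_order(logical)
print([f"U+{ord(c):04X}" for c in visual])
```

A renderer that already performs this step must skip the font-side
fallback, otherwise the vowel would be moved twice.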

>> Note: The Microsoft Font Validator (found in Microsoft Typography
>> website, section for Downloadable Tools) still does not recognize bit
>> 96 of the ulUnicodeRange field, officially defined for the Buginese
>> block range (U+1A00..U+1A1F), and reports an error if this bit is set.
>
> I'll report that to the team that maintains that tool.

Thanks.
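
For reference, the bit I am talking about can be checked as in the
following Python sketch. It assumes the usual layout of the OS/2 table,
where bits 0..31 are in ulUnicodeRange1, 32..63 in ulUnicodeRange2,
64..95 in ulUnicodeRange3 and 96..127 in ulUnicodeRange4; the function
name and the sample values are hypothetical.

```python
BUGINESE_BIT = 96  # Buginese block, U+1A00..U+1A1F
DESERET_BIT = 87


def unicode_range_bit_is_set(ul_unicode_ranges, bit):
    """ul_unicode_ranges is the tuple (ulUnicodeRange1, ..., ulUnicodeRange4),
    each given as a 32-bit unsigned integer."""
    if not 0 <= bit <= 127:
        raise ValueError("bit must be in 0..127")
    field, offset = divmod(bit, 32)
    return bool((ul_unicode_ranges[field] >> offset) & 1)


# Made-up sample: only bit 96 is set, i.e. bit 0 of ulUnicodeRange4.
sample_ranges = (0, 0, 0, 0x00000001)
print(unicode_range_bit_is_set(sample_ranges, BUGINESE_BIT))
print(unicode_range_bit_is_set(sample_ranges, DESERET_BIT))
```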

It should also parse the "head" table correctly instead of reporting
this (undocumented) internal exception in the validation report:

E0041 : An exception occurred preventing completion of table validation"
System.FormatException: Input string was not in a correct format.
   at System.Number.StringToNumber(String str, NumberStyles options, NumberBuffer& number, NumberFormatInfo info, Boolean parseDecimal)
   at System.Number.ParseDouble(String value, NumberStyles options, NumberFormatInfo numfmt)
   at System.Double.Parse(String s, NumberStyles style, NumberFormatInfo info)
   at OTFontFileVal.val_head.Validate(Validator v, OTFontVal fontOwner)
   at OTFontFileVal.OTFontVal.Validate()

>> And the Fonts folder in Windows 7 Explorer does not say that the font
>> effectively supports Buginese (a Buginese font says that it supports no
>> script at all, even if all code points assigned in the Buginese block are
>> mapped, and bit 96 is set in Unicode Ranges of the header).
>
> Two issues:
>
> 1) Windows 7 does not provide text-display support for Buginese script.

OK, so Uniscribe (and IE) does not perform the reordering. It is then
impossible to display correctly encoded Buginese text on Windows with
Uniscribe. Other renderers will be needed (but Pango does not know
that reordering rule either, and none of the browsers I tested on
Windows works).

It seems that the script is supported only on Mac OS, for which there
are indeed commercial Buginese fonts (for example one font from
Xerox: I have not tested it, as I would need a Mac before even
buying that font).

> 2) The scripts show in the "Designed for" column in the Fonts control panel in Windows 7 does not make use of the UnicodeRanges fields in the OS/2 table. There are a few reasons for this:
> - that data is not all that reliable since there's no consistent practice in how it is set (there's no metric to decide when a bit should or shouldn't be set);
> - the UnicodeRanges fields are not scalable into the future (they were exhausted with Unicode 5.1); and
> - the UnicodeRanges fields are typically set based on some sense of "can display" whereas what we were thought was much more useful to users was to indicate "was designed for". For example, MS Gothic _can_ display English text, but we think it's not a particularly useful choice for English users since that's not the audience it was designed for. The intent is to give useful recommendations that help users differentiate relevant options from distracting noise.
> Rather than using the OS/2 data, the Fonts cpl uses metadata outside the font. Unfortunately, it has it only for a certain set of fonts that were known when we shipped to be on most systems; so, if you add a Buginese font, the metadata will not include that font.

It's strange: many new international fonts have been added since the
release of Windows 7, and the CPL explorer extension still detects
that these fonts support some scripts. How does it perform the test? By
counting the mapped glyphs? If so, it could easily detect Buginese by
checking that at least 28 glyphs are mapped from code points in
the Buginese block.
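
The kind of test I have in mind could look like the following Python
sketch. I do not know how the Fonts control panel actually decides, so
this is only a guess at one possible heuristic: the cmap is modelled as
a plain dictionary from code point to glyph name, and the function
names, the sample data and the default threshold of 28 are my own
assumptions.

```python
BUGINESE_BLOCK = range(0x1A00, 0x1A20)  # U+1A00..U+1A1F


def count_mapped_in_block(cmap, block=BUGINESE_BLOCK):
    """Count the code points of the block that the font maps to a glyph."""
    return sum(1 for code_point in block if code_point in cmap)


def looks_like_buginese_font(cmap, threshold=28):
    """Hypothetical heuristic: enough of the Buginese block is mapped."""
    return count_mapped_in_block(cmap) >= threshold


# Made-up cmap covering U+1A00..U+1A1B (letters and vowel signs) plus "A".
sample_cmap = {cp: f"uni{cp:04X}" for cp in range(0x1A00, 0x1A1C)}
sample_cmap[0x0041] = "A"

print(count_mapped_in_block(sample_cmap))
print(looks_like_buginese_font(sample_cmap))
```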

>> This is the case for all ulUnicodeRange bits defined now after
>> bit number 87, i.e. the Deseret block of the UCS, meaning that
>> the validator and the Windows 7 text renderer and Fonts
>> Explorer are still only based on the (now very old) Unicode 4.1
>> of... 2003 (with the Deseret additions) or even before in 1996
>> with Unicode 3.1 only. Who's late ?
>
> Font Validator may be out of date; as mentioned, I'll pass that on to the relevant team. As for the Fonts control panel, as mentioned it doesn't use ulUnicodeRange fields at all; but you have spotted a bug in our metadata: Deseret should be listed for the Segoe UI Symbol font.

OK, is it possible to have the Saweri and Code2000 fonts recognized?
These two free fonts are widely advertised as a possible solution for
the Buginese edition of Wikipedia, but for now this edition mostly uses
the Latin script for that language.

I was asked on Wikipedia to design a test page for the script, but I
was completely unable to do that.

All I could do was try to adapt the page presenting the [[Lontara
script]] with:

- a few text samples (but I am not sure that the samples are logically
encoded: they seem to be visually encoded in some places, and
one word is most probably inconsistent with its Latin transcription),

- and the Unicode block chart, where the vowel e is actually
rendered after the base glyph: the chart on English Wikipedia
currently uses a dotted circle symbol (but there is no guarantee that
reordering would occur with that symbol in a compliant renderer),
whereas the French Wikipedia page presents all Buginese diacritics
with the Buginese base letter ka (U+1A00: it should really work).

This brought me to the question of testing other South-East Asian
Brahmic scripts, like Hanunoo, Buhid, Javanese, or Balinese. It seems
that they have the same rendering problem in a few cases for prepended
vowels (plus other problems remaining in Khmer and Burmese for some
contextual forms).

The rendering problem will recur with all other pending Brahmic
scripts (still not encoded) that feature prepended diacritics. Why
can't we have a registered OpenType feature now for handling those
mandatory contextual reorderings (at least for the most frequent
cases), pending full support of the script in text renderers?
Received on Mon Jul 25 2011 - 14:30:19 CDT
