Re: Unicode of Death 2.0

From: Philippe Verdy via Unicode <unicode_at_unicode.org>
Date: Sun, 18 Feb 2018 01:30:09 +0100

My opinion about this bug is that Apple's text renderer allocates its
glyph buffer lazily, only when needed. Most texts do not need it: when no
glyph substitution or reordering is required, a single accessor can find
the glyph data directly from the code point by lookup in the font tables.
A test for whether the buffer has actually been constructed seems to be
missing, and this causes a null pointer dereference at run time.
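
The suspected failure mode can be sketched in a few lines of Python. This is
only an illustration of the hypothesis, not Apple's code: the class and method
names (LazyShaper, ensure_buffer, attach_mark_unchecked, attach_mark_checked)
are invented for the example, and Python's None stands in for a null pointer.

```python
from typing import List, Optional


class LazyShaper:
    """Hypothetical shaper that builds its glyph buffer only on demand."""

    def __init__(self) -> None:
        # No buffer yet: the fast path maps code points straight to glyphs.
        self.glyphs: Optional[List[str]] = None

    def ensure_buffer(self) -> List[str]:
        # The guard that is suspected to be missing on one code path.
        if self.glyphs is None:
            self.glyphs = []
        return self.glyphs

    def attach_mark_unchecked(self, mark: str) -> None:
        # Assumes an earlier substitution step already created the buffer.
        # Raises AttributeError when self.glyphs is still None.
        self.glyphs.append(mark)

    def attach_mark_checked(self, mark: str) -> None:
        # Same operation, but the buffer is created first if needed.
        self.ensure_buffer().append(mark)
```

Calling attach_mark_unchecked on a fresh LazyShaper fails, while
attach_mark_checked succeeds because it goes through the guard.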

The bug actually occurs when processing the vowel that follows the ZWNJ,
if the code assumes that a glyph buffer has already been constructed for
the cluster, in order to place the vowel over the correct glyph (which
may have been reordered in that buffer).
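
For reference, the sequence shape discussed in this thread (consonant, virama,
consonant, ZWNJ, vowel sign) can be detected roughly as below. This is a
simplified sketch: the helper names are invented, "consonant" is approximated
by general category Lo (which also covers independent vowels), "vowel sign" by
any non-virama combining mark, and the extra conditions Manish lists (which
consonants and which vowels are affected) are not modelled.

```python
import unicodedata

# Viramas of the three scripts named in the thread.
VIRAMAS = {"\u094D", "\u09CD", "\u0C4D"}  # Devanagari, Bengali, Telugu
ZWNJ = "\u200C"


def is_letter(ch: str) -> bool:
    # Rough stand-in for "consonant" (assumption: Lo is close enough here).
    return unicodedata.category(ch) == "Lo"


def is_vowel_sign(ch: str) -> bool:
    # Rough stand-in: any combining mark (Mn or Mc) that is not a virama.
    return unicodedata.category(ch) in ("Mn", "Mc") and ch not in VIRAMAS


def has_suspect_shape(s: str) -> bool:
    """True if s contains <letter, virama, letter, ZWNJ, vowel sign>."""
    for i in range(len(s) - 4):
        c1, v, c2, z, m = s[i:i + 5]
        if (is_letter(c1) and v in VIRAMAS and is_letter(c2)
                and z == ZWNJ and is_vowel_sign(m)):
            return True
    return False
```

A Telugu sequence of that shape, such as "\u0C1C\u0C4D\u0C1E\u200C\u0C3E",
matches, while the same sequence without the ZWNJ does not.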

Microsoft's text renderer and other engines do not delay the construction
of the glyph buffer, which can be reused for processing the rest of the
text, provided it is correctly reset after processing each cluster.
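
The eager alternative can be sketched the same way. Again this is a
hypothetical illustration, not any real engine's design: EagerShaper and
shape_cluster are invented names, and the "glyph" for each code point is just
a placeholder label.

```python
from typing import List


class EagerShaper:
    """Hypothetical shaper whose glyph buffer always exists."""

    def __init__(self) -> None:
        # Allocated up front, so no code path can ever see a missing buffer.
        self.glyphs: List[str] = []

    def shape_cluster(self, cluster: str) -> List[str]:
        # Reset the shared buffer so nothing leaks from the previous cluster.
        self.glyphs.clear()
        for ch in cluster:
            # Placeholder for a real font lookup.
            self.glyphs.append(f"U+{ord(ch):04X}")
        # Return a copy; the internal buffer is reused for the next cluster.
        return list(self.glyphs)
```

The cost is one buffer per shaper even for simple text, in exchange for never
having to test whether the buffer exists.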

2018-02-17 21:54 GMT+01:00 Manish Goregaokar <manish_at_mozilla.com>:

> Heh, I wasn't aware of the word "phala-form", though that seems
> Bengali-specific?
>
> Interesting observation about the vowel glyphs, I'll mention this in the
> post. Initially I missed this because I hadn't realized that the Bengali o
> vowel crashed (which made me discount this).
>
>
> Thanks!
>
> -Manish
>
> On Sat, Feb 17, 2018 at 12:22 PM, Philippe Verdy <verdy_p_at_wanadoo.fr>
> wrote:
>
>> I would have preferred that, instead of your invented term "left-joining
>> consonants", you used the usual name "phala forms" (to represent RA or
>> JA/JO after a virama, generally named "raphala" or "japhala/jophala").
>>
>> And the reason this bug does not occur with some vowels is that these are
>> two-part vowels, which are first decomposed into two separate glyphs
>> reordered in the glyph buffer, while other vowels do not need this prior
>> mapping and keep their initial direct mapping from their codepoints in
>> fonts. This means the bug has to do with the way the ZWNJ handling looks
>> for the glyphs of the vowels in the glyph buffer and not in the initial
>> codepoints buffer: there's some desynchronization, and more probably an
>> uninitialized data field (for the lookup made in handling ZWNJ) if no
>> vowel decomposition was done. The same data field is correctly
>> initialized when it is the first consonant that takes an alternate form
>> before a virama, as in most Indic consonant clusters, because a glyph
>> buffer is created.
>>
>> Now we have some hints about why the bug does not occur in Kannada or
>> Khmer: a glyph buffer is always created there, but some shortcut was made
>> in Devanagari, Bengali, and Telugu to process clusters faster by working
>> directly on the codepoint streams, without always having to create a
>> glyph buffer (which allows reordering glyphs before positioning them).
>>
>> So it seems related to the fact that OpenType fonts do not need to
>> include rules for glyph substitution, and the PHALA forms are represented
>> without any glyph substitution, by mapping the phala forms directly in a
>> separate table for the consonants. Because there has been no code-to-glyph
>> substitution, the glyph buffer is not created, but then when processing
>> the ZWNJ, the code looks for data in a glyph buffer that has still not
>> been initialized (and this is specific to the renderers implemented by
>> Apple in iOS and MacOS). This bug does not occur if another text rendering
>> engine is used (e.g. in non-Apple web browsers).
>>
>>
>> 2018-02-16 19:44 GMT+01:00 Manish Goregaokar <manish_at_mozilla.com>:
>>
>>> FWIW I dissected the crashing strings, it's basically all <consonant,
>>> virama, consonant, zwnj, vowel> sequences in Telugu, Bengali, Devanagari
>>> where the consonant is suffix-joining (ra in Devanagari, jo and ro in
>>> Bengali, and all Telugu consonants), the vowel is not Bengali au or o /
>>> Telugu ai, and if the second consonant is ra/ro the first one is not also
>>> ra/ro (or ro-with-line-through-it).
>>>
>>> https://manishearth.github.io/blog/2018/02/15/picking-apart-the-crashing-ios-string/
>>>
>>> -Manish
>>>
>>> On Thu, Feb 15, 2018 at 10:58 AM, Philippe Verdy via Unicode <
>>> unicode_at_unicode.org> wrote:
>>>
>>>> That's probably not a bug of Unicode but of MacOS/iOS text renderers
>>>> with some fonts using advanced composition features.
>>>>
>>>> Similar bugs could as well affect the new advanced features added in
>>>> Windows or Android to support multicolored emojis, variable fonts,
>>>> contextual glyph transforms, style variants, or more font formats (not
>>>> just OpenType). The bug may also be in the graphic renderer (incorrect
>>>> clipping when drawing the glyph into the glyph cache, with buffer
>>>> overflows possibly caused by incorrectly computed splines), or it could
>>>> be in the display driver (or in the hardware accelerator having some
>>>> limitations on the complexity of multipolygons to fill and to
>>>> antialias), causing some infinite recursion loop, or too deep a
>>>> recursion exhausting the stack limit.
>>>>
>>>> Finally, the bug could be in the OpenType hinting engine moving some
>>>> points outside the clipping area (the math theory may say that such
>>>> placement of a point outside the clipping area is impossible, but
>>>> various mathematical simplifications and shortcuts are used to simplify
>>>> or accelerate the rendering, at the price of some quirks). Even
>>>> implementations of the SVG standard (in constant evolution) could be
>>>> affected as well.
>>>>
>>>> There are tons of possible bugs here.
>>>>
>>>> 2018-02-15 18:21 GMT+01:00 James Kass via Unicode <unicode_at_unicode.org>:
>>>>
>>>>> This article:
>>>>> https://techcrunch.com/2018/02/15/iphone-text-bomb-ios-mac-crash-apple/?ncid=mobilenavtrend
>>>>>
>>>>> The single Unicode symbol referred to in the article results from a
>>>>> string of Telugu characters. The article doesn't list or display the
>>>>> characters, so Mac users can visit the above link. A link in one of
>>>>> the comments leads to a page which does display the characters.
>>>>>
>>>>
>>>>
>>>
>>
>
Received on Sat Feb 17 2018 - 18:31:00 CST
