RE: Ways to show Unicode contents on Windows?

From: Peter Constable <petercon_at_microsoft.com>
Date: Fri, 19 Jul 2013 21:38:51 +0000

Every Unicode code point will have some default behaviour in any text process on Windows. If those default behaviours happen to fit the character in question, then you should get the behaviour you want. But we don't service Windows for each UCD update. Also, not every text process relies solely on UCD data.

Peter

-----Original Message-----
From: unicode-bounce_at_unicode.org [mailto:unicode-bounce_at_unicode.org] On Behalf Of Richard Wordingham
Sent: Friday, July 19, 2013 1:21 PM
To: unicode_at_unicode.org
Subject: Re: Ways to show Unicode contents on Windows?

Peter Constable <petercon_at_microsoft.com> wrote:
> Behalf Of Ilya Zakharevich wrote:

> > Why would one NEED to upgrade the OS to use Old Italic?

> You can't expect an OS like Windows XP to support Old Italic
> characters that weren't even defined in Unicode at the time it
> shipped.

That actually came as a great surprise to me. I once naively thought that all that had to be done was to update the version of the Unicode Character Database (UCD) that the system was using, and then only new *properties* should be causing major trouble. Now scripts needing reordering have their own problems, but that sort of problem is what SIL developed Graphite for. (I fear the case for Microsoft Office to support Graphite is steadily reducing.)

The problem with changes to the UCD arises partly because enough developers prefer speed and compactness to flexibility.
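
To make the effect of a frozen UCD concrete, here is a minimal Python sketch. It assumes only the standard-library unicodedata module, which happens to ship both a current database and a frozen Unicode 3.2 snapshot (ucd_3_2_0); the helper name describe is made up for this illustration. Tai Tham was encoded in Unicode 5.2, so the old snapshot treats U+1A20 as unassigned and a process built on it can only apply default behaviour. This says nothing about how Windows itself stores or consults UCD data.

```python
import unicodedata

def describe(ch, db):
    """Return (UCD version, general category) of ch according to db."""
    return db.unidata_version, db.category(ch)

# U+1A20 TAI THAM LETTER HIGH KA, encoded in Unicode 5.2.
TAI_THAM_HIGH_KA = "\u1a20"

# The frozen 3.2 snapshot predates the character: it reports 'Cn' (unassigned).
old = describe(TAI_THAM_HIGH_KA, unicodedata.ucd_3_2_0)

# The database bundled with the running interpreter knows it as a letter.
new = describe(TAI_THAM_HIGH_KA, unicodedata)

print(old)
print(new)
```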

> That said, it turns out that a given version of Windows does support
> later-encoded characters such as Old Italic that have no special
> requirements fairly well -- provided you have a font and format your
> content with that font.

Are you sure this tolerance isn't by design?

> It is the case of simple rendering. Given a font, and a keyboard
> layout (both doable in user-land), it should “just work”. Or I am
> missing something?

The biggest thing you're missing is too much cleverness, and the second is centralisation.

Word switches keyboard at the very least as you step through text, which in simple cases is quite helpful. Also, Office has (at least) three current fonts - one for simple scripts, one for complex scripts, and one for CJK scripts. This in itself can cause problems with new scripts - I have a fair bit of Tai Tham text in Open Document format that has the wrong size because LibreOffice hesitantly changed the script's classification from simple to complex.

The centralisation issue is that Indic rearrangement and selection of Arabic and Syriac contextual forms seemed obvious things to abstract away from fonts and handle centrally. Consequently, text is split by script and each script run handled separately.
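
The splitting into script runs described above can be sketched roughly as follows. This is a simplification under stated assumptions: the BLOCKS table uses a handful of Unicode block ranges as a crude stand-in for the real Script property, and the names script_of and script_runs are invented for the example. A real itemiser would also have to resolve shared characters such as spaces, digits and combining marks, which this sketch simply labels "Unknown".

```python
from itertools import groupby

# Assumption: a few block ranges stand in for the real Script property.
BLOCKS = [
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x0600, 0x06FF, "Arabic"),
    (0x0700, 0x074F, "Syriac"),
    (0x0E00, 0x0E7F, "Thai"),
    (0x1A20, 0x1AAF, "Tai Tham"),
]

def script_of(ch):
    """Look up the (approximate) script of a single character."""
    cp = ord(ch)
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "Unknown"

def script_runs(text):
    """Split text into maximal runs of characters sharing one script."""
    return [(script, "".join(chars))
            for script, chars in groupby(text, key=script_of)]

# Latin, then two Thai letters, then two Tai Tham letters, then Latin again.
runs = script_runs("abc\u0e01\u0e02\u1a20\u1a21xyz")
for script, run in runs:
    print(script, [hex(ord(c)) for c in run])
```

Each run would then be handed to the shaping logic for its script, which is why a character whose script the system does not know about can end up in the wrong run or with the wrong font.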

Combining the two, we can certainly have Word XP asking whether a font supports a script, and refusing to use it for the script if it doesn't declare it does. I had to fiddle the OS/2 table of a Tai Tham hack font (Lannaworld) to be able to use it. The font maps Latin and Thai characters to Tai Tham glyphs, but when I downloaded the font it didn't declare support for the 'Basic Latin' character range or the 'Latin-1' encoding. To get the font to work, I not only had to dodge the constraints on Thai character sequences, I also had to change the OS/2 table to declare that the font supported the Latin range and encoding.
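
For readers unfamiliar with the OS/2 table, the declarations in question are bit flags. The sketch below models only the bit arithmetic, not the editing of an actual font file. The bit positions follow my reading of the OpenType OS/2 specification (Unicode range bits 0, 1 and 24 for Basic Latin, Latin-1 Supplement and Thai; code page bits 0 and 16 for Latin 1 and Thai), and the starting values are hypothetical, since the original state of the Lannaworld font beyond the missing Latin declarations is not given here.

```python
# Bit positions in the OS/2 table's range fields (per the OpenType spec,
# as I read it); only the entries needed for this illustration.
UNICODE_RANGE_BITS = {"Basic Latin": 0, "Latin-1 Supplement": 1, "Thai": 24}
CODE_PAGE_BITS = {"Latin 1 (cp1252)": 0, "Thai (cp874)": 16}

def declare(field, bit):
    """Return field with the given support bit set."""
    return field | (1 << bit)

def declares(field, bit):
    """Report whether field has the given support bit set."""
    return bool(field & (1 << bit))

# Hypothetical starting state: a font that declares Thai only.
unicode_range1 = declare(0, UNICODE_RANGE_BITS["Thai"])
code_page_range1 = declare(0, CODE_PAGE_BITS["Thai (cp874)"])

# The fix described above: also declare the Latin range and encoding.
unicode_range1 = declare(unicode_range1, UNICODE_RANGE_BITS["Basic Latin"])
code_page_range1 = declare(code_page_range1, CODE_PAGE_BITS["Latin 1 (cp1252)"])

print(hex(unicode_range1), hex(code_page_range1))
```

An application that gates font use on these flags would test them much as declares does, regardless of which glyphs the font's character map actually covers.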
 
I still don't think we've got to the bottom of Doug's PUA problem. For all I know, he may have been violating the agreement he made with Microsoft for the use of the PUA. I'm not aware of Microsoft publishing a consolidated statement of this agreement, but I've a feeling some characters are reserved for symbol fonts and yet others are reserved for Thai glyphs. It's also conceivable that he trespassed on the PUA assignments decreed by China for Tibetan.

Richard.
Received on Fri Jul 19 2013 - 16:43:44 CDT
