RE: Keyboard Layouts for Office XP in Windows 98

From: Chris Pratley (chrispr@microsoft.com)
Date: Mon Mar 11 2002 - 01:58:16 EST


Marc Durdin wrote:
        1. As Windows uses UTF-16 in most situations, most applications
treat non-BMP characters as two separate characters for editing purposes,

While it is true that in terms of absolute numbers most apps do not yet
handle UTF-16 surrogate pairs, it is worth noting that Office XP and
anything based on mshtml.dll ver.6 (e.g. IE 6) or Riched20.dll v.4
(e.g. WordPad in WinXP)
do handle surrogate characters from UTF-16 correctly. So in terms of
usage, surrogate support is covered pretty well as adoption of these
newer versions increases.
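
For illustration only, here is a minimal Python sketch of what handling
surrogates "correctly" can mean for a single editing operation: one
backspace removes both halves of a pair. It is not code from Office,
mshtml.dll or Riched20.dll; the function names utf16_units and backspace
are hypothetical, and U+103A0 is used only as a sample Plane 1 code point.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def backspace(units):
    """Delete the last character from a list of UTF-16 code units.

    If the buffer ends with a high surrogate followed by a low surrogate,
    both units are removed so the pair is never split.
    """
    if not units:
        return units
    ends_with_pair = (
        len(units) >= 2
        and 0xD800 <= units[-2] <= 0xDBFF   # high (lead) surrogate
        and 0xDC00 <= units[-1] <= 0xDFFF   # low (trail) surrogate
    )
    return units[:-2] if ends_with_pair else units[:-1]


# "a" followed by one Plane 1 character: three code units in UTF-16.
buffer = utf16_units("a\U000103A0")
after_one_backspace = backspace(buffer)
```

An application that deletes one code unit per backspace would instead
leave a lone high surrogate in the buffer after the first keypress.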

Chris

-----Original Message-----
From: Marc Durdin [mailto:mcdlist@tavultesoft.com]
Sent: Sunday, March 10, 2002 2:55 PM
To: Vladimir Ivanov; Peter_Constable@sil.org
Cc: Chris Pratley; Michael Everson; unicode@unicode.org
Subject: Re: Keyboard Layouts for Office XP in Windows 98

At 08:20 PM 9/03/2002 +0300, Vladimir Ivanov wrote:
>Should we wait for Keyman 6 to type Old Persian in the same applications
>because this script is in Plane 1?
>

Peter's summary of Keyman 5's plane 1-16 support is not quite correct.
Keyman 5 does support planes 1-16 in most circumstances (it uses UTF-32
in its keyboard description), except with WM_UNICHAR (which is not
relevant under Windows 2000/XP). There are also some caveats to
remember:

1. As Windows uses UTF-16 in most situations, most applications treat
non-BMP characters as two separate characters for editing purposes,
although the renderer in Win2k and XP can display each pair as the correct
character. This means that two backspaces are needed, for instance, to
delete the whole character. Keyman 5 uses 2 backspaces when it needs to
delete characters (such as when doing character combining), because when it
was released, I was not aware of any major application that handled
editing surrogate pairs correctly. Keyman 6 will handle this better
with Text Services Framework support.

2. The renderer for Win2k does not display surrogate pairs as a single
character unless you change a registry setting. See
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_192r.asp
for further details.

3. Some of Keyman 5's more advanced functionality does not support non-BMP
characters correctly, including index() and any(). Keyman 6 does fix
this. (I expect this is what Peter was thinking of.)
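
The two-backspace behaviour in point 1 follows from how UTF-16 encodes
code points above U+FFFF. The Python sketch below shows the standard
surrogate arithmetic and the resulting backspace count for an application
that treats each code unit as a character. It is not Keyman source code;
to_surrogates and backspaces_needed are hypothetical names, and U+103A0
is only a sample Plane 1 code point.

```python
def to_surrogates(code_point):
    """Encode one Unicode code point as a list of UTF-16 code units."""
    if code_point < 0x10000:
        return [code_point]              # BMP: a single code unit
    offset = code_point - 0x10000        # 20-bit value to split in two
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return [high, low]


def backspaces_needed(code_point):
    """Backspaces required when each code unit is edited separately."""
    return len(to_surrogates(code_point))


plane1_units = to_surrogates(0x103A0)
plane1_backspaces = backspaces_needed(0x103A0)
bmp_backspaces = backspaces_needed(0x0041)
```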

Regards,

Marc Durdin
Tavultesoft


