Re: [OT] multilingual support in MS products (was Re: Kurdish ghayn)

From: Peter_Constable@sil.org
Date: Mon Apr 28 2003 - 08:59:24 EDT

Next message: Sheni R. Meledath: "Arabic text in Unicode hexadecimal code"

Previous message: Rob Wilder: "Adobe GoLive 6 & Unicode"
Maybe in reply to: Michael (michka) Kaplan: "Re: [OT] multilingual support in MS products (was Re: Kurdish ghayn)"
Next in thread: John Hudson: "Re: [OT] multilingual support in MS products (was Re: Kurdish ghayn)"

Thomas Milo wrote on 04/27/2003 04:49:26 AM:

> Would it be possible to make the IJ/ij available at last as a single
> character Ĳ/ĳ for Dutch users?

If I understand the facts correctly, is this not just a digraph, comparable
to "ch" in various languages, the only difference being that Unicode
doesn't have a "ch" character but did include "ij" -- for backward
compatibility purposes? In other words, ideally "ij" wouldn't have been
included, but now that we've got it, Dutch "ij" has two alternate
representations, < i, j > or < U+0133 >.

Tom, I think what you should be asking Chris Pratley to do is to make the
spelling checker for Office recognise either spelling; the best way to do
that is probably to apply a compatibility normalisation to Dutch text.#
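
As a rough illustration of that suggestion (not a description of how Office
actually works), here is a minimal Python sketch that uses the standard
unicodedata module to apply normalisation form KC before a dictionary lookup.
The names spelling_key, is_known and DICTIONARY are hypothetical, and the
one-word dictionary is only there to make the example runnable.

```python
import unicodedata


def spelling_key(word: str) -> str:
    """Fold compatibility characters (such as U+0133) before lookup."""
    return unicodedata.normalize("NFKC", word)


# Hypothetical one-word dictionary, stored in its normalised form.
DICTIONARY = {spelling_key("ijs")}


def is_known(word: str) -> bool:
    """Return True if the word is in the dictionary under either spelling."""
    return spelling_key(word) in DICTIONARY


print(is_known("ijs"))       # typed as < i, j, s >
print(is_known("\u0133s"))   # typed as < U+0133, s >
```

Both lookups succeed because U+0133 has the compatibility decomposition
< i, j >. Note that NFKC also folds many other compatibility characters, which
may or may not be wanted in a given application.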

As for input methods, Michael Kaplan has already pointed out that they
can't really change what has already shipped (and that that is not an
Office issue). There are ways to create your own input method, though: you
can use Tavultesoft Keyman now to create your own input method, or soon (I
presume) Microsoft will be making a tool available.

#This brings up a general issue worth mentioning: we are familiar with the
concept of canonical equivalence for Latin precomposed / decomposed
representations, and the use of Unicode normalisation forms C and D to deal
with these equivalences. In contrast, characters with compatibility
decompositions are quite an assorted lot, and there's no simple, general rule
to say when compatibility decompositions should or shouldn't be used.
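
The difference can be seen with Python's standard unicodedata module. This is
only a small sketch; the helper names nfd and nfkd and the sample characters
are chosen here for illustration and do not come from the message.

```python
import unicodedata


def nfd(s: str) -> str:
    """Canonical decomposition only."""
    return unicodedata.normalize("NFD", s)


def nfkd(s: str) -> str:
    """Canonical plus compatibility decomposition."""
    return unicodedata.normalize("NFKD", s)


# Canonical equivalence: precomposed e-acute versus e + combining acute.
print(nfd("\u00e9") == "e\u0301")

# U+0133 has only a compatibility decomposition, so NFD leaves it alone
# while NFKD splits it into < i, j >.
print(nfd("\u0133") == "\u0133")
print(nfkd("\u0133") == "ij")

# Compatibility decompositions cover very different kinds of character:
# a ligature, a superscript digit, a circled digit.
for ch in ("\ufb01", "\u00b2", "\u2460"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {nfkd(ch)!r}")
```

Folding a superscript two to a plain "2" loses a distinction that folding
U+0133 to "ij" arguably does not, which is why a blanket rule is hard to
state.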

But, there is one class of Latin characters with compatibility
decompositions that probably should generally be handled as though they
were canonically equivalent to their decomposed counterparts: digraphs. For
whatever reason, digraphs as a rule were given *compatibility* rather than
*canonical* decomposition mappings. But unless I'm missing something, it
seems to me that for most practical purposes, representations using the
digraph characters ĳ (U+0133), ǉ (U+01C9), ǆ (U+01C6) etc. should be
treated by applications as equivalent with their decomposed counterparts.
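
One way an application might do this without applying every compatibility
mapping is to fold only the digraph characters. The Python sketch below makes
several assumptions: the list in DIGRAPHS is a guess at the relevant
characters (U+0132..U+0133, U+01C4..U+01CC, U+01F1..U+01F3), and the names
fold_digraphs and same_text are hypothetical.

```python
import unicodedata

# Assumed set of Latin digraph characters; not taken from the message.
DIGRAPHS = (
    "\u0132\u0133"
    + "".join(chr(cp) for cp in range(0x01C4, 0x01CD))
    + "\u01f1\u01f2\u01f3"
)


def _compat_mapping(ch: str) -> str:
    """Read the <compat> mapping of ch from the Unicode Character Database."""
    fields = unicodedata.decomposition(ch).split()
    if not fields or fields[0] != "<compat>":
        raise ValueError(f"U+{ord(ch):04X} has no <compat> decomposition")
    return "".join(chr(int(cp, 16)) for cp in fields[1:])


# Translation table: digraph code point -> its two-letter sequence.
FOLD_TABLE = {ord(ch): _compat_mapping(ch) for ch in DIGRAPHS}


def fold_digraphs(text: str) -> str:
    """Replace digraph characters only, then apply canonical form C."""
    return unicodedata.normalize("NFC", text.translate(FOLD_TABLE))


def same_text(a: str, b: str) -> bool:
    """Compare two strings, treating digraph characters as their letters."""
    return fold_digraphs(a) == fold_digraphs(b)


print(same_text("\u0133s", "ijs"))
print(same_text("\u01c9ubav", "ljubav"))
```

Unlike full NFKC or NFKD, this leaves other compatibility characters such as
superscripts and ligatures untouched.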

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485


This archive was generated by hypermail 2.1.5 : Mon Apr 28 2003 - 09:46:25 EDT