Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Tue, 5 Feb 2013 09:08:01 +0000

On Mon, 4 Feb 2013 23:54:41 +0100
Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:

> 2013/2/3 Costello, Roger L. <costello_at_mitre.org>:
> > - It is easier to use a few keystrokes for combining accents than
> > to set up compose key sequences for all the possible composed
> > characters.
>
> But MOST texts using combining diacritics are written in languages for
> which there already exist standard keyboards featuring combine keys
> or dead keys.

It's the exceptions that cause the problems. For example, in the basic
keyboard maps supported by Microsoft Windows, a dead key combination
has to deliver a single BMP character. That's a bit of a limitation if
you are working in an odd writing system (ISO 11940 Part 1 Thai
transliteration is the worst I've encountered) where acute, grave and
circumflex accents can occur on many different letters. You'd have to
remember which combinations are precomposed; dead keys normally don't
work if there is no precomposed character.
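
To make the "which combinations are precomposed" problem concrete, here
is a rough Python sketch. It assumes only the standard unicodedata
module; the helper name has_precomposed and the sample letters are
illustrative choices, not anything taken from ISO 11940 itself.

```python
import unicodedata

# Combining marks for the three accents mentioned above.
ACCENTS = {
    "acute": "\u0301",
    "grave": "\u0300",
    "circumflex": "\u0302",
}

def has_precomposed(base: str, mark: str) -> bool:
    """Return True if base + combining mark composes to one code point under NFC.

    Hypothetical helper for illustration: a dead key that must deliver a
    single character can only serve the combinations for which this is True.
    """
    return len(unicodedata.normalize("NFC", base + mark)) == 1

# Sample letters chosen arbitrarily to show that coverage is patchy.
for base in "aeqt":
    coverage = {name: has_precomposed(base, mark) for name, mark in ACCENTS.items()}
    print(base, coverage)
```

A keyboard layout built on dead keys alone would have to fall back to
combining-mark keystrokes for every combination reported as False.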

An input method which normalises input is hostile if one cannot delete
recently entered characters one at a time, in the reverse of the order
in which they were entered. (Logic that always deletes an entire
grapheme cluster is also user-hostile.) Some input methods appear to
work by deleting recently
entered characters and substituting new ones - imagine the chaos that
would wreak if the application were normalising the backing store as
characters were entered!
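
The kind of chaos meant here can be sketched in a few lines of Python.
This is a toy model under stated assumptions, not a description of any
real input method: the backing store is a plain string, the input
method is assumed to remember only how many code points it sent, and
the names enter and simulate are hypothetical.

```python
import unicodedata

def enter(store: str, ch: str, normalise_store: bool) -> str:
    """Append one code point; optionally renormalise the whole store to NFC."""
    store += ch
    return unicodedata.normalize("NFC", store) if normalise_store else store

def simulate(normalise_store: bool) -> str:
    """Type x, e, combining acute; then the input method 'corrects' the accent.

    The (hypothetical) input method believes its last two code points,
    e + U+0301, are still in the store, so it deletes two code points
    and inserts e + U+0302 instead.
    """
    store = "x"
    for ch in ("e", "\u0301"):
        store = enter(store, ch, normalise_store)
    store = store[:-2] + "e\u0302"  # delete two code points, substitute
    if normalise_store:
        store = unicodedata.normalize("NFC", store)
    return store

print(ascii(simulate(False)))  # the x survives
print(ascii(simulate(True)))   # the x has been deleted along with the accent
```

With normalisation on, e + U+0301 has already collapsed into a single
code point, so deleting "two characters" also removes the preceding x.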

> So you cannot recommend any form.

Recommending and mandating are different.

> But if the W3C needs to update
> something, it's to say that ALL forms that are canonically equivalent
> should be treated equally. This means that it is up to the recipient of
> encoded documents to perform their own normalization.

The problem comes with applications that ignore canonical
normalisation. The stability of Unicode normalisation is guaranteed so
that applications can ignore the normalisation process!
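
As a minimal illustration of a recipient doing its own normalisation,
the following Python sketch compares strings for canonical equivalence
by normalising both sides first. The function name canonically_equal is
made up for this example, and NFD is an arbitrary choice; normalising
both sides to NFC would serve equally well.

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after putting both into the same normal form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

composed = "\u00e9"      # e with acute, precomposed
decomposed = "e\u0301"   # e followed by combining acute

# An application that ignores normalisation sees two different strings.
print(composed == decomposed)
# A recipient that normalises first treats them as the same text.
print(canonically_equal(composed, decomposed))

# Marks with different combining classes may arrive in either order.
print(canonically_equal("q\u0323\u0307", "q\u0307\u0323"))
```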

Richard.
Received on Tue Feb 05 2013 - 03:15:19 CST
