Re: Does Unicode 4.1 change NFC?

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Apr 04 2005 - 16:05:13 CST

Next message: John Burger: "Re: Does Unicode 4.1 change NFC?"

Previous message: Peter Kirk: "Re: Does Unicode 4.1 change NFC?"
Maybe in reply to: Elliotte Rusty Harold: "Does Unicode 4.1 change NFC?"
Next in thread: John Burger: "Re: Does Unicode 4.1 change NFC?"
Reply: John Burger: "Re: Does Unicode 4.1 change NFC?"

> >In any conformant Unicode 4.0.1 (or earlier) version of normalization,
> >U+FACF normalizes to (tada!) U+FACF. If it doesn't, the normalizer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >isn't conformant. If sending U+FACF to such a normalizer crashes
> >an application, then shame on the programmer.
> >
> >
>
> The problem will of course come when new UCD data is fed into an old
> normaliser.

Actually, it will not. If a Unicode normalizer was a Unicode 4.0
normalizer, it will *stay* a Unicode 4.0 normalizer.

By the Unicode 4.0 spec, U+FACF (an unassigned code point)
normalized to U+FACF. (See above, emphasized now, to, well,
*emphasize* the point.)

If such a normalizer is fed Unicode 4.1 data, it will *still*
proceed to normalize conformantly, according to the Unicode 4.0
spec. The fact that there is an unassigned code point in the
data, and that the normalizer is not up to the 4.1 spec is
basically no issue to it. It just doesn't support any version
past Unicode 4.0, just as advertised (presumably) on the box.
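To make the point concrete, here is a small sketch using Python's standard `unicodedata` module. It assumes the `ucd_3_2_0` object, which is a frozen Unicode 3.2 database, as a stand-in for the "Unicode 4.0 normalizer" discussed above (U+FACF is unassigned in both 3.2 and 4.0). It also assumes the interpreter's main database is at Unicode 4.1 or later.

```python
import unicodedata

# Frozen Unicode 3.2 database: an assumed stand-in for a pre-4.1 normalizer.
old = unicodedata.ucd_3_2_0

ch = "\uFACF"  # unassigned before Unicode 4.1

# The version-locked normalizer leaves the unassigned code point alone.
old_result = old.normalize("NFC", ch)

# The interpreter's current database knows the 4.1 decomposition.
new_result = unicodedata.normalize("NFC", ch)

print(old.unidata_version, [f"U+{ord(c):04X}" for c in old_result])
print(unicodedata.unidata_version, [f"U+{ord(c):04X}" for c in new_result])
```

Feeding the newer character to the older normalizer neither crashes nor corrupts anything; the code point simply passes through unchanged.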

> You have made much in the past of the need not to change the
> normalisation algorithm,

The normalization *algorithm* has not changed.

Stability of normalized data has not been disturbed. Any
data in a normalization form by the Unicode 4.0 spec is *still*
in that normalized form by the Unicode 4.1 spec.
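A quick way to spot-check that stability claim, again sketched with Python's `unicodedata` and its frozen `ucd_3_2_0` database as an assumed stand-in for the older version: normalize a few strings with the old database, then confirm the newer database regards the results as already normalized. The sample strings are arbitrary illustrations, and a handful of samples is of course not a proof.

```python
import unicodedata

old = unicodedata.ucd_3_2_0  # frozen older database (assumed stand-in)

# Arbitrary sample inputs made of characters assigned long before 4.1.
samples = ["e\u0301", "A\u030A", "\u1100\u1161", "\u00C5ngstr\u00F6m"]

def still_normalized(text):
    """NFC under the old database must be a fixed point of NFC under the new one."""
    old_nfc = old.normalize("NFC", text)
    return unicodedata.normalize("NFC", old_nfc) == old_nfc

results = [still_normalized(s) for s in samples]
print(results)
```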

That does not mean, and never has meant, that an implementation
need not change to support extensions to the standard.

> not to add new classes of exceptions etc so
> that programs don't have to be rewritten for each new version, only the
> data needs to be updated.

In principle, one could write a normalization implementation to
be completely data driven, so that once it were written, it
could simply be handed the next version UCD data files, and it
would do the "right thing" with them. In practice, most implementations
predigest all the data and perform various internal optimizations
for either table size or speed or both. Such implementations need
to be updated when the standard is updated, and the implementers
generally understand the maintenance versus performance tradeoffs
they are making here.
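As a rough illustration of the "completely data driven" approach, the sketch below builds its canonical decomposition table from lines in the UnicodeData.txt field layout and applies it recursively. Everything here is hypothetical illustration: the function names are invented, the two sample lists are abbreviated stand-ins for the real 4.0 and 4.1 data files, and the code covers only the decomposition lookup (no canonical reordering, no composition, no Hangul).

```python
# Abbreviated sample data in the UnicodeData.txt layout (field 5 holds the
# decomposition mapping). These lists stand in for the full data files.
UCD_4_0_SAMPLE = [
    "00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;;;00C9;;00C9",
]
UCD_4_1_SAMPLE = UCD_4_0_SAMPLE + [
    "FACF;CJK COMPATIBILITY IDEOGRAPH-FACF;Lo;0;L;2284A;;;;N;;;;;",
]

def load_canonical_decompositions(lines):
    """Build {code point: [code points]} from canonical mappings only."""
    table = {}
    for line in lines:
        fields = line.split(";")
        mapping = fields[5]
        # Mappings starting with "<tag>" are compatibility mappings; skip them.
        if mapping and not mapping.startswith("<"):
            table[int(fields[0], 16)] = [int(cp, 16) for cp in mapping.split()]
    return table

def expand(cp, table):
    """Recursively expand one code point; unknown code points map to themselves."""
    if cp not in table:
        return [cp]
    out = []
    for part in table[cp]:
        out.extend(expand(part, table))
    return out

def decompose(text, table):
    return "".join(chr(cp) for ch in text for cp in expand(ord(ch), table))

table_40 = load_canonical_decompositions(UCD_4_0_SAMPLE)
table_41 = load_canonical_decompositions(UCD_4_1_SAMPLE)
```

The same code serves both versions; only the table handed to `decompose` differs. That convenience is what a predigested, optimized implementation gives up in exchange for size or speed.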

> The sort of outcome I might well expect to see
> from this is a normaliser emitting surrogate pairs in UTF-8 or UTF-32.

Well, if so, it is badly written, and probably non-conformant to
begin with.

What you *should* expect is:

A Unicode 4.0 implementation will normalize U+FACF to U+FACF.

A Unicode 4.0 implementation, if tested against a Unicode 4.1
test data suite, will issue an exception (fail a test, whatever) for
U+FACF.

A Unicode 4.1 implementation will normalize U+FACF to U+2284A.

Anything more or less than that is just bad software engineering.
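The three expectations above can be sketched as a tiny conformance check. The test line is written in the semicolon-separated layout of NormalizationTest.txt (source; NFC; NFD; NFKC; NFKD) for illustration, not copied from the published file. The helper names are hypothetical, and Python's frozen `ucd_3_2_0` database is assumed as a stand-in for the pre-4.1 implementation.

```python
import unicodedata

# Illustrative line in the NormalizationTest.txt layout: c1;c2;c3;c4;c5;
TEST_LINE = "FACF;2284A;2284A;2284A;2284A;"

def parse_field(field):
    """Turn a space-separated list of hex code points into a string."""
    return "".join(chr(int(cp, 16)) for cp in field.split())

def passes_nfc(normalizer, line):
    """Check the NFC expectation of one test line: c2 == NFC(c1)."""
    fields = line.split(";")
    source, expected_nfc = parse_field(fields[0]), parse_field(fields[1])
    return normalizer.normalize("NFC", source) == expected_nfc

old_passes = passes_nfc(unicodedata.ucd_3_2_0, TEST_LINE)  # older implementation
new_passes = passes_nfc(unicodedata, TEST_LINE)            # 4.1-or-later implementation

print("old implementation passes 4.1 test line:", old_passes)
print("new implementation passes 4.1 test line:", new_passes)
```

The older implementation failing the newer test line is the expected, well-behaved outcome: it reports a mismatch rather than crashing or emitting malformed output.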

--Ken

This archive was generated by hypermail 2.1.5 : Mon Apr 04 2005 - 16:06:24 CST