Re: more flexible pipeline for new scripts and characters

From: Ken Whistler <kenw_at_sybase.com>
Date: Fri, 18 Nov 2011 15:06:40 -0800

On 11/18/2011 1:30 PM, Karl Williamson wrote:
> How is this different from Named sequences, which are published
> provisionally?

Named sequences aren't character properties.

When a newly encoded character is published in the standard, its code
point, its name, and dozens of other properties all have to be published
at the same time. The whole notion of omitting any of them would cause
problems for implementers and would be tantamount to saying that the
character isn't actually "standard" yet, because properties for it are
missing.

And for good reasons, *some* (but not all) of those properties are also
immutable upon publication. The most obvious is the code point, of
course. Changing a code point for an encoded character after it is
published in the standard is tantamount to admitting it was never
standard in the first place.

In the early days of Unicode (and 10646, for that matter), the committees
entertained the notion that character names might be the kind of thing
that could occasionally be corrected later, as needed, after publication.
But after several notorious examples of the undesirability and costs
associated with changing character names after publication, the
committees slammed the door on that, and character *names* are now as
immutable as their code points.

Named sequences are different. Publishing a newly encoded character has
no implications whatsoever for named sequences. A named sequence stands
on its own, as an independent entity. Furthermore, there are basically
no algorithms (or implementations) that depend on them in any significant
way. Named sequences are primarily epicycles of the character encoding
process -- they give standard names to "things" that people want to have
names for, but which the committees decline to encode as characters,
because they can already be represented by sequences of existing
characters.

Given that status, and given that named sequences are *not* character
properties, it was possible to create a two-stage, provisional
publication mechanism for them: publishing them first as a provisional
list, and then later, if nobody has any objections or corrections, moving
them into the (immutable) standard list.

You just can't do that with character *names*.

If you want to make analogies, however, the ISO ballots constitute the
*provisional* publication for character code points and names. If nobody
has any objections or corrections expressed during the balloting process
(which can continue for 2 years or longer), then eventually those code
points and names get "moved" into the (immutable) list in the standard.

--Ken
Received on Fri Nov 18 2011 - 17:12:02 CST
