From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Mar 02 2005 - 18:28:44 CST
Peter Kirk said:
> But I
> find it hard to reconcile with the whole concept of a standard which is
> supposed to specify how text should be represented,
There is a fine line between the standard specifying how
text should (= "can") be represented and specifying how
text should (= "ought to") be spelled.
The former is the appropriate domain of the standard. It consists
in explaining how the characters are to be used to represent
text, and therefore forms the fundamental basis for interoperability
in use of the standard.
The latter is outside the competence of the UTC and consists of
specifications of conventions, spelling rules, orthography, and
such that properly belong to the relevant users (and standardizers)
of orthographies and writing systems.
If I am interested in representing the syllable "sa" in Hangul,
the Unicode Standard tells me that I "should" represent it as:
<U+1109, U+1161>
or as:
U+C0AC
or as:
<U+3145, U+314F>
or as:
<U+FFB5, U+FFC2>
It doesn't tell me which of those alternatives I "should" pick to
be "correct" in a given context, although I might get some clues
from knowing which are "compatibility" or "half-width" versions
of the jamos. Nor does it tell me that using "sa" in a particular
Korean word is even a correct spelling for that word. Maybe I
was mistaken and it should be spelled "ssa" instead, for example.
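As a rough illustration of the point, here is a minimal Python sketch
using the standard unicodedata module. It takes the four spellings of
"sa" (with U+1161 HANGUL JUNGSEONG A as the conjoining vowel) and shows
what normalization does to each. The names SPELLINGS, code_points, and
normalized_forms are made up for this example. The sketch only shows
which sequences the standard treats as canonically or compatibly
equivalent; it says nothing about which one is "correct" to use.

```python
import unicodedata

# Four ways to represent the syllable "sa". The labels are informal.
SPELLINGS = {
    "conjoining jamo": "\u1109\u1161",
    "precomposed syllable": "\uC0AC",
    "compatibility jamo": "\u3145\u314F",
    "halfwidth jamo": "\uFFB5\uFFC2",
}


def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def normalized_forms(text):
    """Return the NFC and NFKC normalizations of text, keyed by form name."""
    return {form: unicodedata.normalize(form, text) for form in ("NFC", "NFKC")}


for label, text in SPELLINGS.items():
    forms = normalized_forms(text)
    print(
        f"{label}: {code_points(text)}"
        f" | NFC: {code_points(forms['NFC'])}"
        f" | NFKC: {code_points(forms['NFKC'])}"
    )
```

Under NFC only the conjoining jamo sequence composes to U+C0AC; the
compatibility and halfwidth sequences are left alone. Under NFKC all
four end up as U+C0AC. Choosing whether to normalize, and to which form,
is still a decision for the application, not something the standard
makes for you.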
Fixing things for Biblical Hebrew by adding holam haser for vav,
qamats qatan, clarifying positional rules for meteg, etc., makes
it possible for users to make some distinctions they
need to make in Unicode representation of Biblical texts.
But it doesn't end up with the Unicode Standard dictating how
you should (= "must") spell the Biblical texts.
> as well as with Doug
> Ewell's definition of stability that "it does not change in a way that
> causes existing implementations or data to break".
Properly restated (for which see Asmus' discussion),
this is the primary reason for the continued existence of the
UTC now more than 14 years after its founding. The members
have huge implementation commitments to protect and
maintain stupendously massive data collections whose
stability must be guaranteed.
Every new addition or change to the standard is subjected to
detailed scrutiny, with most of the concern being essentially
summed up as, "Will this change break my existing implementations
or result in corruption or obsolescence or other bad things
happening to my existing data stores?"
Such considerations cannot prevent the standard from changing at
all, since implementation needs also drive the need to add
new characters and occasionally reinterpret some of the rules
of the standard. But considerations for stability *do* constitute
a very, very significant barrier to arbitrary changes in the
standard now.
--Ken