From: Peter Kirk (peterkirk@qaya.org)
Date: Sat Mar 05 2005 - 20:31:11 CST
On 05/03/2005 22:54, UList@dfa-mail.com wrote:
>People think I'm being absolutely horrible. But you should be more sympathetic.
>
>I looked at Unicode and observed a basic difficulty. "Where do you draw the
>line." What fits exactly into what category. And I saw that Unicode was having
>to answer complex and finely gradated questions with the bluntest of answers:
>black or white. Codepoint or not. And it occurred to me that what was called
>for conceptually, was one or more shades of gray.
>
>A way to define something that was not quite a characterhood, and yet
>something still vital, or important, or at least useful, to have in (what I
>will now call) the basic plaintext data.
>
>And a very simple technique for doing this is apparent -- use one or more
>levels of variation selector-like codepoints to define a "sub-characterhood",
>and even a "sub-sub-characterhood". Not a "pseudo-codepoint" -- but a real
>piece of data, describing the real identity of something as a sub-category of
>a codepoint. Data in one or more shades of gray.
>
>
There is already a mechanism defined for this "sub-characterhood":
Variation Selectors.
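
As a rough illustration of the mechanism (a sketch added for clarity, not part of the original exchange): a variation sequence is a base character followed by one variation selector code point. The Python below assumes the two selector ranges U+FE00..U+FE0F (VS1 to VS16) and U+E0100..U+E01EF (VS17 to VS256); the helper names `vs_number` and `split_sequences` are invented for this example.

```python
# Code point ranges of the 256 variation selectors (VS1-VS16 and VS17-VS256).
VS_BASIC = range(0xFE00, 0xFE10)
VS_SUPPLEMENT = range(0xE0100, 0xE01F0)


def vs_number(ch):
    """Return the selector number (1-256) if ch is a variation selector, else None."""
    cp = ord(ch)
    if cp in VS_BASIC:
        return cp - 0xFE00 + 1
    if cp in VS_SUPPLEMENT:
        return cp - 0xE0100 + 17
    return None


def split_sequences(text):
    """Pair each base character with the selector number that follows it, if any.

    A stray selector (one with no base before it, or following a base that
    already has a selector) is kept as its own entry.
    """
    pairs = []
    for ch in text:
        n = vs_number(ch)
        if n is not None and pairs and pairs[-1][1] is None:
            pairs[-1] = (pairs[-1][0], n)
        else:
            pairs.append((ch, n))
    return pairs


# U+2268 followed by VS1 (U+FE00) is a standardized variation sequence.
sample = "a\u2268\uFE00b"
print(split_sequences(sample))
```

The selector carries no meaning on its own; it only qualifies the character immediately before it, which is what binds the "sub-characterhood" to the character.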
>I brought up an example of this with the Serbian 't'. My approach has a sound
>conceptual basis and can be done technically. It has the benefits that the
>definition of the "sub-characterhood" is tightly bound to the characterhood,
>providing data robustness, and codepoint-level data identification.
>
>
For reasons which have been explained before, most convincingly the one
that to introduce this usage now would disturb widespread current usage,
Variation Selectors are not considered suitable for Serbian 't'. But
that does not mean that they are unsuitable for your local Greek
alphabet examples. If you can find some suitable reasonably standardised
variant shapes for individual letters, you might consider proposing them
for standardisation. But if you end up with a complete alphabet of
variant shapes, the issue becomes a rather different one, not just
sub-characterhood but sub-scripthood. And I accept that Unicode has not
found a good way to resolve this one, either generally or in individual
controversial cases.

I must say I am thinking it would be a good idea to define a subset of
the 256 variation selectors (already specified as default ignorable) as
available for private use. (The existing PUA characters are not a good
substitute as they are not default ignorable.) At least this would be a
good way for the Unicode community to deal with recurrent issues like
the ones Doug is repeatedly raising: advice can be given to use the
Private Use Variation Selectors to select variant glyphs in any way you
want, as long as you do it only between consenting adults - and the text
would automatically default to being displayed with the regular glyphs
by anyone outside the private loop. At the cost of a small number of
code points, this could get a lot of people off our backs, and stop them
abusing Unicode in more fundamentally damaging ways.
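
The fallback behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: no private-use variation selectors exist in Unicode, the range VS241 to VS256 used below is a hypothetical choice, and the names `PRIVATE_VS` and `render_plan` are invented for the example. The point is that a receiver outside the private agreement ignores the selector and is left with the regular glyph.

```python
# Hypothetical: pretend VS241..VS256 (U+E01E0..U+E01EF) were set aside for
# private use. Nothing in Unicode defines such a subset; this range is an
# assumption made purely for illustration.
PRIVATE_VS = range(0xE01E0, 0xE01F0)


def is_variation_selector(ch):
    """True for any of the 256 variation selectors (all default ignorable)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def render_plan(text, private_glyphs=None):
    """Return a list of (base character, glyph label) pairs.

    private_glyphs maps (base, selector) to a glyph label and represents a
    private agreement between sender and receiver. A receiver outside the
    agreement passes nothing and gets the regular glyph for every character.
    """
    private_glyphs = private_glyphs or {}
    plan = []
    for ch in text:
        if is_variation_selector(ch):
            if plan and ord(ch) in PRIVATE_VS:
                base = plan[-1][0]
                label = private_glyphs.get((base, ch))
                if label is not None:
                    plan[-1] = (base, label)
            # Selectors are never displayed themselves.
            continue
        plan.append((ch, "regular"))
    return plan


text = "\u03b2\U000E01E0"  # Greek beta + hypothetical private selector
agreement = {("\u03b2", "\U000E01E0"): "local-variant"}

print(render_plan(text, agreement))  # receiver inside the private loop
print(render_plan(text))             # receiver outside it
```

A private-use area character in the same position would instead show up as an unknown glyph for outside readers, which is the contrast the parenthetical about the existing PUA draws.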
>But I was told, no, there is simply a better way of doing this: "language
>tags".
>
>
I don't think anyone has ever encouraged you to use the Unicode language
tags. This mechanism is available, but is not well defined and not
particularly suitable for your purposes.
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/