From: Language Analysis Systems, Inc. Unicode list reader (Unicodefirstname.lastname@example.org)
Date: Thu Apr 29 2004 - 17:17:42 EDT
>> 1) Change the default properties for some range of the PUA. This is
>> what people seem to be pushing most hard.
>Actually, no. While it seems like an obvious first solution, the
>problems you pointed out are quickly pointed out to such people so
>that they start to push for option 2)
I'm not hearing unanimity on this point, but if it's there, that's
>> 2) Leave the current PUA alone, but set aside a new PUA, say Planes
>> and 13. This solves the existing-use problem, but you still have the
>> question of just how you subdivide the range, and it starts to cut
>> down significantly on the code points available for actual
>It won't require so many code points. I have been working on such a
>proposal, and it requires not even a fifth of a plane, let alone two
>full planes. I even hope to make it
>> 3) Define ad-hoc standards that are based on Unicode but make
>> character assignments in the PUA and lobby application vendors to
>> support these encodings in addition to regular Unicode.
>I just can't see application or OS vendors choosing to pick a single
>PUA standard that is different from the Unicode defaults.
And I'm not saying they should. I'm not envisioning "a single PUA
standard"; I'm envisioning a separate PUA standard for each defined user
community; trying to unify all of the user communities' demands into a
single PUA standard is tantamount to just putting the characters into
Of course, getting app vendors to support a bunch of PUA standards is an
even tougher sell, especially considering it's not a good idea in the
>> 4) Lobby for operating-system vendors to extend their text engines
>> to allow properties of PUA code points to be configured.
>While this would be a possible solution, it has some real drawbacks as
>well, especially when different Private Use assignments overlap in the
>codepoints they use and have character properties that differ for the
>same character.
There'd have to be a way to account for this, in the same way you account
for different glyphs for the same character by using different fonts.
There'd have to be some way of applying a "property set" to a particular
run of characters. The most logical way would be to have this be
somehow associated with a choice of font.
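
A minimal sketch of the "property set per font" idea, in Python. Everything
here is hypothetical (the `PropertySet` and `Run` classes, the font names,
and the overrides themselves); no existing text engine is claimed to work
this way. It only shows that two runs can give the same PUA code point
different properties, with the standard Unicode values as the fallback.

```python
import unicodedata
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CharProps:
    general_category: str
    bidi_class: str


@dataclass
class PropertySet:
    """Hypothetical table of per-character property overrides."""
    overrides: dict = field(default_factory=dict)

    def props(self, ch: str) -> CharProps:
        # Use the override if one exists; otherwise fall back to the
        # properties the Unicode Character Database assigns.
        if ch in self.overrides:
            return self.overrides[ch]
        return CharProps(unicodedata.category(ch),
                         unicodedata.bidirectional(ch))


@dataclass
class Run:
    text: str
    font: str


# Hypothetical registry: each font name carries its own property set.
FONT_PROPERTY_SETS = {
    "FontA": PropertySet({"\uE000": CharProps("Lo", "R")}),
    "FontB": PropertySet({"\uE000": CharProps("Mn", "NSM")}),
}


def run_props(run: Run) -> list:
    """Resolve properties for each character using the run's font."""
    pset = FONT_PROPERTY_SETS.get(run.font, PropertySet())
    return [pset.props(ch) for ch in run.text]
```

The same code point, U+E000, comes back as a right-to-left letter in a
"FontA" run and as a nonspacing mark in a "FontB" run, which is the
overlap case raised in the quoted objection.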
>> 5) Write specialized applications that are designed to deal with
>> certain scripts and address the needs of user communities whose needs
>> aren't being met right now.
>The problem is, for most potential private uses, if there is sufficient
>interest that a specialized application gets written just to handle
>that one private use, it probably has enough interest to merit being
>encoded in Unicode itself.
Right. And if we're just talking about stopgap measures until something
gets into Unicode, it seems like we can tolerate greater ugliness.
As for user communities that'll never be served by Unicode, yeah, I
think there should be a separate standard of some kind
("SourGrapes-icode") that can lobby for support from OS and app vendors
on its own merits. Maybe I'm nuts, but I like to think the UTC is
generally reasonable and if there's a good reason for something, it
usually gets in eventually.
>> 6) Use markup or other fancy-text mechanisms to override the default
>Markup already uses the selection of a font to establish which set of
>Private Use characters is in use.
No, font selection just determines which glyphs to draw.
>Bidi Class and Line Break can be handled, if not elegantly via the
Right. I think there are also markup codes that do these things.
>However, the behaviors that cause the most interest in a better defined
>private use area, combining marks and cased letters, cannot be handled
>in a generic manner by formatting characters. It simply is impossible
>to simulate non-zero canonical combining class characters in Unicode
>with anything other than a character with the appropriate canonical
I'm still clueless as to why this is a good idea. Combining class is
there for one reason only-- normalization-- and imposing semantics on
PUA characters can't change normalization. See my other note.
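
A small illustration of the normalization point, using Python's standard
`unicodedata` module. The choice of U+E000 as a would-be private combining
mark is an assumption made for the example. A conformant normalizer sees
combining class 0 for it, so it behaves as a starter no matter what a
private agreement says.

```python
import unicodedata

PUA_MARK = "\uE000"  # assumed: a community treats this as a combining mark


def nfd(text: str) -> str:
    return unicodedata.normalize("NFD", text)


# The standard assigns PUA code points canonical combining class 0.
pua_ccc = unicodedata.combining(PUA_MARK)

# Two real marks after a base are put into canonical order:
# U+0323 (class 220) sorts before U+0301 (class 230).
reordered = nfd("a\u0301\u0323")

# With the PUA character between them, nothing moves: a class-0
# character blocks reordering across it.
blocked = nfd("a\u0301" + PUA_MARK + "\u0323")
```

Here `reordered` has the two marks swapped, while `blocked` is identical to
its input. Changing that would mean changing the normalizer, not just
attaching semantics to the PUA character.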
>Indicating the case would involve having a means to indicate which case
>the character is and where its other case is
Asking OS and app vendors for a way to control casing behavior seems an
easier sell than getting them to make everything pluggable. It also
seems like something that's pretty easy to code on its own, although
then you have to use an external application to do case mapping rather
than getting it for free in MS Word (or whatever).
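
As a rough sketch of how little code an external case mapper needs, here
is one in Python. The layout is invented for the example: 26 uppercase
letters at U+E000..U+E019 with their lowercase partners at U+E020..U+E039.
A real private agreement would supply its own table.

```python
# Hypothetical private agreement: 26 cased pairs in the PUA.
PUA_UPPER_START = 0xE000
PUA_LOWER_START = 0xE020
LETTER_COUNT = 26

# Translation tables keyed by code point, as str.translate expects.
TO_LOWER = {PUA_UPPER_START + i: PUA_LOWER_START + i
            for i in range(LETTER_COUNT)}
TO_UPPER = {lower: upper for upper, lower in TO_LOWER.items()}


def pua_lower(text: str) -> str:
    # Standard case mapping first (it leaves PUA characters alone),
    # then the private table.
    return text.lower().translate(TO_LOWER)


def pua_upper(text: str) -> str:
    return text.upper().translate(TO_UPPER)
```

For instance, `pua_upper("a\uE020b")` gives `"A\uE000B"`, and PUA code
points outside the two assumed ranges pass through unchanged.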
How big a demand is there for custom case mapping? Seems like most of
the PUA things I've heard about aren't cased in the first place.
>> 7) Design custom fonts that cannibalize existing code points that
>> have the right sets of properties.
>This is a commonly chosen option right now, only it is usually done
>with respect to legacy encodings so as to make use of the keyboard
>mapping that is commonly associated with that legacy encoding. It's not
>pretty, but most private users look for something that will make their
>private use easy to accomplish, despite the problems it causes.
>Any solution which requires users of the Private Use scheme to do more
>than install a few files on their system to get it to work will
>probably not be used, since schemes such as that embodied by option 7)
>work now, despite the trouble they cause for others not using the scheme.
And I don't think there's anything wrong with that.
Language Analysis Systems, Inc.