RE: An attempt to focus the PUA discussion [long]

From: Ernest Cline (ernestcline@mindspring.com)
Date: Thu Apr 29 2004 - 16:18:02 EDT

Next message: John Hudson: "Re: New contribution"

Previous message: jcowan@reutershealth.com: "Re: An attempt to focus the PUA discussion [long]"
Maybe in reply to: Language Analysis Systems, Inc. Unicode list reader: "An attempt to focus the PUA discussion [long]"
Next in thread: Peter Kirk: "Re: An attempt to focus the PUA discussion [long]"
Reply: Peter Kirk: "Re: An attempt to focus the PUA discussion [long]"
Reply: Doug Ewell: "Re: An attempt to focus the PUA discussion [long]"

> ----- Original Message by Rich Gillam-----
> ...
>
> Seems to me that the choice of defaults was designed to irritate the
> smallest number of people possible and cover the widest range of
> use cases possible, and that we're now hearing from people in that
> "smallest possible" group.
>
> Those people have legitimate needs. How should they be
> accommodated, and how does Unicode participate in that process?
> Seems there are a number of options:
>
> 1) Change the default properties for some range of the PUA. This is
> what people seem to be pushing most hard.

Actually, no. While it seems like an obvious first solution, the problems
you describe are quickly pointed out to such people, so they start
to push for option 2) instead.

> 2) Leave the current PUA alone, but set aside a new PUA, say Planes
> 12 and 13. This solves the existing-use problem, but you still have the
> question of just how you subdivide the range, and it starts to cut down
> significantly on the code points available for actual standardization.

It won't require so many code points. I have been working on such
a proposal, and it requires not even a fifth of a plane, let alone two
full planes. I even hope to make it smaller.

> 3) Define ad-hoc standards that are based on Unicode but make
> character assignments in the PUA and lobby application vendors
> to support these encodings in addition to regular Unicode.

I just can't see application or OS vendors choosing to pick a
single PUA standard that is different from the Unicode defaults.

> 4) Lobby for operating-system vendors to extend their text engines
> to allow properties of PUA code points to be configured.

While this would be a possible solution, it has some real drawbacks
as well, especially when different Private Use assignments overlap
in the code points they use and assign different character properties
to the same code point.
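
To make the overlap problem concrete, here is a rough sketch. The two
schemes, their property values, and the helper name conflicts() are all
invented for illustration; they are not taken from any real Private Use
agreement or operating-system API.

```python
from typing import Dict, List, NamedTuple


class Props(NamedTuple):
    """The handful of properties a configurable text engine would need."""
    general_category: str
    combining_class: int
    bidi_class: str


# Hypothetical scheme A: U+E000 is a combining mark, U+E001 an uppercase letter.
scheme_a: Dict[int, Props] = {
    0xE000: Props("Mn", 230, "NSM"),
    0xE001: Props("Lu", 0, "L"),
}

# Hypothetical scheme B: U+E000 is a right-to-left letter instead.
scheme_b: Dict[int, Props] = {
    0xE000: Props("Lo", 0, "R"),
    0xE002: Props("Lo", 0, "R"),
}


def conflicts(a: Dict[int, Props], b: Dict[int, Props]) -> List[int]:
    """Return code points that both schemes assign, but with different properties."""
    return sorted(cp for cp in a.keys() & b.keys() if a[cp] != b[cp])


for cp in conflicts(scheme_a, scheme_b):
    print(f"U+{cp:04X}: {scheme_a[cp]} vs {scheme_b[cp]}")
```

A system-wide property setting can satisfy only one of the two schemes
for U+E000, so a document that uses the other scheme is processed
with the wrong properties.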

> 5) Write specialized applications that are designed to deal with
> certain scripts and address the needs of user communities
> whose needs aren't being met right now.

The problem is, for most potential private uses, if there is sufficient
interest that a specialized application gets written just to handle that
one private use, it probably has enough interest to merit being
encoded in Unicode itself.

> 6) Use markup or other fancy-text mechanisms to override the
> default properties. There are plenty of controls for controlling
> directionality, cursive joining, and line breaking. It may be
> inconvenient to use them, but it seems like a viable workaround
> while waiting for something to get into Unicode, and there's no
> implementation lag. What problems do the existing mechanisms
> not solve? Maybe the discussion should focus on this question--
> are there mechanisms that should be added to Unicode or some
> markup language to help enable some of these scripts?

Markup already uses the selection of a font to establish which set
of Private Use characters is in use. Bidi Class and Line Break can
be handled, if not elegantly, via the existing codes. However, the
behaviors that generate the most interest in a better-defined private
use area, combining marks and cased letters, cannot be handled
in a generic manner by formatting characters. It simply is impossible
to simulate characters with a non-zero canonical combining class in
Unicode with anything other than a character that has the appropriate
canonical combining class. Indicating case would require a means
to indicate which case the character is and where its other case is.
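
The defaults in question can be inspected with Python's standard
unicodedata module. This is only a sketch: U+E000 stands in for any
Private Use code point, and the variable names are mine.

```python
import unicodedata

pua = "\ue000"        # first code point of the BMP Private Use Area
acute = "\u0301"      # COMBINING ACUTE ACCENT, combining class 230
dot_below = "\u0323"  # COMBINING DOT BELOW, combining class 220

# Default properties of a Private Use code point.
defaults = {
    "category": unicodedata.category(pua),        # "Co" (private use)
    "combining": unicodedata.combining(pua),      # 0, i.e. not a combining mark
    "bidirectional": unicodedata.bidirectional(pua),
    "has_case_mapping": pua.upper() != pua or pua.lower() != pua,
}
print(defaults)

# Real combining marks are reordered by canonical ordering:
# class 220 sorts before class 230.
reordered = unicodedata.normalize("NFD", "a" + acute + dot_below)
print(reordered == "a" + dot_below + acute)

# A Private Use character between them has combining class 0, so it
# acts as a starter and blocks the reordering.
blocked = "a" + acute + pua + dot_below
print(unicodedata.normalize("NFD", blocked) == blocked)
```

No formatting character placed around U+E000 changes any of this,
since normalization looks only at the combining class of the code
point itself.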

> 7) Design custom fonts that cannibalize existing code points that
> have the right sets of properties.

This is a commonly chosen option right now, only it is usually done
with respect to legacy encodings so as to make use of the keyboard
mapping that is commonly associated with that legacy encoding.
It's not pretty, but most private users look for something that will make
their private use easy to accomplish, despite the problems it causes.

Any solution which requires users of the Private Use scheme to do
more than install a few files on their system to get it to work will
probably not be used, since schemes such as that embodied by
option 7) work now, despite the trouble they cause for others not
using the scheme.

This archive was generated by hypermail 2.1.5 : Thu Apr 29 2004 - 17:01:10 EDT