From: Asmus Freytag (
Date: Sun Oct 19 2003 - 21:59:16 CST

Why does this have to be in 'plain text'??

Plain text can be streams or strings. For streams, such a mechanism might
make sense, if you could identify a compelling case that's not better
handled by HTML, XML etc.

For strings, embedding font names in front of characters just violates some
implicit assumptions, e.g. that the average string is 'short', and that the
number of bytes is a small and at least probabilistically determinable
multiple of the number of characters, etc. etc. Not to forget that strings
are often assumed to be the plainest of plain text.
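
To put numbers on the byte-count assumption, here is a minimal sketch that puts a font name in front of a single private-use character. The encoding is an assumption modelled on the language-tag mechanism. It uses the (unassigned) introducer U+E0002 from the proposal quoted below, then the font name spelled in tag characters (printable ASCII + 0xE0000), then U+E007F CANCEL TAG. The helper tag_sequence and the font name "Example Font" are hypothetical.

```python
# Assumption: the font name is spelled in Unicode tag characters
# (printable ASCII + 0xE0000), introduced by the proposed, unassigned
# U+E0002 and closed by U+E007F CANCEL TAG.

CANCEL_TAG = chr(0xE007F)
PUA_CHAR = "\uE000"        # a single private-use character
FONT = "Example Font"      # hypothetical font family name


def tag_sequence(introducer: int, text: str) -> str:
    """Spell printable-ASCII text as tag characters after an introducer."""
    if not all(0x20 <= ord(c) <= 0x7E for c in text):
        raise ValueError("tag content must be printable ASCII")
    return chr(introducer) + "".join(chr(0xE0000 + ord(c)) for c in text)


plain = PUA_CHAR
tagged = tag_sequence(0xE0002, FONT) + PUA_CHAR + CANCEL_TAG

for label, s in (("plain", plain), ("tagged", tagged)):
    code_points = len(s)                       # Python str length = code points
    utf8_bytes = len(s.encode("utf-8"))
    utf16_units = len(s.encode("utf-16-le")) // 2
    print(label, code_points, utf8_bytes, utf16_units)
```

Both strings display as a single glyph. The plain string is 1 code point, 3 UTF-8 bytes and 1 UTF-16 code unit. The tagged string is 15 code points, 59 UTF-8 bytes and 29 UTF-16 code units.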

A lot of architectures will break if you violate these implicit assumptions
by hosting a mini-markup inside a string. And for at least half of them (my
scientific estimate) performance will prevent them from doing anything
about it, so you are stuck.

The language tagging scheme was designed for use with a string-based
protocol, but one where the protocol contained the rules for interpreting
any tagging. What you are proposing is something that's supposed to just
infect any run of characters without warning.
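
The contrast becomes concrete with a minimal sketch of how a consumer might resolve the selector tag proposed in the quoted message below. Several assumptions go beyond the proposal. The font name is taken to be spelled in tag characters U+E0020..U+E007E, as in a language tag. A new selector is taken to simply replace an earlier one. The names resolve_pua and make_selector and the font name "Klingon" are hypothetical. The last line shows the problem. The interpretation of a PUA character depends on tags arbitrarily far before it, so a substring cut after the selector silently loses that interpretation.

```python
from typing import List, Optional, Tuple

SELECTOR = 0xE0002                       # proposed introducer; not an assigned character
CANCEL_TAG = 0xE007F                     # U+E007F CANCEL TAG
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E   # tag characters mirroring ASCII 0x20..0x7E


def is_pua(cp: int) -> bool:
    """True for code points in the three Private Use Areas."""
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)


def make_selector(font: str) -> str:
    """Spell a printable-ASCII font family name as a (hypothetical) selector tag."""
    if not all(0x20 <= ord(c) <= 0x7E for c in font):
        raise ValueError("font name must be printable ASCII")
    return chr(SELECTOR) + "".join(chr(0xE0000 + ord(c)) for c in font)


def resolve_pua(text: str) -> List[Tuple[int, str, Optional[str]]]:
    """Return (index, char, font family in effect or None) per PUA character."""
    result: List[Tuple[int, str, Optional[str]]] = []
    current: Optional[str] = None
    i = 0
    while i < len(text):
        cp = ord(text[i])
        if cp == SELECTOR:
            # Collect the tag characters that spell the font family name.
            j = i + 1
            name = []
            while j < len(text) and TAG_FIRST <= ord(text[j]) <= TAG_LAST:
                name.append(chr(ord(text[j]) - 0xE0000))
                j += 1
            current = "".join(name)
            i = j
            continue
        if cp == CANCEL_TAG:
            current = None          # back to "no stated interpretation"
        elif is_pua(cp):
            result.append((i, text[i], current))
        i += 1
    return result


s = "\uE000" + make_selector("Klingon") + "a\uE001" + chr(CANCEL_TAG) + "\uE002"
print(resolve_pua(s))       # U+E001 is the only character under "Klingon"
print(resolve_pua(s[10:]))  # the same U+E001, cut off from its selector
```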

Who's going to implement this, why, where and when?


At 04:34 AM 10/20/03 +0200, Chris Jacobs wrote:

>----- Original Message -----
>From: "Doug Ewell" <>
>To: "Unicode Mailing List" <>
>Cc: <>; "Tom Gewecke" <>
>Sent: Sunday, October 19, 2003 8:32 PM
>Subject: Re: Klingons and their allies - Beyond 17 planes
> > <jameskass at att dot net> wrote:
> >
> > > In addition to the problem of the OS substituting improper glyphs
> > > from inappropriate fonts unexpectedly, there's often a problem with
> > > line breaking.
> > >
> > > Since the PUA has no properties, some applications seem to ignore the
> > > space character and break lines arbitrarily, splitting words in the
> > > middle.
> >
> > That's exactly what happens in my sample pages. I didn't think it was
> > because the PUA had "no" properties so much as "default" properties,
> > which (as Thomas Chan indicated) might be Han-based or Han-influenced.
> > You can always switch to a font that will display glyphs for your PUA
> > characters, but it's harder to adapt a rendering engine to observe PUA
> > character properties.
>One problem is that there seems to be no way in plain-text Unicode to specify
>who is in charge of a particular interpretation of the PUA.
>As I understand the position of the designers of Unicode, they definitely
>don't want to be in charge of this and want to let the users of the PUA
>fight it out among themselves.
>Nevertheless, I think that if Unicode doesn't want to decide how the PUA is
>to be interpreted, it should at the very least provide a mechanism by which a
>user of the PUA can specify which specification he prefers.
>I plan to propose such a mechanism:
>a character with the following properties:
>Scalar Value: U+E0002
>This starts a PUA interpretation selector tag.
>The content of the tag is a Font family name.
>For all PUA chars between this tag and the corresponding Cancel tag the
>copyright holder of the font is the sole authority about how the PUA should
>be interpreted.
>Any comments?
> > In any case, I am absolutely certain :-) :-) that the arbitrary mid-word
> > line breaking is what has discouraged would-be readers from pointing out
> > the typo (since fixed) in my transcription of a Dorothy Parker poem:
> >
> >
> >
> > -Doug Ewell
> > Fullerton, California
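
The "default" PUA properties discussed in the quoted thread can be inspected with Python's standard unicodedata module, as a quick sketch. That module exposes general category, East Asian Width and bidi class, but not the UAX #14 Line_Break property. In the Unicode data files the PUA has Line_Break XX (unknown), which UAX #14 resolves to AL (alphabetic) by default. An application that breaks between PUA characters as if they were ideographs would therefore appear to be applying its own defaults. The East Asian Width value A (ambiguous) may be one source of Han-like treatment, since UAX #11 allows ambiguous characters to be treated as wide in East Asian contexts.

```python
import unicodedata

# Report the default properties Python's Unicode database assigns to
# the first and last code points of the BMP Private Use Area.
for cp in (0xE000, 0xF8FF):
    ch = chr(cp)
    print(
        f"U+{cp:04X}",
        unicodedata.category(ch),           # general category: 'Co' = private use
        unicodedata.east_asian_width(ch),   # 'A' = ambiguous width
        unicodedata.bidirectional(ch),      # bidi class
        unicodedata.name(ch, "<no name>"),  # PUA characters have no names
    )
```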

This archive was generated by hypermail 2.1.5 : Thu Jan 18 2007 - 15:54:24 CST