Re: Special Type Sorts Tray 2001

From: DougEwell2@cs.com
Date: Tue Oct 02 2001 - 11:43:54 EDT


In a message dated 2001-10-02 4:50:03 Pacific Daylight Time,
WOverington@ngo.globalnet.co.uk writes:

> Is there an official Unicode Consortium statement that states, for the
> record, that the Unicode Consortium refuses to encode more ligatures and
> precomposed characters please?

I'm pretty sure there is, since it has been brought up so often by UTC
members on this list. If there is no such statement, then one should be
drafted.

> I feel that this is a matter that needs to be formally resolved one way
> or the other, so that, if such a refusal has been declared then people
> who wish to have these characters encoded may act knowing that the
> Unicode Consortium will have legally estopped itself from making any
> future complaint that it has some right to set the standards in such a
> matter and that those people who would like to see the problem solved
> and ligatured characters encoded as single characters so that a font can
> be produced may proceed accordingly, perhaps approaching the
> international standards body directly if the Unicode Consortium refuses
> to do so without a process of even considering individual submissions on
> their individual merits. On the other hand, if no such formal statement
> has been issued, then those people who would like to see the problem
> solved and ligatured characters encoded as single characters so that a
> font can be produced for use with software such as Microsoft Word may
> proceed to define characters in the private use area in a manner
> compatible with their possible promotion to being regular unicode
> characters in the presentation forms section.

Was that only two sentences? Wow....

Regarding the "refusal" to encode more ligatures and precomposed presentation
forms: it is not arbitrary. There is a reason Unicode will not encode these
things: they would interfere with the established standard for decomposition.
Now that Unicode has reached its present level of popularity, some vendors,
implementations, and standards require a stable set of decomposable code
points, and that set is Unicode 3.0. If new precomposed characters were
added, engines built to the new version would decompose text differently from
those built to the old one, and that is not acceptable to anyone who needs
decomposition to work at all.
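To make the stability point concrete, here is a small sketch (mine, not part
of the original message) using Python's unicodedata module. The existing
ligatures in the Alphabetic Presentation Forms block already carry
compatibility decompositions, and those mappings are what implementations
depend on staying fixed:

```python
import unicodedata

# U+FB00 LATIN SMALL LIGATURE FF already has a compatibility
# decomposition to the plain letter pair "ff".
lig = "\uFB00"
decomposed = unicodedata.normalize("NFKD", lig)
print(decomposed)  # ff

# If a new precomposed ct ligature were later added with a
# decomposition to "ct", software built against the older data
# would leave it intact while newer software decomposed it --
# two conformant systems normalizing the same text differently.
```

The point of the sketch: decomposition data is part of the interchange
contract, so adding decomposable characters after the fact splits the world
into incompatible normalizers.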

Precomposed characters and ligatures won't be considered "on their individual
merits," and they won't be "promoted" from a private standard to true Unicode
character status, because the decomposition problem is bigger than the
individual merits. Note that I personally like the ct ligature and think it
would be a great thing to have in a font. If this were 1993, it might have
been encoded.

Regarding fonts: Nothing is stopping you or anyone else from making a font
with these precomposed glyphs and associating them with Unicode PUA (Private
Use Area) code points. That is an excellent illustration of a possible use
of the PUA, and many, many font vendors do just that.
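As a small illustration (my sketch, using the U+E707 assignment proposed
later in this thread as a hypothetical example), any code point a font vendor
picks in the Basic Multilingual Plane PUA range U+E000..U+F8FF carries the
General_Category value Co, and conformant software must pass it through
without assigning it semantics:

```python
import unicodedata

# Hypothetical private assignment for a ct ligature glyph,
# as suggested in the quoted message.
ct_pua = "\uE707"

# General_Category "Co" marks a private-use code point.
print(unicodedata.category(ct_pua))  # Co
print(0xE000 <= ord(ct_pua) <= 0xF8FF)  # True
```

Nothing in the standard says what U+E707 means; that is exactly what makes it
usable for a private glyph, and exactly why it cannot be "promoted" later.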

> I feel that it would be quite wrong to pull up the ladder on the
> possibility of adding characters such as the ct ligature as U+FB07
> without the possibility of consideration of each case on its merits at
> the time that a possibility arises. A situation would then exist that
> several ligatures have been defined as U+FB00 through to U+FB06
> including one long s ligature, yet that U+FB07 through to U+FB12 must
> remain unused even though they could be quite reasonably used for ct and
> various long s ligatures so as to produce a set of characters that could
> be used, if desired, for transcribing the typography of an 18th Century
> printed book. Yet, if the ladder has been pulled up, perhaps U+FB07 can
> be defined as the ct ligature directly by the international standards
> organization and the international standards organization could decide
> directly about including the long s ligatures.

The organization you are talking about is ISO/IEC JTC1/SC2/WG2. They are
firmly committed to maintaining compatibility between Unicode and ISO/IEC
10646. Sorry, but this is a good thing.

> If the possibility of fair consideration is, however, still open, then
> the ct ligature could be defined as U+E707 within the private use area
> and published as part of an independent private initiative amongst those
> members of the unicode user community that would like to be able to use
> that character in a document by the character being encoded as a
> character in an ordinary font file. That would enable font makers to
> add in the ct character if they so choose.

You might start by checking existing fonts, especially those shipped with
major operating systems, to see what PUA code points are commonly used
internally for glyphs not associated with a standard Unicode character. I
know that several Windows fonts have privately assigned glyphs, and I assume
the same is true for Macintosh fonts. Also, maybe the various font makers
who haunt this list could contribute any guidelines they know of for
quasi-standardizing these code points. Obviously, you are hoping that
standardizing the code points could lead to some measure of interoperability;
otherwise there would be no discussion. If all you want is to encode the ct
ligature in a font, you can use any old PUA character you wish, conformantly.

OTOH, private creation of quasi-standards on the part of vendors is not
necessarily a good thing. It is the sort of thing that the public tends to
vilify Microsoft for doing.

If you want to interchange the ct ligature and the long-s ligatures, you can
do that right now. Just encode <c, ZWJ, t> or <long-s, ZWJ, whatever>.
Then, rendering engines that have a glyph for the desired ligature can render
it, and those that don't will fall back to the individual characters
(assuming they are conformant). This approach has at least three major
advantages:

(1) It is already supported by the Unicode Standard.
(2) It provides a standard interchange mechanism without requiring font
vendors to agree on the code point used for the precomposed glyph.
(3) It provides a sensible fallback mechanism for the great majority of
fonts that, let's admit it, will not have these specialized glyphs.
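Here is a minimal sketch (mine, not from the original message) of what the
<c, ZWJ, t> sequence described above looks like at the code point level. No
special font or PUA agreement is needed to interchange it:

```python
# Request a ct ligature with ZERO WIDTH JOINER between the letters.
# A renderer with a ct ligature glyph may form it; one without simply
# falls back to rendering "c" followed by "t".
ZWJ = "\u200D"  # U+200D ZERO WIDTH JOINER
seq = "c" + ZWJ + "t"

print([f"U+{ord(ch):04X}" for ch in seq])
# ['U+0063', 'U+200D', 'U+0074']
```

The fallback property comes for free: strip or ignore the ZWJ and the plain
letters remain, which is exactly advantage (3) above.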

Think about it.

In a message dated 2001-10-02 6:35:16 Pacific Daylight Time,
everson@evertype.com writes:

>> You might want to take a look at the ConScript Unicode Registry, which was
>> originally intended for "constructed" and artificial scripts, but which
>> could also be used for this purpose.
>
> No, it couldn't. It's for constructed and artificial scripts, not for
> precomposed Latin glyphs.

I stand corrected. But there is no reason William couldn't initiate his own
registry, along the lines of CSUR, for the purpose of assigning PUA code
points to precomposed Latin glyphs. Just don't expect the characters thus
added to "graduate" somehow into Unicode.

-Doug Ewell
 Fullerton, California



This archive was generated by hypermail 2.1.2 : Tue Oct 02 2001 - 10:19:07 EDT