RE: Proposed Draft UTR #31 - Syntax Characters

From: Jim Allan (jallan@smrtytrek.com)
Date: Wed Aug 27 2003 - 10:09:42 EDT

Maybe in reply to: Rick McGowan: "Proposed Draft UTR #31 - Syntax Characters"

Jill Ramonsky posted:

> In any case, I _imagine_ that a future compiler, running on a future
> operating system, will contain a system directory which will contain A
> VERSION of Unicode - by which I mean A VERSION of the Unicode data files, as
> supplied by the consortium. The hypothetical OS will then parse said files
> into an internal form that only it needs to know about, and make Unicode
> functionality available to applications (such as future compilers) in the
> form of standard API calls. A future compiler will simply have to call some
> function, which may be called something like is_identifier_char(), and act
> on the return value (true or false) accordingly. The behaviour of the
> compiler, and indeed the whole OS, can be upgraded to behave in accordance
> with a new version of Unicode, simply by storing the new data files in the
> right place. You will not need to get upgraded applications. You will not
> need to recompile the kernel. Thus, in this future system, one will indeed
> "store a version of Unicode on your machine"

That seems very dangerous.

Such behavior is why people are often very wary of upgrading.

But it is doubtful that a Unicode upgrade will modify all fonts on the
system to add new characters in the proper style, modify any and all
translation tables to use them, modify all tailored sorts, change
spell-checkers to recognize new valid spellings, change translation
tables to other character encodings, or modify legacy data to fit the
new version of Unicode.

Unicode is a standard which operating systems and applications and fonts
and sort routines can use and tailor.

A publishing house printing Coptic and using Unicode to do so would
currently employ the mixed Greek/Coptic characters defined in Unicode,
but presumably with fonts that handle diacritics in the Coptic fashion
rather than the Greek fashion, and probably with a number of PUA
characters.

When the planned disunification of Greek and Coptic is implemented,
adding new files that specify new properties for some Unicode code
points and new sort weights will not be enough to switch the entire
printing and publishing operation to the new characters.

For at least a few years the publishing house will still be using the
old system while it cautiously edges into the new encoding for new work,
gradually updates its fonts, acquires new ones, and converts legacy
data.

Single-character additions here and there to Unicode and clarifications
of rules will not have so drastic an effect. But they will have effects
that cannot be implemented immediately through the mere addition of
tables of properties and collation weights.

The effect of the addition of four new characters in Unicode 3.0 for use
in Romanian text is still being felt in the lack of fonts that support
them and in uncertainty about what translation tables should do.

It may be specified that a particular control character should not be
treated in the way some fonts have been treating it. That is not going
to change the behavior of legacy fonts. If it did, the user would be
quite perturbed that a document which printed perfectly yesterday prints
in a flawed manner today.

A proprietary routine written for a particular language may depend on
a particular character belonging to a particular Unicode classification
in order to filter it out along with certain other characters. It would
at least be annoying, and might be disastrous, if this behavior changed
without warning because the properties of the character had changed.
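To make that risk concrete, here is a minimal sketch in Python (the choice of language is an assumption; nothing in this thread names one). The filter keep_letters() is hypothetical and stands in for such a proprietary routine; the only real API it relies on is the standard library's unicodedata module, which exposes the interpreter's current Unicode database and also a frozen Unicode 3.2.0 copy as unicodedata.ucd_3_2_0. The same routine, given the same text, returns different results depending on which version of the property data it consults:

```python
import unicodedata

# Hypothetical filter: keep only characters whose General_Category is a
# letter category (Lu, Ll, Lt, Lm, Lo). Which characters pass depends
# entirely on which version of the Unicode property data `db` provides.
def keep_letters(text, db=unicodedata):
    """Return the characters of `text` that `db` classifies as letters."""
    return "".join(ch for ch in text if db.category(ch).startswith("L"))

# U+2C80 COPTIC CAPITAL LETTER ALFA belongs to the separate Coptic block
# that came out of the Greek/Coptic disunification. The Unicode 3.2.0
# database does not contain it, so there it is reported as unassigned (Cn).
sample = "A\u2c80"

old = keep_letters(sample, unicodedata.ucd_3_2_0)  # Unicode 3.2.0 data
new = keep_letters(sample)                         # interpreter's current data

print(unicodedata.ucd_3_2_0.unidata_version, repr(old))
print(unicodedata.unidata_version, repr(new))
```

The routine's own code never changed; only the data it consulted did. Passing the database in explicitly, rather than always using whatever version happens to be installed, is one way to keep such a routine's behavior pinned until someone deliberately decides to upgrade it.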

Even a simple change in a compatibility decomposition might have a great
effect on an individual routine.

To write routines that depend on properties that Unicode has announced
as changeable may be bad coding. But I don't see that applications in
the future will be any less afflicted with bad coding than current
applications.

Jim Allan


This archive was generated by hypermail 2.1.5 : Wed Aug 27 2003 - 11:02:40 EDT