Re: Language Tagging And Unicode

From: Christopher John Fynn (
Date: Wed Jan 19 2000 - 10:04:50 EST

Marco wrote:

> Janko Stamenovic wrote:

> > Can anybody now explain me the exact logic for "what is character and
> > what's not"? As far as I can see, the only real rule is "what people
> > accept that it should be a character".

> You know that I don't agree with your proposal and why. However these are
> very good questions, and I am very curious to see the answers from Unicode
> "authorities".

> My personal answer is that Unicode's architecture is not a "clean" thing,
> and often one discovers that there is no "exact logic" behind many
> choices.

I'm certainly not an "authority", but phrases like 'pre-existing standards',
'round-trip conversion' and 'backwards compatibility' come to mind.

Chances are that if there had been an existing Cyrillic standard containing
both Russian and Serbian glyph forms as separate characters, and if
representatives of an official Yugoslav national standards body had then
proposed and championed these "characters" through a whole series of
ISO/IEC 10646 committee meetings, they would now be in the Unicode
standard.

If the Unicode and ISO/IEC 10646 Standards were being written again today
from scratch (and without any political considerations), many things would
probably be done differently. However, the fact that the standard today "is
not a 'clean' thing" isn't necessarily a good reason to add more pollution.

I think that, at this point, for two simple glyph forms to be accepted as
separate characters you would pretty well have to provide compelling
evidence that they convey a significant lexical difference within a single
language using that script. If you think you have such evidence, make a
formal proposal and then see that it is followed through all the stages it
has to go through before it can be accepted as part of the standard.

This can all take a long time and be a lot of work. And even if the
characters are finally accepted, many application developers are unlikely
to begin supporting them until another major version of the Unicode
Standard is published.

I personally think you are likely to see reasonable application support for
language-system-specific glyph variants in fonts long before you would see
these "characters" accepted into the standard and supported.
(That is *IF* they were accepted at all, and the chances of that seem very
small.)

OTOH, I could be wrong: the Euro character was accepted and supported
very quickly, and the level of support for OpenType language features in
APIs, shipping applications and fonts is way behind what I expected it to
be by now.

 - Chris

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:58 EDT