Re: Language Tagging And Unicode

From: Christopher John Fynn (cfynn@dircon.co.uk)
Date: Tue Jan 18 2000 - 20:08:37 EST


 ---- Original Message ----

From: Richard Gillam <rgillam@jtcsv.com>

> This whole discussion on Serbian and Russian Cyrillic is getting silly and seems
> to be disappearing further and further down a rathole.

I agree.

> I have yet to hear a good reason why the Serbian/Russian problem is anything
> more than a font-selection issue. It's the same problem you have with
> Greek/Coptic, Arabic/Urdu/Persian, and Traditional Chinese/Simplified
> Chinese/Japanese. In *all* of these cases you have characters whose appearance
> is determined by the language of the text, even though they're semantically the
> same character. In all of these cases, the correct shape to draw is controlled
> by some type of out-of-band information and not by Unicode plain text itself.
> This is because the only difference is visual presentation, not the semantics of
> the character itself. You don't have different character codes for italic or
> bold versions of the letter a; so too you don't have different character codes
> for the Russian and Serbian italic versions of the letter ghe.

Yes, it's not a big deal to change a couple of glyphs in italic fonts to make
Serbian-specific versions, or (much better) to add language-specific glyph
variants and include the necessary table information in AAT/OpenType fonts.
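To make the "language-specific glyph variants" idea concrete, here is a toy
sketch in Python, loosely modelled on what an OpenType 'locl' (localized
forms) lookup does: the *characters* stay the same, and only the glyph chosen
for them depends on the language tag. The glyph names and the two lookup
tables are hypothetical, not taken from any real font; the affected letters
(be, ghe, de, pe, te) are the ones whose Serbian italic forms differ from the
Russian ones.

```python
# Toy model of language-sensitive glyph selection, in the spirit of an
# OpenType 'locl' lookup. All glyph names below are HYPOTHETICAL.

# Default (Russian-style) italic glyph for each Cyrillic code point.
DEFAULT_GLYPHS = {
    "\u0431": "be.ital",   # CYRILLIC SMALL LETTER BE
    "\u0433": "ghe.ital",  # CYRILLIC SMALL LETTER GHE
    "\u0434": "de.ital",   # CYRILLIC SMALL LETTER DE
    "\u043F": "pe.ital",   # CYRILLIC SMALL LETTER PE
    "\u0442": "te.ital",   # CYRILLIC SMALL LETTER TE
}

# Per-language overrides: glyph -> localized glyph (hypothetical table).
LOCALIZED_GLYPHS = {
    "sr": {
        "be.ital": "be.ital.sr",
        "ghe.ital": "ghe.ital.sr",
        "de.ital": "de.ital.sr",
        "pe.ital": "pe.ital.sr",
        "te.ital": "te.ital.sr",
    },
}


def shape(text: str, lang: str) -> list[str]:
    """Map each character to a glyph name, applying language overrides."""
    overrides = LOCALIZED_GLYPHS.get(lang, {})
    glyphs = []
    for ch in text:
        # Fall back to a generic uniXXXX name for characters not in the table.
        base = DEFAULT_GLYPHS.get(ch, f"uni{ord(ch):04X}")
        glyphs.append(overrides.get(base, base))
    return glyphs
```

Calling `shape("\u0433", "ru")` and `shape("\u0433", "sr")` passes the very
same code point and gets back different glyph names, which is the point being
argued: the difference lives in the font and the language tag, not in the
character encoding.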

In the first case the author of a document may have to select the required
font. The second case requires some kind of language tagging, but in many
cases these tags can be inserted automatically, e.g. by picking up the input
locale being used while a given run of text is typed. [Someone mentioned HTML
boundaries, but aren't there tags in HTML 4 or XML, or with CSS, that could
be used and that may span normal boundaries?]
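A rough sketch of the automatic tagging described above, assuming a
hypothetical stream of `(input_locale, text)` events from the editor (real
operating-system input-method APIs differ and are not modelled here).
Consecutive events with the same locale are merged into one run, and each run
is wrapped in an HTML `<span>` carrying the HTML 4 `lang` attribute; an XML
document would use `xml:lang` instead.

```python
import html
from itertools import groupby


def tag_runs(events):
    """Merge consecutive (input_locale, text) events into lang-tagged spans.

    `events` is a HYPOTHETICAL typing log: an iterable of pairs in the order
    the text was entered, each paired with the active input locale.
    """
    parts = []
    # groupby only merges *adjacent* events with the same locale, so a
    # locale switch always starts a new run.
    for locale, group in groupby(events, key=lambda e: e[0]):
        run = "".join(text for _, text in group)
        parts.append(
            f'<span lang="{html.escape(locale)}">{html.escape(run)}</span>'
        )
    return "".join(parts)
```

For example, typing "где " with a Russian keyboard layout and then "где" with
a Serbian one yields two spans, `lang="ru"` and `lang="sr"`, over identical
Cyrillic characters, which a renderer could then use to pick the appropriate
glyph variants.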

Someone argued that there is little support for OT and AAT features in real
applications. This may be true, but I feel it would be much more worthwhile
for those making this proposal to expend effort pressuring system and
application developers to include such support than to ask for these
additional "characters" in Unicode. This "solution" looks like a kludge to
me, and I feel its acceptance would almost inevitably result in numerous
similar proposals for all the other scripts that have language-specific
glyph forms.

> ...

- Chris



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:58 EDT