Re: Plane 14 language tags

From: Christopher John Fynn (cfynn@dircon.co.uk)
Date: Wed Jan 26 2000 - 10:04:21 EST


Mark

I may be wrong, but with CJK text don't you often have to know whether
to render that text in a Korean, Japanese or Chinese form? If that text
is, say, in the Japanese language you could probably use any number of
Japanese fonts - so it isn't fonts we have to specify. I'm not talking
about "optimal rendering" here, only "intelligible rendering" for users
who speak and read a particular language.
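
To make the CJK point concrete, here is a minimal sketch (in Python) of
choosing among several acceptable fonts once the language is known. The
font names and the candidate_fonts helper are hypothetical placeholders,
not real products or an existing API; the only point is that the language,
not one specific font, drives the choice.

```python
# Hypothetical font lists keyed by language code. The names are
# placeholders for illustration only, not real font products.
FONTS_BY_LANGUAGE = {
    "ja": ["ExampleMincho", "ExampleGothic"],
    "ko": ["ExampleBatang"],
    "zh-Hans": ["ExampleSong"],
    "zh-Hant": ["ExampleMing"],
}


def candidate_fonts(language, user_default):
    """Return the fonts to try, most suitable first.

    With no language information the only option is the user's default
    font, which may show unified CJK characters in the wrong regional form.
    """
    if language is None:
        return [user_default]
    fonts = list(FONTS_BY_LANGUAGE.get(language, []))
    if user_default not in fonts:
        fonts.append(user_default)  # keep the default as a last resort
    return fonts


# Any of the Japanese fonts is acceptable for Japanese text.
print(candidate_fonts("ja", "ExampleSong"))
# Untagged text falls back to whatever the user's default happens to be.
print(candidate_fonts(None, "ExampleSong"))
```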

You might want to use, say, a Nastaliq form of the Arabic script
for default display of the Urdu language, but that wouldn't necessarily
make sense for all languages that use other forms of the Arabic script.
And some of those other forms of Arabic script wouldn't make sense
for Urdu. Specifying a *particular* Urdu font does cross the border to
rich text.

There are also cases where a user might want to render text encoded
in one block in a completely different script, e.g. text in the Pali
language might need to be rendered in Devanagari, Sinhalese, Burmese,
Thai, or Roman scripts depending on which script the user understood.
There is really no standard script for Pali but we have to use one or
another (script specific) block to encode it - even though, in this case,
the language is probably more important than the script.

A form of basic script tagging is inherent in Unicode since it encodes
scripts. But there are many cases where you won't know how to form
ligatures, where to break words, etc. if you don't also know the
language of that text and, in a significant number of cases, simply
rendering text in a default form of a script will make the text at least
partially unintelligible to users of the language that text is written in.
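
For readers unfamiliar with the mechanism under discussion, the following
is a rough sketch (in Python) of how a Plane 14 language tag could carry
the language along with plain text, as I understand the proposal: U+E0001
LANGUAGE TAG followed by tag characters that mirror printable ASCII at
U+E0020..U+E007E. The function names tag_language and read_language are
made up for illustration, and the sketch ignores tag cancellation and
nested or repeated tags.

```python
LANGUAGE_TAG = 0xE0001   # U+E0001 LANGUAGE TAG
TAG_BASE = 0xE0000       # tag character = TAG_BASE + ASCII code
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E


def tag_language(language, text):
    """Prefix text with a Plane 14 tag spelling out the language code."""
    if not all(0x20 <= ord(c) <= 0x7E for c in language):
        raise ValueError("language code must be printable ASCII")
    prefix = chr(LANGUAGE_TAG) + "".join(
        chr(TAG_BASE + ord(c)) for c in language
    )
    return prefix + text


def read_language(tagged):
    """Split a leading language tag off the text.

    Returns (language, text); language is None if the text is untagged.
    """
    if not tagged or ord(tagged[0]) != LANGUAGE_TAG:
        return None, tagged
    i = 1
    code = []
    while i < len(tagged) and TAG_FIRST <= ord(tagged[i]) <= TAG_LAST:
        code.append(chr(ord(tagged[i]) - TAG_BASE))
        i += 1
    return "".join(code), tagged[i:]


tagged = tag_language("ur", "plain text")
print(read_language(tagged))          # language recovered from the tag
print(read_language("plain text"))    # untagged: no language available
```

A renderer that finds no tag is in exactly the position described above:
it knows the script from the code points but has to guess the language.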

You may be right about "the vast majority of cases" - though I suspect that
majority may not be as vast as you assume - but we are talking about a
world-wide character encoding standard here and to live up to that name
I think the script requirements of minority languages must be handled too
- even if that minority is relatively small.

- Chris

----- Original Message -----
From: Mark E. Davis <markdavis@ispchannel.com>
To: Unicode List <unicode@unicode.org>
Cc: Unicode List <unicode@unicode.org>; Martin J. Duerst <duerst@w3.org>
Sent: Wednesday, January 26, 2000 2:03 PM
Subject: Re: Plane 14 language tags

> > With plain otherwise untagged text there are many cases where
> > you can't know how to sensibly render the text (even if you know
> > what the basic script is) without also knowing what the language is.

> I disagree. In the absence of any other information, in the vast majority
> of cases if you render plain text in the default fonts chosen by the user
> for his/her computer, you will get perfectly acceptable results. (Of
> course, to be "acceptable" you have to have some choice of fonts on the
> machine -- but if you don't have any choice, typically you couldn't do any
> better if the language were tagged.) These fonts may be derived from the
> default language on the user's machine (or may be simply chosen
> independently). For really precise results, what people actually want is
> font tagging, not language tagging.

> Thus at most, this could be reworded to:

> "With plain otherwise untagged text there are some cases where
> you can't know how to optimally render the text (even if you know
> what the basic script is) without also knowing what the language is."

> Even this way, the "some" may convey too strong an impression. No one has
> ever come forward with actual cases where the above approach is
> insufficient.

> Mark
>


