Re: Languages supported by UTF8 and UTF16

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Sat Sep 10 2005 - 09:20:01 CDT

Next message: Lateef Sagar: "Arabic Script: Hamza Issue - My point of view"

Previous message: Andrew West: "Re: Languages supported by UTF8 and UTF16"
In reply to: Doug Ewell: "Re: Languages supported by UTF8 and UTF16"
Next in thread: Peter Kirk: "Re: Languages supported by UTF8 and UTF16"
Reply: Peter Kirk: "Re: Languages supported by UTF8 and UTF16"
Reply: Mark Davis: "Re: Languages supported by UTF8 and UTF16"

On Fri, 9 Sep 2005, Doug Ewell wrote:

> I'm afraid the list is at risk of falling into a hole debating this "how
> many languages on the head of a pin" question, when the real underlying
> question may be completely different.

Indeed, especially since the question was probably based on at least one
misconception: it asked about encoding forms rather than about Unicode
itself.
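
A minimal Python sketch of why asking about encoding forms is probably the wrong question: UTF-8 and UTF-16 both serialize the same set of Unicode code points, so neither "supports" more languages than the other. The sample strings and the helper names round_trips and encoded_sizes are illustrative assumptions, not anything from the thread.

```python
# Sample strings are arbitrary illustrations, not taken from the thread.
SAMPLES = ["Suomi", "Ελληνικά", "العربية", "中文", "\U00010330"]  # last one is outside the BMP


def round_trips(text: str, encoding: str) -> bool:
    """Return True if encoding and then decoding gives back the same text."""
    return text.encode(encoding).decode(encoding) == text


def encoded_sizes(text: str) -> dict:
    """Byte counts per encoding form; utf-16-le is used so no byte order mark is counted."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le")}


if __name__ == "__main__":
    for s in SAMPLES:
        print(repr(s), round_trips(s, "utf-8"), round_trips(s, "utf-16"), encoded_sizes(s))
```

The byte counts differ between the two forms, but every sample survives the round trip in both, which is the only sense in which an encoding form can "support" a text.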

While waiting for a clarification to the question, we can still discuss
_another_ question, namely that of language support by Unicode. There
seems to be confusion around it, too, and the question itself is somewhat
obscure. For example, does "Unicode" mean the Unicode repertoire of
characters, or the Unicode Standard, or the Unicode Consortium?

I'd say that the short answer to the question "what languages are
supported by the Unicode Standard?" would be as follows (without trying to
clarify the question much - can't do that in a _short_ answer):

All living languages, and many dead languages, can be written in
their normal writing system(s) using Unicode characters. However, some
of their characters cannot be represented as single Unicode characters,
only as combinations. Some orthographic and typographic constructs, which
could in principle be expressed in plain text, cannot be expressed
in Unicode. Some of the properties of characters as defined by the Unicode
Standard do not correspond to their behavior in different languages.
Moreover, Unicode is meant to describe plain text only, so it generally
lacks any support that might be needed for display and processing of text
by language-specific rules.
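
As a hedged illustration of the "combinations" point, the following Python sketch uses the standard unicodedata module to compare two letters after NFC normalization. The chosen letters (e with acute, and e with dot below plus acute, as found in Yoruba orthography) are assumptions made for the example; the original message names no specific characters.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize to NFC, which composes sequences where a precomposed character exists."""
    return unicodedata.normalize("NFC", text)


# e + COMBINING ACUTE ACCENT: a precomposed character (U+00E9) exists.
E_ACUTE = "e\u0301"

# e + COMBINING DOT BELOW + COMBINING ACUTE ACCENT: only e-with-dot-below (U+1EB9)
# is precomposed, so the acute accent has to stay a separate combining character.
E_DOT_BELOW_ACUTE = "e\u0323\u0301"

if __name__ == "__main__":
    for sample in (E_ACUTE, E_DOT_BELOW_ACUTE):
        composed = nfc(sample)
        print([f"U+{ord(c):04X} {unicodedata.name(c)}" for c in composed])
```

The first letter normalizes to one code point, the second to two, even though a reader of the language would regard each as a single character.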

Well, that's not very short, really. Neither is it very understandable,
since it lacks examples. The point, anyway, is that "support for a
language" can mean much more than just the presence of all characters used
in a language. It's also debatable, since people may disagree on what really
belongs to a language, even at the character level. Moreover, it's
debatable what can be regarded as "support". For example, if the rules of
a language require a thin nonbreakable space before or after some
punctuation marks, can we claim that Unicode "supports" it, since you can
use a thin space character with a zero width no-break space on both sides
of it?
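
The sequence described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name glue is hypothetical, and whether a given renderer actually honors the no-break behavior is outside what the code can show. The code points themselves are real: U+2009 THIN SPACE and U+FEFF ZERO WIDTH NO-BREAK SPACE, plus U+2060 WORD JOINER, which the Unicode Standard has recommended for the no-break function since version 3.2, and U+202F NARROW NO-BREAK SPACE, a single character with a similar purpose.

```python
import unicodedata

THIN_SPACE = "\u2009"
ZWNBSP = "\ufeff"        # ZERO WIDTH NO-BREAK SPACE (also serves as the byte order mark)
WORD_JOINER = "\u2060"   # recommended replacement for the no-break use of U+FEFF
NNBSP = "\u202f"         # NARROW NO-BREAK SPACE, a single-character alternative


def glue(space: str, joiner: str = ZWNBSP) -> str:
    """Wrap a space character in zero-width no-break characters on both sides."""
    return joiner + space + joiner


def names(text: str) -> list:
    """Unicode character names for each code point in the text."""
    return [unicodedata.name(c) for c in text]


if __name__ == "__main__":
    print(names(glue(THIN_SPACE)))               # the three-character workaround
    print(names(glue(THIN_SPACE, WORD_JOINER)))  # same idea with WORD JOINER
    print(names(NNBSP))                          # one character instead of three
```

Whether three code points standing in for one typographic space counts as "support" is exactly the debatable part.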

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

