From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Dec 25 2003 - 06:44:55 EST
As one reply has shown me that the term "orthograph" is apparently not
defined in Unicode, and that this may cause confusion with the term
"character", I'll try to define what I mean by this term, using carefully
selected words (each one is important):
An orthograph is
an agreed convention
between writers of a selected language
to use a common set of glyphs
recognized as equivalent in that language and
creating classes of glyphs
commonly referred to as "characters"
by users of this convention,
and to order these classes
according to accepted "orthographic" rules
(which try to match the language's lexical
and grammatical rules)
in order to write words, sentences or whole texts
that will be correctly understood by readers.
The term "recognized" is important here, as is the limitation implied by the
term "equivalent". It presupposes education and reading skills as they are
taught. It is a good justification for treating Fraktur and modern Latin
letters as separate: although they represent the same letters, they are not
recognized as such by users of the most common form of the language.
The term "class" above refers to wider subsets of glyphs than the sets of
glyphs acceptable to represent a given Unicode character. This is a case
where a "character" defined in an orthograph spans several distinct abstract
characters in Unicode. Unicode then creates distinctions between characters
that do not exist in the original orthograph, so that any of those Unicode
characters is equally acceptable to represent the word correctly. Which
abstract Unicode character is used is not relevant, so recognizing which
form is better is not an option; but this creates a need for "folding" rules
or "decompositions" that restore the initial distinctions and equivalences
relevant to an orthograph (the written language).
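As a minimal sketch of such a folding rule, assuming Python's standard
`unicodedata` module and using Unicode compatibility normalization (NFKC) as a
rough stand-in for an orthograph's own equivalence classes: the helper names
`fold` and `same_word` are hypothetical, and NFKC is only an approximation of
what a real orthographic folding would need.

```python
import unicodedata

def fold(word: str) -> str:
    """Hypothetical folding rule: map a word to a key shared by Unicode
    spellings that readers would treat as the same written word.

    Assumption: NFKC (compatibility decomposition, then canonical
    composition) is used as the folding. It also folds distinctions an
    orthography may want to keep (e.g. superscript digits).
    """
    return unicodedata.normalize("NFKC", word)

def same_word(a: str, b: str) -> bool:
    """True if a and b fold to the same key."""
    return fold(a) == fold(b)

# Precomposed U+00E9 vs. 'e' + U+0301 COMBINING ACUTE ACCENT:
# canonically equivalent, so even NFC already unifies them.
print(same_word("caf\u00e9", "cafe\u0301"))  # True

# U+017F LATIN SMALL LETTER LONG S is a compatibility variant of 's':
# NFC keeps it distinct, NFKC folds it.
print(unicodedata.normalize("NFC", "Wach\u017ftum") == "Wachstum")  # False
print(same_word("Wach\u017ftum", "Wachstum"))  # True

# U+FB01 LATIN SMALL LIGATURE FI folds to the two letters 'f' + 'i'.
print(same_word("\ufb01n", "fin"))  # True
```

Generic normalization does not cover every language-specific equivalence:
for example, `same_word("Stra\u00dfe", "Strasse")` is False, because NFKC
leaves U+00DF LATIN SMALL LETTER SHARP S unchanged, even though some German
orthographies accept "ss" in its place. Such equivalences would need
tailored, per-language folding rules.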