Re: script complexity, was Re: OpenType vs TrueType

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Dec 05 2004 - 10:04:25 CST

Next message: Marcin 'Qrczak' Kowalczyk: "Re: Nicest UTF"

Previous message: Philippe Verdy: "Re: Nicest UTF"
In reply to: Doug Ewell: "Re: script complexity, was Re: OpenType vs TrueType"
Next in thread: Bob Hallissy: "Re: script complexity, was Re: OpenType vs TrueType (was current version of unicode-font)"

> Richard Cook <rscook at socrates dot berkeley dot edu> wrote:
>
>> Script complexity is not so easily quantified. Has anyone tried to
>> sort scripts by complexity? In terms of the present discussion, Han
>> would be viewed as a simple script, and yet it is "simple" only in
>> terms of the script model in which ideographs are the smallest unit.
>> In a stroke-based Han script model, Han is at least as complex as any.

If Han had not been encoded with an ideograph-based model, maybe we would
have needed far fewer code points. However, the main immediate problem would
have been that the layout of component radicals and strokes within the
ideographic square is very complex, highly contextual, and in fact too
variable across dialects and script forms to allow a layout algorithm to be
designed and standardized.

One could at least have standardized a Han strokes-to-square layout system,
but it would have required a huge dictionary, with many
dialect-specific sections to handle the variant forms and placement of the
component strokes. In addition, the "square" model is not imperative in Han,
because there are various writing styles in which the usual square
model is considerably relaxed, or simply not observed in actual documents.

To model such variations in a stroke-based model, it would have been
necessary to encode:
- the strokes themselves (all, not just the radicals!)
- stroke variants
- descriptive composition pseudo-characters (like the existing Ideographic
Description Characters, IDC, in Unicode)
- dialectal composition rules.
One would then have had to create a very complex specification describing
each ideograph according to this model, so that a renderer could redraw the
ideographs from such composed grapheme clusters.

The second problem is that the GB* and Big5 encodings already existed as
widely used standards, whereas there was no concrete and interoperable
solution for representing Han characters with such composed sequences.
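
To make the "descriptive composition pseudo-characters" point concrete, here
is a rough sketch of how a sequence built with the existing IDCs can be read
as a tree. It assumes only the twelve original IDCs (U+2FF0..U+2FFB), of which
U+2FF2 and U+2FF3 take three operands and the others two. The function name
parse_ids is mine, not part of any standard API, and the sketch says nothing
about how the components would actually be drawn, which is the hard part
discussed above.

```python
# Sketch: parse an Ideographic Description Sequence (IDS) into a tree.
# Assumption: only the twelve original IDCs U+2FF0..U+2FFB are handled.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3  # left / middle / right
IDC_ARITY["\u2FF3"] = 3  # top / middle / bottom


def parse_ids(text):
    """Return a nested (idc, [operands]) tree for a well-formed IDS."""

    def parse(pos):
        if pos >= len(text):
            raise ValueError("truncated description sequence")
        ch = text[pos]
        arity = IDC_ARITY.get(ch)
        if arity is None:
            # Not an IDC: treat it as a leaf component (radical, ideograph).
            return ch, pos + 1
        pos += 1
        operands = []
        for _ in range(arity):
            node, pos = parse(pos)
            operands.append(node)
        return (ch, operands), pos

    tree, end = parse(0)
    if end != len(text):
        raise ValueError("trailing characters after description sequence")
    return tree


# U+2FF0 (left-to-right) applied to U+6C35 and U+53EF.
simple = parse_ids("\u2FF0\u6C35\u53EF")
# U+2FF1 (above-to-below) applied to U+8279 and the sequence above.
nested = parse_ids("\u2FF1\u8279\u2FF0\u6C35\u53EF")
```

Even with such a parser, the tree only records which components are combined
and in what general arrangement; the exact sizes and positions within the
square would still have to come from the kind of dictionary described above.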

This kind of modeling was possible for Hangul, but with a simplification: the
encoded "jamos" sometimes represent several "strokes" (which are considered
letters, because they have a clear phonetic value, but are sometimes grouped
within a single "jamo" to simplify the design of the Hangul layout system,
notably for the double-consonant "SSANG*" jamos). A simpler system of jamos
was still possible, however: for example, it would have been easy to model
the double-consonant jamos as two successive simpler jamos, and then update
the Hangul syllable model accordingly.
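
As an illustration of the Hangul case, the following sketch decomposes a
precomposed syllable into its conjoining jamos using the arithmetic from the
Unicode Standard's Hangul syllable decomposition. The helper names
decompose_syllable and jamo_names are mine. It shows that a double consonant
comes out as a single SSANG* jamo rather than as two simpler ones.

```python
import unicodedata

# Constants of the standard Hangul syllable decomposition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables


def decompose_syllable(ch):
    """Split one precomposed Hangul syllable into its conjoining jamos."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    leading = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    jamos = [leading, vowel]
    t_index = s_index % T_COUNT
    if t_index:
        # Index 0 means "no trailing consonant".
        jamos.append(chr(T_BASE + t_index))
    return jamos


def jamo_names(ch):
    """Character names of the jamos making up a syllable."""
    return [unicodedata.name(j) for j in decompose_syllable(ch)]


# U+AE4C starts with a double consonant, yet yields one leading jamo:
# HANGUL CHOSEONG SSANGKIYEOK followed by HANGUL JUNGSEONG A.
names = jamo_names("\uAE4C")
```

In the simpler system suggested above, that leading jamo would instead be
written as two successive KIYEOK jamos, and the syllable model would have to
allow more than one leading consonant per syllable.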

This archive was generated by hypermail 2.1.5 : Sun Dec 05 2004 - 10:10:20 CST