Re: Level of Unicode support required for various languages

From: Vinod Kumar ([email protected])
Date: Thu Oct 25 2007 - 04:10:24 CDT

Next message: David Starner: "Re: Cost of no OCR for extended Latin"

Previous message: Otto Stolz: "Re: Cost of no OCR for extended Latin"
In reply to: [email protected]: "Re: Level of Unicode support required for various languages"
Next in thread: Peter Constable: "RE: Level of Unicode support required for various languages"

Let me rephrase the question from the software architecture viewpoint. What
level or kind of support must a Unicode text rendering software
architecture provide?

Some assumptions are:
1) The Unicode standard has encoded the script as well as its rendering
rules.
2) The support sought is for the operation of rendering and not for sorting
etc.

The answer to the rephrased question, from example (ICU) and experience, is:
1) There should be a character-level operation to find the appropriate
boundaries in the text stream. At the lowest level, we have to identify
the sequence of characters that must be treated together to obtain their
rendering. For Latin, this is mostly a single character, but for South Asian
scripts this is a 'logical syllable'. International Components for Unicode
(ICU, from IBM) provides a character iterator that returns the next sequence
of characters to be treated as a unit for shaping purposes.

2) There should be a function that takes a syllable of characters
(obtained from 1) and a font, and returns the sequence of glyphs from the
font. Laid out in order, these glyphs are a rendering of the
syllable. The concrete method used by ICU is of the form
XXXLayoutEngine->layoutChars(text, ...) followed by
XXXLayoutEngine->getGlyphs(glyphs, ...);
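
To make the two functions above concrete, here is a minimal Python sketch.
It is not ICU's algorithm or API. The names split_units, shape_unit, render
and TOY_FONT are hypothetical, the "font" is a hand-written table with
made-up glyph names, and the syllable rule is a simplification that only
handles combining marks and Devanagari virama conjuncts.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (assumption: Devanagari only)

# Hypothetical toy font: a character-to-glyph map, one ligature rule,
# and the set of glyphs that are drawn before the base glyph.
TOY_FONT = {
    "cmap": {
        "\u0915": "ka",
        "\u0937": "ssa",
        "\u094D": "virama",
        "\u093F": "i_matra",
    },
    "ligatures": {("ka", "virama", "ssa"): "ka_ssa"},
    "prebase": {"i_matra"},
}


def split_units(text):
    """Function (1): break text into simplified shaping units."""
    units = []
    for ch in text:
        cat = unicodedata.category(ch)
        is_mark = cat in ("Mn", "Mc", "Me")
        # A letter directly after a virama continues the conjunct.
        after_virama = bool(units) and units[-1].endswith(VIRAMA) and cat == "Lo"
        if units and (is_mark or after_virama):
            units[-1] += ch
        else:
            units.append(ch)
    return units


def shape_unit(unit, font):
    """Function (2): map the n characters of one unit to m glyph names."""
    glyphs = [font["cmap"].get(ch, ".notdef") for ch in unit]
    # Try longer ligature rules first.
    rules = sorted(font["ligatures"].items(), key=lambda kv: -len(kv[0]))
    out = []
    i = 0
    while i < len(glyphs):
        for seq, lig in rules:
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.append(lig)
                i += len(seq)
                break
        else:
            out.append(glyphs[i])
            i += 1
    # Move pre-base glyphs (e.g. the short-i vowel sign) to the front.
    pre = [g for g in out if g in font["prebase"]]
    rest = [g for g in out if g not in font["prebase"]]
    return pre + rest


def render(text, font):
    """Apply (1) then (2): one glyph list per shaping unit."""
    return [shape_unit(unit, font) for unit in split_units(text)]
```

With this toy font, render("\u0915\u094D\u0937\u093F\u0915", TOY_FONT)
splits the five characters into two units and turns the first unit's four
characters into two glyphs, the reordered vowel sign and the conjunct.
A real implementation would take both the boundary rules and the
substitution tables from the script specification and the font.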

Thus for any script XXX, Unicode would arrive at an encoding for the
characters and a description of how the characters are shaped. It is then for
the software architects to implement the rendering in a concrete form. For (2)
in particular, this involves using an intelligent font (ISO/IEC 15285) such as
OpenType or AAT that can convert the n characters of the syllable from (1)
into m glyphs. Thus if the software implements these two functions (breaking
text into syllables, and getting the glyphs corresponding to each syllable),
we can say that the software supports Unicode text rendering of script XXX.

Cheers,
Vinod Kumar
Project IndiX
PS: While I was wording my answer, Timothy Armes thanked everyone for their
answers and put his questions into context. Maybe this answer can address
the issue now.

