Re: "A Programmer's Introduction to Unicode"

From: Asmus Freytag <>
Date: Mon, 13 Mar 2017 10:00:08 -0700
On 3/13/2017 3:31 AM, Janusz S. Bien wrote:
> Just yet another reason for introducing the notion of textel?

The main difference between "textel" and "pixel" is that, unlike a pixel, the unit for processing or displaying text is not uniform and fixed. In other words, different operations may need to look at text differently, and I don't mean the trivial case of storage (byte level) vs. any higher level.

Correspondingly, the discussion of "text element", at least in the early versions of the Unicode Standard, left the particular division of the text into "text elements" unspecified.

Two closely related tasks might demonstrate this. Assume a script where multiple code points make up a syllable, yet that syllable is the intuitive basic unit of reading and writing.

One task is cursor placement. For that task, you need to be able to divide *any* text so that the cursor ideally does not get positioned in the middle of a syllable. However, the definition of a "syllable" has to allow degenerate and 'defective' cases. Which case is which is of no importance, as long as it is possible to find a valid cursor position.

The other task would be to assert that a string contains only well-formed syllables. Here, it is crucially necessary to be able to define which syllables are well-formed. Finding divisions in parts of the string that do not contain well-formed syllables is not necessary.
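The contrast between the two tasks can be sketched in a few lines of Python. Everything here is a toy assumption made up for illustration, not any real script or any Unicode algorithm: the "consonants" k g t d, the dependent "vowel signs" a i u, the final mark ~, and the rule that a well-formed syllable is a consonant, an optional vowel sign, and an optional final mark. The function names are likewise hypothetical.

```python
import re

# Toy script, assumed for illustration only.
DEPENDENT = set("aiu~")  # vowel signs and final mark attach to what precedes

# Strict definition: consonant, optional vowel sign, optional final mark.
WELL_FORMED = re.compile(r"(?:[kgtd][aiu]?~?)+")


def cursor_stops(text):
    """Return the indices where a cursor may rest.

    Lenient on purpose: it divides *any* string, including degenerate
    or defective input, and never asks whether a segment is valid.
    """
    stops = [0]
    for i in range(1, len(text)):
        # Never stop in front of a dependent sign.
        if text[i] not in DEPENDENT:
            stops.append(i)
    if text:
        stops.append(len(text))
    return stops


def is_well_formed(text):
    """Return True only if the whole string is a run of valid syllables."""
    return WELL_FORMED.fullmatch(text) is not None


for sample in ("kati~", "kat~i", "~ka"):
    print(sample, cursor_stops(sample), is_well_formed(sample))
```

Under these assumptions, "kat~i" and "~ka" still get usable cursor positions even though neither passes the well-formedness check. The segmenter needs no notion of validity, and the validator never has to decide where to divide the ill-formed parts.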

You may also find that in some cases, even though the syllable is the basic unit, there may be a need to edit it in ways other than as a unit. Some syllables may have some optional marks, signs or symbols added that may need to be edited or traversed explicitly, while a "core" syllable may be more likely to be a unit.
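As a rough sketch of that last point, the same text can be traversed at two granularities. The toy alphabet is again an assumption for illustration only (vowel signs a i u always belong to the "core", ~ is an optional mark), and the function name and the `fine` parameter are hypothetical.

```python
# Toy script, assumed for illustration only.
VOWEL_SIGNS = set("aiu")   # always part of the "core" syllable
OPTIONAL_MARKS = set("~")  # optional marks that may be edited separately


def stops(text, fine=False):
    """Return cursor stops; with fine=True, also stop before optional marks."""
    result = [0]
    for i in range(1, len(text)):
        ch = text[i]
        if ch in VOWEL_SIGNS:
            continue  # never split the core syllable
        if ch in OPTIONAL_MARKS and not fine:
            continue  # coarse mode keeps the mark with its syllable
        result.append(i)
    if text:
        result.append(len(text))
    return result


print(stops("ki~ta"))             # whole syllables as units
print(stops("ki~ta", fine=True))  # optional mark reachable on its own
```

Neither granularity is the "right" one. Which set of stops applies depends on the operation being performed.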

These (or similar) scenarios show that it is impossible to arrive at a single, universal definition of a "textel" -- the main reason why this term is of lower utility than "pixel".


Received on Mon Mar 13 2017 - 12:01:47 CDT
