Re: Non-ascii string processing?

From: Peter Kirk
Date: Tue Oct 07 2003 - 06:28:18 CST

On 07/10/2003 04:35, Jill Ramonsky wrote:

> No. What you have demonstrated below is that given an API based on
> characters, one can write an API based on default grapheme clusters.
> Nonetheless, it is only the /resulting/
> default-grapheme-cluster-based API which would actually be of any
> use to end-users.
> ...and anyone who even /thinks/ of writing an API based on default
> grapheme clusters is surely competent enough to write that
> (almost trivial) character-based middle layer themselves.
> I have yet to see an APPLICATION which needs a character-based API.
> Jill

Well, application programming with default grapheme clusters will be
fairly trivial when using a computer language whose string processing
can work transparently and efficiently with arbitrary-length
characters, I mean, default grapheme clusters (DGCs). Until such
computer languages are widely available, and given that for very many
widely used natural languages (if NFC is used) characters and DGCs
coincide, I would much prefer to work with a character-based API than
have to always do my own combining of UTF-8 bytes.
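
To make the difference between the levels concrete, here is a minimal
sketch in Python, which happens to index strings by Unicode character
(code point). The helper name views() is made up for this illustration,
and the sample word is only an assumption chosen because NFC composes
it fully.

```python
import unicodedata

def views(text):
    """Return (UTF-8 byte count, character count) of text after NFC normalization."""
    nfc = unicodedata.normalize("NFC", text)
    return len(nfc.encode("utf-8")), len(nfc)

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two characters, one DGC.
decomposed = "cafe\u0301"

print(len(decomposed))    # 5 characters before normalization
print(views(decomposed))  # (5, 4): NFC gives 4 characters stored in 5 UTF-8 bytes
```

After NFC each of the four characters here is also one DGC, which is
the coincidence I am relying on; a byte-based API would still leave me
to reassemble the two bytes of the last letter myself.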

Anyway, DGCs are not always what you want to work with. I work a lot
with pointed Hebrew texts. For most purposes (though not for calculating
space taken up on a line) the entities I need to work with correspond to
Unicode characters rather than DGCs, for I work separately with the base
characters (mostly consonants), the vowel points and the accents. In
some cases the match is not precise, but it is a lot more convenient
for my work if I can access a string character by character, rather than
UTF-8 byte by UTF-8 byte or DGC by DGC. And, by the way, I have real
examples of DGCs in Hebrew consisting of six characters.
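
As a rough sketch of what character-level access buys me, the Python
below splits a pointed word into its layers and, for comparison, groups
it into approximate DGCs. The function names are hypothetical, the code
point ranges are my reading of the Unicode Hebrew block, and
rough_clusters() is a simplification (a base plus the combining marks
that follow it), not the full default grapheme cluster rules.

```python
import unicodedata

def classify(ch):
    """Rough class of one code point, by its position in the Hebrew block."""
    cp = ord(ch)
    if 0x05D0 <= cp <= 0x05EA:
        return "letter"   # base characters, mostly consonants
    if 0x0591 <= cp <= 0x05AF:
        return "accent"   # cantillation marks
    if 0x05B0 <= cp <= 0x05C7 and unicodedata.category(ch) == "Mn":
        return "point"    # vowel points, dagesh, shin/sin dots, etc.
    return "other"

def split_layers(text):
    """Collect the characters of text into separate lists, one per class."""
    layers = {"letter": [], "point": [], "accent": [], "other": []}
    for ch in text:
        layers[classify(ch)].append(ch)
    return layers

def rough_clusters(text):
    """Approximate DGCs: attach each combining mark to the preceding cluster."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# bet+dagesh+sheva, resh+tsere, alef, shin+shin dot+hiriq+tipeha, yod, tav
word = ("\u05D1\u05BC\u05B0" "\u05E8\u05B5" "\u05D0"
        "\u05E9\u05C1\u05B4\u0596" "\u05D9" "\u05EA")

layers = split_layers(word)
clusters = rough_clusters(word)
print(len(word), len(clusters))                # 12 characters in 6 clusters
print({k: len(v) for k, v in layers.items()})  # 6 letters, 5 points, 1 accent
```

The layer split needs nothing more than a loop over characters, while a
DGC-only API would hand me the four-character cluster around shin as a
single unit that I would then have to take apart again.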

Peter Kirk
