Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

From: Peter Kirk (peterkirk@qaya.org)
Date: Thu Dec 11 2003 - 17:58:39 EST

Next message: wjbm820: "character map in Microsoft Word"

Previous message: Peter Kirk: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
In reply to: Mark Davis: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Next in thread: Mark Davis: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Reply: Mark Davis: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Reply: Philippe Verdy: "RE: Text Editors and Canonical Equivalence (was Coloured diacritics)"

On 11/12/2003 10:16, Mark Davis wrote:

>>Mark, don't patronise me. I'm not talking about levels of enlightenment.
>>I'm not talking about levels in the sense you just used when you
>>mentioned "higher-level issues". I'm talking about the well-known
>>concept of levels or layers of programming and of communication protocols.
>>
>
>My apologies; I had intended a lighter tone, not patronization.
>
Apology accepted. I should have recognised the "enlightenment" of the tone.

> ...
>
>One could, of course, design a programming language that always indexed and
>counted by some other entity, say, default grapheme clusters. Such a language
>would be unable to deal with pieces that didn't constitute a complete
>cluster, and would have to deal with issues such as the number of entities
>in the concatenation of two strings not being the same as the sum of the
>numbers of entities in each of the strings, so indexing gets pretty
>tricky. I don't know of any programming language that has tried to do this, and
>I don't think it would be of particular value -- and it would be exceedingly slow.
>
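The point in the quoted paragraph about concatenation can be made concrete with a small Python sketch. It is not a conforming UAX #29 implementation: the helper names `clusters`, `_extends` and `_jamo_type` are hypothetical, and the segmentation rules are an assumed simplification (combining marks attach to the preceding code point, and conjoining Hangul jamo in the U+1100..U+11FF block join as L+L, L+V, V+V, V+T and T+T). Even with these few rules, two strings that each consist of whole clusters can concatenate into fewer clusters than their sum, and finding the i-th cluster requires a linear scan rather than a direct array lookup.

```python
import unicodedata


def _jamo_type(ch):
    """Return "L", "V" or "T" for conjoining Hangul jamo, else None.

    Simplification (assumption): only the Hangul Jamo block U+1100..U+11FF is
    checked; precomposed syllables and later jamo extension blocks are ignored.
    """
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F:
        return "L"  # leading consonant
    if 0x1160 <= cp <= 0x11A7:
        return "V"  # vowel
    if 0x11A8 <= cp <= 0x11FF:
        return "T"  # trailing consonant
    return None


def _extends(prev, ch):
    """True if ch belongs to the same cluster as the code point prev before it."""
    if unicodedata.category(ch).startswith("M"):  # Mn, Mc, Me: combining marks
        return True
    a, b = _jamo_type(prev), _jamo_type(ch)
    return ((a == "L" and b in ("L", "V"))
            or (a == "V" and b in ("V", "T"))
            or (a == "T" and b == "T"))


def clusters(s):
    """Split s into approximate default grapheme clusters with one linear scan."""
    out = []
    for ch in s:
        if out and _extends(out[-1][-1], ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out


# Each string is one complete cluster, but together they form only one.
a, b = "\u1100", "\u1161"   # HANGUL CHOSEONG KIYEOK, HANGUL JUNGSEONG A
print(len(clusters(a)), len(clusters(b)), len(clusters(a + b)))  # 1 1 1

# Code-point indexing and cluster indexing disagree once marks are present.
s = "e\u0301x"              # e, COMBINING ACUTE ACCENT, x
print(len(s), len(clusters(s)))                  # 3 2
print(s[1] == "\u0301", clusters(s)[1] == "x")   # True True
```

Because `clusters` has to walk the string from the start, asking for cluster number i costs time proportional to the string length, which is one source of the slowness mentioned above.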
This is, I suppose, what I was thinking of. I see the problem if partial
clusters are permitted, but they could be forbidden from this type. Is
there ever a case where a concatenation of n default grapheme clusters
(DGCs) and m DGCs is not equal to (n+m) DGCs? If so, there is a small
problem, but one which is
surmountable if it is made clear that concatenation does not always
imply addition of string length. I do think this would be a useful thing
to do, and Benjamin, who seems to agree, suggests that .NET does it at
least to some extent. I am sure that some tricks could be found to
simplify the indexing if necessary, e.g. using PUA or non-character code
points indexed into a special table to replace DGCs which cannot be
represented as a single character. (There are plenty of non-characters
available, as you would need to use UTF-32 here to avoid exactly the same
problems with surrogates.)
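The table-based trick described above can be sketched as a Python class. Everything here is an illustrative assumption rather than how .NET or any other platform works: the class name `ClusterString` and the helpers `_clusters` and `_is_mark` are hypothetical, the segmenter is simplified to "a non-mark code point followed by any combining marks", and private-use code points from Supplementary Private Use Area-A (U+F0000..U+FFFFD, 65,534 code points) serve as table indexes instead of non-characters, of which Unicode defines only 66. Strings that begin with a combining mark are rejected, which is one way to forbid partial clusters; each cluster is stored as a single integer, so length and indexing are constant-time list operations.

```python
import unicodedata

# Assumption: Supplementary Private Use Area-A holds the table indexes.
PUA_START, PUA_END = 0xF0000, 0xFFFFD


def _is_mark(ch):
    # General categories Mn, Mc and Me (combining marks).
    return unicodedata.category(ch).startswith("M")


def _clusters(text):
    """Simplified segmentation: a non-mark code point plus any following marks."""
    out = []
    for ch in text:
        if out and _is_mark(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out


class ClusterString:
    """Hypothetical string type indexed by whole clusters, one int per cluster."""

    _table = []  # table index -> multi-code-point cluster (shared by all instances)
    _ids = {}    # multi-code-point cluster -> private-use code point

    def __init__(self, text=""):
        if text and _is_mark(text[0]):
            raise ValueError("partial cluster: text starts with a combining mark")
        self._units = [self._encode(c) for c in _clusters(text)]

    @classmethod
    def _encode(cls, cluster):
        if len(cluster) == 1:
            cp = ord(cluster)
            if PUA_START <= cp <= PUA_END:
                raise ValueError("text already uses the reserved private-use range")
            return cp
        if cluster not in cls._ids:
            cp = PUA_START + len(cls._table)
            if cp > PUA_END:
                raise OverflowError("cluster table is full")
            cls._table.append(cluster)
            cls._ids[cluster] = cp
        return cls._ids[cluster]

    @classmethod
    def _decode(cls, cp):
        if PUA_START <= cp <= PUA_END:
            return cls._table[cp - PUA_START]
        return chr(cp)

    def __len__(self):
        return len(self._units)  # number of clusters, O(1)

    def __getitem__(self, i):
        # Integer indexes only; each element is one whole cluster.
        return self._decode(self._units[i])

    def __add__(self, other):
        result = ClusterString()
        result._units = self._units + other._units
        return result

    def __str__(self):
        return "".join(self._decode(cp) for cp in self._units)


s = ClusterString("e\u0301te\u0301")        # "été" with decomposed accents
print(len(s))                                # 3
print(s[0] == s[2] == "e\u0301")             # True
t = s + ClusterString("s")
print(len(t), str(t) == "e\u0301te\u0301s")  # 4 True
```

Concatenation here simply joins the two integer lists. That shortcut is valid only because, under the simplified segmenter, a string that cannot begin with a combining mark never merges with the cluster before it; with fuller UAX #29 rules (conjoining Hangul jamo, for example) the boundary between the two operands would have to be re-segmented, and the lengths would no longer simply add.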

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/


This archive was generated by hypermail 2.1.5 : Thu Dec 11 2003 - 18:42:48 EST