Re: indexing of various languages

From: Martin J. Duerst (
Date: Fri Jul 25 1997 - 12:15:57 EDT

On Thu, 24 Jul 1997, Gary Grosso wrote:

> This is not strictly on the topic of Unicode, but many on this list are
> knowledgeable about the editing and typography of many of the world's languages.
> I would also be happy to get pointers to other sources.
> My question is this: to streamline our implementation, we
> would like to limit the number of primary sort characters to 255. Does
> anyone know of any language where the generally accepted indexing practice
> would have more than 255 distinct primary weights, or index groupings?

I guess so. Ethiopic or Yi could be examples. But I don't think it's
hard to get around this. For example, assume you want to sort on raw
(2-byte) Unicode values, of which there are 65,536, far more than 255.
You just convert each Unicode character into a sequence of 3 values
and then sort on these sequences. Two values are not enough: with only
255 distinct values per position (not 256), two positions cover just
255 x 255 = 65,025 values. The secondary (tertiary, ...) sort keys
all go on the last item of each sequence.
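A minimal Python sketch of this scheme follows. It assumes 16-bit (BMP) values and a made-up weight table in which the primary weight is the code point of a character's lowercase form and the secondary weight flags uppercase. The names `expand_primary`, `collation_element`, `expand_element` and `sort_key` are hypothetical, chosen for illustration only. Each primary value is split into three base-255 digits (0..254), most significant first. Because the digit sequences have a fixed length, comparing them item by item gives the same order as comparing the original values. The secondary weight rides only on the third item of each expansion.

```python
from typing import List, Optional, Tuple

BASE = 255  # number of distinct primary weights allowed per item

def expand_primary(value: int) -> List[int]:
    """Split a 16-bit value into three base-255 digits, most significant first.

    Two digits would cover only 255 * 255 = 65025 values, fewer than the
    65536 possible 16-bit values, so three are needed.
    """
    if not 0 <= value <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    return [value // (BASE * BASE), (value // BASE) % BASE, value % BASE]

def collation_element(ch: str) -> Tuple[int, int]:
    """Hypothetical weight table (an assumption, not a real collation):
    primary = code point of the lowercase form, secondary = 1 if the
    character was changed by lowercasing (e.g. uppercase), else 0."""
    lower = ch.lower()
    base = lower if len(lower) == 1 else ch
    return ord(base), (1 if ch != base else 0)

def expand_element(primary: int, secondary: int) -> List[Tuple[int, Optional[int]]]:
    """Expand one (primary, secondary) pair into three items; only the last
    item carries the secondary weight, the first two carry none (None)."""
    d = expand_primary(primary)
    return [(d[0], None), (d[1], None), (d[2], secondary)]

def sort_key(text: str) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Compare all primary items first; secondaries only break ties."""
    items = [it for ch in text for it in expand_element(*collation_element(ch))]
    primaries = tuple(p for p, _ in items)
    secondaries = tuple(s for _, s in items if s is not None)
    return primaries, secondaries
```

For example, `sorted(["b", "B", "a", "A"], key=sort_key)` yields `['a', 'A', 'b', 'B']`: the primary items decide between "a" and "b", and the secondary weight on the last item separates lowercase from uppercase. An Ethiopic syllable such as U+1200 (4608) expands to the items `[0, 18, 18]`.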

Regards, Martin.

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:36 EDT