RE: Collation - last character?

From: Lars Kristan (lars.kristan@hermes.si)
Date: Mon Mar 18 2002 - 04:11:18 EST


Markus Scherer wrote:
> How about U+10ffff?
> It is a non-character, which gives it a high (unassigned
> character) weight in the UCA. It is the highest code point =
> "the last character".

That is definitely not what I was looking for. It is an illegal codepoint,
while I was looking for a legal codepoint, and one that would not 'happen to
be' the last, but would be 'defined as' last.

Initially, I wanted a codepoint that would be the counterpart of the
underscore (_): a valid alpha character (one that is guaranteed to be
accepted in identifiers, even as the first character) with a
non-zero-width representation.
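
To make the identifier requirement concrete, here is a small check in
Python. It only uses Python's own identifier rules and the Unicode data
bundled with the interpreter, so it is an illustration under those
assumptions, not a statement about the UCA or any other language.

```python
import sys
import unicodedata

# The highest code point, U+10FFFF, as suggested in the quoted message.
last = chr(sys.maxunicode)

# General category "Cn" covers unassigned code points and noncharacters.
is_unassigned = unicodedata.category(last) == "Cn"

# The underscore is accepted as the first character of an identifier.
underscore_ok = "_".isidentifier()

# U+10FFFF is not accepted in an identifier, neither first nor later.
last_ok = last.isidentifier()
last_ok_inside = ("x" + last).isidentifier()

print(is_unassigned, underscore_ok, last_ok, last_ok_inside)
```

So the underscore qualifies on the identifier side, while U+10FFFF does
not, which is why it cannot serve as the counterpart I had in mind.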

Asmus Freytag [asmusf@ix.netcom.com] also noted that such characters could
be useful in user interfaces. For that kind of usage, however, it would be
preferable to have two zero-width, non-breaking characters that would
typically NOT be allowed in user input. The application could then keep
reserved items at the top or bottom of a sorted list, knowing that the user
can never delete them or add an item with the same name, as long as these
characters are screened at the point of input. Things get more complicated
if you allow reversed sort order, so I cannot say at this point whether
anyone would really choose to use such an approach.
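
A minimal sketch of the user-interface idea, in Python. The two marker
characters are hypothetical: no such characters are defined, so two
private-use code points stand in for them, and the sort key ranks them
explicitly instead of relying on any real collation weight. The names
SORT_FIRST, SORT_LAST, screen_input and sort_key are made up for this
illustration.

```python
# Hypothetical stand-ins: private-use code points, not real "first"/"last"
# characters. A real proposal would define dedicated code points.
SORT_FIRST = "\uE000"
SORT_LAST = "\uE001"
RESERVED = {SORT_FIRST, SORT_LAST}


def screen_input(name):
    """Reject user input that contains a reserved marker character."""
    if any(ch in RESERVED for ch in name):
        raise ValueError("reserved character in input")
    return name


def sort_key(name):
    """Rank marked-first items before ordinary ones, marked-last after."""
    if name.startswith(SORT_FIRST):
        rank = 0
    elif name.startswith(SORT_LAST):
        rank = 2
    else:
        rank = 1
    return (rank, name.casefold())


items = [SORT_LAST + "Trash", screen_input("beta"),
         SORT_FIRST + "New item", screen_input("Alpha")]

ascending = sorted(items, key=sort_key)
# Naive reversal also flips the reserved items: "Trash" ends up on top.
descending = sorted(items, key=sort_key, reverse=True)
```

With this sketch, ascending order keeps "New item" on top and "Trash" at
the bottom, but simply reversing the sort swaps them. That is the
complication with reversed sort order mentioned above: the application
would have to decide whether reserved items stay put or move with the
rest of the list.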

If we pursue this issue, the question is then whether we are looking for a
single character that would be the counterpart of the underscore, or for
four characters: two alpha characters and two zero-width spaces. To allow
for the latter, I now think that these would fit better in the General
Punctuation block than in the Specials block.

Lars Kristan


