Re: Encoding for Fun (was Line Separator)

From: John Cowan (jcowan@reutershealth.com)
Date: Wed Oct 22 2003 - 11:43:49 CST


Jill Ramonsky scripsit:
>
> I can't argue with that ... but my strings were always in (32-bit wide)
> Unicode at "sort-time". I'm not sure exactly how much value there is in a
> lexicographical sort anyway. I mean, even in Latin-1, surely '' should
> not come after 'z'?

Fair enough. Another good property that your "UTF-4" scheme has is that
8-bit search will work correctly, which is true of UTF-8 as well but not
of UTF-16.
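
To make the point about 8-bit search concrete, here is a small sketch in
Python (my choice of language; byte_find and the sample strings are made up
for illustration). It looks for the letter 'a' in a string that does not
contain it, comparing raw bytes only. It says nothing about "UTF-4" itself,
whose definition is not restated here.

```python
def byte_find(haystack: str, needle: str, encoding: str) -> int:
    """Return the byte offset of the first byte-wise match, or -1 if none."""
    return haystack.encode(encoding).find(needle.encode(encoding))


# Two copies of U+6100, a CJK ideograph. The text contains no 'a' (U+0061).
haystack = "\u6100\u6100"
needle = "a"

# UTF-8: U+6100 is E6 84 80 and 'a' is 61. Lead bytes, continuation bytes
# and ASCII bytes occupy disjoint ranges, so no false match is possible.
utf8_hit = byte_find(haystack, needle, "utf-8")

# UTF-16 (big-endian): the haystack is 61 00 61 00 and 'a' is 00 61, so a
# byte-wise search matches at offset 1, straddling two code units.
utf16_hit = byte_find(haystack, needle, "utf-16-be")

print("UTF-8 offset: ", utf8_hit)
print("UTF-16 offset:", utf16_hit)
```

A search that works in 16-bit units rather than bytes would not produce the
spurious UTF-16 match; the problem is specific to byte-oriented tools.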

-- 
John Cowan  jcowan@reutershealth.com  www.ccil.org/~cowan  www.reutershealth.com
I must confess that I have very little notion of what [s. 4 of the British
Trade Marks Act, 1938] is intended to convey, and particularly the sentence
of 253 words, as I make them, which constitutes sub-section 1.  I doubt if
the entire statute book could be successfully searched for a sentence of
equal length which is of more fuliginous obscurity. --MacKinnon LJ, 1940

