RE: Plane 14 language tags

From: Murray Sargent
Date: Wed Jun 28 2000 - 16:25:49 EDT

Note that in C, it's essentially just as fast to make character comparisons
with (ch | 0x20) as with ch alone, i.e., if you know ch is in an ASCII range
(0 - 0x7F or 0xE0000 - 0xE007F), you can do a case insensitive compare as
quickly as a case sensitive one. The problem with assuming lower case is
that the input might not all be in lower case. I remember all too well
having to accept RTF control words with upper-case letters even though the
RTF spec and Word both specifically use all lower case for these words.
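
As a rough illustration of the comparison described above, here is a minimal
sketch in C. The helper names (is_tag_char, eq_letter_nocase,
tag_equals_nocase) and the TAG_BASE constant are hypothetical, made up for
this example and not taken from any RTF or Unicode library. It assumes the
expected string is lower-case ASCII and that the tag has already been decoded
into 32-bit code points.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: start of the Plane 14 tag block (U+E0000). */
#define TAG_BASE 0xE0000u

/* True if ch lies in the tag range U+E0000..U+E007F. */
static bool is_tag_char(uint32_t ch)
{
    return ch >= 0xE0000u && ch <= 0xE007Fu;
}

/* Case-insensitive test of ch against a lower-case letter.
 * Assumption: expected_lower is in 'a'..'z'. For other characters
 * the OR trick gives false matches (e.g. CR would match '-'). */
static bool eq_letter_nocase(uint32_t ch, char expected_lower)
{
    return (ch | 0x20u) == (uint32_t)(unsigned char)expected_lower;
}

/* Compare n tag code points with a lower-case ASCII string such as
 * "en-us". Letters are folded with | 0x20; everything else (the
 * hyphen, digits) must match exactly. */
static bool tag_equals_nocase(const uint32_t *tag, size_t n,
                              const char *expected)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char e = (unsigned char)expected[i];
        if (e == '\0')
            return false;               /* expected is shorter than tag */
        uint32_t got = tag[i];
        if (e >= 'a' && e <= 'z')
            got |= 0x20u;               /* fold only when a letter is expected */
        if (got != TAG_BASE + e)
            return false;
    }
    return expected[n] == '\0';         /* lengths must agree */
}
```

The fold is applied only where a letter is expected because OR-ing 0x20 into
an arbitrary character is not a general lower-casing operation: it also maps
'@' to '`' and control characters into the 0x20 - 0x3F range.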


> -----Original Message-----
> From: Kenneth Whistler
> Sent: Wednesday, June 28, 2000 12:03 PM
> To: Unicode List
> Cc:
> Subject: Re: Plane 14 language tags
> Doug Ewell asked:
> > 2. (Ken and Glenn) Can you explain in a little more detail the
> > rationale for lowercasing the entire language tag? It seems that if
> > RFC 1766 is the model to be followed, then the RFC 1766 casing
> > convention (lowercase for language, uppercase for country) might be
> > preferred.
> John Cowan's non-authoritative response was fine by me -- and was
> better-expressed than this author would probably have done. ;-)
> > I guess I don't see how lowercasing the entire tag simplifies or
> > speeds up anything, since the hyphen which separates language from
> > country is outside the range of lowercase letters anyway and
> > processes that want to ignore LT's must ignore the entire range from
> > U+E0000 through U+E007F.
> It is not a matter of range-checking. For ignoring tags, you would always
> check the entire range. Rather, it is just a suggestion that since
> case is not significant in the language tags, it is slightly preferable
> to do the early "normalization" (i.e. case folding to lowercase, in
> this instance), rather than emitting arbitrarily mixed case tags
> and distributing the case-folding burden to all the interpreters of
> the tags.
> --Ken Whistler
