Re: Correct definition for an "isLatin1()" function

From: John Cowan (jcowan@reutershealth.com)
Date: Thu Oct 05 2000 - 13:50:25 EDT


"Rogers, Paul" wrote:

> We're whipping up a little function named isLatin1() that returns true if
> the (UCS-2) string in question is "all Latin1".

[snip]
 
> In other words, should we exclude the C0, C1, and Latin Extended code
> values?

Whether to include C0 (U+0000..U+001F) and C1 (U+0080..U+009F) is a
matter of taste. If you mean "strictly containing characters in ISO
8859-1", then they're out. If you mean "representable in typical Latin-1
text files", then at least C0 is in, and C1 will do no great harm. (That
assumes your Unicode characters didn't come from CP 1252 text that was
incorrectly transcoded as if it were Latin-1, which would put CP 1252
punctuation into the C1 range.)
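To illustrate the CP 1252 caveat: if CP 1252 bytes were decoded as though
they were Latin-1, CP 1252's curly quotes, dashes, euro sign and so on end
up as C1 code points. A rough sketch in Python (chosen only for
illustration; c1_positions and guess_cp1252_repair are hypothetical helper
names, not part of any library) that flags C1 characters and tries to undo
the mistake by re-encoding as Latin-1 and decoding as CP 1252:

```python
from typing import List, Optional


def c1_positions(s: str) -> List[int]:
    """Indices of C1 control characters (U+0080..U+009F) in s."""
    return [i for i, ch in enumerate(s) if 0x80 <= ord(ch) <= 0x9F]


def guess_cp1252_repair(s: str) -> Optional[str]:
    """Hypothetical helper: if s looks like CP 1252 bytes wrongly decoded
    as Latin-1, try to reinterpret them as CP 1252.

    Returns the repaired string, or None if s has no C1 characters or
    cannot be reinterpreted.
    """
    if not c1_positions(s):
        return None  # nothing suspicious to repair
    try:
        # Latin-1 maps U+0000..U+00FF one-to-one onto bytes, so this
        # recovers the original bytes; then decode them as CP 1252.
        return s.encode("latin-1").decode("cp1252")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Characters above U+00FF, or bytes CP 1252 leaves undefined
        # (0x81, 0x8D, 0x8F, 0x90, 0x9D).
        return None


if __name__ == "__main__":
    garbled = "\x93quoted\x94"  # CP 1252 curly quotes misread as Latin-1
    print(c1_positions(garbled))
    print(guess_cp1252_repair(garbled))
```

Such a repair is only a guess: a string holding genuine C1 controls would be
"repaired" too, so it is better suited to diagnostics than to silent
correction.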

The Latin Extended blocks (U+0100 and above) are definitely out, since
Latin-1 stops at U+00FF.
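Putting those rules together, a minimal sketch of isLatin1() in Python
(illustration only; the allow_c0/allow_c1 flags, and the choice to treat
DEL, U+007F, together with C0, are assumptions and not from the original
question). Each UCS-2 code unit is a BMP code point, so a Python str stands
in for the UCS-2 string here:

```python
def is_latin1(s: str, allow_c0: bool = True, allow_c1: bool = False) -> bool:
    """Return True if every character of s counts as Latin-1 under the
    chosen policy for control characters. Hypothetical helper."""
    for ch in s:
        cp = ord(ch)
        if cp > 0xFF:
            # U+0100 and up (Latin Extended-A onward) is never Latin-1.
            return False
        if cp < 0x20 or cp == 0x7F:
            # C0 controls, plus DEL grouped with them by assumption.
            if not allow_c0:
                return False
        elif 0x80 <= cp <= 0x9F:
            # C1 controls: valid code points, but not ISO 8859-1 graphics.
            if not allow_c1:
                return False
        # Everything else (U+0020..U+007E, U+00A0..U+00FF) is accepted.
    return True


if __name__ == "__main__":
    print(is_latin1("caf\u00e9"))            # e-acute is U+00E9
    print(is_latin1("\u0101"))               # a-macron, Latin Extended-A
    print(is_latin1("line\n", allow_c0=False))
```

With allow_c0=False and allow_c1=False you get the strict "characters in
ISO 8859-1" reading; with both True you get the "typical Latin-1 text file"
reading. Every setting rejects anything above U+00FF.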

-- 
There is / one art                   || John Cowan <jcowan@reutershealth.com>
no more / no less                    || http://www.reutershealth.com
to do / all things                   || http://www.ccil.org/~cowan
with art- / lessness                 \\ -- Piet Hein


