"Rogers, Paul" wrote:
> We're whipping up a little function named isLatin1() that returns true if
> the (UCS-2) string in question is "all Latin1".
[snip]
> In other words, should we exclude the C0, C1, and Latin Extended code
> values?
Including or excluding C0 and C1 is a matter of taste. If you mean
"strictly containing characters in ISO 8859-1", then they're out.
If you mean "representable in typical Latin-1 text files", then at least
C0 is in, and C1 will do no great harm, provided your Unicode
characters don't originate from incorrect transcoding from CP 1252
(which assigns printable characters to most of the 0x80-0x9F range).
The Latin Extended blocks are definitely out.
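
A rough sketch of the two readings, in Python rather than whatever language
your isLatin1() is written in. The name is_latin1 and the allow_c0 / allow_c1
switches are my own illustration, not anything from your code, and grouping
DEL (0x7F) with the C0 controls is an assumption. Python strings are sequences
of code points rather than UCS-2 code units, but for this test the difference
does not matter: anything above U+00FF is rejected either way.

```python
def is_latin1(text, allow_c0=True, allow_c1=True):
    """Return True if every character in text is acceptable as Latin-1.

    allow_c0 / allow_c1 are hypothetical switches for the two readings
    discussed above; they are not part of the original isLatin1().
    """
    for ch in text:
        cp = ord(ch)
        if cp > 0xFF:
            # Latin Extended-A starts at U+0100: always out.
            return False
        if cp <= 0x1F or cp == 0x7F:
            # C0 controls; DEL is grouped with them here (an assumption).
            if not allow_c0:
                return False
        elif 0x80 <= cp <= 0x9F:
            # C1 controls.
            if not allow_c1:
                return False
    return True
```

With the defaults this is the "typical Latin-1 text file" reading; passing
allow_c0=False, allow_c1=False gives the strict ISO 8859-1 reading. If CP 1252
contamination is a worry, allow_c1=False alone will flag it.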
--
There is / one art     || John Cowan <jcowan@reutershealth.com>
no more / no less      || http://www.reutershealth.com
to do / all things     || http://www.ccil.org/~cowan
with art- / lessness   \\ -- Piet Hein