RE: Unicode and end users

From: Yves Arrouye
Date: Sat Feb 16 2002 - 23:09:37 EST


> If "foo" is a US-ASCII string, "grep foo file" will work fine with any
> US-ASCII-superset charset for which non-ASCII characters do not use
> bytes < 0x80, including the hypothetical one I described, with no
> possibility of a false match. However "grep föö file" will work only
> if the current shell charset (i.e. of argv[1]) matches the encoding of
> "file".

Not necessarily. It will work as long as the 3-byte sequence föö is the
representation, in that file's encoding, of the string you are looking for.
grep does not validate anything, nor should it, IMHO. If you want to
guarantee the encoding, use a converter such as ICU's uconv(1) or iconv(1).
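
To make the byte-level point concrete, here is a rough sketch in Python rather
than shell. It is an illustration under assumptions, not how grep is
implemented: byte_grep is a hypothetical helper, the pattern "f" plus two
o-with-diaeresis characters is an assumed stand-in, and Latin-1 and UTF-8 are
just two example encodings. The in-memory re-encoding at the end stands in for
what a converter such as iconv(1) would do to the file.

```python
# Hypothetical sketch: a byte-level substring search, standing in for grep.
# The pattern and the two encodings are assumptions chosen for illustration.

def byte_grep(pattern: bytes, data: bytes) -> bool:
    """Return True if the raw bytes of pattern occur anywhere in data."""
    return pattern in data

text = "one f\u00f6\u00f6 here\n"      # "f" followed by two U+00F6 characters
needle = "f\u00f6\u00f6"

latin1_file = text.encode("latin-1")   # each U+00F6 becomes one byte (0xF6)
utf8_file = text.encode("utf-8")       # each U+00F6 becomes two bytes (0xC3 0xB6)

# What the shell would pass as argv[1] if its charset were Latin-1: 3 bytes.
needle_latin1 = needle.encode("latin-1")

# The bytes match when the file uses the same representation...
same = byte_grep(needle_latin1, latin1_file)

# ...and do not match when the file encodes the same string differently.
mismatch = byte_grep(needle_latin1, utf8_file)

# Converting the file to the pattern's encoding first restores the match.
converted = utf8_file.decode("utf-8").encode("latin-1")
after = byte_grep(needle_latin1, converted)

print(len(needle_latin1), same, mismatch, after)
```

Nothing in the search itself knows or checks which encoding either side is
in; only the conversion step does.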

YA


