Regular expressions in Unicode (Was: Ethiopic text)

From: Hallvard B Furuseth ([email protected])
Date: Thu Mar 12 1998 - 03:43:20 EST

> On the subject of regular-expression support for Unicode, the POSIX
> definition of regexps includes recognition of character classes. I
> believe that the regexp package in GNU gawk, available at
>
> ftp://prep.ai.mit.edu/pub/gnu/gawk-3.0.3.tar.gz ,
>
> has the POSIX definition implemented. While it is still based on
> 8-bit characters, it might prove a suitable starting point for Unicode
> support.

Has anybody defined or implemented "Unicode regular expressions" for a
program which uses Unicode internally? In particular, I wonder about
character ranges: If the user says "[�-�]" in his 8-bit charset (not
latin-1), then the program should use the characters from � to � in the
user's charset, not the range of ISO 10646 character codes from � to �.
So it seems that Unicode strings containing regexps must be tagged with
their "source charset". OTOH, [\200-\377] probably means "all non-ASCII
characters". And how do you say "all non-ASCII Unicode characters"?
[\200-\3777777777]? :-)

-- 
Hallvard
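
A minimal sketch of the range problem raised above, written in present-day
Python rather than anything available to the thread. It assumes a
single-byte source charset (cp437 is picked only as an example; the message
does not name one) and a hypothetical helper, charset_range_to_class, that
expands a range in the charset's byte order and then maps each byte to its
Unicode character. The last pattern shows one way to say "all non-ASCII
characters" when the regexp engine works on code points.

```python
import re


def charset_range_to_class(lo: str, hi: str, charset: str) -> str:
    """Return a regexp character class equal to lo-hi in `charset` byte order.

    Hypothetical helper for illustration; handles single-byte charsets only.
    """
    lo_b, hi_b = lo.encode(charset), hi.encode(charset)
    if len(lo_b) != 1 or len(hi_b) != 1:
        raise ValueError("this sketch only handles single-byte charsets")
    if lo_b[0] > hi_b[0]:
        raise ValueError("range endpoints are out of order in " + charset)

    members = []
    for byte in range(lo_b[0], hi_b[0] + 1):
        try:
            members.append(bytes([byte]).decode(charset))
        except UnicodeDecodeError:
            continue  # byte value not assigned in this charset
    # Escape each member so that ']', '^', '-' and '\' stay literal.
    return "[" + "".join(re.escape(ch) for ch in members) + "]"


# In cp437 the bytes 0x84, 0x85, 0x86 are a-diaeresis, a-grave, a-ring,
# so the user's range from a-diaeresis to a-ring includes a-grave.
user_range = re.compile(charset_range_to_class("\u00e4", "\u00e5", "cp437"))

# The same endpoints read as code points (U+00E4 to U+00E5) skip
# a-grave, which is U+00E0.
codepoint_range = re.compile("[\u00e4-\u00e5]")

# "All non-ASCII characters", expressed as the complement of the ASCII range.
non_ascii = re.compile(r"[^\x00-\x7F]")
```

With these definitions, user_range matches a-grave (U+00E0) while
codepoint_range does not, which is the mismatch the question is about: the
regexp has to be interpreted in the charset it was typed in before it is
converted to Unicode.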

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:39 EDT