Re: UTF-8 based DFAs and Regexps from Unicode sets

From: Doug Ewell (doug@ewellic.org)
Date: Sun Apr 26 2009 - 10:40:25 CDT

Next message: Mark Davis: "Re: UTF-8 based DFAs and Regexps from Unicode sets"

Previous message: Bjoern Hoehrmann: "UTF-8 based DFAs and Regexps from Unicode sets"
In reply to: Bjoern Hoehrmann: "UTF-8 based DFAs and Regexps from Unicode sets"
Next in thread: Mark Davis: "Re: UTF-8 based DFAs and Regexps from Unicode sets"
Reply: Mark Davis: "Re: UTF-8 based DFAs and Regexps from Unicode sets"
Reply: Asmus Freytag: "Re: UTF-8 based DFAs and Regexps from Unicode sets"

From: "Bjoern Hoehrmann" <derhoermi@gmx.net>

> Now, if we replace each character by its UTF-8 encoding, we would
> obtain a regular expression and corresponding automata that match the
> same language, but would operate directly on bytes:
>
> /(A|B|...|a|b|...|\xC3\x80|...)(...)/

I know this isn't the answer you're looking for, but it almost always
makes more sense to decode UTF-8 code units into Unicode code points
FIRST and then apply other algorithms to operate on Unicode text,
instead of trying to build UTF-8 decoding into every algorithm.
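
To make the contrast concrete, here is a minimal Python sketch of the two
approaches. The character set (ASCII letters plus U+00C0) and the names
CODEPOINT_PATTERN, BYTE_PATTERN, match_decoded and match_bytes are
assumptions made up for illustration; they are not taken from the
original proposal.

```python
import re

# Hypothetical character set for illustration: ASCII letters plus U+00C0.
# After decoding, the pattern is written in terms of code points.
CODEPOINT_PATTERN = re.compile("[A-Za-z\u00C0]+")

# The byte-level equivalent has to spell out each multi-byte sequence,
# here U+00C0 as the two bytes C3 80.
BYTE_PATTERN = re.compile(rb"(?:[A-Za-z]|\xC3\x80)+")


def match_decoded(data: bytes) -> bool:
    """Decode first, then match on Unicode code points."""
    try:
        text = data.decode("utf-8")  # strict mode rejects ill-formed UTF-8
    except UnicodeDecodeError:
        return False
    return CODEPOINT_PATTERN.fullmatch(text) is not None


def match_bytes(data: bytes) -> bool:
    """Match directly on UTF-8 bytes; the encoding is baked into the pattern."""
    return BYTE_PATTERN.fullmatch(data) is not None


sample = "\u00C0bc\u00C0".encode("utf-8")
print(match_decoded(sample), match_bytes(sample))
```

With decoding done up front, adding a character to the set is a one-character
change to the class; in the byte-level pattern each new non-ASCII character
needs its own byte sequence written out.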

--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages

