Re: Regex and arcane parsing

From: Mark Davis (mark_davis@taligent.com)
Date: Wed Jan 29 1997 - 18:45:14 EST


Good reminder. I just double-checked the URL, and had to use:

http://www.cm.spyglass.com/unicode/uni2errata/UnicodeDatabaseErrata.html

instead.

unicode@Unicode.ORG wrote:
>
> > Another charter member of Sun's Java team later confided that they
> > (the Java team) "didn't have a clue" about how to go about handling all
> > the "incredibly arcane problems" involved in parsing Unicode.
>
> I hope that people who are concerned about implementing regular
> expression parsers in Unicode and/or other kinds of parsers
> (for example, SQL expression parsers) are paying attention
> to the implementation guidelines in the Unicode Standard, Version 2.0,
> in particular, Section 5.14 Identifiers, and the errata to that
> section posted on the unicode.org web site:
>
> http://www.unicode.org/unicode/uni2errata/UnicodeDatabaseErrata.html
>
> Identifier parsing is relatively straightforward in Unicode, given
> properly defined classes. The same principles can be applied to
> regular expression parsers to specify the classes of characters to
> be matched by various wildcard characters.
>
> It would truly be a shame if every regex developer were to cobble up
> a different solution for what did and did not match, when a standard
> specification is available, and the machine-readable versions of the
> data specifying the classes are available on the ftp site.
>
> --Ken Whistler
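
As a rough illustration of the class-based approach Ken describes, here is
a minimal Python sketch of an identifier scanner driven by Unicode general
categories. The category sets, the function names, and the use of Python's
unicodedata module are all assumptions made for this example. The
authoritative class definitions are the ones in Section 5.14 and its
errata, and the property data bundled with Python is from a much later
Unicode version than 2.0.

```python
import unicodedata

# Assumed category sets, for illustration only; check Section 5.14 and the
# errata for the actual identifier classes.
ID_START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}   # letters, letter numbers
ID_EXTEND = {"Mn", "Mc", "Nd", "Pc"}              # marks, digits, connectors


def is_id_start(ch):
    """True if ch may begin an identifier under the assumed classes."""
    return unicodedata.category(ch) in ID_START


def is_id_part(ch):
    """True if ch may continue an identifier under the assumed classes."""
    cat = unicodedata.category(ch)
    return cat in ID_START or cat in ID_EXTEND


def scan_identifier(text, pos=0):
    """Return the identifier starting at pos, or "" if there is none."""
    if pos >= len(text) or not is_id_start(text[pos]):
        return ""
    end = pos + 1
    while end < len(text) and is_id_part(text[end]):
        end += 1
    return text[pos:end]
```

A regex engine could reuse the same two predicates to decide what a
"word character" wildcard matches, so that identifier parsing and pattern
matching agree on the character classes.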


