Re: Regex and arcane parsing

From: Mark Davis
Date: Wed Jan 29 1997 - 18:45:14 EST

Good reminder. I just double-checked the URL, and had to use:


unicode@Unicode.ORG wrote:
> > Another charter member of Sun's Java team later confided that they
> > (the Java team) "didn't have a clue" about how to go about handling all
> > the "incredibly arcane problems" involved in parsing Unicode
> I hope that people who are concerned about implementing regular
> expression parsers in Unicode and/or other kinds of parsers
> (for example SQL expression parsers, etc.), are paying attention
> to the implementation guidelines in the Unicode Standard, Version 2.0,
> in particular, Section 5.14 Identifiers, and the errata to that
> section posted on the web site:
> Identifier parsing is relatively straightforward in Unicode, given
> properly defined classes. The same principles can be applied to
> regular expression parsers to specify the classes of characters to
> be matched by various wildcard characters.
> It would truly be a shame if every regex developer were to cobble up
> a different solution for what did and did not match, when a standard
> specification is available, and the machine-readable versions of the
> data specifying the classes are available on the FTP site.
> --Ken Whistler
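
A minimal sketch of class-based identifier scanning, in Python. It assumes identifier-start characters are the letter categories (Lu, Ll, Lt, Lm, Lo) plus letter numbers (Nl), and identifier-extend characters are marks (Mn, Mc), decimal digits (Nd), connector punctuation (Pc) and format characters (Cf). These sets approximate the Unicode identifier guidelines and are not a transcription of Section 5.14 or its errata. The names ID_START, ID_EXTEND, scan_identifier and find_words are hypothetical. find_words reuses the same classes as a word-character wildcard, which is the point about regex parsers above.

```python
import unicodedata

# Assumed class definitions by General Category (an approximation of the
# Unicode identifier guidelines, not the exact text of Section 5.14).
ID_START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_EXTEND = {"Mn", "Mc", "Nd", "Pc", "Cf"}


def is_id_start(ch: str) -> bool:
    """True if ch may begin an identifier."""
    return unicodedata.category(ch) in ID_START


def is_id_part(ch: str) -> bool:
    """True if ch may continue an identifier (start or extend class)."""
    cat = unicodedata.category(ch)
    return cat in ID_START or cat in ID_EXTEND


def scan_identifier(text: str, pos: int = 0) -> str:
    """Return the longest identifier beginning at text[pos], or '' if none."""
    if pos >= len(text) or not is_id_start(text[pos]):
        return ""
    end = pos + 1
    while end < len(text) and is_id_part(text[end]):
        end += 1
    return text[pos:end]


# Hypothetical word-character wildcard built from the same classes:
# collects maximal runs, roughly what a class-based \w+ would match.
def find_words(text: str) -> list[str]:
    words, current = [], []
    for ch in text:
        if is_id_part(ch):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


print(scan_identifier("na\u00efve_x1 = 2"))  # naïve_x1
print(scan_identifier("1abc") == "")         # True: a digit cannot start one
print(find_words("x1 + y_2\u00b7z"))         # ['x1', 'y_2', 'z']
```

Because membership is decided by General Category rather than by ASCII ranges, combining marks (Mn) and Indic vowel signs (Mc) stay inside a word instead of splitting it.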

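The character classes need not be hand-coded; they can be built from the machine-readable data. A minimal sketch, assuming the semicolon-separated layout of UnicodeData.txt (code point in hex in field 0, name in field 1, General Category in field 2, with large blocks compressed into "<..., First>" / "<..., Last>" line pairs). The SAMPLE lines are illustrative, not a verbatim excerpt of any particular version, and load_categories and in_class are hypothetical names.

```python
from collections import defaultdict

# Illustrative lines in the UnicodeData.txt layout (not a verbatim excerpt).
SAMPLE = [
    "0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "005F;LOW LINE;Pc;0;ON;;;;;N;SPACING UNDERSCORE;;;;",
    "4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;",
    "9FA5;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;",
]


def load_categories(lines):
    """Map each General Category to a list of inclusive (first, last) ranges."""
    table = defaultdict(list)
    range_start = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(";")
        cp, name, cat = int(fields[0], 16), fields[1], fields[2]
        if name.endswith(", First>"):
            range_start = cp  # remember where a compressed block begins
        elif name.endswith(", Last>") and range_start is not None:
            table[cat].append((range_start, cp))
            range_start = None
        else:
            table[cat].append((cp, cp))  # ordinary single-code-point entry
    return dict(table)


def in_class(ch, categories, table):
    """True if ch falls in any range belonging to one of the categories."""
    cp = ord(ch)
    return any(lo <= cp <= hi
               for cat in categories
               for lo, hi in table.get(cat, ()))


table = load_categories(SAMPLE)
print(in_class("\u4e2d", {"Lo"}, table))  # True
print(in_class("a", {"Lu"}, table))       # False: 'a' is not in the sample
```

The linear scan in in_class keeps the sketch short; with the full file, sorting the ranges and using binary search would be the natural next step.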