From: Bjoern Hoehrmann (derhoermi@gmx.net)
Date: Mon Apr 27 2009 - 08:52:41 CDT
* Asmus Freytag wrote:
>If I understand him correctly, Bjoern also suggests his method to give
>yet another avenue for Unicode-enabling of existing multi-byte aware
>applications. Depending on the circumstances in each case, such retrofit
>might make sense.
Yes. You can add the grammar transformation as a pre-processing step and
otherwise leave the application unchanged, or make no changes at all if
you pre-process the grammar before handing it to the application.
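A minimal sketch of what such a transformation can look like, in Python. It is an assumption for illustration, not the implementation mentioned further down: a Unicode character class, given as a code point range, is rewritten into alternatives of per-byte ranges that match exactly the UTF-8 encodings of those code points, so a byte-oriented matcher can use it unchanged. The range-splitting strategy is the well-known one also used by `Utf8Sequences` in Rust's regex-syntax crate; the names `utf8_ranges`, `to_byte_regex` and `compile_class` are hypothetical.

```python
import re

# Largest code point encodable in 1, 2 and 3 UTF-8 bytes.
_LENGTH_LIMITS = (0x7F, 0x7FF, 0xFFFF)


def _split(lo, hi):
    """Return [upper, lower] halves of lo..hi, or None if the range already
    maps to a single sequence of independent per-byte ranges."""
    # Both ends must encode to the same number of bytes.
    for m in _LENGTH_LIMITS:
        if lo <= m < hi:
            return [(m + 1, hi), (lo, m)]
    if hi <= 0x7F:
        return None  # ASCII: one byte, one range
    # Where a higher byte differs, the lower continuation bytes must
    # span their full range; split off ragged ends until they do.
    for i in (1, 2, 3):
        m = (1 << (6 * i)) - 1
        if (lo & ~m) != (hi & ~m):
            if lo & m:
                return [((lo | m) + 1, hi), (lo, lo | m)]
            if (hi & m) != m:
                return [(hi & ~m, hi), (lo, (hi & ~m) - 1)]
    return None


def utf8_ranges(lo, hi):
    """List of byte-range sequences [(b0_lo, b0_hi), ...] whose union matches
    exactly the UTF-8 encodings of code points lo..hi, in ascending order."""
    out, stack = [], [(lo, hi)]
    while stack:
        lo, hi = stack.pop()
        if lo > hi:
            continue
        # Surrogates D800..DFFF have no UTF-8 encoding; cut them out.
        if lo <= 0xDFFF and hi >= 0xD800:
            stack += [(0xE000, hi), (lo, 0xD7FF)]
            continue
        halves = _split(lo, hi)
        if halves:
            stack += halves  # lower half is pushed last, so it is popped first
            continue
        # Each byte position now varies independently: pair up the bytes.
        out.append(list(zip(chr(lo).encode("utf-8"), chr(hi).encode("utf-8"))))
    return out


def to_byte_regex(seqs):
    """Render the sequences as a bytes regex alternation of byte classes."""
    alts = [b"".join(b"[\\x%02x-\\x%02x]" % pair for pair in seq) for seq in seqs]
    return b"(?:" + b"|".join(alts) + b")"


def compile_class(lo, hi):
    """Byte-level pattern for 'one UTF-8 encoded code point in lo..hi'."""
    return re.compile(to_byte_regex(utf8_ranges(lo, hi)))
```

For example, `utf8_ranges(0x0391, 0x03C9)` returns two sequences: byte CE followed by 91..BF, and byte CF followed by 80..89. The resulting pattern contains only byte classes, so the matching engine itself needs no knowledge of UTF-8.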
Modifying an application so it decodes UTF-8 streams and then operates
on the scalar values is considerably more complicated; you would likely
use two code paths for byte-level and Unicode processing, and you need
new data structures for Unicode character classes, for instance.
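A rough Python sketch of the decode-first alternative, to show the extra machinery it implies; the names `CharClass`, `count_bytes` and `count_scalars` are hypothetical. A byte-level class fits in a set of at most 256 values, while a Unicode class needs its own structure, here assumed to be sorted, merged code point ranges searched by binary search, and the input must be decoded before any test can run.

```python
import bisect

# Byte-level path: a class over bytes is just a set of at most 256 values.
ASCII_DIGITS = frozenset(range(0x30, 0x3A))


def count_bytes(data, cls):
    """Count bytes in `data` that belong to the byte class `cls`."""
    return sum(1 for b in data if b in cls)


class CharClass:
    """Hypothetical Unicode character class: sorted, merged, inclusive
    code point ranges searched by binary search."""

    def __init__(self, ranges):
        merged = []
        for lo, hi in sorted(ranges):
            if merged and lo <= merged[-1][1] + 1:
                merged[-1][1] = max(merged[-1][1], hi)  # overlapping or adjacent
            else:
                merged.append([lo, hi])
        self._los = [lo for lo, _ in merged]
        self._his = [hi for _, hi in merged]

    def __contains__(self, cp):
        i = bisect.bisect_right(self._los, cp) - 1
        return i >= 0 and cp <= self._his[i]


# Unicode path: decode to scalar values first, then test those.
def count_scalars(data, cls):
    """Count code points in UTF-8 `data` that belong to `cls`."""
    text = data.decode("utf-8")  # strict: raises UnicodeDecodeError on bad input
    return sum(1 for ch in text if ord(ch) in cls)
```

With `GREEK = CharClass([(0x0370, 0x03FF), (0x1F00, 0x1FFF)])`, counting Greek letters in `"αβγ abc".encode("utf-8")` goes through `count_scalars`, while `count_bytes` serves the byte-level path: two code paths and two kinds of class.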
As far as performance is concerned, there appears to be little published
comparative research into this problem. I hope my implementation may aid
in changing that.
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Phone: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/