Re: Finite state machines? UTF8: toFold(), normalisation, etc

From: Theodore H. Smith
Date: Tue May 06 2003 - 10:08:39 EDT


    Hi Addison,

    Thanks a lot for the answers; they may help me get to a clean solution.

    I'm unfamiliar with "trie". What does it mean? If it's less complex
    than a finite state machine I'm sure that'll be a benefit for me.

    "Bits of Unicode" is in .ppt format. Is that PowerPoint? I don't
    have PowerPoint or an app that can read .ppt files.

    Thanks a lot for your kind help.

    > Hi Mr. Smith,
    > I wrote about "compiling" the Unicode character data tables in my
    > response. That reply was somewhat sketchy: my three-year-old son was
    > sitting in my lap waiting for his machine to boot while I wrote it...
    > Mark Davis wrote more-or-less the canonical presentation on this
    > subject for an IUC conference a few years ago. The title was "Bits of
    > Unicode". It may be elsewhere, but I've always found it on his
    > personal page
    > I have personally had reason to compile my own tables (NOT using a
    > finite state language, just tries and similar structures) for purposes
    > beyond those of ICU. I must admit that in recent years I have
    > tended to extend ICU or the very similar code in the Java JDK instead
    > of implementing my own tables, although it isn't that hard to do. Getting
    > the edge cases and esoteric details right, though, makes it not worth
    > my while (in my estimation).
    > A finite state machine could certainly do "the job" (although what you
    > really have is a number of similar "jobs" to do), but trie tables and
    > similar structures are a lot easier to build and maintain and do the
    > job marvelously well.
    > Good luck with your implementation.
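
    For readers wondering what a "trie table" looks like in this context,
    here is a minimal sketch in Python. It is not ICU's or the JDK's actual
    data layout, only an illustration of the general idea: split the code
    point into a high part and a low part, use the high part to pick a
    block, and share identical blocks. The names (simple_lower,
    build_two_stage, lookup) and the 8-bit block size are assumptions made
    for this example. The table is built from Python's own str.lower() and
    covers single-code-point lowercase mappings only, not full case folding
    or normalisation, and it works on code points rather than raw UTF-8
    bytes.

```python
BLOCK_BITS = 8                    # assumed block size: 256 code points
BLOCK_SIZE = 1 << BLOCK_BITS
MAX_CP = 0x110000                 # one past the last Unicode code point


def simple_lower(cp):
    """Single-code-point lowercase mapping, or cp itself if there is none."""
    low = chr(cp).lower()
    return ord(low) if len(low) == 1 else cp


def build_two_stage(value_of):
    """Build (index, blocks) for a two-stage lookup table.

    index[cp >> BLOCK_BITS] selects a block; blocks hold deltas
    (mapped value minus cp), so identical blocks can be shared.
    """
    index, blocks, seen = [], [], {}
    for start in range(0, MAX_CP, BLOCK_SIZE):
        block = tuple(value_of(cp) - cp
                      for cp in range(start, start + BLOCK_SIZE))
        if block not in seen:
            seen[block] = len(blocks)
            blocks.append(block)
        index.append(seen[block])
    return index, blocks


def lookup(index, blocks, cp):
    """Two array reads: pick the block, then the delta inside it."""
    block = blocks[index[cp >> BLOCK_BITS]]
    return cp + block[cp & (BLOCK_SIZE - 1)]


# Building walks every code point once, so it takes a moment.
INDEX, BLOCKS = build_two_stage(simple_lower)


def to_lower(text):
    """Lowercase text using only the compiled table."""
    return "".join(chr(lookup(INDEX, BLOCKS, ord(ch))) for ch in text)
```

    Most blocks contain no cased characters at all and collapse into one
    shared all-zero block, which is why this kind of table stays small.
    The same builder could be fed a different per-code-point function
    (a character class, for instance) without changing the lookup code.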

         Theodore H. Smith - Macintosh Consultant / Contractor.
         My website: <>
