Re: Finite state machines? UTF8: toFold(), normalisation, etc

From: Addison Phillips [wM] (
Date: Tue May 06 2003 - 11:00:42 EDT

  • Next message: Edward H Trager: "Re: Finite state machines? UTF8: toFold(), normalisation, etc"


    A "trie" is a two stage table.

    PPT is PowerPoint. I was under the impression that the presentation was
    available in PDF or HTML format also, but I guess it isn't if you didn't
    see it in that format.

    Here's a hint, though: if you search "Bits of Unicode" on Google, you
    can view much of the paper in HTML format using the option Google
    provides there.

    Best Regards,


    Theodore H. Smith wrote:
    > Hi Addison,
    > Thanks a lot for the answers that may help me get a clean solution.
    > I'm unfamiliar with "trie". What does it mean? If it's less complex than
    > a finite state machine I'm sure that'll be a benefit for me.
    > "Bits of Unicode" is in .ppt format. Is that "Power point"? I don't have
    > powerpoint or an app to read .ppt.
    > Thanks a lot for your kind help.
    >> Hi Mr. Smith,
    >> I wrote about "compiling" the Unicode character data tables in my
    >> response. That reply was somewhat sketchy: my three-year old son was
    >> sitting in my lap waiting for his machine to boot while I wrote it...
    >> Mark Davis wrote more-or-less the canonical presentation on this
    >> subject for an IUC conference a few years ago. The title was "Bits of
    >> Unicode". It may be elsewhere, but I've always found it on his
    >> personal page
    >> I have personally had reason to compile my own tables (NOT using a
    >> finite state language, just tries and similar structures) for purposes
    >> beyond those of ICU. But I must admit that in recent years I have
    >> tended to extend ICU or the very similar code in the Java JDK instead
    >> of implementing my own tables, but it isn't that hard to do. Getting
    >> the edge cases and esoteric details right, though, make it not worth
    >> my while (in my estimation).
    >> A finite state machine could certainly do "the job" (although what you
    >> really have is a number of similar "jobs" to do), but trie tables and
    >> similar structures are a lot easier to build and maintain and do the
    >> job marvelously well.
    >> Good luck with your implementation.
    > --
    > Theodore H. Smith - Macintosh Consultant / Contractor.
    > My website: <>

    Addison P. Phillips
    Director, Globalization Architecture
    webMethods, Inc.
    +1 408.962.5487
    Internationalization is an architecture. It is not a feature.
    Chair, W3C I18N WG Web Services Task Force

    This archive was generated by hypermail 2.1.5 : Tue May 06 2003 - 12:07:05 EDT