From: Mark Davis (email@example.com)
Date: Tue May 06 2003 - 12:13:14 EDT
A minor correction: a trie may be more than two stages.
We also have an enhanced version (developed by Markus Scherer) called
a folded trie, that we use in ICU. See
These data structures (and others that we use in ICU) are flattened;
that is, they are stored as a continuous block of memory, with offsets
instead of pointers. That way, they can all be used in read-only
memory-mapped files for fast initialization.
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
fax: (408) 256-0799
----- Original Message -----
From: "Addison Phillips [wM]" <firstname.lastname@example.org>
To: "Theodore H. Smith" <email@example.com>
Sent: Tuesday, May 06, 2003 08:00
Subject: Re: Finite state machines? UTF8: toFold(), normalisation, etc
> A "trie" is a two stage table.
> PPT is PowerPoint. I was under the impression that the presentation
> available in PDF or HTML format also, but I guess it isn't if you
> see it in that format.
> Here's a hint, though: if you search "Bits of Unicode" on Google,
> can view much of the paper in HTML format using the option Google
> provides there.
> Best Regards,
> Theodore H. Smith wrote:
> > Hi Addison,
> > Thanks a lot for the answers that may help me get a clean
> > I'm unfamiliar with "trie". What does it mean? If it's less
> > a finite state machine I'm sure that'll be a benefit for me.
> > "Bits of Unicode" is in .ppt format. Is that "Power point"? I
> > powerpoint or an app to read .ppt.
> > Thanks a lot for your kind help.
> >> Hi Mr. Smith,
> >> I wrote about "compiling" the Unicode character data tables in
> >> response. That reply was somewhat sketchy: my three-year old son
> >> sitting in my lap waiting for his machine to boot while I wrote
> >> Mark Davis wrote more-or-less the canonical presentation on this
> >> subject for an IUC conference a few years ago. The title was
> >> Unicode". It may be elsewhere, but I've always found it on his
> >> personal page http://www.macchiato.com
> >> I have personally had reason to compile my own tables (NOT using
> >> finite state language, just tries and similar structures) for
> >> beyond those of ICU. But I must admit that in recent years I
> >> tended to extend ICU or the very similar code in the Java JDK
> >> of implementing my own tables, but it isn't that hard to do.
> >> the edge cases and esoteric details right, though, make it not
> >> my while (in my estimation).
> >> A finite state machine could certainly do "the job" (although
> >> really have is a number of similar "jobs" to do), but trie
> >> similar structures are a lot easier to build and maintain and do
> >> job marvelously well.
> >> Good luck with your implementation.
> > --
> > Theodore H. Smith - Macintosh Consultant / Contractor.
> > My website: <www.elfdata.com/>
> Addison P. Phillips
> Director, Globalization Architecture
> webMethods, Inc.
> +1 408.962.5487 mailto:firstname.lastname@example.org
> Internationalization is an architecture. It is not a feature.
> Chair, W3C I18N WG Web Services Task Force
This archive was generated by hypermail 2.1.5 : Tue May 06 2003 - 13:14:59 EDT