Re: Ternary search trees for Unicode dictionaries

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Nov 18 2003 - 05:45:25 EST

  • Next message: Philippe Verdy: "Re: Ternary search trees for Unicode dictionaries"

    From: "Theodore H. Smith" <delete@elfdata.com>
    > I've looked into the TST thing.
    >
    > I'm not sure that it is optimal, despite how popular they are!
    >
    > Look at this, if I add "1", "2","3", "4", "5", "6", "7", "8", "9" to a
    > TST, they will all be in a line, in the tree. All will be reference via
    > the "high" node.
    >
    > So, to find "9", I have to read through 9 items!

    Nothing forbids to code each n-ary node as a binary tree for faster
    searches.
    Mapping the n-ary node to a binary tree is only the first step to compress
    the data.
    Looking for some open-source RDBMS engine will reveal a lot of interesting
    storage structures to map dictionnaries like an index.

    The trie is simple to build, but other structures like B-trees will work as
    well in a even more optimal search speed. The way you encode each indexed
    string in the trie can de indicated in the data part of each node, which can
    indicate the length of a common substring from the previous indexed string
    in the tree, to reduce its size, and an indicator whever the node in each
    non-leaf B-tree page is a "ghost" substring holder or effectively maps to a
    final string.



    This archive was generated by hypermail 2.1.5 : Tue Nov 18 2003 - 07:02:53 EST