Re: Ternary search trees for Unicode dictionaries

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Nov 18 2003 - 05:45:25 EST

Next message: Philippe Verdy: "Re: Ternary search trees for Unicode dictionaries"

Previous message: Jungshik Shin: "Re: How can I input any Unicode character if I know its hexadecimal code?"
In reply to: Theodore H. Smith: "Re: Ternary search trees for Unicode dictionaries"
Next in thread: Markus Scherer: "Re: Ternary search trees for Unicode dictionaries"
Reply: Markus Scherer: "Re: Ternary search trees for Unicode dictionaries"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Mail actions: [ respond to this message ] [ mail a new topic ]

From: "Theodore H. Smith" <delete@elfdata.com>
> I've looked into the TST thing.
>
> I'm not sure that it is optimal, despite how popular they are!
>
> Look at this, if I add "1", "2","3", "4", "5", "6", "7", "8", "9" to a
> TST, they will all be in a line, in the tree. All will be reference via
> the "high" node.
>
> So, to find "9", I have to read through 9 items!

Nothing forbids to code each n-ary node as a binary tree for faster
searches.
Mapping the n-ary node to a binary tree is only the first step to compress
the data.
Looking for some open-source RDBMS engine will reveal a lot of interesting
storage structures to map dictionnaries like an index.

The trie is simple to build, but other structures like B-trees will work as
well in a even more optimal search speed. The way you encode each indexed
string in the trie can de indicated in the data part of each node, which can
indicate the length of a common substring from the previous indexed string
in the tree, to reduce its size, and an indicator whever the node in each
non-leaf B-tree page is a "ghost" substring holder or effectively maps to a
final string.

Next message: Philippe Verdy: "Re: Ternary search trees for Unicode dictionaries"
Previous message: Jungshik Shin: "Re: How can I input any Unicode character if I know its hexadecimal code?"
In reply to: Theodore H. Smith: "Re: Ternary search trees for Unicode dictionaries"
Next in thread: Markus Scherer: "Re: Ternary search trees for Unicode dictionaries"
Reply: Markus Scherer: "Re: Ternary search trees for Unicode dictionaries"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Mail actions: [ respond to this message ] [ mail a new topic ]

This archive was generated by hypermail 2.1.5 : Tue Nov 18 2003 - 07:02:53 EST