Re: Ternary search trees for Unicode dictionaries

From: Mark Davis (mark.davis@jtcsv.com)
Date: Tue Nov 18 2003 - 00:39:14 EST

  • Next message: Jungshik Shin: "Re: How can I input any Unicode character if I know its hexadecimal code?"

    We tend to use tries, which have very good performance characteristics. See
    "bits of unicode" on my site: www.macchiato.com.

    Mark
    __________________________________
    http://www.macchiato.com
    ► शिष्यादिच्छेत्पराजयम् ◄

    ----- Original Message -----
    From: "Theodore H. Smith" <delete@elfdata.com>
    To: <unicode@unicode.org>
    Sent: Mon, 2003 Nov 17 18:21
    Subject: Re: Ternary search trees for Unicode dictionaries

    > I've looked into the TST thing.
    >
    > I'm not sure that it is optimal, despite how popular they are!
    >
    > Look at this, if I add "1", "2","3", "4", "5", "6", "7", "8", "9" to a
    > TST, they will all be in a line, in the tree. All will be reference via
    > the "high" node.
    >
    > So, to find "9", I have to read through 9 items!
    >
    > Now, I'm not sure if this is a bad thing. It's just that compared to a
    > binary search on an array, in this example, a TST does more work.
    >
    > When reading in data, that has already been sorted, I think this could
    > be a problem. It's fine for randomly ordered data, but with sorted
    > data... I can't tell how much of a problem it could be.
    >
    > This is the kind of structure that punishes neat people. An unexpected
    > payback for the effort of being neat and ordered.
    >
    > I'm quite new to this, and so I don't know if there is a good solution
    > to making the tree's balanced without imposing huge overhead, or if in
    > practice, this will be a problem.
    >
    >
    >



    This archive was generated by hypermail 2.1.5 : Tue Nov 18 2003 - 01:27:53 EST