Re: ISO 8859-11 (Thai) cross-mapping table

From: John Cowan
Date: Tue Oct 08 2002 - 08:44:59 EDT

    Elliotte Rusty Harold scripsit:

    > The Verifier class has a similar issue, though there it's a case of
    > determining whether or not any given character is a legal XML
    > character/name character/name-start character/ etc. This is done with
    > a trick introduced in JDOM where the code looks like this:

    The underlying data structure here is called a "range table", and is
    a list of ranges in codepoint order, expressed thus:

            start of first range
            end of first range + 1
            start of second range
            end of second range + 1

    etc. etc. What you are doing is equivalent to a linear search over
    this table followed by loop unrolling. However, you can do better,
    especially in complex cases, with a *binary* search over the table
    followed by loop unrolling. The trick here is that if the binary
    search (for the last entry not exceeding the codepoint) returns an
    even index, it succeeds; an odd index, or no such entry, fails.
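
    A minimal sketch of the even/odd trick, assuming the table layout
    described above (start, end + 1, start, end + 1, ...). The class
    name RangeTable, the LETTERS table, and both method names are
    hypothetical, chosen only for illustration; this is not the JDOM
    code, and the loop is left un-unrolled for readability.

```java
final class RangeTable {
    // Hypothetical table: ASCII letters only, stored as
    // (start, end + 1) pairs in codepoint order.
    static final int[] LETTERS = {
        'A', 'Z' + 1,
        'a', 'z' + 1,
    };

    /** Index of the last entry <= cp, or -1 if cp is below the first entry. */
    static int lastEntryAtOrBelow(int[] table, int cp) {
        int lo = 0;
        int hi = table.length;
        // Invariant: entries before lo are <= cp, entries from hi on are > cp.
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (table[mid] <= cp) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo - 1;
    }

    /** True if cp falls inside one of the ranges in the table. */
    static boolean contains(int[] table, int cp) {
        int i = lastEntryAtOrBelow(table, cp);
        // Even index: the nearest entry at or below cp is a range start,
        // so cp is inside that range. Odd index: it is an (end + 1) marker,
        // so cp lies in a gap. -1 means cp precedes every range.
        return i >= 0 && (i & 1) == 0;
    }
}
```

    For example, RangeTable.contains(RangeTable.LETTERS, 'q') lands on
    index 2 (the start of the second range) and succeeds, while '['
    lands on index 1 (one past 'Z') and fails.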

    But I agree that testing ASCII first is wise. Mozilla bypasses
    its general algorithm in the ASCII/Latin-1 case, providing a fast
    256-element lookup table and then an extremely compact (but somewhat
    slower than the above) data structure for the rest of Unicode.
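
    The fast-path idea can be sketched as follows. This is only an
    illustration under stated assumptions, not Mozilla's code: the
    class name FastPathLookup, the RANGES table, and the method names
    are hypothetical, and the slower general path is stood in for by a
    plain binary search over a range table rather than Mozilla's
    compact structure.

```java
final class FastPathLookup {
    // Hypothetical character set, as (start, end + 1) pairs in codepoint
    // order: A-Z, a-z, and U+0391..U+03A9.
    static final int[] RANGES = {
        'A', 'Z' + 1,
        'a', 'z' + 1,
        0x0391, 0x03A9 + 1,
    };

    // 256-element table covering ASCII/Latin-1, filled once at class load.
    static final boolean[] LATIN1 = new boolean[256];
    static {
        for (int cp = 0; cp < LATIN1.length; cp++) {
            LATIN1[cp] = slowContains(cp);
        }
    }

    /** General path: binary search for the last table entry <= cp. */
    static boolean slowContains(int cp) {
        int lo = 0;
        int hi = RANGES.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (RANGES[mid] <= cp) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        int i = lo - 1;
        // Even index means cp is inside a range; odd or -1 means it is not.
        return i >= 0 && (i & 1) == 0;
    }

    /** Tests the 256-element table first, then falls back to the search. */
    static boolean contains(int cp) {
        if (cp >= 0 && cp < LATIN1.length) {
            return LATIN1[cp];
        }
        return slowContains(cp);
    }
}
```

    The array lookup costs one bounds check and one load for the common
    case, and only codepoints at U+0100 and above pay for the search.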

    John Cowan                      
    At times of peril or dubitation,
    Perform swift circular ambulation,
    With loud and high-pitched ululation.
