RE: Normalization Implementation Tricks

From: Richard Ishida (
Date: Thu Feb 12 2009 - 09:12:09 CST

    You can look at my source code for normalization in PHP or JavaScript at , if that's any help.


    Richard Ishida
    Internationalization Lead
    W3C (World Wide Web Consortium)

    > -----Original Message-----
    > From: []
    > On Behalf Of Michael D. Adams
    > Sent: 12 February 2009 02:33
    > To:
    > Subject: Normalization Implementation Tricks
    > How do people efficiently implement the (re-)composition table used by
    > the normalization algorithm for NFC and NFKC? (I am writing a
    > library for a project.)
    > The most naive implementation would be a table indexed by a starter
    > character and a combining character. Of course that is completely
    > unreasonable as it would require 0x110000 * 0x110000 entries (a few
    > terabytes).
    > If I understand correctly, the ICU library uses shared tries (as the Unicode
    > spec suggests) indexed by the starter character that point to lists of
    > combining character and result pairs (an association list in
    > Lisp/Scheme terminology). This should reduce the size requirements,
    > but now there is a list we have to scan through, which could increase
    > run-time access cost.
    > Are there any other implementation methods that have a small memory
    > footprint (~10-20 KB) and quick access (~10-20 instructions)? Any
    > guidance in this regard would be appreciated.
    > Michael D. Adams
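
For scale, the naive-table estimate in the question works out as follows, assuming 4-byte entries (enough to hold any code point):

```latex
\[
\texttt{0x110000}^2 = \left(2^{20} + 2^{16}\right)^2 = 2^{40} + 2^{37} + 2^{32} = 1\,241\,245\,548\,544 \text{ entries}
\]
\[
1\,241\,245\,548\,544 \times 4\ \text{bytes} \approx 4.96 \times 10^{12}\ \text{bytes} \approx 5\ \text{TB}
\]
```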
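
The per-starter association-list layout described in the question can be sketched as below. This is an illustration, not ICU's code: a Python dict stands in for the shared trie, and the helper names (canonical_pairs, build_assoc_table, compose) are hypothetical. Composition exclusions are detected with an assumed trick: a character with a two-code-point canonical decomposition is kept only if NFC maps it to itself. Hangul syllables, which implementations normally compose arithmetically, are not covered.

```python
import sys
import unicodedata
from collections import defaultdict


def canonical_pairs():
    """Yield (starter, combiner, composite) for every primary composite."""
    for cp in range(sys.maxunicode + 1):
        decomp = unicodedata.decomposition(chr(cp))
        if not decomp or decomp.startswith("<"):
            continue  # no mapping, or a compatibility (<tag>) mapping
        parts = [int(h, 16) for h in decomp.split()]
        if len(parts) != 2:
            continue  # singleton decompositions never recompose
        if unicodedata.normalize("NFC", chr(cp)) != chr(cp):
            continue  # composition exclusion: NFC never produces cp
        yield parts[0], parts[1], cp


def build_assoc_table():
    """Map each starter to a sorted list of (combiner, composite) pairs."""
    table = defaultdict(list)
    for starter, combiner, composite in canonical_pairs():
        table[starter].append((combiner, composite))
    for pairs in table.values():
        pairs.sort()  # sorted lists allow an early exit (or binary search)
    return dict(table)


ASSOC = build_assoc_table()


def compose(starter, combiner):
    """Return the composite code point for starter+combiner, or None."""
    for second, composite in ASSOC.get(starter, ()):
        if second == combiner:
            return composite
        if second > combiner:
            break  # list is sorted, so no later entry can match
    return None
```

The scan cost the question worries about is bounded by the longest list, which can be checked with max(len(v) for v in ASSOC.values()) for whatever Unicode version the local Python ships.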
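
One possible alternative aimed at the small-footprint, few-instruction target is sketched below; it is not what ICU does, just a common technique. Both code points fit in 21 bits, so the pair packs into one integer key, (starter << 21) | combiner, which is looked up in a flat open-addressed hash table using Fibonacci hashing and linear probing. The class and helper names are hypothetical, and the exclusion check via an NFC round trip is the same assumption as in a plain pair table. In a C port a lookup is roughly a multiply, a shift, and typically one or two probes.

```python
import sys
import unicodedata
from array import array


def canonical_pairs():
    """Yield (starter, combiner, composite) for every primary composite."""
    for cp in range(sys.maxunicode + 1):
        decomp = unicodedata.decomposition(chr(cp))
        if not decomp or decomp.startswith("<"):
            continue  # no mapping, or a compatibility (<tag>) mapping
        parts = [int(h, 16) for h in decomp.split()]
        if len(parts) == 2 and unicodedata.normalize("NFC", chr(cp)) == chr(cp):
            yield parts[0], parts[1], cp  # non-excluded two-part mapping


class PairTable:
    """Open-addressing hash table keyed by (starter << 21) | combiner."""

    MULT = 0x9E3779B97F4A7C15  # 64-bit Fibonacci-hashing multiplier
    MASK64 = (1 << 64) - 1

    def __init__(self, triples):
        triples = list(triples)
        # Power-of-two slot count with load factor <= 0.5 keeps probes short.
        self.bits = max(1, (2 * len(triples) - 1).bit_length())
        size = 1 << self.bits
        self.mask = size - 1
        self.keys = array("Q", [0]) * size  # key 0 marks an empty slot
        self.values = array("I", [0]) * size
        for starter, combiner, composite in triples:
            key = (starter << 21) | combiner
            i = self._slot(key)
            while self.keys[i] != 0:  # linear probing to a free slot
                i = (i + 1) & self.mask
            self.keys[i] = key
            self.values[i] = composite

    def _slot(self, key):
        # Keep the top `bits` bits of the 64-bit product as the slot index.
        return ((key * self.MULT) & self.MASK64) >> (64 - self.bits)

    def lookup(self, starter, combiner):
        """Return the composite for starter+combiner, or None."""
        key = (starter << 21) | combiner
        i = self._slot(key)
        while True:
            k = self.keys[i]
            if k == 0:
                return None  # reached an empty slot: pair not present
            if k == key:
                return self.values[i]
            i = (i + 1) & self.mask

    def nbytes(self):
        """Bytes used by the key and value arrays."""
        return (len(self.keys) * self.keys.itemsize
                + len(self.values) * self.values.itemsize)


TABLE = PairTable(canonical_pairs())
```

TABLE.nbytes() reports the actual footprint for the local Unicode version; a C version could shrink it further by storing narrower keys, for example by indexing combiners through a small side table.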

    This archive was generated by hypermail 2.1.5 : Thu Feb 12 2009 - 09:16:56 CST