RE: Unihan.txt and the four dictionary sorting algorithm

From: Tom Emerson
Date: Tue Apr 20 2004 - 22:20:48 EDT


    Unihan is designed, first and foremost, to be a _data_ file for
    consumption by software. It doesn't matter at all how many spaces a
    tab is displayed as. The use of tabs makes it trivial to write scripts
    to parse the file with grep, awk, Perl, or Python.
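
    As a rough illustration of the point about tabs, here is a minimal
    Python sketch. It assumes each data line has three tab-separated
    columns (code point, field name, value) and that lines starting
    with "#" are comments. The function name parse_unihan_lines and the
    sample lines are made up for the example, not taken from the file.

```python
def parse_unihan_lines(lines):
    """Yield (codepoint, field, value) tuples from tab-separated lines.

    Assumes three tab-separated columns and '#' comment lines.
    """
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the first two tabs only, so the value keeps any
        # spaces or punctuation it contains.
        codepoint, field, value = line.split("\t", 2)
        yield codepoint, field, value


# Illustrative sample lines (hypothetical, not copied from Unihan.txt).
sample = [
    "# comment line",
    "U+4E00\tkMandarin\tyi1",
    "U+4E8C\tkDefinition\ttwo; second",
]

records = list(parse_unihan_lines(sample))
for codepoint, field, value in records:
    print(codepoint, field, value)
```

    Because the separator is a single tab character, the value column
    can contain spaces without any quoting or escaping.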

    With regard to the Pinyin orthography: tone numbers make it easier to
    split a reading into initial, final, and tone, because the tone is a
    separate trailing digit. Replacing the numbers with diacritics makes
    this more difficult, since the tone mark is then attached to a vowel
    inside the final.
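
    To show why the trailing digit helps, here is a small Python sketch
    that splits a tone-numbered reading with one regular expression.
    The function name split_pinyin is hypothetical, and the sketch makes
    simplifying assumptions: y and w are treated as initials, tones run
    from 1 to 5, and no check is made that the final is a valid one.

```python
import re

# Optional initial (two-letter initials listed first), then the final,
# then a single tone digit. Treating y/w as initials is a simplification.
PINYIN_RE = re.compile(
    r"^(zh|ch|sh|[bpmfdtnlgkhjqxrzcsyw])?([a-zü:]+)([1-5])$"
)


def split_pinyin(syllable):
    """Split a tone-numbered syllable into (initial, final, tone)."""
    match = PINYIN_RE.match(syllable.lower())
    if match is None:
        raise ValueError(f"not a tone-numbered syllable: {syllable!r}")
    initial, final, tone = match.groups()
    # A syllable with no initial yields an empty string.
    return initial or "", final, int(tone)


print(split_pinyin("zhong1"))
print(split_pinyin("an4"))
```

    With diacritics, the same split would first require mapping each
    accented vowel back to a base vowel plus a tone, which is the extra
    work the tone numbers avoid.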

    Tom Emerson                                          Basis Technology Corp.
    Software Architect                       
      "Beware the lollipop of mediocrity: lick it once and you suck forever"

    This archive was generated by hypermail 2.1.5 : Tue Apr 20 2004 - 23:02:03 EDT