RE: Unihan.txt and the four dictionary sorting algorithm

From: Mike Ayers (
Date: Tue Apr 20 2004 - 22:01:22 EDT


    > From: [] On Behalf Of John Jenkins
    > Sent: Tuesday, April 20, 2004 6:40 PM

    > > The tab "character" is used in the file. Arguably, this "character"
    > > should never appear in a plain text file, rather it should be converted
    > > to an appropriate number of U+0020 characters by the application on save.
    > > Of course, this would make the file even bigger.
    > >
    > Tab-separated data files are quite common. (Indeed, I tend to get
    > annoyed with the main UCD file because it's semicolon-separated.) I'm
    > not sure why you'd want a tab never to appear in a plain-text file.

    Different systems (and different applications, too) have different
    interpretations of where tab boundaries occur. The most common
    interpretations are modulo-8 and modulo-5, but I've seen modulo-4 as well.
    Viewing tabs on a system with a different interpretation of tab widths can
    be painful, which is why James proposed they not be used (also note his
    "arguably"). I code with modulo-3 tabs, and must convert to and from
    pure-space text to archive my code, which comes with its own set of
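
To make the tab-width point concrete, here is a minimal sketch in Python of how the same tab-separated line expands under different modulo-N interpretations. The function name `expand_tabs` and the sample line are illustrative assumptions, not taken from Unihan.txt tooling; the logic simply pads each tab out to the next multiple of the chosen width, which is also what Python's built-in `str.expandtabs` does.

```python
def expand_tabs(line: str, tab_width: int) -> str:
    """Replace each tab with spaces up to the next multiple of tab_width."""
    out = []
    col = 0  # current display column, counting one column per character
    for ch in line:
        if ch == "\t":
            # Pad to the next tab stop (always at least one space).
            pad = tab_width - (col % tab_width)
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)


# Hypothetical tab-separated line in the style of a data file.
sample = "U+4E00\tkTotalStrokes\t1"

for width in (3, 4, 5, 8):
    # Same bytes on disk, different visual layout per tab-width convention.
    print(width, repr(expand_tabs(sample, width)))
```

The fields land in different columns for each width, which is the viewing problem described above. Note that the expansion is lossy for a data file: once the tabs are gone, the field separators can no longer be told apart from ordinary spaces.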


    This archive was generated by hypermail 2.1.5 : Tue Apr 20 2004 - 22:54:26 EDT