Re: Unicode for words?

From: Richard Cook
Date: Tue Dec 07 2004 - 04:39:17 CST


    On Dec 5, 2004, at 07:02 PM, Doug Ewell wrote:

    > A word-based encoding for English could automatically assume spaces
    > where they are appropriate. The sentence:
    > "What means this, my lord?"
    > would have seven encodable elements: the five words, the comma, and the
    > question mark. Spaces would be automatically filled in as needed, not
    > explicitly encoded. This implies "standard" English punctuation and
    > spacing conventions, however that is defined. For French conventions,
    > there would probably be a space before the question mark as well.
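To make the quoted proposal concrete, here is a minimal sketch of how a decoder might fill in spaces that are never encoded. The function name `render`, the punctuation sets, and the `french` flag are assumptions made up for illustration; the proposal itself does not define any of them.

```python
# Hypothetical sketch: render word/punctuation elements as text, inserting
# spaces by convention instead of encoding them explicitly.

CLOSING_PUNCT = {",", ".", "?", "!", ";", ":"}   # assumed set, not from the post
FRENCH_SPACED = {"?", "!", ";", ":"}             # assumed French spacing rule

def render(elements, french=False):
    out = []
    for i, el in enumerate(elements):
        if i > 0:
            if el not in CLOSING_PUNCT:
                out.append(" ")        # ordinary word: space before it
            elif french and el in FRENCH_SPACED:
                out.append(" ")        # French convention: space before "?"
        out.append(el)
    return "".join(out)

# Seven encodable elements: five words, the comma, and the question mark.
elements = ["What", "means", "this", ",", "my", "lord", "?"]
print(render(elements))
print(render(elements, french=True))
```

The sketch ignores quotation marks, hyphens, and anything else that makes "standard" spacing hard to pin down, which is the part of the proposal left undefined.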

    Well, why stop with words, my lord? Why not just encode all sentences,
    paragraphs, pages, chapters, books, libraries, or your higher level
    unit of choice, for that matter?

    For example, in my library, the single code point U+100000 happens to
    contain hi-res color images of all pages of an edition of Moby Dick
    that I happen to like very much.

    Or consider an image-based encoding, which joins standard text to
    image. Images of the text to be encoded are indexed using some private
    indexing scheme, and the index elements are then mapped to a standard
    encoding. The relatively lo-res standard encoding (which must
    necessarily collapse some distinctions that are less generally
    important) is augmented with hi-res indexing of images of the specific
    text to be digitized.
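One way to picture such an image-based scheme is a table of index entries, each tying an image to a standard code point. This is only a rough sketch: the `GlyphImage` record, the index keys, and the image paths are all hypothetical, invented here to show the shape of the mapping.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GlyphImage:
    index_id: str     # key in the private indexing scheme (hypothetical format)
    image_ref: str    # location of the hi-res image (hypothetical path)
    standard_cp: int  # code point in the lo-res standard encoding

# Two distinct written forms that the standard encoding collapses to U+0061,
# followed by one form mapped to U+0062.
INDEX = [
    GlyphImage("ms1/f3r/012", "scans/ms1/f3r/012.png", 0x0061),
    GlyphImage("ms1/f3r/047", "scans/ms1/f3r/047.png", 0x0061),
    GlyphImage("ms1/f3r/048", "scans/ms1/f3r/048.png", 0x0062),
]

def to_standard_text(entries):
    # Lo-res view: only the standard code points survive.
    return "".join(chr(e.standard_cp) for e in entries)

def images_for(entries, cp):
    # Hi-res view: every indexed image that maps to a given code point.
    return [e.image_ref for e in entries if e.standard_cp == cp]

print(to_standard_text(INDEX))
print(images_for(INDEX, 0x0061))
```

The standard text stays interchangeable, while the distinctions it collapses remain recoverable through the private index.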

    Whether you choose to associate a single glyph with your private-use
    code point, or an entire book, why, that's up to you (and your
