Re: Unicode for words?

From: Richard Cook (rscook@socrates.berkeley.edu)
Date: Tue Dec 07 2004 - 04:39:17 CST

Next message: Philippe Verdy: "Re: Nicest UTF"

Previous message: Jony Rosenne: "RE: No Invisible Character - NBSP at the start of a word"
In reply to: Doug Ewell: "Re: Unicode for words?"
Next in thread: Doug Ewell: "Re: Unicode for words?"
Reply: Doug Ewell: "Re: Unicode for words?"

On Dec 5, 2004, at 07:02 PM, Doug Ewell wrote:

> A word-based encoding for English could automatically assume spaces
> where they are appropriate. The sentence:
>
> "What means this, my lord?"
>
> would have seven encodable elements: the five words, the comma, and the
> question mark. Spaces would be automatically filled in as needed, not
> explicitly encoded. This implies "standard" English punctuation and
> spacing conventions, however that is defined. For French conventions,
> there would probably be a space before the question mark as well.

Well, why stop with words, my lord? Why not just encode all sentences,
paragraphs, pages, chapters, books, libraries, or your higher-level
unit of choice, for that matter?

For example, in my library, the single private-use code point U+100000
contains hi-res color images of every page of an edition of Moby Dick
that I happen to like very much.

Or consider an image-based encoding, which joins standard text to
images. Images of the text to be encoded are indexed using some private
indexing scheme, and the index elements are then mapped to a standard
encoding. The relatively lo-res standard encoding (which must
necessarily collapse some distinctions that are less generally
important) is augmented with hi-res indexing of images of the specific
text to be digitized.
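
One way to picture such a pairing is a plain string carrying a side
index from character offsets to image regions. The sketch below is an
assumption-laden illustration, not a description of any existing
system: ImageRef, AnnotatedText, image_for, and the sample pixel box
are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ImageRef:
    """Hypothetical pointer into a private image index."""
    page: int    # page number of the scanned source
    box: tuple   # (left, top, right, bottom) in pixels, made-up values


@dataclass
class AnnotatedText:
    """Standard-encoded text plus a hi-res image index keyed by offset."""
    text: str
    index: dict = field(default_factory=dict)  # offset -> ImageRef

    def image_for(self, offset):
        # Returns None where the private index has no entry for this offset.
        return self.index.get(offset)


doc = AnnotatedText(
    text="Call me Ishmael.",
    index={0: ImageRef(page=1, box=(120, 340, 168, 402))},
)
first_glyph_image = doc.image_for(0)
```

The text stays interchangeable as ordinary Unicode, while the
distinctions the standard encoding collapses remain recoverable through
the private index.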

Whether you choose to associate a single glyph with your private-use
code point, or an entire book, why, that's up to you (and your
software).
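
As a rough sketch of what "up to your software" means in practice: the
three private-use ranges below are the ones the Unicode Standard
defines, but the registry, its entries, the file path, and the function
names are hypothetical and exist only for illustration.

```python
# Private-use ranges defined by the Unicode Standard.
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # Private Use Area (BMP)
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B
]


def is_private_use(code_point):
    """True if the code point lies in one of the private-use ranges."""
    return any(lo <= code_point <= hi for lo, hi in PRIVATE_USE_RANGES)


# Hypothetical private agreement: one code point names a single glyph,
# another names an entire book. Nothing outside this software knows either.
registry = {
    0xE000: {"kind": "glyph", "name": "my-ligature"},
    0x100000: {"kind": "book", "title": "Moby Dick",
               "pages": "images/moby-dick/"},  # made-up path
}


def resolve(char):
    """Look up the private meaning of a private-use character, if any."""
    code_point = ord(char)
    if not is_private_use(code_point):
        raise ValueError(f"U+{code_point:04X} is not a private-use code point")
    return registry.get(code_point)


book = resolve(chr(0x100000))
```

Any other software receiving U+100000 sees only an uninterpreted
private-use character; the book exists solely in the registry.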


This archive was generated by hypermail 2.1.5 : Tue Dec 07 2004 - 04:40:58 CST