From: Richard Cook (rscook@socrates.berkeley.edu)
Date: Tue Dec 07 2004 - 04:39:17 CST
On Dec 5, 2004, at 07:02 PM, Doug Ewell wrote:
> A word-based encoding for English could automatically assume spaces
> where they are appropriate. The sentence:
>
> "What means this, my lord?"
>
> would have seven encodable elements: the five words, the comma, and the
> question mark. Spaces would be automatically filled in as needed, not
> explicitly encoded. This implies "standard" English punctuation and
> spacing conventions, however that is defined. For French conventions,
> there would probably be a space before the question mark as well.
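
To make the quoted proposal concrete, here is a minimal sketch of how such implicit spacing might be rendered. Everything in it is an assumption for illustration, not something Doug specified: the `render` function, the `NO_SPACE_BEFORE` table, and the particular punctuation sets chosen for the "en" and "fr" conventions.

```python
# Hypothetical sketch: the names and the spacing rules are illustrative
# assumptions, not part of any real encoding.

# Elements that attach to the preceding element without a space.
NO_SPACE_BEFORE = {
    "en": {",", ".", "?", "!", ";", ":"},
    "fr": {",", "."},  # assumed: French puts a space before ? ! ; :
}

def render(elements, convention="en"):
    """Join encoded elements, filling in spaces that were never encoded."""
    out = []
    for i, element in enumerate(elements):
        # A space is implied before every element except the first one
        # and except punctuation that hugs the preceding word.
        if i > 0 and element not in NO_SPACE_BEFORE[convention]:
            out.append(" ")
        out.append(element)
    return "".join(out)

# The seven encodable elements: five words, the comma, the question mark.
elements = ["What", "means", "this", ",", "my", "lord", "?"]

print(render(elements))        # What means this, my lord?
print(render(elements, "fr"))  # What means this, my lord ?
```

The same seven elements yield different character sequences under the two conventions, which is exactly the dependence on "standard" spacing rules that the quoted text concedes.
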
Well, why stop with words, my lord? Why not just encode all sentences,
paragraphs, pages, chapters, books, libraries, or your higher-level
unit of choice, for that matter?
For example, in my library, the single code point U+100000 happens to
contain hi-res color images of all pages of an edition of Moby Dick
that I happen to like very much.
Or consider an image-based encoding, which joins standard text to
image. Images of the text to be encoded are indexed using some private
indexing scheme, and the index elements are then mapped to a standard
encoding. The relatively lo-res standard encoding (which must
necessarily collapse some distinctions that are less generally
important) is augmented with hi-res indexing of images of the specific
text to be digitized.
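
A rough sketch of that arrangement, under stated assumptions: the index keys ("img-0001" and so on), the `ImageRegion` type, and the two mapping tables are all hypothetical, invented only to show the shape of the idea.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ImageRegion:
    """Hypothetical pointer into a hi-res page image."""
    page: int
    box: tuple  # (left, top, right, bottom), assumed to be in pixels

# Private indexing scheme: one key per image of a piece of the text.
private_index = {
    "img-0001": ImageRegion(page=1, box=(10, 10, 42, 50)),
    "img-0002": ImageRegion(page=1, box=(50, 10, 82, 50)),
}

# Index elements mapped to the standard encoding. Two visually distinct
# images may collapse onto the same standard character.
to_standard = {
    "img-0001": "a",
    "img-0002": "a",
}

def standard_text(keys):
    """The lo-res view: standard characters only."""
    return "".join(to_standard[key] for key in keys)

def augmented_text(keys):
    """The hi-res view: each character paired with its source image."""
    return [(to_standard[key], private_index[key]) for key in keys]

keys = ["img-0001", "img-0002"]
print(standard_text(keys))
for char, region in augmented_text(keys):
    print(char, region)
```

In the standard view the two elements are indistinguishable; the augmented view keeps the distinction by carrying the image reference alongside each character.
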
Whether you choose to associate a single glyph with your private-use
code point, or an entire book, why, that's up to you (and your
software).
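
As a small illustration, assuming nothing more than a private lookup table agreed between you and your software: the `assignments` table and the `resolve` function below are hypothetical, while the general category "Co" is what the Unicode Character Database reports for private-use code points such as U+E000 and U+100000.

```python
import unicodedata

def is_private_use(ch):
    """True if the character's general category is Co (private use)."""
    return unicodedata.category(ch) == "Co"

# Hypothetical private agreement: what each code point stands for here.
assignments = {
    "\uE000": "a single glyph",
    "\U00100000": "page images of an edition of Moby Dick",
}

def resolve(ch):
    """Look up the private meaning of a private-use code point."""
    if not is_private_use(ch):
        raise ValueError(f"U+{ord(ch):04X} is not a private-use code point")
    return assignments.get(ch, "<no private assignment>")

print(resolve("\uE000"))
print(resolve("\U00100000"))
```

Nothing in the standard constrains the right-hand side of that table; the meaning exists only for parties who share it.
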