Re: Transcribing old documents into Unicode compatible document files.

From: Doug Ewell
Date: Sat May 03 2003 - 12:39:27 EDT

    William Overington wrote:

    > ... I am wondering about whether it would be a good idea, or whether
    > it has already been done by anyone, to design a number of constructed
    > glyphs which are for the purpose of having no established meaning yet
    > are available as temporary characters with which to produce a Unicode
    > compatible computer document file.

    A fundamental design principle is that "Unicode encodes characters, not
    glyphs." That is, the things encoded in Unicode are characters with
    some kind of semantic value. Encoding replacement glyphs that expressly
    have "no established meaning" seems to run counter to this principle.

    This is not a completely black-and-white area. Who can tell me, for
    example, what the deep underlying semantic meaning of U+25A6 SQUARE WITH
    ORTHOGONAL CROSSHATCH FILL is? And specifically there is U+3013 GETA
    MARK, which is annotated as a "substitute for glyph not in font." How
    close is this to what William has in mind? But the general philosophy
    in Unicode is to encode characters, not glyphs. These are glyphs.
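    For anyone who wants to inspect the characters mentioned above, here is a
    minimal sketch using Python's standard unicodedata module. The helper name
    describe() and the "<no name>" fallback are illustrative assumptions, not
    anything defined by Unicode. It prints the formal name and general category
    of U+25A6 and U+3013, and for contrast does the same for U+E000, the first
    Private Use Area code point, which has no formal name.

```python
import unicodedata

def describe(ch):
    """Return (code point label, formal name, general category) for one character."""
    # unicodedata.name() raises ValueError for unnamed code points (such as
    # Private Use characters) unless a default is supplied.
    name = unicodedata.name(ch, "<no name>")
    return f"U+{ord(ch):04X}", name, unicodedata.category(ch)

# U+25A6 and U+3013 are the two examples from the message; U+E000 is a
# Private Use Area code point added here for comparison.
for ch in ("\u25A6", "\u3013", "\uE000"):
    print(describe(ch))
```

    Both encoded symbols report the category "So" (Symbol, other), while the
    Private Use code point reports "Co" and has no name at all.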

    > These would initially be encoded in the Private Use Area but perhaps
    > it might be a possibility for them to be promoted to regular Unicode
    > at a later date, as Private Use Meaning characters...

    Careful... don't use that word "promoted"... we've talked about this.

    > ... so that the general shape of each glyph is formally defined and
    > the character has a formal name, yet it has no fixed meaning and is
    > intended for use on a temporary basis to represent an unknown
    > character, yet to represent an unknown character in a data-recoverable
    > manner. For example, maybe a block of sixteen such characters could
    > be defined. I am thinking of the glyphs having a resemblance in
    > general terms to Latin alphabet characters, yet being abstract as to
    > meaning.

    Inventing new symbols that resemble Latin letters is a path taken by
    missionaries and others who have extended Latin for new orthographies,
    as well as the path taken by Sequoyah in inventing the Cherokee
    syllabary. (Sequoyah had the "clean-room" advantage, though: he had
    seen the Latin script but could not read it, so for him the letters
    truly were abstract symbols that could be extended without inherent

    Usually it's hard to avoid bringing along some semblance of meaning. A
    symbol that looks like a rotated or inverted or distorted A, or an A
    with extra or missing strokes, will look to the reader like a modified A
    and not a completely different letter.

    I'd say your best bet is to go with the circled numbers.
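    One way the circled-number approach might look in practice is sketched
    below. This is only an illustration under stated assumptions: the class
    name PlaceholderTable, its methods, and the sign labels are hypothetical,
    and the sketch uses only the twenty circled numbers U+2460..U+2473. Each
    unidentified sign in the source document gets its own circled number, the
    same sign always gets the same number, and the legend lets the text be
    recovered once a sign has been identified.

```python
CIRCLED_ONE = 0x2460   # U+2460 CIRCLED DIGIT ONE
MAX_PLACEHOLDERS = 20  # the run ends at U+2473 CIRCLED NUMBER TWENTY

class PlaceholderTable:
    """Hypothetical helper: maps unidentified signs to circled numbers."""

    def __init__(self):
        self.legend = {}  # transcriber's label for a sign -> circled number

    def placeholder(self, label):
        """Return the circled number for this sign, allocating one if needed."""
        if label not in self.legend:
            if len(self.legend) >= MAX_PLACEHOLDERS:
                raise ValueError("ran out of circled numbers in this sketch")
            self.legend[label] = chr(CIRCLED_ONE + len(self.legend))
        return self.legend[label]

    def resolve(self, text, identified):
        """Replace placeholders with real characters once signs are identified."""
        for label, real_char in identified.items():
            text = text.replace(self.legend[label], real_char)
        return text

table = PlaceholderTable()
# The labels are the transcriber's own working descriptions (assumed here).
draft = "wi" + table.placeholder("hooked stroke") + " " + table.placeholder("barred loop") + "at"
print(draft)
# Later, suppose both signs turn out to be forms of thorn:
print(table.resolve(draft, {"hooked stroke": "\u00FE", "barred loop": "\u00FE"}))
```

    Because the legend is kept alongside the file, the substitution stays
    data-recoverable without any new characters having to be encoded.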

    -Doug Ewell
     Fullerton, California
