Transcribing old documents into Unicode compatible document files.

From: William Overington
Date: Sat May 03 2003 - 08:12:32 EDT

    I am currently designing and producing a font which I hope will have various
    uses, including transcribing old documents into Unicode compatible computer
    document files.

    The latest development version, Quest text 052, is now available on the web
    from the following web page. Quest text is the latest item on the web page
    at present and is near the end of the page. Compared with the previously
    available Quest text 048 font, this version adds a number of new characters
    from regular Unicode and some more Private Use Area ligatures.

    I have in mind the possibility that someone could transcribe a written or
    printed document. I am wondering what happens when someone transcribing a
    document finds a character which is neither in regular Unicode nor in any
    Private Use Area encoding which he or she may be using. Certainly, later,
    the person could sit down and think about the newly found character and
    decide whether to devise a new character or to do otherwise as he or she
    thinks fit, perhaps after discussion with other people. However, at the
    time, perhaps using a laptop computer in a library, the person has to find
    a solution promptly.

    It is certainly possible to use digit characters, but that could lead to
    confusion, particularly if numbers are used in the original document. So I
    am wondering whether it would be a good idea, or whether it has already
    been done by anyone, to design a number of constructed glyphs which have no
    established meaning yet are available as temporary characters with which to
    produce a Unicode compatible computer document file. These would initially
    be encoded in the Private Use Area, but perhaps they might be promoted to
    regular Unicode at a later date, as Private Use Meaning characters. The
    general shape of each glyph would then be formally defined and the
    character would have a formal name, yet it would have no fixed meaning: it
    would be intended for temporary use, to represent an unknown character in
    a data-recoverable manner. For example, maybe a block of sixteen such
    characters could be defined. I am thinking of the glyphs having a
    resemblance in general terms to Latin alphabet characters, yet being
    abstract as to meaning.
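
    As a rough illustration of what "data-recoverable" could mean in practice,
    here is a minimal Python sketch. It is only one possible reading of the
    idea: the base code point U+E700, the class name PlaceholderTable, the
    sample text and the later decision are all assumptions made up for the
    example, not an established assignment or an existing tool.

```python
# Hypothetical sketch: temporary placeholder characters in the Private Use Area.
# U+E700 is an assumed starting point inside the BMP Private Use Area
# (U+E000..U+F8FF); the block size of sixteen follows the suggestion in the text.

PLACEHOLDER_BASE = 0xE700
PLACEHOLDER_COUNT = 16


class PlaceholderTable:
    """Hands out placeholder characters and remembers what each one stands for."""

    def __init__(self):
        self.notes = {}  # placeholder character -> transcriber's note

    def allocate(self, note):
        """Return the next unused placeholder and record a note about the unknown glyph."""
        if len(self.notes) >= PLACEHOLDER_COUNT:
            raise RuntimeError("all placeholder characters are in use")
        ch = chr(PLACEHOLDER_BASE + len(self.notes))
        self.notes[ch] = note
        return ch

    def resolve(self, text, decisions):
        """Replace placeholders with the characters decided upon later.

        Placeholders that have no decision yet are left untouched.
        """
        return "".join(decisions.get(c, c) for c in text)


table = PlaceholderTable()

# In the library: an unknown mark is met, so a placeholder is typed instead.
unknown = table.allocate("looped abbreviation mark, folio 3 recto")
draft = "Anno domini" + unknown + " 1584"

# Later, after reflection, the transcriber settles on a character
# (U+204A TIRONIAN SIGN ET is used here purely as an example decision).
final = table.resolve(draft, {unknown: "\u204A"})
```

    Because each placeholder is a distinct code point and not a digit, the
    numbers in the original document stay unambiguous, and the draft can be
    converted mechanically once a decision has been made.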

    I have had a look at the Conscript Unicode Registry, yet, as far as I can
    find at present, nothing there seems to be of the required nature.

    Another possibility would be to include some of the regular Unicode
    geometric shapes in the font so that they could be used for the purpose as
    needed. Or perhaps the circled number characters are the answer.
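
    If circled numbers were used in this way, one practical concern would be
    that the document being transcribed might itself contain some of them. The
    following Python sketch, with a hypothetical helper named free_stand_ins
    and an invented sample line, shows one way to list only those circled
    numbers (U+2460 to U+2473) that do not already occur in the transcription.

```python
# Candidate stand-ins taken from regular Unicode: CIRCLED DIGIT ONE (U+2460)
# through CIRCLED NUMBER TWENTY (U+2473).
CIRCLED_NUMBERS = [chr(cp) for cp in range(0x2460, 0x2474)]


def free_stand_ins(transcription, candidates=CIRCLED_NUMBERS):
    """Return the candidates that do not already occur in the transcription."""
    used = set(transcription)
    return [c for c in candidates if c not in used]


# Invented sample in which CIRCLED DIGIT ONE is already part of the source text.
sample = "Item \u2460 of the ledger, 12 shillings"
available = free_stand_ins(sample)
```

    The same check would apply equally to a chosen set of geometric shapes.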

    However, I thought that I would mention the topic here in the hope of
    finding out how people who find an unknown character while transcribing
    documents into a computer system proceed at present, and how they would
    like to proceed in the future.

    William Overington

    3 May 2003

    This archive was generated by hypermail 2.1.5 : Sat May 03 2003 - 09:13:01 EDT