Re: Writing a proposal for an unusual script: SignWriting

From: Asmus Freytag
Date: Sun Jun 13 2010 - 22:30:17 CDT


    When you design your encoding proposal, please bear in mind that even
    for a language as well supported as English, it is generally not
    possible to fully represent the semantic content (let alone the
    appearance) with plain text alone.

    Yes, 99% or so of the semantic content can be encoded in plain text
    alone, but some texts in English require italics for disambiguation
    (removing the emphasis leaves more than one way to read the text).

    If you move one level up, to HTML, say, you can capture all these
    documents, but also many others where styled text has a weaker semantic
    role (headings, generic emphasis, etc.).

    With CSS as a next level up you can express the author's choice of
    appearance of the text for 99% of all documents, and the continuum
    doesn't end there.
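
    A minimal Python sketch of the point about emphasis, assuming two
    made-up English sentences that differ only in which word is
    emphasized. The helper names (PlainTextExtractor, to_plain_text) are
    illustrative only; the parser is the standard library's
    html.parser.HTMLParser. Once the markup layer is dropped, the two
    readings collapse into the same plain text.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collects character data only and discards all markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are ignored.
        self.chunks.append(data)


def to_plain_text(markup: str) -> str:
    """Return the plain text carried by an HTML fragment."""
    parser = PlainTextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


# Two hypothetical readings, distinguished only by emphasis.
reading_a = "I did not say <em>she</em> took it."
reading_b = "I did not say she <em>took</em> it."

plain_a = to_plain_text(reading_a)
plain_b = to_plain_text(reading_b)

# The HTML level keeps the readings apart; the plain text level does not.
print(reading_a == reading_b)
print(plain_a == plain_b)
```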

    Your discussion of sign writing needs to encompass a role for higher
    level protocols like HTML or CSS (or their effective equivalents for
    other types of writing, such as MathML). Not everything needs to be
    carried at the plain text level, but everything needs to be expressible
    at a suitable level.

    Sometimes, a higher level protocol requires the availability of
    "building blocks" that may not make sense in the context of a plain text
    "stream", but that together with the higher level protocol allow an
    efficient representation of the notation. The mathematical
    alphanumerics and the musical notation elements are encoded in Unicode
    for such use with higher level protocols.
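
    For illustration, two such building blocks can be inspected with
    Python's standard unicodedata module. The two code points are picked
    only as examples: U+1D400 from the Mathematical Alphanumeric Symbols
    block and U+1D11E from the Musical Symbols block. Compatibility
    normalization (NFKC) folds the styled letter back to an ordinary "A",
    which shows that the styling distinction is one a plain text process
    can choose to discard.

```python
import unicodedata

bold_a = "\U0001D400"  # Mathematical Alphanumeric Symbols block
g_clef = "\U0001D11E"  # Musical Symbols block

name_a = unicodedata.name(bold_a)
name_clef = unicodedata.name(g_clef)

# NFKC applies compatibility decompositions, dropping the font styling.
folded = unicodedata.normalize("NFKC", bold_a)

print(name_a)
print(name_clef)
print(folded)
```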

    From the way you describe the requirements (faithfully representing the
    minutest details of the author's choice of placements, etc.) and your
    claim that the plain text level should not / does not encode semantic
    content, I get the impression that you have not fully thought through
    what information should be represented at what level of the text.

    The name "Character Glyph Model" hides the complexity and the layers of
    real-world text and data architectures into which Unicode must fit.

    If plain text cannot hope to encode at least a basic representation of
    a notation (as in music, or for all but the most trivial mathematical
    notation), then the precedent has been to try to abstract the semantic
    content so that it is available for data processing (searching,
    sorting, etc.) in the plain text layer, while the description of the
    visible text in these cases requires the use of a higher level protocol
    where notions of placement etc. can be expressed succinctly.
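
    One way to picture that division of labor is the following Python
    sketch. Everything in it is hypothetical: the LayeredText class, its
    field names, and the placeholder strings stand in for whatever an
    actual encoding and markup would define, and none of it is taken from
    the SignWriting proposal. The only point is that searching and sorting
    consult the plain text layer, while placement travels separately.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LayeredText:
    # Plain text layer: the abstracted content, one character per symbol.
    symbols: str
    # Higher level layer (hypothetical): symbol index -> (x, y) placement.
    placement: Dict[int, Tuple[int, int]] = field(default_factory=dict)

    def contains(self, query: str) -> bool:
        """Search looks only at the plain text layer."""
        return query in self.symbols


def sort_by_content(items: List[LayeredText]) -> List[LayeredText]:
    """Sorting likewise ignores placement."""
    return sorted(items, key=lambda item: item.symbols)


# Same content laid out two different ways (placeholder data).
first = LayeredText("ABC", {0: (0, 0), 1: (10, 0), 2: (20, 5)})
second = LayeredText("ABC", {0: (0, 0), 1: (0, 12), 2: (8, 12)})
other = LayeredText("AAB")

found = [doc.contains("BC") for doc in (first, second, other)]
ordered = [doc.symbols for doc in sort_by_content([first, other, second])]

print(found)
print(ordered)
```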

    Concretely: do you see a need for, or the existence of, a
    SignWritingML? Do you think existing HTML could correctly render
    SignWriting if it were presented as part of the plain text data (under
    your proposal)? What would the role of CSS be? What happens when a user
    agent selects a different font because the one the author used is
    unavailable on the reader's system?

    In some of your answers you've given a few hints, but for someone like
    me, who has no firsthand experience of signing and has difficulty
    visualizing sign writing, you will probably want to be far more
    explicit and concrete in your descriptions and examples, so that it
    becomes possible to evaluate whether your choices in the encoding model
    are the correct ones, or possibly the only ones, or whether, on the
    contrary, they represent an unnecessary departure from the way Unicode
    deals with non-linear notations.

