Re: Writing a proposal for an unusual script: SignWriting

From: vanisaac@boil.afraid.org
Date: Sat Jun 12 2010 - 15:19:24 CDT


    From: Steve Slevinski <slevin@signpuddle.net>

    > Thanks everyone for all the great comments.
    >
    > vanisaac@boil.afraid.org wrote:
    >> I can say that the character encoding model is more powerful than you can probably see right now. It certainly was for me.
    >>
    >>
    > The Duployan Shorthands is very interesting. I'll need to review it in
    > more depth.

    The shorthand formatting system is just the barest of starts in comparison to the needs of Sutton, but I think the idea of an inherent position with place modifiers will be the most resilient.

    >> Even though script elements can be written on the canvas anywhere,
    >> there are a limited number of /relative/ positions in which given
    >> elements can appear.
    > Good writing is based on aesthetics. It is an iconic writing system.

    And "good" writing isn't necessarily going to emerge simply from a character encoding model. Some aspects of "looking right" may need to be handled by exception (a special alignment for a fist to the cheek vs. a palm, etc.). That is explicitly NOT supposed to be handled by the encoding model, but rather by the font designer. The Unicode Design Principle of "characters, not glyphs" is the essence of this distinction.

    > Logically, there should be a limited number of relative positions, but
    > there are many exceptions.
    >
    > Just considering how to position a hand and a head, there are numerous
    > positions inside and outside the head. The palm could be on the chin,
    > nose, forehead, right cheek, left cheek, right ear, left ear. And those
    > are just the obvious positions inside the head. Sometimes it takes a
    > fine adjustment to make the writing feel right.

    And those fine adjustments should NEVER be accounted for by a character encoding model. That is the business of the font designer - making things look and feel right. An encoding model needs to define only semantic information: what can a hand symbol touch? It's kind of counter-intuitive, but the actual visual representation is irrelevant - it's the ABSTRACT representation that is important. Instead of worrying about whether something "feels" right, ask yourself whether your encoding will convey the information, and whether it's the ONLY way to convey the information. If it fails either test, it needs to be refined.

    > For movement arrows, it may appear obvious where symbols before and
    > after attach, but it does not always work.

    No. For almost every rule, you will need to have position modifiers which will allow for the exceptions. For Sutton, I would argue that any model which does /not/ provide for the exceptions to a placement rule is inadequate.
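    The "inherent position plus place modifier" idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a proposed encoding: the position names are borrowed from the list given later in this message, while the default table and the `resolve_position` helper are hypothetical.

```python
from enum import Enum
from typing import Optional


class Position(Enum):
    """A few hypothetical abstract positions (names borrowed from this thread)."""
    BODY_MID = "BodyMid"
    HEAD_CHEEK = "HeadCheek"
    HEAD_CHIN = "HeadChin"
    BODY_WIDE_HIGH = "BodyWideHigh"


# Hypothetical inherent (default) position for each element type.
DEFAULT_POSITION = {
    "HandR": Position.BODY_MID,
    "HandL": Position.BODY_MID,
}


def resolve_position(element: str, modifier: Optional[Position] = None) -> Position:
    """Use the explicit position modifier when the writer supplied one
    (the exception); otherwise fall back to the element's inherent default (the rule)."""
    if modifier is not None:
        return modifier
    return DEFAULT_POSITION[element]


print(resolve_position("HandR"))                      # Position.BODY_MID
print(resolve_position("HandR", Position.HEAD_CHIN))  # Position.HEAD_CHIN
```

    Both the rule and its exception are expressed as meanings; neither one stores a coordinate.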

    > There is also the problem of divergent paths. Consider 4 handshapes and
    > 2 movement arrows. Both hands start in the middle and then they both
    > move outward. In this situation, there is no symbol in the middle. If
    > we start with the right hand, attach its movement, and then its
    > finishing hand, we end up on the right side of the sign. How do we
    > return to center to start the second path for the left hand?

    Quite frankly, if you have a base right hand character (with modifiers), plus an arrow and another right hand character, and then you have a left hand character, it should just naturally fall into position for a left hand character - on the left (viewer's right) side.
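    A minimal sketch of why no "return to center" step is needed, assuming a hypothetical flat token stream in which a hand token directly after an arrow is that path's finishing hand and any other hand token starts a new path. The `Path` type and both helpers are invented for illustration; the side a path belongs to is derived from its starting hand, not from where the previous path ended.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Path:
    hand: str                                   # starting hand: "HandR" or "HandL"
    tokens: List[str] = field(default_factory=list)


def split_paths(tokens: List[str]) -> List[Path]:
    """Group a flat token sequence into movement paths."""
    paths: List[Path] = []
    for tok in tokens:
        if tok in ("HandR", "HandL"):
            if paths and paths[-1].tokens[-1] == "Arrow":
                paths[-1].tokens.append(tok)                # finishing hand of the current path
            else:
                paths.append(Path(hand=tok, tokens=[tok]))  # starting hand of a new path
        elif tok == "Arrow":
            if not paths:
                raise ValueError("arrow with no starting hand")
            paths[-1].tokens.append(tok)
        else:
            raise ValueError(f"unknown token {tok!r}")
    return paths


def side_of(path: Path) -> str:
    """Side as seen by the viewer: the right hand sits on the viewer's left,
    the left hand on the viewer's right."""
    return "viewer-left" if path.hand == "HandR" else "viewer-right"


# Both hands start in the middle and move outward (the divergent-path example).
sign = ["HandR", "Arrow", "HandR", "HandL", "Arrow", "HandL"]
for p in split_paths(sign):
    print(side_of(p), p.tokens)
# viewer-left ['HandR', 'Arrow', 'HandR']
# viewer-right ['HandL', 'Arrow', 'HandL']
```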

    >> I also take exception to the contention that there are an infinite
    >> number of signs that can be created: it may be many millions, but
    >> there is most definitely a finite number of complete signs that can be
    >> defined.
    > True. I initially wrote potentially infinite, but reworded the
    > sentence. However, we're dealing with all of the world's sign
    > languages. These languages are still changing at a greater rate than
    > that of spoken languages. There are different dialects between
    > communities and unique signs used within families and between friends.
    > The vocabularies have not been enumerated and are still evolving. There
    > are no accepted spellings, and pantomime can be a large part of story
    > telling. Instead of infinite, I should have said innumerable.

    Fair enough. One question that I haven't even considered, and which I think may occupy an inordinate amount of your time, is whether the vagaries of individual usage are, should be, or can be encoded with a character model. This is a question that delves deeply into usage, and I don't think anyone outside of the community can begin to answer it.

    >> It may be handy to just define placement with coordinates, but a proper script encoding will only define those elements that are contrastive and salient.
    > Do you have any reading suggestions for understanding your definition of
    > a proper script encoding?

    The Unicode Standard, version 5.0. Quite frankly, everything before the code charts, but especially section 1.2 (Design Goals) and Chapter 2 (General Structure). You can also search for all of this content on the unicode.org website, but I really think the book is especially helpful (less than $30, used, at Amazon). There is going to be a huge amount of work differentiating characters from glyphs, with a lot of research and examples.

    > My qualifications would be that it accurately encodes the script as it
    > is used. It is easy to search, sort, and parse. I'd consider
    > coordinates to be salient, especially when they refer to the position
    > relative to the center of the canvas.

    What is the salience of a hand being 5 coordinate points below center vs. 10? If two people encoded a particular sign, would they necessarily use the same coordinates? How do I search for a given sign when the coordinates can be different? Does the /search/ algorithm need to know that there are about 10 coordinate points of variation for a hand touching the side of the torso, but only 2 when it's touching a part of the face (lips/chin/cheek)? If your coordinate system has to incorporate "fine" adjustments to look right, how can I search the dictionary to find all of the signs where the eyes are closed and the hand or fingers brush the chin? If you can define any element as having any coordinate, how do you normalize text when someone defines the left hand before the right? These are basic text tasks that the coordinate system makes insanely complex.
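    The search problem can be made concrete with two hypothetical spellings of the same sign, a right hand touching the chin. The coordinate values, the tuple layout and both helper functions are invented for illustration only.

```python
from typing import List, Tuple

# Two writers encode the same sign with free coordinates (hypothetical values).
coord_a = [("HandR", (500, 488))]
coord_b = [("HandR", (497, 492))]

# The same sign with an abstract position modifier instead of coordinates.
abstract_a = [("HandR", "HeadChin")]
abstract_b = [("HandR", "HeadChin")]


def has_element_at(sign: List[Tuple[str, str]], element: str, position: str) -> bool:
    """Abstract search: exact match on element and named position."""
    return (element, position) in sign


def has_element_near(sign: List[Tuple[str, Tuple[int, int]]], element: str,
                     target: Tuple[int, int], tolerance: int) -> bool:
    """Coordinate search: the caller must pick a tolerance, and the right
    tolerance differs by body region (face vs. torso)."""
    tx, ty = target
    return any(
        e == element and abs(x - tx) <= tolerance and abs(y - ty) <= tolerance
        for e, (x, y) in sign
    )


print(coord_a == coord_b)                                 # False: same sign, different text
print(abstract_a == abstract_b)                           # True
print(has_element_at(abstract_b, "HandR", "HeadChin"))    # True
print(has_element_near(coord_b, "HandR", (500, 488), 2))  # False: tolerance too tight
print(has_element_near(coord_b, "HandR", (500, 488), 5))  # True
```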

    >> For signwriting, there will undoubtedly be numerous relative placements for hand elements (over the head, beside the face, chest height, wide, forward, waist height, opposite side, etc), but it would be truly sad if we were forced by lack of imagination to settle for a coordinate system.
    >>
    >>
    > I'm trying to understand how any script encoding would not devolve into
    > a convoluted coordinate based system of degrees and distance.

    Elements: [HandR], [HandL]. Position Modifiers: [HeadTop], [HeadCheek], [HeadOppositeCheek], [HeadChin], [HeadNose], [HeadTemple], [BodyHigh], [BodyLow], [BodyCenterHigh], [BodyCenterLow], [BodyCenterMid], [BodyWideHigh], [BodyWideLow], [BodyWideMid]. Default: [BodyMid]. Each of these could/would/should be associated with coordinates in your XML system, no problem. There are probably even many different coordinates that could be associated with a particular position, and the coordinates would be further refined by the particular hand type. A character encoding does not define them by their precise position; it specifies a /meaning/. The font designer has to figure out exactly how best to graphically represent that. All you need to do is provide a "representative glyph" to show how that can be done. It may turn out that there will be users in the future who decide on slightly different alignments of elements to symbolize a particular sign. By having /abstract/ rather than /concrete/ positions, this becomes merely a font issue rather than an encoding issue. Imagine the horror of trying to figure out whether coordinate differences come from the interpretation of the Sutton encoding or from an actual difference in the sign being made.
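    On the rendering side, abstract positions like these might be resolved by a font or layout table along the following lines. The offsets, the hand-type names and the `place` function are all hypothetical; the point is that the coordinates live in the font, so changing an alignment convention edits this table and leaves every stored sign untouched.

```python
from typing import Dict, Tuple

Offset = Tuple[int, int]

# Font-side defaults: one representative offset per abstract position (hypothetical values).
DEFAULT_OFFSETS: Dict[str, Offset] = {
    "HeadCheek": (20, -3),
    "HeadChin": (0, 12),
    "BodyMid": (0, 60),
}

# Font-side refinements for particular hand types (e.g. a fist vs. a flat palm on the cheek).
HAND_ADJUSTMENTS: Dict[Tuple[str, str], Offset] = {
    ("HeadCheek", "fist"): (22, -4),
    ("HeadCheek", "flat"): (18, -2),
}


def place(position: str, hand_type: str) -> Offset:
    """Map an abstract position to drawing coordinates, preferring a
    hand-specific adjustment when the font designer has supplied one."""
    if (position, hand_type) in HAND_ADJUSTMENTS:
        return HAND_ADJUSTMENTS[(position, hand_type)]
    return DEFAULT_OFFSETS[position]


print(place("HeadCheek", "fist"))  # (22, -4)
print(place("HeadCheek", "bent"))  # (20, -3)
print(place("BodyMid", "fist"))    # (0, 60)
```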

    > But, I'll suspend my disbelief and assume that an alternate encoding is
    > possible. What would be gained? Searching? Sorting? Parsing?

    Searching; sorting; Unicode Design Goal: Unambiguous; Unicode Design Principle: Characters, not glyphs; Unicode Design Principle: plain text.

    > For example, I recently changed my model to encode using characters based
    > on 652 BaseSymbols, rather than characters based on 37,812 symbols.
    > Instead of symbols being accessed with a single character, it requires
    > three characters: one BaseSymbol character and two modifiers. I did
    > this change because of searching. I found that in my code, I had to
    > pre-process a string and create a separate index string.
    >
    > I was reading about searching in Unicode and someone wrote the example
    > of searching for all "e"s, both accented and unaccented. If the
    > accented "e"s are stored as two characters, one "e" and one accent, then
    > it is a simple search. It's now the same for my encoding. If you want
    > to search for BaseSymbol 154, you can easily search for BaseSymbol 154
    > without having to pre-process the string. It was a good change.

    I think so too. I just think there is further room to go.
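    The accented-"e" analogy, and the same idea applied to a BaseSymbol-plus-two-modifiers model, in a short sketch. The `unicodedata` part is standard Python behaviour; the code point layout for BaseSymbols and their modifiers is a hypothetical stand-in using Private Use Area offsets, not the actual Binary SignWriting values.

```python
import unicodedata

# Real Unicode behaviour: NFD splits precomposed U+00E9 into "e" + U+0301,
# so a plain count finds accented and unaccented e's alike.
text = "\u00e9t\u00e9, f\u00e9e, ne"                   # "été, fée, ne"
print(text.count("e"))                                 # 2
print(unicodedata.normalize("NFD", text).count("e"))   # 5

# Hypothetical layout: a BaseSymbol character followed by two modifier
# characters. These Private Use Area offsets are invented for illustration.
BASE = 0xF0000   # BaseSymbol n -> U+F0000 + n
MOD1 = 0xF1000   # first modifier
MOD2 = 0xF1100   # second modifier


def symbol(base_id: int, m1: int, m2: int) -> str:
    """Spell one symbol as BaseSymbol + modifier + modifier."""
    return chr(BASE + base_id) + chr(MOD1 + m1) + chr(MOD2 + m2)


sign = symbol(154, 0, 3) + symbol(20, 1, 0) + symbol(154, 2, 8)

# Every variant of BaseSymbol 154 is found without building an index string.
print(sign.count(chr(BASE + 154)))  # 2
```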

    > I consider my current solution of Binary SignWriting as powerful and
    > elegant. It can write anything from the 30 plus years of SignWriting
    > history. Any of the writing in the last 6 years can be automatically
    > converted to this new standard. This is a working solution, not a theory.

    And that's great. But it is, fundamentally, a graphic model. A character model not only needs to be able to encode everything salient, it also needs to NOT encode things that are stylistic preference. It should encode only that which is salient.

    > Many have taken issue with the coordinate based writing, but other than
    > personal aesthetics of elegance and beauty, I do not see the disadvantage.

    If two people would encode a particular sign in two different ways, it fails as a character encoding model. It is not searchable. It does not allow for stylistic variation by markup. In short, a coordinate system is both too precise, and not meaningful enough.
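    One of the basic text tasks mentioned earlier, normalization, might look like this once positions are abstract. This is a minimal sketch assuming a hypothetical canonical order (right-hand elements, then left-hand, then everything else) and treating a sign as a list of independent (element, position) items; a real model would also have to order whole movement paths.

```python
from typing import List, Tuple

Item = Tuple[str, str]  # (element, abstract position)

# Hypothetical canonical rank for element types; unknown elements sort last.
RANK = {"HandR": 0, "HandL": 1}


def normalize(sign: List[Item]) -> List[Item]:
    """Return a canonical spelling of the sign. sorted() is stable, so items
    with equal rank keep the order the writer gave them."""
    return sorted(sign, key=lambda item: RANK.get(item[0], 2))


left_first = [("HandL", "BodyMid"), ("HandR", "HeadChin")]
right_first = [("HandR", "HeadChin"), ("HandL", "BodyMid")]

print(left_first == right_first)                        # False
print(normalize(left_first) == normalize(right_first))  # True
```

    With free coordinates, reordering alone would not be enough: two writers' slightly different coordinates for the same contact would still compare unequal after sorting.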

    > I'm hoping to learn more as I write the proposal. I appreciate the
    > feedback.
    >
    > Thanks,
    > -Steve

    No problem. Just watching the World Cup and typing away...

    Van



    This archive was generated by hypermail 2.1.5 : Sat Jun 12 2010 - 15:26:23 CDT