Re: Request for Review: draft-slevinski-signwriting-text

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Fri, 30 Nov 2012 07:22:38 +0100

From basic analysis, the posted IETF draft has nothing to do with Unicode
encoding. However it remains related to it, because it presents the
relations between the characters that are to be encoded in the UCS, and
their additional properties within the focus of their classification and
behavior within the script. Additionally it presents the various algorithms
related to the conversions between the UCS characters and the ISWA symbol
sets.
However, this IETF draft still uses a PUA encoding (nothing wrong with it)
only as a temporary tool for testing the integration in the ISWA-compatible
frameworks of the characters to encode in the UCS (using a simple offset
from one of the possible internal representations to an arbitrarily defined
PUA block).
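
As a rough illustration of what such an offset mapping could look like,
here is a minimal sketch in Python. The base code point, the function
names and the index range are assumptions made for this example; they are
not values taken from the IETF draft, which only needs to be read as using
"some offset into plane 15".

```python
# Hypothetical sketch: PUA_BASE and the function names are assumptions,
# not values or identifiers taken from the IETF draft.
PUA_BASE = 0xF0000  # first code point of plane 15 (Supplementary PUA-A)
PUA_LAST = 0xFFFFD  # last assignable code point of plane 15


def to_pua(index: int) -> str:
    """Map an internal symbol index to a plane-15 PUA character by offset."""
    code_point = PUA_BASE + index
    if index < 0 or code_point > PUA_LAST:
        raise ValueError(f"index {index} does not fit in the PUA block")
    return chr(code_point)


def from_pua(char: str) -> int:
    """Recover the internal index from a plane-15 PUA character."""
    code_point = ord(char)
    if not PUA_BASE <= code_point <= PUA_LAST:
        raise ValueError(f"U+{code_point:04X} is outside the PUA block")
    return code_point - PUA_BASE
```

The point is only that the mapping is a reversible addition: nothing in it
depends on Unicode properties or algorithms.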

But there are major differences: this IETF draft is documenting an encoding
for things that are still not being considered for encoding in the UCS:

- the "term prefixes", for example, are encoded in the draft as
B+100...B+106 and carry segmental control information that is currently
not part of the UCS proposal: the logical structure of the composition
area with its three lanes, the grammatical structure that separates hand
gestures from facial expression and other visual modifiers (like speed or
stress), the spatial structure (partly covered by the "fill/rotation"
modifiers), and the time structure of the performance of each sign.

- there are more base symbols in the IETF draft (where they are described
more precisely, in categories and groups) than in the UCS proposal, and
they follow more or less a glyph approach (even if the draft describes the
6 "fill" modifiers and the 16 "rotation/mirror" modifiers, which greatly
reduce the number of necessary characters).

- However, the concept of an implicit "fill" modifier and an implicit
"rotation/mirror" modifier is not made explicit in the IETF draft: it maps
them so that all symbols (except punctuation) are encoded as triples (base,
fill, rotation). The UCS proposal makes the fill and rotation modifiers
optional (encoding them as combining characters). Both drafts agree,
however, that some combinations are impossible to realize (but since
Unicode itself cannot forbid these combinations, an external properties
file is provided to list the "valid" combinations in the scope of the
script itself).

- The IETF draft explicitly encodes a lot of "base characters" for one of
the possible approaches to the layout of morphemes: the "free-form"
Cartesian coordinate system. The IETF draft admits that other
representations of the layout have not been extensively studied. The
largest number of "characters" in the B+nnn encoding represent spatial
coordinates (expressed as absolute X,Y coordinates measured from the
center in a 0..+/-250 range): these alone account for the more than 500
additional "characters" mapped into the Unicode PUA space by this draft.
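
To make the last two points more concrete, here is a minimal Python
sketch of a symbol seen as a (base, fill, rotation) triple with implicit
defaults, and of the coordinate range. The class name, the zero-based
numbering of the modifiers, the base names and the contents of the
validity table are all assumptions made for the example; they do not come
from the IETF draft or from the UCS proposal.

```python
from dataclasses import dataclass

FILL_COUNT = 6       # "fill" modifiers mentioned above
ROTATION_COUNT = 16  # "rotation/mirror" modifiers mentioned above
COORD_LIMIT = 250    # coordinates run from -250 to +250 around the center
COORD_VALUES = 2 * COORD_LIMIT + 1  # distinct values on one axis


@dataclass(frozen=True)
class Symbol:
    """A symbol as a (base, fill, rotation) triple.

    Omitting fill or rotation models the "implicit" modifiers: the
    defaults stand for the optional combining characters being absent.
    Zero-based numbering is an assumption of this sketch.
    """
    base: str
    fill: int = 0
    rotation: int = 0

    def __post_init__(self):
        if not 0 <= self.fill < FILL_COUNT:
            raise ValueError(f"fill {self.fill} out of range")
        if not 0 <= self.rotation < ROTATION_COUNT:
            raise ValueError(f"rotation {self.rotation} out of range")


# Hypothetical stand-in for the external properties file that lists the
# valid (fill, rotation) pairs for each base symbol.
VALID_COMBINATIONS = {
    "demo-hand": {(0, 0), (1, 0), (0, 8)},
}


def is_valid(symbol: Symbol, table: dict) -> bool:
    """Check a triple against the external table of valid combinations."""
    return (symbol.fill, symbol.rotation) in table.get(symbol.base, set())


def coord_index(value: int) -> int:
    """Map a coordinate in -250..+250 to a zero-based index."""
    if not -COORD_LIMIT <= value <= COORD_LIMIT:
        raise ValueError(f"coordinate {value} out of range")
    return value + COORD_LIMIT
```

With these assumptions, COORD_VALUES is 501, which is consistent with the
"more than 500" coordinate characters, and the validity check lives
entirely outside the encoding, as it has to for Unicode.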

Besides this, both documents agree on the structure of the script.

But the IETF draft details things that are only superficially covered in
the formal UCS proposal draft, notably the algorithmic part that explains
how to convert between the proposed encoding in the UCS block and the other
representations. Nothing is said in the UCS proposal about the layout,
which is left to an external upper-layer protocol (the IETF draft explores
some possibilities and existing implementations using various forms).

One good question is why the UCS proposal chose to forget, for now, the
segmental prefixes (used within SW signs, but not in the SW punctuation),
which are evidently semantic, and not purely related to layout.

The IETF draft states that its current implementation (based on the
simple, unchecked, free-form Cartesian form with absolute positioning)
works well, but it also admits that this creates problems with
interpersonal variation (in other words, the encoded signs contain some
style information reflecting personal variations or preferences), and that
other possible choices have still not been explored (the IETF draft's
author admits that they could limit the interpersonal variation, but maybe
it is not possible to regulate it, because these semantic classification
rules may be language-dependent (between ASL, FSL, and so on)).

So even if working layout mechanisms do exist, they are still not ready
for standardization. Instead, most of the encoding efforts seem to have
gone into defining a precise set of symbol components (refined by
improving fonts, and by continuing to develop several styles, notably for
faster handwriting).

Input methods are also largely unexplored: for now the system used
presents the full set needed for the script, plus several methods for
composing the layout of signs, with several software experiments, such as
the author's website signpuddle.net, the new experiment in Wikimedia Labs
for ASL (to test the integration with HTML, CSS and authoring tools), or
some other browser add-ons, all of which may not be practical for easily
authoring texts in actual sign languages (too many options offered,
complex UI...).

To help compose the layout, simple absolute positioning in the Cartesian
composition box may be too difficult to use in practice, and other
language-dependent shortcuts will be needed (this is equivalent to the
development of an orthography in other written languages).

So I don't think that these two documents contradict each other: their
focus is very different. The IETF draft presents a work in progress and
nothing else. It is informational by nature (the normative parts of it
will have to be extracted from it, and separated from each other for each
upper-layer protocol partially developed in this IETF draft).

The IETF draft is not a character encoding proposal by itself; it presents
only several other abstractions for things not covered by the UCS proposal
(because they are not in the direct scope of what will be encoded), but
still needed to demonstrate that the UCS proposal works. I see it as an
annex document to the UCS proposal, or a proof of concept. Michael Everson,
who helped Valerie Sutton to formulate the proposal, may reply here: both
documents share identical external references (and nothing indicates that
anyone disagrees with what the script expert Valerie Sutton and her
foundation agree on in this UCS proposal).

2012/11/30 Steve Slevinski <slevin_at_signpuddle.net>

>
> On 11/29/12 3:17 PM, Doug Ewell wrote:
>
>> Steve Slevinski <slevin at signpuddle dot net> wrote:
>>
>>> I have documented a text encoding for an unusual script that is used
>>> by an international community. The script uses a 2-dimensional plain
>>> text encoding with ASCII and Unicode PUA.
>>>
>>> draft-slevinski-signwriting-text
>>> http://datatracker.ietf.org/doc/draft-slevinski-signwriting-text/
>>> http://signpuddle.net/wiki/index.php/I-D_draft-slevinski-signwriting-text
>>>
>>> [...]
>>>
>>> I chose to release through the IETF because I am not using Unicode
>>> design principles or algorithms. I am using Unicode code points on
>>> plane 15, but only for temporary font characters.
>>>
>> What, in a nutshell, is the relationship between this document and the
>> WG2 proposal at http://std.dkuug.dk/jtc1/sc2/wg2/docs/n4342.pdf, which
>> does appear to adhere to Unicode/10646 design principles?
>>
>
> The original SignWriting Unicode proposal (N4015) is based on my previous
> I-D (draft-slevinski-iswa-2010). N4090 split from the user community to
> focus on Unicode design principles. N4342 continues the Unicode design
> exploration.
>
> Regards,
> -Steve
>
>
Received on Fri Nov 30 2012 - 00:22:38 CST
