Request for Review: draft-slevinski-signwriting-text from Steve Slevinski on 2012-11-27 (Unicode Mail List Archive)

From: Steve Slevinski <slevin_at_signpuddle.net>
Date: Tue, 27 Nov 2012 12:22:43 -0600

I have documented a text encoding for an unusual script that is used by
an international community. The script uses a 2-dimensional plain text
encoding with ASCII and Unicode PUA.

draft-slevinski-signwriting-text
http://datatracker.ietf.org/doc/draft-slevinski-signwriting-text/
http://signpuddle.net/wiki/index.php/I-D_draft-slevinski-signwriting-text

This document was submitted to the IETF on Nov 5th. A member of the
independent submission board suggested that the Unicode list would be a
good place to discuss character encoding.

Two years ago, this list discussed my previous Internet Draft
(draft-slevinski-iswa-2010). That I-D has expired and is currently
being withdrawn.

I have submitted a new I-D, draft-slevinski-signwriting-text, to the
IETF. This I-D incorporates the final search addition, which makes
auto-complete possible. Using plain text ASCII and regular expressions,
I can search databases, filesystems, and text input with
near-instantaneous results. I have productive working examples using up
to 10 MB of data.

I chose to release through the IETF because I am not using Unicode
design principles or algorithms. I am using Unicode code points on
plane 15, but only for temporary font characters.

Here's the short version that pertains to Unicode.

You can see an example of these characters on the ASL Wikipedia Project.
http://ase.wikipedia.wmflabs.org/wiki/Main_page_2

You can see the images that pertain to these characters on the main page.
http://ase.wikipedia.wmflabs.org/

You can see the ASCII names that pertain to the logographic signs on the
view source page.
http://ase.wikipedia.wmflabs.org/srv/mediawiki/index.php?title=Main_Page&action=edit

The plane 15 Unicode code points represent temporary font characters and
should never be seen by the end user. Creating the text from the
temporary font characters can be handled in several ways:

1) We have a proof-of-concept 2-color PostScript Type 3 font that
   handles the symbol strings individually, but not layout. Creation of
   the font is manual and tedious, but possible.
2) We are targeting Graphite for text layout using a TrueType font.
   This is in the initial stages of development.
3) A JavaScript client that accesses an online server for logographic
   images. The design is stable, and the implementation is quickly
   falling into place.
4) A server-side solution where an entire column of vertical text is
   returned to the user interface.

Option 4 is the current solution used for the ASL Wikipedia Project on
Wikimedia Labs.

The clarity of using Unicode code points has an appeal, but the
logographic sign names stored in plain text ASCII are explicit, easy to
process and semi-human readable.

I have learned it takes longer to process the data with programming code
than it takes to recognize it with regular expressions. I have
optimized the processing with ASCII regular expressions. Obligatory
XKCD explains it all.
http://xkcd.com/208/

ASCII searching is 4 times faster than the equivalent Unicode search.
The regular expressions for Unicode (UTF-32) are easier to write than
the ASCII ones, but UTF-16 (used in JavaScript) requires the same
structured breakdown as ASCII in order to search a range.
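
To illustrate the UTF-16 point, here is a minimal Python sketch. It is
not code from the draft: the function names are hypothetical and the
plane 15 range U+F0000..U+F07FF is chosen only as an example. With whole
code points (UTF-32), a range is a single character class. With UTF-16
code units, the same range has to be broken down into one alternative
per high surrogate.

```python
import re


def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def utf32_range_pattern(lo: int, hi: int) -> str:
    """Whole code points: one character class covers the range."""
    return "[\\U%08X-\\U%08X]" % (lo, hi)


def utf16_range_pattern(lo: int, hi: int) -> str:
    """UTF-16 code units: one alternative per high surrogate in the range."""
    first_high, first_low = to_surrogates(lo)
    last_high, last_low = to_surrogates(hi)
    parts = []
    for high in range(first_high, last_high + 1):
        # Only the first and last high surrogate have a partial low range.
        low_start = first_low if high == first_high else 0xDC00
        low_end = last_low if high == last_high else 0xDFFF
        parts.append("\\u%04X[\\u%04X-\\u%04X]" % (high, low_start, low_end))
    return "(?:" + "|".join(parts) + ")"


# Illustrative range on plane 15 (values assumed, not taken from the draft).
print(utf32_range_pattern(0xF0000, 0xF07FF))
# [\U000F0000-\U000F07FF]
print(utf16_range_pattern(0xF0000, 0xF07FF))
# (?:\uDB80[\uDC00-\uDFFF]|\uDB81[\uDC00-\uDFFF])

# The code point pattern can be used directly with Python's re module.
assert re.fullmatch(utf32_range_pattern(0xF0000, 0xF07FF), chr(0xF0400))
```

The UTF-16 pattern is emitted as a string in the escape syntax that
JavaScript regular expressions accept, since that is where the code unit
breakdown is needed.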

Targeting cellphones, tablets, desktops and more, fast ASCII processing
will always be an option.

For draft-slevinski-signwriting-text, I still need to flesh out some of
the sections and transfer content from the "Modern SignWriting" theory
and example document.
http://signpuddle.net/wiki/index.php/Main_Page#Modern_SignWriting

The plain text encoding has been in a final state since January 12th, 2012.

The documentation and implementations are nearing completion.
http://signpuddle.com

I'd appreciate any reviews or comments.

Regards,
-Steve
Received on Tue Nov 27 2012 - 12:24:24 CST