> -----Original Message-----
> From: Michael Everson [mailto:everson@indigo.ie]
> Sent: Monday, October 04, 1999 6:44 AM
> To: Unicode List
> Subject: Re: Why is Unicode inconsistant?
>
> ...
> >If you look att letter: 0xD8 it cannot be decomposed,
>
> (that's LATIN CAPITAL LETTER O WITH SLASH)
>

One thing I, and no doubt many others, would find very useful is a standard
set of short names for the repertoire, together with a standardized abstract
syntax notation. This would be especially useful where writers use ASCII as
a metalanguage to talk about ASCII, as has occurred frequently in the
terminal discussion. A good example is one writer's reference to 'the
character n~'; with short names and an abstract syntax, he could have
written something like 'the character {n,+~}' or {n,~} or {n~}, using ASCII
to denote ASCII, comma to delimit codepoints, '+' to mean 'combining',
adjacency to mean 'with', and curly braces to delimit a Unicode abstract
syntactic phrase.

For example, 0xD8 could be {O/}, and a 'word' using it might be written
'{c,O/,t}'. One could also combine the number and short name:
{c,0xD8:O/,t} or the like.
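
A minimal sketch of how such a phrase might be read mechanically, assuming
the conventions above: curly braces around the phrase, commas between items,
and an optional 0x number joined to the short name by a colon. The function
name parse_phrase is made up for illustration, and this sketch has no way to
write a literal comma as an item.

```python
def parse_phrase(phrase):
    """Split '{c,0xD8:O/,t}' into (code point or None, short name) pairs."""
    if not (phrase.startswith("{") and phrase.endswith("}")):
        raise ValueError("phrase must be wrapped in curly braces")
    items = []
    for part in phrase[1:-1].split(","):
        number, sep, name = part.partition(":")
        if sep and number.lower().startswith("0x"):
            # Numbered item such as '0xD8:O/': keep both number and short name.
            items.append((int(number, 16), name))
        else:
            # Plain item such as 'c' or '+~': no explicit number given.
            items.append((None, part))
    return items


print(parse_phrase("{c,0xD8:O/,t}"))
```

Keeping the number optional means the same reader accepts both the terse
form {c,O/,t} and the annotated form {c,0xD8:O/,t}.
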
"Above" could be written '/ /', below '\ \'. The motivation for this is
from the Z language, which uses 0x2197:NE-arrow paired with 0x2199:SW-arrow
to denote superscripted expressions, and 0x2198:SE-arrow paired with
0x2196:NW-arrow for subscripts. Since we don't have arrows in ASCII, we can
use the slashes. (Z uses /^ for 0x2197 and v/ for 0x2199.) So, for example,
everybody's fave composed letter would be written {a/ring/} = {a,+/ring/} or
{0xE5:a/ring/} = {0x61:a,0x030A:+/ring/}. Etc.
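
The equivalence between the one-item and two-item spellings can be checked
against the Unicode character database. The following is only a sketch using
Python's standard unicodedata module; the helper name same_letter is an
assumption, and canonical composition (NFC) is used here merely as a
convenient way to compare the two spellings, not as part of the proposed
notation.

```python
import unicodedata


def same_letter(code_points_a, code_points_b):
    """Compare two code point sequences after canonical composition (NFC)."""
    text_a = "".join(chr(cp) for cp in code_points_a)
    text_b = "".join(chr(cp) for cp in code_points_b)
    return (unicodedata.normalize("NFC", text_a)
            == unicodedata.normalize("NFC", text_b))


# {0xE5:a/ring/} versus {0x61:a,0x030A:+/ring/}
print(same_letter([0xE5], [0x61, 0x030A]))

# An empty decomposition string means the character has no decomposed
# spelling, which is the situation described for 0xD8 in the quoted message.
print(unicodedata.decomposition(chr(0xD8)) == "")
```
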

For letters outside of ASCII, we could use a two- or three-character language
prefix: 0x0A95 = GUJ-ka (instead of GUJARATI LETTER KA).
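
One way such prefixed short names could be expanded back to full names is a
small lookup table. The sketch below assumes a hypothetical table
SCRIPT_PREFIXES with a single entry; the prefix GUJ comes from the example
above, and the rest (the table, the function name expand_short_name) is
illustrative only. unicodedata.lookup is the standard-library call that maps
a full character name to the character.

```python
import unicodedata

# Hypothetical prefix table; only the GUJ entry is taken from the text.
SCRIPT_PREFIXES = {"GUJ": "GUJARATI LETTER"}


def expand_short_name(short_name):
    """Turn a short name like 'GUJ-ka' into the character it denotes."""
    prefix, _, letter = short_name.partition("-")
    full_name = f"{SCRIPT_PREFIXES[prefix]} {letter.upper()}"
    # Raises KeyError if no character has that full name.
    return unicodedata.lookup(full_name)


print(hex(ord(expand_short_name("GUJ-ka"))))
```
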

To a certain extent this would give us the ability to concisely encode (for
discussion purposes) characters that are not in Unicode. For example,
{a,+\AR-hamza\} = Latin a followed by combining Arabic hamza below. It
would give us a kind of metalanguage for encoding the visual grammar of
letter forms.

More generally, '+' = affix; '/ /' = surfix; '\ \' = subfix; '- -' = infix;
and so on.
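
As a rough sketch of how a single item could be taken apart under these
markers: a leading '+' is read as 'combining', a name between paired '/' as
'above', and a name between paired '\' as 'below'. The names Item and
parse_item are invented for this example. Infix '- -' is left out on purpose,
since a hyphen also appears inside short names such as AR-hamza and the
proposal does not say how to tell the two apart.

```python
from typing import NamedTuple, Optional


class Item(NamedTuple):
    combining: bool          # True when the item starts with '+'
    base: str                # base letter or opaque short name, may be empty
    position: Optional[str]  # 'above', 'below', or None
    mark: Optional[str]      # name between the paired slashes, if any


def parse_item(item: str) -> Item:
    combining = item.startswith("+")
    body = item[1:] if combining else item
    for delimiter, position in (("/", "above"), ("\\", "below")):
        # Require exactly one opening and one closing delimiter, e.g. 'a/ring/'.
        if body.endswith(delimiter) and body.count(delimiter) == 2:
            base, mark, _ = body.split(delimiter)
            if mark:
                return Item(combining, base, position, mark)
    # Anything else, such as 'O/' or '~', is kept as an opaque short name.
    return Item(combining, body, None, None)


for text in ("a/ring/", "+/ring/", "+\\AR-hamza\\", "O/"):
    print(text, parse_item(text))
```

Note that a single trailing slash, as in O/, is not treated as a paired
marker, so the short name for 0xD8 passes through unchanged.
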

Has this been done before? Would anybody other than me find this useful?

-gregg