metalanguage (was RE: Why is Unicode inconsistant?)

From: Reynolds, Gregg (greynolds@datalogics.com)
Date: Mon Oct 04 1999 - 09:16:26 EDT


> -----Original Message-----
> From: Michael Everson [mailto:everson@indigo.ie]
> Sent: Monday, October 04, 1999 6:44 AM
> To: Unicode List
> Subject: Re: Why is Unicode inconsistant?
>
> ...
> >If you look att letter: 0xD8 it cannot be decomposed,
>
> (that's LATIN CAPITAL LETTER O WITH SLASH)
>

One thing that I, and no doubt many others, would find very useful is a set of
standard short names for the characters in the repertoire, along with a
standardized abstract syntax notation. This would be especially useful in
cases where writers use ASCII as a metalanguage to talk about ASCII, as has
occurred frequently in the terminal discussion. A good example is one writer's
reference to 'the character n~'; with short names and an abstract syntax, he
could have written something like 'the character {n,+~}' or {n,~} or {n~},
using ASCII to denote ASCII, a comma to delimit codepoints, '+' to mean
'combining', adjacency to mean 'with', and curly braces to delimit a Unicode
abstract syntactic phrase.

For example, 0xD8 could be {O/}, and a 'word' using it might be written
'{c,O/,t}'.

One could also combine the number and short name: {c,0xD8:O/,t} or the
like.
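
To make the proposed syntax concrete, here is a minimal Python sketch of how
such a phrase might be split into items. The function name parse_phrase and
the dictionary layout are my own assumptions, not part of any standard, and
the sketch does not handle a comma or colon used as a character name.

```python
def parse_phrase(phrase):
    """Split a braced phrase such as '{c,0xD8:O/,t}' into items.

    Hypothetical helper: the field layout below is an assumption made
    for illustration, not a defined format.
    """
    if not (phrase.startswith("{") and phrase.endswith("}")):
        raise ValueError("phrase must be wrapped in curly braces")
    items = []
    # Commas delimit codepoints inside the braces.
    for field in phrase[1:-1].split(","):
        # An optional '0xNN:' prefix carries the code point number.
        number, sep, name = field.partition(":")
        if sep and number.lower().startswith("0x"):
            codepoint = int(number, 16)
        else:
            codepoint, name = None, field
        # A leading '+' on the name marks a combining item.
        combining = name.startswith("+")
        if combining:
            name = name[1:]
        items.append(
            {"codepoint": codepoint, "name": name, "combining": combining}
        )
    return items


word = parse_phrase("{c,0xD8:O/,t}")
tilde = parse_phrase("{n,+~}")
```

Here word holds three items, the second carrying both the number 0xD8 and the
short name O/, while tilde holds a plain n followed by a combining ~.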

"Above" could be written '/ /' and "below" '\ \'. The motivation for this
comes from the Z language, which uses 0x2197:NE-arrow paired with
0x2199:SW-arrow to denote superscripted expressions, and 0x2198:SE-arrow
paired with 0x2196:NW-arrow for subscripts. Since we don't have arrows in
ASCII, we can use the slashes. (Z uses /^ for 0x2197 and v/ for 0x2199.) So,
for example, everybody's fave composed letter would be written
{a/ring/} = {a,+/ring/} or {0xE5:a/ring/} = {0x61:a,0x030A:+/ring/}. Etc.
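
The equality between the precomposed and the combining spelling can be checked
against the Unicode character database. The sketch below uses Python's
standard unicodedata module; the variable names are mine, and the
normalization forms are simply a convenient way to test the equivalence, not
something the notation itself depends on.

```python
import unicodedata

composed = "\u00E5"     # {0xE5:a/ring/}
decomposed = "a\u030A"  # {0x61:a,0x030A:+/ring/}

# The two spellings are canonically equivalent: composing the pair
# yields the single code point, and decomposing it yields the pair.
assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFD", composed) == decomposed

# The full names that the short forms would stand in for.
assert unicodedata.name(composed) == "LATIN SMALL LETTER A WITH RING ABOVE"
assert unicodedata.name("\u030A") == "COMBINING RING ABOVE"

# By contrast, 0xD8 has no decomposition, as noted in the quoted text.
o_slash = "\u00D8"
assert unicodedata.normalize("NFD", o_slash) == o_slash
```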

For letters outside of ASCII, we could use a two- or three-character language
prefix: 0x0A95 = GUJ-ka (instead of GUJARATI LETTER KA).
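
As a rough illustration, a prefixed short name could be derived mechanically
from the existing long name. In this sketch the SCRIPT_PREFIXES table and the
short_name function are hypothetical; only the prefixes GUJ and AR come from
this message, and the rule of lowercasing whatever follows "LETTER" is my own
assumption.

```python
import unicodedata

# Hypothetical prefix table; a real one would need to be standardized.
SCRIPT_PREFIXES = {"GUJARATI": "GUJ", "ARABIC": "AR"}


def short_name(codepoint):
    """Return a prefixed short name such as 'GUJ-ka', or None."""
    full = unicodedata.name(chr(codepoint))
    # Split 'GUJARATI LETTER KA' into the script and the letter name.
    script, sep, rest = full.partition(" LETTER ")
    if not sep or script not in SCRIPT_PREFIXES:
        return None
    return f"{SCRIPT_PREFIXES[script]}-{rest.lower().replace(' ', '-')}"


gujarati_ka = short_name(0x0A95)
arabic_hamza = short_name(0x0621)
```

Characters whose script is missing from the table, or whose long name does not
follow the "SCRIPT LETTER NAME" pattern, simply return None here.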

To a certain extent this would give us the ability to concisely encode (for
discussion purposes) characters that are not in Unicode. For example,
{a,+\AR-hamza\} = Latin a followed by combining Arabic hamza below. It
would give us a kind of metalanguage for encoding the visual grammar of
letter forms.

More generally, '+' = affix; '/ /' = surfix; '\ \' = subfix; '- -' = infix,
and so on.
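
The delimiter scheme could be tabulated and applied along the following lines.
This is only a sketch: the POSITIONS table and the classify function are
hypothetical names, and treating an undelimited '+' item as a plain affix is
my reading of the list above.

```python
# Paired delimiters and the position each one denotes.
POSITIONS = {"/": "surfix", "\\": "subfix", "-": "infix"}


def classify(modifier):
    """Return (position, name) for a modifier such as '+/ring/'."""
    # The leading '+' is optional, since {a/ring/} omits it.
    body = modifier[1:] if modifier.startswith("+") else modifier
    # A name wrapped in a matching delimiter pair carries a position.
    if len(body) >= 3 and body[0] == body[-1] and body[0] in POSITIONS:
        return POSITIONS[body[0]], body[1:-1]
    # Otherwise it is an affix with no stated position.
    return "affix", body


ring = classify("+/ring/")
hamza = classify("+\\AR-hamza\\")
tilde = classify("+~")
```

With these inputs, ring is classified as a surfix named ring, hamza as a
subfix named AR-hamza, and tilde as a plain affix.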

Has this been done before? Would anybody other than me find this useful?

-gregg


