Re: Generic Tagging: A Modest Proposal

From: Kent Karlsson (
Date: Wed Jul 30 1997 - 13:19:03 EDT

Kenneth Whistler wrote (on the current "plane 14" proposal):

> 1. For simplicity and generality, an entire set of
> 95 ASCII metacharacters are provided, instead of just
> A-Z, 0-9, hyphen and space. This retains the
> transparency of characters, as in your proposal, but
> allows us to set up a scheme that will also be easy
> to debug or to redisplay as ASCII values for tags.

This message is NOT about whether language tagging at
various "levels" is a good idea or not. This message is
only concerned with the mechanism proposed in the "plane
14" proposal (which I haven't seen the details of...).

For simplicity and generality, I strongly dislike the
"plane 14" proposal. The reason is that it introduces
yet another copy of 7-bit ASCII. I find that backward-looking
and lacking in foresight. Why introduce a 7-bit ASCII
idea just as we are getting out of that limitation,
and do so within the very character encoding that is
getting us out of it? We already have (deprecated)
duplicates of ASCII in Unicode, and we do not need yet
another one.

For simplicity and generality, let me instead suggest that
we add a SINGLE new code (in the BMP): COMBINING META. It can
follow *any* character; any character followed by COMBINING
META is the meta version of that character.

Meta characters would not normally be displayed unless
one has explicitly asked for them to be shown. And they
can easily be shown without any new glyphs, simply by
ignoring the COMBINING META or temporarily reinterpreting
it (as "show in pale blue" or whatever).
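A minimal Python sketch of the marking and display ideas above. COMBINING META was only ever a proposal and has no code point, so U+E000 (a private-use character) is borrowed here as a stand-in; the names to_meta and render are hypothetical, and wrapping a meta character in brackets stands in for "show in pale blue".

```python
import unicodedata

# Hypothetical stand-in: COMBINING META was never encoded, so a
# private-use code point is borrowed purely for illustration.
COMBINING_META = "\uE000"


def is_combining(ch: str) -> bool:
    """Treat general-category M* marks, plus the stand-in, as combining."""
    return ch == COMBINING_META or unicodedata.category(ch).startswith("M")


def to_meta(text: str) -> str:
    """Mark every character of `text` as meta by following it with COMBINING META."""
    return "".join(ch + COMBINING_META for ch in text)


def render(text: str, show_meta: bool = False) -> str:
    """Hide meta characters unless show_meta is set.

    When shown, each meta character is simply wrapped in brackets
    (a stand-in for a visual cue such as pale blue); no new glyphs are needed.
    """
    out = []
    i = 0
    while i < len(text):
        # Gather one combining sequence: a base character plus its marks.
        j = i + 1
        while j < len(text) and is_combining(text[j]):
            j += 1
        seq = text[i:j]
        if COMBINING_META in seq:
            if show_meta:
                out.append("[" + seq.replace(COMBINING_META, "") + "]")
        else:
            out.append(seq)
        i = j
    return "".join(out)


tagged = to_meta("sv") + "svensk text"
print(render(tagged))                  # svensk text
print(render(tagged, show_meta=True))  # [s][v]svensk text
```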

Meta characters can also be filtered out fairly easily,
the one complication being that all other combining
characters in the same combining sequence must also be
removed. This filter need not know anything about the
syntax of the meta-data, which may be very simple (akin
to MLSF) or very complex (like TeX or SGML/XML/HTML), as
long as it is meta-marked.
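The filter described above might be sketched as follows, again borrowing U+E000 as a stand-in for the unencoded COMBINING META (strip_meta is a hypothetical name). It treats general-category M* characters as combining and drops every whole combining sequence that contains the stand-in, so the meta-data syntax is never parsed.

```python
import unicodedata

# Hypothetical stand-in for the proposed COMBINING META (never encoded);
# a private-use code point is used only for illustration.
COMBINING_META = "\uE000"


def strip_meta(text: str) -> str:
    """Remove every meta character, together with all combining marks in
    its combining sequence, without interpreting the meta-data syntax."""
    out = []
    i = 0
    while i < len(text):
        j = i + 1
        # Extend over following combining marks (general category M*) and
        # the stand-in, which a real encoding would class as combining.
        while j < len(text) and (
            text[j] == COMBINING_META
            or unicodedata.category(text[j]).startswith("M")
        ):
            j += 1
        # Keep the sequence only if it is not meta-marked.
        if COMBINING_META not in text[i:j]:
            out.append(text[i:j])
        i = j
    return "".join(out)


# SGML-like meta-data is removed without the filter knowing SGML.
marked_tag = "".join(ch + COMBINING_META for ch in "<b>")
print(strip_meta(marked_tag + "bold"))  # bold
```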

It is then up to those specifying the meta-data to decide
whether COMBINING META must or should be used, which
characters it may "legally" be combined with, and what the
syntax and interpretation of any sequence of meta-characters
are.

E.g. (if applied to language tagging)
        s C.META v C.META s v e n s k t e x t
or (better)
        { C.META s C.META v C.META s v e n s k t e x t } C.META
may be Swedish-language-tagged text, normally shown as
        svensk text
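As a sketch of how the braced variant might be read back, assume a hypothetical syntax in which a meta "{" opens a tag, the following meta characters spell it, and a meta "}" closes the tagged span. That syntax, the U+E000 stand-in for COMBINING META, and the names meta, runs and language_spans are all illustrative assumptions, since the message leaves syntax to those specifying the meta-data.

```python
import unicodedata
from typing import List, Optional, Tuple

# Hypothetical stand-in for the proposed COMBINING META (never encoded).
C_META = "\uE000"


def meta(s: str) -> str:
    """Mark each character of s as meta."""
    return "".join(ch + C_META for ch in s)


def runs(text: str) -> List[Tuple[bool, str]]:
    """Split text into maximal (is_meta, chars) runs, dropping the stand-in."""
    result: List[Tuple[bool, str]] = []
    i = 0
    while i < len(text):
        j = i + 1
        while j < len(text) and (
            text[j] == C_META or unicodedata.category(text[j]).startswith("M")
        ):
            j += 1
        seq = text[i:j]
        is_meta = C_META in seq
        seq = seq.replace(C_META, "")
        if result and result[-1][0] == is_meta:
            result[-1] = (is_meta, result[-1][1] + seq)  # extend current run
        else:
            result.append((is_meta, seq))
        i = j
    return result


def language_spans(text: str) -> List[Tuple[Optional[str], str]]:
    """Pair each run of ordinary text with the language tag open at that point."""
    spans: List[Tuple[Optional[str], str]] = []
    tag: Optional[str] = None
    for is_meta, s in runs(text):
        if not is_meta:
            spans.append((tag, s))  # ordinary text inherits the open tag
            continue
        for ch in s:  # hypothetical syntax: "{" opens, "}" closes
            if ch == "{":
                tag = ""
            elif ch == "}":
                tag = None
            elif tag is not None:
                tag += ch
    return spans


tagged = meta("{sv") + "svensk text" + meta("}")
print(language_spans(tagged))  # [('sv', 'svensk text')]
```

One plausible reason the braced form reads better is that the closing meta "}" marks where the tag stops applying, which the unbraced form leaves open.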

                /kent k

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:36 EDT