re: New Internet-Draft draft-ietf-acap-langtag-00.txt

From: Martin J. Duerst (mduerst@ifi.unizh.ch)
Date: Thu Jun 26 1997 - 07:57:19 EDT


On Thu, 19 Jun 1997, Mark Crispin wrote:

> My reaction upon reading this document is "RFC 1815". It leaves me with the
> exact same feeling; the feeling that one gets upon buying something from a
> salesman who really didn't want to sell the product in the first place and
> wants to make it as unpleasant as possible.

I'm sorry about this, and grateful to Ken for pointing out how this
can be improved. It's probably difficult to have both material discussing
the limitations of language tagging and methods for language tagging in
the same document; maybe this should be split into two documents.

> In the remainder, there is no formal specification of the TELT and STLT
> proposals other than the C code that implements it.

Are there any specific points that are ambiguous and that you would like
to be clarified? Are there any specific description techniques that
should be used? (I know the formal syntax is missing, but is there
anything else that should be added?)

> From my perspective, TELT is completely unacceptable to me and everyone else
> who wants to have language tags in plaintext. It's just another reiteration
> of the "use rich text" argument; TELT in effect is "HTML made real simple".
> TELT does not layer well; many other applications assign semantic meanings to
> "<" and ">" so quoting is required.
>
> TELT really is not plaintext at all.

I don't mind whether you call it plaintext or not. I know quoting
of "<" should probably be added. For ">", it is not needed.
What I wanted to show with TELT is that it is important to have
things in a readable format, for debugging, preparation of texts,
and so on. Without the possibility of using plain text editors,
HTML would never have taken off the way it did.

Also, what I wanted to show is that it is not difficult to make
a distinction between protocol representation and internal
representation. There are processing needs for which MLSF is
definitely better suited than any of my proposals, but it should
be up to each application to decide what its exact processing
needs are and what representation it wants to use. I showed that
converting from TELT/STLT to MLSF and back is really easy, so
that the requirement to have exactly the same representation
internally and externally is not very important. Also, it is
interesting to note that the most complicated part of the code
for the conversion is on the side of MLSF. TELT or STLT are
very easy for a general programmer to understand; MLSF or the
plane 14 proposal require a good understanding of what is going on.

> STLT is less objectionable than TELT, but still has problems. STLT is
> basically a "just give us one codepoint as a mark and we'll go from there"
> proposal. Unlike plane 14, this requires explicit knowledge of STLT's
> semantics, since the characters used in the tag are ordinary graphic
> characters. In plane 14, the tag characters can be specified as zero-width.
>
> STLT is also not very general. Any future tagging needs would require a new
> mark.

No. The idea is to have only one tag. Various protocols could use it
for various purposes, or ideally the various tags could be designed
to be exclusive. As defined, we would have:

        First character     Meaning
        after start-tag

        A-Z                 Language tag
        %                   Language alternative

Later, we could have e.g.

        &                   introducing source set separator
        ^                   introducing locale tag
        )                   introducing "cultural" tag
        @                   introducing font tag
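
As a rough sketch of the two lists above (not taken from the draft),
the kind of a tag can be read off the first character that follows the
start-tag character. The function name tag_kind and the returned
strings are hypothetical, and the entries for "&", "^", ")" and "@" are
only the possible later extensions, not part of the current definition.

```python
# Hypothetical sketch: classify an STLT tag by its first character.
# "tag_body" is assumed to be the text between the start-tag character
# and the terminating "#", with both delimiters already removed.

SPECIAL_KINDS = {
    "%": "language alternative",   # defined now
    "&": "source set separator",   # possible later extension
    "^": "locale tag",             # possible later extension
    ")": "cultural tag",           # possible later extension
    "@": "font tag",               # possible later extension
}

def tag_kind(tag_body):
    """Return the kind of tag, decided by the first character only."""
    if not tag_body:
        return "unknown"
    first = tag_body[0]
    if "A" <= first <= "Z":
        return "language tag"
    return SPECIAL_KINDS.get(first, "unknown")
```

Anything that does not match a known first character is reported as
"unknown" in this sketch, so that software can ignore it.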

For something considered really important, one could also use "a-z",
or one could add another special character for language tags
to make them more equal to others. General software would
just skip over everything between the start-tag character and
the "#" at the end.
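
The skipping behaviour described above can be sketched as follows.
This is only an illustration: the start-tag codepoint is not fixed in
this message, so U+E000 (a private-use character) stands in as a
placeholder, and the name skip_tags as well as the handling of an
unterminated tag are assumptions.

```python
# Placeholder only: the real start-tag codepoint is not assigned here.
START_TAG = "\ue000"
END_TAG = "#"

def skip_tags(text):
    """Return text with every START_TAG ... "#" span removed."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == START_TAG:
            end = text.find(END_TAG, i + 1)
            if end == -1:
                # Assumption: an unterminated tag swallows the rest.
                break
            i = end + 1          # continue after the closing "#"
        else:
            out.append(text[i])  # ordinary character, keep it
            i += 1
    return "".join(out)
```

A "#" that is not preceded by the start-tag character is left alone,
since it is only a terminator inside a tag.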

> The use of "#" as a terminator may also cause problems

What problems do you see with "#" as a terminator?

> Leaving aside the technical inferiorities of this proposal compared to plane
> 14, this really doesn't save UTC anything over plane 14. Rather than
> creating a general purpose tagging mechanism, it sets a precedent for
> acquiring codepoints for special purpose marks.

No, as explained, only one codepoint is needed. Distinction between
various tag mechanisms is done after the general start-tag character,
where needed. In this sense, I think, it is rather similar to plane 14,
where you also have a general convention to say "what is a tag" and
then more conventions to distinguish various tags.

Regards, Martin.


