Re: Unicode 3.1: incomplete tags considered harmless/useful

Date: Thu Feb 01 2001 - 02:25:00 EST

> The section "Dangers of Incomplete Support" in section 13.7 seems to me
> to be far too strongly worded; it should be weakened or removed
> altogether.
> In particular, there is no reason why sequences of tag characters
> not beginning with LANGUAGE TAG or CANCEL TAG cannot be used
> for various purposes by private agreement. However, as currently
> worded, language-tag-interpreting applications SHOULD remove them,
> contrary to the usual Unicode view of not-understood content
> ("leave it alone").

What would be the meaning or benefit of a sequence of tag characters *not*
beginning with a tag header in the range U+E0001 through U+E001F? We are
already promised that tag characters may only be used to form valid tags, so
I don't see any benefit in allowing their use for privately defined purposes.
But clearly the restriction to U+E0001 LANGUAGE TAG and U+E007F CANCEL TAG
will be inappropriate as soon as another type of tag is defined.
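
To make the distinction concrete, here is a minimal Python sketch of how a
process might tell a "headed" run of tag characters (one that starts with a
tag identifier in U+E0001 through U+E001F) from a "headless" one. The function
names and the three labels are my own invention for illustration, not anything
defined by Unicode 3.1 or TR #7, and the sketch classifies a run only by its
first code point; it does not try to split a run that contains several tags.

```python
# Code point ranges in the Plane 14 tag block.
TAG_ID_FIRST, TAG_ID_LAST = 0xE0001, 0xE001F      # reserved for tag identifiers
TAG_CHAR_FIRST, TAG_CHAR_LAST = 0xE0020, 0xE007E  # tag characters (shadow ASCII)
CANCEL_TAG = 0xE007F


def is_tag_codepoint(cp):
    """True for any code point that takes part in a tag."""
    return (
        TAG_ID_FIRST <= cp <= TAG_ID_LAST
        or TAG_CHAR_FIRST <= cp <= TAG_CHAR_LAST
        or cp == CANCEL_TAG
    )


def tag_runs(text):
    """Yield each maximal run of tag code points as a list of ints."""
    run = []
    for ch in text:
        cp = ord(ch)
        if is_tag_codepoint(cp):
            run.append(cp)
        elif run:
            yield run
            run = []
    if run:
        yield run


def classify_run(run):
    """Label a run by its first code point (hypothetical labels)."""
    head = run[0]
    if TAG_ID_FIRST <= head <= TAG_ID_LAST:
        return "headed"      # starts with a tag identifier such as LANGUAGE TAG
    if head == CANCEL_TAG:
        return "cancel"      # bare CANCEL TAG
    return "headless"        # tag characters with no identifier in front


def spell(run):
    """Map the tag characters in a run back to their ASCII counterparts."""
    return "".join(
        chr(cp - 0xE0000) for cp in run if TAG_CHAR_FIRST <= cp <= TAG_CHAR_LAST
    )
```

For example, in the string "\U000E0001\U000E0066\U000E0072bonjour\U000E0066\U000E0072"
the first run is headed (LANGUAGE TAG followed by tag characters spelling "fr"),
while the second run consists of the same two tag characters with no identifier
in front of them, which is the case under discussion.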

> Nor is there any reason why a CANCEL TAG should be required to exist for
> every LANGUAGE TAG; in particular, a LANGUAGE TAG at the beginning
> of plain text that is meant to apply to the whole text (document,
> human-readable-string in protocols, etc.) should be unproblematic.
> As currently worded, editors SHOULD not permit such uses.

This makes sense, and in fact I was not aware of any such requirement.
Technical Report #7 specifically mentions the legitimate possibility of
language-tagged text going out of scope (i.e. hitting EOF) without a CANCEL
TAG.

-Doug Ewell
 Fullerton, California

