Re: RFC 1766 language tags

From: Martin J. Duerst
Date: Fri Jun 13 1997 - 15:14:43 EDT

On Fri, 13 Jun 1997, Mark Crispin wrote:

> I *think*, but I'm not 100% sure, that it is alright to declare that
> untagged text is implicitly tagged with the "unknown language",

This is 100% okay. It happens day in, day out: computers have
no idea of the language of at least 99% of the texts in current
computer systems.

> and allow
> higher levels to establish a default (such as English in client/server
> protocols).

In the case of IMAP, ACAP, or any other protocol, would this mean:

- The server should send English error messages by default
        (as we are discussing in another thread)?
- The server should send English content (e.g. the English
        part of an alternative language string if such things exist)
        by default?
- Messages sent by the server should implicitly be tagged English
        if the negotiated language for error messages is English?
- Messages sent by the server should implicitly be tagged English
        (and explicitly tagged for other languages) independent
        of the result of the negotiation?
- Actual data should be assumed to be tagged English in the absence
        of tags (i.e. all untagged emails or untagged ACAP attribute
        values would be English; all other languages would have to
        be tagged, or be wrong)?

> There probably should be some comment in the RFC1766
> successor document that states something to the effect that even though
> the 1766bis default is "unknown", high level protocols can establish a
> different default.

Since my answer to some of the above questions is "no", I think such
wording should be chosen extremely carefully.

> What I mean by all of this is that it should be clear that, when inserting
> untagged text into tagged text, although normally you'd tag the untagged
> text with "unknown", if it came from a higher-level protocol with a
> default language, that default should be used.

Well, whatever you use to establish the language of a text, once you
have established it, and you have a way of keeping that information,
you should keep it appropriately. The question is whether it is
useful to establish a default in a given protocol, not how that
default can be used, once it is established (and kept!).
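To make the distinction concrete, here is a minimal sketch of the resolution rule Mark describes, for the moment untagged text is inserted into tagged text. The function name `effective_language` is hypothetical, and so is the use of "und" (the ISO 639-2 code for "undetermined") as the spelling of the "unknown language"; the 1766bis spelling is still open. Only the second step, the protocol default, is what I am questioning above.

```python
from typing import Optional

# Hypothetical spelling of the "unknown language" tag; "und" is an
# assumption for illustration, not something RFC 1766 defines.
UNKNOWN = "und"


def effective_language(explicit_tag: Optional[str],
                       protocol_default: Optional[str] = None) -> str:
    """Choose the tag to attach when untagged text is inserted into
    tagged text (hypothetical helper, for illustration only).

    1. An explicit tag on the text always wins.
    2. Otherwise, a default established by a higher-level protocol,
       if that protocol defines one, is used.
    3. Otherwise, the text is tagged as unknown.
    """
    if explicit_tag:
        return explicit_tag
    if protocol_default:
        return protocol_default
    return UNKNOWN
```

With this rule, `effective_language(None, "en")` gives "en" only because the protocol chose to establish English; with no such default, `effective_language(None)` falls back to "und". Whether step 2 should exist in a given protocol is the real question.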

> The other possibility is to declare English to be the default, since it is
> the only likely default. But I can see situations where "unknown" is
> better, and the issues of English being the default really are irrelevant
> at the 1766 level so going with "unknown" is probably better.

Establishing something like "Whenever you see untagged text, assume
it's English" would be totally counterproductive. Because the majority
of texts in the world are in languages other than English, the assumption
would be wrong more often than right. It is Socratic wisdom to say
"I know that I don't know."

Regards, Martin.

