From: Behnam (firstname.lastname@example.org)
Date: Mon May 26 2008 - 13:06:25 CDT
Thank you, Mr Ewell, for the references and info.
From what I understand, or more precisely from what I don't
understand (which would be most of it!), I think that your proposal
for a language identifier is very sophisticated and is part of the
encoding standard scheme.
This is probably why it is facing resistance: it is entering
a domain that most applications and their developers consider
What I am suggesting is much, much simpler, to the point of banality.
Yet it is very efficient, and also much more acceptable to all parties.
The paragraph language identifier that I'm suggesting doesn't do
anything in plain text at all. It just sits there as a part of the
Only when the paragraph is opened by an application can it identify
the language of the paragraph to the application and trigger the
language support system of that application... or simply be ignored,
just as in plain text.
The value of this identifier is just its existence, being there with
the paragraph, wherever it goes. So an email client knows that this
is, for example, a French paragraph. The word processor knows that it
is a French paragraph, and a web page knows that it is a French
paragraph. What they do with this knowledge is totally up to them,
with regard to whatever support system they have already developed
that could make use of this knowledge and whatever their customers ask
them to develop.
On 23-May-08, at 8:55 PM, Doug Ewell wrote:
> "Behnam" <behnam dot rassi at gmail dot com> wrote:
>> I wonder why Unicode didn't put language identifier to the paragraph.
> Unicode 3.1 introduced a set of tag characters in the range U+E0000
> through U+E007F ("Plane 14"), primarily to allow language tags to
> be embedded in plain text, as a defense against an external
> proposal to use invalid UTF-8 sequences for that purpose. However,
> the Plane 14 tag characters were "strongly discouraged" by Unicode
> almost immediately after being encoded, and have since been
> formally deprecated. For more information, see sections 5.10 and
> 16.9 of TUS 5.0.
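
For readers unfamiliar with the Plane 14 mechanism described in the quoted reply, here is a minimal sketch of how such a language tag is spelled out in plain text: U+E0001 LANGUAGE TAG, followed by tag characters that mirror printable ASCII at an offset of 0xE0000. The function names below are hypothetical and chosen only for illustration, and since these language tags are deprecated, this is meant to show the idea rather than recommend its use.

```python
# Illustrative sketch only: Plane 14 language tags are deprecated.
# Function names are hypothetical, not part of any library.

LANGUAGE_TAG = 0xE0001  # U+E0001 LANGUAGE TAG introduces a language tag
TAG_OFFSET = 0xE0000    # tag characters mirror ASCII 0x20..0x7E at this offset
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E


def encode_language_tag(tag: str) -> str:
    """Spell a language code such as 'fr' using Plane 14 tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in tag):
        raise ValueError("language tag must be printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(ord(c) + TAG_OFFSET) for c in tag)


def split_language_tag(text: str):
    """Return (language code or None, remaining text) for a leading tag."""
    if not text or ord(text[0]) != LANGUAGE_TAG:
        return None, text
    i = 1
    while i < len(text) and TAG_FIRST <= ord(text[i]) <= TAG_LAST:
        i += 1
    code = "".join(chr(ord(c) - TAG_OFFSET) for c in text[1:i])
    return code, text[i:]


def strip_tag_characters(text: str) -> str:
    """Drop every character in U+E0000..U+E007F, as a tag-unaware reader would."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


tagged = encode_language_tag("fr") + "Bonjour tout le monde"
code, body = split_language_tag(tagged)
```

An application that understands the tag can read `code` and hand `body` to its own language support; one that does not can call `strip_tag_characters` and treat the result as ordinary plain text.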