Re: Exemplifying apostrophes

From: Behnam (
Date: Mon May 26 2008 - 13:06:25 CDT

    Thank you, Mr. Ewell, for the references and information.

    From what I understand, or more precisely from what I don't
    understand (which is most of it!), your proposal for a language
    identifier seems very sophisticated and becomes part of the
    encoding standard itself.
    This is probably why it is facing resistance: it enters a domain
    that most applications and their developers consider their own.

    What I am suggesting is much, much simpler, to the point of banality,
    yet very efficient, and also much more acceptable to all parties.
    The paragraph language identifier I'm suggesting doesn't do
    anything in plain text at all. It just sits there as part of the
    paragraph's encoding.
    Only when the paragraph is opened by an application does it identify
    the language of the paragraph to that application and trigger the
    application's language support system... or it is simply ignored,
    just as in plain text.

    The value of this identifier lies simply in its existence: it stays
    with the paragraph wherever the paragraph goes. So an email client
    knows that this is, for example, a French paragraph. The word
    processor knows that it is a French paragraph, and a web page knows
    that it is a French paragraph. What they do with this knowledge is
    entirely up to them, depending on whatever support systems they have
    already developed that could make use of it and whatever their
    customers ask them to develop.
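    To make the idea concrete, here is a minimal sketch in Python.
    Everything in it is hypothetical: the `Paragraph` class, its `lang`
    field and the `open_paragraph` function are invented for
    illustration and are not part of Unicode or of any real application.
    It only shows the intended behavior: the identifier travels with the
    paragraph, and each application either uses it or ignores it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Paragraph:
    # Hypothetical paragraph-level language identifier, e.g. "fr".
    # None means the paragraph carries no identifier at all.
    lang: Optional[str]
    text: str


def open_paragraph(par: Paragraph, supported: set[str]) -> str:
    """Describe what a hypothetical application does on opening `par`.

    The identifier never changes the text itself; it only tells the
    application which language support to switch on, if it has any.
    """
    if par.lang is not None and par.lang in supported:
        return f"{par.lang} support enabled"
    # Unknown or missing identifier: ignored, just as in plain text.
    return "no language support"


par = Paragraph(lang="fr", text="Bonjour tout le monde.")
print(open_paragraph(par, supported={"en", "fr"}))  # an application with French support
print(open_paragraph(par, supported={"en"}))        # an application without it
```

    The same paragraph reaches both applications unchanged; only their
    reaction to the identifier differs.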


    On 23-May-08, at 8:55 PM, Doug Ewell wrote:

    > "Behnam" <behnam dot rassi at gmail dot com> wrote:
    >> I wonder why Unicode didn't add a language identifier to the paragraph.
    > Unicode 3.1 introduced a set of tag characters in the range U+E0000
    > through U+E007F ("Plane 14"), primarily to allow language tags to
    > be embedded in plain text, as a defense against an external
    > proposal to use invalid UTF-8 sequences for that purpose. However,
    > the Plane 14 tag characters were "strongly discouraged" by Unicode
    > almost immediately after being encoded, and have since been
    > formally deprecated. For more information, see sections 5.10 and
    > 16.9 of TUS 5.0.
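    For comparison, here is a minimal Python sketch of how the Plane 14
    mechanism Doug describes encodes a language tag. It assumes the
    convention described in TUS 5.0: U+E0001 LANGUAGE TAG, followed by tag
    characters that mirror printable ASCII (U+E0020..U+E007E), with
    U+E007F CANCEL TAG on its own ending all tagged spans. The function
    names are made up for illustration. The sketch only illustrates the
    mechanism; it does not recommend these deprecated characters.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
CANCEL_TAG = "\U000E007F"    # U+E007F CANCEL TAG (alone: cancels all tags)
TAG_OFFSET = 0xE0000         # tag character = ASCII code point + 0xE0000


def language_tag(lang: str) -> str:
    """Encode a tag such as "fr" as LANGUAGE TAG plus tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("tag characters only mirror printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


def strip_tags(text: str) -> str:
    """Drop every character in U+E0000..U+E007F.

    This mimics an application that does not interpret tags and so
    simply does not show them.
    """
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


tagged = language_tag("fr") + "Bonjour" + CANCEL_TAG
print([f"U+{ord(c):04X}" for c in tagged[:3]])  # ['U+E0001', 'U+E0066', 'U+E0072']
print(strip_tags(tagged))                       # Bonjour
```

    Like the paragraph identifier above, the tag travels inside the text
    itself, and an application that knows nothing about it just drops it.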

    This archive was generated by hypermail 2.1.5 : Mon May 26 2008 - 13:10:08 CDT