Re: Exemplifying apostrophes

From: Doug Ewell
Date: Mon May 26 2008 - 13:39:16 CDT


    "Behnam" <behnam dot rassi at gmail dot com> wrote:

    > From what I understand, or more precisely from what I don't understand
    > (which would be most of it!) I think that your proposal for language
    > identifier is very sophisticated and takes part in encoding standard
    > scheme.

    It's not my proposal, just one I described to you. I think that's what
    you meant.

    > This is probably why it is facing resistance because it is entering in
    > a domain that most applications and their developers consider their
    > own.

    It has faced resistance because it is stateful -- that is, it applies to
    an entire, open-ended chunk of text rather than just a single character
    or a small, fixed run of characters -- and the Unicode Consortium
    considers stateful mechanisms to be out of scope for a character
    encoding standard. There are other mechanisms like this, such as the
    Interlinear Annotation characters at U+FFF9 through U+FFFB, and those
    are frowned upon as well.
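
    To illustrate the statefulness described above, here is a minimal
    Python sketch. It assumes the Plane 14 tag characters: U+E0001
    LANGUAGE TAG opens a tag, U+E0020 through U+E007E spell the language
    code by mirroring ASCII, and U+E007F CANCEL TAG ends the scope. The
    function name language_at is hypothetical, and the handling is
    simplified. The point is that the language in effect at a given
    position cannot be read from the character itself; the text before
    it has to be scanned.

```python
LANGUAGE_TAG = 0xE0001                    # U+E0001 LANGUAGE TAG
CANCEL_TAG = 0xE007F                      # U+E007F CANCEL TAG
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E    # tag characters mirroring ASCII


def language_at(text, index):
    """Return the language code in effect at text[index], or None.

    Hypothetical helper: it walks the text from the start because the
    tag's scope is open-ended, which is what makes the scheme stateful.
    """
    current = None
    collecting = False
    for ch in text[:index + 1]:
        cp = ord(ch)
        if cp == LANGUAGE_TAG:
            # A new tag replaces whatever language was in effect.
            current = ""
            collecting = True
        elif collecting and TAG_FIRST <= cp <= TAG_LAST:
            # Map the tag character back to its ASCII counterpart.
            current += chr(cp - 0xE0000)
        elif cp == CANCEL_TAG:
            current = None
            collecting = False
        else:
            # Any ordinary character ends the tag's spelling.
            collecting = False
    return current or None


french = chr(LANGUAGE_TAG) + chr(0xE0066) + chr(0xE0072)   # tag for "fr"
sample = "Hello " + french + "Bonjour"
print(language_at(sample, 2))                 # position before any tag
print(language_at(sample, len(sample) - 1))   # position inside tagged text
```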

    > What I am suggesting is much much simpler, to the point of banality.
    > Yet very efficient. But also much more acceptable to all parts. The
    > paragraph language identifier that I'm suggesting, doesn't do anything
    > in plain text at all. It just sits there as a part of the paragraph
    > encoding.

    The problem is that in Unicode, there is no concept of "the paragraph
    encoding." There is simply a stream of characters. How they are
    formatted and interpreted as paragraphs is dependent on a higher-level
    protocol or application.

    > Only when the paragraph is opened by an application, it can identify
    > the language of the paragraph to the application and trigger the
    > language support system of that application... or simply be ignored,
    > just as in plain text.

    You are correct that the tag characters can be ignored in certain plain
    text contexts where no advantage can be taken of them. That was one of
    the rationales behind burying them in Plane 14, and that strategy was
    explicitly mentioned when the characters were introduced.
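
    As a rough sketch of how a plain-text process might ignore the tags,
    the Python below drops every code point in the Tags block (U+E0000
    through U+E007F) and leaves the rest of the text alone. The names
    encode_language_tag and strip_tags are hypothetical, chosen only for
    this illustration.

```python
TAGS_FIRST, TAGS_LAST = 0xE0000, 0xE007F   # the Tags block in Plane 14
LANGUAGE_TAG = 0xE0001                     # U+E0001 LANGUAGE TAG


def encode_language_tag(code):
    """Spell a printable-ASCII language code with Plane 14 tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in code):
        raise ValueError("language code must be printable ASCII")
    # Each tag character sits at 0xE0000 plus the ASCII value it mirrors.
    return chr(LANGUAGE_TAG) + "".join(chr(0xE0000 + ord(c)) for c in code)


def strip_tags(text):
    """Remove all Tags-block characters, keeping everything else intact."""
    return "".join(c for c in text if not (TAGS_FIRST <= ord(c) <= TAGS_LAST))


tagged = encode_language_tag("fr") + "Bonjour"
print(len(tagged))          # tag characters count toward the length
print(strip_tags(tagged))   # only the visible text remains
```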

    > The value of this identifier is just its existence, being there with
    > the paragraph, wherever it goes. So an email client knows that this is
    > for example a French paragraph. The word processor knows that it is a
    > French paragraph and a web-page knows that it is a French paragraph.
    > What do they do with this knowledge is totally up to them, with
    > regards to whatever support system they already have developed that
    > could use of this knowledge and whatever their customers ask them to
    > be developed.

    Preaching to the choir. As I said, go ahead and use them if you like,
    but be aware they are deprecated and there is probably nobody else using
    them. I thought they were a great idea, and even I don't use them any
    more.

    Doug Ewell  *  Arvada, Colorado, USA  *  RFC 4645  *  UTN #14
