Re: New Corrigendum to The Unicode Standard

From: Behnam (
Date: Fri Aug 17 2007 - 05:38:55 CDT

    What makes quotation marks any different from other mirroring
    characters in a BiDi context? If it's only backward compatibility,
    that's not a good enough reason for maintaining or removing the
    mirroring property on one character or another.
    The mirroring property was a bad idea from the start. As long as the
    position of the cursor is well defined within a BiDi context, the
    writer knows which character he or she wants to use to open or close.
    This property should be removed from all characters, including
    parentheses, brackets, etc. Backward compatibility is just
    a matter of 'find and replace' if need be.
    Get rid of mirrors altogether. It's a bad idea.
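
The 'find and replace' mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names PAIRS, SWAP and swap_paired are hypothetical, the table covers just four ASCII pairs, and the sketch swaps every occurrence rather than only those inside right-to-left runs, which a real migration would have to detect.

```python
# Hypothetical pair table; a real conversion would cover every mirrored pair.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}

# Build one translation table mapping each opener to its closer and back,
# so both directions are replaced in a single pass without clobbering.
SWAP = str.maketrans({**PAIRS, **{close: open_ for open_, close in PAIRS.items()}})


def swap_paired(text):
    """Swap opening and closing characters of each pair in text."""
    return text.translate(SWAP)


print(swap_paired("a (b) [c]"))
```

Applying swap_paired twice returns the original string, which is why a single-pass table is used instead of two sequential replacements.
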

    Now the Hebrew script might be interested in a mirroring property
    for the question mark, comma, and semicolon!


    On 17-Aug-07, at 1:34 AM, Asmus Freytag wrote:

    > On 8/16/2007 9:33 PM, Philippe Verdy wrote:
    >> This corrigendum is quite troubling; in a BiDi context, this means
    >> that initial quotation marks will not be mirrored.
    > The corrigendum restores the mirrored property of these quotation
    > marks to the status that they had before Unicode 5.0. (in other
    > words, it reverses a change made in 5.0, which was found to
    > adversely affect existing data). Rather than being troubling, the
    > corrigendum is a welcome correction to a problem that had been
    > introduced in Unicode 5.0.
    > The way quotation marks are used, neither automatic mirroring, nor
    > the absence of such mirroring is an ideal solution. Removing the
    > Bidi_Mirroring property, as the corrigendum does, preserves
    > compatibility with the status quo ante.
    > Implementations that claim conformance to Unicode 5.0 *with
    > Corrigendum 6 applied* can now be conformant *and* be backwardly
    > compatible with Unicode 4.1 and earlier, as well as forward
    > compatible with Unicode 5.1.
    > Fixing this compatibility problem is what matters to users and
    > implementers, and it was deemed serious enough, by UTC, that they
    > issued a corrigendum. Such a move would have to have had the
    > support of major vendors represented in the UTC, so I hope this
    > means that we can expect patches that will apply this corrigendum,
    > before too much data is created that assumes mirroring quotation
    > marks.
    > The time for theoretical or philosophical speculations on this
    > issue is well past; that means I'll skip the remainder of your
    > musings in their entirety.
    > A./
    > PS: I will note that the values Pe and Pi for the general category
    > indeed are merely advisory - they do not constrain the usage of the
    > characters. The Unicode Standard clearly documents in the text that
    > the usage of quotation marks is governed by mutually incompatible
    > orthographic conventions. Because of that it is not possible to
    > know from the character code alone whether the mark is the opening
    > or the closing one of the pair. (In fact many languages don't even
    > use a pair, but use the same character for opening and closing).
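
The two properties discussed above can be inspected directly. A minimal sketch, assuming Python's standard unicodedata module, whose data follows a Unicode version later than 5.0 and so already reflects the corrigendum; the helper name describe is hypothetical. Note that the Unicode Character Database labels initial and final quotation marks Pi and Pf, while Pe is the category of closing punctuation such as the right parenthesis.

```python
import unicodedata


def describe(ch):
    """Return (code point, General_Category, Bidi_Mirrored) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        bool(unicodedata.mirrored(ch)),
    )


# Parentheses, double angle quotation marks, and curly double quotation marks.
for ch in "()\u00ab\u00bb\u201c\u201d":
    print(describe(ch), unicodedata.name(ch))
```

With this data the parentheses and the angle quotation marks report Bidi_Mirrored as true, while U+201C and U+201D report it as false.
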
    >> Anyway, the classification of quotation marks as initial or final is
    >> problematic because it is not consistent with actual usage in various
    >> languages that use reversed conventions, even in the same LTR
    >> directional context and only in the Latin script.
    >> So the distinction between "Pi" and "Pe" general categories should
    >> remain informative only for the "most common" usage. These
    >> punctuation marks should not be mirrored, simply because they can't
    >> be accurately distinguished as initial or final. So the exact form
    >> (orientation and baseline/exponent position) of these quotation
    >> marks should not be altered even in a BiDi context, and it's up to
    >> the writer to choose the proper one for each context.
    >> But how can you manage the correct reordering of these characters
    >> if you use them to surround, for example, a Latin quotation within
    >> an Arabic text? The initial quotation mark will need to inherit the
    >> directional property from the previous Arabic text, and the final
    >> quotation mark will need to inherit the directional property of the
    >> previous Latin text, and there's no way to determine automatically
    >> that it should attach here to the Arabic text after it, simply
    >> because there's no way to determine if the quotation marks are
    >> initial or final.
    >> This is a difficult problem for which there's no clear indication
    >> about what can be done exactly in this case, where quotation marks
    >> are inserted exactly at the positions where a change of script
    >> direction occurs. So how to handle this "smartly"?
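
For context on the question above: quotation marks carry the neutral bidirectional class ON, and the Unicode Bidirectional Algorithm (UAX #9) resolves a neutral from the strong characters around it, falling back to the embedding direction when its neighbours disagree. The sketch below only lists the classes involved and does not implement the algorithm. It assumes Python's standard unicodedata module; bidi_classes and sample are hypothetical names.

```python
import unicodedata


def bidi_classes(text):
    """Pair each character with its Bidi_Class value."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# Arabic alef, left angle quotation mark, no-break space, Latin "a".
sample = "\u0627\u00ab\u00a0a"

for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X}", cls)
```

The Arabic letter is strong right-to-left (AL), the Latin letter is strong left-to-right (L), the quotation mark is neutral (ON), and the no-break space is a weak separator (CS), so the last two have no direction of their own.
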
    >> => A good solution would be to consider once again their "Pi"/"Pe"
    >> default distinction in the general category. In that case, it gives
    >> good hints about what the quotation marks are marking. So if you
    >> know that a quotation mark is initial or final, then you know that
    >> an initial quotation mark after an Arabic text should not be
    >> mirrored, given that it will be reordered according to the direction
    >> of the text after it, and that the final quotation mark will not
    >> need to be mirrored, as it will be reordered according to the Latin
    >> text before it.
    >> The caveat is that an Arabic text will not be able to quote a
    >> Latin-written citation »like this« or even ”like that“ even if the
    >> quoted language uses this convention (reversed from the default
    >> Pi/Pe distinction), but only «like this» or even “like that”.
    >> Another difficulty: the quotation marks may be followed by
    >> (non-breaking) spaces (this is even mandatory for double angle
    >> quotation marks if you use French typography, and depending on
    >> tricky typographic differences this may be a NBSP or NNBSP). This is
    >> not a major difficulty for the final quotation marks, but will add
    >> some difficulty for the initial (Pi) quotation mark in a BiDi
    >> context where the embedded quotation needs to be reordered.
    >> As a consequence, an Arabic text will not be able to accurately use
    >> any (non-breaking) space with the quotation marks to embed, for
    >> example, a French quotation, and so will not accurately cite it
    >> using the usual « French » quotation style, unless the author drops
    >> the non-breaking spaces for «French» or uses the English quotation
    >> marks to embed the “French” citation.
    >> Before the corrigendum in Unicode 5, the Arabic text would have
    >> needed to embed an Arabic quotation like “Arabic”, but due to the
    >> mirrored property, it would have been read with mirrored quotation
    >> marks. So an author could have decided to swap his quotation signs
    >> into ”Arabic“ (so the initial quotation mark would have the default
    >> Pe=ending property, and the final quotation mark would have the
    >> default Pi=initial property). If he used them as well to cite Latin
    >> quotations ”like this“, then the BiDi reordering would still give
    >> the expected result, because the quotation marks would be attached
    >> to the surrounding Arabic text, where they are mirrored and not
    >> reordered, but not to the inner reordered Latin text, which is not
    >> mirrored. And after reordering, everybody would see the quoted text
    >> as if it was “Latin”, with the quotation marks reordered with the
    >> quoted Latin text.
    >> After the change, given that the quotation marks are no longer
    >> mirrored, the Latin quotation will now seem to be swapped (incorrect
    >> orientation) in all cases if the text was created for Unicode 5
    >> without the corrigendum. In an Arabic text, they will look like:
    >> .snoitatouq ”cibarA“ dna ”Latin“ erofeb txet emoS
    >> This will be the reading of the text rendered by a
    >> post-corrigendum renderer from the text encoded in this order:
    >> Some text before ”Latin“ and ”Arabic“ quotations.
    >> I suppose then that the intent of the corrigendum is to make sure
    >> that the quotation marks are not mirrored, given that they were not
    >> mirrored in Unicode 4 and before. So the texts are expected to be
    >> encoded in this logical order (BiDi reordering and mirroring
    >> disabled):
    >> Some text before “Latin” and “Arabic” quotations.
    >> so that it will be rendered like this in renderers based on
    >> Unicode 4 or post-corrigendum Unicode 5:
    >> .snoitatouq “cibarA” dna “Latin” erofeb txet emoS
    >> but like this if a renderer was built using the pre-corrigendum
    >> Unicode 5 properties:
    >> .snoitatouq ”cibarA“ dna ”Latin“ erofeb txet emoS
    >> There may exist other difficulties for the special case of
    >> quotation marks used at the beginning of each paragraph continuing
    >> a long quotation (not closed in the previous paragraph). This will
    >> not affect Arabic documents making long Latin quotations, but will
    >> possibly affect Latin texts including long Arabic quotations. I
    >> think that no authors will try to use this Latin-specific style for
    >> long Arabic quotations.
    >> (Final note: in all I wrote above, replace Arabic by any other RTL
    >> script, and Latin by any other LTR script.)
    >>> -----Original Message-----
    >>> From: [mailto:cldr-users-
    >>> On behalf of Rick McGowan
    >>> Sent: Friday, 17 August 2007 04:42
    >>> To:
    >>> Subject: New Corrigendum to The Unicode Standard
    >>> The Unicode Consortium has issued a new Corrigendum to The Unicode
    >>> Standard Version 5.0.0. For details on this corrigendum, see:
    >>> For general information on corrigenda to The Unicode Standard, see:
    >>> In brief, this corrigendum corrects the Bidi_Mirrored property for
    >>> several characters.
    >>> Regards,
    >>> Rick McGowan
    >>> Unicode, Inc.
