Re: BIDI IRI Display (was spoofing and IRIs)

From: Martin J. Dürst (duerst@it.aoyama.ac.jp)
Date: Thu Mar 04 2010 - 02:23:22 CST


    On 2010/03/04 2:12, Shawn Steele wrote:
    >> An IRI is a sequence of Unicode characters. Is there not
    >> already a well-defined way of converting a sequence of
    >> Unicode characters to a visual display?
    >
    > The problem (from my perspective at least) is that the Unicode BIDI rules are somewhat "generic".

    Yes indeed. It would be nice if we could add support for more and more
    stuff with arbitrary complexity to the Unicode bidi algorithm, but I
    don't see how that could be deployed.

    > Unicode expects things like / and . to be used in a context of same-script stuff, like a date, time or number.
    > IRIs use them as delimiters for a list of elements (labels in the domain name or folders in the path), in a hierarchical form.
    > The Unicode BIDI algorithm doesn't recognize that there's an underlying hierarchy, so it can end up "swapping" pieces in that hierarchy in some cases.
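To make the quoted point concrete, here is a minimal sketch that looks up the bidirectional classes of typical IRI delimiters using Python's standard `unicodedata` module. The helper name `bidi_classes` and the sample strings are only illustrative assumptions; this is not code from the thread or from any spec. It shows that `/`, `.` and `:` are classified as common separators (the class designed for use inside numbers), not as structural delimiters.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, bidi class) pairs for each character in text.

    Hypothetical helper for illustration; it only reads the Bidi_Class
    property and does not run the bidi algorithm itself.
    """
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Delimiters that IRIs use between components.
delimiter_classes = dict(bidi_classes("/.:?#"))

# CS = Common Separator, ON = Other Neutral, ET = European Terminator.
print(delimiter_classes)

# A label mixing a Hebrew letter with a Latin one, joined by a dot:
print(bidi_classes("\u05d0.a"))
```

Because the separators carry no strong direction of their own, their placement on screen depends on the surrounding characters, which is how adjacent labels or path segments can appear swapped.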

    There's of course a lot of hierarchy, but what's more inherent and basic
    is sequence. The URI spec defines the order of the various components;
    the hierarchy is more in people's heads than anywhere else.
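As a rough illustration of "sequence", the sketch below splits an IRI into its components in the order the generic URI syntax defines them, using `urllib.parse.urlsplit` from the Python standard library. The function name `components_in_order` and the example address are assumptions made for this illustration only.

```python
from urllib.parse import urlsplit

def components_in_order(iri):
    """Return the non-empty components of an IRI in their logical order.

    Hypothetical helper: it only exposes the logical sequence
    scheme, authority, path, query, fragment; it says nothing
    about how that sequence should be displayed.
    """
    parts = urlsplit(iri)
    ordered = [
        ("scheme", parts.scheme),
        ("authority", parts.netloc),
        ("path", parts.path),
        ("query", parts.query),
        ("fragment", parts.fragment),
    ]
    return [(name, value) for name, value in ordered if value]

example = components_in_order("http://example.com/a/b?x=1#top")
print(example)
```

The logical order is fixed regardless of which scripts the individual components contain; the open question in this thread is how to map that order to a visual one.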

    > I'm not sure UTR#36 is the proper place

    I fully agree that UTR#36 is NOT the right place for putting what's
    currently in section 4 of the IRI WG draft
    (http://tools.ietf.org/html/draft-ietf-iri-3987bis-00#section-4).

    In some sense, this would be equivalent to the IRI spec only saying that
    IRIs are composed of domain names, path components, query parts, ..., and
    then saying: Look over there for how to order them on a napkin or on the
    side of the bus (or on a display).

    UTR#36 already has a good section, 2.5 Bidirectional Text Spoofing
    (http://www.unicode.org/reports/tr36/#Bidirectional_Text_Spoofing),
    which currently does exactly the right thing, namely say that bidi
    display of IDNs and IRIs is, among other things, also a security issue.

    [off-topic: 2.5.1 in UTR#36 doesn't belong in 2.5, but should be its own
    subsection; there is only minor overlap in that Arabic is affected by
    both bidi and complex shaping.]

    > to clarify display of such ordered lists.

    Ok, you got from hierarchy to ordered list, which I think is exactly
    what I called 'sequence' above.

    > Proper BIDI rendering of IRIs isn't just a security, but also a usability, problem.

    Very much so. There are two levels here:
    - Interoperability as usability: If there isn't a single, well-defined,
    consistent logical <-> visual mapping for IRIs, they are not usable at all.
    - Immediate human usability: It should be possible for humans to build
    an easily understandable and actionable mental model (or use an existing
    mental model that they already have) for bidi IRIs and their visual
    ordering.

    Regards, Martin.

    > It does seem like perhaps this concept should be mentioned in Unicode somewhere. (IRIs aren't the only place that similar ordered lists happen).
    >
    > -Shawn
    >

    -- 
    #-# Martin J. Dürst, Professor, Aoyama Gakuin University
    #-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
    

