Re: Display of Isolated Nonspacing Marks (was Re: Questions on ZWNBS...)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Aug 10 2003 - 18:23:26 EDT


    On Sunday, August 10, 2003 9:17 PM, Peter Kirk <peter.r.kirk@ntlworld.com> wrote:

    > On 10/08/2003 10:09, Michael Everson wrote:
    >
    > > It is the formally specified way to represent what you say you want
    > > to represent. If an implementation doesn't do that nicely enough,
    > > complain to the implementors. (This has already been suggested to
    > > you.)
    >
    > As has already been clearly pointed out by Philippe, Kent and myself
    > (and ignored by those opposed to any change), the combination SPACE +
    > diacritic does not have the required categories, properties and
    > specification for the function it is supposed to perform. Either these
    > categories etc need to be adjusted (and I don't expect the general
    > category of SPACE to be changed!), or some exceptional mechanism needs
    > to be clearly defined, or, by far the simplest solution, a new base
    > character can be defined which, when combined with the diacritic, has
    > the required categories and properties.

    That's exactly what I suggested (and I did use the word "suggest").
    I wanted to show that SPACE or NBSP is inadequate as a base for
    representing a spacing diacritic as a normal symbol, because the
    properties of that combination are undocumented. No one here has
    demonstrated that such a sequence with SPACE is really documented
    as such anywhere in the Unicode specs. Lacking formal documentation,
    this legacy usage is still just a hack: it works sometimes, but not
    always as intended, because it contradicts other principles such as
    the inheritance of the base character's properties by the whole
    combining sequence built on it.
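
    To make the property mismatch concrete, here is a minimal sketch using
    Python's standard unicodedata module. It reflects whatever version of
    the Unicode Character Database ships with the interpreter, not
    necessarily the version under discussion, and the helper name props
    is only illustrative.

```python
import unicodedata

SPACE, NBSP = "\u0020", "\u00A0"
COMBINING_ACUTE = "\u0301"   # nonspacing mark
SPACING_ACUTE = "\u00B4"     # precomposed spacing symbol

def props(ch):
    """Return (general category, bidi class, canonical combining class)."""
    return (unicodedata.category(ch),
            unicodedata.bidirectional(ch),
            unicodedata.combining(ch))

for name, ch in [("SPACE", SPACE), ("NBSP", NBSP),
                 ("COMBINING ACUTE", COMBINING_ACUTE),
                 ("ACUTE ACCENT", SPACING_ACUTE)]:
    print(f"U+{ord(ch):04X} {name:16} {props(ch)}")

# U+00B4 has only a compatibility decomposition to SPACE + U+0301,
# so NFKD relates the two, but NFC never recomposes the sequence.
print(unicodedata.normalize("NFKD", SPACING_ACUTE) == SPACE + COMBINING_ACUTE)
print(unicodedata.normalize("NFC", SPACE + COMBINING_ACUTE) == SPACING_ACUTE)
```

    If the sequence inherits from its base, SPACE + U+0301 is a space
    separator (Zs) for property purposes, whereas the spacing symbol
    U+00B4 is a modifier symbol (Sk) with a different BiDi class.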

    And still, even if SPACE+diacritics is now officially documented as
    producing a symbol, its properties are still not defined (so they vary
    among implementations and are not interoperable), and it still gives
    problems with the huge legacy use of SPACE as a padding character and
    with whitespace normalization as in XML, HTML and SGML.
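
    As an illustration of the padding problem, the sketch below applies a
    hypothetical trim-and-collapse normalizer, written in the spirit of
    what markup processors do with whitespace. The function
    collapse_whitespace is an assumption made for this example, not the
    behavior of any particular XML, HTML or SGML parser.

```python
import re

BASE_PLUS_MARK = " \u0301"   # SPACE + COMBINING ACUTE ACCENT, meant as one symbol

def collapse_whitespace(text):
    """Hypothetical normalizer: trim, then collapse whitespace runs to one SPACE."""
    return re.sub(r"[ \t\r\n]+", " ", text.strip(" \t\r\n"))

# The symbol surrounded by padding spaces, e.g. inside a table cell.
padded = "  " + BASE_PLUS_MARK + "  "
normalized = collapse_whitespace(padded)

# The base SPACE is indistinguishable from padding and is trimmed with it,
# leaving an isolated combining mark with no base character.
print([f"U+{ord(c):04X}" for c in normalized])
```

    Nothing in the text tells such a normalizer which SPACE is the base
    of the symbol and which is padding.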

    In addition, it still does not solve the problem of inserting the
    sequence within words, of its directionality for BiDi, or of its
    parsing for breaking (line breaking, word breaking, ...), where
    distinct base character(s) would be needed for correct interpretation.
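
    The word-internal case can be sketched the same way. The splitter
    below is deliberately naive (whitespace splitting only, which is an
    assumption about simple tokenizers, not the Unicode word-breaking
    algorithm), and the sample string is made up for illustration.

```python
def naive_words(text):
    """Split on whitespace, as many simple tokenizers do (an assumption here)."""
    return text.split()

# A single intended word containing a spacing diacritic written as SPACE + U+0301.
intended_word = "ab" + " \u0301" + "cd"
parts = naive_words(intended_word)

# The base SPACE acts as a word separator, so the "word" is cut in two
# and the combining mark ends up at the start of the second piece.
print(len(parts))
print([[f"U+{ord(c):04X}" for c in p] for p in parts])
```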

    Yes, I have read your comment, and yes, I know that
    SPACE+diacritics is widely used. But this usage leaves many
    problems unsolved, which one could legitimately want to solve by providing:
    - a precise definition of such combining sequences with SPACE
    - a definition of their properties
    - documentation within the Unicode breaking algorithms
    - adjustments to the BiDi specs
    - etc.

    If all these adjustments are made, there will be many of them, all
    handled as exceptions to the normal rules, whereas a much simpler
    approach (which would not require all these changes in the specs)
    would be to define one or more explicit base characters
    for the appropriate function.

    If Ken, Michael, Kent and other respectable UTC members can't
    see the problem, who will? Please consider the problem itself, and
    don't focus too much on the exact terminology that you would
    have used yourself to better describe the problem and its solutions.

    I am not discussing the terminology itself, but the lack of
    documentation and support for what seems to be a true interoperability
    problem. So please don't flame me with sarcasm; that is not the
    subject of my messages, which are not meant to comment on
    the Unicode expertise of respectable UTC members...

    Sorry if this message still seems too long for you. But each time
    I try to be short, I am flamed for inaccuracies or imprecisions,
    or suspected of claiming something about the standard when in
    fact I am not discussing what is currently in the standard itself,
    but what is not there now and causes problems. It's easy to
    be short if you only refer to the standard itself, and only respond
    as if this list were just a FAQ.

    -- 
    Philippe.
    Spam not tolerated: any unsolicited message will be
    reported to your Internet service providers.
    

