Re: Keys. (derives from Re: Sequences of combining characters.)

Date: Sat Sep 28 2002 - 07:30:18 EDT

    [Still off-topic, but I'm hopeful that progress can be made, so am
    continuing a little farther]

    On 09/27/2002 10:26:36 AM "William Overington" wrote:

    >>XML is the way to go.
    >Maybe, maybe not. The issue of U+003C being used to mean LESS-THAN SIGN in
    >documents which mix ordinary text and markup may or may not, depending on
    >the application, be a problem.

    It really isn't a problem. XML provides other means to represent that
    character when it is needed as part of the content rather than as part of
    the markup. It is the job of an XML parser to sort that out, and there are
    various XML parsers that all handle this without a hitch and that are
    freely available. Someone made reference to MathML, which is a markup
    language built on XML (XML is a spec for building markup languages), and
    clearly mathematicians need to be able to represent this character within
    content, and the special use of U+003C for markup in XML was not seen in
    any way to be an obstacle.
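
    As a minimal sketch of those "other means", the Python standard library
    can be used to show the round trip. The element name "note" and the
    sample text are made up for illustration; they are not part of any
    proposal discussed here.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Content that contains U+003C as an ordinary character, not as markup.
content = "3 < 5"

# escape() replaces "<" with the predefined entity reference "&lt;".
escaped = escape(content)

# "note" is a hypothetical element name used only for this example.
document = "<note>" + escaped + "</note>"

# The parser separates markup from content and restores the character.
parsed_text = ET.fromstring(document).text

# A numeric character reference is another way to write the same character.
parsed_numeric = ET.fromstring("<note>3 &#x3C; 5</note>").text

print(escaped)         # 3 &lt; 5
print(parsed_text)     # 3 < 5
print(parsed_numeric)  # 3 < 5
```

    The application only ever sees "3 < 5"; the escaping exists purely in
    the serialized stream.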

    Your proposed markup convention would also need a parser to identify the
    pieces in a stream of data. If someone wants to use U+2604 in content, you
    would probably need some indirect way to represent it in a data stream.
    (E.g. One can imagine a hypothetical message "My favourite Unicode
    character is P1" into which someone might want to insert the COMET
    character.) So, I expect you'll have to deal with the same problem anyway.
    But this parser doesn't yet exist; some software developer will have to
    create it. On the other hand, XML parsers exist today. If you had been
    pursuing an XML-based approach, you might already be testing live
    prototypes rather than discussing a hypothetical system.

    Also, in an earlier message, you mentioned that you wanted to be able to
    use this messaging system on the Web. And, of course, you want to be able
    to represent U+003C directly in content. Did you realise that those two are
    contradictory? HTML has the same heritage as XML (both are derived from
    SGML). It also uses U+003C for markup, and provides the same alternative
    means to represent that character as part of content. So, if one of the
    contexts within which you want your system to work is the Web, then you're
    going to have to deal with indirect representation of U+003C anyway. Since
    it's already a magic character, why not let it be the magic character for
    your proposed protocol?
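
    The same indirect representation in HTML can be sketched with Python's
    standard library. The class name TextCollector and the sample message
    are hypothetical, chosen only for this illustration.

```python
from html import escape
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects character data, with character references already decoded."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# Hypothetical message text containing U+003C as content.
message = "if a < b then stop"

# escape() writes the character indirectly as "&lt;" inside the markup.
page = "<p>" + escape(message) + "</p>"

collector = TextCollector()
collector.feed(page)
collector.close()
recovered = "".join(collector.chunks)

print(page)       # <p>if a &lt; b then stop</p>
print(recovered)  # if a < b then stop
```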

    XML really *is* the way to go. Please believe us. You don't need to believe
    me; believe Tex, Ken, Marco and the others who have offered you this
    recommendation. They really are among the most well-informed contributors
    to this list.

    BTW, my mail client (Lotus Notes, for better or worse) reports what time in
    *my* time zone an author wrote the given message. Such reporting of time in
    international communications is problematic; time zones need to be stated
    explicitly. We discovered this quite a while ago after scheduling a
    tele-conference; the half of the dept. in the UK assumed the time they saw
    was Dallas time (or maybe they suggested the time and we were reading it),
    but Notes had silently done a time zone conversion.
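
    A small sketch of what stating the zone explicitly can look like, using
    the Date header of this message as sample data. The fixed offsets below
    are assumptions that hold for late September 2002 only; a real system
    would use a time zone database rather than hard-coded offsets.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed valid for late September 2002 only.
EDT = timezone(timedelta(hours=-4), "EDT")  # US Eastern daylight time
CDT = timezone(timedelta(hours=-5), "CDT")  # Dallas
BST = timezone(timedelta(hours=1), "BST")   # UK

# The Date header of this message: Sat Sep 28 2002 - 07:30:18 EDT
sent = datetime(2002, 9, 28, 7, 30, 18, tzinfo=EDT)

# An ISO 8601 string carries the offset, so no reader has to guess.
explicit = sent.isoformat()

# The same instant as each reader would see it locally.
in_dallas = sent.astimezone(CDT).strftime("%H:%M %Z")
in_uk = sent.astimezone(BST).strftime("%H:%M %Z")

print(explicit)   # 2002-09-28T07:30:18-04:00
print(in_dallas)  # 06:30 CDT
print(in_uk)      # 12:30 BST
```
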
    - Peter

    Peter Constable

    Non-Roman Script Initiative, SIL International
    7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
    Tel: +1 972 708 7485
