Re: Keys. (derives from Re: Sequences of combining characters.)

From: Barry Caplan (bcaplan@i18n.com)
Date: Sat Sep 28 2002 - 12:04:29 EDT

Next message: Doug Ewell: "Re: Keys. (derives from Re: Sequences of combining characters.)"

Previous message: Doug Ewell: "[OT] Time zone reported by e-mail (was: Re: Keys.)"
In reply to: William Overington: "Re: Keys. (derives from Re: Sequences of combining characters.)"
Next in thread: William Overington: "Re: Keys. (derives from Re: Sequences of combining characters.)"

At 12:23 PM 9/27/2002 +0100, William Overington wrote:
>Are you perhaps trying to make a deduction by the fallacy of the
>undistributed middle, along the following lines.
>
>William's need is a markup system.
>XML is a markup system.
>
>William's need is XML.

I think what is being suggested is not nearly so obvious as that. It is more along the lines of:

William's need is a product of which data interchange is a key feature.
Said product needs an architecture and a business model.
Data interchange happens both externally and internally within the program.
The business model chosen may indeed require a non-XML system.
XML data interchange is better supported than any proprietary system.
If a non-XML format is chosen for the outside system, it should be converted to XML as early as possible for inbound interchange, and as late as possible for outbound interchange, in order to capitalize on XML tools.

Of course, if the system is closed on the outside, and useful, it will quickly be duplicated by someone using open interchange formats anyway, but advice on how to handle that situation only comes at a price :)

>I am simply saying that XML, as I understand it, does not suit my specific
>need.

It may be that you don't understand your need well enough to see why XML is an extremely strong contender for outside interchange.

>text cannot be used directly. For me, that is a major limitation of XML.

Why is it a "major limitation" of XML? Have not over a million applications and web sites already been implemented using XML technology? Is there a record of anyone ever griping about this limitation at all?

>legacy issue of which I do not want to have the problem with my research in
>language translation and distance education.

How so? A single line of code will automatically escape any characters as needed.
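
As a rough illustration of that point, here is a minimal Python sketch using the standard library's xml.sax.saxutils module. The sample text is made up for the example; it is not taken from William's system.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical message content containing XML-significant characters.
text = "if a < b & b > c"

# The "single line": replaces &, < and > with entity references.
escaped = escape(text)

# The reverse step recovers the original text unchanged.
restored = unescape(escaped)

print(escaped)
print(restored == text)
```

The author of the message never has to see or type the entity references; the escaping happens when the text is written out.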

> Maybe one day Unicode will
>encode special XML opening and closing angle brackets so that XML can
>operate without that problem.

This is not up to Unicode to decide; it is XML's choice to specify the way its tags are constructed. XML's family tree, starting with SGML (or earlier for all I know) and going through HTML, pretty much constrains it. Millions of people know <> as the tag delimiter. Earlier markup languages used a . PERIOD in the first character in a line as a delimiter - I think RTF is of this heritage. When was the last time someone mentioned they were creating or editing an RTF file compared to *ML?

>However, as XML uses the U+003C character in
>that manner at the moment, for me it is a problem and it has led me to use
>the key method using a comet circumflex key.

Instead of typing a trivial escape sequence in the rare case of a < in the content, you want to force people to type weird Unicode characters in every tag?

>Also, I do not need to have all those " characters and = characters and /
>characters within messages.

Have you not thought the problem all the way through? Why on earth would you want your message creators typing raw XML anyway? You are going to need some other UI, right? And that "message editor" can generate the XML, complete with escapes, using existing code you can have for free. This frees you from having to create your own wheel and maintain it.
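
To make that concrete, here is a small sketch of what the back end of such a "message editor" might do, assuming Python and its standard xml.etree.ElementTree module. The element and attribute names ("message", "from", "body") and the build_message helper are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

def build_message(sender, body):
    """Build a small XML message; the library inserts all needed escapes."""
    # "message", "from" and "body" are made-up names for this sketch.
    msg = ET.Element("message", {"from": sender})
    ET.SubElement(msg, "body").text = body
    return ET.tostring(msg, encoding="unicode")

# The person composing the message types plain text, including < and &.
xml_text = build_message("A & B", "Is 1 < 2?")
print(xml_text)

# Parsing the result gives back exactly what was typed.
parsed = ET.fromstring(xml_text)
print(parsed.get("from"), "|", parsed.find("body").text)
```

The quotes, equals signs and slashes appear only in the generated file, never in what the user types.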

>Well, U+2604 U+0302 U+20E3 is not ridiculous. It is entirely permissible
>within the Unicode specification.

He is not saying it is ridiculous because it is not within the specification. He is saying it is ridiculous because the development community as a whole (a very large whole), both closed source and open source advocates, has rallied around XML as a basis for data interchange. If you ever wanted to move your comet files to another system, or create them from data in an existing system (such as Trados or another translation memory), you would need a two-way XML<->Comet converter anyway. Why bother?

>you think it ridiculous then maybe that is good evidence of its originality
>as a piece of creativity.

I am sure it will create a pretty glyph. But software creation is about way more than pretty glyphs.

>A comet circumflex key could be viewed as a piece
>of original art. I specifically designed it so as to be a design which
>involves an inventive leap so as to produce something new and unexpected,
>which someone "skilled in the art" would not produce as the application of
>skill in the existing art without invention, yet which would display
>properly using an all-Unicode font.

This sounds a lot like you are planning to trademark or patent a character. I would personally travel to the ends of the earth to testify that all possible combining sequences are described as prior art in the description of how to create them in the Unicode specification and thus can never be proprietary. Now if you want to have a graphic artist draw a logo of a comet with a box around it, that is your prerogative. But the idea that combining characters in any fashion is somehow proprietary is not ridiculous, it is just a waste of time. In case you think otherwise, I can write a 5 line Perl program to run on a spare machine that will create prior art of every possible combination of characters. I can let it run forever and hook it to a web server to make it visible too.
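
A sketch of the kind of enumeration meant here, written in Python rather than Perl and restricted to a tiny alphabet so that it terminates. The combining_sequences function and the choice of bases and marks are assumptions made for the example; a real run would iterate over every base character and combining mark in Unicode.

```python
import itertools

def combining_sequences(bases, marks, max_marks):
    """Yield each base character followed by 1..max_marks combining marks."""
    for n in range(1, max_marks + 1):
        for base in bases:
            for combo in itertools.product(marks, repeat=n):
                yield base + "".join(combo)

# Illustrative alphabet only: U+2604 COMET as the base, plus
# U+0302 COMBINING CIRCUMFLEX ACCENT and U+20E3 COMBINING ENCLOSING KEYCAP.
bases = ["\u2604"]
marks = ["\u0302", "\u20E3"]

seqs = list(combining_sequences(bases, marks, 2))
for s in seqs:
    print(" ".join(f"U+{ord(ch):04X}" for ch in s))
```

The sequence U+2604 U+0302 U+20E3 falls out of the enumeration mechanically, along with every other ordering of the same marks.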

>An added bonus of using the comet circumflex key is that documents
>containing comet circumflex codes do not necessarily need to contain any
>characters from the Latin alphabet.

Why is this a bonus, let alone an added one? I have a 4 year old niece just learning the "Latin alphabet" and as far as I can tell it hasn't changed since I learned it. There is no U+003C character in that alphabet.

In fact, the bonus of using 3C as a delimiter (along with other XML delimiters) is that they are in every legacy encoding, meaning if no Unicode tools are available for editing, a regular text editor can be used and the conversion to Unicode can happen later.
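
A quick way to check this for oneself, sketched in Python. The list of encodings is just a sample picked for the illustration (using the codec names Python ships with), not an exhaustive survey, and the encodable helper is a made-up name.

```python
# A sample of legacy encodings, chosen only for illustration.
LEGACY = ["ascii", "latin-1", "cp1252", "koi8-r", "shift_jis"]

def encodable(text, encoding):
    """Return True if text can be represented in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# The proposed key: COMET + COMBINING CIRCUMFLEX + COMBINING ENCLOSING KEYCAP.
comet_key = "\u2604\u0302\u20E3"

# For each encoding: the bytes used for "<" and whether the key fits at all.
results = {enc: ("<".encode(enc), encodable(comet_key, enc)) for enc in LEGACY}
for enc, (lt_bytes, key_ok) in results.items():
    print(enc, lt_bytes, key_ok)
```

In each of these encodings the less-than sign is the single byte 3C, while the comet sequence cannot be represented at all.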

Your method requires Unicode support and fonts (not the same thing) at the editing stage, which is not realistic unless you want to limit your community to a few of your closest friends, so to speak.

No one is suggesting such a system can't be built, only that its usefulness would be strongly limited for a lot of very good reasons. As others have noted, I concur that this is not really a Unicode issue per se, but a software design issue.

Barry Caplan


This archive was generated by hypermail 2.1.5 : Sat Sep 28 2002 - 13:37:40 EDT