Re: Comments on <draft-ietf-acap-mlsf-00.txt>

From: Martin J. Duerst (mduerst@ifi.unizh.ch)
Date: Sat Jun 07 1997 - 15:52:39 EDT


On Fri, 6 Jun 1997, Chris Newman wrote:

[part of this mail should probably go to the ACAP list. But I still
haven't been able to subscribe.]

> On Fri, 6 Jun 1997, Rick McGowan wrote:

> > I thought ACAP was defining a protocol on the wire, not a storage
> > protocol or format for the server...
>
> It also defines a data model. As well as formats for attribute, entry,
> and error strings. There are very few protocols which are just "on the
> wire" -- most of them have very strong implications for the server data
> model.

There are of course implications. But using something like
text/enriched for language tags in the protocol wouldn't affect
the internal data model. MLSF might be a nice idea for internal
storage; you could do that completely on your own (and at your
own risk).

> > Is there a requirement that ACAP server side program store data in UTF-8?
>
> Not strictly, but all human readable text strings in the protocol are
> UTF-8. And the comparator functions will apply to UTF-8.

The comparator functions, the so-called ORDERINGS, can do whatever
they want. They are an interesting and probably very valuable
concept in ACAP, but they will involve a lot of work.
It should be expected that ORDERINGS are described in terms of
characters, and not in terms of UTF-8 bytes, because that's
the appropriate abstraction level. How they are implemented
and applied internally is not very relevant, but I guess I would
prefer working from UTF-16. Maybe Alain LaBonte has some comments?
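
To illustrate the abstraction-level point, here is a minimal sketch of an
ordering defined on characters rather than on encoded bytes. The function
names and the choice of NFC normalization plus case folding are assumptions
made for illustration only; they are not taken from the ACAP draft.

```python
import unicodedata


def ordering_key(text: str) -> str:
    """Hypothetical case-insensitive ordering key, defined on characters."""
    return unicodedata.normalize("NFC", text).casefold()


def compare(a: str, b: str) -> int:
    """Return -1, 0 or 1, like a classic comparator function."""
    ka, kb = ordering_key(a), ordering_key(b)
    return (ka > kb) - (ka < kb)


def compare_encoded(a: bytes, b: bytes, encoding: str) -> int:
    """Decode first, then compare characters; the wire encoding is irrelevant."""
    return compare(a.decode(encoding), b.decode(encoding))
```

Because the comparison happens after decoding, the same ordering gives the
same answer whether the strings arrived as UTF-8 or are held as UTF-16.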

> > Why not in Unicode?
>
> Do you mean UTF-16 or UCS-4? There are good reasons RFC 2130 recommended
> UTF-8 over other formats of Unicode. I won't bother going into them
> because that's off topic.

It's not off topic. The good reason that RFC 2130 recommended UTF-8
over the others for protocols is its ASCII compatibility. There are
likewise good reasons, widely known in the industry, for working
with a fixed-width process code. If you study the proceedings
of the last few Unicode conferences, you will see a lot of such
examples, and not much else. MLSF is definitely too biased,
because it excludes some very valid implementation solutions.

For example, assume a database vendor wants to offer an ACAP
implementation. Assume wide-ranging Unicode support is available
in that database (most database vendors have already done that,
or are close to it). I could even imagine having a student
implement ACAP functionality on top of such an OO database
in a larger project. If this database uses UTF-16 as a process
code, the project is dead with MLSF.
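
The trade-off between the two encodings can be sketched as follows. This is
only an illustration, restricted to characters in the Basic Multilingual
Plane, where each character occupies exactly one 16-bit unit in UTF-16; the
helper names are hypothetical.

```python
def utf8_widths(s: str) -> list:
    """Number of bytes each character occupies in UTF-8."""
    return [len(ch.encode("utf-8")) for ch in s]


def utf16_widths(s: str) -> list:
    """Number of bytes each character occupies in UTF-16 (big-endian, no BOM)."""
    return [len(ch.encode("utf-16-be")) for ch in s]


def nth_char_utf16(data: bytes, n: int) -> str:
    """Direct indexing into UTF-16 data; valid only for BMP characters."""
    return data[2 * n:2 * n + 2].decode("utf-16-be")


sample = "na\u00efve \u65e5\u672c"          # Latin, accented Latin, and CJK
sample_utf16 = sample.encode("utf-16-be")
ascii_on_the_wire = "ACAP".encode("utf-8")  # identical to the ASCII bytes
```

ASCII text is unchanged in UTF-8, which is what matters on the wire, while
the constant width of UTF-16 is what makes it convenient as a process code.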

And while up to now, I have had the impression that the only
type of searching is against a fixed string (no case folding
or anything), for which some of the arguments about MLSF
apply, it looks like it can (for very good reasons) be much
more complicated. And in this case, the few lines of code
saved by MLSF become even more irrelevant.

While I'm at it, here are a few more comments on the ACAP spec:

From "Open Issues":

] 5) Some people have indicated a desire for multi-valued attributes.

Multi-valued attributes seem to be desired. Language alternatives
are one kind of multi-valued attribute. It would be tedious to
handle them specially (which would have to be done with MLSF
alternatives).

From "2.5 Datasets and Entries":

] Each entry in a dataset is a set of attribute/value pairs. Each
] attribute is a hierarchical name in UTF-8, with each component of the
] name being separated with a period ("."). Each attribute/value pair
] may have additional metadata; this is described in section <section>.
] There must be exactly one "entry" attribute, whose value is unique
] amongst all entries in the dataset and contains zero or more UTF-8
] characters other than slash ("/") or dot (".").

Metadata would be the best place for language information, wouldn't it?
The values in each entry are short, as we have been told, which
means that the chance of their containing multilingual text that
needs to be tagged is indeed low.

Another idea is to take the attribute name structure, and append a
language tag at the end after an additional dot. This would nicely
deal with alternatives, and would make language searchable in the
same way as other things. Maybe a special separator could be defined
to be used in front of the final language tag, with the special
semantics that there is no need for a trailing * wildcard in an
attribute specification but still all the attributes with different
languages get searched.
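
A minimal sketch of the special-separator variant of this idea follows. The
separator character (";"), the attribute names and the sample values are all
hypothetical; nothing here is defined by the ACAP draft. Language tags are
compared case-insensitively.

```python
LANG_SEP = ";"  # hypothetical separator in front of the final language tag


def split_language(name: str):
    """Split 'a.b;en' into ('a.b', 'en'); the tag is None if absent."""
    base, sep, lang = name.partition(LANG_SEP)
    return base, (lang.lower() if sep else None)


def matches(spec: str, name: str) -> bool:
    """A spec without a language tag matches every language variant."""
    spec_base, spec_lang = split_language(spec)
    base, lang = split_language(name)
    if spec_base != base:
        return False
    return spec_lang is None or spec_lang == lang


def select(entry: dict, spec: str) -> dict:
    """Return the attribute/value pairs of an entry selected by spec."""
    return {k: v for k, v in entry.items() if matches(spec, k)}


entry = {
    "entry": "example",
    "addressbook.CommonName;en": "Example Person",
    "addressbook.CommonName;de": "Beispielperson",
    "addressbook.Email": "user@example.org",
}
```

Here select(entry, "addressbook.CommonName") returns both language variants
without any trailing wildcard, while "addressbook.CommonName;de" returns
only the German one.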

All of the above proposals would solve the bulk of ACAP language
identification problems in a manner more appropriate to the protocol
and the data model than MLSF.

What remains is the language of the Alert and Warning messages.
For this, the correct solution is language negotiation, i.e.
the client telling the server about the languages preferred by
the user, and the server telling the client about the language
it will use. Alternates in this context are not a solution,
because they don't scale.
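
The negotiation step could look roughly like the sketch below. The function
name, the default language and the fallback from a tag such as "de-CH" to
its primary tag "de" are assumptions for illustration, not something the
ACAP draft specifies.

```python
def negotiate(client_prefs, server_langs, default="en"):
    """Pick the language the server will use for Alert and Warning messages.

    client_prefs: language tags in the user's order of preference.
    server_langs: language tags the server has messages for.
    """
    supported = {tag.lower(): tag for tag in server_langs}
    for pref in client_prefs:
        tag = pref.lower()
        while tag:
            if tag in supported:
                return supported[tag]
            # Assumed fallback: drop the last subtag ("de-ch" becomes "de").
            tag = tag.rpartition("-")[0]
    return default
```

The server would then report the returned tag to the client and send a
single message text per alert, rather than carrying every alternative.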

As I have said, I'm in no way against language tagging.
But it should be done by considering the structure and
the needs of the protocol.

Regards, Martin.


