On Wed, 4 Jun 1997, David Goldsmith wrote:
> Chris Newman (Chris.Newman@innosoft.com) wrote:
> >What about a multi-valued attribute where each value may be in multiple
> Since Unicode can support multiple languages, can you give an example
> where language tagging is necessary *and* there is only plain text
Take an "alternate names" attribute in a personal addressbook, which may
be multivalued. Each of these multiple names may also be represented in
different languages. Fonts, styles, and other viewer-based attributes are
completely unnecessary as they don't have anything to do with the name.
But the language of the name representation is necessary to select the
appropriate variant string.
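
As a rough illustration of that selection step (a sketch only, not the
MLSF wire format or the code from the proposal's appendix), the C below
assumes the language variants of one name have already been parsed into
(tag, text) pairs. The names tagged_value, tag_equal and select_variant
are hypothetical and exist only for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical record: one language variant of a single name. */
struct tagged_value {
    const char *lang;   /* language tag, e.g. "en" or "ja" (illustrative) */
    const char *text;   /* UTF-8 text of this variant */
};

/* Compare two language tags, ignoring ASCII case. Returns 1 if equal. */
static int tag_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;    /* equal only if both strings ended together */
}

/* Try each preferred language in order and return the first variant
 * whose tag matches. If none matches, fall back to the first variant.
 * Returns NULL only when there are no variants at all. */
const char *select_variant(const struct tagged_value *vals, size_t nvals,
                           const char *const *prefs, size_t nprefs)
{
    size_t p, v;

    for (p = 0; p < nprefs; p++) {
        for (v = 0; v < nvals; v++) {
            if (tag_equal(vals[v].lang, prefs[p]))
                return vals[v].text;
        }
    }
    return nvals > 0 ? vals[0].text : NULL;
}
```

A client would call select_variant once per name, passing the user's
ordered language preferences. Fonts and styles never enter into it; the
language tag is the only extra piece of information the lookup needs.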
This needs a solution that is above the character stream level and below
the application protocol level. Were it encoded into the character
stream, that would result in quoting problems which would destroy
server-side searching capabilities. Were it put at the application
protocol level, it would require a data structure so complicated as to be
completely impractical and unusable. In addition, putting it at the
application protocol level means that a different solution for the same
problem will be necessary in each application protocol.
There are *a lot* of attribute-value based protocols including SNMP, LDAP,
DHCP, ACAP, RFC 822/MIME/HTTP, etc. In fact, the attribute-value model is
a basic tool in protocol design. There's simply no clean way to represent
language alternatives of multi-valued attributes at the application
protocol level.
MLSF can solve the general case problem, with language selection code
that's just over a page of C, and "ignore" code that's 1/2 a page of C.
Yes, it is _possible_ to solve this problem at different levels, but
only with many, many times the level of complexity of MLSF, and
potentially serious performance costs as well. We don't need more ASN.1s,
X.400s and ISO 2022s in the world. We need a multi-lingual string format
that's as simple and elegant as UTF-8 is for multi-lingual-character
strings. The distinction is subtle but important.
There are blind people in the world and there are multi-lingual people in
the world who need spelling assistance. I can't ignore the fact that
UTF-8/Unicode is inadequate for their needs, nor am I willing to say such
people must always use complex formats like HTML and deal with all the
associated problems they create. But if their needs can be addressed with
only 1/2 a page of additional code, then that's absolutely the right thing
to do.
Make the common case simple and the uncommon case possible -- that's a
design principle I try to live by whenever I can.
> This is true, but seems like a rare case (the multilingual aspect).
> Wouldn't sending a phonetic alternate form (suitable for driving a speech
> synthesizer) work even better?
And a text string can't be converted to a phonetic alternate form unless
it has language tags.
> >Do you think I'd go through the trouble of writing up a detailed proposal
> >like MLSF, and writing functional source code in the appendix if I didn't
> >think a solution was necessary?
> Well, as you yourself said there are political considerations at work
> here, so, yes, I thought I would ask if there was a real problem being
> solved. I can certainly see that language tagging helps solve some of the
> problems you list, but I don't see why those problems can't be solved by
> "higher-level protocols". Please note I'm not trying to change your mind,
> just trying to understand why you think this is necessary.
I ignore people who have requirements with no basis in fact. The most
vocal person in the anti-Unicode camp, whom we all know and love, is easy
to ignore. Reasonable people with reasonable requirements and realistic
(if uncommon) scenarios, like Mark Crispin and others, are hard to ignore,
and it would be wrong to do so when a simple solution is possible.