Re: collating sequence

From: Mark Davis (mark@macchiato.com)
Date: Thu Jun 28 2001 - 22:24:30 EDT


Your portrayal of ICU as requiring xIUA to be useful is inaccurate. We
strongly disagree that "Users should not normally touch ICU code" and that
xIUA is required to "save man-months of work".

The example you give:

> Open a collator and check to see if it opened without errors. Then set the
> collator to use compatibility decomposition followed by canonical
> composition. Next set UCOL_ALTERNATE_HANDLING to UCOL_NON_IGNORABLE,
> UCOL_CASE_LEVEL to UCOL_ON and the strength to UCOL_TERTIARY.
> Issue the collate, close the collator and check for errors.

is an artificial strawman.

We considered the pluses and minuses of your approach years ago -- before
ICU was ever open source -- and settled on our current interface. Had we
thought it better, we would have done it that way in the first place.

Mark

P.S. If you want to debate the pros and cons, the ICU list is a better forum
than here.

----- Original Message -----
From: "Carl W. Brown" <cbrown@xnetinc.com>
To: "unicode" <unicode@unicode.org>
Sent: Thursday, June 28, 2001 16:49
Subject: RE: collating sequence

> Markus,
>
> >
> > > Currently I am developing an easy to implement
> > > interface (xIUA) that is also free Open Source code.
> >
> > I would like to point out that ICU itself has a fully functional
> > and useful "interface" (API) for all of its services (conversion,
> > collation, normalization, formatting, etc.).
> What xIUA provides is a starter package for those folks who want an ICU
> wrapper. If they don't want a wrapper then there is no need to use xIUA
> at all.
>
> Wrappers are very useful, especially when retrofitting existing code.
> Imagine having every function pass the current locale, time zone etc. to
> any function that may at some time call ICU. The wrappers also standardize
> calls to use the same parameters and they can simplify the interface. This
> type of wrapper code is designed to be tailored by the user and serves a
> very different function from the base ICU code. It does things that should
> never be implemented in ICU. Users should not normally touch ICU code.
>
> ICU must be flexible to handle any possible set of parameters. Development
> environments usually restrict these to the house standards, which are often
> a small subset of what can be done. If there is a special circumstance they
> can either implement a different calling API or invoke ICU services
> directly.
>
> >
> > In particular, for as long as one works with UTF-16 strings, the
> > ucol_strcoll() function is quite easy to use.
> >
>
> Even with UTF-16 strings you have to set up your collator. For example a
> typical collate call:
>
> Open a collator and check to see if it opened without errors. Then set the
> collator to use compatibility decomposition followed by canonical
> composition. Next set UCOL_ALTERNATE_HANDLING to UCOL_NON_IGNORABLE,
> UCOL_CASE_LEVEL to UCOL_ON and the strength to UCOL_TERTIARY.
> Issue the collate, close the collator and check for errors.
>
> With xIUA you call: xiua_strcoll(str1,str2);
>
> If you want a bit more flexibility you can call:
> xiua_strcollEx(str1,str2,XCOL_TERTIARY); or
> xiua_strcollEx(str1,str2,XCOL_SECONDARY_CAN); if you want a secondary,
> case insensitive, collate with canonical decomposition followed by
> canonical composition.
>
> If you want to tailor these settings you can tailor xIUA to call ICU with
> whatever you like.
>
> A user implementing a wrapper would only have to change the code in one
> place to upgrade from ICU 1.6 to ICU 1.8.
>
> If for example you do a lot of repeat calls to the collator and want to
> set up a collator to use for something like sorting, then none of the xIUA
> functions will be suitable. But you can use the code and other programs
> like the ICU test and sample programs to develop your own functions. This
> is why xIUA is a starter package.
>
> > It may help in some applications to use whatever wrapper one
> > likes, but it is not necessary to use a wrapper.
> Very true. But if they want to write a wrapper this can save them
> man-months of work.
>
> >
> > Also, if a wrapper library performs hidden string conversions,
> > then a user needs to understand the impact on performance and memory use.
> It should not perform unnecessary conversions. xIUA has a memory manager
> for internal working memory. It keeps a small buffer that it uses and can
> subdivide into pieces for working memory. Thus it saves the malloc/free
> overhead that is likely to occur with explicitly implemented calls.
> Combining calling sequences into a single function can reduce code size.
> To improve speed xIUA uses its own UTF-8 to UTF-16 and UTF-32 to UTF-16
> conversion routines. You don't want the overhead of a full converter,
> especially for converting lots of small fields.
>
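To make the "lightweight conversion routine" idea concrete, here is a minimal sketch of a UTF-8 to UTF-16 conversion in plain C. It is not xIUA or ICU code: the function name utf8_to_utf16 and its return convention are assumptions made up for this illustration, and it handles only NUL-terminated input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of xIUA or ICU).
 * Converts a NUL-terminated UTF-8 string into UTF-16 code units.
 * Writes at most cap units including the terminating 0.
 * Returns the number of units written, excluding the terminator,
 * or -1 on malformed input or if dst is too small. */
static long utf8_to_utf16(const unsigned char *src, uint16_t *dst, size_t cap)
{
    /* Smallest code point that legitimately needs 1, 2, 3, 4 bytes. */
    static const uint32_t min_cp[4] = { 0x0, 0x80, 0x800, 0x10000 };
    size_t n = 0;

    while (*src) {
        unsigned char b = *src++;
        uint32_t cp;
        int extra;

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return -1;              /* stray continuation or invalid lead */

        for (int i = 0; i < extra; i++) {
            if ((*src & 0xC0) != 0x80) return -1;  /* also catches early NUL */
            cp = (cp << 6) | (uint32_t)(*src++ & 0x3F);
        }

        /* Reject overlong forms, surrogates and values past U+10FFFF. */
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;

        if (cp < 0x10000) {
            if (n + 2 > cap) return -1;   /* one unit plus terminator */
            dst[n++] = (uint16_t)cp;
        } else {
            if (n + 3 > cap) return -1;   /* surrogate pair plus terminator */
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    if (n >= cap) return -1;
    dst[n] = 0;
    return (long)n;
}
```

A routine like this needs no converter object and no table lookups, which is the trade-off being described for many small fields; a general-purpose converter remains the better fit for arbitrary code pages.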
> xIUA will also maintain an open ICU converter for translations to and from
> code pages. Just keeping track of such a converter is not easy to retrofit
> into existing applications. Thus a wrapper if properly designed can lower
> overhead. xIUA is a starting point that users can tailor to be efficient
> in their own environment.
>
> >
> > ICU will add some helper functions to allow users to explicitly
> > convert in-process strings between UTFs. This is simple (even
> > without such helper functions) and fast - but of course not as
> > fast as staying with a single encoding.
>
> Some functions like collate are easy to convert to UTF-16 and process.
> Other functions like strtok don't work that way. They need a separate
> implementation because they return pointers into the source string and
> modify the string contents.
>
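The strtok point can be seen with the standard C function alone. The sketch below uses a hypothetical helper, split_in_place, that only wraps strtok; it shows that each returned token is a pointer into the caller's buffer and that the delimiters in that buffer are overwritten with NUL bytes. It is an illustration of the standard library behaviour, not of how xIUA implements its own version.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration only.
 * Splits buf in place with strtok and stores up to max token
 * pointers in tokens. Returns the number of tokens stored.
 * Every stored pointer points inside buf itself, and buf is
 * modified: each delimiter that ends a token becomes '\0'. */
static size_t split_in_place(char *buf, const char *delims,
                             char **tokens, size_t max)
{
    size_t n = 0;
    char *t = strtok(buf, delims);

    while (t != NULL && n < max) {
        tokens[n++] = t;
        t = strtok(NULL, delims);
    }
    return n;
}
```

If the text were first converted to UTF-16 and tokenized there, the returned pointers would refer to the converted copy and the caller's original string would be left unchanged, so callers relying on either behaviour would break. That is why a convert-then-call wrapper is not enough for this kind of function.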
> I have worked to implement ICU for clients and have put a lot of pro bono
> work into this product because I have seen that ICU would be accepted by
> more clients if they could save a lot of time and effort in implementing
> ICU. This provides them with a way to speed up the process. It may not be
> for everyone but I hope that it will help many.
>
> Carl W. Brown
> X.Net, Inc.
>
> I am not a part of the ICU development team. xIUA is not supported by
> them.
>