Re: ISO 639 "duplicate" codes (was: Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures)

From: Mark Davis
Date: Mon Jul 14 2003 - 21:40:41 EDT


    First, you should check again, since a significant amount of work was
    done in modularization in 2.6.

    Second, the phrase "IBM forgot to modularize ICU" is misleading, at
    the least. Unlike some people, who appear to have unbounded time and
    energy for, say, writing emails, we have to carefully pick and choose
    where we spend our time. Whether very fine-grained modularization is
    important depends a great deal on the client's requirements, and must
    be traded off against the many other things we could be doing with our

    Third, ICU4J is a source product. Saying that it is "impossible to
    integrate the ICU's Normalize..." is also misleading, since one can
    clearly modify source to remove dependencies on code one doesn't want
    to include, if it is not core to the functionality. (Of course, the
    amount of effort required may vary.) And transliterators are not, in
    any event, required for Normalization.

    ► “Eppur si muove” ◄

    ----- Original Message -----
    From: "Philippe Verdy"
    Sent: Monday, July 14, 2003 11:13
    Subject: Re: ISO 639 "duplicate" codes (was: Re: Ligatures in Turkish
    and Azeri, was: Accented ij ligatures)

    > On Monday, July 14, 2003 5:34 AM, Mark Davis wrote:
    > > ...
    > > > Of course
    > > > Java already includes some parts of ICU, but other things in
    > > > ICU4J are now difficult to integrate into Java, simply because IBM
    > > > forgot to modularize ICU so that it can be integrated gradually.
    > > > Accepting ICU4J as part of the core is a big decision,
    > > > because ICU4J is quite large, and there are certainly Java
    > > > developers who would not accept having 1 additional MB of data
    > > > classes loaded in each JVM (particularly because the integration
    > > > of ICU would affect a lot of core classes for the Java2 platform
    > > > now also used for small devices).
    > > ...
    > > > For example, it is impossible to integrate the ICU's Normalizer
    > > > class in Java without also importing the UChar class and all its
    > > > related services for UString, such as transliterators, and
    > > ...
    > >
    > > You are very misinformed about ICU4J.
    > I have tried several times to do it. It does not work: you may
    > indeed remove some tables you don't need, but trying
    > to extract just the normalizer is a real nightmare. I tried it
    > in the past and abandoned it: too tricky to maintain. I
    > retried it recently (one month ago, from its CVS source), and
    > it was even worse than the first time.
    > I know that there was a recent announcement (less than 1
    > month ago) of its modularization, but it's true that I did not
    > check the new "modularized" sources. So I still use ICU4J
    > only when I can accept the whole package,
    > as maintaining a stripped-down customization is too tricky.
    > But maybe this has changed; I just updated my ICU sources
    > from CVS. I'll recheck to see if an "ICU Light" version can be
    > created (which would only keep the core features, without the
    > support for tailoring rules compiled at run-time).
    > --
    > Philippe.
    > Spam not tolerated: any unsolicited message will be
    > reported to your Internet service providers.
