Re: ISO 8859-11 (Thai) cross-mapping table

From: Theo Veenker (Theo.Veenker@let.uu.nl)
Date: Wed Oct 09 2002 - 03:55:30 EDT


    Marco Cimarosti wrote:
    >
    > John Aurelio Cowan wrote:
    > > Marco Cimarosti scripsit:
    > > > Talking about the format of mapping tables, I always
    > > > wondered why ranges are not used. In the case of ISO
    > > > 8859-11, the table would become as compact as
    > > > three lines:
    > >
    > > Well, that wins for 8859-1 and 8859-11 and ISCII-88, where Unicode
    > > copied existing layouts precisely. But it wouldn't help other 8859-x
    > > much if at all,
    >
    > All 8859 tables would be more succinct.
    >
    > Non-Latin sections use contiguous ranges of letters in alphabetical order
    > or, at any rate, in the same order used by Unicode; this is also true for
    > most other non-ISO charsets.
    >
    > Latin sections are a worse case, but they still benefit slightly, because
    > characters shared with Latin-1 stay in the same positions.
    >
    > > and it requires binary search rather than direct
    > > array access, which would be a terrible lossage in CJK, where the
    > > real costs are.
    >
    > I agree. In the case of CJK it simply doesn't pay.
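Marco's three-line range table, and the binary search John points out it requires, can be sketched in Python as follows. This is a minimal illustration, not a standard mapping-file format: the (first byte, last byte, first code point) tuple layout and the names `RANGES_8859_11` and `decode_byte` are hypothetical. The ranges themselves are those of ISO 8859-11: bytes 0x00-0xA0 map to U+0000-U+00A0, 0xA1-0xDA to U+0E01-U+0E3A, and 0xDF-0xFB to U+0E3F-U+0E5B, with 0xDB-0xDE and 0xFC-0xFF unassigned.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical range table for ISO 8859-11 (Thai):
# (first legacy byte, last legacy byte, first Unicode code point) per run.
RANGES_8859_11: List[Tuple[int, int, int]] = [
    (0x00, 0xA0, 0x0000),  # ASCII, C1 controls and NO-BREAK SPACE: identity
    (0xA1, 0xDA, 0x0E01),  # THAI CHARACTER KO KAI .. THAI CHARACTER PHINTHU
    (0xDF, 0xFB, 0x0E3F),  # THAI CURRENCY SYMBOL BAHT .. THAI CHARACTER KHOMUT
]
_STARTS = [first for first, _, _ in RANGES_8859_11]  # sorted, for bisect

def decode_byte(b: int) -> Optional[int]:
    """Map one legacy byte to a code point by binary search over range starts."""
    i = bisect.bisect_right(_STARTS, b) - 1  # last range starting at or before b
    if i < 0:
        return None                          # b precedes the first range
    first, last, base = RANGES_8859_11[i]
    if b > last:
        return None                          # b lies in a gap (0xDB-0xDE, 0xFC-0xFF)
    return base + (b - first)

print(hex(decode_byte(0xA1)))  # 0xe01
print(decode_byte(0xDC))       # None
```

Each lookup costs O(log n) comparisons over the range starts instead of a single array index. With three ranges that is negligible, but a CJK table made of thousands of short runs would pay that search on every character, which is the cost John objects to.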

    If I may add my two cents: IMO, using search algorithms to reduce table size
    doesn't pay in any case. If one uses fast one- or two-stage lookup tables for
    both mappings (legacy to Unicode and vice versa), then most tables require
    about 3 KB of storage or less, and roughly 10 to 30 times that for CJK
    encodings. Compared to the 256 MB in a typical PC, each lookup table would
    consume about 0.001% (or 0.01 to 0.03% for CJK) of main memory. My point is
    that it is better to concentrate on processing speed than on table footprint.
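The kind of tables Theo describes can be sketched like this. It is an illustration under assumptions, not a reference implementation: the 256-entry block size, the `UNMAPPED` sentinel, the function names and the per-entry sizes used in the storage estimate are all hypothetical choices. The legacy-to-Unicode direction is a one-stage, 256-entry array indexed directly by byte (built here from Python's standard `iso8859_11` codec). The Unicode-to-legacy direction is a two-stage table: the high byte of a BMP code point selects a block, the low byte indexes into it, and every block with no mapped characters shares a single empty block.

```python
from typing import List, Optional, Tuple

BLOCK_BITS = 8                  # low 8 bits of a code point index into a block
BLOCK_SIZE = 1 << BLOCK_BITS    # 256 entries per stage-2 block
UNMAPPED = 0xFFFF               # hypothetical "no mapping" sentinel

def forward_table(codec: str = "iso8859_11") -> List[Optional[int]]:
    """One-stage legacy->Unicode table: a 256-entry array indexed by byte."""
    table: List[Optional[int]] = []
    for b in range(256):
        try:
            table.append(ord(bytes([b]).decode(codec)))
        except UnicodeDecodeError:
            table.append(None)  # byte is unassigned in this charset
    return table

def build_reverse(to_unicode: List[Optional[int]]) -> Tuple[List[int], List[List[int]]]:
    """Two-stage Unicode->legacy table covering the BMP."""
    blocks = [[UNMAPPED] * BLOCK_SIZE]      # block 0: shared by all unused ranges
    stage1 = [0] * (0x10000 >> BLOCK_BITS)  # one slot per 256-code-point block
    for byte, cp in enumerate(to_unicode):
        if cp is None:
            continue
        hi = cp >> BLOCK_BITS
        if stage1[hi] == 0:                 # first mapped code point in this block
            blocks.append([UNMAPPED] * BLOCK_SIZE)
            stage1[hi] = len(blocks) - 1
        blocks[stage1[hi]][cp & (BLOCK_SIZE - 1)] = byte
    return stage1, blocks

def encode_char(cp: int, stage1: List[int], blocks: List[List[int]]) -> Optional[int]:
    """Two array indexings, no search."""
    if cp > 0xFFFF:
        return None
    value = blocks[stage1[cp >> BLOCK_BITS]][cp & (BLOCK_SIZE - 1)]
    return None if value == UNMAPPED else value

def storage_bytes(stage1: List[int], blocks: List[List[int]]) -> int:
    """Size estimate assuming 1-byte stage-1 entries and 2-byte stage-2 entries."""
    return len(stage1) * 1 + len(blocks) * BLOCK_SIZE * 2

fwd = forward_table()
stage1, blocks = build_reverse(fwd)
print(len(blocks), storage_bytes(stage1, blocks))  # 3 1792
print(hex(encode_char(0x0E01, stage1, blocks)))    # 0xa1
```

For ISO 8859-11 only three blocks are needed (the shared empty one, U+0000-U+00FF and U+0E00-U+0EFF), so under these entry-size assumptions the reverse table takes 256 + 3 × 256 × 2 = 1792 bytes. Adding 512 bytes for a forward array of 2-byte entries gives about 2.3 KB, in line with the "about 3 KB or less" estimate, and either lookup is a fixed number of array indexings. For scale, 3 KB out of 256 MB is 3 × 2^10 / 2^28 ≈ 1.1 × 10^-5, i.e. about 0.001%.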

    Theo



    This archive was generated by hypermail 2.1.5 : Wed Oct 09 2002 - 04:43:24 EDT