RE: UTN #31 and direct compression of code points

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue May 08 2007 - 14:45:04 CDT

Next message: Adam Twardoch: "Re: swastika"

Previous message: John Hudson: "Re: Uppercase ß is coming? (U+1E9E)"
In reply to: Doug Ewell: "Re: UTN #31 and direct compression of code points"
Next in thread: Doug Ewell: "Re: UTN #31 and direct compression of code points"

Doug Ewell wrote:
> Sent: Tuesday, May 8, 2007 08:26
> To: Unicode Mailing List
> Cc: Richard Wordingham
> Subject: Re: UTN #31 and direct compression of code points
>
> Richard Wordingham <richard dot wordingham at ntlworld dot com> wrote:
>
> >> On a large alphabet like Unicode, this conversion table will have a
> >> very significant size,...
> >
> > That entirely depends on how one stores the table. One need only
> > store the entries for the characters that occur in the text.
>
> That is exactly the point I've been trying to make about the supposed
> "large alphabet" effect. This e-mail contains no Cyrillic characters,
> and a Unicode-based Huffman encoding of it would not need to allocate
> space for Cyrillic characters, even though there are hundreds of
> Cyrillic characters in Unicode.
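
To make the quoted point concrete, here is a minimal sketch (not anyone's
actual implementation from this thread) of a Huffman table built only from
the code points that occur in a given text. The function names
huffman_table, encode and decode are hypothetical, chosen for illustration;
the table is an ordinary dictionary, so its size depends on the number of
distinct characters in the text, not on the size of the Unicode repertoire.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_table(text):
    """Return {character: bitstring} covering only characters present in text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tiebreak), {ch: ""}) for ch, weight in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encode(text, table):
    """Concatenate the bitstring of every character in text."""
    return "".join(table[ch] for ch in text)


def decode(bits, table):
    """Invert encode(); this works because Huffman codes are prefix-free."""
    inverse = {code: ch for ch, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    message = "This e-mail contains no Cyrillic characters."
    table = huffman_table(message)
    print(len(table), "table entries for", len(set(message)), "distinct characters")
    assert decode(encode(message, table), table) == message
```

A real compressor would also have to transmit the table (or the
frequencies) to the decoder, which this sketch leaves out; that cost, too,
grows with the number of distinct characters used rather than with the
alphabet size.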

Side note: do you know whether so-called "arithmetic coding" (which pushes
compression a bit further than Huffman coding, using similar principles but
approximating the entropy of the source more closely, and which needs
similar tables for its statistical decision tree) is still encumbered by
the IBM patents on it?

I ask because some people have shown that they could produce completely
equivalent results based only on a prior-art document, using another
analogy; the equivalence has now been demonstrated in the mathematical
sense, even though the definition rests on different background concepts
(i.e. there exists a bijection between the two models implied by the two
conceptual definitions).

So many free, open-source implementations of some audio/video and image
codecs (for example, JPEG image decoders) now cite this prior-art document
in their documentation instead of the IBM patent, and say only that the
codec is "compatible" with the JPEG standard rather than claiming to
implement it in a compliant way, even though this makes no practical
difference: these applications do comply with the standard if you test them.


This archive was generated by hypermail 2.1.5 : Tue May 08 2007 - 14:46:26 CDT