RE: Korean compression (was: Re: Ternary search trees for Unicode dictionaries)

From: Kent Karlsson (kentk@cs.chalmers.se)
Date: Mon Nov 24 2003 - 06:29:32 EST

Next message: Andrew C. West: "Re: creating a test font w/ CJKV Extension B characters."

Previous message: jon@hackcraft.net: "Re: creating a test font w/ CJKV Extension B characters."
In reply to: Doug Ewell: "Re: Korean compression (was: Re: Ternary search trees for Unicode dictionaries)"
Next in thread: Peter Kirk: "Re: Korean compression (was: Re: Ternary search trees for Unicode dictionaries)"
Reply: Peter Kirk: "Re: Korean compression (was: Re: Ternary search trees for Unicode dictionaries)"
Reply: Philippe Verdy: "RE: Korean compression (was: Re: Ternary search trees for Unicode dictionaries)"

...
> >> Of course, no compression format applied to jamos could
> >> even do as well as UTF-16 applied to syllables, i.e. 2 bytes per
> >> syllable.

I wonder why Hangul would need compression over and above
any other alphabetic script... It already has quite a lot of compression
in the form of precomposed syllables. I think we had better start a project
for allocating precomposed "syllables" for many other scripts:
precomposed Latin script syllables, precomposed Greek script
syllables, precomposed Tamil script syllables (most of the Brahmic-
derived scripts are especially disadvantaged, from a 'compression'
viewpoint, by the virama characters), etc. That should take up much
of the excess space in the unused planes (3-13, decimal).
Unfortunately that means 4 bytes per non-Hangul syllable (before
byte-oriented compression is done), but that could be compensated
by using an SCSU-like approach, just with bigger windows.

No, this was not serious ;-)
/kent k

PS
Hangul syllables are "LVT" (actually (L+)(V+)(T*)), not TLV.
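
To make the LVT ordering and the "2 bytes per syllable" figure concrete,
here is a minimal sketch in Python. It assumes the standard arithmetic
mapping between conjoining jamos and the precomposed syllable block
starting at U+AC00, which covers only the simple case of one L, one V and
an optional T, not the general (L+)(V+)(T*) form. The function names
compose, decompose and utf16_bytes are made up for this illustration.

```python
import unicodedata

# Constants of the arithmetic Hangul syllable mapping.
S_BASE = 0xAC00   # first precomposed syllable
L_BASE = 0x1100   # first leading consonant (choseong)
V_BASE = 0x1161   # first vowel (jungseong)
T_BASE = 0x11A7   # one before the first trailing consonant (jongseong)
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # T index 0 means "no trailing consonant"


def compose(l, v, t=0):
    """Build a precomposed syllable from jamo indices, in L, V, T order."""
    if not (0 <= l < L_COUNT and 0 <= v < V_COUNT and 0 <= t < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (l * V_COUNT + v) * T_COUNT + t)


def decompose(syllable):
    """Split a precomposed syllable into its conjoining jamos (L, V, optional T)."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l, rest = divmod(s_index, V_COUNT * T_COUNT)
    v, t = divmod(rest, T_COUNT)
    jamos = chr(L_BASE + l) + chr(V_BASE + v)
    if t:
        jamos += chr(T_BASE + t)
    return jamos


def utf16_bytes(text):
    """Size of text in UTF-16, in bytes, without a byte order mark."""
    return len(text.encode("utf-16-be"))


if __name__ == "__main__":
    syllable = "\ud55c"            # U+D55C, the syllable "han"
    jamos = decompose(syllable)    # L, V, T in that order
    print([hex(ord(c)) for c in jamos])
    print(utf16_bytes(syllable), "bytes precomposed,",
          utf16_bytes(jamos), "bytes as jamos")
    # A code point in plane 3 needs a surrogate pair in UTF-16.
    print(utf16_bytes(chr(0x30000)), "bytes for a plane 3 code point")
```

The decomposition here should agree with unicodedata.normalize("NFD", ...)
for any precomposed syllable, which is an easy way to check the sketch.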


This archive was generated by hypermail 2.1.5 : Mon Nov 24 2003 - 07:20:27 EST