Re: Korean compression (was: Re: Ternary search trees for Unicode dictionaries)

From: John Cowan
Date: Tue Nov 25 2003 - 12:25:35 EST

    Philippe Verdy scripsit:

    > The question of Latin letters with two diacritics added in Latin Extension B
    > does not seem to respect this constraint, as it is not justified in the
    > Vietnamese VISCII standard that already does not contain characters with two
    > diacritics, but already composes them with two characters in the limited CCS
    > set.

    I'm not sure what standard you are referring to. There are three standards
    for Vietnamese text: VISCII 1.1 (de facto), TCVN 5712-1 (aka VSCII-1),
    and TCVN 5712-2 (aka VSCII-2). VISCII provides no combining characters,
    fills the C1 space with graphics, and even replaces certain C0 characters
    with graphics. 5712-1 provides combining characters and fills the C1
    space with graphics. 5712-2 provides combining characters and leaves
    both C0 and C1 clear of graphics (and so is ISO 2022-compatible). But
    all of them provide at least some characters with double diacritics.
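
    To make "double diacritics" concrete, here is a minimal Python sketch.
    It works on Unicode code points, not on the VISCII or TCVN byte
    encodings themselves, and it only counts the combining marks in a
    letter's canonical decomposition. The helper name combining_marks and
    the sample letters are illustrative choices, not taken from any of
    the standards above.

```python
import unicodedata


def combining_marks(ch):
    """Return the combining marks found in the canonical decomposition (NFD) of ch."""
    decomposed = unicodedata.normalize("NFD", ch)
    # unicodedata.combining() is non-zero for combining marks, zero for base letters.
    return [c for c in decomposed if unicodedata.combining(c) != 0]


if __name__ == "__main__":
    # Two Vietnamese letters carrying two diacritics each, plus one with a single diacritic.
    for ch in ("\u1ea5", "\u1edb", "\u00e1"):
        marks = combining_marks(ch)
        names = ", ".join(unicodedata.name(m) for m in marks)
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {len(marks)} mark(s) [{names}]")
```

    A letter such as U+1EA5 (a with circumflex and acute) yields two
    combining marks, which is the sense of "double diacritics" here; a
    letter such as U+00E1 (a with acute) yields one.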

    > I don't know why even ISO10646 would have needed them, unless there's some
    > Vietnamese DBCS standard that allows representing in a 94x94 matrix all
    > letters with two diacritics as well as Han ideographs used in Vietnamese.

    I very much doubt that any such encoding ever existed.

