RE: johab compound letters reference for Hangul? (3)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Dec 21 2003 - 05:51:36 EST

Next message: Kent Karlsson: "RE: johab compound letters reference for Hangul? (3)"

Previous message: Philippe Verdy: "RE: johab compound letters reference for Hangul? (3)"
Maybe in reply to: Philippe Verdy: "RE: johab compound letters reference for Hangul? (3)"
Next in thread: Kent Karlsson: "RE: johab compound letters reference for Hangul? (3)"
Reply: Kent Karlsson: "RE: johab compound letters reference for Hangul? (3)"
Reply: Doug Ewell: "The last straw (was: Re: johab compound letters reference for Hangul? (3))"

Doug Ewell wrote:
> Philippe,
>
> > When looking at this document:
> > http://std.dkuug.dk/JTC1/SC22/WG20/docs/n1051-hangulsort.pdf
> > and its associated data file "n1051t-table-hangulctt6.txt"...
>
> Do you have access to a Web or FTP site, or some other place where you
> could post these relatively long lists? (If you don't, I understand; I
> didn't for many years while I was with CompuServe.)
>
> By including a long list of proposed decompositions in your message,
> followed by commentary at the end, you run the risk that people will
> skip over the list and miss out on the commentary.

Well, I'm involved in projects with other Korean writers and readers who
have collections of texts that are very imperfectly mapped in Unicode and
badly rendered with most fonts, even though they are correctly represented.

My idea was to offer better accessibility to these texts. I do think that
Unicode made a mistake by encoding Hangul twice, and so did the Korean
standards that used the Wansung set and later the Johab set.

Many implementers feel that the precomposed Hangul syllables are both
sufficient and necessary to support the Hangul script. Because of the size
of that block in the Unicode code space, they are reluctant to include this
support. However, Korea is the most Internet-connected country in the world,
and the need to have Unicode texts correctly supported for tens of millions
of users is urgent.

We could serve them more easily by describing more precisely in Unicode the
structure of their script, which they all learned at school. The
artificially built subset of the script found in Unicode and in KSC5601
serves this need for good support poorly, including for good typography.
If you look at various Korean web sites, you'll see that the lack of
support for good typography has a consequence: most sites use a lot of
bitmaps to represent text, even though that breaks accessibility for blind
users, for whom the Basic jamos could simply be represented with very
simple Braille patterns.

It is also needed to allow better support in text editors, so that users
can edit letters separately within Hangul clusters. I do think that a good
Korean editor should have a display mode where Hangul clusters are
represented only by Basic Jamos presented as an alphabet (for example with
taller, uppercase-like glyphs for Choseongs, and with Jungseongs and
Jongseongs presented like lowercase letters in Latin).

Then it would be up to the text editor to first recompose automatically the
Johab compound jamos to get the preferred Unicode NFD form, and optionally
to apply the algorithmic composition of Hangul syllables.
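
For readers who have not seen it, the algorithmic composition mentioned
above is plain arithmetic on code points. The following is only a minimal
sketch, assuming the constants given in the Unicode Standard for the
precomposed syllable block (U+AC00..U+D7A3) and the modern conjoining
jamos; the function names are mine, chosen for illustration.

```python
# Constants from the Unicode Standard's Hangul syllable arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables


def decompose_syllable(ch):
    """Split one precomposed syllable into L V [T] conjoining jamos.

    Any other character is returned unchanged.
    """
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    l_part = chr(L_BASE + s_index // N_COUNT)
    v_part = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    t_part = chr(T_BASE + t_index) if t_index else ""
    return l_part + v_part + t_part


def compose_syllable(jamos):
    """Compose an L V or L V T sequence of modern jamos into one syllable."""
    if len(jamos) not in (2, 3):
        raise ValueError("expected L V or L V T")
    l_index = ord(jamos[0]) - L_BASE
    v_index = ord(jamos[1]) - V_BASE
    t_index = ord(jamos[2]) - T_BASE if len(jamos) == 3 else 0
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        raise ValueError("not a modern leading consonant + vowel")
    if len(jamos) == 3 and not 1 <= t_index < T_COUNT:
        raise ValueError("not a modern trailing consonant")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)
```

Note that this only handles the syllable level: the jamos it produces may
still be compound ones (for example U+116A), which is exactly the part
that has no official decomposition today.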

Browsers should also be able to decompose syllables and compound Johab
jamos into Basic jamos, in order to use simpler fonts that are only defined
with these Basic Jamos. Then, if browsers implement the 2D composition
model for Hangul (which is defined only in terms of Horizontal or Vertical
property attached to Basic letters) they could recreate the layout of
Hangul syllables. Basic fonts could also be easily extended to fonts
supporting the whole standard set of clusters (I prefer the term cluster
to the term syllable for the Unicode, Johab or Wansung subsets of
valid Hangul syllables), by adding a default composition routine. Fonts
could also be more easily hinted within the reduced set, with the
advantage that this hinting would be inherited in clusters.

Finally, the capability of performing full-text search in Hangul is too
limited for now and not easily interoperable. I have been told that
Google.kr already performs this decomposition of Hangul texts for
pages encoded with Korean standard charsets, in order to increase the
number of good hits detected in pages. If Google needs that, I think that
many users will need it too, and I would prefer that this
decomposition of compound jamos be officially described with a coherent
set of decompositions (which, in my opinion, should become canonical,
except that the Unicode NFD and NFC forms would not be modified for
existing correctly composed texts: the NFD form will use the Johab
compound letters, or recompose to them).
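
To make the search use case concrete, here is a rough sketch of matching
on Basic jamos. It assumes Python's standard unicodedata module for NFD.
The three-entry table is only an illustrative subset that I picked for the
example, not the list of decompositions under discussion, and the names
COMPOUND_TO_BASIC, to_basic_jamos and matches are hypothetical.

```python
import unicodedata

# Illustrative subset only (an assumption for this sketch), mapping a
# compound conjoining jamo to a sequence of Basic jamos.
COMPOUND_TO_BASIC = {
    "\u1101": "\u1100\u1100",  # choseong SSANGKIYEOK -> KIYEOK + KIYEOK
    "\u116A": "\u1169\u1161",  # jungseong WA -> O + A
    "\u11AA": "\u11A8\u11BA",  # jongseong KIYEOK-SIOS -> KIYEOK + SIOS
}


def to_basic_jamos(text):
    """Return a search key: NFD first, then compound jamos split further."""
    nfd = unicodedata.normalize("NFD", text)
    return "".join(COMPOUND_TO_BASIC.get(ch, ch) for ch in nfd)


def matches(query, document):
    """True if the query's Basic-jamo key occurs in the document's key."""
    return to_basic_jamos(query) in to_basic_jamos(document)
```

With this key, a query for U+ACE0 (KIYEOK + O) also hits a page containing
U+ACFC (KIYEOK + WA), because WA is split into O + A. Whether such a hit
counts as a "good" one is a policy choice for the search engine; the
sketch only shows what the decomposition makes possible.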

Two additions to the UCD would be needed: a new file that defines these
extra "canonical Johab" decompositions into Basic Jamos (I would call
it "HangulBasic.txt"), and an extension to the CharacterProperties
to assign the horizontal or vertical stack composition of a Basic
or compound jamo (intended for renderers that choose to display
the 2D layout of syllables):

- if two clusters have horizontal layout (normally these jamos are taller
than wide and most often include a long vertical stroke), they are
stacked side-by-side from left to right, and their resulting composition
also has horizontal layout.
- same thing if the first cluster is vertical and the second horizontal.
- if two clusters have vertical layout (normally these jamos are wider
than tall and most often include a long horizontal stroke), they are
stacked from top to bottom, and their resulting composition
also has vertical layout.
- same thing if the first cluster is horizontal and the second vertical.
In summary:
        H + H -> H,
        V + H -> H,
        V + V -> V,
        H + V -> V
i.e. the layout of a compound is determined by the last jamo in the
cluster, something that is extremely simple to understand and implement
efficiently.
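
As a rough illustration of how little code the rule above needs, here is
a sketch. The H/V values in BASIC_LAYOUT are my own assumed assignments
for a few vowel jamos (based on the stroke description above), not data
from any existing property file, and all names are hypothetical.

```python
from functools import reduce

# Assumed layout values for a few Basic vowel jamos (illustration only).
BASIC_LAYOUT = {
    "\u1161": "H",  # A:  long vertical stroke
    "\u1165": "H",  # EO: long vertical stroke
    "\u1175": "H",  # I:  long vertical stroke
    "\u1169": "V",  # O:  long horizontal stroke
    "\u116E": "V",  # U:  long horizontal stroke
    "\u1173": "V",  # EU: long horizontal stroke
}


def combine_layout(first, second):
    """H + H -> H, V + H -> H, V + V -> V, H + V -> V."""
    if first not in ("H", "V") or second not in ("H", "V"):
        raise ValueError("layout must be 'H' or 'V'")
    # The result never depends on the first operand.
    return second


def cluster_layout(basic_jamos):
    """Layout of a compound given as a non-empty string of Basic jamos."""
    return reduce(combine_layout, (BASIC_LAYOUT[j] for j in basic_jamos))
```

For example, a WA decomposed as O + A gives V + H -> H, and the result for
any longer sequence is simply the layout of its last Basic jamo.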

Of course this second set of layout properties is not required for renderers,
as the presentation as a 2D syllabic cluster is optional. Hangul already
supports "Half-width" presentation, made only of Basic letters, where the
differentiation between Choseongs and Jongseongs can be a matter of style:
bold/light, tall/x-height (similar to lettercase in Latin), ...
They may even not be differentiated in texts (similar to what was done with
Wansung, where the reader implicitly rebuilds delimitations between
syllables using his linguistic and phonetic knowledge of Korean, exactly
like readers of Latin text). Fonts could be built to support either
presentation style.




This archive was generated by hypermail 2.1.5 : Sun Dec 21 2003 - 06:35:59 EST