Re: [unicode] CJK Ideograph Fragments

From: Uriah Eisenstein (uriaheisenstein@gmail.com)
Date: Thu Apr 29 2010 - 18:19:37 CDT

Next message: announcements@unicode.org: "[Unicode Announcement] Unicode Releases Common Locale Data Repository, Version 1.8.1"

Previous message: alka irani: "Re: [indic] Halant - can it be called a "Linguistic Zero" (Panini)?"
In reply to: mpsuzuki@hiroshima-u.ac.jp: "Re: [unicode] CJK Ideograph Fragments"

Hello,
Let me clarify: I am well aware that many software projects use IDS, or
other hierarchical structures, to describe ideographs. I assume they all
have to deal with cases where a component of an ideograph isn't itself an
encoded character. My question was specifically about the possibility of
encoding such components, rather than leaving each project to resolve the
problem separately (e.g. by using PUA characters, or by marking components
as variants or abbreviations of existing characters). I hope this is
clearer now. If examples (websites of such projects) would help, please
let me know.
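
As a rough illustration of the PUA workaround mentioned above, here is a
minimal Python sketch (not taken from any of those projects) that parses an
IDS string and lists the leaf components that are Private Use characters.
U+E000 is a hypothetical stand-in for the right-hand part of 漢, the function
names are made up for the example, and only the twelve IDCs U+2FF0..U+2FFB
are handled.

```python
import unicodedata

# Number of operands each Ideographic Description Character takes.
# U+2FF2 and U+2FF3 are ternary; the other IDCs in U+2FF0..U+2FFB are binary.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3
IDC_ARITY["\u2FF3"] = 3


def parse_ids(ids, pos=0):
    """Parse one IDS starting at pos.

    Returns (tree, next_pos), where tree is either a single character
    or a tuple (idc, [child trees]).
    """
    if pos >= len(ids):
        raise ValueError("truncated IDS")
    ch = ids[pos]
    pos += 1
    arity = IDC_ARITY.get(ch)
    if arity is None:
        return ch, pos  # a leaf: an ideograph, radical, or placeholder
    children = []
    for _ in range(arity):
        child, pos = parse_ids(ids, pos)
        children.append(child)
    return (ch, children), pos


def leaves(tree):
    """Yield the leaf characters of a parsed IDS, left to right."""
    if isinstance(tree, str):
        yield tree
    else:
        for child in tree[1]:
            yield from leaves(child)


def unencoded_components(ids):
    """Return the Private Use (category Co) leaves of an IDS string."""
    tree, end = parse_ids(ids)
    if end != len(ids):
        raise ValueError("trailing characters after IDS")
    return [c for c in leaves(tree) if unicodedata.category(c) == "Co"]


# U+2FF0 (left-right), U+6C35 (water radical form), U+E000 (hypothetical).
print(unencoded_components("\u2FF0\u6C35\uE000"))
```

Each project is free to pick its own PUA assignments, which is exactly why
such data does not interchange between projects.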

I've recently been supplementing an existing list of decompositions for CJK
Unified Ideographs, and have so far found several dozen such recurring
components which aren't encoded characters. This doesn't include the
extension blocks, for which I don't have decomposition data (but in which I
have noticed quite a few such components). I would therefore estimate the
number of components at several dozen, or perhaps a few hundred, depending
on the policy for proposing new ones.
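
To make the counting concrete, a small Python sketch of how recurring
unencoded components could be tallied from a decomposition list. It assumes
the list maps each character to a flat list of components and that unencoded
components are marked with PUA placeholders; the sample data and the function
name are illustrative only and have not been checked against any real
decomposition list.

```python
from collections import Counter
import unicodedata


def recurring_placeholders(decompositions, min_count=2):
    """Count PUA placeholders that occur in at least min_count characters."""
    counts = Counter()
    for components in decompositions.values():
        # Count a placeholder once per character, even if it appears twice.
        for comp in set(components):
            if unicodedata.category(comp) == "Co":
                counts[comp] += 1
    return {comp: n for comp, n in counts.items() if n >= min_count}


# Hypothetical sample: U+E000 and U+E001 are made-up PUA placeholders.
SAMPLE = {
    "\u6F22": ["\u6C35", "\uE000"],  # 漢
    "\u5606": ["\u53E3", "\uE000"],  # 嘆
    "\u5CF6": ["\uE001", "\u5C71"],  # 島
}

print(recurring_placeholders(SAMPLE))
```

Raising min_count is one way to model a stricter policy about which
components are worth proposing.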

Regards,
Uriah Eisenstein

2010/4/29 <mpsuzuki@hiroshima-u.ac.jp>

> Hi,
>
> I think you searched in the wrong place or with the wrong keywords.
>
> 1. Recently JTC1/SC2/WG2/IRG checks whether a newly proposed CJK Unified
> Ideograph has a shape similar to an existing one, by comparing
> IDS (a sequence of IDC, CJK Unified Ideographs and CJK Radicals).
>
> 2. Wenlin Institute has a system to generate a CJK Unified Ideograph
> glyph from a sequence of IDC and CJK Strokes plus extra positioning
> information, described in the markup language CDL.
>
> If you know them already and think your idea is different
> from their activity, please clarify the difference. It will
> help the reviewers to understand your future proposal to UTC.
> How many components do you want to propose?
>
> Regards,
> mpsuzuki
>
> On Wed, 28 Apr 2010 20:03:52 +0200
> Uriah Eisenstein <uriaheisenstein@gmail.com> wrote:
>
> >Hello,
> >My question is about common components of CJK Ideographs which are not
> >encoded as independent Han characters (and perhaps indeed aren't). A good
> >example is the right-hand part of the character 漢 itself: it is a distinct
> >component appearing in multiple other characters, but is not encoded to the
> >best of my knowledge. The same goes for the top part of 鳥 and 島, the
> >surrounding part of 與 and 興 and several others. My question is whether there
> >are any plans or discussions for encoding these fragments in Unicode.
> >
> >(I haven't found anything about this in mailing list archives; I did find
> >statements that Unicode does not intend to provide any decomposition data of
> >Han characters :) And for good reasons. However, such fragments may well be
> >useful for third-party software dealing with 漢字 glyph generation, lookup by
> >components etc.)
> >
> >Thanks,
> >Uriah Eisenstein
> >
>

This archive was generated by hypermail 2.1.5 : Thu Apr 29 2010 - 18:25:37 CDT