Re: Some questions about Unicode's CJK Unified Ideograph

From: Ken Whistler <>
Date: Fri, 29 May 2015 18:50:28 -0700

On 5/29/2015 5:20 PM, gfb hjjhjh wrote:
> 1. I have seen a chinese character ⿰言亜 from a Vietnamese dictionary

> So, a.) In , it show that
> CJK Extension E and F have already been accepted, but where can I
> check those proposals to see if the character is in them or not?

For Extension E, you can check the following code chart:

See: U+2C89A..U+2C931 (pp. 54-56 of the pdf) for the relevant
radical (#149). But I don't see that character in the list of
Extension E characters.
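
As a rough illustration (not part of the original reply), the ideographic
description sequence in the question can be broken into its code points,
and a candidate code point can be tested against the Extension E block.
The block range U+2B820..U+2CEA1 is the published block range; the
helper name in_extension_e is made up for this sketch.

```python
# CJK Unified Ideographs Extension E block (U+2B820..U+2CEA1).
EXT_E = range(0x2B820, 0x2CEA1 + 1)

def in_extension_e(ch: str) -> bool:
    """Return True if the single character ch lies in the Extension E block."""
    return ord(ch) in EXT_E

# The description sequence from the question: U+2FF0 (left-to-right
# composition), U+8A00 (radical 149), U+4E9C.
ids = "\u2FF0\u8A00\u4E9C"
components = [f"U+{ord(c):04X}" for c in ids]

print(components)
print(in_extension_e("\U0002C89A"))  # first code point of the range cited above
print(in_extension_e("\u8A00"))      # the radical itself is in the URO, not Ext E
```

A block-range test only says where a code point lives. Finding out whether
a particular glyph was encoded still means looking through the chart.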

Extension F is harder to track down, because it has not yet been
approved by the UTC, and it comes in two pieces that have so far
progressed at different rates in the ISO committee. Perhaps somebody on
this list who has better access to the relevant documents can let you
know whether ⿰言亜 can be found in those sets.

> and b.) it say to propose a new character, the proposal must include
> information about someone who would agree to provide a computer font
> for publishing the standard, do that mean i have to provide info about
> someone who is anticipated to agree on doing so or do i need to
> contact them for their agreement first, and does that mean I can just
> put info of someone who are making free full unicode CJK coverage font
> into the proposal?,

It would require (eventually) provision of a font with correct display
of just the character proposed -- but in the case of CJK additions, these
first go through a process of collection and review by the Ideographic
Rapporteur Group. The best thing to do is to work with a national
body concerned with CJK characters and ensure that they include
this character on their list of submissions for IRG review.

> and c.) just like the question (b), do "names and addresses of
> appropriate contacts within national body or user organizations"
> represent Vietnamese government in this case?

If the character has not been submitted to the IRG for review, it would
probably be best to work through the Vietnamese national standards
body. Again, people on this list may be able to provide you the
correct contact information for them.

> 2. Is combined characters like U+20DD intended to work with all
> different type of characters, or is it some problem related to
> implementation ? as I when i write ゆ⃝ (Japanese Hiragana Letter Yu +
> Combining Enclosing Circle) appear to be separate on most font I use,
> but if I change the Hiragana Yu into a conventional = sign or some
> latin character, most fonts are at least somehow able to put them
> together. Or, is there any better/alternative representation in
> unicode that can show japanese hiragana yu in a circle?

Combining enclosing marks in principle could work with most characters,
but in practice most arbitrary combinations do not work very well,
because they would require very complicated font support.
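
A small sketch of what the encoded sequence looks like, using Python's
standard unicodedata module. It only shows that the text is a valid
two-code-point sequence with no precomposed form; whether the circle
actually encloses the base letter is up to the font and renderer.

```python
import unicodedata

# HIRAGANA LETTER YU followed by COMBINING ENCLOSING CIRCLE.
seq = "\u3086\u20DD"

def describe(text):
    """List (code point, character name, general category) for each character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.category(c))
        for c in text
    ]

info = describe(seq)
for row in info:
    print(row)

# Normalization does not fuse the pair into one code point, so any
# enclosing effect has to come from font and rendering support.
nfc = unicodedata.normalize("NFC", seq)
print(len(nfc))
```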

> 4.In CJK Symbols and Punctuation, Proper name mark and Book name mark
> are not included. While there are characters like U+2584, U+FE33,
> U+FE4F, and U+FE34 in unicode that is more or less a representation
> for the two symbol, they do not appear below or on the left of typed
> characters when text flow is horizontal/vertical, and instead, they
> occupy their own space which make them having little use in daily
> life, and while the proper name mark and book name mark can
> represented by text editing softwares and css but those representation
> are not ideal and they do match "Criteria for Encoding Symbols". Is it
> possible to make a new unicode symbol, or change some current symbol
> into one that could appear in suitable place of other characters when
> typed? And a property of the symbol is that when used in case like 美
> 國紐約 which 美國 and 紐約 are two different proper name (place name),
> so an underline should go below them without any separation between
> the character 美and國 or 紐and約 (when text are written horizontally),
> but at the same time the underline should not be linked between 國 and
> 紐 as 國 is the end of first place name while 紐 is the start of the
> other.

What you are talking about is, indeed, best handled by text styling
rather than by individual character encoding. These are various CJK-specific
underlining styles (for horizontal text layout) or sidelining styles (for
vertical text layout). It is precisely because these require highlighting
for ranges of characters (without breaks) that this kind of text decoration
is handled best by style attributes (or markup), rather than by individual
combining symbols.
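
One way to picture the range-based approach is to keep the name marks as
(start, end) ranges and generate markup from them. This is only a sketch:
the function name underline_names and the class name "proper-name" are
invented here, and the actual underline appearance (including the gap
between adjacent names) would be left to a stylesheet.

```python
from html import escape

def underline_names(text, spans):
    """Wrap each (start, end) half-open range of text in its own span.

    spans must be sorted and non-overlapping. Each proper name gets a
    separate element, so adjacent names stay distinct ranges.
    """
    out = []
    pos = 0
    for start, end in spans:
        out.append(escape(text[pos:start]))  # unmarked text before the name
        out.append('<span class="proper-name">' + escape(text[start:end]) + "</span>")
        pos = end
    out.append(escape(text[pos:]))           # unmarked text after the last name
    return "".join(out)

# Two adjacent place names, marked as two separate ranges.
html = underline_names("美國紐約", [(0, 2), (2, 4)])
print(html)
```

The point of the sketch is that the two names are two ranges over the
text, which is information a per-character combining mark cannot carry.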

The characters U+FE33, U+FE34, and U+FE4F (but not U+2584) are compatibility
characters, encoded only for mapping to old Chinese standards. Those
standards encoded these underlining or sidelining text highlights as
individual characters, but required specialized text layout programs to
make any use of them.
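
The compatibility status can be checked, at least roughly, with Python's
unicodedata module: the three characters carry compatibility
decompositions and change under NFKC, while U+2584 does not. This is an
illustrative check only, and the exact data depends on the Unicode
version bundled with the Python in use.

```python
import unicodedata

candidates = ["\uFE33", "\uFE34", "\uFE4F", "\u2584"]

def has_compat_mapping(ch: str) -> bool:
    """True if NFKC normalization replaces ch with something else."""
    return unicodedata.normalize("NFKC", ch) != ch

report = {f"U+{ord(c):04X}": has_compat_mapping(c) for c in candidates}

for c in candidates:
    # decomposition() returns an empty string when there is no mapping.
    print(f"U+{ord(c):04X}", unicodedata.name(c), repr(unicodedata.decomposition(c)))
```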

