The Unicode Consortium Discussion Forum

All times are UTC - 6 hours [ DST ]




 Post subject: Duplicate characters from the same source in CJK ideographs
PostPosted: Thu Oct 18, 2012 9:11 am 

Joined: Sat Aug 06, 2011 9:02 am
Posts: 43
In Chapter 12, East Asian Scripts, page 395 reads:

    "1. Where the repertoires of two of the character set standards within a single source have considerable overlap, the characters in the overlap might be included only once in the source. This approach is used, for example, with GB 2312-80 and GB 12345-90, which have many ideographs in common. Characters in GB 12345-90 that are duplicates of characters in GB 2312-80 are not included in the G source"

The words in red convey the idea that duplicate characters from the repertoires of two character set standards may be included in the source (G, for example), which ultimately tells me that duplicate characters are being included in the UCD. Is that correct? If so, are these characters all included in the CJK Compatibility Ideographs block?


 
 Post subject: Re: Duplicate characters from the same source in CJK ideogra
PostPosted: Thu Oct 18, 2012 6:01 pm 

Joined: Mon Feb 01, 2010 6:18 pm
Posts: 77
This is actually talking about the source data for CJK ideographs, not the encoding of compatibility ideographs. CJKV characters have their source standards listed in the Unicode Han (Unihan) database. Many of those character set standards are simply updates of one another, however, and the source data does not list every update, even though each is technically another standard in which the character was found. So when a character shows up in GB-2312-80, it will also be present in GB-2312-90, but the G source data will list only GB-2312-80 as the source standard; only characters that are found in GB-2312-90 but not in GB-2312-80 will have GB-2312-90 listed as their G source.

Compatibility ideographs are only created when a source standard has two separate character code points for characters that would otherwise be unified in Unicode. The compatibility character exists only to allow for lossless conversion of the two separate characters from the source standard to Unicode (which would otherwise map them to the same character) and back again.
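
To make the round-trip point concrete, here is a minimal Python sketch. It uses U+F900, a compatibility ideograph whose canonical decomposition is the unified ideograph U+8C48. The source codes "SRC-1" and "SRC-2" and both helper functions are hypothetical stand-ins for the two code points of a source standard; they are not taken from any real mapping table.

```python
import unicodedata

compat = "\uF900"   # CJK COMPATIBILITY IDEOGRAPH-F900
unified = "\u8C48"  # CJK UNIFIED IDEOGRAPH-8C48

# The compatibility ideograph decomposes canonically to the unified one,
# so normalization folds the two together.
decomposition = unicodedata.decomposition(compat)
normalized = unicodedata.normalize("NFC", compat)

# Hypothetical source standard with two separate codes for the "same" character.
TO_UNICODE = {"SRC-1": unified, "SRC-2": compat}
FROM_UNICODE = {char: code for code, char in TO_UNICODE.items()}


def round_trip(code):
    """Source code -> Unicode -> source code, with the compatibility ideograph."""
    return FROM_UNICODE[TO_UNICODE[code]]


def round_trip_without_compat(code):
    """Same trip if both source codes were mapped to the unified ideograph."""
    merged = {c: unified for c in TO_UNICODE}
    back = {}
    for c, char in merged.items():
        back.setdefault(char, c)  # only one code can win on the way back
    return back[merged[code]]


print(decomposition, normalized == unified)
print(round_trip("SRC-2"), round_trip_without_compat("SRC-2"))
```

With the compatibility ideograph, "SRC-2" survives the trip; without it, the distinction is lost and the character comes back as "SRC-1". The same folding is why normalizing text that contains compatibility ideographs breaks the round trip.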


 
 Post subject: Re: Duplicate characters from the same source in CJK ideogra
PostPosted: Thu Oct 18, 2012 8:32 pm 

Joined: Sat Aug 06, 2011 9:02 am
Posts: 43
vanisaac wrote
Quote:
A lot of those character standards are simply updates of each other, however, and the source data does not list every update, even though it is technically another standard in which the character was found. So when a character shows up in GB-2312-80, it will also be present in GB-2312-90, but the G source data will only list GB-2312-80 as the source standard, and only characters found in GB-2312-90, but are not also found in GB-2312-80, will have GB-2312-90 listed as its G source.

I think you mean GB-12345-90 instead of GB-2312-90, as the latter standard is not listed in Table 12.1.

I have edited what I said before, since I think I now understand what the paragraph quoted in my question is saying.

    1. GB-2312-80 contains the simplified Chinese characters and others (A).
    2. GB-12345-90 contains the traditional Chinese characters and others (B).
    3. Simplified and traditional characters in general don't overlap. The ones that overlap (the simplified and traditional characters are identical) are taken only once to Source G.
    4. A and B have some duplicate characters and those are not duplicated in source G.

Is this correct?

PS. How do I get "vanisaac wrote" in place of "Quote" in the quote header? I've tried several arrangements, to no avail.

Thanks.


 
 Post subject: Re: Duplicate characters from the same source in CJK ideogra
PostPosted: Sat Oct 20, 2012 12:32 am 

Joined: Mon Feb 01, 2010 6:18 pm
Posts: 77
Belloc wrote:
I think you mean GB-12345-90 instead of GB-2312-90, as this last standard is not listed on Table 12.1.

Thanks.


You are right, that should have been GB-12345-90. I read it wrong the first time through and just blindly copied the wrong number when writing my response.

PS, the way to get someone's name in the quote is [quote="name"]. It should come up automatically if you hit the "quote" button instead of "reply".


 
 Post subject: Re: Duplicate characters from the same source in CJK ideogra
PostPosted: Sat Oct 20, 2012 7:12 am 

Joined: Sat Aug 06, 2011 9:02 am
Posts: 43
vanisaac,

Thanks for your reply. What about my conjectures (1, 2, 3 and 4) stated in my previous post? I look forward to hearing from you on this.

Thanks


 
 Post subject: Re: Duplicate characters from the same source in CJK ideogra
PostPosted: Sat Oct 20, 2012 3:12 pm 

Joined: Mon Feb 01, 2010 6:18 pm
Posts: 77
#3 and #4 seem to be based on a complete misunderstanding of what the GSource, JSource, etc. fields represent. All they do is state which other standards a particular Unicode character can be found in, and provide a citation and justification for its inclusion in Unicode. The point of the passage is that the source listings in Unihan are not comprehensive when you have two source standards with widely overlapping repertoires. The phrase "taken only once to Source G" and pretty much the entire #4 are nonsensical. The source data are not a list of characters. They are a property of each CJK character that lists standards in which that particular character can be found.
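
As a rough illustration of "a property of each character", here is a small Python sketch that reads lines in the tab-separated layout used by the Unihan data files (code point, field name, value). The two sample lines for U+4E00 are my recollection of the real entries and should be checked against the actual Unihan_IRGSources.txt; the function names and the `pick_g_source` precedence rule are my own assumptions, written only to mirror the passage being discussed. The G0 and G1 prefixes stand for GB 2312-80 and GB 12345-90 respectively.

```python
from collections import defaultdict

# Sample lines in the Unihan file layout: code point, field, value (tab-separated).
# Values are illustrative; verify them against the real Unihan_IRGSources.txt.
SAMPLE = """\
U+4E00\tkIRG_GSource\tG0-523B
U+4E00\tkIRG_JSource\tJ0-306C
"""

# Prefixes of kIRG_GSource values relevant to this thread.
G_PREFIXES = {"G0": "GB 2312-80", "G1": "GB 12345-90"}


def parse_sources(text):
    """Return {character: {field: value}}: the sources hang off each character."""
    sources = defaultdict(dict)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        code, field, value = line.split("\t")
        sources[chr(int(code[2:], 16))][field] = value
    return dict(sources)


def g_standard(value):
    """Map a kIRG_GSource value such as 'G0-523B' to the standard it cites."""
    prefix = value.split("-", 1)[0]
    return G_PREFIXES.get(prefix, prefix)


def pick_g_source(in_gb2312, in_gb12345):
    """Hypothetical precedence: a character in both standards cites only GB 2312-80."""
    if in_gb2312:
        return "GB 2312-80"
    if in_gb12345:
        return "GB 12345-90"
    return None


sources = parse_sources(SAMPLE)
print(sources["\u4e00"])
print(g_standard(sources["\u4e00"]["kIRG_GSource"]))
print(pick_g_source(True, True), pick_g_source(False, True))
```

Note that the lookup goes from a character to its source citations, never the other way around, and a character present in both GB standards still carries a single G citation. That is all the quoted passage is describing.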

