Re: U-Source ideographs mapped to themselves

From: Uriah Eisenstein (uriaheisenstein@gmail.com)
Date: Tue Aug 31 2010 - 10:32:33 CDT


    Thanks for the answers (and sorry for the somewhat late reply),
    My interest in this question is purely technical: as I've mentioned
    elsewhere, I'm trying to load Unihan data into an SQL database*, so I
    occasionally need more detail about the contents of fields I don't
    actually use. In this case I'll probably just ignore values that aren't in
    the UTCnnnnn form, since they are going to change in the next version of
    Unicode anyway.
    Regarding your question, mpsuzuki, I assume the data in Unihan should
    represent the source of the ideograph as precisely as possible, which may be
    considered "historical background info". But the mapping of ideographs to
    themselves is unclear; ultimately, I suppose it would be better to associate
    sources with specific glyph variants (expressed as IVS), which I understand
    is still some way off... Anyway, since I'm not directly using the data, I
    can't say for sure.
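    As an illustration of the loading approach described above (keeping only
    UTCnnnnn values of kIRG_USource), here is a minimal Python sketch using the
    standard sqlite3 module. It assumes the tab-separated "code point, field,
    value" line layout of the Unihan text files. The table name `usource`, the
    helper names `parse_usource` and `load`, and the exact `UTC\d{5}` pattern are
    hypothetical choices for illustration, based only on the UTC00915 form
    quoted in this thread; the pattern may need adjusting for other releases.

```python
import re
import sqlite3

# Hypothetical pattern for a UTR #45 index in the "UTCnnnnn" form quoted in
# this thread (e.g. UTC00915). Adjust it for the Unihan release being loaded.
UTC_INDEX = re.compile(r"^UTC\d{5}$")


def parse_usource(lines):
    """Yield (code point, UTC index) pairs from Unihan-style lines,
    skipping blank lines, comments, other fields and non-UTCnnnnn values."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        codepoint, field, value = line.split("\t", 2)
        if field != "kIRG_USource":
            continue
        if not UTC_INDEX.match(value):
            # e.g. the pre-6.0.0 value "U+FA0C", a code point mapped to itself
            continue
        # "U+FA0C" -> 0xFA0C
        yield int(codepoint[2:], 16), value


def load(conn, lines):
    """Create a hypothetical `usource` table and insert the parsed pairs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS usource ("
        "codepoint INTEGER NOT NULL, utc_index TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO usource (codepoint, utc_index) VALUES (?, ?)",
        parse_usource(lines),
    )
    conn.commit()


if __name__ == "__main__":
    # The first line is the pre-6.0.0 value quoted in this thread and the
    # second is the 6.0.0 value; they come from different releases and are
    # shown together only to exercise both branches of the parser.
    sample = [
        "U+FA0C\tkIRG_USource\tU+FA0C",
        "U+FA0C\tkIRG_USource\tUTC00915",
    ]
    conn = sqlite3.connect(":memory:")
    load(conn, sample)
    for cp, idx in conn.execute("SELECT codepoint, utc_index FROM usource"):
        print(f"U+{cp:04X} {idx}")  # prints "U+FA0C UTC00915"
```

    Filtering in the parser rather than in SQL keeps the self-referencing
    values out of the table entirely; if they turn out to matter later, they
    could instead be stored in a separate column.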

    * I'm aware of libUnihan, but I couldn't find its latest versions, which
    are supposed to support Windows, and in any case I'm doing something
    somewhat different.

    Regards,
    Uriah Eisenstein

    On Mon, Aug 30, 2010 at 9:02 PM, John H. Jenkins <jenkins@apple.com> wrote:

    >
    > On Aug 29, 2010, at 6:07 AM, Uriah Eisenstein wrote:
    >
    > Hi,
    > UAX #38 (Unihan) defines the kIRG_USource field as a reference into the
    > U-source ideograph database described in UTR #45, having the form "UTC
    > nnnnn". However, several CJK Compatibility Ideographs are mapped to their
    > own code point values, e.g. "U+FA0C kIRG_USource U+FA0C". The formal
    > syntax of kIRG_USource allows this, but I've found no explanation of the
    > meaning of such a mapping; nor is there any mapping from one code point
    > to a different code point.
    > Thanks,
    > Uriah
    >
    >
    > This is being changed with the 6.0.0 release. The U-source for all such
    > ideographs has been turned into a UTR #45 index, e.g., the U-source for
    > U+FA0C is now UTC00915.
    >
    > What it means is that the character is a unifiable variant derived from one
    > of the industrial (and not national) sources used by Unicode during the
    > development of the original URO.
    >
    > =====
    > John H. Jenkins
    > jenkins@apple.com
    >
    >
    >



    This archive was generated by hypermail 2.1.5 : Tue Aug 31 2010 - 10:35:57 CDT