character names (questions)

From: Viranga Ratnaike (viranga@mds.rmit.edu.au)
Date: Thu Apr 06 2000 - 04:11:37 EDT


Dear Unicoders,

        I have four questions about character names:

        (1) how does one determine the character names of code points
            that fall inside the ranges in the UnicodeData.txt file? Is
            there a separate file? Can the names be generated
            automatically, and if so, how?

            For example: if I wanted to find the name of code point U+5728,
            where would the information be?

            I'm auto-generating data structures; using UnicodeData.txt as
            input gets me most of the way (I think). The gaps occur for
            the ranges:

                3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
                4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
                4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
                9FA5;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
                AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;
                D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;
                D800;<Non Private Use High Surrogate, First>;Cs;0;L;;;;;N;;;;;
                DB7F;<Non Private Use High Surrogate, Last>;Cs;0;L;;;;;N;;;;;
                DC00;<Low Surrogate, First>;Cs;0;L;;;;;N;;;;;
                DFFF;<Low Surrogate, Last>;Cs;0;L;;;;;N;;;;;

            ...and also for the private use ranges
                (which we'll probably be needing).

        (2) how do I locate the ISO/IEC character naming guidelines?
            I looked in "The Unicode Standard Version 3.0" and it refers
            me to Informative Annex K of ISO/IEC 10646. Is the information
            available electronically? I looked at the ISO site and it said
            that "there is no electronic access to the contents of ISO
            standards" (http://www.iso.ch/infoe/faq.htm#Standards). It did
            mention that this was in the pipeline, but didn't say when.

        (3) when surrogates are introduced, will there be mappings from
            surrogate pairs to character names? Will they be included
            in later versions of UnicodeData.txt? It's not an issue at
            the moment, but I'd like to structure my code such that I can
            just slot in surrogate code later.

        (4) why are they called "character names" and not "code point names"?
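
To make questions (1) and (3) concrete, here is a minimal Python sketch, not a definitive implementation. It assumes the "<Label, First>" / "<Label, Last>" entries always come in matching pairs, as in the lines quoted above. The names collect_ranges, find_range and combine_surrogates are hypothetical, chosen only for illustration. The sketch shows which range a code point such as U+5728 falls in, but it does not produce a character name, which is exactly the gap being asked about. The surrogate arithmetic is the standard UTF-16 pairing and is included only as a placeholder for later surrogate support.

```python
import re
from bisect import bisect_right

# Sample lines copied from the ranges quoted above (UnicodeData.txt format).
SAMPLE = """\
3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FA5;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;
D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;
"""

# Matches the name field of a range marker, e.g. "<CJK Ideograph, First>".
RANGE_MARKER = re.compile(r"^<(.+), (First|Last)>$")


def collect_ranges(lines):
    """Pair First/Last entries into sorted (start, end, label, category) tuples."""
    ranges = []
    pending = None  # (start, label) of a First entry still waiting for its Last
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        match = RANGE_MARKER.match(fields[1])
        if match is None:
            continue  # ordinary entry that carries its own explicit name
        code = int(fields[0], 16)
        label, which = match.group(1), match.group(2)
        if which == "First":
            pending = (code, label)
        else:
            # Assumption: every Last entry closes the most recent First entry.
            if pending is None or pending[1] != label:
                raise ValueError("unmatched Last entry: " + line)
            ranges.append((pending[0], code, label, fields[2]))
            pending = None
    return sorted(ranges)


def find_range(ranges, code_point):
    """Return the range tuple containing code_point, or None if it is in no range."""
    starts = [r[0] for r in ranges]
    i = bisect_right(starts, code_point) - 1
    if i >= 0 and ranges[i][0] <= code_point <= ranges[i][1]:
        return ranges[i]
    return None


def combine_surrogates(high, low):
    """Combine a high/low surrogate pair into one code point (UTF-16 arithmetic)."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


ranges = collect_ranges(SAMPLE.splitlines())
print(find_range(ranges, 0x5728))
```

With the sample lines, the lookup for U+5728 lands in the range labelled "CJK Ideograph"; a code point just past the end of a range, such as U+4DB6, is reported as being in no range.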

Regards,

        Viranga

Email: viranga@mds.rmit.edu.au
Phone: +61 3 9925 4124 (Work)


