The Unicode Consortium Discussion Forum


All times are UTC - 6 hours [ DST ]




 Post subject: Get all unicode as csv or ...
PostPosted: Thu Dec 29, 2011 6:11 pm 

Joined: Thu Dec 29, 2011 4:20 pm
Posts: 2
I am trying to find a database of all Unicode characters (100,000 now?) with just their names and code points, plus a description or anything else I can get. I suppose my newness is showing, LOL. All I can find is PDF, but I need CSV or tab-delimited text. Is there any such thing?

Thank you for any ideas. P.S. I did search the forum and found some posts, but none addressing what I need, and I searched the internet for several hours before resigning myself to asking for help. :)


 Post subject: Re: Get all unicode as csv or ...
PostPosted: Fri Dec 30, 2011 3:14 am 
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 186
The names of CJK ideographs and Korean Hangul syllables are assigned algorithmically, so they are not listed one by one. You will find the naming rules in the text of the standard. The names of all other characters are in the file UnicodeData.txt, in semicolon-delimited form. This file is part of the Unicode Character Database, which consists of many data files listing the various properties of Unicode characters.

I hope this gives you something of a start.
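
As a rough illustration of the semicolon-delimited form, here is a minimal Python sketch that pulls the code point and name out of each line and writes them as tab-separated text. The sample lines are embedded only so the sketch runs on its own; the function names `read_names` and `to_tsv` are made up for this example, and in practice you would read the lines from your own copy of UnicodeData.txt.

```python
import csv
import io

# A few lines in the UnicodeData.txt layout, embedded so the sketch is
# self-contained. With the real file, iterate over open("UnicodeData.txt").
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;
D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;
"""


def read_names(lines):
    """Yield (code_point, name) pairs from UnicodeData.txt-style lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split(";")
        # Field 0 is the code point in hex, field 1 is the character name.
        yield int(fields[0], 16), fields[1]


def to_tsv(pairs):
    """Return tab-separated text with a header row (hypothetical helper)."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["code_point", "name"])
    for cp, name in pairs:
        writer.writerow([f"U+{cp:04X}", name])
    return out.getvalue()


pairs = list(read_names(SAMPLE.splitlines()))
tsv_text = to_tsv(pairs)
print(tsv_text)
```

Entries whose name field looks like `<..., First>` or `<..., Last>` mark the two ends of a range whose names are assigned algorithmically, so they need separate handling.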


 Post subject: Re: Get all unicode as csv or ...
PostPosted: Fri Dec 30, 2011 1:07 pm 

Joined: Thu Dec 29, 2011 4:20 pm
Posts: 2
Thank you very much for this! :D

UnicodeData.txt is exactly the kind of thing I am looking for. Is there anywhere I can find what each column heading should be, such as an original online source? I do not mind adding the field names myself.


 Post subject: Re: Get all unicode as csv or ...
PostPosted: Fri Dec 30, 2011 6:09 pm 
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 186
Formal names for all the character properties can be found in the file PropertyAliases.txt, also in a semicolon-delimited format. This file is part of the Unicode Character Database, which is more fully described in the documentation file for the UCD (Unicode Standard Annex #44). The documentation should tell you which property goes with which field of UnicodeData.txt.
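
To make that concrete, here is a small Python sketch that reads short and long property names from lines in the PropertyAliases.txt layout and then writes a UnicodeData.txt line as CSV with a header row. Treat it as an illustration under assumptions: the sample alias lines are a hand-picked excerpt, the column order follows my reading of the UCD documentation, and labels such as `Code_Point`, `Decomposition` and the three `Numeric_...` columns are names I chose for the example rather than formal property names.

```python
import csv
import io

# Illustrative lines in the PropertyAliases.txt layout: short ; long
ALIASES_SAMPLE = """\
# comment lines start with a hash sign
bc                       ; Bidi_Class
ccc                      ; Canonical_Combining_Class
gc                       ; General_Category
na                       ; Name
"""


def parse_aliases(text):
    """Map each short alias to its long property name, skipping comments."""
    aliases = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = [p.strip() for p in line.split(";")]
        aliases[parts[0]] = parts[1]
    return aliases


# Assumed labels for the 15 fields of UnicodeData.txt, in file order.
HEADERS = [
    "Code_Point", "Name", "General_Category", "Canonical_Combining_Class",
    "Bidi_Class", "Decomposition", "Numeric_Value_Decimal",
    "Numeric_Value_Digit", "Numeric_Value", "Bidi_Mirrored",
    "Unicode_1_Name", "ISO_Comment", "Simple_Uppercase_Mapping",
    "Simple_Lowercase_Mapping", "Simple_Titlecase_Mapping",
]


def add_header(data_lines):
    """Return CSV text: one header row, then one row per well-formed line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADERS)
    for line in data_lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) == len(HEADERS):  # skip anything unexpected
            writer.writerow(fields)
    return out.getvalue()


aliases = parse_aliases(ALIASES_SAMPLE)
csv_text = add_header(["0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"])
print(aliases["gc"])
print(csv_text)
```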

The Unicode Character Database (UCD) is a very rich repository of machine-readable information on Unicode characters, but the way it is laid out in files is a bit quirky. You would be well advised to take some time to study the documentation carefully, as well as the description of character properties in Chapters 3 and 4 of the Unicode Standard.
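
One example of such a quirk: large blocks with algorithmic names appear in UnicodeData.txt as a pair of `<..., First>` and `<..., Last>` lines rather than one line per character. The Python sketch below merges such pairs into ranges and then asks Python's standard `unicodedata` module for the individual names. It is only a sketch: `iter_ranges` is a made-up helper, the sample lines are embedded for self-containment, and `unicodedata` reflects whatever Unicode version your Python build ships with, which may differ from the data file you downloaded.

```python
import unicodedata

# Sample lines in the UnicodeData.txt layout, embedded for self-containment.
SAMPLE = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;",
    "D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;",
]


def iter_ranges(lines):
    """Yield (first, last, label), merging First/Last marker pairs."""
    start = None
    for line in lines:
        fields = line.split(";")
        cp, name = int(fields[0], 16), fields[1]
        if name.endswith(", First>"):
            start = cp  # remember where the range begins
        elif name.endswith(", Last>"):
            # Strip the leading "<" and the trailing ", Last>".
            yield start, cp, name[1:-len(", Last>")]
            start = None
        else:
            yield cp, cp, name  # ordinary single-character entry


ranges = list(iter_ranges(SAMPLE))
first, last, label = ranges[1]
size = last - first + 1

# Names inside the range are derived by rule, not listed in the file.
first_name = unicodedata.name(chr(first))
last_name = unicodedata.name(chr(last))
print(label, size, first_name, last_name)
```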

