L2/03-285

Title: Submission of kHYPLCDPF data for inclusion in Unihan.txt
Contributor: Richard Cook
Date: August 24, 2003

It is proposed that selected data from the Hanyu Pinlu Cidian [Modern Standard Beijing Chinese Frequency Dictionary], 1st edition 1986, 2nd printing 1990 (ISBN 7-5619-0094-5/H.67), be added to Unihan.txt under the field name "kHYPLCDPF" (Hanyu Pinlu Cidian Pinyin and Frequency). This data derives from a 440,799-character corpus cutting across 4 linguistic genres ("News", "Scientific", "Colloquial", and "Literature").

The kHYPLCDPF electronic data provided to the Unicode Consortium, excerpted and adapted from the print source, presents pinyin pronunciations and frequencies for 3,800 common Modern Standard Beijing Chinese characters. This data has been proofread over a number of years and is in very good condition.

It may be useful to Unicode implementers, particularly those designing Mandarin input methods. It may also prove relevant to the subsetting issue currently before the IRG; for this reason it is suggested that the IRG be made aware of this data once it is added to Unihan.txt.

Links to the data are as follows:

http://linguistics.berkeley.edu/~rscook/UTC/kHYPLCDPF-20030204/kHYPLCDPF-header.txt
http://linguistics.berkeley.edu/~rscook/UTC/kHYPLCDPF-20030204/kHYPLCDPF.txt

Please see the file "kHYPLCDPF-header.txt" for additional description of the data.

--------------------------
Richard S. Cook
UC Berkeley Linguistics Dept.