From: Raymond Mercier (RaymondM@compuserve.com)
Date: Tue Apr 20 2004 - 17:36:48 EDT
John Jenkins writes:
>>Also, even though the full Unihan database is 25+ Mb in size, given the
cheapness of disk space nowadays, it's not all *that* big, surely.
<<
The problem of the size of Unihan has nothing at all to do with the cost of
storage, and everything to do with the functioning of programs that might
open and read it.
Since the lines in Unihan are separated by 0x0A (LF) alone, not by
0x0D 0x0A (CR LF), the lines are not separated when the file is opened in
Notepad. Notepad does have the advantage that the UTF-8 encoding is
recognized, and the characters are displayed.
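
For anyone who wants to view the file in Notepad anyway, the line endings
can be rewritten first. What follows is only a minimal sketch in Python, not
anything supplied with Unihan: the function name lf_to_crlf and the chunk
size are my own assumptions. It works on raw bytes, so the UTF-8 content is
left untouched, and it reads in chunks so that the whole file never has to
sit in memory.

```python
import io


def lf_to_crlf(src, dst, chunk_size=1 << 20):
    """Copy binary stream src to dst, turning each bare LF into CR LF.

    Hypothetical helper: the name and chunk_size are illustrative only.
    An LF that already follows a CR is left alone, even when the pair
    is split across two chunks.
    """
    prev_cr = False
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        out = bytearray()
        for b in chunk:
            if b == 0x0A and not prev_cr:
                out += b"\r\n"      # bare LF: emit CR LF
            else:
                out.append(b)       # any other byte is copied unchanged
            prev_cr = (b == 0x0D)
        dst.write(out)
```

It would be used with two files opened in binary mode ("rb" for the source,
"wb" for the copy), so that Python does no newline translation of its own.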
If opened in WordPad the Chinese characters do not appear; perhaps the UTF-8
encoding is not recognized.
If I try MS Word the machine grinds to a halt - and this is a good modern
machine (XP with 120 GB HD and 512 MB RAM).
Similarly if I open in IE6, with UTF-8 encoding, the text opens up to around
U+4C00, and then grinds to a halt.
I can open it in the HexWorkshop byte editor, or in the editor in Visual C
6, but these do not recognize UTF-8 encoding, and they hardly count as
suitable readers for such a file.
I wish the people who designed this file would accept the need for a more
structured and sophisticated approach. Why not, for example, have a basic
HTML file, with HTML links to the various sections?
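
To make the suggestion concrete, here is a rough sketch in Python of how
such an index might be produced. It assumes only that data lines begin with
"U+" followed by the hex code point and a tab, and that comment lines begin
with "#". The function names, the block size of 0x1000 code points, and the
section file names (U3000.html and so on) are my own inventions for
illustration, not anything the maintainers have proposed.

```python
import html
from collections import defaultdict


def group_by_block(lines, block_size=0x1000):
    """Group Unihan data lines by the code point block they fall in.

    Returns a dict mapping the first code point of each block to the
    list of lines in that block. Comment and blank lines are skipped.
    """
    sections = defaultdict(list)
    for line in lines:
        if not line.startswith("U+"):
            continue
        code_point = int(line.split("\t", 1)[0][2:], 16)
        start = code_point - code_point % block_size
        sections[start].append(line.rstrip("\n"))
    return dict(sections)


def build_index(sections):
    """Return a basic HTML page linking to one file per section."""
    items = []
    for start in sorted(sections):
        name = f"U{start:04X}"  # hypothetical section file name
        label = html.escape(f"U+{start:04X}")
        count = len(sections[start])
        items.append(
            f'<li><a href="{name}.html">{label}</a> ({count} lines)</li>'
        )
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"
```

Each section would then be written to its own small file, which any browser
or editor could open without difficulty.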
Raymond Mercier