L2/00-059
UTC/2000-010

"Martin J. Duerst" on 2000/01/27 06:31:04 PM
Subject: Re: [li18nux:308] Re: JTC1 conversion tables

At 23:59 00/01/27 +0100, Keld Jørn Simonsen wrote:
> On Thu, Jan 27, 2000 at 07:26:19PM +0100, Bruno Haible wrote:
> > Markus Kuhn writes:
> > >
> > > The Unicode Consortium (especially Kenneth Whistler) on the other hand
> > > has almost always been able to answer problems competently within less
> > > than 48 hours. That is why I haven't been impressed with the JTC1
> > > cultural registry so far as a primary source of mapping tables. I prefer
> > > to align all my own mapping data (via a few simple Perl scripts) with
> > > ftp://ftp.unicode.org/ and that is what I also recommend to anyone else.
> >
> > Furthermore the unicode.org tables represent an agreement with the
> > participation of companies including Microsoft, therefore if we use these
> > tables, we can hope for being interoperable with CR/LF based operating systems.
>
> That is one of the problems with the unicode tables, they are controlled
> by a closed consortium.

That consortium is less closed than it seems.

> They are not produced according to
> an open process, and not standardized (they do not have a format
> defined by an open standard).

The standard is difficult to get, and expensive. And if error
corrections take three months, that's not what we need. We need
Internet speed.

What I think we need is:

- A format that is well defined and easy to process with widely used
  tools. Who defined it is rather irrelevant, as long as we are happy
  with its functionality. A data format for conversions is not
  something you can make money with or use to take over the world, so
  whether it is defined by an 'open standard' or by someone else is
  not that relevant here.

- A location (ideally a single one) that is stable and has a certain
  authority, but is flexible enough to accept variants if they are
  needed, and to react to problems quickly.
  Neither Unicode nor JTC1/dkuug is there yet, but Unicode is closer
  in my view, and we could probably get it there. If we don't get
  there, we use our own.

- One problem I want to mention for the Unicode site is that some of
  the Chinese tables contain 'ghost codepoints'. According to my
  information, these resulted from the construction of the unified
  ideographic repertoire. Both China and Taiwan apparently added some
  characters to their standards because they urgently wanted them in
  the unified repertoire, but neither the base standards nor the
  fonts/implementations have followed. So the tables are not directly
  usable, but they haven't been changed because they are 'official'.
  At least, that is how I understand it.

- The Unicode TR #22 format currently has various problems that
  should be fixed:

  - There is an error in the URI escaping (reported to the author).

  - It is not exactly clear what can and cannot be defined with it.
    The description should be improved. For example, it is not clear
    whether it can define iso-2022-jp or not. It would be nice if it
    could, and it should say so clearly if it can't.

  - It uses attributes instead of elements for some fields where free
    text can be used. This should be changed.

  - Naming should be revamped. Having a field that contains either an
    IANA 'charset' name or something else doesn't work, because the
    two kinds of names may overlap.

  - There should be only one conversion per file. A single file
    should not include both the usual conversion and the conversion
    for the case of glyphs stored at control character positions.

These are the main points I have on TR #22.

Regards,   Martin.

#-#-# Martin J. Du"rst, World Wide Web Consortium
#-#-# mailto:duerst@w3.org   http://www.w3.org