Re: ISO10646-1 XLFD registration

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Sun Apr 02 2000 - 16:31:54 EDT


Erik van der Poel wrote on 2000-03-28 04:01 UTC:
> Juliusz Chroboczek wrote:
> >
> > >> iso10646-1 (Registry=iso10646, Encoding=1; why 1?; that's the BMP)
> >
> > CJF> Shouldn't that have been iso10646-0? The BMP is plane 0 not plane 1.
> >
> > Exactly my reaction when I heard about the registration.
>
> I was not involved in the choice of the XLFD suffix name *-iso10646-1,
> but the ISO 8859 series comes in a number of "parts". Part 1 is commonly
> referred to as ISO 8859-1. Similarly, ISO 10646 has a "Part 1", which is
> "Architecture and Basic Multilingual Plane".
>
> Since the XLFD suffix for the 8859 series is *-iso8859-1, etc, the
> suffix *-iso10646-1 is actually just following that same convention. I
> don't know if that was the original reason for naming it that way
> though. Perhaps Markus would know.

Yes, you guessed the idea correctly. For the historic record, I attach
the original X Consortium registration email below.

The question of what to do with other planes came up and there are
various options, including but not limited to:

  - finally doing the urgently needed fundamental revision of the X11 font
    architecture, which should provide for

      o 31-bit character set
      o proper separation between characters and glyphs
      o ligature support
      o combining character support
      o efficient subset support
      o etc.

  - use ISO 10646-2, -3, -4, etc. (decimal) to refer to plane 01, 02, 03, etc.

  - use ISO 10646-01, -02, etc. (hexadecimal) instead.
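To make the last two options concrete, here is a small Python sketch of how
a plane number might map to an encoding suffix under each scheme. Neither
scheme is registered. The function names, the lowercase hex digits, and
keeping "-1" for the BMP under the hexadecimal option are assumptions made
only for illustration.

```python
def plane_of(code_point):
    """Return the plane number of a code point in group 00 of UCS-4."""
    if not 0 <= code_point <= 0xFFFFFF:
        raise ValueError("code point outside group 00")
    return code_point >> 16


def suffix_decimal(plane):
    """Decimal option: plane 0 -> iso10646-1, plane 1 -> iso10646-2, ..."""
    if not 0 <= plane <= 0xFF:
        raise ValueError("plane out of range")
    return f"iso10646-{plane + 1}"


def suffix_hex(plane):
    """Hexadecimal option: plane 01 -> iso10646-01, plane 02 -> iso10646-02.

    Assumption: the BMP keeps its registered name iso10646-1.
    """
    if not 0 <= plane <= 0xFF:
        raise ValueError("plane out of range")
    if plane == 0:
        return "iso10646-1"
    return f"iso10646-{plane:02x}"


if __name__ == "__main__":
    for cp in (0x4E2D, 0x10000, 0x20000):
        p = plane_of(cp)
        print(hex(cp), p, suffix_decimal(p), suffix_hex(p))
```

Note that under the hexadecimal option the BMP name "iso10646-1" and the
plane 1 name "iso10646-01" differ only by a leading zero, which is one more
reason to think carefully before picking a scheme.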

Since there is no published ISO 10646-2 or plane 1 at the moment, I
still think it is best that we defer the question until the problem
actually materializes in the form of a published plane 1 (which I guess
will take a few more years). There are far more urgent issues with X11
fonts to be solved in plane 0, especially with regard to the Indic
scripts, combining accents, and the memory-efficient handling of sparse
16-bit fonts.

> Is he at the conference?

No.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

To: "Kaleb S. KEITHLEY" <k.keithley@opengroup.org>
Cc: Roman Czyborra <czyborra@dds.nl>, Frank Tang <ftang@netscape.com>,
    Mark Leisher <mleisher@crl.nmsu.edu>,
    Primoz Peterlin <primoz.peterlin@biofiz.mf.uni-lj.si>,
    Kenichi Handa <handa@etl.go.jp>, Gaspar Sinai <gsinai@iname.com>,
    everson@indigo.ie, jenkins@apple.com
Subject: Re: XLFD for Unicode fonts
Date: Fri, 27 Mar 1998 22:04:09 +0000
From: Markus Kuhn <mgk25@cl.cam.ac.uk>

Registering ISO 10646 in the X11 font naming scheme

After very useful suggestions from John Jenkins and Roman Czyborra, I'd like to revise my 1998-03-24 proposal for registering FONT CHARSET (REGISTRY AND ENCODING) NAMES for ISO 10646 fonts in the X11 registry <ftp://ftp.x.org/pub/R6.3/xc/registry> as follows:

"ISO10646-1" ISO Universal Multiple-Octet Coded Character Set (UCS), Basic Multilingual Plane, equivalent with Unicode, where the version, or implemented subset is not further specified. It is suggested that the national style variant of the Han ideographs in the font is indicated in the ADD_STYLE_NAME field, for instance as "ja" for Japanese, "ko" for Korean, "zh_CN" for Chinese and "zh_TW" for Taiwanese (ISO 639 and ISO 3166 codes). [1,2]

References:

[1] ISO/IEC 10646-1:1993, Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Multilingual Plane, International Organization for Standardization, Geneva, 1993, plus amendments. <http://www.iso.ch/>

[2] The Unicode Standard, Version 2.0, Addison-Wesley, 1996, ISBN 0-201-48345-9. <http://www.unicode.org/>
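
As an illustration of how the proposed name would appear in practice, the following Python sketch splits an XLFD font name into its 14 fields and inspects CHARSET_REGISTRY, CHARSET_ENCODING, and ADD_STYLE_NAME. The field order follows the XLFD conventions; the example font name and the helper names are assumptions for illustration and are not part of the registration.

```python
# The 14 XLFD fields, in the order they appear in a font name.
XLFD_FIELDS = (
    "FOUNDRY", "FAMILY_NAME", "WEIGHT_NAME", "SLANT", "SETWIDTH_NAME",
    "ADD_STYLE_NAME", "PIXEL_SIZE", "POINT_SIZE", "RESOLUTION_X",
    "RESOLUTION_Y", "SPACING", "AVERAGE_WIDTH", "CHARSET_REGISTRY",
    "CHARSET_ENCODING",
)


def parse_xlfd(name):
    """Split a fully specified XLFD name into a field dictionary."""
    if not name.startswith("-"):
        raise ValueError("XLFD names start with '-'")
    parts = name[1:].split("-")
    if len(parts) != len(XLFD_FIELDS):
        raise ValueError("expected 14 fields, got %d" % len(parts))
    return dict(zip(XLFD_FIELDS, parts))


def is_iso10646_1(name):
    """True if the font's charset is ISO10646-1 (compared ignoring case)."""
    fields = parse_xlfd(name)
    return (fields["CHARSET_REGISTRY"].lower() == "iso10646"
            and fields["CHARSET_ENCODING"] == "1")


def han_style(name):
    """Return the ADD_STYLE_NAME tag such as 'ja', or None if it is empty."""
    return parse_xlfd(name)["ADD_STYLE_NAME"] or None


if __name__ == "__main__":
    # Hypothetical font name used only as an example.
    font = "-misc-fixed-medium-r-normal-ja-18-120-100-100-c-180-iso10646-1"
    print(is_iso10646_1(font), han_style(font))
```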

Rationale:

- I suggest registering ISO10646 instead of UNICODE for consistency with the other ISO names already in the registry, and because, while the code tables of both standards are equivalent, the Unicode standard defines additional semantic properties of the characters that are not relevant to an X11 font registry and encoding identifier.

- I did not add a version or year identifier (currently Unicode 2.1 or ISO 10646-1:1993 plus amendments 1-19) because, while the standards are changing fast and will not be stable for the foreseeable future, the changes are minor and upward compatible and therefore not of much concern. Anyone who wants to identify particular versions can register further names for them, if they are really needed. See also RFC 2044, where no version number is used to identify ISO 10646/UTF-8 in MIME.

- The ADD_STYLE_NAME field seems to be a suitable place to store information about which national style of the Han ideographs is used in the fonts. Such an indicator seems appropriate because there has been much criticism of Unicode in the CJK community, claiming that Unicode does not address the stylistic differences of Han ideographs as used in China, Taiwan, Japan, and Korea. Some software will therefore probably implement font selection based on language tagging, and a systematic naming scheme will be helpful here. The suggested abbreviations are ISO 639 and ISO 3166 codes as they are used to name POSIX locales.

- Fixed subsets of ISO 10646 can be registered as well when the need arises and stable candidates are available. Potential candidates could be MES-1, MES-2, the OpenType Windows Glyph List 4 (WGL4), RFC 1815, etc.
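
The font selection based on language tagging mentioned in the rationale could look roughly like the Python sketch below. It is only one possible approach under stated assumptions: the font names are hypothetical, the function names are made up for this example, and the fallback order (full locale tag, then bare language code, then any ISO10646-1 font) is a design choice of the sketch, not something the proposal prescribes.

```python
def add_style_name(xlfd):
    """Return the ADD_STYLE_NAME field (6th field) of an XLFD name."""
    # The leading '-' produces an empty first element, so index 6 is field 6.
    return xlfd.split("-")[6]


def is_iso10646_1(xlfd):
    """True if the name ends in the ISO10646-1 charset (ignoring case)."""
    return xlfd.lower().endswith("-iso10646-1")


def pick_font(fonts, locale):
    """Pick an ISO10646-1 font whose Han style matches a POSIX-style locale.

    Tries the tag as given (e.g. 'zh_TW'), then the bare language code
    (e.g. 'zh'), then falls back to any ISO10646-1 font. Returns None if
    the list contains no ISO10646-1 font at all.
    """
    candidates = [f for f in fonts if is_iso10646_1(f)]
    for tag in (locale, locale.split("_")[0]):
        for font in candidates:
            if add_style_name(font) == tag:
                return font
    return candidates[0] if candidates else None


if __name__ == "__main__":
    # Hypothetical font list used only as an example.
    fonts = [
        "-misc-fixed-medium-r-normal--18-120-100-100-c-90-iso10646-1",
        "-misc-fixed-medium-r-normal-ja-18-120-100-100-c-180-iso10646-1",
        "-misc-fixed-medium-r-normal-ko-18-120-100-100-c-180-iso10646-1",
    ]
    print(pick_font(fonts, "ja_JP"))
```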

Further comments and suggestions are very welcome.

Markus G. Kuhn
University of Cambridge Computer Laboratory
New Museums Site, Pembroke Street
Cambridge CB2 3QG
United Kingdom

Phone: +44 1223 3-34676
Fax: +44 1223 3-34678
Email: Markus.Kuhn@cl.cam.ac.uk mkuhn@acm.org
URL: http://www.cl.cam.ac.uk/~mgk25/

Markus

-- 
Markus G. Kuhn, Security Group, Computer Lab, Cambridge University, UK
email: mkuhn at acm.org,  home page: <http://www.cl.cam.ac.uk/~mgk25/>


