Re: Character missing from SECS & VSECS

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Tue Aug 18 1998 - 11:34:20 EDT


Otto Stolz wrote on 1998-08-18 20:51 UTC:
> > http://www.cl.cam.ac.uk/~mgk25/ucs/vsecs.html
> > http://www.cl.cam.ac.uk/~mgk25/ucs/secs.html
>
> In both subsets, the character
> 017F LATIN SMALL LETTER LONG S
> is missing.
>
> This letter is needed to spell German correctly, when written in Gothic
> type ("Fraktur", in German) -- even after the recent spelling reform.
> Gothic type is still used widely for decorative inscriptions (such as
> labels on food cans and beer bottles, advertisements, inn signs, even
> bank notes and postal stamps).

This is correct, but I strongly feel that this is no reason to include
it in the Very Simple European Character Set (VSECS) proposal:

I deliberately left the long s out, because it is very clearly an
archaic character that is used today in Germany exclusively in
decorative Fraktur fonts and certainly not in common German writing.
Most Germans would not even recognize a long s in a normal non-Fraktur
font. (For the record: I am German with a quite good background in
German typographic conventions.)

These were very good reasons to keep the long s out of character sets
designed for German such as ISO 646, ISO 8859, ISO 6937, CP1252, TeX,
etc.

This is a good opportunity to clarify the purpose of the simple subsets
as opposed to the large typographic subsets.

Please note that VSECS and SECS are designed for a broad range of
non-typographical applications. They are NOT primarily intended for

  - the publishing industry and word processing
  - linguistic and literary research
  - archival and processing of historic text

A Fraktur font covering the VSECS and SECS collections would be a rather
strange product. I would not even know what a COPYRIGHT SIGN, a YEN
SIGN, a FEMININE ORDINAL INDICATOR or a LATIN SMALL LETTER ETH would
correctly look like in a German decorative Gothic font.

Special decorative scripts such as German Fraktur are best kept out of
the scope of the Multilingual European Subset standard entirely! The
manufacturers of special decorative fonts know quite well which subset
makes sense for a specific decorative font to cover; ISO 10646 tells
them the correct code number; and there is little need for subset
interoperability among decorative fonts.

The common subsets are first of all needed to ensure interoperability
and predictability in applications where the sender knows nothing about
the font used by the receiver. Email is a good example.
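As a minimal sketch of what subset interoperability means in practice,
a sender could check a message against an agreed repertoire before
sending it and report every character the receiver is not guaranteed to
display. The repertoire below is a hypothetical stand-in (printable
ASCII plus the printable part of U+00A0..U+00FF), NOT the actual VSECS
or SECS list, and the helper name characters_outside is made up for
illustration.

```python
import unicodedata

# Hypothetical stand-in repertoire for illustration only:
# printable ASCII (U+0020..U+007E) plus U+00A0..U+00FF.
# This is NOT the real VSECS/SECS character list.
ILLUSTRATIVE_SUBSET = frozenset(range(0x20, 0x7F)) | frozenset(range(0xA0, 0x100))


def characters_outside(text: str, subset: frozenset) -> list:
    """Describe each distinct character of text that is not in subset,
    in order of first appearance."""
    missing = []
    for ch in text:
        if ord(ch) not in subset and ch not in missing:
            missing.append(ch)
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in missing]


if __name__ == "__main__":
    # "Wurst" spelled Fraktur-style with a long s before the t.
    print(characters_outside("Wur\u017ft", ILLUSTRATIVE_SUBSET))
    # "Straße" uses only characters inside the illustrative subset.
    print(characters_outside("Stra\u00dfe", ILLUSTRATIVE_SUBSET))
```

With such a check, a text spelled with U+017F would be flagged for any
receiver whose guaranteed repertoire lacks the long s, which is the kind
of predictability a small common subset is meant to provide.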

VSECS and SECS are designed to include only the essential minimal set of
frequently required characters of all European languages. If you are
only concerned about the character set of your DTP software, then
consider that possible applications for VSECS include message display on
mobile phones, printing the name on your passport, program preview
listings in a digital TV set-top box (the DVB digital TV standard
actually switches between the 8859 variants at the moment, because no
suitable 10646 subset had been defined when they looked for one), and so
on. VSECS/MES-1 is certainly NOT intended to become the recommended
character set for, say, graphical design software.

The long s might be a valid candidate for the MES-2 and MES-3 subset
proposals, which primarily address the needs of the word processing and
publishing industry.

VSECS and SECS are my proposals for the lower end of the planned CEN/
TC304 UCS subset hierarchy. We would stretch the term "simple character
set" quite a bit if the upcoming CEN standard required every
implementation in Europe, including non-typographic applications such
as email display on mobile phones or program preview listings in DVB
set-top boxes, to support a character that makes sense only in Fraktur
fonts.

I think we should apply the same criteria applied to the long s when we
discuss whether other controversial characters such as some of the
8859-14 characters for old Irish and Welsh-dictionary usage or the Greek
polytonic characters are appropriate for a simple character set for
non-typographic applications.

[I am fairly sure that the polytonics should be kept out of the simple
non-typographic sets. I am not yet really sure about the 8859-14
characters. So far I have received one email telling me that Dutch users
absolutely need the LIGATURE IJ, but I am not yet sure whether this
is as much a misunderstanding or a somewhat exotic opinion as the
request to add the long s for German most certainly is.]

In the design of the 8-bit sets, the code size restrictions forced the
designers to stick with common sense when they selected the
essential characters. The results were nicely small sets with a
reasonably good rationale for most included characters. With the step
to small 16-bit subsets, the restrictions that led to nice and simple
sets are gone, and we have to be careful not to add too many rarely
required special-purpose characters to even the smallest UCS subset.

Markus

-- 
Markus G. Kuhn, Security Group, Computer Lab, Cambridge University, UK
email: mkuhn at acm.org,  home page: <http://www.cl.cam.ac.uk/~mgk25/>



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:40 EDT