Re: Unicode Character Discussion (s-long)

From: Michael Everson (everson@indigo.ie)
Date: Fri Dec 13 1996 - 11:02:11 EST


At 14:52 1996-12-09, John M. Fiscella wrote:
>Hi Michael:
>
>Thank you for the clarification for 017f. The situation appears
>to be worse than I thought.
>
>You wrote:
>>0053 LATIN CAPITAL LETTER S
>>0073 LATIN SMALL LETTER S
>>017F LATIN SMALL LETTER LONG S
>>
>>The laft character here is the one that looks juft like a fmall letter f
>>without the crofs bar. In italics it does, juft as the f does, take a long
>>defcender."
>
>Thif answerf my queftion explicitly: location 017f referf to "antique s".

(Note. This is incorrect. The LONG S is not ufed at the end of words. You
muft write "This anfwers my queftion".)

>The dilemma: a few African dialects, in typographic examples I
>have obtained, apparently use *both* the 'long s' (which I
>thought 017f was to represent) and 'esh' (0283), which is the IPA
>character you referred to. Since 017f was intended (as you
>indicated) for use in Celtic languages (Irish or Scotch Gaelic
>perhaps?)

No, it is the usual long S used throughout Europe in the 18th century. The
U.S. Constitution uses the form "Congrefs". German Fraktur writing uses
this long s as well. The long s with dot above is used in Irish Gaelic when
written in the Gaelic (variant of the Latin) script.

>and since these languages are based on older alphabets
>(Runic is another extreme example of using an old alphabet),

I don't think this is true.

>this character, even though looking like an 'antique s' really
>is a distinct character. This would be especially the case if it
>could be used simultaneously with 'small letter s'.

Which it is.

>So far so good.
>This situation is reasonable. The 'antique s' as we know it in
>the U.S. is strictly a style variant of the 'small s' and
>therefore should not appear in an encoding in and of itself.

Well, it does, and this is convenient. It allows for plain-text encoding of
this character. I think it is a very good thing that long s is encoded, and
that long s with dot above is encoded. This suits our needs for text
interchange in Ireland very well. Documents are complex, and the positional
variants (as I mentioned above, long s does not appear at the end of
words) are not really predictable, so it is better to encode this important
historical variant.
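To make the plain-text point concrete, here is a small Python sketch using the standard `unicodedata` module. That module reflects a far later Unicode version than the one current in 1996. It shows three things. Long s and long s with dot above are ordinary code points of their own. Long s plus a combining dot above composes canonically to U+1E9B. Compatibility normalization (NFKC) folds both back to a plain s, which helps searching but discards exactly the distinction the encoding preserves.

```python
import unicodedata

# Each long-s form has its own code point, so it survives plain-text
# interchange without depending on a font or on markup.
for ch in ("\u0073", "\u017f", "\u1e9b"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Canonical composition: LONG S + COMBINING DOT ABOVE -> U+1E9B.
composed = unicodedata.normalize("NFC", "\u017f\u0307")
print(f"NFC(long s + dot above) = U+{ord(composed):04X}")

# Compatibility normalization folds the historical forms to ordinary s.
plain_word = unicodedata.normalize("NFKC", "Congre\u017fs")
plain_dotted = unicodedata.normalize("NFKC", "\u1e9b")
print(plain_word)
print(f"U+{ord(plain_dotted):04X}")
```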

>(Its ligature is another question--see below.)
>
>But when that same character is amalgamated into a ligature
>(location fb05) which, commonly, in oldstyle typefaces has the
>representation of 'antique s - t' (along with 'e - t', 'c - t',
>'s - p' and others also represented in such a typeface) it really
> does cause some confusion. Is there a 'long s - t' ligature used
> in Celtic?

No. It is used in some

>This would justify the character in fb05. Or, was
>fb05 a case of a committee not paying attention to real meaning?

The character/glyph model is a reasonable model, but it is not gospel, and
breaking with it for certain good reasons is useful. Those ligatures
probably came from existing coded character sets. They aren't very harmful.
If about 8 more were added, in fact, the full mandatory repertoire of
ligatures required for German Fraktur typography would be accommodated.
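Part of why such ligature code points do little harm is that the standard records what they are made of. A minimal Python sketch with `unicodedata` (again reflecting current Unicode data, not the 1996 edition) shows that U+FB05 is declared a compatibility form of long s + t. It also shows that NFKC maps it, and the plain st ligature U+FB06, to the ordinary letters "st".

```python
import unicodedata

LIGATURES = ("\ufb05", "\ufb06")

# The compatibility decomposition states which letters each ligature ties.
for lig in LIGATURES:
    print(f"U+{ord(lig):04X} {unicodedata.name(lig)}:",
          unicodedata.decomposition(lig))

# NFKC maps both ligatures (and the long s inside U+FB05) to plain "st",
# so search and sorting can ignore whether a ligature code point was used.
folded = [unicodedata.normalize("NFKC", lig) for lig in LIGATURES]
print(folded)
```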

>The situation is worse because the 'long s' of African
>orthographies apparently is not present in ISO10646/Unicode,
>as I thought it was in location 017f. The sound of a 'long s' is
>not the same sound as an 'esh'.

That doesn't matter. The sound of the letter "c" is hardly the same in most
of the languages of the world.

>If these two characters were
>deliberately unified, then there is a problem in principle. If
>this distinction was overlooked, then there is a chance 'long s'
>may be added in the future, but if it isn't, that is not a
>problem, because the character can be placed in a Private Zone.

Does this African long s (which you say has a tail like ESH does) have a
capital form? Is this capital form the one that looks like Greek Sigma?
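On the casing question, current Unicode data treats the two letters quite differently. That data is exposed by Python's `unicodedata` and `str.upper`, long after this thread. Long s uppercases to the ordinary capital S. ESH has its own capital, U+01A9 LATIN CAPITAL LETTER ESH, which is the sigma-shaped form. This sketch only shows what the character data records. It cannot say whether the African letter John describes is the same character.

```python
import unicodedata

LONG_S = "\u017f"
ESH = "\u0283"

# str.upper() applies Unicode's uppercase mapping for each character.
uppers = {}
for ch in (LONG_S, ESH):
    up = ch.upper()
    uppers[ch] = up
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> "
          f"U+{ord(up):04X} {unicodedata.name(up)}")
```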

>In developing our Unicode typefaces, we have placed many
>additional characters in the Private Zones, a whole slew of them
>being 'antique' ligatures ('antique s - t','e - t' , etc.). The
>case of ligatures (some called "tied letters" because those
>antique ligatures have a fancy tie-bar connecting both glyphs)
>using this 'antique s' is a peculiar case because most antique
>ligatures become antique due to the tie-bar, not due to one of
>the character components composing it. 'antique s' is the only
>antique stand-alone character that I know of which is commonly
>shown.

Yes.

>[ We all know of the controversy as to
>whether ligatures should be included in an encoding standard. My
>own personal opinion is that they should not, under ideal
>conditions. Unfortunately, some legacy encodings have already
>featured them (Mac encoding, Adobe standard encoding) and this
>controversy really revolves around usage.

Usage is better than philosophy, isn't it? :-)

>If
>ligature-substitution machinery were common in applications, then
>ligatures could be included in a font without being included in an
>encoding. But since the Silicon Valley and Seattle people barely
>acknowledge the existence of ligatures, let alone architect software in a manner
>in which either ligatures or custom encodings can be used, the
>only modality for use is to include them in standard encodings,
>as was done in ISO10646/Unicode.

I gather there are some moves in this direction among the manufacturers
(Apple's QuickDraw GX is supposed to be able to do this, but I have only
once seen an experimental implementation).

>It is far too complicated and
>restrictive (and therefore makes no sense) to include automatic
>ligature substitution in a font for Roman alphabets: the use of
>ligatures *must* be end-user discretionary.

Certainly.

>In fact, there was
>an era (in the U.S.) where oldstyle ligatures were used in some
>words but not in others in the same document. How would a font
>know to do that? This, however, is not a problem in Arabic,
>Indic alphabets, or Hangul, because it is unquestionable (not
>optional) that ligatures be used.

This is why the most common Latin ligatures (and there are not that many)
should have a place in the standard, in my opinion.
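The discretionary, word-by-word use of ligatures described above can be sketched as a substitution that runs only on words the user has picked. An automatic font rule would instead fire on every matching letter pair. The function name, the two-entry ligature table, and the whitespace-only word splitting are all hypothetical, chosen for illustration. Real text would need proper word segmentation and punctuation handling.

```python
# Hypothetical ligature table: letter pair -> ligature code point.
LIGATURE_TABLE = {
    "\u017ft": "\ufb05",  # long s + t -> LATIN SMALL LIGATURE LONG S T
    "st": "\ufb06",       # s + t      -> LATIN SMALL LIGATURE ST
}


def apply_ligatures(text: str, chosen_words: set[str]) -> str:
    """Use ligature code points only in the words the user has marked.

    Words are split on single spaces, so a word with attached
    punctuation ("last,") will not match "last" in chosen_words.
    """
    out = []
    for word in text.split(" "):
        if word in chosen_words:
            for pair, lig in LIGATURE_TABLE.items():
                word = word.replace(pair, lig)
        out.append(word)
    return " ".join(out)


# The same letter pair is tied in one word and left alone in another.
print(apply_ligatures("first and last", {"last"}))
```

Because the choice lives in the text as a distinct code point, it survives interchange even where no font or application knows any ligature rules.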

>If you check out any of the AFII Glyph directories, you can see
>the difference between a "long s" and an "antique s". They have
>different AFII numbers. This does not necessarily mean they are
>different characters, but it means they are different glyphs. Two glyphs
>of the same character must be stored separately there,
>but two characters using the same glyph may or may not be stored
>separately. There are instances present in the AFII register
>with all three situations.

I don't understand how this relates to the African problem, though.

>Ciao!
>John Fiscella
>Production First Software

Best regards,

--
Michael Everson, Everson Gunn Teoranta
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire (Ireland)
Gutháin (Telephones):  +353 1 478-2597, +353 1 283-9396
http://www.indigo.ie/egt
27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:33 EDT