RE: Character identities

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Tue Oct 29 2002 - 15:07:16 EST

Next message: starner@okstate.edu: "Re: Character identities"

Previous message: John Hudson: "Re: Unicode plane 14 language tags."
Maybe in reply to: David Starner: "Character identities"
Next in thread: Michael Everson: "RE: Character identities"
Maybe reply: starner@okstate.edu: "Re: RE: Character identities"
Reply: Michael Everson: "RE: Character identities"
Reply: Keld Jørn Simonsen: "Re: Character identities"
Reply: Kent Karlsson: "RE: Character identities"

Kent Karlsson wrote:
> Marco,

Kent, please allow me to begin with the end of your post:

> Marco, please calm down and reread every sentence of my
> previous message. You seem to have misread quite a few things,
> but it is better you reread calmly before I try to clear
> up any remaining misunderstandings.

I have been absolutely calm, and I apologize if I gave a different
impression. I may get heated when discussing things like ethics,
politics, religion, racism, war, etc., but definitely not when discussing
the details of the Unicode character-glyph model.

I wish to point out that we are just discussing a glyph variation for a
diacritic character: a variation that I consider acceptable and you consider
undesirable. Please let's not make this bigger than it reasonably is.

> Standard orthography, and orthography that someone may
> choose to use on a sign, or in handwriting, are often not
> the same.
>
> And I did say that current font technologies (e.g. OT)
> does not actually do character to character mappings,
> but the net effect is *as if* they did (if, and I hope
> only if, certain "features" are invoked, like "smallcaps").
> It would be more honest to do them as character-to-character
> mappings though, either inside (which OT does not support)
> or outside of the font. Capital A, even at x-height, is not
> a glyph variant of small a (even though, centuries ago, that
> was the case, but then I and J were the same, and U and V,
> et and &, ad and @, ...). But displaying U as V (in effect
> doing a character replacement on a copy of the input) would
> be ok in a non-default mode (using the "hist" feature, say).

I insist that you can talk about character-to-character mappings only when
the so-called "backing store" is affected in some way. If the backing store
is not changed, it is only a character-to-glyph mapping, however complicated
and indirect it may be.

Whether these mappings take place inside or outside a font is irrelevant as
long as, again, the backing store is not changed.
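
To make the distinction concrete, here is a minimal sketch in Python. The
glyph names, the two tables and the shape() function are all hypothetical
and do not correspond to any real font format or API; the only point is
that the rendering step returns glyphs and leaves the stored text alone.

```python
# Hypothetical glyph names and tables, for illustration only.
DEFAULT_CMAP = {"U": "U.default", "V": "V.default"}
# Glyph-to-glyph substitution, applied only when a "hist"-like mode is requested.
HIST_SUBSTITUTION = {"U.default": "V.default"}


def shape(text, hist=False):
    """Return a list of glyph names for text; text itself is never modified."""
    glyphs = [DEFAULT_CMAP.get(ch, ".notdef") for ch in text]
    if hist:
        glyphs = [HIST_SUBSTITUTION.get(g, g) for g in glyphs]
    return glyphs


backing_store = "UV"
rendered = shape(backing_store, hist=True)
# rendered holds glyph names; backing_store still holds the characters U and V.
```

However many tables sit between the text and the glyphs, a search for "U"
in the backing store would still succeed after rendering.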

> My point here is that that replacement (effectively) should
> not be done by default in a Unicode font (see Doug's explanation
> for what a Unicode font is, if you don't like mine).

I totally agree with Doug's careful definition, and I am glad that you agree
as well.

Doug indicates two key points that a font must respect to be suitable for
Unicode:

« [...] calling a font a "Unicode font" implies two things:
1. It must be based on Unicode code points. [...]
2. The glyphs must reflect the "essential characteristics" of the Unicode
character to which they are mapped. [...] »

If we agree that the only requirement for a glyph representing a certain
Unicode character is to respect the "essential characteristics" which make
it recognizable, then all our discussion is simply about determining which
"essential characteristics" a particular character is supposed to have.

To me, a glyph floating atop the letters "a", "o" and "u" is recognizably a
German umlaut if (a) the text is written in German, and (b) the glyph has
one of the following shapes:

1. Two small "blobs" (e.g. circles, squares, acute accents) placed side by
side;
2. A straight horizontal line;
3. A wavy horizontal line;
4. A small lowercase "e", or something recalling it.

I don't argue this out of caprice or provocation, but because these
particular shapes are commonly attested in one context or another: be it
modern typography, traditional typography, handwriting, fancy graphics, etc.

You seem to argue that only case 1 is acceptable, and probably also add some
constraints on the shape of the "blobs" (e.g., I think I understood that you
find that a double acute shape would be unacceptable).

As I see it, the only reason you say this is that the other
shapes are similar or identical to the typical shapes of other Unicode
characters. As I said, I don't find that this is a valid reason, unless the
font we are talking about is to be used in contexts (e.g., linguistics, or
languages other than German) in which the distinction is meaningful.

> > [...] I never heard that U+0364
> > (COMBINING LATIN SMALL LETTER E) is part of the spelling of
> > modern German or Swedish.
>
> True (that is not part of modern standard orthography),
> but I don't see how that could imply some kind of support
> for your (rather surprising and extreme) position.

(Frankly, I find your position surprising and extreme; perhaps we're only
choosing bad examples.)

What I meant is that if (a) U+0364 is not supposed to appear in modern
German, and (b) the font we are considering is designed to be used for
modern German only, then (c) the possibility of confusing U+0364 with U+0308
is a non-issue.
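
For reference, the two characters can be inspected with Python's standard
unicodedata module. This is only a sketch of the encoding side of the
argument: it shows that the two are distinct code points and that a
precomposed German umlaut letter decomposes to U+0308. It says nothing
about which glyph a font draws.

```python
import unicodedata

diaeresis = "\u0308"
small_e_above = "\u0364"

# Character names as recorded in the Unicode Character Database.
names = (unicodedata.name(diaeresis), unicodedata.name(small_e_above))

# Canonical decomposition of precomposed "a with diaeresis" (U+00E4).
decomposed = unicodedata.normalize("NFD", "\u00e4")
uses_diaeresis = diaeresis in decomposed
uses_small_e = small_e_above in decomposed
```

Modern German text therefore contains U+0308 (directly or via precomposed
letters), and U+0364 appears only if an author explicitly types it.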

> If (and only if!) the author/editor of the text asks for an
> overscript e should the font produce one. It is not up to
> the font maker to make such substitutions without request,

Yes. But a font which displays U+0308 with a glyph resembling the typical
glyph for U+0364 is not "producing" anything; it is not "substituting"
anything with anything else: it is just faithfully reproducing the text,
according to the content decided by the author *and* according to the
typographical style decided by the font designer.

> > More *symbol* characters which escape the general rule.
>
> Math Fraktur A is a letter (of course!). Many letters,
> including ordinary A, are used as symbols too.
>
> You seem to argue that for "symbols" (whichever those are,
> I'm sure you *don't* mean general categories S*...) there is
> total rigidity, while for "non-symbols" (whichever those are)
> there is near total anarchy and font makers can change glyphs
> to something entirely different.

No, I was not referring to general categories (which, by the way, should
better have been "Sm" for the math extensions, IMHO).

While an ordinary "A" is intended for any kind of usage (including as a
symbol), an "A" from the Math extensions is intended for usage as a symbol
inside math formulas, where the typographical style conveys a precise
meaning, rather than just being an esthetic choice.

For these math characters the font designer's freedom is unusually narrow,
because these characters *must* look like Fraktur, sans-serif, monospaced,
bold, etc., in order to be useful.
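
As a small illustration, again using Python's standard unicodedata module,
one can look at how the standard records one of these characters,
MATHEMATICAL FRAKTUR CAPITAL A (U+1D504). The variable names are mine; the
values in the comments are what the Unicode Character Database gives.

```python
import unicodedata

fraktur_a = "\U0001D504"

name = unicodedata.name(fraktur_a)          # "MATHEMATICAL FRAKTUR CAPITAL A"
category = unicodedata.category(fraktur_a)  # "Lu", an uppercase letter, not "Sm"

# Compatibility normalization folds the math letter to a plain "A",
# discarding exactly the style that carries the meaning in a formula.
folded = unicodedata.normalize("NFKC", fraktur_a)
```

The Fraktur style is part of the character's name, and it is the only thing
that distinguishes it from an ordinary "A" once compatibility folding is
applied, which is why a font cannot freely restyle it.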

So, what I meant is that these are poor examples of the level of stylistic
freedom which is *normally* granted to type designers.

> I claim that there are no characters for which there is total
> anarchy (except possibly for "view invisibles" of normally
> invisible characters),

(It depends on what you mean by "total anarchy". To me, standards are a form
of self-discipline which might be a possible basis for an orderly anarchy.
But I guess that this would lead us way way way off the topic...)

> but that there are several degrees
> of flexibility (I'm sure someone can list more than three,
> but here is a coarse division):
>
> 1. glyph (almost) fixed: Dingbats, estimated sign, ...
> [could possibly be given a rugged look, or texture
> if you want to mimic e.g. a typewriter look]

Fine.

> 2. "abstract" glyph is fixed but there can be minor
> shape variations: diacritics, math symbols (Sm),
> "math letters" (there are several Math Fraktur
> designs, several Math sans-serif designs, etc.
> that could suit), Arabic presentation forms (initial/
> medial/final/isolated decided but other aspects are
> not fixed, maybe this case is between 2 and 3), ...

An "abstract glyph" is a very interesting concept, and I have been chasing
it since I first heard about Unicode... But I am starting to think that it
may be a chimera.

For the time being, I would rather stick to Doug's more generic "essential
characteristics".

> 3. fairly free as long as (some) readers recognise
> the character from the glyph (modulo compatibility/
> canonical variants and what should have been
> compatibility/canonical variants...): "nominal"
> digits/letters/punctuation, ... [This, however,
> does NOT allow, e.g., the One Thousand C D character
> to be shown with an M glyph, nor display € as EUR, ...
> in a Unicode font in...; if it did so in default mode
> ["by default"], it would not be a Unicode font.]

And this is the area where disagreements are most likely to occur, and the
area where I think we are with this discussion.

The fact is that, outside the special cases 1 and 2 above, "essential
characteristics" are not always, and not necessarily, *visual*
characteristics. In several cases, the *semantics* of the character may play
an even bigger role, IMHO.

> [4. Near anarchy; you seem to argue that a large part of
> case 2 and all of case 3 fall here...]

Parse error on "Near anarchy". Anarchy is not something we need to discuss
in this century.

> Yes, you can have glyphic variation, but for the diacritics
> there is (by design, but maybe not sufficiently explicit
> stated in the book) a limit to how much it can vary ("in
> default mode"). There are limits also for, e.g., 'nominal'
> letters and roman numeral characters, that are (by design)
> somewhat less constrained. In addition you may note that
> those who asked for the inclusion of overscript e does not
> regard an overscript e glyph to be an acceptable way
> of displaying a diaeresis [in a Uni..., you know].

I guess that those who need a distinct overscript "e" probably do not
typeset their academic works using fantasy fonts, do they?

> These things come up quite often in discussions about
> proposals to add characters, even though it is not formally
> stated. If some of the "Unicode elders" care to elaborate,
> please feel free.

Yes, please. Kent and I are starting to be repetitive...

> Marco, I'm not sure it is of any use to try to explain in
> more detail, since you don't appear to be listening.

I listened to everything you've been saying, and I found that a lot of it
was very sensible. A lot, but not all of it.

> However,
> I think I, Marc, Doug, and Mark (at the very least) seem
> to be in approximate agreement on this (at least, I have
> yet to see any major disagreement).

Unless I missed something, Mark (Davis) did not express his position about
this (not on this public list, anyway), and Doug said something very
balanced on which we both agree.

> I'm sure Michael would agree too (at least I hope so), and many others.

There are many Michaels and many "others" here... If any of them wish to
intervene, I hope they'll say something new to take the discussion
out of the loop, rather than joining one faction.

Ciao.

_ Marco


This archive was generated by hypermail 2.1.5 : Tue Oct 29 2002 - 16:10:53 EST