~(char=letter) (was Re:~(char=glyph))

From: Edward Cherlin (edward.cherlin.sy.67@aya.yale.edu)
Date: Wed Sep 01 1999 - 03:32:44 EDT


At 15:07 -0700 8/31/1999, Erland Sommarskog wrote:
[snip]

>No matter whether I and Dan like it, we will have to face that <ä>
>may appear as <a> + <¨>. Not because any stupid Anglophone thinks that
>it is just an <a> with two dots over, but because there is a need
>for a non-spacing diaeresis anyway. Say that I one day get the
>bright idea of devising a new, spelling-consistent script for
>Swedish. Maybe I will find that <w> + <¨> is a character I need. With
>Unicode I can do it. (I presume.)

Sure, you can put the precomposed form in the private use area while you
try it out, assuming that you have reasonably cooperative applications. We
APLers already have <*> + <¨>, <~> + <¨>, GREEK SMALL LETTER ALPHA + <_>,
and so on. (U+2336-U+237A)

>And thus, we find that there is actually a letter in the English
>language that could be a victim of decomposition, to wit <i>. Instead
>of having the precomposed letter, you might get a dotless
><i> + dot above. Whether this is legal, I don't know, but it would
>be illogical if it's not. (And for Dean A. Snyder who questioned the
>need for a dotless <i>: you need to pick up some Turkish.)
>
>Now, I don't know what is established practice, if there is any.
>But in my opinion, the user should be saved from keeping track
>of whether the <ä> he has in front of him is a precomposed letter or
>an <a> + <¨>.

For input, Swedish keyboards must have ä as a single letter on a single
key, but an international keyboard will require some sort of composite
entry, along the lines of Option-u a on a Macintosh. Once inside the
program, ä can be a single letter for sorting purposes in a Swedish
passage, and a composite in a German passage, even in the same document,
regardless of how it was typed.
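
To make the sorting point concrete, here is a minimal sketch in Python. It assumes the standard unicodedata module; the names swedish_key and german_key, and the simplified ordering rules (ä after z for Swedish, ä filed with a for German dictionary order), are my own illustration and not a full collation algorithm.

```python
import unicodedata

# Swedish alphabet order: a-z, then a-ring, a-diaeresis, o-diaeresis.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

def swedish_key(word):
    """Treat a-diaeresis as one letter of its own, however it was typed."""
    composed = unicodedata.normalize("NFC", word.lower())
    # Characters outside the alphabet sort after everything else.
    return [SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in composed]

def german_key(word):
    """Treat a-diaeresis as <a> plus a mark that is ignored for ordering."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# The second word is stored decomposed: <a> + COMBINING DIAERESIS.
words = ["zebra", "a\u0308rlig", "apa"]

swedish_order = sorted(words, key=swedish_key)  # apa, zebra, then the umlaut word
german_order = sorted(words, key=german_key)    # apa, the umlaut word, then zebra
```

The stored form of the middle word never changes; only the key derived from it does, which is why the same document can hold both conventions.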

>Actually, I submit that the default mode for most tools
>should be to be free to switch from the representation which fits
>the tool best. A tool that prefers to work with precomposed
>letters, could translate decomposed ones when reading input, and
>may write it back that way.
>--
>Erland Sommarskog, Stockholm, sommar@algonet.se

Precisely. Exactly. Totally. Just what Unicode is trying to get at. See,
you understood all along. Now how, I wonder, did we get the idea that we
were arguing with each other?
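
As a rough illustration of a tool that accepts either form and works in the one it prefers, here is a small Python sketch. It assumes the standard unicodedata module; the function name to_internal is hypothetical.

```python
import unicodedata

def to_internal(text, form="NFC"):
    """Convert incoming text to the representation this tool prefers."""
    return unicodedata.normalize(form, text)

precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # <a> + COMBINING DIAERESIS

# The raw code point sequences differ...
differs_raw = precomposed != decomposed

# ...but a tool preferring precomposed letters sees them as the same text,
same_composed = to_internal(decomposed) == to_internal(precomposed)

# and a tool preferring decomposed letters can go the other way.
same_decomposed = to_internal(precomposed, "NFD") == decomposed

# On the dotless <i> question: dotless i + COMBINING DOT ABOVE is not
# canonically equivalent to plain <i>, so normalization leaves it alone.
dotless_plus_dot = "\u0131\u0307"
folds_to_i = to_internal(dotless_plus_dot) == "i"
```

The user never has to know which form arrived; the tool picks one on input and may write back either.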

Encoding a single letter as one character or two internally says nothing
about how to type it, analyze it, store it, transmit it, or render it. A
letter can be encoded as a single composite character, or a sequence of
base character and nonspacing combining character. In either case, a
keyboard and keyboard handler can provide single keystrokes for composite
letters, or accept sequences of combining characters, or both. None of
these says anything about whether a particular font includes the composite
character or the glyphs for composing it.
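
The independence of keystrokes from encoding can be sketched the same way. This is a toy keyboard handler in Python, not any real input API: the DEAD_KEYS table, the key name "Option-u", and the handle_keys function are assumptions made for illustration.

```python
import unicodedata

# Hypothetical dead-key table: key name -> combining character it arms.
DEAD_KEYS = {"Option-u": "\u0308"}  # COMBINING DIAERESIS

def handle_keys(events, compose=True):
    """Turn a list of key events into text in the chosen internal form."""
    pieces = []
    pending = ""  # combining mark waiting for its base letter
    for key in events:
        if key in DEAD_KEYS:
            pending = DEAD_KEYS[key]
            continue
        pieces.append(key + pending)
        pending = ""
    form = "NFC" if compose else "NFD"
    return unicodedata.normalize(form, "".join(pieces))

# A Swedish keyboard sends the letter with one keystroke.
swedish = handle_keys(["\u00e4"])
# An international keyboard sends a dead key followed by the base letter.
international = handle_keys(["Option-u", "a"])
```

Both keyboards yield identical text, and flipping compose changes only the internal encoding, not what the user had to type.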

--
Edward Cherlin   edward.cherlin.sy.67@aya.yale.edu
"It isn't what you don't know that hurts you, it's
what you know that ain't so."--Mark Twain, or else
some other prominent 19th century humorist and wit


