Re: Why no combining-character form for U+00F8?

From: Jukka K. Korpela <jkorpela_at_cs.tut.fi>
Date: Fri, 17 Aug 2012 09:35:25 +0300

2012-08-17 1:44, Ian Clifton wrote:

> Andreas Prilop <prilop4321_at_trashmail.net> writes:
>
>> On Thu, 16 Aug 2012, Ian Clifton wrote:
>>
>>> Having just been to Norway, and wanting to email my friends all
>>> about it, I came across a curiosity: neither of the combining
>>> characters U+0337, U+0338 seem to work in usually-reliable Emacs
>>
>> Windows:
>> http://en.wikipedia.org/wiki/Keyboard_layout#US-International
>>
>> Unix:
>> http://en.wikipedia.org/wiki/Compose_key#Common_compose_combinations
>>
> Maybe I should explain at this point: I’ve got used to using combining
> characters as a way of composing characters myself, using direct input
> of characters by hexadecimal character number (<ctrl-X> 8 [RET] hex
> [RET] in Emacs, <shift><ctrl-U>hex<shift><ctrl> in many Unix tools).

It’s certainly useful to know one universal method for entering any
Unicode character in one’s favorite environment. Even if it is somewhat
clumsy, with a key combination prefix that looks odd at first sight,
it’s convenient once you’ve learned it and use it regularly.
Often such methods are more or less hidden in software. (I have used
Emacs for about 30 years, and I did not know about <ctrl-X> 8 ...)

> Not
> the most efficient method, but by remembering the character numbers of a
> handful of combining accents, I can assemble most of the accented
> characters I use. Perhaps I should start trying to learn these compose
> combinations, as they’re shorter and mostly mnemonic.

It depends, especially on how often you need a character. If you
frequently need Latin-1 Supplement characters, for example, the US
International keyboard layout can be handy, but you need to look up the
placement of a character until you’ve memorized it. There is nothing
intuitive or easy to remember in the allocation of “ø” to the key
combination AltGr L (or at least I fail to see anything). Using Unicode
numbers, you would still need to look up the number until you memorize
it, and there is nothing intuitive in the allocation of “ø” to U+00F8.
But memorizing a Unicode number (for a frequently needed character) lets
you enter it in any software that has some general method for entering a
character by its number. And modern editors and word processors normally
have one.

There is an essential difference between using a combining mark and
using a precomposed character: they are distinct at the character level,
and in any processing, they are handled as distinct unless the software
is programmed to treat them as “the same”, either via normalization or
otherwise. For example, in Emacs, E <ctrl-X> 8 [RET] 301 [RET] produces
an “e” followed by a combining mark, and while Emacs displays it as “é”,
it’s still different (at the character level) from what you get by
<ctrl-X> 8 [RET] E9 [RET]. In searches, for example, they do not match.
In rendering, they would normally produce the same result, but not
necessarily; transferred to another program, they might produce
different results if the program cannot handle combining marks or
handles them too simply.
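
To make the character-level difference concrete, here is a minimal
sketch. It assumes Python 3 and its standard unicodedata module, which
is my choice for illustration only and has nothing to do with Emacs
itself; the helper name same_after_nfc is made up for this example.

```python
import unicodedata

# The precomposed character, as entered with <ctrl-X> 8 [RET] E9 [RET]
precomposed = "\u00e9"
# "e" followed by U+0301 COMBINING ACUTE ACCENT
combining = "e\u0301"


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Distinct at the character level: one code point versus two
print(len(precomposed), len(combining))        # 1 2
print(precomposed == combining)                # False

# A plain substring search for one form does not find the other
print(combining in "caf\u00e9")                # False

# Only after normalization are they treated as "the same"
print(same_after_nfc(precomposed, combining))  # True
```

A program that compares or searches without such a normalization step
sees two different strings, even though both normally display as “é”.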

My point is that on practical grounds, precomposed characters like
U+00E9 for “é” are generally safer, especially in older software, than a
representation using a combining mark. (For “ø”, there is no choice, as
discussed.)
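
The lack of choice for “ø” can be checked against the Unicode character
data. Again this is only a sketch, assuming Python 3 and its standard
unicodedata module: U+00E9 has a canonical decomposition, U+00F8 has
none, so normalization never turns “o” plus U+0338 into U+00F8.

```python
import unicodedata

e_acute = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
o_slash = "\u00f8"   # LATIN SMALL LETTER O WITH STROKE

# U+00E9 decomposes canonically into "e" + U+0301
print(unicodedata.decomposition(e_acute))        # 0065 0301

# U+00F8 has no decomposition at all (empty string)
print(repr(unicodedata.decomposition(o_slash)))  # ''

# So "o" + U+0338 COMBINING LONG SOLIDUS OVERLAY stays two code points
# under NFC and never becomes U+00F8
attempt = unicodedata.normalize("NFC", "o\u0338")
print(len(attempt), attempt == o_slash)          # 2 False
```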

Yucca
Received on Fri Aug 17 2012 - 01:40:36 CDT
