From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Mon Mar 07 2005 - 08:36:25 CST
On Mon, 7 Mar 2005, Michael Everson wrote:
> a "decree" that hyphenated family-names should no longer be
> spelled with hyphens but now with an en-dash or em-dash does not have
> to do with encoding.
That is correct, though it might have some impact on considerations of
character properties. The line breaking rules, for example, reflect
assumptions about how characters, and even combinations of characters,
are commonly used; if actual usage changes considerably, some
reconsideration may be needed.
But what matters in encoding is how the characters have been specified.
Unfortunately, many standards and recommendations that prescribe the use of
special characters do not identify them by Unicode code points or names, or
in any other unambiguous way. When the normative version of a standard is a
printed document, it can be impossible to decide which character is meant.
All we have is a particular glyph instance. For example, what is the
dot-like character used for multiplying units in the SI? People
commonly encode it as the middle dot (U+00B7), but it would more logically
be the dot operator (U+22C5).
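To make the ambiguity concrete: the two candidates for the SI multiplication
dot are distinct code points with different names and general categories,
and Unicode normalization does not unify them, even though a printed glyph
cannot tell them apart. A minimal sketch using only Python's standard
unicodedata module; the helper name describe is just for illustration.

```python
import unicodedata

# The two characters people use for the SI multiplication dot.
CANDIDATES = ["\u00B7", "\u22C5"]

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME (general category)' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"

for ch in CANDIDATES:
    print(describe(ch))

# Neither character has a decomposition mapping, so even NFKC
# leaves them distinct: a plain-text search for one misses the other.
middle, operator = CANDIDATES
print(unicodedata.normalize("NFKC", middle) == unicodedata.normalize("NFKC", operator))
```

The first character is punctuation (Po) and the second a math symbol (Sm),
which is one reason the dot operator is the more logical choice for a
multiplication sign.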
Similarly, what have the French authorities really decided? I had understood
that the rules call for two consecutive hyphens. (That would be somewhat
vague too, since they might not have considered the differences between
hyphen-minus, minus, and non-breaking hyphen.) But is it really the
en dash, or the dash, or just some dash-like character (or pair?) of
unspecified length and identity?
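As an illustration of how many encodings a printed "hyphen" or "dash" could
stand for, here is a small sketch, again using only Python's unicodedata
module. The list of candidates is my own selection of the characters
mentioned above, and the name "Dupont-Durand" is a hypothetical example, not
taken from any French rule.

```python
import unicodedata

# Characters that a printed hyphen or dash in a norm might stand for.
DASH_LIKE = ["\u002D", "\u2010", "\u2011", "\u2212", "\u2013", "\u2014"]

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME (general category)' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"

for ch in DASH_LIKE:
    print(describe(ch))

# Two hypothetical readings of "two consecutive hyphens" in a family name.
a = "Dupont--Durand"              # two HYPHEN-MINUS characters
b = "Dupont\u2010\u2010Durand"    # two HYPHEN characters
print(a == b)  # False: the strings differ even if the glyphs look alike
```

Only the minus sign is a math symbol (Sm); the rest are all dash
punctuation (Pd), so the general category alone does not settle which one a
printed document intended.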
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
This archive was generated by hypermail 2.1.5 : Mon Mar 07 2005 - 08:37:34 CST