From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Apr 29 2004 - 18:10:32 EDT
From: "Peter Constable" <petercon@microsoft.com>
> I wrote:
> > If you introduce c-stroke and C-stroke, they should come immediately
> > as a pair
>
> It does not necessarily have to happen that way. There are plenty of
> cases of lowercase letters without uppercase counterparts in the UCS.
> If I had had good evidence for C-stroke, I would have proposed it at
> the same time, but I did not. The C-stroke can be proposed after the
> c-stroke has already been added to the standard, and since UTC has
> already accepted the c-stroke, that is what would necessarily have to
> happen.
OK, so we agree that unifying currency symbols with letters is a bad
idea. That is also a good justification for the separate encoding of the
CEDI currency sign.
So now we are left with orthographic/phonetic letters. The c-stroke is one
that was covered in your searches. But now that we know that a capital
C-stroke is also used, can Unicode be updated later to add a case mapping
for c-stroke, if C-stroke is added later?
Aren't case mappings normative properties, and thus subject to the stability policy?
What would happen if, for example, these characters needed to be used in
book or chapter titles (uppercased or in small capitals)? How could most
letters be converted from lower to upper case while the c-stroke is left
alone? The same problem arises for personal names and toponyms: uppercase
letters are often needed, and sometimes required, for example to write
postal addresses or to fill in administrative forms.
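The asymmetry is easy to see with letters already in the standard. A minimal
Python sketch, using U+0138 (kra) as a stand-in for any lowercase letter that
has no uppercase counterpart (the c-stroke itself is not yet encoded):

```python
# Uppercasing a string that mixes cased and caseless letters:
# letters with an uppercase mapping convert, the others are left alone.
name = "ab\u0138a"   # "abĸa" -- U+0138 LATIN SMALL LETTER KRA has no uppercase
print(name.upper())  # "ABĸA": kra survives unchanged amid the capitals
```

This is exactly the mixed-case result a title or an address form would show if
c-stroke were encoded without a capital partner.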
If C-stroke is left unencoded, people will need to resort to hacks such as
font-size adjustments for this letter alone, or to using a plain C plus a
combining overlay to print the missing character. I think this may already
be happening today, because users work with the standard as it stands and
the characters it currently supports, even though decomposed characters
with overlays are a hack.
This already happens for African languages with missing Latin letters with
horizontal bars, and various hacks are used to render this overlay, more
or less successfully. So, to solve the problem, specialized fonts are
created that render the decomposed combining sequence correctly.
I have found such occurrences in some existing fonts developed and
published by SIL.org, demonstrating that the missing uppercase letters
were actually needed: SIL is clearly interested in supporting actual
languages in existing communities, and in easing the transcription of
sacred texts (notably the Biblical and Quranic texts) into these languages
in their local native orthography (which was created to cover each
language with its unique phonetic, grammatical, and semantic rules).
This is notably the case in Africa, South Asia, Mexico, and native South
American communities, where the sacred texts are needed and have long been
awaited to help support education.
Just within the Latin script, I have seen numerous examples where the
existence of only the lowercase letter in Unicode caused problems, because
the uppercase version is also needed for modern usage, which integrates
the presentation forms used by the other Latin-written European languages
of the same regions. So the use of uppercase, titlecase, and lowercase has
been accepted as a de facto standard for the Latin script, which is now
used to write many languages from various origins, linguistic families,
and subgroups. More languages will, sooner or later, be romanized (in fact
some ISO standards require a romanization of the native script for the
exchange of information on toponyms, trademarks, and personal names, or
simply to get a presence on the web).
It is quite natural to think that all the missing uppercase versions of
letters currently encoded only in lowercase will be needed in the near
future. Today this is easy for non-overlay diacritics, but letters with
overlay diacritics will remain a problem for correct case conversion. It
is possible today with combining overlays, but the presence of only one
case complicates things. So people will instead recommend using the
decomposed letters as long as there is no case pair. That simply eases
string handling, and it still works with a very simple glyph-substitution
rule in fonts such as those prepared by SIL for various linguistic
communities.
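To sketch why the decomposed spelling keeps string handling simple (U+0335
COMBINING SHORT STROKE OVERLAY is chosen here purely as an illustrative
overlay; the actual diacritic for a given orthography may differ):

```python
import unicodedata

# Hypothetical decomposed spelling: base letter + combining overlay.
# The overlay is a caseless combining mark, so case conversion of the
# base letter leaves it intact -- no special handling needed.
c_stroke = "c\u0335"
C_stroke = c_stroke.upper()
print(C_stroke == "C\u0335")                 # True: base maps, overlay preserved
print(unicodedata.combining("\u0335"))       # nonzero: it is a combining mark
```

The font then only needs a substitution rule drawing the overlay at the right
position over C as well as over c.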
If we want to avoid complexity and allow uppercase, small-caps, and
titlecase rendering styles for languages using those letters, it seems
reasonable to expect that the communities using the Latin letters with
overlays will also recognize the uppercase version easily.
I accept that there is no urgent need to encode an uppercase version that
has still not been found, but the standard should be prudent and make an
explicit reservation with respect to its stability policy, so that the
normative case mappings are allowed to change to include a possible future
uppercase version, if it is demonstrated to exist (I bet that users of the
lowercase letter will rapidly adopt the additional character, because it
is quite natural to allow more freedom in presentation style).
After all, the original Latin script had only uppercase letters; lowercase
was introduced as a matter of style to ease the readability of text, by
borrowing narrower, cursive forms from handwritten script. I see the
separation of lower and upper case versions of letters as an invention of
printers and typographers, allowing more text on the same page (notably
when paper was expensive and rare, before its manufacture was simplified
by the chemical processing of wood).
Today, most written text is printed, and current technologies allow more
freedom in presentation styles, including case variations.
This archive was generated by hypermail 2.1.5 : Thu Apr 29 2004 - 18:49:04 EDT