On Tue, 17 Oct 2000, Herman Ranes wrote:
> Is the use of the existing precomposed characters in the Latin
> Extended-A block considered 'right' for encoding Latvian palatal
> consonants, or is it considered 'wrong' so that I will have to use
> composites with U+0326 'Combining comma below' instead?
> I am aware that many use those precomposed cedilla-characters, but
> nevertheless it does not look Latvian to me...
> Romanian did get its precomposed letters - can one expect any
> precedence with regard to Latvian?
As far as I know, there is an official decision by the Romanian
Standards Institute to regard certain Romanian characters as
containing comma below and not cedilla, making ISO 8859-2 (originally
designed to cover Romanian too) inadequate for writing Romanian.
There is a committee draft for ISO 8859-16 intended to solve this problem:
What is the official position on the nature of the diacritic mark
we're discussing, in Latvia? Unofficial documents, like
seem to call it "cedilla" - and display glyphs where it is clearly
comma-like in appearance.
_If_ there were an official statement saying that it's a comma and not
a cedilla, then one _might_ refer to the Romanian case as a precedent.
But then the problem would arise whether one really needs to make
a distinction between comma and cedilla. The problem with s and t
with comma or cedilla was that they are also used outside Romanian.
The Unicode attitude, expressed in the description of Latin Extended-A,
http://www.unicode.org/charts/PDF/U0100.pdf is somewhat confusing.
For example, U+015F LATIN SMALL LETTER S WITH CEDILLA is, according to it,
used in Turkish, Azerbaijani, Romanian, ..., but "a glyph variant
with comma below is preferred for Romanian"; on the other hand,
that "glyph variant" appears as U+0219 LATIN SMALL LETTER S WITH COMMA
BELOW, with the note "Romanian, when distinct comma below form is
required".
So are the characters you're referring to "Latvian only"?
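
To make the s-with-cedilla versus s-with-comma situation concrete, here is a
minimal sketch in Python, assuming its standard unicodedata module (the helper
name describe is made up for illustration). It prints the name and canonical
decomposition of each character and checks whether normalization would ever
treat the two as the same:

```python
import unicodedata

S_CEDILLA = "\u015F"  # LATIN SMALL LETTER S WITH CEDILLA
S_COMMA = "\u0219"    # LATIN SMALL LETTER S WITH COMMA BELOW

def describe(ch):
    """Return (code point, character name, canonical decomposition)."""
    return ("U+%04X" % ord(ch),
            unicodedata.name(ch),
            unicodedata.decomposition(ch))

for ch in (S_CEDILLA, S_COMMA):
    print(describe(ch))

# Compare the fully decomposed (NFD) forms of the two characters.
same_after_nfd = (unicodedata.normalize("NFD", S_CEDILLA)
                  == unicodedata.normalize("NFD", S_COMMA))
print(same_after_nfd)
```

The decompositions differ (base letter plus U+0327 in one case, plus U+0326 in
the other), so at the character level the two are simply different characters,
whatever the chart annotations say about glyph variants.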
Unfortunately, there doesn't seem to be any collection of information
that could be used as a reference concerning the use of letters in
different languages. The ISO 8859 series implicitly constitutes a
(very partial) reference, since those standards list the languages to
which a particular standard of the series is applicable. (See my
http://www.hut.fi/u/jkorpela/8859.html which summarizes coverage of
European languages by ISO 8859 alphabets.) Then there's the rather
but it is old, and has the status of an expired Internet-Draft. And there
are some notes in the Unicode standard, but they are typically just
_examples_ of languages in which a character is used. And there's a nice
online database at http://www.eki.ee/letter/ which is based on various
For example, for U+0137 LATIN SMALL LETTER K WITH CEDILLA, all the
information available to me suggests that it is used in Latvian only,
with a glyph where the diacritic part is a comma below "k", not
connected to it in any cedilla-like manner. So what would be the
problem in using it? There _would_ be a problem if some other language
used the character so that the diacritic part is somehow cedilla-like.
(But even then, it might be regarded as something to be handled at
a higher protocol level, based on language information.)
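
One practical point about the alternative raised in the question (a base letter
followed by U+0326) can be checked mechanically. The sketch below is in Python
and assumes its standard unicodedata module; the list of four lowercase letters
and the helper names are chosen only for illustration. It tests whether such a
composite is canonically equivalent to the existing precomposed character:

```python
import unicodedata

COMBINING_COMMA_BELOW = "\u0326"

# Lowercase Latvian letters whose Unicode names contain CEDILLA:
# g, k, l, n (illustrative selection, not an exhaustive list).
LATVIAN = "\u0123\u0137\u013C\u0146"

def comma_composite(ch):
    """Base letter of ch followed by COMBINING COMMA BELOW."""
    base = unicodedata.normalize("NFD", ch)[0]
    return base + COMBINING_COMMA_BELOW

def is_equivalent(ch):
    """True if the comma-below composite normalizes (NFC) to ch."""
    return unicodedata.normalize("NFC", comma_composite(ch)) == ch

for ch in LATVIAN:
    print(unicodedata.name(ch),
          unicodedata.decomposition(ch),
          is_equivalent(ch))
```

Each of these characters decomposes to its base letter plus U+0327 COMBINING
CEDILLA, so a sequence with U+0326 is not canonically equivalent to it. Text
written with such composites would not compare equal to text using the
precomposed characters, even after normalization.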
So I don't think it's a problem; the only real problem appears to be
the _name_ which contains the word CEDILLA, but it's just a name,
and diacritics may vary in appearance anyway. (Consider how differently
an acute accent can be displayed.)
-- Yucca, http://www.hut.fi/u/jkorpela/ or http://yucca.hut.fi/yucca.html