RE: Slovak and Czech "CH" (was: Re:Mixed up priorities)

Date: Fri Oct 22 1999 - 09:29:05 EDT

This is not an official distinction made by Unicode: I don't even think that
Unicode attaches great value to the term "letter", as many Unicode
characters are not letters at all (punctuation, digits, symbols, marks,
ideographs, dingbats, etc.).

It is just a logical distinction that I introduced because it sounded good
within this discussion.

As I see it, a LETTER is the end user's vision of the elements of
alphabetic writing; a CHARACTER is the computer people's vision of the
elements of a character set.

When I went to school (and knew nothing about computers) I learned to write
LETTERS. When I started programming, I learned that CHARACTERS are the
elements that make up text as stored in a computer.

When Adam says that CH is a letter in Slovak we all believe him, because he
is talking about his own language.

However, this does not imply (nor deny) that it would be an advantage to
encode this LETTER as a single Unicode CHARACTER(*), rather than the current
two-character sequence.
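
To make the letter/character distinction concrete, here is a minimal Python
sketch. It is only an illustration under stated assumptions: the helper names
(slovak_letters, slovak_key) are hypothetical, and the alphabet is simplified
to the 26 basic Latin letters plus CH placed after H, ignoring the accented
Slovak letters. It shows that a two-character sequence can still be counted
and sorted as one letter without a dedicated code point.

```python
# Simplified alphabet (assumption): basic Latin letters, with "ch" after "h".
# Accented Slovak letters are deliberately left out of this sketch.
ORDER = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
RANK = {letter: i for i, letter in enumerate(ORDER)}


def slovak_letters(text):
    """Split text into letters, treating the sequence c+h as ONE letter."""
    letters = []
    i = 0
    while i < len(text):
        if text[i:i + 2].lower() == "ch":
            letters.append(text[i:i + 2])  # one letter, two characters
            i += 2
        else:
            letters.append(text[i])
            i += 1
    return letters


def slovak_key(word):
    """Sort key built from letters rather than characters."""
    return [RANK.get(letter.lower(), len(ORDER)) for letter in slovak_letters(word)]


word = "chlieb"
print(len(word))                  # 6 characters
print(len(slovak_letters(word)))  # 5 letters

words = ["chata", "hora", "cesta", "ihla", "dom"]
print(sorted(words))                  # character order: "chata" lands under C
print(sorted(words, key=slovak_key))  # letter order: "chata" comes after "hora"
```

The table plays the role of a (very small) collation table: the text itself
stays a plain sequence of characters, and the knowledge that CH is a letter
lives in the sorting code.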


(*) P.S.: I read an article by Mark Davis
that calls into question the validity of the term "character", and adds the
distinction character vs. code point. His discussion is very reasonable, but
it comes 30 or 40 years too late. It is like trying to eradicate the wrong
term "ideograph" after 300 or 400 years of Western sinology...

> -----Original Message-----
> From: Rajashekhar Hiremath []
> Sent: 1999 October 22, Friday 14.16
> To: Unicode List
> Subject: Re: Slovak and Czech "CH" (was: Re:Mixed up priorities)
> I am not very sure of the difference between a 'letter' and a 'character'.
> I was thinking that they are one and the same. Can somebody throw light on
> the difference between these two?
> -shekhar
> -----Original Message-----
> From: <>
> To: Unicode List <>
> Date: Friday, October 22, 1999 04:34 PM
> Subject: RE: Slovak and Czech "CH" (was: Re:Mixed up priorities)
> I don't have any strong opinion about whether or not to accept Adam's three
> new Slovak characters (CH, ch and Ch), but I wish to add a few
> observations.
> Adam insists that "CH is a CHARACTER in the Slovak alphabet"...
> This is plain wrong!!! I learned the alphabet when I was 5 years old, but I
> first saw the word "character" after my 17th birthday, when I started
> reading the manual of my brand new Commodore 64.
> Slovak or other alphabets do NOT have characters, they have LETTERS! So one
> should rather say that CH is a LETTER in the Slovak alphabet.
> Having made this distinction between characters and letters, and given that
> Unicode is not a "LETTER encoding standard", the letter CH need not be
> encoded as a character: in Slovak it will be a 2-character letter (just
> like it is in English, Italian, and many other languages).
> But, but, but... Adam looked at the Unicode charts and saw many other
> letters that could well have been composed of two simpler characters, so he
> asked "Why them and not my CH?"
> The most famous of these letter conjuncts is W, which is just a couple of
> V's (or a couple of U's, as the English name suggests). Another well-known
> example is German "Eszett", which is just an "ss" or an "sz" sequence (the
> first "s" being in the now extinct long form).
> There is also the digraph Æ that, though in the Latin language it was a
> mere typographical ligature of two letters, is now regarded as a single
> letter in some modern languages.
> But what probably lit Adam's national pride are the Croatian digraphs DZ,
> DJ, etc...
> One could say that W, Æ, and ß have, at least, a typographical appearance
> that is slightly different from UU, AE, and ss. But the Croat digraphs
> don't! And they have been included anyway! So why not Slovak CH?
> Reading this mailing list, I have learned two things about Unicode that
> could help answer this question:
> 1) Unicode is all PRAGMATICS. Most of the theory and philosophy was added
> later (probably to add some nice text to a book with too many charts:-).
> These theories are not always solid, and may be adjusted when the need
> arises.
> 2) There is an aspect of the history of Unicode that has the utmost
> importance, both practical and theoretical, but is easily forgotten:
> STANDARDS. Unicode did not arise in a vacuum: it started as a collection of
> entities taken from a set of miscellaneous pre-existing national standards,
> such as ASCII (from the USA), ISCII (from India), TIS (from Thailand), JIS
> (from Japan), and so forth. Unicode tends to respect and incorporate the
> choices made by these pre-existing "traditions", even when they conflict
> with Unicode guidelines. There are good reasons for this, the main one
> being round-trip conversion from/to national standards.
> Number (2) explains the alternative between pre-composed and de-composed
> sequences for <base letter + diacritic>. The de-composed sequences are
> Unicode's choice; the pre-composed sequences are the tax paid to existing
> "traditions".
> It also explains the presence of the Croatian digraphs: these were already
> in a source character set from former Yugoslavia. The former-Yugoslav
> standards introduced these conjuncts with the aim of allowing naive
> conversions from/to the Latin ("Croatian") alphabet and the Cyrillic
> ("Serbian") alphabet, and to permit naive sorting of Serbo-Croatian text
> (where DJX goes between DX and DA). At the end of 1999, we would rather do
> these kinds of things using mapping tables and collation algorithms... But
> the 70's were a different age.
> And, finally, it also explains why Unicode has no CH: it is because in
> pre-existing standards from Slovakia (or Czechoslovakia) there was no such
> thing! And the reason why there was no such thing is probably (as you
> suggested) that Slovak programmers have always been particularly good at
> their job. So, they did a good analysis back in the 70's, and decided to
> use collation tables for sorting text.
> Ciao. Marco
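
The quoted point (2) about pre-composed versus de-composed sequences, and
about the digraphs inherited from earlier standards, can be checked with
Python's standard unicodedata module. This is only a sketch: the sample
letters (č, and U+01C4, one of the digraph characters in Unicode) are my
own choice for illustration and are not taken from the messages above.

```python
import unicodedata

# The same letter, encoded two ways.
precomposed = "\u010D"   # LATIN SMALL LETTER C WITH CARON, one code point
decomposed = "c\u030C"   # "c" followed by COMBINING CARON, two code points

print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# A digraph encoded as a single character for compatibility reasons.
digraph = "\u01C4"
print(unicodedata.name(digraph))   # LATIN CAPITAL LETTER DZ WITH CARON

# Canonical normalization leaves the digraph alone; only the compatibility
# forms (NFKC/NFKD) take it apart into ordinary letters.
canonical = unicodedata.normalize("NFC", digraph)
compat = unicodedata.normalize("NFKC", digraph)
print(canonical == digraph)        # True
print(len(compat))                 # 2 (D followed by Z WITH CARON)
```

The asymmetry is the interesting part: the accented letter has a canonical
decomposition, while the digraph has only a compatibility one, which is
consistent with it being kept for round-trip conversion with older
character sets.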
