RATIONALE FOR MULTILINGUAL SORTING OF CYRILLIC CHARACTERS (was: Re: Greek/Etruscan/Gothic Unification Proposal)

From: John Clews (10646er@sesame.demon.co.uk)
Date: Tue Nov 18 1997 - 18:10:06 EST


In message <9711181922.AA16328@unicode.org> John Cowan, via
unicode@unicode.org, writes:

> The following pre-proposal suggests a unification of the
> archaic Etruscan and Gothic scripts with Unicode Greek...

By pure coincidence, the following was also circulated to the
tc46sc2@elot.gr email discussion list on transliteration recently,
and also to the ISO/IEC JTC1/SC22/WG20 list. Like John Cowan's email,
this suggested similar links between different scripts related to
Greek, in particular between Greek, Cyrillic and Georgian scripts.

The point of the email via tc46sc2@elot.gr was NOT to suggest
unification in ISO/IEC 10646, but to aim at rationalising the
repertoire of ISO/IEC 10646 and ISO 9, and the character order in
ISO/IEC 14651 and ISO 9. This would mean rationalising standards in
ISO/IEC JTC1/SC2, ISO/IEC JTC1/SC22/WG20 and in ISO/TC46/SC2.

It is quite a complex document, and will require some study to see
the conventions used (a key is given near the end of the email) so
please ignore/delete this if you are not interested in Greek,
Cyrillic and Georgian repertoires and alphabetic orders.

        * * * * * * * *

RATIONALE FOR MULTILINGUAL SORTING OF CYRILLIC CHARACTERS

John Clews

This updates my earlier suggestions to ISO/IEC JTC1/SC22/WG20 on
Cyrillic sorting, with more solid evidence. I would guess that a
consensus would soon emerge on this issue.

The basis for sorting conventions should be user expectations.
A well-documented alphabetic order exists for Church Slavic, and it is
broadly the same as the Russian order. This order is also used in
national transliteration standards: BS 2979: 1958 provides much fuller
information than ISO 9 does in this respect. BS 2979 also matches the
sorting order in various well-established reference sources (e.g.
DE BRAY, R.G.A. Guide to the Slavonic Languages. London: Dent, n.d.).

Conventions for using letters for numbering, e.g. in dates and in
item lists, also exist across several European languages, and are
almost identical across Greek, Cyrillic and Georgian. This also helps
to establish user expectations across a wide range of European
languages and scripts.
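
The shared numbering convention can be illustrated with a minimal Python
sketch. The values and code points are an excerpt from the chart given
later in this message; the names NUMERIC_COGNATES and numeric_value are
hypothetical and chosen only for illustration.

```python
# Excerpt of the chart in this message: numeric value -> cognate letters.
# Each tuple is (Cyrillic, Greek, Georgian), given as UCS code points.
NUMERIC_COGNATES = {
    1: (0x0430, 0x03B1, 0x10D0),    # Cy_a, Gr_ALPHA, Ge_AN
    2: (0x0432, 0x03B2, 0x10D1),    # Cy_ve, Gr_BETA, Ge_BAN
    3: (0x0433, 0x03B3, 0x10D2),    # Cy_ghe, Gr_GAMMA, Ge_GAN
    10: (0x0456, 0x03B9, 0x10D8),   # Cy_be-uk_i, Gr_IOTA, Ge_IN
    20: (0x043A, 0x03BA, 0x10D9),   # Cy_ka, Gr_KAPPA, Ge_KAN
    100: (0x0440, 0x03C1, 0x10E0),  # Cy_er, Gr_RHO, Ge_RAE
}

# Inverse lookup: letter -> numeric value.
VALUE_OF = {
    chr(cp): value
    for value, cps in NUMERIC_COGNATES.items()
    for cp in cps
}


def numeric_value(letter):
    """Return the letter's numeric value, or None if it is not in the excerpt."""
    return VALUE_OF.get(letter)


# The three cognate letters for "2" all carry the same value.
values = [numeric_value(chr(cp)) for cp in NUMERIC_COGNATES[2]]
```

Letters without a numeric value in the chart (for example Cy_be, 0431)
are simply absent from the lookup.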

As it is possible to produce a chart showing the Church Slavonic
filing order for all characters in terms of BS 2979, and also the
numerical conventions, this chart should be the basis for a
pan-Cyrillic ordering.

Within this chart, it should be possible to interpolate the
additional (mainly non-Slavonic) letters later, usually following
their cognate characters (e.g. variants of KA after KA).
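
As a rough Python sketch of this kind of interpolation: the helper name
interpolate_after is hypothetical, and the choice of KA WITH DESCENDER
(049B) as the variant is only an illustrative assumption, since the
actual interpolations are not listed in this message.

```python
def interpolate_after(order, base, variants):
    """Return a new order with the variants placed directly after base."""
    i = order.index(base)
    return order[: i + 1] + list(variants) + order[i + 1 :]


# A short run of the base order: Cy_i, Cy_be-uk_i, Cy_ka, Cy_el.
base_order = ["\u0438", "\u0456", "\u043A", "\u043B"]

# Assumed example: place KA WITH DESCENDER (049B) after its cognate KA (043A).
extended = interpolate_after(base_order, "\u043A", ["\u049B"])
```

The base order is left unchanged, so further variants can be added one
cognate at a time.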

A further chart providing these interpolations will be provided in
due course.

Here is the basic chart showing the Slavonic alphabet, as listed in
BS 2979: 1958 (still current), and the relationships between Cyrillic
and two other European scripts.

BS 2979 SLAVONIC           : GREEK                : GEORGIAN
 BS   -Num ID   Name       : -Num ID   Name       : -Num ID   Name

  1.    -1 0430 Cy_a       :   -1 03B1 Gr_ALPHA   :   -1 10D0 Ge_AN
  2.       0431 Cy_be
  3.    -2 0432 Cy_ve      :   -2 03B2 Gr_BETA    :   -2 10D1 Ge_BAN
  4.    -3 0433 Cy_ghe     :   -3 03B3 Gr_GAMMA   :   -3 10D2 Ge_GAN
  5.    -4 0434 Cy_de      :   -4 03B4 Gr_DELTA   :   -4 10D3 Ge_DON
  6.    -5 0435 Cy_ie      :   -5 03B5 Gr_EPSILON :   -5 10D4 Ge_EN
  7.       0436 Cy_zhe
* 8.    -6 0455 Cy_dze     :   -6 03DA Gr_STIGMA  :   -6 10D5 Ge_VIN
  9.    -7 0437 Cy_ze      :   -7 03B6 Gr_ZETA    :   -7 10D6 Ge_ZEN
 10.    -8 0438 Cy_i       :   -8 03B7 Gr_ETA     :   -8 10F1 Ge_HE
[For -9 see BS 42.]        :   -9 03B8 Gr_THETA   :   -9 10D7 Ge_TAN
 11.   -10 0456 Cy_be-uk_i :  -10 03B9 Gr_IOTA    :  -10 10D8 Ge_IN
 12.   -20 043A Cy_ka      :  -20 03BA Gr_KAPPA   :  -20 10D9 Ge_KAN
 13.   -30 043B Cy_el      :  -30 03BB Gr_LAMDA   :  -30 10DA Ge_LAS
 14.   -40 043C Cy_em      :  -40 03BC Gr_MU      :  -40 10DB Ge_MAN
 15.   -50 043D Cy_en      :  -50 03BD Gr_NU      :  -50 10DC Ge_NAR
[For -60 see BS 40.]       :  -60 03BE Gr_XI      :  -60 10F2 Ge_HIE
 16.   -70 043E Cy_o       :  -70 03BF Gr_OMICRON :  -70 10DD Ge_ON
 17.   -80 043F Cy_pe      :  -80 03C0 Gr_PI      :  -80 10DE Ge_PAR
*17a.      0481 Cy_koppa   :  -90 03DE Gr_KOPPA   :  -90 10DF Ge_ZHAR
[For -90 see BS 27.]
 18.  -100 0440 Cy_er      : -100 03C1 Gr_RHO     : -100 10E0 Ge_RAE
 19.  -200 0441 Cy_es      : -200 03C3 Gr_SIGMA   : -200 10E1 Ge_SAN
 20.  -300 0442 Cy_te      : -300 03C4 Gr_TAU     : -300 10E2 Ge_TAR
 21.  -400 0443 Cy_u       : -400 03C5 Gr_UPSILON : -400 10E3 Ge_UN
*21a.      0479 Cy_UK (oy) :                      :      10F3 Ge_WE
 22.  -500 0444 Cy_ef      : -500 03C6 Gr_PHI     : -500 10E4 Ge_PHAR
 23.  -600 0445 Cy_ha      : -600 03C7 Gr_CHI     : -600 10E5 Ge_KHAR
[For -700 see BS 41.]      : -700 03C8 Gr_PSI     : -700 10E6 Ge_GHAN **
 24.  -800 0461 Cy_omega   : -800 03C9 Gr_OMEGA   : -800 10E7 Ge_QAR **
 24a.      047D Cy_omega_titlo
 24b.      047B Cy_round_omega
 24c.      047F Cy_ot
 25.       xxxx [Cy_shte - variant of 28a. below, in different order]
 26.  -900 0446 Cy_tse     : -900 03E0 Gr_SAMPI   : -900 10E8 Ge_SHIN
 27.   -90 0447 Cy_che
 28.       0448 Cy_sha
 28a.      0449 Cy_shcha
 29.       044A Cy_hard_sign
 30.       044B Cy_yeru
 31.       044C Cy_soft_sign
 32.       044D Cy_e
 32a.      0463 Cy_yat
 33.       044E Cy_yu
 34.       044F Cy_ya
 35.       0465 Cy_iotified_e
 36.       0467 Cy_little_yus
*37.       046B Cy_big_yus
*38.       0469 Cy_iotified_little_yus
 39.       046D Cy_iotified_big_yus
 40.   -60 046F Cy_ksi
 41.  -700 0471 Cy_psi
 42.    -9 0473 Cy_fita
 43.       0475 Cy_izhitsa
 43a.      0477 Cy_izhitsa_double_grave_accent

Key:

BS: reference number in BS 2979, table D: Transliteration of
        Church Slavonic Cyrillic (shown as nn.)

1a. 1c. Numbers like this indicate that the character is not in
        BS 2979, but that its position is likely because of
        neighbouring characters or other evidence.

-Num: numeric value of letter in that particular script (shown as -nn)

ID: UCS ID from ISO/IEC 10646

Name: UCS character name (abbreviated systematically, using
        Cy_ as a script code, and be and uk as language codes).
        Only small letters are shown in this table.

* Changes required from list in CD 14651 (not complete)

** No close equivalent phonetically, despite common numeric value.
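
To show how the Cyrillic column of the chart could drive a comparison,
here is a minimal Python sketch. It is a single-level ordering over small
letters only, not the multi-level method of ISO/IEC 14651, and the names
CHART_ORDER, RANK and chart_sort_key are hypothetical. Row 25 is omitted
because the chart gives no UCS ID for it.

```python
# Cyrillic code points in the order of the chart above (rows 1 to 43a).
CHART_ORDER = [
    0x0430, 0x0431, 0x0432, 0x0433, 0x0434, 0x0435, 0x0436, 0x0455,
    0x0437, 0x0438, 0x0456, 0x043A, 0x043B, 0x043C, 0x043D, 0x043E,
    0x043F, 0x0481, 0x0440, 0x0441, 0x0442, 0x0443, 0x0479, 0x0444,
    0x0445, 0x0461, 0x047D, 0x047B, 0x047F, 0x0446, 0x0447, 0x0448,
    0x0449, 0x044A, 0x044B, 0x044C, 0x044D, 0x0463, 0x044E, 0x044F,
    0x0465, 0x0467, 0x046B, 0x0469, 0x046D, 0x046F, 0x0471, 0x0473,
    0x0475, 0x0477,
]

# Position of each charted letter in the filing order.
RANK = {chr(cp): i for i, cp in enumerate(CHART_ORDER)}


def chart_sort_key(word):
    """Key that files charted letters by chart position.

    Assumption: characters missing from the chart sort after all charted
    letters, in code point order.
    """
    return [(0, RANK[ch]) if ch in RANK else (1, ord(ch)) for ch in word]


# Three two-letter strings starting with Cy_ze, Cy_dze and Cy_zhe.
words = ["\u0437\u0430", "\u0455\u0430", "\u0436\u0430"]

by_chart = sorted(words, key=chart_sort_key)  # zhe, dze, ze (rows 7, 8, 9)
by_code_point = sorted(words)                 # zhe, ze, dze (0436, 0437, 0455)
```

The two results differ only in where Cy_dze (0455) falls, which is the
kind of difference the starred rows in the chart are meant to flag.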

        * * * * * * * *

Annex:

Other Georgian letters, in standard Georgian sort order:

                                                    1000 10E9 Ge_CHIN .
                                                    2000 10EA Ge_CAN .
                                                    3000 10EB Ge_JIL .
                                                    4000 10EC Ge_CIL .
                                                    5000 10ED Ge_CHAR .
                                                    6000 10EE Ge_XAN .
                                                    7000 10F4 Ge_HAR .
                                                    8000 10EF Ge_JHAN .
                                                    9000 10F0 Ge_HAE .
                                                   10000 10F5 Ge_HOE .

                                                          10F6 Ge_FI .

Additional Greek and Coptic letters: sort order not known to me:

                                              03DC Gr_DIGAMMA
                                              03F3 Gr_YOT

                                              03E3 Co_SHEI
                                              03E5 Co_FEI
                                              03E7 Co_KHEI
                                              03E9 Co_HORI
                                              03EB Co_GANGIA
                                              03ED Co_SHIMA
                                              03EF Co_DEI
        
Sources for common sorting values in standards documents:

[1] BS 2979: 1958 Transliteration of Cyrillic and Greek (still current)

[2] ISO/TC46/SC2 N 223 Rev: Annex: Regeln für die alphabetische
    Katalogisierung [RAK] Anlage 5: Transliteration der armenischen und
    georgischen Schrift. The RAK are widely used, at least in Germany
    and Austria.

Between them, these provide numerical values, and traditional sort
orders, for Cyrillic, Greek, Georgian and Armenian.

John Clews

15 November 1997

--
Chair of ISO/TC46/SC2: Conversion of Written Languages;
Member of CEN/TC304: Character Set Technology;
Member of ISO/IEC/JTC1/SC2: Character Sets.

SESAME Computer Projects, 8 Avenue Road, Harrogate, HG2 7PG, England
Email: Converse@sesame.demon.co.uk; tel: +44 (0) 1423 888 432


