Cyrillic Q, W for Kurdish

From: Peter Constable (
Date: Mon Jan 24 2000 - 10:32:11 EST

>We have said this many times before. Kurdish is written in
>Cyrillic in some places. Kurdish is written in Latin in other
>places. Same language. You can't sort a multiscript list of
>Kurdish words if you are making Q and W serve as letters of two
>different scripts. Never mind English and Kurdish. The example
>is Kurdish and Kurdish.

       We all know that there are several languages that are written
       using more than one script. As Michael points out, if you're
       sorting Kurdish words written in different writing systems, the
       Q and W will be ambiguous, even if strings are language tagged
       (as was suggested in another message).

       What may be less familiar to some is that there are cases where
       a language is written with more than one writing system, but
       the various writing systems are based on a single script. It is
       not that uncommon for a minority linguistic group to have
       competing orthographies while in an early literacy,
       pre-standardisation stage. In fact, even if there is a single
       effort within a community to establish an orthography, there
       may be several proposed orthographies that are being considered
       and tested.

       In our language software, we have concluded that all strings
       need to be tagged not only for language, but also for writing
       system. This permits us to handle several different cases:

       - multiple standard orthographies based on differing scripts
       - multiple pre-standard (prototype) orthographies based on a
       single script
       - both "practical orthography" (orthography in the true sense)
       and "technical orthography" (phonetic/phonemic transcription)

       If strings are all tagged for writing system, then that would
       provide a solution to the problem Michael presents. I doubt,
       however, that most implementers would want to support all of
       the infrastructural mechanisms that we need in our software.
       The only alternatives to tagging for writing system are:

       - dis-unify Cyrillic and Latin Q, W
       - live with the ambiguity that the current situation creates
       for Kurdish (and potentially for other languages, now or in
       the future)

