Re: Tajik alphabet code

From: Peter Kirk
Date: Mon Mar 01 2004 - 15:56:15 EST

    On 01/03/2004 12:30, Philippe Verdy wrote:

    >>On 01/03/2004 00:18, Asomiddin Atoev wrote:
    >>>I am emailing on behalf of the Tajikistani state
    >>>working group on localizing software for the Tajik
    >>>language. Could you please kindly point us in the
    >>>right direction? What is the procedure for
    >>>standardizing alphabet symbols? The Tajik alphabet
    >>>uses Cyrillic symbols and consists of 35
    >I think that his question is not whether Unicode supports Tajik, but whether work has
    >been done (maybe in other countries, for library purposes) to define a subset
    >appropriate for publishing and working with texts in the Tajik language. The fact that
    >Tajik orthography was heavily influenced during the time of the USSR and Russian
    >domination in this former Republic of the Union may have affected the
    >language so that some old texts of cultural importance have lost
    >some of their original semantics.

    Any texts from before the time of Russian domination would be in Arabic
    script. Some from the earlier Soviet period may be in Latin script. It
    is clear that Aso's e-mail related to Cyrillic not Arabic script, and
    there is no hint that it relates to anything other than the current

    >So there may be libraries in the world where texts remain in the original
    >orthography, or adapted from the Cyrillic-based orthography, which contain more
    >letters than those that we commonly see. If there are attempts to reform the
    >orthography to better match the language's needs, there may already exist some
    >letter variants which would interest him.
    >Also, if there are existing sets, this creates an opportunity to
    >propose an alternate 8-bit encoding for Tajik, which would be a variant of the
    >ISO-8859 Cyrillic encoding used for Russian, except that it would contain all
    >letters needed for Tajik.
    >Unicode clearly seems to support this language well, but there's still a need to
    >have a common framework for working with Tajik texts with an 8-bit encoding
    >(which would be better than UTF-8 and as simple and efficient as ISO-8859-1 for
    >Western European languages, or ISO-8859-5 for Russian).
    >So this question would certainly find some experts at the ISO Working Group
    >working on 8-bit encodings compatible with the ISO-8859 standard (this is
    >independent of the fact that this subset will be fully mapped and supported in
    >Unicode). Having such a subset will certainly help unify various sources by
    >agreeing on a common orthography, instead of relying on the support of the large
    >Unicode/ISO/IEC 10646 coded set. If such a subset is then approved nationally,
    >it will help get decent support and mapping within many fonts, keyboard
    >drivers, and text processing tools.
    >After all, ISO-8859-15 was decided and standardized after a similar reform in
    >the European Union that needed some Latin characters not present in ISO-8859-1,
    >even if all these characters were already present in Unicode, or adopted
    >recently in Unicode (like the Euro codepoint that was created instead of using
    >the legacy and non-standard ECU symbol with various and non-distinctive forms).
    >So why not with Tajik too?

    I understand that there have been previous attempts to define a new or
    extended Cyrillic 8-bit character set supporting Central Asian
    languages, but that such proposals have been rejected. I hardly think
    that Aso would have turned to the Unicode list if he wanted to define an
    8-bit encoding.
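
    As a rough illustration of why an 8-bit Cyrillic set designed for
    Russian falls short for Tajik, here is a minimal Python sketch. It
    assumes the six additional Tajik letters are the lowercase Unicode
    characters U+0493, U+04E3, U+049B, U+04EF, U+04B3 and U+04B7, and it
    checks them against Python's built-in "iso8859_5" codec. The helper
    name unencodable is made up for this example.

```python
# Assumption: the lowercase Russian repertoire is U+0430..U+044F plus U+0451.
RUSSIAN_LOWER = "".join(chr(cp) for cp in range(0x0430, 0x0450)) + "\u0451"

# Assumption: the six letters Tajik adds to the basic Cyrillic repertoire
# (ghayn, i with macron, qaf, u with macron, ha with descender, che with descender).
TAJIK_EXTRA = "\u0493\u04e3\u049b\u04ef\u04b3\u04b7"


def unencodable(letters, codec):
    """Return the characters in `letters` that `codec` cannot represent."""
    missing = []
    for ch in letters:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    # The Russian letters all fit in ISO-8859-5; the Tajik additions do not.
    print("Russian letters missing:", unencodable(RUSSIAN_LOWER, "iso8859_5"))
    for ch in unencodable(TAJIK_EXTRA, "iso8859_5"):
        print(f"U+{ord(ch):04X} {ch} is not in ISO-8859-5")
    # UTF-8, as a Unicode encoding form, represents all of them.
    print("Missing in UTF-8:", unencodable(RUSSIAN_LOWER + TAJIK_EXTRA, "utf-8"))
```

    The same check can be run against other legacy codecs by changing the
    codec name passed to unencodable.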

    Peter Kirk
