Re: Tajik alphabet code

From: Peter Kirk (peterkirk@qaya.org)
Date: Mon Mar 01 2004 - 15:56:15 EST


    On 01/03/2004 12:30, Philippe Verdy wrote:

    >>On 01/03/2004 00:18, Asomiddin Atoev wrote:
    >>
    >>>I am emailing on behalf of the Tajikistani state
    >>>working group on localizing software for the Tajik
    >>>language. Could you please kindly guide us in the
    >>>right direction? What is the procedure for
    >>>standardization of alphabet symbols? The Tajik alphabet
    >>>makes use of Cyrillic symbols and consists of 35
    >>>letters.
    >>>
    >>>
    >
    >I think that his question is not whether Unicode supports Tajik, but whether
    >work has been done (maybe in other countries, for library purposes) to define a
    >subset appropriate for publishing and working with texts in the Tajik language.
    >Tajik orthography was influenced a lot during the time of the USSR and Russian
    >domination in this former Republic of the Union, and this may have affected the
    >language so that some old texts with important cultural backgrounds have lost
    >some of their original semantics.
    >
    >

    Any texts from before the time of Russian domination would be in Arabic
    script. Some from the earlier Soviet period may be in Latin script. It
    is clear that Aso's e-mail related to Cyrillic not Arabic script, and
    there is no hint that it relates to anything other than the current
    orthography.

    >So there may exist libraries in the world where there remain texts in the
    >original orthography, or adapted from the Cyrillic-based orthography, which
    >contain more letters than those that we commonly see. If there are attempts to
    >reform the orthography to better match the language's needs, there may already
    >exist some letter variants which would interest him.
    >
    >Also, if there are existing sets, this creates an opportunity to propose an
    >alternate 8-bit encoding for Tajik, which would be a variant of the ISO-8859
    >Cyrillic encoding used for Russian, except that it would contain all the
    >letters needed for Tajik.
    >
    >Unicode clearly seems to support this language well, but there's still a need to
    >have a common framework for working with Tajik texts with an 8-bit encoding
    >(which would be better than UTF-8 and as simple and efficient as ISO-8859-1 for
    >Western European languages, or ISO-8859-5 for Russian).
    >
    >So this question would certainly meet some experts at the ISO Working Group
    >working on 8-bit encodings compatible with the ISO-8859 standard (this is
    >independent of the fact that this subset will be fully mapped and supported in
    >Unicode). Having such a subset will certainly help unify various sources by
    >agreeing on a common orthography, instead of relying on the support of the large
    >Unicode/ISO/IEC 10646 coded set. If such a subset is then approved nationally,
    >it will help get decent support and mapping within many fonts, keyboard
    >drivers, and text processing tools.
    >
    >After all, ISO-8859-15 was decided and standardized after a similar reform in
    >the European Union that needed some Latin characters not present in ISO-8859-1,
    >even if all these characters were already present in Unicode, or adopted
    >recently in Unicode (like the Euro codepoint that was created instead of using
    >the legacy and nonstandard ECU symbol with various and non-distinctive forms).
    >So why not with Tajik too?
    >

    I understand that there have been previous attempts to define a new or
    extended Cyrillic 8-bit character set supporting Central Asian
    languages, but that such proposals have been rejected. I hardly think
    that Aso would have turned to the Unicode list if he wanted to define an
    8-bit encoding.
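
    To make the 8-bit versus Unicode point concrete, here is a minimal
    Python sketch. It assumes the current 35-letter Tajik Cyrillic
    alphabet is the Russian letters in use plus six additional letters
    (listed by code point below), and it checks which letters a given
    codec cannot encode. The helper name unencodable and the choice of
    codecs are illustrative assumptions, not part of any standard or
    proposal.

```python
# Russian-derived letters used in Tajik (29), written with escapes for the
# two letters that carry diacritics so they stay precomposed.
BASE = "абвгде\u0451жзи\u0439клмнопрстуфхчшъэюя"

# Assumed six additional Tajik letters, by Unicode code point:
# U+0493 ghe with stroke, U+04E3 i with macron, U+049B ka with descender,
# U+04EF u with macron, U+04B3 ha with descender, U+04B7 che with descender.
EXTRA = "\u0493\u04e3\u049b\u04ef\u04b3\u04b7"

TAJIK_LOWER = BASE + EXTRA  # 35 lowercase letters in total


def unencodable(letters: str, codec: str) -> list[str]:
    """Return the letters that the named codec cannot represent."""
    missing = []
    for ch in letters:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    for codec in ("iso8859_5", "koi8_r", "cp1251", "utf-8"):
        gaps = unencodable(TAJIK_LOWER, codec)
        print(f"{codec}: {len(gaps)} letter(s) not encodable: {' '.join(gaps)}")
```

    Under those assumptions, the common Russian-oriented 8-bit sets
    lack the six additional letters, while UTF-8 covers all 35, which
    is why the existing Unicode repertoire already suffices for the
    current orthography.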

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    

