From: Peter Kirk (peterkirk@qaya.org)
Date: Mon Mar 01 2004 - 15:56:15 EST
On 01/03/2004 12:30, Philippe Verdy wrote:
>>On 01/03/2004 00:18, Asomiddin Atoev wrote:
>>
>>>I am emailing on behalf of the Tajikistani state
>>>working group on localizing software for the Tajik
>>>language. Could you please kindly guide us in the
>>>right direction? What should the procedure be for
>>>standardization of alphabet symbols? The Tajik alphabet
>>>makes use of Cyrillic symbols and consists of 35
>>>letters.
>
>I think that his question is not whether Unicode supports Tajik, but whether work
>has been done (maybe in other countries, for library purposes) to define a subset
>appropriate for publishing and working with texts in the Tajik language. Tajik
>orthography was heavily influenced during the USSR period and the Russian
>domination of this former Republic of the Union, and this may have affected the
>language so that some old texts with important cultural background have lost
>some of their original meaning.
>
Any texts from before the time of Russian domination would be in Arabic
script. Some from the earlier Soviet period may be in Latin script. It
is clear that Aso's e-mail related to Cyrillic, not Arabic, script, and
there is no hint that it relates to anything other than the current
orthography.
>So there may be libraries in the world that still hold texts in the original
>orthography, or adapted from the Cyrillic-based orthography, which contain more
>letters than those we commonly see. If there have been attempts to reform the
>orthography to better match the language's needs, there may already exist some
>letter variants that would interest him.
>
>Also, if such sets exist, this creates an opportunity to
>propose an alternate 8-bit encoding for Tajik, which would be a variant of the
>ISO-8859 Cyrillic encoding used for Russian, except that it would contain all
>letters needed for Tajik.
>
>Unicode clearly seems to support this language well, but there is still a need to
>have a common framework for working with Tajik texts in an 8-bit encoding
>(which would be better than UTF-8 and as simple and efficient as ISO-8859-1 for
>Western European languages, or ISO-8859-5 for Russian).
>
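To make the size argument concrete, here is a small sketch. The sample phrase "забони тоҷикӣ" ("Tajik language") and the `byte_costs` helper are our own illustrative assumptions. Cyrillic letters take two bytes each in UTF-8, while any single-byte encoding that covers them uses one byte per character.

```python
# Hypothetical sketch: storage cost of Tajik text in UTF-8 versus a
# single-byte (8-bit) encoding, which uses exactly one byte per character.
SAMPLE = "забони то\u04b7ик\u04e3"  # ҷ = U+04B7, ӣ = U+04E3


def byte_costs(text: str) -> tuple[int, int]:
    """Return (UTF-8 size, single-byte size) in bytes for `text`."""
    utf8_size = len(text.encode("utf-8"))
    # Assumes every character is covered by the 8-bit charset.
    single_byte_size = len(text)
    return utf8_size, single_byte_size


if __name__ == "__main__":
    utf8_size, single_byte_size = byte_costs(SAMPLE)
    print(f"UTF-8: {utf8_size} bytes, 8-bit: {single_byte_size} bytes")
```

For the sample it prints `UTF-8: 25 bytes, 8-bit: 13 bytes`. The comparison covers storage size only.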
>So this question would certainly interest some experts at the ISO Working Group
>working on 8-bit encodings compatible with the ISO-8859 standard (this is
>independent of the fact that this subset will be fully mapped and supported by
>Unicode). Having such a subset will certainly help unify various sources by
>agreeing on a common orthography, instead of relying on support for the large
>Unicode/ISO/IEC 10646 coded set. If such a subset is then approved nationally,
>it will help achieve decent support and mapping in many fonts, keyboard
>drivers, and text processing tools.
>
>After all, ISO-8859-15 was decided and standardized after a similar reform in
>the European Union that needed some Latin characters not present in ISO-8859-1,
>even though all these characters were already present in Unicode or were adopted
>recently in Unicode (like the euro code point, which was created instead of using
>the legacy, non-standard ECU symbol with its various and non-distinctive forms).
>So why not do the same for Tajik?
>
I understand that there have been previous attempts to define a new or
extended Cyrillic 8-bit character set supporting Central Asian
languages, but that such proposals have been rejected. I hardly think
that Aso would have turned to the Unicode list if he wanted to define an
8-bit encoding.
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/