From: Philippe Verdy (firstname.lastname@example.org)
Date: Mon Mar 01 2004 - 15:30:28 EST
> On 01/03/2004 00:18, Asomiddin Atoev wrote:
> >I am emailing on behalf of the Tajikistani state
> >working group on localizing software for Tajik
> >language. Could you please kindly guide us to be in
> >right direction. What shall be the procedure of
> >standartization of alphabet symbols? Tajik alphabet
> >makes use of cyrillic symbols and contains of 35
I think that his question is not whether Unicode supports Tajik, but whether
work has been done (perhaps in other countries, for library purposes) to define
a subset appropriate for publishing and working with texts in the Tajik
language. Tajik orthography was strongly influenced by the USSR period and by
Russian domination in this former Soviet republic, and this may have changed
the language to the point that some old texts with an important cultural
background have lost part of their original meaning.
So there may be libraries in the world that still hold texts in the original
orthography, or in adaptations of the Cyrillic-based orthography, which contain
more letters than those we commonly see. If there have been attempts to reform
the orthography to better match the needs of the language, some letter variants
that would interest him may already exist.
Also, if such sets exist, this creates an opportunity to propose an alternate
8-bit encoding for Tajik, which would be a variant of the ISO-8859 Cyrillic
encoding used for Russian (ISO-8859-5), except that it would contain all the
letters needed for Tajik.
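To make this concrete, here is a minimal Python sketch (not an existing tool)
that checks which Tajik letters cannot be encoded in ISO-8859-5, the ISO 8-bit
Cyrillic set used for Russian. It assumes the commonly cited 35-letter Tajik
Cyrillic alphabet; the constant and function names are only illustrative.

```python
# Illustrative sketch: which Tajik letters does ISO-8859-5 fail to encode?
# Assumption: the commonly cited 35-letter Tajik Cyrillic alphabet.
TAJIK_UPPER = "АБВГҒДЕЁЖЗИӢЙКҚЛМНОПРСТУӮФХҲЧҶШЪЭЮЯ"
TAJIK = TAJIK_UPPER + TAJIK_UPPER.lower()


def unencodable(letters: str, codec: str) -> list:
    """Return the letters that the given Python codec cannot encode."""
    missing = []
    for ch in letters:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    # Print each missing letter with its Unicode code point.
    for ch in unencodable(TAJIK, "iso8859_5"):
        print(f"U+{ord(ch):04X} {ch}")
```

With that letter list, it reports the six letters Ғ, Ӣ, Қ, Ӯ, Ҳ, Ҷ (capital and
small), which are exactly the positions a Tajik variant of ISO-8859-5 would have
to provide.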
Unicode clearly seems to support this language well, but there is still a need
for a common framework for working with Tajik texts in an 8-bit encoding (which
would be more compact than UTF-8, where each Cyrillic letter takes two bytes, and
as simple and efficient as ISO-8859-1 for Western European languages, or
ISO-8859-5 for Russian).
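The size argument is easy to check: in UTF-8, every letter of the Cyrillic
block (U+0400 to U+04FF) takes two bytes, while an 8-bit set stores each letter
in one byte. This small Python sketch uses the standard codecs, with ISO-8859-5
standing in for a hypothetical Tajik 8-bit set, so the sample word ("забон",
"language") is restricted to letters ISO-8859-5 already has.

```python
# Compare the storage cost of Cyrillic text in UTF-8 and in an 8-bit set.
# ISO-8859-5 stands in for a hypothetical Tajik 8-bit encoding here.
word = "забон"  # Tajik for "language"; uses only letters present in ISO-8859-5

utf8_bytes = word.encode("utf-8")     # Cyrillic U+0400..U+04FF: 2 bytes each
iso_bytes = word.encode("iso8859_5")  # one byte per letter

print(len(word), len(utf8_bytes), len(iso_bytes))  # 5 10 5
```

The Tajik-specific letters behave the same way in UTF-8 (for example "ҷ",
U+04B7, also takes two bytes), so a Tajik 8-bit set would halve the size of
letter-only text.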
So this question would certainly interest some experts in the ISO working group
responsible for 8-bit encodings compatible with the ISO-8859 standard (this is
independent of the fact that such a subset would be fully mapped to and
supported by Unicode). Having such a subset would certainly help unify various
sources by agreeing on a common orthography, instead of relying on support for
the large Unicode/ISO/IEC 10646 coded character set. If such a subset were then
approved nationally, it would help obtain decent support and mappings in many
fonts, keyboard drivers, and text-processing tools.
After all, ISO-8859-15 was designed and standardized after a similar reform in
the European Union, which needed some Latin characters not present in
ISO-8859-1, even though all these characters were already present in Unicode or
had been added to it recently (like the euro sign, which was created instead of
reusing the legacy, non-standard ECU symbol with its various, non-distinctive
forms).
So why not for Tajik too?
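This precedent can be checked mechanically. The Python sketch below (using the
standard "latin-1" and "iso8859_15" codecs; the function name is only
illustrative) lists the byte positions where ISO-8859-15 differs from
ISO-8859-1. It shows that ISO-8859-15 kept the ISO-8859-1 layout and only
reassigned eight positions, the kind of minimal variant a Tajik set derived from
ISO-8859-5 could follow.

```python
# List the byte positions where ISO-8859-15 differs from ISO-8859-1,
# i.e. the positions reassigned to the euro sign and a few Latin letters.
def latin9_changes() -> list:
    changes = []
    for byte in range(0x100):
        old = bytes([byte]).decode("latin-1")
        new = bytes([byte]).decode("iso8859_15")
        if old != new:
            changes.append((byte, old, new))
    return changes


if __name__ == "__main__":
    for byte, old, new in latin9_changes():
        print(f"0x{byte:02X}: {old} -> {new}")
```

It prints eight lines, starting with "0xA4: ¤ -> €"; the other seven positions
hold Š, š, Ž, ž, Œ, œ and Ÿ.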
This archive was generated by hypermail 2.1.5 : Mon Mar 01 2004 - 16:13:44 EST