We in SIL have overlapping interests with you since we are
heavily involved in supporting linguistic research in a large
number of minority languages (currently over 1000) around the
world. We have a variety of software tools for doing linguistic
research, some of which may perform the tasks that you need.
We are in the process of a major re-engineering of our tools.
Unfortunately, we do not have Unicode-capable versions available
at present. That is being added, however, as is the capability
to define the writing-system-specific needs of each minority
language for rendering, collation, etc. So, I can't help you
today, but in the future our software may be of use to you.
For further information on our language software tools, feel
free to visit our web site at
From: firstname.lastname@example.org AT internet on 05/24/99 05:32
Received on: 05/24/99
To: Peter Constable/IntlAdmin/WCT, email@example.com AT
Subject: Unicode corpus tools/missing characters
Hello (I'm a recently subscribed addition to the mailing list).
Can anyone advise me on the availability of commercial corpus
tools (concordance/collocation etc) which are able to handle
Also, I am in the process of converting a small Punjabi corpus
from an 8-bit Indian font into the Gurmukhi Unicode characters
(using UniEdit 1.4 from Duke University). However, I am facing
a few problems:
1) some of the diacritic characters in the font don't exist in
the Unicode Standard. In particular the pehri haha and pehri
rara. I'd also like to be able to input a bindi with a
horizontal joining line.
2) some of the diacritics that do exist in Unicode aren't
handled well by UniEdit (notably the bindi 0A02 and UU 0A42)
3) some of the missing diacritics appear in private use slots
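
As a rough illustration of the conversion described above, the following is a minimal sketch of a table-driven mapping from 8-bit font bytes to Unicode. The byte values in the table are invented for illustration and do not describe any particular font; a real table would have to be built from the font's own glyph layout. The name FONT_TO_UNICODE and the function convert are likewise hypothetical. Bytes with no entry are replaced by U+FFFD and reported, so that gaps such as the missing diacritics become visible rather than silently lost.

```python
# Hypothetical mapping from 8-bit font byte values to Unicode strings.
# The byte values on the left are invented for this sketch.
FONT_TO_UNICODE = {
    0x73: "\u0A38",  # GURMUKHI LETTER SA
    0x4D: "\u0A02",  # GURMUKHI SIGN BINDI
    0x55: "\u0A42",  # GURMUKHI VOWEL SIGN UU
    0xE6: "\uE000",  # glyph with no chosen standard code point: private use placeholder
}


def convert(data, table=FONT_TO_UNICODE):
    """Convert 8-bit font bytes to a Unicode string.

    Returns the converted text and a sorted list of byte values
    that had no entry in the table.
    """
    out = []
    unmapped = set()
    for b in data:
        if b in table:
            out.append(table[b])
        else:
            out.append("\uFFFD")  # REPLACEMENT CHARACTER marks the gap
            unmapped.add(b)
    return "".join(out), sorted(unmapped)


if __name__ == "__main__":
    text, missing = convert(bytes([0x73, 0x4D, 0x01]))
    print([f"U+{ord(c):04X}" for c in text])
    print("unmapped bytes:", [hex(b) for b in missing])
```

Keeping the table as plain data means a private use placeholder can later be swapped for a standard code point in one place, without touching the conversion loop.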
I was wondering if anyone else had come up against limitations
either for the Unicode Editor they were using, or in the
Unicode Standard itself - and if so, how they dealt with
it. I've tried using a "best fit" solution by employing other
characters, which is not ideal. I'm wondering if I should
invest in another editor. And what would happen if I tried to
open a UniEdit .uni file in another Unicode editor? Would it
open at all? How would it handle the private space characters?
I apologise if these are very naive questions or have already
been dealt with.
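
On the question of private use characters, one way to see what another editor would be faced with is to list every private use code point in the text before moving it. The sketch below assumes the text has already been decoded into a Python string (the encoding of a UniEdit .uni file is not stated here, so decoding is left out), and find_private_use is a hypothetical helper name. It relies on the Unicode general category "Co", which the standard library's unicodedata module reports for private use code points.

```python
import unicodedata


def find_private_use(text):
    """Return (index, 'U+XXXX') for each private use code point in text."""
    hits = []
    for i, ch in enumerate(text):
        # General category "Co" marks private use code points.
        if unicodedata.category(ch) == "Co":
            hits.append((i, f"U+{ord(ch):04X}"))
    return hits


if __name__ == "__main__":
    sample = "\u0A38\uE000\u0A02"  # SA, a private use character, BINDI
    for index, code in find_private_use(sample):
        print(index, code)
```

Private use code points have no agreed meaning between programs, so a list like this shows which characters depend on the original font or editor to display as intended.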
Minority Languages Engineering Project