Unicode corpus tools/missing characters

From: PAUL BAKER (j.p.baker@lancaster.ac.uk)
Date: Mon May 24 1999 - 06:32:41 EDT

Next message: stephen_holmes@lionbridge.com: "Question about U+FFFC"
Previous message: Yves Arrouye: "RE: FAQ"
Next in thread: peter_constable@sil.org: "Re: Unicode corpus tools/missing characters"
Maybe reply: peter_constable@sil.org: "Re: Unicode corpus tools/missing characters"

Hello (I'm a recently subscribed addition to the mailing list).

Can anyone advise me on the availability of commercial corpus tools
(concordance/collocation etc) which are able to handle Unicode
characters?
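As a rough illustration of what "handling Unicode" involves for a concordancer, here is a minimal keyword-in-context (KWIC) sketch in Python. It does not describe any particular commercial tool. The function name kwic and the default 20-character context width are assumptions made for illustration. Both the text and the keyword are normalised to NFC first, so that precomposed and decomposed spellings of the same word still match.

```python
import re
import unicodedata


def kwic(text: str, keyword: str, width: int = 20) -> list[str]:
    """Return keyword-in-context lines for every occurrence of keyword (hypothetical helper)."""
    # Normalise both strings to NFC so that precomposed and decomposed
    # spellings of the same word compare equal.
    text = unicodedata.normalize("NFC", text)
    keyword = unicodedata.normalize("NFC", keyword)
    lines = []
    for m in re.finditer(re.escape(keyword), text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column
        # (padding counts code points, not on-screen width).
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


if __name__ == "__main__":
    sample = "caf\u00e9 au lait, cafe\u0301 noir"
    for line in kwic(sample, "caf\u00e9", width=10):
        print(line)
```

Plain substring matching can still land in the middle of a Gurmukhi syllable (for example, a keyword that stops just before a following vowel sign), so a real tool would also need to respect word or grapheme-cluster boundaries.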

Also, I am in the process of converting a small Punjabi corpus from an 8-bit
Indian font into the Gurmukhi Unicode characters (using UniEdit 1.4 from Duke
University). However, I am facing a few problems:

1) some of the diacritic characters in the font don't exist in the
Unicode Standard. In particular the pehri haha and pehri rara. I'd also
like to be able to input a bindi with a horizontal joining line.
2) some of the diacritics that do exist in Unicode aren't represented
well by UniEdit (notably the bindi U+0A02 and the UU vowel sign U+0A42)
3) some of the missing diacritics appear in private use slots in
UniEdit.
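The font-to-Unicode conversion step is essentially a lookup table from the legacy font's byte values to Unicode code points. The sketch below assumes a simple font where each byte stands for one glyph. The byte values, the table name LEGACY_TO_UNICODE and the function convert are hypothetical. Only the Gurmukhi code points (such as U+0A02 BINDI and U+0A42 VOWEL SIGN UU) are real. Unmapped bytes become U+FFFD, so gaps stay visible instead of passing through silently.

```python
# Hypothetical table: the byte values are invented for illustration and do
# NOT describe any real font. A real table has to be built from the legacy
# font's own glyph chart.
LEGACY_TO_UNICODE = {
    0x20: " ",
    0x6B: "\u0A15",  # GURMUKHI LETTER KA
    0x4B: "\u0A16",  # GURMUKHI LETTER KHA
    0x4D: "\u0A02",  # GURMUKHI SIGN BINDI
    0x55: "\u0A42",  # GURMUKHI VOWEL SIGN UU
    # A glyph with no Unicode equivalent, parked in the Private Use Area.
    # Only software that shares this private agreement will interpret it.
    0x48: "\uE000",
}

REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER flags unmapped bytes


def convert(data: bytes) -> str:
    """Convert legacy 8-bit font text to Unicode, one byte at a time."""
    # Iterating over a bytes object yields ints, which are the table's keys.
    return "".join(LEGACY_TO_UNICODE.get(b, REPLACEMENT) for b in data)


if __name__ == "__main__":
    print(repr(convert(bytes([0x6B, 0x55, 0x20, 0x4B, 0x4D]))))
```

Many legacy Indic fonts also store some vowel signs, such as sihari, in visual order before the consonant, while Unicode stores them after it. A real converter would therefore need a reordering pass, which this sketch omits.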

I was wondering if anyone else had come up against limitations either
for the Unicode editor they were using, or in the Unicode Standard
itself - and if so, how they dealt with it. I've tried using a "best
fit" solution by employing other characters, which is not ideal. I'm
wondering if I should invest in another editor. And what would happen if
I tried to open a UniEdit .uni file in another Unicode editor? Would it
open at all? How would it handle the private use characters?
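Private Use Area code points carry no standard meaning. Another editor can only display them as intended if it shares the same private agreement, typically through a font that places the same glyphs at the same code points. Before moving files between editors, it may help to list which private-use characters a text actually contains. A minimal sketch, assuming the text has already been decoded into a Python string (nothing about the UniEdit .uni format is assumed). The function name private_use_chars is hypothetical. It relies on the Unicode general category Co, which covers the BMP range U+E000..U+F8FF and the private-use ranges of planes 15 and 16.

```python
import unicodedata


def private_use_chars(text: str) -> dict[str, list[int]]:
    """Map each private-use character in text to the indices where it occurs."""
    found: dict[str, list[int]] = {}
    for i, ch in enumerate(text):
        # General category "Co" (Other, private use) covers U+E000..U+F8FF
        # and the private-use ranges of planes 15 and 16.
        if unicodedata.category(ch) == "Co":
            found.setdefault(ch, []).append(i)
    return found


if __name__ == "__main__":
    sample = "\u0A15\uE000\u0A42 \uE000\uF8FF"
    for ch, positions in private_use_chars(sample).items():
        print(f"U+{ord(ch):04X} at {positions}")
```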

I apologise if these are very naive questions or have already been dealt
with.

Paul Baker
Minority Languages Engineering Project
Lancaster University
UK.
http://www.ling.lancs.ac.uk/monkey/ihe/mille/public/title.htm

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:46 EDT