RE: transliterations (was Compelling Unicode demo)

From: Lars Kristan (lars.kristan@hermes.si)
Date: Mon Nov 19 2001 - 09:09:14 EST


Well, it is nice to have things defined, like transcription vs.
transliteration. But end users will not really care...

Anyway, maybe I made a mistake by mixing the two aspects right from the
start. Let's forget about the ë for a moment and think about Björk or
Almodóvar. The most basic transliteration would be dropping all accents,
and I did not find that in the http://oss.software.ibm.com/cgi-bin/icu/tr
demo; the closest thing I got was Almodo<'>var.

I think people will expect that searching for Almodovar will find both
forms. And that means people searching the web (ok, you can say those have
time to repeat the search) as well as people working, for example, in a
bank and searching for an account.
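
As a rough sketch of that kind of accent-insensitive search (Python,
standard library only; the function names strip_accents and loose_match
are made up for illustration and have nothing to do with the ICU demo).
It assumes that "dropping accents" means decomposing each character and
discarding the combining marks, which is only one possible reading:

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def loose_match(query: str, candidate: str) -> bool:
    """Compare two strings while ignoring accents and letter case."""
    return strip_accents(query).casefold() == strip_accents(candidate).casefold()


if __name__ == "__main__":
    print(strip_accents("Almodóvar"))
    print(loose_match("almodovar", "Almodóvar"))
```

Letters that are not base-plus-accent combinations (ø, ß, đ and so on)
pass through unchanged, so a real search would need more than this.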

Once simple transliteration is covered, adding some transcriptions as well
would be a plus. Providing both Bjork and Bjoerk as entries in the index may
be neither always correct nor always complete, but - it's something,
right?
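
A hedged sketch of such index entries, again in Python. The
TRANSCRIPTIONS table is a hypothetical, deliberately tiny German-style
mapping chosen only to reproduce the Bjoerk example; it is not taken
from any standard, and index_keys is an invented name:

```python
import unicodedata

# Hypothetical, incomplete table of "generic" transcriptions.
TRANSCRIPTIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def strip_accents(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_keys(name: str) -> set[str]:
    """Return the original form plus accent-free and transcribed forms."""
    composed = unicodedata.normalize("NFC", name)  # so table lookups match
    plain = strip_accents(composed)
    transcribed = strip_accents(
        "".join(TRANSCRIPTIONS.get(ch, ch) for ch in composed)
    )
    return {composed, plain, transcribed}


if __name__ == "__main__":
    print(sorted(index_keys("Björk")))
    print(sorted(index_keys("Almodóvar")))
```

A name with nothing in the table simply yields its original and
accent-free forms, so the index grows only where a transcription applies.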

To sum it up - I was not thinking of exact transcription or transliteration,
with both source and target language defined. All I am saying is that
something generic would be handy.

Lars

> -----Original Message-----
> From: Mark Davis [mailto:mark.davis@macchiato.com]
> Sent: Saturday, November 17, 2001 08:02
> To: Lars Kristan; 'Tex Texin'; unicode@unicode.org
> Subject: transliterations (was Compelling Unicode demo)
>
>
> The unpleasant thing about transliteration standards is not that there
> are too few; it is that there are too many. Library of Congress, UN,
> ISO, GOST, BGN/PCGN, etc. -- all of them differ in some way. We list the
> standards on which we base our current rules at the bottom of
> http://oss.software.ibm.com/icu/userguide/Transliteration.html . It also
> takes a little bit of effort to take a transliteration standard and
> convert it into programmatic rules -- we use a specially adapted regular
> expression mechanism for those rules -- and often we find that the
> transliteration standards are underspecified.
>
> ICU can have multiple variants of transliterations; for example, we have
> Latin-Greek (for the general case) and Latin-Greek/UNGEGN as a variant
> for the UN version. We don't have many variants now, but expect to in
> the future. For example, we use ISO 9 for Cyrillic, but it doesn't line
> up with the Russian GOST standard so we will probably add that as a
> variant. But we will probably add variants after we cover all the
> scripts that ICU supports.
>
> If you are going towards pronunciation, that is called "transcription"
> by ISO, and differs in that it can't usually be generated
> algorithmically from the text, especially for much-less-than-phonetic
> languages like English and Japanese (in the customary orthographies).
>
> Mark
>
> —————
>
> Ὀλίγοι ἔμφονες πολλῶν ἀφρόνων φοβερώτεροι — Πλάτωνος
> [For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]
>
> http://www.macchiato.com
> ----- Original Message -----
> From: "Lars Kristan" <lars.kristan@hermes.si>
> To: "'Tex Texin'" <texin@progress.com>; <unicode@unicode.org>
> Sent: Friday, November 16, 2001 10:40
> Subject: RE: Compelling Unicode demo
>
>
> > I was about to complain about the transliteration 'Mikhail Gorbachev',
> > but then I saw that this was discussed before (I am new to the list)
> > and that it is how English does the transliteration of ë.
> >
> > Still... What I put here is U+00EB (Latin small letter E with
> > diaeresis) which looks like U+0451 (Cyrillic small letter Io). One
> > wonders, was the English transliteration ë to e 'defined' for U+00EB
> > and should it be ë to o (or yo) for U+0451?
> >
> > Anyway, after I browsed quickly through some previous postings, I
> > realized that transliteration is far from simple. But after playing
> > with Tex's sample for a day or two (and doing related searches on the
> > web) I also realized that some form of transliteration will need to be
> > provided to implement loose searching. Unicode will encourage more and
> > more people to write names (people, places, etc) 'correctly'. Without
> > transliteration, searching will become very unreliable - to say the
> > least...
> >
> > My two cents.
> >
> >
> > Regards to all,
> >
> > Lars
> >
> >
> > > -----Original Message-----
> > > From: Tex Texin [mailto:texin@progress.com]
> > > Sent: Friday, November 16, 2001 08:51
> > > To: Unicoders
> > > Subject: Re: Compelling Unicode demo
> > >
> > >
> > > > The page has been updated again. It is getting to be pretty cool,
> > > > thanks to new submissions for Inuktitut, Eritrean, Ethiopian, and
> > > > others.
> > > > It also has a link for ICU's online transliteration page.
> > > > If you have trouble displaying it, there is a link to the Unicode
> > > > site's helper page.
> > >
> > > > I have to fix a problem with the name for Thailand; that will be
> > > > tomorrow.
> > >
> > > > A few more entries, exotic or otherwise, would make my weekend! ;-)
> > > > For the exotic ones, pointers to the relevant fonts would be
> > > > helpful.
> > >
> > > > I find IE 5.5 displays it well. I use Netscape 4.7 which has
> > > > trouble with Hebrew among other things.
> > > > If you use another browser, I would be interested in reports on
> > > > which ones work well.
> > > > (Don't bother to tell me which ones don't work.)
> > >
> > > thanks
> > > tex
> > >
> > > http://www.geocities.com/i18nguy/unicode-example.html
> > >
> > > --
> > > -------------------------------------------------------------
> > > Tex Texin Director, International Business
> > > mailto:Texin@Progress.com Tel: +1-781-280-4271
> > > the Progress Company Fax: +1-781-280-4655
> > > -------------------------------------------------------------
> > > > "When choosing between two evils, I always like to try the
> > > > one I've never tried before." -- Mae West
> > >
> >
> >
> >
>



This archive was generated by hypermail 2.1.2 : Mon Nov 19 2001 - 10:11:09 EST