RE: How to remove accents while conforming to language standards?

From: Erkki I Kolehmainen <eik_at_iki.fi>
Date: Sat, 2 Nov 2013 11:22:15 +0200

Re attached: The machine-readable section of a passport should be seen as a code rather than as text.

The rules for this section originate from the OCR-B standard with a highly limited character repertoire to ensure reliable scanning. The travel documents contain the name in the original script (e.g., Cyrillic) plus the transliterated name in the Latin script (if the original script is different) plus the machine readable section produced in OCR-B.
The transliteration practices may and do change from time to time; e.g., Russian passports used to be transliterated to French as the target language, whereas now the target language is English. Also, several of the former Soviet Union countries have recently introduced their own transliteration schemes from Cyrillic to Latin.

Actually, the original question addresses fallbacks rather than transliteration.

Sincerely, Erkki I. Kolehmainen

-----Original Message-----
From: unicode-bounce_at_unicode.org [mailto:unicode-bounce_at_unicode.org] On Behalf Of Ilya Zakharevich
Sent: 2 November 2013 01:34
To: Jukka K. Korpela
Cc: unicode_at_unicode.org
Subject: Re: How to remove accents while conforming to language standards?

On Fri, Nov 01, 2013 at 07:32:44PM +0200, Jukka K. Korpela wrote:
> 2013-11-01 17:37, Jennifer Wong wrote:
>
> >I would like to ask for advice on removing accents from characters.
>
> To address first the question you ask in the Subject line, “How to
> remove accents while conforming to language standards?”, but do not
> ask in the message body, the answer is: You can’t.

Of course, he can. He even provided an algorithm to do it.

  (And to address “it is as acceptable as stripping the vowels from
   English”, stripping vowels from English CAN be done, and it MUST be
   done if the context requires it.)

This mailing list bursts with reasonable, insightful people. This
question comes up again and again; how come it is ALWAYS the same
answer that pops out, an answer which is meaningless, not helpful,
and, MOREOVER, wrong?

I suspect that what the participants wanted to write was that such
processes are usually LOSSY, not that they CANNOT be done. Given that
the initial question was more or less explicitly formulated as “how to
minimize the losses?”, I think that what is happening in this thread
is even less forgivable than the other times this has happened here…

When one MUST convert into an accent-less form [for human consumption]
(a situation which, being in the US, I find myself in frequently), SOME
losses are usually tolerable. One approach (which is very often
applicable) is “lossy; so what?”: just strip away, and be happy.
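
For illustration, the “just strip away” approach can be sketched in a few
lines of Python. This is only a minimal sketch, not anything from the
thread: the function name strip_accents is made up here, and the method
(canonical decomposition to NFD, then dropping nonspacing combining
marks) is one common way to do it, not the only one.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Lossy accent removal (hypothetical helper, for illustration only)."""
    # NFD splits precomposed letters into a base letter plus combining marks,
    # e.g. "é" becomes "e" followed by U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the nonspacing marks (general category Mn); keep everything else.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(strip_accents("Café Zürich"))
print(strip_accents("Łódź"))
```

Note where the losses show up: letters without a canonical decomposition
(Ł, ø, ß and the like) pass through untouched, so the result is not
guaranteed to be ASCII, and “Zürich” becomes “Zurich” rather than the
“Zuerich” a German reader might expect.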

If minimization of losses is important, this question was also
answered on this list. Checking “my database of useful answers”
  http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Useful_tidbits_from_Unicode_mailing_list_%28unsorted%29
I see:

  Transliteration on passports (see p.IV-48)
    http://www.icao.int/publications/Documents/9303_p1_v1_cons_en.pdf

[BTW, the URL for the database contains a misprint; nowadays, most of
the entries are sorted into categories. “This one”, though, is not sorted.]
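
When the losses need to be minimized, one option is to consult an
explicit fallback table first and only strip marks from whatever the
table does not cover. The sketch below assumes Python; the name
to_ascii_upper and the handful of table entries are illustrative
assumptions in the spirit of passport-style transliteration, NOT a copy
of the ICAO table. The authoritative mappings are in the document linked
above.

```python
import unicodedata

# Illustrative fallbacks only (an assumption for this sketch); check the
# ICAO document for the mappings that actually apply to travel documents.
FALLBACKS = {
    "Ä": "AE", "Ö": "OE", "Ü": "UE", "Å": "AA",
    "Æ": "AE", "Ø": "OE", "Þ": "TH",
}


def to_ascii_upper(name: str) -> str:
    """Upper-case a name, apply table fallbacks, then strip leftover marks."""
    pieces = []
    # NFC first, so that "u" + combining diaeresis is seen as a single "ü".
    # str.upper() already turns "ß" into "SS".
    for ch in unicodedata.normalize("NFC", name).upper():
        if ch in FALLBACKS:
            pieces.append(FALLBACKS[ch])
            continue
        # Anything not in the table falls back to plain mark stripping.
        decomposed = unicodedata.normalize("NFD", ch)
        pieces.append(
            "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
        )
    return "".join(pieces)


print(to_ascii_upper("Müller"))
print(to_ascii_upper("Søren Åberg"))
```

The table lookup runs before the generic stripping on purpose: a
language-specific convention (Ü as UE) then wins over the lossy default
(Ü as U), while names outside the table still come out accent-free.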

Hope this helps,
Ilya
Received on Sat Nov 02 2013 - 04:26:56 CDT
