Re: Bengali syllables <... 09BF 09BE> and <... 09BF 09C0>

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Tue, 7 Feb 2017 21:46:13 +0000

On Tue, 7 Feb 2017 12:22:44 -0800
Manish Goregaokar <manish_at_mozilla.com> wrote:

> I found things like this[1] on Wikisource, which seem to be an OCR of
> some really garbled text. The text does indeed seem to have
> additional vowel diacritics, but that could also be a scanning glitch.
> The same word appears twice in the document, but once in the text.

In particular, the two sequences look like misinterpreted U+09CB
BENGALI VOWEL SIGN O and U+09CC BENGALI VOWEL SIGN AU, which would
account for their high frequency. The OCRed texts cited by
Manish seem to be in acute need of manual correction.
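
As a rough illustration of the point above, here is a minimal Python
sketch that flags the two suspect sequences and shows the canonical
decompositions of U+09CB and U+09CC for comparison. The mapping from
each suspect sequence to a candidate vowel sign is only the hypothesis
stated above, and the names SUSPECT, find_suspects and decomposition
are invented for this example. Any hit would still need checking
against the page image.

```python
import unicodedata

# Suspect pairs from the subject line: VOWEL SIGN I followed by AA or II.
# The candidate on the right is an assumption (the misreading hypothesis),
# not an established equivalence.
SUSPECT = {
    "\u09BF\u09BE": "\u09CB",  # possibly a misread BENGALI VOWEL SIGN O
    "\u09BF\u09C0": "\u09CC",  # possibly a misread BENGALI VOWEL SIGN AU
}


def find_suspects(text):
    """Return (index, suspect pair, candidate vowel sign) for each hit."""
    hits = []
    for i in range(len(text) - 1):
        pair = text[i:i + 2]
        if pair in SUSPECT:
            hits.append((i, pair, SUSPECT[pair]))
    return hits


def decomposition(ch):
    """Return the character names of the canonical decomposition (NFD) of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]


if __name__ == "__main__":
    # KA followed by VOWEL SIGN I and VOWEL SIGN AA: one suspect pair at index 1.
    for index, pair, candidate in find_suspects("\u0995\u09BF\u09BE"):
        codes = " ".join(f"{ord(c):04X}" for c in pair)
        print(index, codes, "->", unicodedata.name(candidate))
    print(decomposition("\u09CB"))
    print(decomposition("\u09CC"))
```

The sketch only reports candidates and does not rewrite the text, since
an automatic substitution could hide a genuine scanning glitch.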

Richard.
Received on Tue Feb 07 2017 - 15:46:58 CST
