Re: Detecting encoding in Plain text

From: D. Starner (
Date: Tue Jan 13 2004 - 21:05:42 EST


    Peter Kirk writes:
    > I agree that heuristics should be adjusted for Thai. But problems may
    > arise if they have to be adjusted individually, and without regression
    > errors, for all 6000+ world languages.

    Thai is hard because of its writing system, which does not separate words
    with spaces. But most writing systems had no encoding before Unicode, so if
    text in them was typed into a computer at all, it was in a Latin (or
    Cyrillic?) transliteration that probably used spaces and newlines, and was
    most likely plain ASCII.
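    A minimal sketch of why the transliterated cases are the easy ones for a
    detector. This is not code from the thread. The functions
    `classify_bytes` and `whitespace_ratio` are hypothetical helpers written
    for illustration, and real detectors use per-language statistical models.
    The first function checks for pure 7-bit ASCII, then for valid UTF-8. The
    second measures how much of the text is spaces or newlines, a signal that
    Thai lacks.

```python
def classify_bytes(data: bytes) -> str:
    """Rough first-pass guess at the encoding of a byte string.

    Hypothetical heuristic for illustration only.
    """
    # Pure 7-bit ASCII: the likely case for transliterated text.
    # Checked first because ASCII is also valid UTF-8.
    if all(b < 0x80 for b in data):
        return "ascii"
    # High bytes present: do they form well-formed UTF-8 sequences?
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Some legacy 8-bit charset (e.g. TIS-620); needs real heuristics.
        return "unknown-8bit"


def whitespace_ratio(text: str) -> float:
    """Fraction of characters that are spaces or newlines.

    Space-delimited transliterations score well above Thai,
    which does not put spaces between words.
    """
    if not text:
        return 0.0
    return sum(ch in " \n" for ch in text) / len(text)


print(classify_bytes(b"sawatdi khrap\n"))       # ascii
print(classify_bytes("สวัสดี".encode("utf-8")))  # utf-8
print(classify_bytes(b"\xff\xfe"))              # unknown-8bit
```

    The ordering matters: because every ASCII byte string is also valid UTF-8,
    the ASCII test has to run first for the labels to be meaningful.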

    More cynically: people who use obscure character sets or font encodings
    already have trouble getting their text displayed; that is one of the
    reasons Unicode exists. That this tool may, to some extent, be another
    instance of that problem is a simple fact of life, and not a reason to
    throw it out.

    [If an empty reply to this message appeared, I'm sorry. I hit the Enter
    key in the wrong place and off it went.]


    This archive was generated by hypermail 2.1.5 : Wed Jan 14 2004 - 02:09:35 EST