Re: Detecting encoding in Plain text

From: Doug Ewell (dewell@adelphia.net)
Date: Tue Jan 13 2004 - 11:34:32 EST

Next message: John Jenkins: "Re: Chinese rod numerals"

Previous message: Doug Ewell: "Re: German characters not correct in output webform"
In reply to: Peter Kirk: "Re: Detecting encoding in Plain text"
Next in thread: Peter Kirk: "Re: Detecting encoding in Plain text"
Reply: Peter Kirk: "Re: Detecting encoding in Plain text"

Peter Kirk <peterkirk at qaya dot org> wrote:

>> If a certain Unicode plain text file uses ASCII punctuation OR spaces
>> OR end-of-line characters, AND the file is neither too short nor
>> very oddly formatted, then the algorithm should work.
>
> True. But there may be certain languages (perhaps Thai?) for which all
> of these circumstances regularly occur together. It would be very
> inconvenient for users of these languages if programs regularly
> attribute the wrong encoding to their text.

Whether this is specifically true for Thai or not -- and I doubt that
the "short file or odd formatting" condition could ever be considered
language-dependent -- I would say an otherwise-good heuristic that
performs badly for Thai ought to have special cases built in for Thai,
rather than being discarded.
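
To make the shape of that argument concrete, here is a rough sketch in Python. It is not the algorithm discussed earlier in the thread, which this message does not restate. It assumes the task is to guess the byte order of BOM-less UTF-16 text by looking for ASCII spaces, punctuation, and end-of-line characters, and then falls back to a Thai-specific check when that evidence is missing. The function names, the MIN_EVIDENCE threshold, and the "at least half the code units" rule are all hypothetical choices made for illustration.

```python
from typing import Optional

# ASCII characters the general heuristic looks for:
# space, common punctuation, CR, LF and TAB.
ASCII_MARKERS = frozenset(b" .,;:!?\r\n\t")

# Hypothetical threshold: with fewer hits than this, treat the file
# as "too short or oddly formatted" and refuse to guess.
MIN_EVIDENCE = 4


def guess_utf16_byte_order(data: bytes) -> Optional[str]:
    """General heuristic: 'utf-16-le', 'utf-16-be', or None if unsure."""
    le_hits = be_hits = 0
    # Walk the data one 16-bit code unit at a time.
    for i in range(0, len(data) - 1, 2):
        first, second = data[i], data[i + 1]
        if second == 0 and first in ASCII_MARKERS:
            le_hits += 1  # e.g. bytes 20 00 are U+0020 in little-endian
        elif first == 0 and second in ASCII_MARKERS:
            be_hits += 1  # e.g. bytes 00 20 are U+0020 in big-endian
    if le_hits + be_hits < MIN_EVIDENCE or le_hits == be_hits:
        return None
    return "utf-16-le" if le_hits > be_hits else "utf-16-be"


def guess_with_thai_fallback(data: bytes) -> Optional[str]:
    """Same heuristic, plus a special case for Thai (assumed design)."""
    guess = guess_utf16_byte_order(data)
    if guess is not None:
        return guess
    units = len(data) // 2
    if units == 0:
        return None
    # Thai letters are U+0E01..U+0E5B, so their high byte is 0x0E.
    # Count where that byte falls within each two-byte code unit.
    le_hits = sum(1 for i in range(1, units * 2, 2) if data[i] == 0x0E)
    be_hits = sum(1 for i in range(0, units * 2, 2) if data[i] == 0x0E)
    # Only trust the special case if at least half the code units agree.
    if max(le_hits, be_hits) * 2 < units or le_hits == be_hits:
        return None
    return "utf-16-le" if le_hits > be_hits else "utf-16-be"
```

The general function is left untouched and still declines to guess when it has too little evidence; the Thai check only runs in that case, which is the sense in which a special case is added to the heuristic instead of replacing it.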

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/

This archive was generated by hypermail 2.1.5 : Tue Jan 13 2004 - 12:14:48 EST