Re: suggestions for strategy on dealing with plain text in potentially any (unspecified) encoding?

From: Ben Dougall (bend@freenet.co.uk)
Date: Sat May 10 2003 - 09:10:25 EDT

Next message: Michael Everson: "Good news, if true, about the Baghdad Museum"

Previous message: Bob_Hallissy@sil.org: "RE: suggestions for strategy on dealing with plain text in potentially any (unspecified) encoding?"
In reply to: Bob_Hallissy@sil.org: "RE: suggestions for strategy on dealing with plain text in potentially any (unspecified) encoding?"
Next in thread: Rick McGowan: "Re: suggestions for strategy on dealing with plain text in potentially any (unspecified) encoding?"

> >It would appear to be a three step process:
> >
> >(1) First, detect ...
> >(2) Second, compare ...
> >(3) Third, ... test
>
> (4) Give the user a chance to correct your program's guess -- some
> users actually know!

this is all very useful information, including the details of it, and
the emacs related info (will follow that up definitely) - thanks very
much.

what should the default be though? after encoding detection, fuzzy
logic and whatever other tricks have been tried, and before giving the
user a chance to change it themselves, the encoding may still be
unknown. so how should that particular decision be made (given that
the user's main language is known)?
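
to make the question concrete, here is a minimal sketch in python of where that default sits in the order of decisions. the function name `decode_with_fallback` and its parameters are hypothetical, and the strict utf-8 attempt is only a stand-in for whatever detection is really used:

```python
def decode_with_fallback(data, default_encoding, user_choice=None):
    """Return (text, encoding_used) for raw bytes of unknown encoding."""
    # step (4) from the quoted list: an explicit user choice beats any guess
    if user_choice is not None:
        return data.decode(user_choice, errors="replace"), user_choice

    # stand-in for the detection steps: strict utf-8 either decodes
    # cleanly or raises, so a success is a fairly strong signal
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass

    # detection failed: this is the default the question is about
    return data.decode(default_encoding, errors="replace"), default_encoding
```

whatever `default_encoding` turns out to be, it is only consulted once detection has given up and the user has not yet said anything.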

if the user's main language is any latin-based one, 8-bit extended
ascii would be the obvious default.

but what if the user's main language is written in a script other than
latin? would falling back to a character set other than extended ascii
be in order in those cases? if so, which basic character sets are there
other than ascii? i'm guessing there aren't going to be many (viewing
ascii as the one for latin-based scripts). OR should it not fall back
to an alternative at all, and just use 8-bit extended ascii as the
default regardless of language setting?
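
one way the first option could look, as a sketch only: a lookup from the user's language to a legacy encoding, with extended ascii (iso-8859-1 here) as the last resort. the table and the name `default_encoding_for` are hypothetical, the entries are just one commonly seen legacy encoding per language, and a real table would be a judgment call (russian text, for instance, also turns up in windows-1251):

```python
# illustrative only: one commonly seen legacy encoding per language
DEFAULT_ENCODING_BY_LANGUAGE = {
    "en": "iso-8859-1",
    "fr": "iso-8859-1",
    "de": "iso-8859-1",
    "ru": "koi8-r",
    "el": "iso-8859-7",
    "he": "iso-8859-8",
    "ja": "shift_jis",
    "ko": "euc-kr",
    "zh-cn": "gb2312",
    "zh-tw": "big5",
}


def default_encoding_for(language_tag, fallback="iso-8859-1"):
    """Pick a fallback encoding from a language tag such as 'ru' or 'zh_TW'."""
    tag = language_tag.lower().replace("_", "-")
    # try the full tag first (e.g. "zh-tw"), then the primary subtag ("zh")
    if tag in DEFAULT_ENCODING_BY_LANGUAGE:
        return DEFAULT_ENCODING_BY_LANGUAGE[tag]
    primary = tag.split("-")[0]
    return DEFAULT_ENCODING_BY_LANGUAGE.get(primary, fallback)
```

the second option in the question amounts to ignoring the table and always returning the `fallback` value.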

This archive was generated by hypermail 2.1.5 : Sat May 10 2003 - 10:39:10 EDT