On 10/21/2012 4:09 AM, Philippe Verdy wrote:
>> Unless there's a way to rebuild the metadata unambiguously or to enforce
>> that it is complete and correct, it's very hard to rely on it for any
>> particular purpose.
> Enforcing that the metadata is correct is perfectly possible, at least
> to ensure that it matches the requirements. (For example, an incorrect
> encoding, given in metadata, should be signaled each time it violates
> one of its rules: this is possible for many standardized text
> encodings, including the UTFs.)
It may be possible to do some verification of well-formedness for
well-designed encoding schemes like the UTFs, but, pray, how do you tell
8859-1 apart from 8859-15?
These are not rare character sets, and enforcement for them, as for any
other member of the 8859 series, would only be possible if you were
to do the very same character-set sniffing that you so dislike.
If you run a variation of a language detector, it's possible to detect,
for example, that the text is in French, and is therefore more likely
to be in 8859-15 (where byte 0xBD is "œ") than in 8859-1 (where it is
"½"). That is because the few code points that are mapped to different
characters in these two sets would appear (statistically) in the wrong
context.
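A toy sketch of that contextual idea, assuming nothing beyond Python's built-in codecs. It is not a real language detector: the function `guess_latin_variant` is hypothetical and simply lets each of the eight byte values that differ between the two sets vote based on whether it sits next to a letter.

```python
# Byte values whose character differs between ISO 8859-1 and ISO 8859-15.
DIFFERING = {
    b for b in range(256)
    if bytes([b]).decode("iso-8859-1") != bytes([b]).decode("iso-8859-15")
}

def guess_latin_variant(data: bytes) -> str:
    """Hypothetical heuristic: vote per differing byte using its neighbours.

    In 8859-15 most differing bytes are letters (Š š Ž ž Œ œ Ÿ); in 8859-1
    they are symbols (¦ ¨ ´ ¸ ¼ ½ ¾). A letter next to other letters is
    plausible; a fraction sign wedged inside a word is not.
    """
    votes_15 = votes_1 = 0
    for i, b in enumerate(data):
        if b not in DIFFERING:
            continue
        as_15 = bytes([b]).decode("iso-8859-15")
        as_1 = bytes([b]).decode("iso-8859-1")
        if as_15.isalpha() == as_1.isalpha():
            continue  # e.g. 0xA4: "¤" vs "€", both symbols, no evidence
        prev = data[i - 1:i].decode("iso-8859-1")   # "" at start of input
        nxt = data[i + 1:i + 2].decode("iso-8859-1")  # "" at end of input
        letter_context = prev.isalpha() or nxt.isalpha()
        if letter_context == as_15.isalpha():
            votes_15 += 1
        else:
            votes_1 += 1
    if votes_15 > votes_1:
        return "iso-8859-15"
    if votes_1 > votes_15:
        return "iso-8859-1"
    return "undecided"

print(guess_latin_variant(b"c\xbdur"))    # iso-8859-15 ("cœur")
print(guess_latin_variant(b"1\xbd cup"))  # iso-8859-1  ("1½ cup")
print(guess_latin_variant(b"caf\xe9"))    # undecided   (0xE9 is the same in both)
```

Note how text that never uses one of those eight bytes gives no evidence at all; that is why this kind of inference belongs in a tool working on real text, not in an OS-level metadata check.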
This is something a clever text editing (or HTML editing) tool could do, 
but not something that you can build into an OS.
Anyway, to cut the discussion short, I'd love to see a working example 
of any system where metadata are 100% reliable.
A./
Received on Sun Oct 21 2012 - 21:54:54 CDT