Re: texteditors that can process and save in different encodings

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Fri, 19 Oct 2012 23:56:41 +0200

2012/10/19 Doug Ewell <doug_at_ewellic.org>:
> Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
>
>>>> ASCII,
>>>
>>> A strict subset of UTF-8, so no need to support this separately.
>>
>> Not really. If the file to save does not need any character which is
>> found in an 8-bit extended character set (there are many of them),
>> saving them as ASCII (i.e. saving this charset information in the
>> metadata) still preserves the compatibility of the encoded text with
>> all these other extended charsets (notably all ISO 8859-* codepages as
>> well as UTF-8).
>
> Which metadata is that? I was sure we were talking about editors for
> plain-text files, which don't have any sort of metadata declaring the
> character encoding or anything else.
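
The compatibility claim quoted above can be checked mechanically. The following is only a minimal sketch: the helper name and the list of encodings are assumptions chosen for illustration and do not come from any editor discussed in this thread. It decodes the same bytes under several ASCII-compatible charsets and reports whether they all agree.

```python
# Hypothetical helper, for illustration only.
# The encodings below are a small sample of ASCII-compatible charsets.
ASCII_COMPATIBLE = ["ascii", "utf-8", "iso-8859-1", "iso-8859-2",
                    "iso-8859-15", "cp1252"]


def decodes_identically(data: bytes, encodings=ASCII_COMPATIBLE) -> bool:
    """Return True if every listed encoding decodes `data` to the same text."""
    try:
        results = {data.decode(enc) for enc in encodings}
    except UnicodeDecodeError:
        # At least one charset cannot decode these bytes at all.
        return False
    return len(results) == 1


# Bytes that only use the 7-bit range read the same everywhere.
pure_ascii = "plain text, no accents".encode("ascii")
# A single byte above 0x7F is enough to break the agreement.
with_accent = "caf\u00e9".encode("iso-8859-1")

print(decodes_identically(pure_ascii))
print(decodes_identically(with_accent))
```

Text that passes this check can be labelled as ASCII without losing compatibility with any of the listed charsets, which is the property being argued for.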

There's always some metadata: either it comes from the filesystem
itself (file-naming conventions or explicit storage of this metadata,
including HTTP, which acts as a filesystem supporting it, or MIME for
emails), or it comes from information the user provides in the
editor to instruct it how to decode the file, or it is implicit in the
editor itself, which offers no choice for it in its GUI or command
line.
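
As a rough sketch of these three sources, the function below picks an encoding from an explicit user choice, then from a MIME or HTTP Content-Type value, then from a built-in editor default. The function names, the precedence order and the "utf-8" default are all assumptions made for illustration; only the standard-library `email.message.Message` class is real.

```python
from email.message import Message
from typing import Optional, Tuple


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Extract the charset parameter from a MIME/HTTP Content-Type value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # Returns the charset in lower case, or None if the parameter is absent.
    return msg.get_content_charset()


def resolve_encoding(user_choice: Optional[str] = None,
                     content_type: Optional[str] = None,
                     editor_default: str = "utf-8") -> Tuple[str, str]:
    """Return (encoding, source). The precedence order is an assumption."""
    if user_choice:
        # Metadata supplied by the user inside the editor.
        return user_choice, "user"
    if content_type:
        # Metadata supplied by the transport or storage layer.
        charset = charset_from_content_type(content_type)
        if charset:
            return charset, "transport"
    # Metadata implicit in the editor itself.
    return editor_default, "editor default"


print(resolve_encoding(content_type="text/plain; charset=ISO-8859-1"))
print(resolve_encoding(user_choice="ascii"))
print(resolve_encoding(content_type="text/plain"))
```

In every case some source ends up naming the encoding, even when that source is only the editor's hard-wired default.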

The metadata I am referring to is of course not what is stored in the
plain-text body of the file itself (including the decoded body part of
a MIME email, the body part of an HTTP request or reply, or the
content read with I/O after opening a file or a continuous stream). So
it is not what you may find in HTML or XML processing syntaxes as
part of the file content itself, something that is not really
recommended if those files are handled blindly as if they were just
"plain text", ignoring the syntax required to decode them. The
information about the syntax needed to process them is nevertheless
metadata: you first have to know that the file type is XML or
HTML, and that is not really stored in the file content, but just
"guessed" from some leading signatures.

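To make the "guessed from some leading signatures" step concrete, here is a minimal sketch that looks only at the first bytes of a file. The function name, the returned labels and the simplified regular expression are assumptions for illustration; real XML and HTML sniffing rules are considerably more involved.

```python
import codecs
import re
from typing import Optional

# UTF-32 signatures are tested first because the UTF-32 LE byte order
# mark begins with the same two bytes as the UTF-16 LE one.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Simplified match for: <?xml version="1.0" encoding="..."?>
XML_DECL = re.compile(
    rb'''<\?xml[^>]*?encoding=["']([A-Za-z][A-Za-z0-9._-]*)["']''')


def guess_from_signature(head: bytes) -> Optional[str]:
    """Guess an encoding from leading bytes, or return None."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    match = XML_DECL.match(head)
    if match:
        # The declaration is itself read as ASCII-compatible bytes.
        return match.group(1).decode("ascii").lower()
    return None


print(guess_from_signature(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
print(guess_from_signature(codecs.BOM_UTF8 + b"hello"))
print(guess_from_signature(b"hello"))
```

A result of None is the interesting case: nothing in the content identifies the encoding, so it has to come from somewhere outside the file.
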
As soon as a user needs to specify the file type or file encoding
somewhere because the filesystem does not provide it as separately
stored metadata, the user is providing additional metadata. This is
also true when the user chooses a specific editor that handles a
specific syntax or encoding (the metadata provided by the user then
consists of this choice of tool, even if the choice was inappropriate
because of a wrong guess or assumption).
Received on Fri Oct 19 2012 - 16:59:38 CDT