Re: texteditors that can process and save in different encodings from Philippe Verdy on 2012-10-21 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Sun, 21 Oct 2012 13:09:22 +0200

2012/10/21 Asmus Freytag <asmusf_at_ix.netcom.com>:
> Metadata that is separate from the data has a way of being disassociated
> from it.

Something that an OS can still avoid by preserving that metadata
during its most common operations on files.

> Annoying, but a fact of life. This can be as simple as file
> creation dates not being preserved on copy.

Here again, the OS can be improved: there's no reason not to preserve
the metadata associated with a file's content when copying the file
between two capable filesystems.

The file creation date should be preserved (as well as the last
modification date), even if both could also be part of the file
content, or even be covered by its integrity protection (digital
signatures).
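
As a rough illustration of a copy that carries timestamps along, here
is a minimal Python sketch. It assumes the standard library's
shutil.copy2, which attempts to preserve the modification time and
permission bits; it generally does not carry over the creation date,
which is exactly the kind of gap discussed here. The names
copy_with_metadata and demo are hypothetical helpers.

```python
import os
import shutil
import tempfile

def copy_with_metadata(src: str, dst: str) -> None:
    # copy2 copies the data, then tries to copy timestamps and
    # permission bits. It cannot copy all metadata on every platform.
    shutil.copy2(src, dst)

def demo() -> tuple[int, int]:
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "source.txt")
        dst = os.path.join(tmp, "copy.txt")
        with open(src, "w", encoding="utf-8") as f:
            f.write("some text\n")
        # Backdate the source so that a plain copy would visibly differ.
        os.utime(src, (1_000_000_000, 1_000_000_000))
        copy_with_metadata(src, dst)
        # Return both modification times, truncated to whole seconds.
        return int(os.stat(src).st_mtime), int(os.stat(dst).st_mtime)
```

Calling demo() returns the source and destination modification times,
which should match when both filesystems support them.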

> Metadata that is contained in the same file as the data, has a way of being
> incorrect. Look no further than HTML language tags or charset declarations.

Of course, but the cause is not the OS; it is an incorrect action by
the file's creator or maintainer.

> Then there is metadata that can be easily reconstituted from the data, like
> file sizes, hash signatures or preview images (in many cases). They are
> "meta" data only as a matter of convenience, since they don't add
> information.

I don't call this metadata if it is generated automatically from
the content, because it does not add any separate information.

Basic hash signatures are not really metadata, unless they are
certified by someone belonging to some identity and security realm.
Storing the hash signature separately only serves to detect an
alteration of the content; it does not add any other information.
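
A minimal sketch of that detection role, assuming SHA-256 from
Python's hashlib as the hash (the choice of algorithm and the function
names digest and is_unaltered are illustrative, not taken from this
thread):

```python
import hashlib

def digest(content: bytes) -> str:
    # The digest is derived entirely from the content, so it adds
    # no information of its own.
    return hashlib.sha256(content).hexdigest()

def is_unaltered(content: bytes, stored_digest: str) -> bool:
    # Recompute and compare: a mismatch reveals an alteration, but
    # says nothing about who made it or what the original was.
    return digest(content) == stored_digest
```

Without a certification step, anyone who alters the content can also
recompute the digest, which is why an uncertified hash only guards
against accidental changes.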

> Unless there's a way to rebuild the metadata unambiguously or to enforce
> that it is complete and correct, it's very hard to rely on it for any
> particular purpose.

Enforcing that the metadata is correct is perfectly possible, at least
to the extent of ensuring that the data matches the declared
requirements. (For example, an incorrect encoding given in metadata
should be signaled each time the data violates one of that encoding's
rules: this is possible for many standardized text encodings,
including the UTFs.)
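
A minimal sketch of such a check in Python, assuming the declared
encoding is known to Python's codec registry; check_declared_encoding
is a hypothetical helper name. It decodes strictly and reports the
first violation instead of guessing:

```python
from typing import Optional

def check_declared_encoding(data: bytes, declared: str) -> Optional[str]:
    """Return None if data is valid in the declared encoding,
    otherwise a short description of the problem."""
    try:
        # Strict decoding raises on the first byte sequence that
        # breaks the rules of the declared encoding.
        data.decode(declared, errors="strict")
    except UnicodeDecodeError as exc:
        return f"invalid {declared} at byte {exc.start}: {exc.reason}"
    except LookupError:
        # The declared name is not a known codec.
        return f"unknown encoding: {declared}"
    return None
```

This only catches violations. A single-byte encoding such as
ISO-8859-1 accepts every byte sequence, so a wrong declaration of that
kind passes the check; the UTFs are much stricter and most mislabeled
data is rejected.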

With such enforcement, and with preservation along the whole
processing/transmission chain, metadata becomes reliable; much more so
than if you rely only on users or automatic detectors to guess which
encoding is correct (in the absence of metadata). The metadata is
created correctly only once, and then there remains no possibility of
error, as no one needs to "guess" an answer that is provided
immediately.
Received on Sun Oct 21 2012 - 06:16:17 CDT
