Re: Long-term archiving of electronic text documents

From: Asmus Freytag <>
Date: Mon, 28 Jan 2013 08:01:22 -0800

On 1/28/2013 4:30 AM, William_J_G Overington wrote:
> The idea is that there would be an additional UTF format, perhaps UTF-64, so that each character would be expressed in UTF-64 notation using 64 bits, thus providing error checking and correction facilities at a character level.

I think this proposal is a few weeks early, and that it should be
resubmitted on the proper date, but as UTF-256 - for greater redundancy.

UTF-256 allows each hex digit of a UTF-32 value to be expressed as an ASCII hex
digit (characters 0-9 and A-F encoded as bytes 0x30-0x39 and 0x41-0x46).
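For the curious, here is a minimal sketch of what such an encoding could look like. The post does not say how the digits fill 256 bits. Read literally, one ASCII byte per digit gives 8 x 8 = 64 bits, the size of the UTF-64 proposal quoted above. So this sketch assumes each ASCII digit sits in its own 32-bit big-endian unit (8 x 32 = 256). The function names utf256_encode and utf256_decode are hypothetical.

```python
HEX_DIGITS = "0123456789ABCDEF"

def utf256_encode(code_point: int) -> bytes:
    """Hypothetical UTF-256: 8 ASCII hex digits, each in a 32-bit big-endian unit."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    digits = format(code_point, "08X")  # the 8 hex digits of the UTF-32 value
    # ord('0')..ord('9') = 0x30..0x39, ord('A')..ord('F') = 0x41..0x46
    return b"".join(ord(d).to_bytes(4, "big") for d in digits)

def utf256_decode(data: bytes) -> int:
    """Invert utf256_encode, rejecting anything that is not a well-formed 32-byte unit."""
    if len(data) != 32:
        raise ValueError("expected 32 bytes (256 bits)")
    digits = ""
    for i in range(0, 32, 4):
        ch = chr(int.from_bytes(data[i:i + 4], "big"))
        if ch not in HEX_DIGITS:
            raise ValueError(f"invalid digit unit at byte {i}")
        digits += ch
    return int(digits, 16)
```

For example, utf256_encode(0x1F600) produces 32 bytes, and utf256_decode() on that result gives back 0x1F600.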

This leaves two bits per hex digit unused which could be utilized for
bit-level error correction, or you could go to UTF-512 and encode each
code point twice.
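A rough sketch of both ideas combined, under stated assumptions. It keeps the 32-bit-unit layout assumed for UTF-256 and uses one spare bit (bit 7) of each digit unit as an even-parity bit, rather than the two bits mentioned above. It then stores the whole 256-bit form twice, for 512 bits. Note that a single parity bit only detects a single-bit error; it cannot correct one. In this sketch the "correction" comes from falling back to the second copy when a digit in the first copy fails its check. A true bit-level correcting code, such as a Hamming code, would need more of the spare bits. All names are hypothetical.

```python
from typing import Optional

HEX_DIGITS = "0123456789ABCDEF"

def _parity_unit(ch: str) -> bytes:
    """One hex digit as a 32-bit unit: ASCII code in bits 0-6, even parity in bit 7."""
    code = ord(ch)                     # 0x30-0x39 or 0x41-0x46, always below 0x80
    parity = bin(code).count("1") & 1  # 1 if the 7-bit code has an odd number of 1 bits
    return (code | (parity << 7)).to_bytes(4, "big")

def _check_unit(unit: bytes) -> Optional[str]:
    """Return the digit if the unit passes its checks, else None."""
    value = int.from_bytes(unit, "big")
    if value > 0xFF or bin(value).count("1") % 2:
        return None                    # stray high bits, or parity failure
    ch = chr(value & 0x7F)
    return ch if ch in HEX_DIGITS else None

def utf512_encode(code_point: int) -> bytes:
    """Hypothetical UTF-512: the parity-protected 256-bit form, stored twice."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    copy = b"".join(_parity_unit(d) for d in format(code_point, "08X"))
    return copy + copy                 # 2 copies x 8 units x 4 bytes = 64 bytes = 512 bits

def utf512_decode(data: bytes) -> int:
    if len(data) != 64:
        raise ValueError("expected 64 bytes (512 bits)")
    digits = []
    for i in range(0, 32, 4):
        # Prefer the first copy; fall back to the second if the first fails its check.
        ch = _check_unit(data[i:i + 4]) or _check_unit(data[32 + i:36 + i])
        if ch is None:
            raise ValueError(f"digit {i // 4} is corrupt in both copies")
        digits.append(ch)
    return int("".join(digits), 16)
```

Flipping one bit in a digit of the first copy still decodes correctly. The same digit damaged in both copies raises ValueError. Two bit flips inside one unit can still slip past a single parity bit.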

The possibilities are endless.

Received on Mon Jan 28 2013 - 10:03:29 CST