Re: Long-term archiving of electronic text documents

From: Jim Breen
Date: Tue, 29 Jan 2013 10:51:48 +1100

William_J_G Overington wrote:

> The idea is that there would be an additional UTF format, perhaps UTF-64,
> so that each character would be expressed in UTF-64 notation using 64 bits,
> thus providing error checking and correction facilities at a character level.

Error detection and correction at the character level is considered
very old-fashioned now. Modern block-level techniques such as
Reed-Solomon codes[1] are much more effective and involve much less
overhead than the 100% in the proposal above (64 bits per character
against 32 for UTF-32). Such techniques are already used in modern
disc storage[2], and when combined with RAID techniques[3] provide
better data protection than character-level redundancy ever would.
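
As a rough illustration of the overhead comparison, here is a small
Python sketch. The RS(255, 223) parameters are an assumption chosen
for illustration only (a commonly cited Reed-Solomon configuration),
not something taken from this thread or from any particular disc
format, and the 32-bit baseline for the proposal is likewise assumed.

```python
def overhead_percent(data_units: int, redundancy_units: int) -> float:
    """Redundancy expressed as a percentage of the payload it protects."""
    return 100.0 * redundancy_units / data_units


# The proposal: 64 bits per character.
# Assumption: the baseline is a 32-bit (UTF-32) code unit.
utf64_overhead = overhead_percent(32, 64 - 32)

# Assumed illustrative code: RS(255, 223) over 8-bit symbols,
# i.e. 223 data bytes plus 32 parity bytes in each 255-byte block.
n, k = 255, 223
rs_overhead = overhead_percent(k, n - k)

# A Reed-Solomon code with n - k parity symbols can correct up to
# (n - k) // 2 corrupted symbols at unknown positions in each block.
rs_correctable = (n - k) // 2

print(f"64-bit characters: {utf64_overhead:.1f}% overhead")
print(f"RS({n}, {k}): {rs_overhead:.1f}% overhead, "
      f"corrects up to {rs_correctable} bad bytes per block")
```

Under those assumptions the block code costs roughly a seventh of
the redundancy of the 64-bit proposal while still repairing up to
16 damaged bytes in every 255-byte block.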

In any case, I think issues of error detection and correction are
quite outside the scope of Unicode.




Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
Received on Mon Jan 28 2013 - 18:00:54 CST
