Re: Long-term archiving of electronic text documents

From: Alka Irani
Date: Mon, 28 Jan 2013 10:26:58 -0500

I would love to have such a facility, because writing bilingual or
trilingual documents is too much of a hassle, and that is often what is
needed, at least in the Indian environment.
On Jan 28, 2013 6:17 PM, "William_J_G Overington" wrote:

> I was thinking about the problems of the long-term archiving of electronic
> text documents and thought of an idea.
> I would like to mention the idea here in the hope of starting a
> discussion, so that it can be assessed whether the idea is worth
> developing.
> The idea is that there would be an additional UTF format, perhaps UTF-64,
> in which each character would be expressed using 64 bits. A Unicode code
> point needs at most 21 bits, so the remaining bits could provide error
> checking and correction facilities at a character level.
> If such a UTF-64 format were established as part of the standard, then
> maybe in the future, for example, Microsoft WordPad could carry an option
> to save a text file as UTF-64.
> At present, on the Windows XP system that I am using, when saving a text
> file from within Microsoft WordPad one of the choices of file type is
> listed as Unicode Text Document, which uses a UTF-16 format.
> A document saved as UTF-64 may well take four times as many bytes as such
> a Unicode Text Document, yet there would be the error checking and
> correction facilities at a character level.
> Similarly, there could be a type of PDF document in which the text within
> the PDF document is stored in UTF-64 format.
> So I am putting the idea forward to seek opinions on whether establishing
> such a UTF format, whether UTF-64 or some other size, with error checking
> and correction facilities at a character level, would be useful.
> William Overington
> 28 January 2013
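
To make the proposal concrete, here is a minimal sketch of what one 64-bit
unit with character-level error correction might look like. The layout is
purely an assumption for illustration: the message above does not specify
one, and no such format exists in the Unicode Standard. The sketch stores
three copies of the 21-bit code point (63 bits, with the top bit unused)
and recovers the character by a bitwise majority vote. The function names
are hypothetical.

```python
# Hypothetical "UTF-64" unit layout (an assumption, not part of any standard):
#   bits  0-20 : copy A of the 21-bit code point
#   bits 21-41 : copy B
#   bits 42-62 : copy C
#   bit  63    : unused, always 0

MASK21 = (1 << 21) - 1  # Unicode code points fit in 21 bits (max U+10FFFF)


def encode_unit(ch: str) -> int:
    """Pack one character into a 64-bit integer holding three copies."""
    cp = ord(ch)
    return cp | (cp << 21) | (cp << 42)


def decode_unit(unit: int) -> str:
    """Recover the character by a bitwise majority vote over the copies."""
    a = unit & MASK21
    b = (unit >> 21) & MASK21
    c = (unit >> 42) & MASK21
    # Each result bit is 1 when at least two of the three copies agree on 1.
    cp = (a & b) | (a & c) | (b & c)
    return chr(cp)


def encode_text(text: str) -> bytes:
    """Serialize text as big-endian 8-byte units, one per code point."""
    return b"".join(encode_unit(ch).to_bytes(8, "big") for ch in text)


def decode_text(data: bytes) -> str:
    """Decode a byte string produced by encode_text, correcting as it goes."""
    return "".join(
        decode_unit(int.from_bytes(data[i:i + 8], "big"))
        for i in range(0, len(data), 8)
    )


if __name__ == "__main__":
    original = "नमस्ते abc"
    stored = bytearray(encode_text(original))
    stored[3] ^= 0x10  # simulate a single flipped bit in the first unit
    print(decode_text(bytes(stored)) == original)
    print(len(encode_text("abc")), len("abc".encode("utf-16-le")))
```

Under this layout, any single flipped bit within a unit is corrected, as
are multiple flips so long as no bit position is damaged in two of the
three copies. For characters in the Basic Multilingual Plane the output is
four times the size of the UTF-16 form, matching the estimate in the
message above. Triple redundancy is used here only because it is easy to
follow; a Hamming-style code would need far fewer check bits for
single-bit correction.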
