It seems that you are reinventing Reuters' Compression Scheme for Unicode
(RCSU), which is really useful, especially if you have many short and
independent text files.
A search engine should turn up some information on RCSU on the net.
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:40 EDT