Dan,
it seems that you are reinventing Reuters' Compression Scheme for Unicode
(RCSU), which is really useful, especially if you have many short and
independent text files.
A web search should turn up some information on RCSU.
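RCSU was later adopted by the Unicode Consortium as the Standard Compression Scheme for Unicode (SCSU, Unicode Technical Standard #6). The following is a minimal sketch of its core idea, covering only a small subset of single-byte mode. ASCII passes through as one byte per character. A movable 128-character "window", defined by the SD0 tag byte (0x18) followed by a block index, lets each character in one script block also cost a single byte. No dictionary is shared between files. The function names are hypothetical. The sketch assumes the text uses ASCII plus a single block below U+3400, and it omits quoting, the other windows, and Unicode mode.

```python
# Minimal sketch of SCSU single-byte mode (the standardized form of RCSU).
# Assumption: text is ASCII plus characters from ONE 128-code-point block
# below U+3400; a real encoder also quotes, switches windows, and can fall
# back to Unicode mode.

SD0 = 0x18  # tag byte: "define dynamic window 0 and make it active"
PASS_CONTROLS = {0x00, 0x09, 0x0A, 0x0D}  # controls that pass through unchanged


def encode_single_window(text: str) -> bytes:
    out = bytearray()
    offset = None  # start of the 128-character window in use, once defined
    for ch in text:
        cp = ord(ch)
        if cp in PASS_CONTROLS or 0x20 <= cp <= 0x7F:
            out.append(cp)  # ASCII: one byte, no tag needed
            continue
        block = cp >> 7  # index of the 128-character block containing cp
        if not 1 <= block <= 0x67:
            raise ValueError(f"U+{cp:04X} needs features this sketch omits")
        if offset is None:
            out += bytes([SD0, block])  # window 0 now starts at block * 0x80
            offset = block << 7
        elif not 0 <= cp - offset < 0x80:
            raise ValueError("text spans more than one window")
        out.append(0x80 + cp - offset)  # one byte per in-window character
    return bytes(out)


def decode_single_window(data: bytes) -> str:
    # Decodes only what encode_single_window emits, not arbitrary SCSU.
    chars = []
    offset = 0x80  # SCSU's default offset for dynamic window 0
    i = 0
    while i < len(data):
        b = data[i]
        if b == SD0:
            offset = data[i + 1] << 7  # redefine window 0
            i += 2
            continue
        chars.append(chr(b) if b < 0x80 else chr(offset + b - 0x80))
        i += 1
    return "".join(chars)
```

For "Привет, мир!" this yields 14 bytes: the 2-byte window definition plus one byte per character. The same text takes 21 bytes in UTF-8 and 24 in UTF-16. That is why the scheme pays off for many short files that must each be decodable on their own.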
-- Jörg Knappen
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:40 EDT