Re: Generic Tagging: A Bold Proposal

From: Markus G. Kuhn
Date: Tue Jul 15 1997 - 17:23:57 EDT

Murray Sargent wrote on 1997-07-15 20:27 UTC:
> I dare say that most people on the UTC agree with you and don't want to
> put any tagging scheme into Unicode or 10646. But there are lots of
> people out there who don't want to depend on a higher-level protocol
> such as HTML to specify language. After the hundreds of email messages
> to this effect that we've seen on the Unicode aliases (did you delete
> them en masse?), it's hard not to provide some mechanism, regardless of
> whether it should be done from a technical point of view. Ken's
> approach is probably the least offensive scheme to come up in the myriad
> email. Sigh!

What the people who want language tagging so urgently fail to
understand is that what they actually want is a vendor-independent,
standard WYSIWYG file format for word processing. Some people seem to
believe that Unicode is a word processing file format, which it clearly
is not, just as ASCII is not.

The only real way out of this is to start standardizing a really nice
extensible word processing format. There is no existing one! SGML is a
completely different arena, and ISO's ODA has failed and, by its mere
existence, blocks any further work on word processing file formats
within ISO/JTC1.

What people want is something like the MS-Word or WordPerfect native
binary file formats, but stable, well-defined, standardized, extensible,
and vendor-independent, just like Unicode.

Since Unicode has been so successful, what about starting a new project
(say "Unitext") that develops a full standard file format that could be
used as a drop-in replacement for all the native binary formats of
Word, WordPerfect, FrameMaker, etc.? Those existing file formats are not
so different in their basic structure that it would be impossible
to find a nice common format between them.

We already have a nice common format for vector graphics (ISO's CGM),
but we have nothing like this for WYSIWYG word processing files.

Unitext would obviously be based on Unicode, but it would add page layout,
font selection, language tagging, image inclusion, table and math formula
formatting, and a few other goodies.

Would the Unicode Consortium be interested in a Unitext project?

Dear folks from the Word, WordPerfect, FrameMaker, etc. teams:
why not just pick up the phone and ask your boss about working on a
common word processing format? Are you sick of having to maintain the
import/export filter functions for your competitors' file formats?
Your competitors might feel the same ...

Language tagging is just one part of the real problem that has
to be addressed in order to bring us towards email that
interoperates with word processing software and functionality.

Leave Unicode alone with your tagging ideas. Start Unitext, the
all-you-can-dream-of tagging collection.


Markus G. Kuhn, Computer Science grad student, Purdue
University, Indiana, USA
