Plane 14 tags and SCSU

From: Doug Ewell
Date: Thu Jun 29 2000 - 11:40:48 EDT

(Warning: this message contains perilous overloading of the word "tag".)

One nice feature of the Plane 14 tags is that they can be encoded
very compactly in SCSU, because (a) the range of tag characters fits
in a single 128-character window and (b) SCSU dynamic windows can be
defined in the "expansion space" beyond Plane 0.

SCSU requires only a 3-byte SDX tag before the first *string* of Plane
14 tag characters, and if multiple tags are present (e.g. in a multi-
lingual document), only a single-byte SCn tag is required before the
second and successive strings. (The SCn tag can even be omitted in the
unlikely event that no other dynamic windows are selected between Plane
14 tags.) The actual Plane 14 tag characters are then encoded in only
one byte each.
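
To make the byte layout concrete, here is a minimal Python sketch of
just the pieces described above. It is not a full SCSU encoder: it
assumes the encoder is already in single-byte mode, it handles only
characters inside the tag window, and the choice of dynamic window 7
is arbitrary. The function names (sdx_bytes, tag_run, to_tag_chars)
are made up for this illustration.

```python
SDX = 0x0B           # "define extended window" tag (single-byte mode)
SC0 = 0x10           # SC0..SC7 are the bytes 0x10..0x17
TAG_BASE = 0xE0000   # start of the Plane 14 tag block


def sdx_bytes(window: int, base: int = TAG_BASE) -> bytes:
    """Return the 3-byte SDX sequence that defines and selects a window."""
    if not 0 <= window <= 7:
        raise ValueError("window index must be 0..7")
    if not 0x10000 <= base <= 0x10FFFF or base % 0x80:
        raise ValueError("base must be a supplementary code point on a 0x80 boundary")
    index = (base - 0x10000) // 0x80          # 13-bit offset index
    return bytes([SDX, (window << 5) | (index >> 8), index & 0xFF])


def tag_run(text: str, base: int = TAG_BASE) -> bytes:
    """Encode characters from the active window as one byte each."""
    out = bytearray()
    for ch in text:
        delta = ord(ch) - base
        if not 0 <= delta < 0x80:
            raise ValueError(f"U+{ord(ch):04X} is outside the window")
        out.append(0x80 + delta)
    return bytes(out)


def to_tag_chars(ascii_text: str) -> str:
    """Map ASCII characters to their Plane 14 tag counterparts."""
    return "".join(chr(TAG_BASE + ord(c)) for c in ascii_text)


LANGUAGE_TAG = "\U000E0001"
lang_en = LANGUAGE_TAG + to_tag_chars("en")
lang_fr = LANGUAGE_TAG + to_tag_chars("fr")

first = sdx_bytes(7) + tag_run(lang_en)        # first string: 3-byte SDX
later = bytes([SC0 + 7]) + tag_run(lang_fr)    # later string: 1-byte SC7

print(first.hex())   # 0bfa0081e5ee
print(later.hex())   # 1781e6f2
```

The first string costs 3 bytes of SDX plus one byte per tag character;
the second costs a single SC7 byte plus one byte per tag character.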

This is much more efficient than the 4 bytes required to encode each
Plane 14 tag character in any of UTF-8, UTF-16, or UTF-32.
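
A quick way to check the comparison is to count bytes for one sample
tag, here the language tag "en" (U+E0001 U+E0065 U+E006E). The UTF
sizes below come from Python's built-in codecs; the SCSU sizes are
simply computed from the figures given above (3-byte SDX or 1-byte SCn,
plus one byte per character), since Python has no built-in SCSU codec.
The variable names are only for illustration.

```python
# Language tag "en": U+E0001 LANGUAGE TAG, then tag characters for "e" and "n".
tag = "\U000E0001\U000E0065\U000E006E"

sizes = {
    "UTF-8": len(tag.encode("utf-8")),
    "UTF-16": len(tag.encode("utf-16-be")),    # -be: no byte order mark added
    "UTF-32": len(tag.encode("utf-32-be")),
    "SCSU, first string (SDX)": 3 + len(tag),  # computed, not encoded
    "SCSU, later string (SCn)": 1 + len(tag),  # computed, not encoded
}

for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

For this three-character tag, each UTF form takes 12 bytes, against 6
bytes for the first SCSU string and 4 for each later one.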

Of course, neither Plane 14 tags nor SCSU has achieved much popularity
yet, so at present this is only an academic observation.

-Doug Ewell
 Fullerton, California
