Re: Digraphs as Distinct Logical Units

From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Fri Aug 09 2002 - 05:54:27 EDT


Doug Ewell wrote:

> And if you think that's bad, you should have seen the ones that got rejected --
> special "emphasized" Hangul for writing the names of North Korean dictators

Not so outlandish as it may first appear. When Egyptian hieroglyphs get encoded in Unicode, I would
not be surprised to see special characters for the cartouched names of pharaohs (for pharaohs read
dictators).

And in China, historically the personal names of emperors (for emperors read dictators) have been
tabooed (some dynasties, e.g. Han, Song and Qing, more than others), meaning that if you had to
write a character that happened to be part of the emperor's personal name, then you either
substituted another character (synonym or homophone as appropriate), or wrote the character with the
last stroke omitted. This latter practice was prevalent during the Qing dynasty (1644-1911). For
example, the character hong 弘 [U+5F18] is often found written without the final dot on the bottom
right in texts dating from and after the reign of the Qianlong emperor (r.1736-1795), whose personal
name was Hongli 弘曆 [U+5F18, U+66C6].

Whilst an editor may decide to transcribe every instance of the tabooed form of 弘 [U+5F18] as the
canonical 弘 [U+5F18] in a given text, textual scholars often need to refer to the tabooed form as
distinct from the canonical form, because tabooed forms are so useful for dating purposes. I myself
have had to do so, and have been reduced to awkward formulae such as "the character 弘 with a
missing final stroke".

I was thinking that perhaps there might be a need for a new Unicode block - "CJK Taboo Replacement
Characters", but having just looked at the chart for CJK Unified Ideographs Extension B
<http://www.unicode.org/charts/PDF/U20000.pdf> (scary reading for you font developers), I notice
that the tabooed form of hong is already encoded at U+2239E, as is at least one other taboo form that
I checked (U+248E5).
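A minimal Python sketch of what separate encoding means in practice, taking the post's identification of U+2239E as the tabooed hong at face value. The two forms are distinct code points with distinct names. Unified ideographs carry no decompositions, so Unicode normalization will not fold the tabooed form into 弘. An editor who wants to transcribe tabooed forms as canonical ones therefore needs an explicit mapping. The `TABOO_TO_CANONICAL` table and `transcribe` function are hypothetical illustrations, not part of any standard, and the table holds only the one pair named in this post.

```python
import unicodedata

HONG = "\u5F18"            # 弘, canonical form (U+5F18)
HONG_TABOO = "\U0002239E"  # tabooed hong, per this post (Extension B)

# Distinct code points, distinct character names
assert unicodedata.name(HONG) == "CJK UNIFIED IDEOGRAPH-5F18"
assert unicodedata.name(HONG_TABOO) == "CJK UNIFIED IDEOGRAPH-2239E"

# No decomposition mapping, so even NFKC leaves the tabooed form alone
assert unicodedata.normalize("NFKC", HONG_TABOO) == HONG_TABOO

# Hypothetical editorial mapping: tabooed form -> canonical form
TABOO_TO_CANONICAL = {HONG_TABOO: HONG}

def transcribe(text: str) -> str:
    """Replace known tabooed forms with their canonical characters."""
    return "".join(TABOO_TO_CANONICAL.get(ch, ch) for ch in text)
```

For example, `transcribe("\U0002239E\u66C6")` yields 弘曆. Because the mapping is applied explicitly, the source text can keep the tabooed form for dating purposes, and the normalized copy is produced only when an editor asks for it.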

Andrew West



This archive was generated by hypermail 2.1.2 : Fri Aug 09 2002 - 04:01:20 EDT