UTC/1999-020 (text)

Diagram is at <http://www.unicode.org/unicode/members/UTC/u1999-020.pdf>

1999-06-04

From: John Jenkins [jenkins@apple.com]

Subject: Diagram and language

The diagram (which should become figure 10-8) is also on the book ftp site
in INCOMING/jhj/Diagram.pdf.

Also insert the following text into the section on Ideographic Description
Characters, immediately before the sub-subsection on "Equivalence":
"The fact that Ideographic Description Sequences can contain other
Ideographic Description Sequences means that implementations may need to be
aware of the <I>recursion depth</I> of a sequence and its <I>back-scan
length</I>. The recursion depth of an Ideographic Description Sequence is
the maximum number of pending operations encountered in the process of
parsing an Ideographic Description Sequence. In Figure 10-8, the greatest
recursion depth occurs in the example on the third line, where three
operations are still pending when you come to the end of the IDS.

"The back-scan length is the maximum number of ideographs unbroken by
Ideographic Description Characters in the IDS. None of the examples in
Figure 10-8 has more than two ideographs in a row, so for each of them the
back-scan length is at most two. If you access the middle of a text stream and encounter an
ideograph, the back-scan length tells you how far backwards you have to go
to know whether or not the ideograph is part of an Ideographic Description
Sequence. In the examples, you never have to go back more than two
characters.

"The Unicode Standard places no limits on the recursion depth of Ideographic
Description Sequences. It does, however, limit the back-scan length of
valid Ideographic Description Sequences to five or fewer. This is to
simplify the work done by Unicode implementations that parse Ideographic
Description Sequences."
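The recursion depth and back-scan length defined in the proposed text can both be computed in one left-to-right pass over an IDS. The following Python sketch is illustrative only: the names `idc_arity`, `analyze_ids`, `is_valid_ids`, and `may_be_in_ids` are hypothetical; it assumes only the twelve Ideographic Description Characters U+2FF0..U+2FFB (U+2FF2 and U+2FF3 take three operands, the rest two); it treats every other character as an operand without checking that it is an ideograph or radical; and it reads "pending operations" as IDCs whose operands have not all been seen yet.

```python
from typing import List, Tuple

MAX_BACK_SCAN = 5  # limit proposed in the text above


def idc_arity(ch: str) -> int:
    """Operands taken by an IDC, or 0 if ch is not an IDC.

    Assumption: only U+2FF0..U+2FFB are recognized; U+2FF2 and U+2FF3
    take three operands and the other ten take two.
    """
    cp = ord(ch)
    if cp in (0x2FF2, 0x2FF3):
        return 3
    return 2 if 0x2FF0 <= cp <= 0x2FFB else 0


def analyze_ids(ids: str) -> Tuple[int, int]:
    """Return (recursion_depth, back_scan_length) for one complete IDS.

    Raises ValueError if ids is empty, incomplete, or has trailing text.
    Operands are not checked to be ideographs or radicals.
    """
    pending: List[int] = []  # operands still owed to each open IDC
    depth = longest_run = run = 0
    for i, ch in enumerate(ids):
        if i > 0 and not pending:
            raise ValueError("extra characters after a complete IDS")
        arity = idc_arity(ch)
        if arity:
            pending.append(arity)
            depth = max(depth, len(pending))  # IDCs open at this moment
            run = 0
            continue
        run += 1
        longest_run = max(longest_run, run)  # non-IDCs in a row
        # The operand fills one slot of the innermost open IDC; an IDC whose
        # slots are all filled is complete and fills a slot of its parent.
        while pending:
            pending[-1] -= 1
            if pending[-1]:
                break
            pending.pop()
    if not ids or pending:
        raise ValueError("incomplete IDS")
    return depth, longest_run


def is_valid_ids(ids: str, max_back_scan: int = MAX_BACK_SCAN) -> bool:
    """Well-formed and within the back-scan limit; depth is unlimited."""
    try:
        _, back_scan = analyze_ids(ids)
    except ValueError:
        return False
    return back_scan <= max_back_scan


def may_be_in_ids(text: str, i: int, max_back_scan: int = MAX_BACK_SCAN) -> bool:
    """Look at most max_back_scan characters back from the ideograph text[i].

    If every IDS in text respects the limit, False means text[i] is not an
    operand of any IDS; True only means a full parse is still needed.
    """
    for j in range(i - 1, max(i - 1 - max_back_scan, -1), -1):
        if idc_arity(text[j]):
            return True
    return False
```

For example, `analyze_ids("\u2ff0木\u2ff1\u2ff0木木木")` gives a recursion depth of 3 and a back-scan length of 3. The design point is the asymmetry in `may_be_in_ids`: because the limit bounds how far an operand can sit from the IDC that introduces its run, a `False` answer is definitive, while `True` only says an IDC is close enough that the scan alone cannot tell where an enclosing sequence begins.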

Looking over chapter 10:

Table 10-1 is OK except that we appear to have an extraneous "KS" in the
cell immediately after the "U source:".

Do we still want an example of how "it may be desired to treat the
Ideographic Description Sequence as a ligature of the individual characters
for purposes of hit-testing, cursor movement, and other user interface
operations"?
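One possible way to treat an Ideographic Description Sequence as a ligature for cursor movement and hit-testing is to split text into units in which each complete IDS counts as a single unit. This Python sketch is hypothetical (`ids_end` and `cursor_units` are invented names); it assumes only U+2FF0..U+2FFB are IDCs, leaves an IDC that does not start a complete sequence as an ordinary single character, and ignores other grapheme-cluster rules such as combining marks.

```python
from typing import List, Optional


def idc_arity(ch: str) -> int:
    # Assumption: only U+2FF0..U+2FFB are IDCs; U+2FF2 and U+2FF3 are ternary.
    cp = ord(ch)
    if cp in (0x2FF2, 0x2FF3):
        return 3
    return 2 if 0x2FF0 <= cp <= 0x2FFB else 0


def ids_end(text: str, start: int) -> Optional[int]:
    """Index just past a complete IDS beginning at text[start], else None."""
    if not idc_arity(text[start]):
        return None
    pending: List[int] = []  # operands still owed to each open IDC
    for j in range(start, len(text)):
        arity = idc_arity(text[j])
        if arity:
            pending.append(arity)
            continue
        # Fill one operand slot; a completed IDC fills its parent's slot.
        while pending:
            pending[-1] -= 1
            if pending[-1]:
                break
            pending.pop()
        if not pending:
            return j + 1
    return None  # the text ends before the sequence is complete


def cursor_units(text: str) -> List[str]:
    """Split text so that each complete IDS is one unit, like a ligature."""
    units: List[str] = []
    i = 0
    while i < len(text):
        end = ids_end(text, i)
        if end is None:
            end = i + 1  # ordinary character, or an IDC with no complete IDS
        units.append(text[i:end])
        i = end
    return units
```

With this split, a cursor moving over `"a\u2ff0木木b"` stops only at the boundaries of `"a"`, the three-character IDS, and `"b"`.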

We also need two in-line graphics for the "water" radical: in its full form
and in its three-dot form. How should these be delivered?

In §10.6 (Yi), we have an "XXX" in "Naming conventions and order" that I'm
supposed to fill in. The answer is "high tone" (per ISO/IEC
JTC1/SC2/WG21481). "-t" marks the high tone, "-p" the low tone, and "-x" the
middle high tone; a syllable with no tone mark has the middle low tone.
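The tone convention above amounts to a lookup on the last letter of a syllable name. A minimal Python sketch with a hypothetical function `split_tone`, assuming that in the romanized names a final "t", "x", or "p" is always a tone letter and never part of the base syllable:

```python
from typing import Tuple

# Tone letters as described above. Assumption: a final "t", "x", or "p"
# in a romanized Yi syllable name is always a tone letter.
TONE_SUFFIXES = {"t": "high", "x": "middle high", "p": "low"}
UNMARKED_TONE = "middle low"  # no tone letter


def split_tone(syllable: str) -> Tuple[str, str]:
    """Split a romanized Yi syllable into (base syllable, tone)."""
    s = syllable.lower()
    if len(s) > 1 and s[-1] in TONE_SUFFIXES:
        return s[:-1], TONE_SUFFIXES[s[-1]]
    return s, UNMARKED_TONE
```

For instance, with the illustrative string "bit", `split_tone("bit")` returns `("bi", "high")`, and `split_tone("bi")` returns `("bi", "middle low")`.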

=====

John H. Jenkins
jenkins@apple.com
tseng@blueneptune.com
http://www.blueneptune.com/~tseng