Unicode 3.1: UTF-8

From: John Cowan (jcowan@reutershealth.com)
Date: Wed Jan 31 2001 - 14:40:15 EST

I propose that the distinction between illegal and irregular UTF-8
code sequences (D36bc) be eliminated. Since there are no code points
strictly between U+D7FF and U+E000 (the apparently intervening values
are UTF-16 code units, not Unicode code points), the UTF-8 code
sequences that would correspond to them should be illegal.

This can be achieved by replacing the U+1000..U+FFFF row in
Table 3.1B as follows:

U+1000..U+CFFF  E1..EC  80..BF  80..BF
U+D000..U+D7FF  ED      80..9F  80..BF   [9F underscored]
U+E000..U+FFFF  EE..EF  80..BF  80..BF
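
As a rough illustration of what the amended rows would mean for a
decoder, here is a minimal sketch of a three-byte validity check.
The function name is_legal_three_byte is hypothetical and not part of
the proposal. Two assumptions: the E0 row (second byte A0..BF) is taken
from the existing Table 3.1B, and the last row is read as covering lead
bytes EE and EF, since U+FFFF encodes as EF BF BF.

```python
# Hypothetical helper, not part of the proposal: checks a single
# three-byte UTF-8 sequence against the rows of the amended table.
def is_legal_three_byte(b1, b2, b3):
    # The third byte is 80..BF in every three-byte row.
    if not 0x80 <= b3 <= 0xBF:
        return False
    if b1 == 0xE0:
        # U+0800..U+0FFF (assumed unchanged row; excludes overlong forms)
        return 0xA0 <= b2 <= 0xBF
    if 0xE1 <= b1 <= 0xEC:
        # U+1000..U+CFFF
        return 0x80 <= b2 <= 0xBF
    if b1 == 0xED:
        # U+D000..U+D7FF: second byte stops at 9F, so ED A0..BF xx
        # (the range between U+D7FF and U+E000) is rejected.
        return 0x80 <= b2 <= 0x9F
    if 0xEE <= b1 <= 0xEF:
        # U+E000..U+FFFF (assumes the row covers both EE and EF)
        return 0x80 <= b2 <= 0xBF
    return False
```

Under this sketch, ED 9F BF (U+D7FF) and EE 80 80 (U+E000) are
accepted, while ED A0 80 is rejected outright rather than being
treated as merely irregular.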

