[Proposal] Extended UTF-16 by using Plane 14

From: Masahiko Maedera (masahiko.maedera@nifty.ne.jp)
Date: Sun Apr 11 1999 - 22:29:02 EDT


Dear all,

  This is my first post to this list.
  I am Masahiko Maedera, a Japanese software engineer
  at Lotus Development Japan.
  I have been developing a Unicode text editor for Windows 9x/NT
  for three years (since 1996).

  I have run into a problem:
  the UCS-4 range 0x00110000-0x7FFFFFFF cannot be represented in UTF-16.
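The upper limit of UTF-16 follows from the surrogate pair arithmetic. A minimal sketch, assuming Python; the helper name `surrogate_pair_to_ucs4` is hypothetical and used only for illustration:

```python
def surrogate_pair_to_ucs4(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one UCS-4 value."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The largest value any surrogate pair can express.
UTF16_MAX = surrogate_pair_to_ucs4(0xDBFF, 0xDFFF)
print(hex(UTF16_MAX))  # 0x10ffff
```

Everything from 0x00110000 upward therefore has no UTF-16 form at all.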

  I think that this range is probably not in use right now.
  But if it comes into use in the future,
  we will have serious conversion and compatibility problems.

  In particular, ISO/IEC 10646-1 defines Private Use Areas
  (0x00E00000-0x00FFFFFF, 0x60000000-0x7FFFFFFF),
  and nothing prohibits using them now.

  When we meet plain text in UTF-8 that contains code points from these areas,
  we must give up handling it in UTF-16,
  even though it is correct ISO/IEC 10646-1 text.
  That is unfortunate indeed.
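To make the situation concrete, here is a rough sketch of decoding the UTF-8 form defined for the full 31-bit UCS-4 space (sequences of up to 6 octets). The function name `decode_utf8_31bit` is hypothetical, and the sketch deliberately skips checks for overlong forms:

```python
def decode_utf8_31bit(data: bytes) -> list:
    """Decode UTF-8 with sequences of 1 to 6 octets into UCS-4 values."""
    # (lead-octet mask, lead-octet pattern, number of continuation octets)
    forms = [
        (0x80, 0x00, 0),
        (0xE0, 0xC0, 1),
        (0xF0, 0xE0, 2),
        (0xF8, 0xF0, 3),
        (0xFC, 0xF8, 4),
        (0xFE, 0xFC, 5),
    ]
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        for mask, pattern, extra in forms:
            if (lead & mask) == pattern:
                break
        else:
            raise ValueError(f"invalid lead octet 0x{lead:02X}")
        value = lead & ~mask & 0xFF  # payload bits of the lead octet
        tail = data[i + 1 : i + 1 + extra]
        if len(tail) != extra or any((b & 0xC0) != 0x80 for b in tail):
            raise ValueError("truncated or malformed sequence")
        for b in tail:
            value = (value << 6) | (b & 0x3F)
        out.append(value)
        i += 1 + extra
    return out


# "A" followed by the first code position of the upper Private Use Area.
text = bytes([0x41, 0xFD, 0xA0, 0x80, 0x80, 0x80, 0x80])
values = decode_utf8_31bit(text)
print([hex(v) for v in values])        # ['0x41', '0x60000000']
print([v > 0x10FFFF for v in values])  # [False, True]
```

The second value decodes without trouble, yet it lies beyond what a surrogate pair can express.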

  Therefore, I offer a proposal to solve this problem.

  If you can offer a better proposal than mine,
  I am willing to accept it.
  But I cannot happily accept a situation
  in which there is no conversion rule between UCS-4 and UTF-16.

  My proposal is as follows.

-----
First, in binary notation.

UCS-4 (binary):
  0wxxxxxx-xxxxyyyy-yyyyyyzz-zzzzzzzz

Extended UTF-16 (binary):
  11011011-0111110w, 110111xx-xxxxxxxx,
  11011011-01111110, 110111yy-yyyyyyyy,
  11011011-01111111, 110111zz-zzzzzzzz

Next, in hexadecimal notation.

UCS-4 range:
  0x00110000-0x3FFFFFFF

Extended UTF-16 expression:
  U+DB7C + low surrogate + U+DB7E + low surrogate + U+DB7F + low surrogate

UCS-4 range:
  0x40000000-0x7FFFFFFF

Extended UTF-16 expression:
  U+DB7D + low surrogate + U+DB7E + low surrogate + U+DB7F + low surrogate
-----
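The bit layout above can be sketched as an encoder and decoder. This is an illustration only, assuming Python; the names `encode_extended` and `decode_extended` are hypothetical, and rejecting decoded values below 0x00110000 is an added assumption, since the proposal lists only the ranges from 0x00110000 upward:

```python
HIGH_X = 0xDB7C  # OR-ed with bit w, giving 0xDB7C (w=0) or 0xDB7D (w=1)
HIGH_Y = 0xDB7E
HIGH_Z = 0xDB7F
LOW_BASE = 0xDC00  # 110111.. prefix of a low surrogate


def encode_extended(ucs4: int) -> list:
    """Encode a UCS-4 value as six 16-bit code units (three surrogate pairs)."""
    if not 0x110000 <= ucs4 <= 0x7FFFFFFF:
        raise ValueError("outside the range covered by the extended form")
    w = (ucs4 >> 30) & 0x1
    x = (ucs4 >> 20) & 0x3FF
    y = (ucs4 >> 10) & 0x3FF
    z = ucs4 & 0x3FF
    return [HIGH_X | w, LOW_BASE | x, HIGH_Y, LOW_BASE | y, HIGH_Z, LOW_BASE | z]


def decode_extended(units: list) -> int:
    """Decode six code units produced by encode_extended."""
    if len(units) != 6:
        raise ValueError("expected exactly six code units")
    hx, lx, hy, ly, hz, lz = units
    if hx not in (HIGH_X, HIGH_X | 1) or hy != HIGH_Y or hz != HIGH_Z:
        raise ValueError("unexpected high surrogate")
    if any(not 0xDC00 <= low <= 0xDFFF for low in (lx, ly, lz)):
        raise ValueError("unexpected low surrogate")
    w = hx & 0x1
    value = (w << 30) | ((lx & 0x3FF) << 20) | ((ly & 0x3FF) << 10) | (lz & 0x3FF)
    if value < 0x110000:
        # Assumption: such values already have a normal UTF-16 form.
        raise ValueError("value is representable in ordinary UTF-16")
    return value


units = encode_extended(0x60000000)
print([f"{u:04X}" for u in units])
# ['DB7D', 'DE00', 'DB7E', 'DC00', 'DB7F', 'DC00']
print(hex(decode_extended(units)))  # 0x60000000
```

Six 16-bit code units are produced per code point, which is where the 12 octets come from.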

  You may worry that 12 octets are needed to express one code point in this range.
  But I think this is a minor matter,
  because it is more important to preserve the current efficiency
  of surrogate pairs, and this range is rarely processed.
  As a consequence of this conversion,
  the code points 0x000EF000-0x000EFFFF in Plane 14 would have to be reserved.
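The reserved range can be checked by decoding, as ordinary UTF-16, every pair that starts with one of the four high surrogates used by the proposal. A small sketch, assuming Python; `pair_to_ucs4` and `reserved` are illustrative names:

```python
def pair_to_ucs4(high: int, low: int) -> int:
    """Standard UTF-16 surrogate pair decoding."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# For each high surrogate, the span of ordinary code points its pairs occupy.
reserved = {}
for high in (0xDB7C, 0xDB7D, 0xDB7E, 0xDB7F):
    first = pair_to_ucs4(high, 0xDC00)
    last = pair_to_ucs4(high, 0xDFFF)
    reserved[high] = (first, last)
    print(f"{high:04X}: 0x{first:08X}-0x{last:08X}")
# DB7C: 0x000EF000-0x000EF3FF
# DB7D: 0x000EF400-0x000EF7FF
# DB7E: 0x000EF800-0x000EFBFF
# DB7F: 0x000EFC00-0x000EFFFF
```

The four blocks are contiguous and together cover exactly 0x000EF000-0x000EFFFF.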

Best regards,
  Masahiko Maedera.

--
  1999/04/12
  Masahiko Maedera<Masahiko_Maedera@lotus.co.jp>
  Lotus Development Japan.


