Re: Unused Unicode planes

From: Doug Ewell (
Date: Sat Jan 10 2009 - 23:37:57 CST


    Michael D'Errico <mike dash list at pobox dot com> wrote:

    > Well people say if you want to encode non-plain-text things, then you
    > need to start your own standard. Plain text is a subset of everything
    > you would want to encode, so it makes sense to include everything from
    > Unicode in this new standard. Trying to minimize the effort required
    > to implement a new standard, it also makes sense to utilize the UTF-8
    > mechanism (without the 17 plane artificial limitation placed on it) to
    > access the Unicode part as well as the new non-plain-text part. There
    > is nothing "evil and dangerous" about it, just unfamiliar and
    > untested.
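    To make the quoted proposal concrete: "UTF-8 without the 17 plane
    limitation" most plausibly means the original UTF-8 bit layout of
    RFC 2279. That layout allowed sequences of up to six bytes and values up
    to 0x7FFFFFFF. RFC 3629 later restricted UTF-8 to U+10FFFF and four
    bytes. The following is a minimal sketch, assuming that six-byte layout is
    what is meant. The function name `encode_extended_utf8` is hypothetical,
    and the sketch does no surrogate checking.

```python
def encode_extended_utf8(value: int) -> bytes:
    """Encode an integer using the pre-RFC 3629 (RFC 2279-style) UTF-8 bit
    layout, which allowed up to six bytes and values up to 0x7FFFFFFF.
    Hypothetical helper for illustration; it does not reject surrogates."""
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError("value out of range for 31-bit extended UTF-8")
    if value < 0x80:
        return bytes([value])  # single byte, same as ASCII
    # (sequence length, exclusive upper bound, lead-byte marker bits)
    forms = ((2, 0x800, 0xC0), (3, 0x10000, 0xE0), (4, 0x200000, 0xF0),
             (5, 0x4000000, 0xF8), (6, 0x80000000, 0xFC))
    for length, limit, lead in forms:
        if value < limit:
            out = []
            # Emit continuation bytes (10xxxxxx), least significant first.
            for _ in range(length - 1):
                out.append(0x80 | (value & 0x3F))
                value >>= 6
            out.append(lead | value)  # remaining high bits go in the lead byte
            return bytes(reversed(out))
    raise AssertionError("unreachable")
```

    For values up to U+10FFFF (excluding surrogates) the output matches
    standard UTF-8, which is exactly why it "looks like UTF-8". Just past
    plane 16, `encode_extended_utf8(0x110000)` yields `b"\xf4\x90\x80\x80"`,
    a sequence that standard UTF-8 forbids.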

    If you make it look like UTF-8, people and programs will treat it as if
    it were UTF-8 and try to feed it into processes built to handle UTF-8.
    That's what is evil and dangerous.
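    The danger is easy to demonstrate. A conforming UTF-8 decoder (here,
    Python's built-in codec) rejects bytes that follow the UTF-8 pattern but
    encode values above U+10FFFF. A lenient decoder silently turns them into
    U+FFFD, so the extra data is lost without any error. This is a minimal
    sketch, and the helper name `decode_both_ways` is hypothetical.

```python
from typing import Optional, Tuple


def decode_both_ways(data: bytes) -> Tuple[Optional[str], str]:
    """Decode `data` strictly and leniently as UTF-8.

    Returns (strict result, or None if the strict decoder rejected it,
    lenient result with invalid bytes replaced by U+FFFD)."""
    try:
        strict: Optional[str] = data.decode("utf-8")
    except UnicodeDecodeError:
        strict = None  # a conforming UTF-8 process refuses the data
    lenient = data.decode("utf-8", errors="replace")  # data silently lost
    return strict, lenient


# UTF-8-shaped bytes that are not valid UTF-8:
beyond_plane_16 = b"\xf4\x90\x80\x80"   # would be U+110000
five_byte_form = b"\xf8\x88\x80\x80\x80"  # five-byte lead byte 0xF8

for data in (beyond_plane_16, five_byte_form):
    strict, lenient = decode_both_ways(data)
    print(data, "strict:", strict, "lenient:", ascii(lenient))
```

    In both cases the strict result is `None` and the lenient result
    contains nothing but U+FFFD replacement characters.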

    A hypothetical "Everycode" standard that encodes arbitrary bits of data
    certainly should include Unicode characters as a subset, but the
    encoding format has to be different enough that nobody will be confused
    about which standard the data belongs to. Check the mail archives;
    there are lots of possible "UTF" ideas that could have been used for
    Unicode, but were not, and might make sense for your project instead.

    Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
