RE: Surrogate points

From: Lars Kristan (
Date: Tue Feb 01 2005 - 04:02:11 CST


    Hans Aberg wrote:
    > A relatively minor change to the UTF-16 would make that condition to go
    > away. The current UTF-16 implementations would merely need to be aware of
    > that these character numbers may be used, and become altered appropriately.
    > This is not an urgent change as these character numbers will not be filled
    > very soon.

    Extending UTF-16 would not be easy: its surrogate pairs can address only the
    16 supplementary planes above the BMP, up to U+10FFFF. An incompatible
    extension would not succeed, since in that case one could simply use UTF-8,
    which we have already concluded can be extended should the need arise.
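    The U+10FFFF ceiling falls directly out of the surrogate arithmetic. The
    Python sketch below assumes the standard ranges D800 to DBFF for lead
    surrogates and DC00 to DFFF for trail surrogates; the function names
    encode_utf16 and decode_pair are hypothetical, chosen for illustration.
    Every pair of 10-bit halves yields exactly 2^20 supplementary code points,
    so there is no unused bit pattern left over for values beyond U+10FFFF.

```python
# Sketch: why UTF-16 cannot reach beyond U+10FFFF.
# Assumes the standard surrogate ranges; function names are hypothetical.

HIGH_SURROGATES = range(0xD800, 0xDC00)  # 1024 lead code units
LOW_SURROGATES = range(0xDC00, 0xE000)   # 1024 trail code units


def encode_utf16(cp: int) -> list:
    """Encode one code point as a list of 16-bit UTF-16 code units."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError(f"U+{cp:X} is outside the range UTF-16 can express")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded on their own")
    if cp < 0x10000:
        return [cp]                  # BMP: a single code unit
    v = cp - 0x10000                 # leaves a 20-bit value
    return [0xD800 + (v >> 10),      # top 10 bits -> lead surrogate
            0xDC00 + (v & 0x3FF)]    # bottom 10 bits -> trail surrogate


def decode_pair(hi: int, lo: int) -> int:
    """Recombine a lead/trail surrogate pair into a code point."""
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


# BMP code points plus every possible surrogate pair.
TOTAL = 0x10000 + len(HIGH_SURROGATES) * len(LOW_SURROGATES)

if __name__ == "__main__":
    print(hex(TOTAL))                           # 0x110000 code points in all
    print(hex(decode_pair(0xDBFF, 0xDFFF)))     # highest pair -> 0x10ffff
    print([hex(u) for u in encode_utf16(0x1F600)])
```

    Widening this would mean redefining the surrogate ranges themselves,
    which existing decoders would misread; that is the incompatibility the
    message refers to.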

    But all this is rather irrelevant. There are enough codepoints. Yes, such
    statements have been made before and were proven wrong, but that alone
    does not mean the same thing will happen again. You are misapplying
    Moore's law here. Assigning codepoints has little to do with growth in
    processing power or storage capacity. It has to do with people: not the
    number of people, but the number of scripts and the characters in them.
    Those do not grow much; in fact, they are probably declining.

    A new glyph will probably pop up now and then; the demand for it will most
    likely come from mathematics or physics. I would not be concerned that we
    will run out of codepoints on that account. The things that would fill up
    the codepoints are:
    * Artificial scripts, like Klingon
    * Formatting (escape) codes
    * An alien race

    Unicode doesn't encode the first two, and the third one is unlikely. And if
    it does happen, I bet we would adopt their encoding (along with the rest of
    their technology) rather than vice versa.

    A decision was once made that 21 bits will suffice, and so far it seems
    they will. The industry will not be willing to invest in something we may
    never need. And as I already said, if we ever do run out, UTF-16 will
    probably be long gone by then.


    P.S.: Speaking of introducing new glyphs, I wonder how one is supposed to
    introduce a new one. One needs to prove that a glyph is in use, but
    nowadays a glyph cannot be used (efficiently) until it is encoded. Catch-22?

    This archive was generated by hypermail 2.1.5 : Tue Feb 01 2005 - 04:13:36 CST