RE: Unicode Stability (Was: Re: E0000 Language Tags for Some Obscure Languages)

From: Addison Phillips (
Date: Fri Mar 04 2005 - 13:38:07 CST


    Only if Unicode were expanding at a steady or increasing pace and there were no end to the characters available for encoding. Ken Whistler has pointed out on various occasions that the pace of additions is slowing and that, in fact, the world will run out of scripts that need encoding before Unicode's maximum of roughly 1.1 million code points (U+0000 through U+10FFFF) is reached.

    That is, there is an upper limit to the size of Unicode and the various Unicode versions are approaching it asymptotically.
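    A minimal Python sketch of the arithmetic behind that ceiling. The constant and function names are illustrative only, not part of any standard API; `sys.maxunicode` and `chr()` are ordinary Python and enforce the same U+10FFFF limit.

```python
import sys

# Illustrative constants: the Unicode codespace is 17 planes of 65,536 code points.
PLANES = 17
PLANE_SIZE = 0x10000
MAX_CODE_POINT = 0x10FFFF

codespace = PLANES * PLANE_SIZE          # 1,114,112 code points in total
assert codespace == MAX_CODE_POINT + 1

# U+D800..U+DFFF are surrogate code points, reserved for UTF-16 and never
# assigned to characters, so the count of encodable scalar values is smaller.
SURROGATES = 0xDFFF - 0xD800 + 1         # 2,048
scalar_values = codespace - SURROGATES   # 1,112,064

# Python's own ceiling matches the standard's (Python 3.3 and later).
assert sys.maxunicode == MAX_CODE_POINT

def fits_in_codespace(cp: int) -> bool:
    """Hypothetical helper: True if cp lies within U+0000..U+10FFFF."""
    return 0 <= cp <= MAX_CODE_POINT

try:
    chr(MAX_CODE_POINT + 1)              # one past the ceiling
except ValueError:
    print("chr() rejects code points above U+10FFFF")

print(f"{codespace:,} code points, {scalar_values:,} usable scalar values")
```

    However many scripts remain to be encoded, new characters have to fit inside this fixed space; the space itself never grows.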

    Most fonts don't encode the whole of Unicode, so they aren't really the problem. Fonts that support every assigned code point are unlikely to be more than a curiosity, because they are difficult to create and offer unattractive tradeoffs (for example, having to choose between at least three distinct writing traditions for Han ideographs). While many of us are eternally grateful for Arial Unicode MS and James Kass's work in this area, most folks choose fonts that encode specific scripts or collections of scripts using a particular writing tradition (for aesthetic reasons).

    The increase in the size of textual data (due to Unicode) is also bounded. The maximum character size for UTF-8, UTF-16 and UTF-32 is four octets. Most data in UTF-8 and UTF-16 will average fewer octets per character.
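    To put numbers on the four-octet bound, here is a small Python sketch. The helper names are hypothetical; it simply measures each encoding form's width (without a byte order mark) at the code points where UTF-8 and UTF-16 change length, plus the average octets per character for a short string.

```python
# Octets needed for one code point in each encoding form, with no BOM.
def encoded_widths(cp: int) -> tuple[int, int, int]:
    ch = chr(cp)
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),  # the "-le" codecs do not emit a BOM
        len(ch.encode("utf-32-le")),
    )

# Code points at the boundaries where the UTF-8 or UTF-16 width changes.
BOUNDARIES = [0x41, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF]

for cp in BOUNDARIES:
    u8, u16, u32 = encoded_widths(cp)
    print(f"U+{cp:04X}: UTF-8={u8} UTF-16={u16} UTF-32={u32}")

def bytes_per_char(text: str, encoding: str) -> float:
    """Average octets per code point for text in the given encoding."""
    return len(text.encode(encoding)) / len(text)

# "naïve café" in precomposed form: 10 code points, two of them two octets in UTF-8.
print(bytes_per_char("na\u00efve caf\u00e9", "utf-8"))
```

    For mostly-ASCII text UTF-8 stays close to one octet per character; only supplementary-plane characters (U+10000 and above) reach four octets in UTF-8 or UTF-16.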

    Rumors of Unicode's inefficiency and bloat (notwithstanding certain FAQs) are greatly exaggerated.

    Best Regards,


    Addison P. Phillips
    Globalization Architect, Quest Software

    Chair, Internationalization Core Working Group

    Internationalization is not a feature.
    It is an architecture.

    > -----Original Message-----
    > From: [] On
    > Behalf Of Jeroen Ruigrok/asmodai
    > Sent: Friday, March 4, 2005 09:27
    > To: Asmus Freytag
    > Cc: Doug Ewell; Unicode Mailing List
    > Subject: Re: Unicode Stability (Was: Re: E0000 Language Tags for Some
    > Obscure Languages)
    > -On [20050303 20:52], Asmus Freytag ( wrote:
    > >You are asking an excellent question, which addresses a perspective that is
    > >certainly motivated by the types of experience you cite:
    > -blush-
    > >On the second level, Unicode describes how to encode content. Unlike
    > >software, data rarely (if ever) gets updated once it exists. Requiring
    > >existing data to be updated would effectively mean abandoning it to
    > >inaccessibility after a few years. That's totally contrary to Unicode's
    > >aims.
    > >On the third level, there may be a time, sometime in the mists of the
    > >future, when it's time to start over. In terms of Unicode that would be
    > >whenever there's enough reason and momentum behind a successor standard.
    > >However, as we have seen when we created the Unicode Standard,
    > >existing character sets have a way of forcing a new standard to be
    > >compatible, lest it be a non-starter. The same pressure, magnified, would
    > >face any successor standard to the Unicode Standard.
    > Given these points, wouldn't an ever-expanding standard like Unicode cause
    > data bloat at some point, since you will need an ever larger encoding
    > space to encode certain specific characters?
    > Not to mention that the supporting fonts will get bigger and bigger,
    > although I have no idea how much of a problem that is given storage
    > prices nowadays.
    > Anyone clued in about this? I am a programmer by nature and very much
    > interested in languages (love the CJKV book), hence I am lurking on the
    > Unicode list (amongst others), eager to learn. :)
    > --
    > Jeroen Ruigrok van der Werven <asmodai(at)> / asmodai / kita no mono
    > Free Tibet!
    > The riddle master himself lost the key to his own riddles one day, and
    > found it again at the bottom of his heart.
