RE: more dingbats in plain text

From: Peter Constable (
Date: Sat Apr 18 2009 - 22:52:51 CDT

    From: Andrew West []

    >> Not 100% true. These fonts are encoded in an encoding called
    >> "symbol" -- which means a font-specific encoding. The Symbol
    >> encoding uses a 16-bit representation in the fonts, and
    >> typically fonts have characters mapped from F020 to F0FF. It
    >> looks a lot like Unicode PUA, though strictly speaking it is
    >> not.

    > Not 100% true either according to

    > <>

    > <quote>
    > Non-Standard (Symbol) Fonts


    > For non-standard fonts on Microsoft platforms, however, the 'cmap' and
    > 'name' tables must use platform ID 3 (Microsoft) and encoding ID 0
    > (Unicode, non-standard character set).

    Interesting. This is a bit of an inconsistency in the OT spec, then, since the specification for the cmap table itself certainly does not use the name "Unicode, non-standard character set" for encoding ID 0 in platform 3. Rather, it calls this "Symbol". Another inconsistency is that it also says, "Remember that 'name' table encodings should agree with the 'cmap' table", but there's clearly some hand-waving going on in all that: the name records for platform 3 are expected to give the encoding ID as 0, but the strings themselves certainly are not in the "Symbol" encoding (however you interpret that); they are in UTF-16.

    > The Microsoft 'cmap' subtable (platform 3, encoding 0) must use format
    > 4. The character codes should start at 0xF000, which is in the Private
    > Use Area of Unicode.
    > </quote>

    It's certainly the case that this recommended range for encoding ID 0 was chosen to align with Unicode PUA, and may well have been thought of by some as Unicode PUA.

    But looking at the entire ecosystem where symbol fonts are concerned, it's clear that Unicode PUA is not always used but rather that a variety of other encodings may be used depending on the context. As you pointed out elsewhere, there was a need to provide compatibility with older, pre-Unicode symbol font implementations that "used ASCII" (or, rather, used 8-bit non-standard encodings). Therefore, it is crucial for implementations that platform 3 encoding 0 not be interpreted simply as 'Unicode PUA', but rather as 'this special Symbol encoding that uses Unicode PUA in some contexts but non-standard 8-bit encodings in some other contexts'.

    It is in that sense that I say that the Windows Symbol encoding in TrueType and OpenType is not simply Unicode PUA -- the text in recom.htm that you quoted notwithstanding.
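
To make the dual interpretation concrete, here is a minimal Python sketch of how a lookup against a platform 3, encoding 0 subtable might accept either form of character code. It is an illustration only, not how any particular platform implements it: the function names, the plain dict standing in for a cmap subtable, and the glyph name are all hypothetical. The only values taken from the discussion above are the 0xF000 base and the F020 to F0FF range.

```python
from typing import Dict, Optional

# Base of the range recommended for platform 3, encoding 0 (from the quoted text).
SYMBOL_PUA_BASE = 0xF000


def to_symbol_code(byte_code: int) -> int:
    """Map a legacy 8-bit symbol-font code (0x20-0xFF) to its F0xx form."""
    if not 0x20 <= byte_code <= 0xFF:
        raise ValueError("expected an 8-bit code in 0x20-0xFF")
    return SYMBOL_PUA_BASE + byte_code


def from_symbol_code(code: int) -> int:
    """Map an F020-F0FF code back to the legacy 8-bit code."""
    if not 0xF020 <= code <= 0xF0FF:
        raise ValueError("expected a code in 0xF020-0xF0FF")
    return code - SYMBOL_PUA_BASE


def lookup_glyph(cmap: Dict[int, str], code: int) -> Optional[str]:
    """Look up a code in a hypothetical (3, 0) subtable, given as a dict.

    Accepts the code as stored in the font (F0xx) or as a legacy
    8-bit code, which is shifted into the F0xx range before lookup.
    """
    if code in cmap:
        return cmap[code]
    if 0x20 <= code <= 0xFF:
        return cmap.get(SYMBOL_PUA_BASE + code)
    return None


# A made-up subtable with a single entry, for demonstration.
demo_cmap = {0xF041: "glyph_for_0x41"}
print(lookup_glyph(demo_cmap, 0xF041))  # code as stored in the font
print(lookup_glyph(demo_cmap, 0x41))    # legacy 8-bit code, same glyph
```

Both calls resolve to the same glyph, which is the point: the same subtable serves callers that pass F0xx codes and callers that pass 8-bit codes, so treating it as plain Unicode PUA would miss the second case.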

