RE: Control picture glyphs (was Re: Apostrophes at www.unicode.org)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Aug 28 2007 - 00:40:51 CDT


    Mark Davis wrote:

    > I think it makes sense to support most if not all 26 whitespace in
    > fonts, although I'd group into the following priorities (but the
    > priorities would depend on the target audience for the font).
    > http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:whitespace:]

     

    I think you should have avoided surrounding these whitespace characters with spaces inside the parentheses, because of the way they combine. Also, the figure space and punctuation space are not essential for the correct typography of text in any language, unlike the thin space (used in French typography next to punctuation or as a separator for groups of digits) or the hair space (same usage, but in English-style typography).

     

    Also, support for the Zl and Zp characters (LINE SEPARATOR and PARAGRAPH SEPARATOR) is mandatory for HTML, so they should not be in the gray area.

     

    In fact, the HTML 4.01 standard list of entities is significant here, as it also includes the EN SPACE and EM SPACE.

     

    The figure space and punctuation space are not referenced in any major standard, so they should be in the gray area: most often they can be replaced, respectively, by a no-break space (NBSP for the figure space) or by the narrow no-break space (NNBSP for the punctuation space) without affecting much of the document (except that the document will no longer specify exactly which typographic tradition it follows; better replacements may be the THIN SPACE and HAIR SPACE). Note that the NARROW NO-BREAK SPACE does not specify whether its implied width is THIN or HAIR: that depends on the typographic tradition used for rendering (French typography for THIN, English typography for HAIR). The figure space should never be “narrow” (and so never “thin” and never “hair”).
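
    The replacement described above could be sketched in Python as follows. This is only an illustration, assuming a preprocessor that falls back when a font lacks the two gray-area spaces; the names `FALLBACKS` and `fallback_spaces` are hypothetical and not part of any standard.

```python
# Hypothetical fallback table: the two "gray area" spaces are replaced by
# the closest non-breaking spaces, as suggested in the paragraph above.
FALLBACKS = {
    0x2007: "\u00A0",  # FIGURE SPACE -> NO-BREAK SPACE
    0x2008: "\u202F",  # PUNCTUATION SPACE -> NARROW NO-BREAK SPACE
}


def fallback_spaces(text: str) -> str:
    """Return text with FIGURE SPACE and PUNCTUATION SPACE substituted."""
    return text.translate(FALLBACKS)


if __name__ == "__main__":
    sample = "1\u2007234\u2008567"
    # Print the code points so that the invisible characters can be checked.
    print([f"U+{ord(c):04X}" for c in fallback_spaces(sample)])
```

    All other characters, including the regular SPACE, pass through unchanged, so the substitution only loses the information about which typographic tradition the document follows.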

     

    Also, I use square brackets here to show the differences in width between those spaces more visibly. I use an em dash for the characters that have no glyph and are not needed in fonts, because they should be handled directly by renderers and code parsers. One exception is TAB, which may be mapped like a regular SPACE: actual tab widths depend on the renderer, and this default mapping is reasonable if the renderer does not support tabs. VT and FF, if mapped, should be zero-width spaces for renderers that ignore these controls, as they are typically present at the beginning of lines after line breaks or paragraph breaks, or at the end of documents. If these characters have glyphs in fonts, they should be mapped to visible control glyphs only within a separate “visible controls” feature of that font, so that these glyphs are ignored by default in all renderers, unless the user (not the document itself) explicitly tells the renderer to use a “visible controls” rendering mode for editing instead of generating line breaks.
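
    A “visible controls” editing mode could look roughly like the following sketch. It assumes the substitution is done by the renderer on the user's request rather than through a font feature, and it uses the Control Pictures block (U+2400 to U+241F for the C0 controls, U+2421 for DELETE); the function name `show_controls` is hypothetical.

```python
def show_controls(text: str) -> str:
    """Replace C0 controls and DELETE by their Control Pictures symbols."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20:
            # C0 controls U+0000..U+001F map to U+2400..U+241F.
            out.append(chr(0x2400 + cp))
        elif cp == 0x7F:
            out.append("\u2421")  # SYMBOL FOR DELETE
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # TAB and LF become visible symbols instead of producing layout effects.
    print(show_controls("a\tb\n"))
```

    In the default rendering mode this function would simply not be called, so the controls keep generating tabs and line breaks.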

     

    Here is my better summary of the situation.

     

    General

    0009 — no name # TAB (may be mapped like the regular SPACE in fonts)

    000A — no name # LF (generates a line break, no glyph needed!)

    000B — no name # VT (generates a line break, no glyph needed!)

    000C — no name # FF (generates a line break, no glyph needed!)

    000D — no name # CR (generates a line break, no glyph needed! ignored after or before LF)

    0085 — no name # NEL (generates a line break, no glyph needed!)

    2028 — LINE SEPARATOR # (generates a line break, no glyph needed!)

    2029 — PARAGRAPH SEPARATOR # (generates a line break + paragraph break, no glyph needed!)

    200A [ ] HAIR SPACE # espace fine anglaise (1/8 cadratin = 0.15 em); should be a bit narrower

    2009 [ ] THIN SPACE # espace fine française (1/6 cadratin = 0.2 em)

    202F [ ] NARROW NO-BREAK SPACE # espace fine insécable (most often 1/6 cadratin = 0.2 em)

    0020 [ ] SPACE # espace (1/3 cadratin = 0.4 em)

    00A0 [ ] NO-BREAK SPACE # like SPACE ; espace insécable (1/3 cadratin = 0.4 em)

     

    Gray Area

    2008 [ ] PUNCTUATION SPACE # width of comma “,” (about 1/4 cadratin = 0.3 em, but depends on font design)

    2008 [,] alternate character shown here just to compare

    2007 [ ] FIGURE SPACE # width of digits “0” to “9” (about 1/2 cadratin = 0.6 em, but depends on font design)

    2007 [0] alternate character shown here just to compare

    2002 [ ] EN SPACE # larger space, width of “n” (1/2 cadratin = 0.6 em)

    2002 [–] alternate character EN DASH shown here just to compare

    2002 [n] alternate character “n” shown here just to compare

    2002 [M] alternate character “M” shown here just to compare (generally larger than “n”)

    205F [ ] MEDIUM MATHEMATICAL SPACE # SPACE + EN SPACE (about 3/4 cadratin = 0.9 em)

    205F [  ] alternate characters (SPACE + EN SPACE) shown here just to compare (1/4 + 1/2 cadratin)

    2003 [ ] EM SPACE # extra large space (1 cadratin = 1.2 em) ; not the width of “M”!

    2003 [—] alternate character EM DASH shown here just to compare

     

    Specialized for typography or page layout

    2006 [ ] SIX-PER-EM SPACE # (= 1/6 cadratin exactly = 0.2 em)

    2005 [ ] FOUR-PER-EM SPACE # (= 1/4 cadratin exactly = 0.3 em)

    2004 [ ] THREE-PER-EM SPACE # (= 1/3 cadratin exactly = 0.4 em)

    2000 [ ] EN QUAD # half-square (= 1/2 cadratin exactly = 0.6 em)

    2001 [ ] EM QUAD # square (= 1 cadratin exactly = 1.2 em)

     

    Specialized in fonts for specific scripts

    1680 [ ] OGHAM SPACE MARK

    180E [᠎] MONGOLIAN VOWEL SEPARATOR

    3000 [ ] IDEOGRAPHIC SPACE # width of ideographic “squares” (about 1 cadratin = 1.2 em)
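
    The summary above can be checked against a Unicode database with a short Python sketch. The grouping and the code points are copied from the lists above; the names `GROUPS` and `describe` are hypothetical, and the general categories printed depend on the version of the Unicode data bundled with the Python interpreter (so U+180E in particular may not be reported as a space separator).

```python
import unicodedata

# Code points copied from the summary above, grouped as in the message.
GROUPS = {
    "General": [0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0085, 0x2028,
                0x2029, 0x200A, 0x2009, 0x202F, 0x0020, 0x00A0],
    "Gray Area": [0x2008, 0x2007, 0x2002, 0x205F, 0x2003],
    "Typography or page layout": [0x2006, 0x2005, 0x2004, 0x2000, 0x2001],
    "Specific scripts": [0x1680, 0x180E, 0x3000],
}


def describe(cp: int) -> str:
    """Format one code point as 'XXXX <category> <name>'."""
    ch = chr(cp)
    # Control characters have no formal name, so fall back to "no name".
    name = unicodedata.name(ch, "no name")
    return f"{cp:04X} {unicodedata.category(ch)} {name}"


if __name__ == "__main__":
    for group, cps in GROUPS.items():
        print(group)
        for cp in cps:
            print("  " + describe(cp))
    print("total:", sum(len(cps) for cps in GROUPS.values()))
```

    The four groups add up to the 26 whitespace characters mentioned in the quoted message, and the Zl and Zp categories appear only on LINE SEPARATOR and PARAGRAPH SEPARATOR.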

     

    The cadratin width is normally the same as the cadratin height, so that the cadratin defines a square at normal line spacing (1.2 em, where the “em” unit is in fact the M-height of the font, excluding descenders and vertical gaps).
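
    The em values given in the lists above follow from this convention of 1 cadratin = 1.2 em. A minimal Python sketch of the arithmetic, assuming only that convention (the names `EM_PER_CADRATIN` and `cadratin_to_em` are hypothetical):

```python
from fractions import Fraction

# Assumption taken from the message: 1 cadratin = 1.2 em.
EM_PER_CADRATIN = Fraction(6, 5)


def cadratin_to_em(fraction_of_cadratin: Fraction) -> Fraction:
    """Convert a width in cadratins to a width in ems (exact arithmetic)."""
    return fraction_of_cadratin * EM_PER_CADRATIN


if __name__ == "__main__":
    widths = [Fraction(1, 8), Fraction(1, 6), Fraction(1, 4), Fraction(1, 3),
              Fraction(1, 2), Fraction(3, 4), Fraction(1, 1)]
    for frac in widths:
        print(f"{frac} cadratin = {float(cadratin_to_em(frac)):.2f} em")
```

    Exact fractions are used so that 1/6 and 1/3 of a cadratin come out as exactly 0.2 em and 0.4 em rather than as rounded floating-point values.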

     

    In ideographic fonts, where there are no descenders or ascenders, the cadratin height is also the em height and the height of the glyph, not the total height of the square in the page grid (about 1.2 em). Usually in those fonts, the cadratin height equals the cadratin width, and consequently the em height equals the em width.

     

    This is not the case with all alphabetic font designs, where the effective em width and em height are different (in rendered millimetres, but also in effective display pixels because of possible differences in resolution; CSS, however, generally defines an “ideal” virtual pixel which is square). HTML and CSS should permit a distinction between the two ems by considering the direction of writing to determine what an “em” means. Some browsers, however, use only a single unit horizontally and vertically, taken from the “em” height of the currently applicable font, as they still do not support presenting text vertically. This affects the definition of box widths, such as tables, table cells or positioned blocks, which are often inconsistent across browsers when specified in “em” instead of pixels (bad for usability) or percentages of the page width (better), if font styles are not forced (forcing unique font styles is also bad, as the font may not be available). This ambiguity also depends on the version of CSS implemented.

     

    In all cases (alphabetic or ideographic fonts), the cadratin width may be narrower than the cadratin height in narrow fonts (in ideographic fonts, this also means that the grid is not square but rectangular).

     

    The cadratin height is not necessarily the vertical distance between two paragraphs (because there may be additional margins before and after paragraphs), only the distance between two lines of the same paragraph with normal line spacing.

     


