Re: Exemplifying apostrophes

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Mon May 19 2008 - 02:00:50 CDT

    Eric Muller wrote:

    > One of the goals of the UDHR in Unicode project is indeed to show (via
    > the translations themselves) and to document (via the notes) what
    > could be called "best practices for Unicode use".

    For that purpose, shouldn’t the data have been checked _before_ making
    it public via the “UDHR in Unicode” http://unicode.org/udhr/ pages?
    According to http://unicode.org/udhr/index_by_name.html there are only 4
    “complete and reviewed” translations, out of 347. And even they use
    HYPHEN-MINUS (instead of a dash) in the main heading like “Universal
    Declaration of Human Rights - French” and incorrectly leave spaces
    around the dash in the boilerplate expression “© 1996 – 2007” below that
    heading.
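
    The two heading problems mentioned above can be checked mechanically.
    The following is only a rough sketch in Python; the function names and
    the regular expressions are my own assumptions for illustration, not
    part of the UDHR in Unicode project, and the patterns are heuristics
    that will miss or misreport some cases.

```python
import re

# Heuristic: a HYPHEN-MINUS with a space on each side is probably standing
# in for a dash. (Assumed pattern, for illustration only.)
SPACED_HYPHEN_MINUS = re.compile(r"(?<=\S) - (?=\S)")

# Heuristic: an EN DASH (U+2013) with whitespace around it between two
# four-digit years.
SPACED_YEAR_RANGE = re.compile(r"(\d{4})\s+\u2013\s+(\d{4})")


def find_issues(text):
    """Return a list of human-readable remarks about dash usage in text."""
    issues = []
    if SPACED_HYPHEN_MINUS.search(text):
        issues.append("HYPHEN-MINUS used as a dash")
    if SPACED_YEAR_RANGE.search(text):
        issues.append("spaces around EN DASH in a year range")
    return issues


def tighten_year_ranges(text):
    """Remove the whitespace around an EN DASH between two years."""
    return SPACED_YEAR_RANGE.sub("\\1\u2013\\2", text)


if __name__ == "__main__":
    print(find_issues("Universal Declaration of Human Rights - French"))
    print(tighten_year_ranges("\u00a9 1996 \u2013 2007"))
```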

    > Most of the texts have been "rescued" from the UN site,

    If there is a point to that, then the data should be substantially more
    correct than the UN pages. But that doesn’t seem to be the case.

    What is the added value when the stated purpose is to demonstrate the
    use of Unicode and even “best practices for Unicode use”?

    Moreover, I find it highly questionable whether e.g. the declared policy
    (see http://unicode.org/udhr/tech_whichcharacter.html ) of using U+2010
    HYPHEN is best practice on web pages today or in the near future. It
    gains nothing in practice but loses quite a lot when the browser or
    associated software (such as a speech synthesizer) cannot handle U+2010
    HYPHEN but has no problem with U+002D HYPHEN-MINUS.
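
    For software that is known to choke on U+2010, the fallback is a plain
    character substitution. A minimal sketch in Python, assuming you
    simply want every U+2010 replaced before the text reaches such
    software; the function name is hypothetical:

```python
import unicodedata

HYPHEN = "\u2010"        # U+2010 HYPHEN
HYPHEN_MINUS = "\u002D"  # U+002D HYPHEN-MINUS


def downgrade_hyphens(text):
    """Replace every U+2010 HYPHEN with U+002D HYPHEN-MINUS."""
    return text.replace(HYPHEN, HYPHEN_MINUS)


if __name__ == "__main__":
    sample = "well\u2010known"
    # Show the Unicode names of the two characters involved.
    print(unicodedata.name(HYPHEN))
    print(unicodedata.name(HYPHEN_MINUS))
    print(downgrade_hyphens(sample))
```

    The substitution is lossy: once applied, a hyphen can no longer be
    told apart from a minus sign or other uses of U+002D.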

    The use of U+2019 RIGHT SINGLE QUOTATION MARK as a punctuation
    apostrophe (or otherwise, when applicable) is safe enough to justify
    calling it best practice, though that is somewhat debatable. Yet, this
    issue isn’t even mentioned on the “Which character?” page. Promoting the
    use of characters like U+2010 with relatively limited support in fonts
    simply isn’t right. For reasonably up-to-date information on font
    support for it, consult
    http://www.fileformat.info/info/unicode/char/2010/fontsupport.htm
    (which lists just a set of Lucida fonts; there are some additional, less
    common fonts that contain it).

    In many situations, a browser will use a glyph for a character from a
    different font when none exists in the primary font. While this is often
    useful e.g. for technical symbols, it easily leads to confusion,
    especially for characters like hyphens, dashes, and relatives. For them,
    length is essential, and most (though not all) fonts have reasonable
    widths assigned to them _relative to each other_. For example,
    HYPHEN-MINUS is shorter than EN DASH in most fonts. But when you mix
    such characters from different fonts, such relationships are often lost.

    Somewhat similarly, the various apostrophe-like characters might be
    reasonably implemented in a given font, but a mixture of fonts may mess
    this up.
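
    To see which apostrophe-like characters a given text actually contains
    (and hence which ones a font needs to cover consistently), one can
    take a simple inventory. This is only a sketch in Python; the set of
    characters treated as “apostrophe-like” is my own assumption and is
    certainly not exhaustive.

```python
import unicodedata

# Assumed, non-exhaustive set of characters that get used as apostrophes.
APOSTROPHE_LIKE = {
    "\u0027",  # APOSTROPHE
    "\u2019",  # RIGHT SINGLE QUOTATION MARK
    "\u2018",  # LEFT SINGLE QUOTATION MARK
    "\u02BC",  # MODIFIER LETTER APOSTROPHE
    "\u0060",  # GRAVE ACCENT
    "\u00B4",  # ACUTE ACCENT
}


def apostrophe_inventory(text):
    """Count each apostrophe-like character in text, keyed by code point and name."""
    counts = {}
    for ch in text:
        if ch in APOSTROPHE_LIKE:
            key = f"U+{ord(ch):04X} {unicodedata.name(ch)}"
            counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    for key, n in apostrophe_inventory("don't isn\u2019t").items():
        print(key, n)
```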

    Jukka K. Korpela ("Yucca")
    http://www.cs.tut.fi/~jkorpela/


