Re: Some questions about Latin diacritics

From: Asmus Freytag
Date: Wed Jun 14 2006 - 18:22:43 CDT

    On 6/14/2006 1:56 PM, Richard Wordingham wrote:
    > Philippe Verdy wrote on Wednesday, June 14, 2006 at 12:56 PM
    >> Regarding the i-shaped "Haken" phonetic diacritics included in the
    >> PDF (for the "hline Offen" and "überoffen" vowel qualifiers), I see
    >> them like simple or double dotless i subscripts (their forms are very
    >> similar to the form of the small i letter under which they are drawn,
    >> except that they just lack the top arm, but the resolution of the
    >> bitmap is insufficient to really decide) which may merit encoding...
    > Are we attempting an exact reproduction of the glyphs, or are we
    > looking for the correct encoding of texts?
    What we are looking for is to correctly reflect the text. If there are
    different *conventions* in writing down a concept, it's not correct to
    say "oh they mean the same thing, give them the same code point".
    However, if there are different visual styles for the same symbol, then
    we do unify.

    An example for the latter is the use of inclined vs. upright integral
    signs. The two are the same symbol (integral sign), so the style is
    relegated to the font.

    In regular expression syntax, you can find both ^ and ~ used to negate a
    character class, as in
    [~a] or [^a] (anything but 'a'). These are two different conventions
    for the same concept, but they use two different symbols. It's not
    correct in that case to unify ~ and ^ into a single character.
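
    A minimal sketch of why the two must stay separate, assuming Python's
    standard re and unicodedata modules (the sample string "a~b" is made up
    for illustration). In Python's regex dialect only ^ negates a class, so
    the same pattern text means something different with ~:

```python
import re
import unicodedata

# Two distinct code points, each with its own character name.
names = {ch: unicodedata.name(ch) for ch in ("^", "~")}
for ch, name in names.items():
    print(f"U+{ord(ch):04X} {name}")

# In Python's re syntax, ^ at the start of a class negates it ...
negated = re.findall(r"[^a]", "a~b")
# ... while ~ is just a literal member of the class.
literal = re.findall(r"[~a]", "a~b")

print(negated)  # everything except 'a'
print(literal)  # 'a' and the literal '~'
```

    A dialect that negates with ~ would read [~a] the other way round,
    which only works because the two characters are encoded separately.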

    I suspect that for phonetics, sometimes there's a common symbol with
    different typographical style, and sometimes there is the use of a
    different symbol. I'm not knowledgeable enough in that discipline to
    help decide the particular question, but in listening to arguments pro
    or con it helps me when the proponents are aware of the distinction I've
    drawn above and can directly address it.

    When there's doubt whether it's two styles of the same symbol or two
    symbols used for the same concept, Unicode has often preferred to err on
    the side of allowing separate code points for dissimilar looking
    symbols. This allows for the possibility that something that looks
    different can be assigned different meanings in some other notation or

    I'm not sure whether, in this context, I find the following
    argument ultimately compelling:
    > The hooks have the semantic of U+031C COMBINING LEFT HALF RING BELOW,
    > i.e. more open pronunciation. One needs a very good reason to encode
    > them as anything else. In particular, you need to be sure that they
    > are not simple 'squiggle below'. The diacritic for openness is very
    > variable - in Yoruba it can be a vertical line, or even, through the
    > absence of a background in phonetics, a mere dot.
    It does not address the question of whether these differences are more
    than font styles and instead reflect different notational conventions.
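
    For reference, a small sketch listing already-encoded combining marks
    that could correspond to the variants mentioned in the quoted text.
    Only U+031C is named there; mapping "vertical line" to U+0329 and "dot"
    to U+0323 is my assumption, as is the base letter "o". It uses Python's
    standard unicodedata module:

```python
import unicodedata

# U+031C is cited in the quoted text; the other two are assumed matches
# for the "vertical line" and "dot" variants it describes.
candidates = {
    "half ring": "\u031C",
    "vertical line": "\u0329",
    "dot": "\u0323",
}

# Look up the official character name of each mark.
mark_names = {label: unicodedata.name(mark) for label, mark in candidates.items()}

for label, mark in candidates.items():
    # Show each mark's code point, name, and rendering on a base letter.
    print(f"U+{ord(mark):04X} {mark_names[label]}: o{mark}")

# Normalization does not fold the marks together: the three sequences
# remain distinct after NFC.
normalized = {unicodedata.normalize("NFC", "o" + m) for m in candidates.values()}
print(len(normalized))
```

    All three exist as separate code points today, so the open question is
    which of them (if any) a given source text actually intends.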

    There are a couple of cases in mathematics where continental European
    notations actually use different symbols from the American style. (And
    usage also shifts over time.)


    This archive was generated by hypermail 2.1.5 : Wed Jun 14 2006 - 18:28:23 CDT