Re: CGJ for Two Greek Ligatures?

From: Doug Ewell (dewell@adelphia.net)
Date: Sun Mar 06 2005 - 21:33:42 CST

    "vlad" <emperor dot vlad at gmail dot com> wrote:

    > Well, it seems to me that i is undefined as to whether it has a dot
    > above or not. Its dot may disappear when ligated or when replaced with
    > a diacritic, and some fonts might display it dotless in all situations
    > (for example, fonts where lowercase letters are displayed as small
    > caps). In a situation where it is important to distinguish between
    > dotted and undotted forms, wouldn't it make sense to encode an
    > explicitly dotted form, rather than an ambiguous one?

    It might help at this point if we all flipped open our Unicode 4.0 books
    and turned to section 7.1, "Latin," on page 169. (Virtual page-flippers
    can visit http://www.unicode.org/versions/Unicode4.0.0/ch07.pdf and turn
    to page 6 of the PDF file.)

    The heading "Exceptional Case Pairs" describes how Turkish and Azeri İ
    and ı take the ordinary Basic Latin letters i and I, respectively, as
    their case pairs. There is no provision to encode a separate
    "explicitly dotted i" or "explicitly undotted I" for Turkish and Azeri
    use, because there is no need.
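
    For anyone who wants to see the four characters interact, here is a
    minimal sketch in Python. The helpers turkish_upper and turkish_lower
    are hypothetical names made up for this illustration, not part of any
    library, and they only handle the i/I/İ/ı quartet; real locale-aware
    casing would normally come from a library such as ICU.

```python
DOTTED_CAPITAL_I = "\u0130"   # İ LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"    # ı LATIN SMALL LETTER DOTLESS I


def turkish_upper(text: str) -> str:
    """Hypothetical helper: map i -> İ first; ı -> I is already the default mapping."""
    return text.replace("i", DOTTED_CAPITAL_I).upper()


def turkish_lower(text: str) -> str:
    """Hypothetical helper: map İ -> i and I -> ı before default lowercasing."""
    return text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I).lower()


word = "diyarbak\u0131r"  # "diyarbakır", which contains both i and ı

# Default (locale-independent) casing sends both i and ı to capital I,
# so the round trip cannot tell them apart afterwards.
default_round_trip = word.upper().lower()

# Turkish-aware casing keeps the two letters distinct using only the
# four existing characters i, I, İ and ı.
turkish_round_trip = turkish_lower(turkish_upper(word))

print(word.upper(), default_round_trip)
print(turkish_upper(word), turkish_round_trip)
```

    The point of the sketch is that the distinction is carried by the
    casing rules applied to the existing characters, not by any extra
    "explicitly dotted" code point.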

    Just below that, the heading "Diacritics on i and j" describes the
    "soft-dotted" principle, by which the dot on a lowercase i or j is
    replaced by a non-spacing mark above. The letter i with combining
    diaeresis displays the same as undotted ı with diaeresis (well, let's
    see: ï, ı̈), but they are not the same character sequence. The same
    goes for ligation, which is why an fi-ligature in Turkish or Azeri
    is ambiguous.
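
    The "looks the same but is not the same" point can be checked with
    Python's standard unicodedata module. This is only a sketch of the
    encoding side; how the two sequences actually render depends on the
    font, which no code here can show.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"

dotted = "i" + COMBINING_DIAERESIS         # i followed by combining diaeresis
dotless = "\u0131" + COMBINING_DIAERESIS   # dotless ı followed by combining diaeresis

# NFC composes i + diaeresis into the single character U+00EF, but there is
# no precomposed "dotless i with diaeresis", so that sequence stays as two
# code points.
nfc_dotted = unicodedata.normalize("NFC", dotted)
nfc_dotless = unicodedata.normalize("NFC", dotless)

print([f"U+{ord(c):04X}" for c in nfc_dotted])
print([f"U+{ord(c):04X}" for c in nfc_dotless])
print("equal after NFC:", nfc_dotted == nfc_dotless)

# The encoded fi ligature (U+FB01) has a compatibility decomposition to
# plain f + i, so the character itself says nothing about a Turkish ı.
fi_folded = unicodedata.normalize("NFKC", "\ufb01")
print([f"U+{ord(c):04X}" for c in fi_folded])
```

    Even after normalization the two sequences remain distinct, so a
    process comparing the underlying text can always tell them apart.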

    Unicode can't handle every possible font scenario. I suppose if someone
    used a small-caps font to display Greek, then lowercase ς and σ would
    both display as a small Σ, and someone might see that as a security
    hole. That doesn't mean every such scenario should, or can, be solved
    by encoding a new "unambiguous" lookalike character, as suggested here.
    (Among other things, as Peter Kirk pointed out, this would pose a
    problem for untold volumes of existing Turkish and Azeri data --
    although he made it sound as though he was contradicting my point about
    the identity of the character, which puzzles me because they were
    orthogonal points.)
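
    The Greek case can be sketched the same way. Small caps are a font
    feature, not a text operation, so the function below merely
    approximates the visual collapse by uppercasing; the name
    collapse_like_small_caps is hypothetical and chosen for this
    illustration only.

```python
FINAL_SIGMA = "\u03c2"    # ς GREEK SMALL LETTER FINAL SIGMA
SMALL_SIGMA = "\u03c3"    # σ GREEK SMALL LETTER SIGMA
CAPITAL_SIGMA = "\u03a3"  # Σ GREEK CAPITAL LETTER SIGMA


def collapse_like_small_caps(text: str) -> str:
    """Hypothetical stand-in for small-caps display: uppercase the text."""
    return text.upper()


# Both lowercase sigmas share the same capital, so they look alike once
# "displayed" this way...
looks_alike = (
    collapse_like_small_caps(FINAL_SIGMA) == collapse_like_small_caps(SMALL_SIGMA)
)

# ...yet the underlying characters are still different code points.
still_distinct = FINAL_SIGMA != SMALL_SIGMA

print(looks_alike, still_distinct)
```

    The visual collapse happens at display time; the encoded text keeps
    the distinction without any additional lookalike character.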

    -Doug Ewell
     Fullerton, California
     http://users.adelphia.net/~dewell/


