Re: CGJ for Two Greek Ligatures?

From: Doug Ewell (dewell@adelphia.net)
Date: Sun Mar 06 2005 - 21:33:42 CST

Next message: Patrick Andries: "Re: Unicode Stability"

Previous message: E. Keown: "Re: Languages using multiple scripts"
In reply to: vlad: "Re: CGJ for Two Greek Ligatures?"
Next in thread: James Kass: "Re: CGJ for Two Greek Ligatures?"

"vlad" <emperor dot vlad at gmail dot com> wrote:

> Well, it seems to me that i is undefined as to whether it has a dot
> above or not. Its dot may disappear when ligated or when replaced with
> a diacritic, and some fonts might display it dotless in all situations
> (for example, fonts where lowercase letters are displayed as small
> caps). In a situation where it is important to distinguish between
> dotted and undotted forms, wouldn't it make sense to encode an
> explicitly dotted form, rather than an ambiguous one?

It might help at this point if we all flipped open our Unicode 4.0 books
and turned to section 7.1, "Latin," on page 169. (Virtual page-flippers
can visit http://www.unicode.org/versions/Unicode4.0.0/ch07.pdf and turn
to page 6 of the PDF file.)

The heading "Exceptional Case Pairs" describes how Turkish and Azeri İ
and ı take the ordinary Basic Latin letters i and I, respectively, as
their case pairs. There is no provision to encode a separate
"explicitly dotted i" or "explicitly undotted I" for Turkish and Azeri
use, because there is no need.
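
To make the case-pair point concrete, here is a minimal Python sketch. The helpers `turkish_upper` and `turkish_lower` are hypothetical names introduced only for illustration; they are not part of Python or of any Unicode library, and a real application would more likely rely on a locale-aware library. The sketch assumes the only tailoring needed is the two exceptional pairs described above.

```python
DOTTED_CAPITAL_I = "\u0130"   # İ LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"    # ı LATIN SMALL LETTER DOTLESS I


def turkish_upper(text: str) -> str:
    """Hypothetical helper: uppercase using the Turkish/Azeri i pairs."""
    # Map the two exceptional letters first, then defer to the default rules.
    return text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I").upper()


def turkish_lower(text: str) -> str:
    """Hypothetical helper: lowercase using the Turkish/Azeri i pairs."""
    return text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I).lower()


word = "izin"
print(word.upper())            # default mapping: IZIN
print(turkish_upper(word))     # tailored mapping: İZİN
print(turkish_lower("IŞIK"))   # tailored mapping: ışık
```

The same code points U+0069 and U+0049 serve both mappings; only the pairing rule changes, which is why no separate "explicitly dotted i" is required.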

Just below that, the heading "Diacritics on i and j" describes the
"soft-dotted" principle, by which the dot on a lowercase i or j is
dropped when a non-spacing mark is placed above the letter. The letter
i with combining diaeresis displays the same as undotted ı with
diaeresis (well, let's see: ï, ı̈), but they are not the same character
sequence. The same is true for ligation, which is why an fi-ligature
in Turkish or Azeri is ambiguous.
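
The "display the same but are not the same" distinction can be checked with Python's standard `unicodedata` module. This is only a sketch: it inspects code points and normalization forms, and says nothing about how any particular font renders the sequences. The variable names are illustrative.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"
dotted = "i" + COMBINING_DIAERESIS         # soft-dotted i + diaeresis
dotless = "\u0131" + COMBINING_DIAERESIS   # dotless ı + diaeresis

# NFC composes i + diaeresis into one precomposed letter; the dotless
# sequence has no precomposed form and is left as two code points.
nfc_dotted = unicodedata.normalize("NFC", dotted)
nfc_dotless = unicodedata.normalize("NFC", dotless)

print([unicodedata.name(c) for c in nfc_dotted])
print([unicodedata.name(c) for c in nfc_dotless])
print(nfc_dotted == nfc_dotless)   # False: distinct even after normalization

# The compatibility decomposition of the fi ligature is f plus the ordinary
# dotted i, so the ligature carries no information about a dotless reading.
ligature = "\ufb01"
print(unicodedata.normalize("NFKC", ligature) == "fi")   # True
```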

Unicode can't handle every possible font scenario. I suppose if someone
used a small-caps font to display Greek, then lowercase ς and σ would
both display as a small Σ, and someone might see that as a security
hole. That doesn't mean every such scenario should, or can, be solved
by encoding a new "unambiguous" lookalike character, as suggested here.
(Among other things, as Peter Kirk pointed out, this would pose a
problem for untold volumes of existing Turkish and Azeri data --
although he made it sound as though he was contradicting my point about
the identity of the character, which puzzles me because they were
orthogonal points.)
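
The Greek sigma case can be illustrated the same way. A small-caps font is, roughly speaking, doing visually what uppercasing does to the text, so the following Python sketch uses the default case mappings as a stand-in; that analogy is an assumption of the example, not something a font actually performs on the encoded data.

```python
final_sigma = "\u03c2"     # ς GREEK SMALL LETTER FINAL SIGMA
medial_sigma = "\u03c3"    # σ GREEK SMALL LETTER SIGMA
capital_sigma = "\u03a3"   # Σ GREEK CAPITAL LETTER SIGMA

upper_final = final_sigma.upper()
upper_medial = medial_sigma.upper()

print(final_sigma == medial_sigma)                         # False: two characters
print(upper_final == upper_medial == capital_sigma)        # True: one capital
print(final_sigma.casefold() == medial_sigma.casefold())   # True: folded together
```

The two lowercase letters stay distinct in the encoded text even though one presentation collapses them, which is the situation described above for i and ı.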

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/


This archive was generated by hypermail 2.1.5 : Sun Mar 06 2005 - 21:40:11 CST