RE: dotless j

From: Christopher J. Fynn (cfynn@dircon.co.uk)
Date: Mon Jul 05 1999 - 22:59:23 EDT


On Monday, July 05, 1999 at 7:47 PM
G. Adam Stanislav [mailto:adam@whizkidtech.net] wrote:

> On Mon, Jul 05, 1999 at 02:14:20PM +0100, Christopher J. Fynn wrote:

> > However if you know of specific j+diacritic combinations which are
> > widely used in writing some language but are not found encoded as characters in
> > the Unicode Standard, then perhaps you should demonstrate this and try to make the case
> > that these combinations should be individually encoded as separate
> > characters on the basis of prior usage or compatibility.
 
> The problem is that they are NOT widely used. The language I was
> having in mind is mathematics (and related languages, such as physics).
> In mathematic notation it is possible to use any Latin and Greek
> (and other) character with just about any diacritic.

Although I think it is a borderline case, if I had a vote I'd support a proposal
to encode a dotless j, in spite of its weak claim to the status of a character.
IMO encoding this entity would solve more problems than it would create
(though those problems should not be minimised).
I don't think a dotless j would really open the door to other entities
which have a weaker claim.
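For what it's worth, the kind of ad hoc base + diacritic combination Adam
describes is exactly what combining marks are for. A minimal sketch (Python
used purely to illustrate the code points involved - this is an illustration,
not anyone's proposal):

```python
# Unicode lets any combining diacritic follow any base letter, which is
# what makes "any Latin or Greek character with any diacritic" possible
# in principle. The catch with i and j is that the dot of the base glyph
# should vanish under a mark placed above - hence dotless i (U+0131),
# and the argument for a dotless j.
import unicodedata

beta_hat = "\u03b2" + "\u0302"          # Greek beta + combining circumflex
dotless_i_tilde = "\u0131" + "\u0303"   # dotless i + combining tilde

# Combining class 230 identifies marks rendered above the base.
print(unicodedata.combining("\u0302"))
print(unicodedata.name("\u0131"))
```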

I do agree with Peter C. and Ken W. that a line must be drawn somewhere
but, if it were up to me (and it certainly isn't), I'd be inclined to
draw that line just on the other side of a dotless j.

> And it is not a matter of prior use in this case either. Who are we to tell
> mathematicians and physicists that they may only use characters that are
> naturally dotless in their notation?

I don't want to tell them anything. Since they are going to have *all* the
characters in Unicode to play with we should see some interesting notation
in future.

> Clearly, I am not the only one here who thinks a dotless j would be useful
> (for what it's worth, I did not start this discussion, someone else did).

> Here is a question about dotless i, by the way: It has been stated that it
> is a true character because it is used in Turkish. I would like to know if
> *that* was the reason Adobe has been including it in every font. I have
> my suspicion that Adobe included it in their original PS fonts precisely
> because they felt the need to be able to create the letter i with any
> diacritic possible, not because it also happened to be a Turkish
> character. (And please do not interpret this as being negative about
> Adobe, I happen to think very highly about Adobe.)

I suspect you are right about this. Early Adobe fonts used composite
glyphs and dotlessi was used in constructing some of these.
 
> Again, I repeat my suggestion that no one has replied to yet: Why not have
> two standards? One for glyphs, one for characters and only characters?
> After all, we are dealing with computer communications, and just about
> all other computer communications is done in layers. It seems to make
> a lot of sense to have a different standard for application software,
> and a different one for system (or presentation) software.

I think you are going much too far here - perhaps a standard mapping from
characters to a unified glyph space, like Michael suggested for plane-17 or
something, would ease implementation and be useful for simple (simplistic?)
designs (though font designers who felt constrained by the limits of such a
standard scheme would have to be able to override it if they wished).

When you are talking about a character for every composite glyph used in a language
you are talking about a *lot* of glyphs. For example, if you want to make a font that
has ligatures for every stack that occurs in Tibetan without making design
compromises, you are looking at a font with well over 5,000 glyphs for that script
alone. (If you add ligatures for problem letters or stack combinations occurring side
by side, this figure would be multiplied.)

Although the number of compound glyphs found in Tibetan is enormous, fewer than
200 *characters* are necessary for encoding Tibetan data. So with this
script you have about a 1-to-25 relationship between characters and presentation
glyphs. (Since many of the Tibetan characters are for punctuation etc. which don't
form compounds, the relationship between consonant and vowel characters and their
presentation forms is actually much higher.)
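Spelled out, the arithmetic behind that ratio (using the rough figures quoted
above - estimates, not exact counts):

```python
# Rough character-to-glyph arithmetic for Tibetan, using the approximate
# figures quoted above (assumed estimates, not exact counts).
characters = 200   # characters sufficient to encode Tibetan data
glyphs = 5000      # lower bound on presentation glyphs for a full font

print(glyphs // characters)  # roughly a 1-to-25 ratio
```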

And that's just Tibetan (which I'm familiar with) - there are a
lot of other complex Indic scripts. What you suggest also implies having
to dis-unify the whole CJK block - and how about all the presentation
/ ligature glyphs you might need for Urdu?

The mind boggles.
 

- Chris

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:48 EDT