From: Philippe Verdy (email@example.com)
Date: Wed Mar 17 2004 - 15:11:58 EST
----- Original Message -----
From: "Peter Kirk" <firstname.lastname@example.org>
To: "Philippe Verdy" <email@example.com>
Cc: "Unicode Mailing List" <firstname.lastname@example.org>
Sent: Wednesday, March 17, 2004 8:11 PM
Subject: Re: Investigating: LATIN CAPITAL LETTER J WITH DOT ABOVE
> On 17/03/2004 09:59, Philippe Verdy wrote:
> >Arcane Jill <email@example.com> wrote:
> >>But if you lowercased that, surely you'd get <j, combining dot above>.
> >>How should that be rendered?
> >This is already addressed: lowercase j is "soft-dotted" meaning that its
> >dot disappears when there's a diacritic above it, and this includes the
> >combining dot above.
> >So <j, combining dot above> is not canonically or compatibility equivalent to
> ><j>, but both normally look the same when rendered, and the difference that
> >is invisible in lowercase becomes visible again when converted back to
> >So the semantic is preserved...
> But if you had a font (e.g. a Celtic one) in which lower case i or j is
> dotless, should the soft-dottedness be cancelled and the dot appear
> anyway? (Dare I suggest that this would give a way of writing Turkish
> with a Celtic font? Probably not as it would mean non-standard encoding
> of the Turkish text.)
In my opinion yes: a sequence <lowercase i or j, combining dot above> should
show the dot even in the Celtic font. The "soft-dotted" property only governs
the implicit dot that belongs to <lowercase i or j> itself; it has no effect on
a following <combining dot above>, which explicitly requests the presence of
the dot.
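
As a rough illustration of the encoding side of this argument (not of rendering), the following Python sketch relies only on the built-in default case mapping and the standard unicodedata module. The variable names are mine, chosen for the example.

```python
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE

upper = "J" + DOT_ABOVE   # <J, combining dot above>
lower = upper.lower()     # default, language-neutral lowercasing

# The explicit dot survives case conversion in both directions.
assert lower == "j" + DOT_ABOVE
assert lower.upper() == upper

# No normalization form folds the sequence into a plain "j",
# so the two stay distinct in the encoded text even if they look alike.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, lower) != "j"
```

Whether the two sequences look the same on screen is entirely up to the font and renderer; the sketch only shows that the distinction is kept in the text itself.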
So a Celtic font may very well be used to show Turkish text, at the price of a
change of encoding, something that would probably not happen. If standard
Turkish text is rendered with the Celtic font, it will not be rendered
correctly, because the Celtic font displays the soft-dotted <lowercase i or j>
and <lowercase dotless i or j> in exactly the same way. The exception is when
the renderer is told that the text is Turkic and the Celtic font contains
instructions to restore the implicit dot of <lowercase i or j> for Turkic text.
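
The behaviour described above could be sketched roughly as follows. This is a toy model under stated assumptions, not how any real font or renderer works: the glyph names ("i.dotless", "i.dotted", "dotabove") and the function select_glyphs are hypothetical, and only i, j and dotless i are handled.

```python
import unicodedata

DOTLESS_I = "\u0131"          # LATIN SMALL LETTER DOTLESS I
SOFT_DOTTED = {"i", "j"}      # subset of the soft-dotted letters, for the example
TURKIC = {"tr", "az"}         # assumed set of language codes needing the dot

def select_glyphs(text, lang=None):
    """Map characters to hypothetical glyph names of an imagined Celtic font."""
    turkic = lang is not None and lang.split("-")[0].lower() in TURKIC
    glyphs = []
    for pos, ch in enumerate(text):
        nxt = text[pos + 1] if pos + 1 < len(text) else ""
        # Combining class 230 means "mark above": the implicit dot must go.
        mark_above = nxt != "" and unicodedata.combining(nxt) == 230
        if ch in SOFT_DOTTED:
            # Default Celtic style is dotless; Turkic text restores the
            # implicit dot unless a mark above follows.
            dotted = turkic and not mark_above
            glyphs.append(ch + (".dotted" if dotted else ".dotless"))
        elif ch == DOTLESS_I:
            glyphs.append("i.dotless")
        elif ch == "\u0307":
            glyphs.append("dotabove")   # explicit dot is always shown
        else:
            glyphs.append(ch)
    return glyphs
```

With no language given, select_glyphs("i") and select_glyphs("\u0131") both yield the dotless glyph, which is the ambiguity that breaks Turkish text; passing lang="tr" restores the dotted glyph for plain i, while <i, combining dot above> shows exactly one dot in either case.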
The font may for example (1) recognize the language tags in the text stream, if
present, or (2) contain language-specific character-to-glyph substitution
tables that a language-aware renderer can use when the application passes it a
language code option. A priori I prefer option (2): language tags in the text
stream are already a deprecated method that requires inserting additional
characters in the plain-text stream to render, and the language information is
most often encoded out of band, for example by an xml:lang attribute of a
container XML element whose content is a text-element (each text-element in XML
is the largest unit of plain text coded in an XML document, XML itself not being
plain text but an encoding syntax for general