From: Philippe Verdy
Date: Wed Mar 17 2004 - 15:11:58 EST

    ----- Original Message -----
    From: "Peter Kirk" <>
    To: "Philippe Verdy" <>
    Cc: "Unicode Mailing List" <>
    Sent: Wednesday, March 17, 2004 8:11 PM
    Subject: Re: Investigating: LATIN CAPITAL LETTER J WITH DOT ABOVE

    > On 17/03/2004 09:59, Philippe Verdy wrote:
    > >Arcane Jill <> wrote:
    > >
    > >
    > >>But if you lowercased that, surely you'd get <j, combining dot above>.
    > >>How should that be rendered?
    > >>
    > >>
    > >
    > >This is already addressed: lowercase j is "soft-dotted" meaning that its
    > >dot disappears when there's a diacritic above it, and this includes the
    > >combining dot above.
    > >
    > >So <j, combining dot above> is not canonically or compatibility equivalent to
    > ><j>, but both normally look the same when rendered, and the difference that is
    > >invisible in lowercase becomes visible again when converted back to uppercase.
    > >So the semantic is preserved...
    > But if you had a font (e.g. a Celtic one) in which lower case i or j is
    > dotless, should the soft-dottedness be cancelled and the dot appear
    > anyway? (Dare I suggest that this would give a way of writing Turkish
    > with a Celtic font? Probably not as it would mean non-standard encoding
    > of the Turkish text.)

    In my opinion, yes: a sequence <lower case i or j, combining dot above> should
    show the dot even in the Celtic font. The "soft-dotted" property governs only
    the implicit dot that belongs to <lower case i or j> itself; it has no effect
    on a following <combining dot above>, which explicitly requests the presence
    of the dot.
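
    The rule above can be sketched roughly as follows. This is only an
    illustration, not how any particular renderer works: SOFT_DOTTED is a
    hand-picked subset (Python's unicodedata module does not expose the
    Soft_Dotted property) and keeps_implicit_dot is a hypothetical helper name.
    The last lines also check that normalization and case conversion keep the
    explicit dot, so the distinction survives a round trip.

```python
import unicodedata

# Assumption: a tiny hand-picked subset of the Soft_Dotted code points.
SOFT_DOTTED = {"i", "j"}
COMBINING_DOT_ABOVE = "\u0307"


def keeps_implicit_dot(base: str, marks: str) -> bool:
    """Return True if the base letter's own (implicit) dot should be drawn."""
    if base not in SOFT_DOTTED:
        return False  # no implicit dot to draw in the first place
    # Canonical combining class 230 marks a diacritic placed above the base.
    return not any(unicodedata.combining(m) == 230 for m in marks)


seq = "j" + COMBINING_DOT_ABOVE

# The implicit dot is suppressed, but the explicit U+0307 is still rendered.
implicit = keeps_implicit_dot("j", COMBINING_DOT_ABOVE)

# Neither canonical nor compatibility normalization folds the sequence to "j".
nfc = unicodedata.normalize("NFC", seq)
nfkc = unicodedata.normalize("NFKC", seq)

# Case conversion carries the combining dot along in both directions.
round_trip = seq.upper().lower()
```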

    So a Celtic font could well be used to show Turkish text, but only at the price
    of a change of encoding, which would probably not happen. If standard Turkish
    text is rendered with the Celtic font, it will not be rendered correctly: the
    Celtic font will display the soft-dotted <lowercase i or j> and <lowercase
    dotless i or j> exactly the same way, unless the renderer is told that the text
    is Turkic and the Celtic font contains instructions to restore the implicit dot
    on <lowercase i or j> for Turkic text.
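
    To make the collision concrete, here is a toy model of such a font. The glyph
    names, both tables, and the glyphs function are all invented for illustration;
    a real font format would express this differently.

```python
# Hypothetical default mapping of a Celtic-style font: dotted and dotless
# lowercase i share one dotless glyph, and j is drawn without its dot.
CELTIC_DEFAULT = {
    "i": "dotlessi",
    "\u0131": "dotlessi",  # LATIN SMALL LETTER DOTLESS I
    "j": "dotlessj",
}

# Hypothetical per-language overrides that restore the implicit dot.
CELTIC_BY_LANG = {
    "tr": {"i": "i.dotted", "j": "j.dotted"},
    "az": {"i": "i.dotted", "j": "j.dotted"},
}


def glyphs(text, lang=None):
    """Map each character to a glyph name, applying language overrides."""
    table = dict(CELTIC_DEFAULT)
    table.update(CELTIC_BY_LANG.get(lang, {}))
    # Characters the toy font does not remap simply stand for themselves.
    return [table.get(ch, ch) for ch in text]


untagged = glyphs("\u0131i")      # the two letters collapse to one glyph
turkish = glyphs("\u0131i", "tr")  # the distinction is restored
```

    Without a language, both letters come out as the same glyph, which is exactly
    the incorrect rendering described above; with "tr" they differ again.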

    The font may, for example, (1) recognize language tags in the text stream, if
    present, or (2) contain language-specific character-to-glyph substitution
    tables, which a language-aware renderer can use when the application passes it
    a language code option. A priori I prefer option (2): language tags in the text
    stream are already a deprecated method that requires inserting additional
    characters into the plain-text stream to render, and the language information is
    most often encoded out of band, for example by an xml:lang attribute on a
    container XML element whose content is a text element (each text element is the
    largest unit of plain text coded in an XML document; XML is not itself plain
    text but an encoding syntax for general structured data).
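
    As a rough sketch of the out-of-band case, an application could collect each
    text element together with the xml:lang value in effect for it, and hand that
    language code to the renderer. The function name text_runs_with_lang and the
    sample document are assumptions made for this example; only the standard
    xml.etree.ElementTree module is used.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def text_runs_with_lang(xml_source):
    """Return (language, text) pairs, inheriting xml:lang from ancestors."""
    runs = []

    def walk(elem, inherited):
        # An element without its own xml:lang inherits the enclosing value.
        lang = elem.get(XML_LANG, inherited)
        if elem.text and elem.text.strip():
            runs.append((lang, elem.text.strip()))
        for child in elem:
            walk(child, lang)
            # Text after a child belongs to the parent, so use its language.
            if child.tail and child.tail.strip():
                runs.append((lang, child.tail.strip()))

    walk(ET.fromstring(xml_source), None)
    return runs


doc = '<doc xml:lang="en"><p>fish</p><p xml:lang="tr">bal\u0131k</p></doc>'
runs = text_runs_with_lang(doc)
```

    Each pair can then be rendered with the language as an option, so the plain
    text itself never needs to carry tag characters.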
