Re: looks like some problem in Scripts.txt file of UCD

From: Kent Karlsson
Date: Fri Aug 13 2010 - 04:28:51 CDT

    On 2010-08-13 02.28, "Pravin Satpute" wrote:

    > Yes, problem is happening only when these characters come at initial
    > position.
    > i.e U+0951 and U+0952 in isolation should render with U+25cc

    U+25CC should never be inserted automatically. That some systems do so is a
    bug (no matter how consciously it was made). (I know, there are some Indic
    script characters that should have had a canonical decomposition but don't
    have one; using what should have been the canonical decomposition should
    then be marked somehow in rendering, but using a dotted circle is not the
    way to do that, I think).

    >> "Inherited" means that the character inherits its Script property from
    >> the preceding character(s), so if either of the stress signs is preceded
    >> by a Devanagari character, it should make no difference whether the
    >> stress sign itself is categorized as Devanagari or Inherited.
    > looks good, but hmm its really hard to guess characters script when it
    > will be alone.
    > I think one need to add extra check, when character will be at initial
    > position with property inherited

    When a combining character sequence is ill-formed ("at the initial
    position"), it should be rendered *as if* applied to an NBSP (regardless
    of script). The Unicode Standard, section 5.13:
    "Defective combining character sequences should be rendered as if they had
    a no-break space as a base character. (See Section 7.9, Combining Marks.)"
    And section 7.9:
    "Marks as Spacing Characters. By convention, combining marks may be
    exhibited in (apparent) isolation by applying them to U+00A0 no-break
    space [...]"

        /Kent K
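
The two rules discussed in this message can be sketched together in a few lines of Python. This is only an illustration, not how any particular rendering engine works. The names `SCRIPT_EXCERPT`, `is_defective`, `with_display_base` and `resolve_scripts` are hypothetical, and the tiny script table is a hand-written assumption that follows the thread's premise that the stress signs are Inherited; a real implementation would load Scripts.txt from the UCD. The sketch also only looks at the start of the string and ignores other cases (such as a mark following a control character or line break).

```python
import unicodedata

NBSP = "\u00A0"  # U+00A0 NO-BREAK SPACE

# Hypothetical, hand-written excerpt of Script values, for illustration only.
# The Inherited entries follow the premise of this thread; a real
# implementation would read Scripts.txt from the UCD instead.
SCRIPT_EXCERPT = {
    "\u00A0": "Common",      # NO-BREAK SPACE
    "\u0915": "Devanagari",  # DEVANAGARI LETTER KA
    "\u0951": "Inherited",   # DEVANAGARI STRESS SIGN UDATTA
    "\u0952": "Inherited",   # DEVANAGARI STRESS SIGN ANUDATTA
}


def is_defective(text):
    """Return True if text starts with a combining mark (category M*)."""
    return bool(text) and unicodedata.category(text[0]).startswith("M")


def with_display_base(text):
    """Give a leading combining mark a no-break space as its base.

    No dotted circle (U+25CC) is inserted; the mark is treated as if it
    were applied to NBSP.
    """
    return NBSP + text if is_defective(text) else text


def resolve_scripts(text):
    """Resolve Inherited to the script of the preceding character.

    Assumption of this sketch: with no preceding character at all,
    an Inherited character falls back to Common.
    """
    resolved = []
    previous = "Common"
    for ch in text:
        script = SCRIPT_EXCERPT.get(ch, "Unknown")
        if script == "Inherited":
            script = previous
        resolved.append(script)
        previous = script
    return resolved
```

With this sketch, `resolve_scripts("\u0915\u0951")` treats the stress sign as Devanagari because it follows a Devanagari letter, while `resolve_scripts(with_display_base("\u0951"))` first supplies an NBSP base, so the isolated mark no longer needs a script guess of its own.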