Re: BOM as WJ?

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Nov 19 2003 - 20:44:00 EST

    From: "Peter Kirk" <peterkirk@qaya.org>
    > >Of course this is not a standard normalization form, but using this
    > >pseudo combining class may help render the last two coded strings (in
    > >my quote above) equivalently in renderers.
    > >This works even in the case where there are multiple diacritics (noted
    > >CC1 and CC2 below):
    > > <NBSP,CC1,CC2>
    > >is then treated as if it was:
    > > <WJ,SP,WJ,CC1,CC2>
    > >and then the pseudo-normalization gives:
    > > <WJ,SP,CC1,CC2,WJ>
    > >or:
    > > <WJ,SP,CC2,CC1,WJ>
    > >(depending on the canonical reordering of CC1 and CC2, i.e. of their
    > >relative combining class)
    >
    > This trick doesn't work if any of the CC's are in combining class zero.

    Of course, but which combining characters of combining class 0 need to
    combine with NBSP in a way that affects renderers?

    Are you thinking of sequences like <NBSP,CGJ>?

    Or about issues when rendering <07A6;THAANA ABAFILI;Mn;0;NSM;;;;;N;;;;;>
    after <NBSP>, which of course would be handled only as
    <WJ,SP,WJ,THAANA ABAFILI>?

    Or about <0901;DEVANAGARI SIGN CANDRABINDU;Mn;0;NSM;;;;;N;;;;;> after
    <NBSP>, rendered as if it were <WJ,SP,WJ,CANDRABINDU>?

    Or about <0903;DEVANAGARI SIGN VISARGA;Mc;0;L;;;;;N;;;;;> after <NBSP>,
    which this time is an "Mc" character?

    Or about all the Indic vowels, which do not seem to really combine with
    NBSP but would be rendered as a space followed by a defective isolated form
    of the vowel (so without reordering the vowel glyphs around the space)?

    Just curious...
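
The properties quoted in the questions above can be checked against whatever Unicode data a given runtime ships. The following is only a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` and the list `CANDIDATES` are illustrative choices, not anything defined earlier in this thread.

```python
import unicodedata

# Characters mentioned above: CGJ, THAANA ABAFILI,
# DEVANAGARI SIGN CANDRABINDU, DEVANAGARI SIGN VISARGA.
CANDIDATES = ["\u034F", "\u07A6", "\u0901", "\u0903"]

def describe(ch):
    """Return (code point, name, general category, canonical combining class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )

for ch in CANDIDATES:
    print(describe(ch))
```

All four are combining marks (general category Mn or Mc) whose canonical combining class is 0, which is exactly the case where the reordering trick has nothing to reorder.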

    If we just say that <NBSP> behaves in all cases in renderers as if it were
    <WJ,SP,WJ>, where WJ is reordered with a pseudo-combining class of 256, it
    solves many problems with the interpretation of NBSP, and it makes NBSP
    look like a space letter. However, NBSP is not a "Lo" character but really
    a "Zs" whitespace, and is thus justifiable out of the end margin. Also,
    NBSP does not prohibit word breaks but only line breaks, so it is more as
    if it were in fact <LJ,SP,LJ>, where LJ is a line joiner, distinct also
    from ZWJ (zero-width joiner), which is used to hint ligatures but does not
    prohibit any break.
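
To make the proposal concrete, here is a rough sketch of the pseudo-normalization, under stated assumptions: NBSP is expanded to <WJ,SP,WJ>, WJ (U+2060) is given the pseudo-combining class 256 only inside this function, and each run of characters with non-zero class is then stably sorted by class, as in canonical reordering. The names `pseudo_ccc` and `pseudo_normalize` are hypothetical, and this is not a standard normalization form.

```python
import unicodedata

NBSP, SP, WJ = "\u00A0", "\u0020", "\u2060"
PSEUDO_CCC = 256  # assumed pseudo class; real combining classes are below 255

def pseudo_ccc(ch):
    """Combining class used for reordering, with WJ treated as class 256."""
    return PSEUDO_CCC if ch == WJ else unicodedata.combining(ch)

def pseudo_normalize(text):
    """Expand NBSP to <WJ,SP,WJ>, then reorder runs of non-zero classes."""
    expanded = text.replace(NBSP, WJ + SP + WJ)
    out, run = [], []
    for ch in expanded:
        if pseudo_ccc(ch) == 0:
            # A class-0 character ends the current run and blocks reordering.
            out.extend(sorted(run, key=pseudo_ccc))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=pseudo_ccc))
    return "".join(out)

# <NBSP, COMBINING ACUTE (230), COMBINING DOT BELOW (220)>
print([f"{ord(c):04X}" for c in pseudo_normalize("\u00A0\u0301\u0323")])
```

With diacritics of non-zero class, the trailing WJ moves after them, giving <WJ,SP,CC2,CC1,WJ> in class order. With a class-0 mark such as CANDRABINDU, the result stays <WJ,SP,WJ,CANDRABINDU>, which is the limitation raised in the quoted objection. Note that this sketch also reorders any WJ already present in the input, which a real renderer might not want.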


