Re: starters and non-starters

From: Doug Ewell (doug@ewellic.org)
Date: Tue Feb 02 2010 - 19:53:20 CST


    Mark E. Shoulson <mark at kli dot org> replied to spir:

    >> Also, these definitions seem to imply that a combining sequence
    >> cannot be originally defined with the base following a combining
    >> mark, eg that a source text holding <U+0307 combining dot above,
    >> U+0064 latin small letter d> is simply illegal. Is this true? If
    >> yes, a sequence of 2 codes can only be properly ordered and we can
    >> safely start reordering from the *third* code.
    >
    > COMBINING DOT ABOVE followed by LATIN SMALL LETTER D would not be a
    > valid sequence, correct, but you should start working from the d, not
    > the code that follows. After all, the "d" by itself *IS* a valid
    > sequence, whether or not a combining character comes after it. It's
    > the orphaned combining dot that is defective.

    There's another problem with spir's original statement. You can't say
    that "a source text holding <0307, 0064> is illegal" because the U+0307
    might not be orphaned at all, but might be preceded by another base
    character. The bracketed text [ėd] consists of the sequence <0065,
    0307, 0064> and is perfectly legal.

    Perhaps spir meant "a source text containing *only* that sequence" or
    "starting with that sequence." This is a nitty detail, but when dealing
    with an inherently stateful concept like combining sequences, nitty
    details matter.
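
    To illustrate the distinction, here is a minimal Python sketch that
    reports which combining marks in a text have no base character before
    them. The function name orphaned_marks is hypothetical, and the test
    is deliberately simplified: it treats any character whose General
    Category starts with "M" as a combining mark and anything else as a
    base, and it ignores control characters and line boundaries, which a
    full check would also have to consider.

```python
import unicodedata


def orphaned_marks(text):
    """Return the indices of combining marks that have no base before them.

    Simplifying assumption: a "mark" is any character whose General
    Category starts with "M"; every other character counts as a base.
    """
    orphans = []
    have_base = False  # state carried along the text: has a base been seen yet?
    for i, ch in enumerate(text):
        if unicodedata.category(ch).startswith("M"):
            if not have_base:
                orphans.append(i)
        else:
            have_base = True
    return orphans


# <0307, 0064> as the whole text: the dot at index 0 is orphaned.
print(orphaned_marks("\u0307\u0064"))        # [0]

# <0065, 0307, 0064>: the same two codes, but the dot now follows "e".
print(orphaned_marks("\u0065\u0307\u0064"))  # []
```

    The same pair <0307, 0064> appears in both inputs; only the state
    carried over from what precedes it changes the result.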

    --
    Doug Ewell  |  Thornton, Colorado, USA  |  http://www.ewellic.org
    RFC 5645, 4645, UTN #14  |  ietf-languages @ http://is.gd/2kf0s
    

