re: UTS#10 (collation) : request for a new "Separating" mode for variable weighting (3.2.2)

From: Philippe Verdy (
Date: Sat Jul 31 2010 - 09:48:54 CDT

    To avoid confusion between the modes named "Blanked" and
    "Separating", maybe we could adopt a clearer general syntax for them:
    - "Blanked" -> "[]"
    - "Separating" -> "[.0201*]"
    - "Shifted" -> "[.0000.0000.0000*]"

    This syntax explicitly states the collation weights that are inserted
    in variable elements; the "*" is a placeholder marking where the
    default weights from the DUCET are inserted. Its absence means that no
    insertion occurs; instead, all the remaining weights are filled with
    [.0000], as needed in the current collation
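
    As a rough illustration of the bracket syntax above, here is a minimal
    Python sketch of one possible reading. The names parse_template and
    apply_template, the "levels" parameter, and the sample weights are all
    assumptions made for the example; they are not part of UTS #10 or of
    any existing library.

```python
import re

# Hypothetical grammar: "[" then zero or more ".XXXX" hex weights,
# then an optional "*" placeholder, then "]".
_TEMPLATE = re.compile(r"\[((?:\.[0-9A-Fa-f]{4})*)(\*?)\]")


def parse_template(template):
    """Return (inserted_weights, keep_defaults) for a template like '[.0201*]'."""
    match = _TEMPLATE.fullmatch(template.strip())
    if match is None:
        raise ValueError(f"not a valid weight template: {template!r}")
    # ".0201.0000" -> ["", "0201", "0000"]; drop the empty first item.
    inserted = [int(w, 16) for w in match.group(1).split(".")[1:]]
    return inserted, match.group(2) == "*"


def apply_template(template, default_weights, levels):
    """Build the weights of a variable element under the given template.

    default_weights: the element's weights from the default table.
    levels: assumed number of levels the result is padded to with 0000.
    """
    inserted, keep_defaults = parse_template(template)
    weights = inserted + (list(default_weights) if keep_defaults else [])
    # Without "*", nothing from the default table is kept: only 0000 filler.
    weights += [0x0000] * (levels - len(weights))
    return weights


# Illustrative weights only, not taken from a specific DUCET version.
sample = (0x0209, 0x0020, 0x0002)
separating = apply_template("[.0201*]", sample, levels=4)
blanked = apply_template("[]", sample, levels=4)
```

    Under this reading, "[.0201*]" puts 0201 in front of the three sample
    weights, and "[]" yields nothing but 0000 filler.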

    This effectively describes what happens to the weights and how new
    weights are inserted in collation elements (at the beginning for
    variable elements, or at the end with weight FFFF for non-variable

    Nothing is needed for other collation elements, but if required we
    could specify that they use a specific weight, for tailoring purposes,
    in a syntax like:

    Variable:[.0201*]; [\p{Bidi:R}]:[.FFFF*]

    This would mean that Variable elements are shifted one level up by
    inserting the primary weight [.0201] followed by all weights from the
    DUCET, and that collation graphemes starting with a strong RTL
    character are shifted one level up by giving them the primary weight
    [.FFFF].
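
    The rule list shown above could be split into (selector, template)
    pairs roughly as follows. This is only a sketch: parse_rules is a
    hypothetical helper, and it assumes that rules are separated by ";"
    and that the template itself never contains a ":".

```python
def parse_rules(spec):
    """Split 'selector:template; selector:template' into a dict.

    The selector may itself contain ':' (as in [\\p{Bidi:R}]), so each
    rule is split at its LAST colon. Assumes templates contain no ':'.
    """
    rules = {}
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ';'
        selector, sep, template = part.rpartition(":")
        if not sep or not selector.strip():
            raise ValueError(f"rule without a selector: {part!r}")
        rules[selector.strip()] = template.strip()
    return rules


rules = parse_rules(r"Variable:[.0201*]; [\p{Bidi:R}]:[.FFFF*]")
```

    Here "Variable" maps to "[.0201*]" and the set expression for strong
    RTL characters maps to "[.FFFF*]".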

    All the other ignorable characters are filled by implicitly
    appending [.0000], and all the other non-ignorable and non-variable
    characters are filled by appending weight [.FFFF]. In all cases, the
    last implicit level (displayed in the DUCET) should still be the code
    point scalar value (or 00000 if the collation element is not the first
    one in an expansion, or if it was inserted from a contraction by
    inserting an ignorable filler to hold the other code point scalar
    values not representable in the first collation element).

