Re: U+0140

From: Ernest Cline
Date: Fri Apr 16 2004 - 09:26:17 EDT


    > [Original Message]
    > From: Antoine Leca
    > ... it is vastly easier to keep the obvious unification, rather than
    > trying to distort it and trying to make a conditional mapping, if
    > Mathematics, => U+00B7, if Catalan, => U+2027, if NoSeQue, =>
    > some_other_random_middle_dot, etc. Unlike hyphenation rules (where the
    > mapping might very well be => U+2027, by the way), which are pretty easy
    > to pinpoint, tagging Catalan in bulk text is clearly not an easy task. Even
    > when considering the fairly restrictive rules for it to occur (requiring
    > NFC):
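
    The conditional mapping being rejected here may be easier to weigh
    when spelled out. The following is a hypothetical Python sketch: the
    function name and the context tags "catalan" and "mathematics" are
    illustrative assumptions, not anything proposed in the thread. It also
    relies on the fact that U+0140 has a compatibility decomposition to
    "l" + U+00B7, which is what makes unification with U+00B7 the obvious
    choice.

```python
import unicodedata

# Hypothetical context-dependent mapping of the middle dot, of the kind the
# quoted message argues against. The context tags are illustrative only.
MIDDLE_DOT = "\u00b7"         # MIDDLE DOT
HYPHENATION_POINT = "\u2027"  # HYPHENATION POINT

def map_middle_dot(text: str, context: str) -> str:
    """Return text with U+00B7 remapped according to a context tag."""
    # U+0140/U+013F decompose (compatibility) to l/L + U+00B7,
    # so NFKC exposes the middle dot hidden inside them as well.
    text = unicodedata.normalize("NFKC", text)
    if context == "catalan":
        return text.replace(MIDDLE_DOT, HYPHENATION_POINT)
    # "mathematics" and any unrecognized context keep U+00B7 unchanged.
    return text

print(unicodedata.decomposition("\u0140"))  # <compat> 006C 00B7
```

    The mapping itself is trivial. As the quote points out, the hard part
    is reliably supplying the context argument, since Catalan is rarely
    tagged in bulk text.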

    I don't see that as being any worse than the set of HYPHEN-MINUS
    (U+002D), HYPHEN (U+2010), MINUS SIGN (U+2212), etc., which,
    depending upon your taste in such matters, could be seen as an
    example of what to do or what not to do. That said, let me switch
    the topic to something almost completely different.

    Given the nature of U+0140 LATIN SMALL LETTER L WITH MIDDLE DOT
    (and its capital, U+013F) when hyphenated, might it not be a good
    idea to assign these two characters their own Line Break class for
    the Line Breaking Algorithm of UAX #14? If I understand the comments
    correctly, these two characters always provide a line-break
    opportunity after them, but if that opportunity is taken, the dot
    must disappear, so an implementation that is not prepared to remove
    the dot should ignore the opportunity.

    This archive was generated by hypermail 2.1.5 : Fri Apr 16 2004 - 10:08:41 EDT