Re: RFC: Addition of U+05BE HEBREW PUNCTUATION MAQAF to Dash category

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Jan 03 2006 - 19:43:30 CST


    > I would like the group's opinion on my proposal to add the U+05BE HEBREW
    > PUNCTUATION MAQAF character to the Dash category. MAQAF is a Hebrew
    > character similar to the HYPHEN, both in functionality and form. To give
    > an English approximation of its function, it connects words together,
    > whether to make a term out of two words (e.g. Tel־Aviv, Home־Owner) or
    > to connect words which are joined when both are written in Hebrew (e.g.
    > InHebrew vs. In־ENGLISH, assuming ENGLISH was written in Latin letters).
    >
    > I'm not sure why HEBREW PUNCTUATION MAQAF was introduced into Unicode
    > in the first place, as it seems to be equivalent to the HYPHEN.

    Its existence in Windows Code Page 1255 (0xCE = U+05BE HEBREW PUNCTUATION
    MAQAF) is both a necessary and a sufficient reason for it to have been
    included separately in Unicode.

    > Perhaps it's
    > due to the fact that it appears in traditional Hebrew texts,

    with a distinct shape from a Latin hyphen, which is another reason for
    separate encoding. Overunification of punctuation that has consistently
    different appearances in different script contexts is a potential
    problem for rendering and font choice.

    > whereas
    > other modern Hebrew punctuation (such as COMMA and PERIOD) was borrowed
    > from Latin in modern times. In modern Hebrew texts, MAQAF is often
    > substituted by HYPHEN-MINUS or HYPHEN, as there's no MAQAF character
    > on the Hebrew-Israeli keyboards.
    >
    > By adding MAQAF to the Dash category, aside from putting it where it
    > belongs (in my opinion), we'll make the character folding rule of:
    >
    > pD -> HYPHEN-MINUS
    >
    > apply to it.

    Not automatically, although this might be a good idea in general.
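
    As a rough illustration of what such a folding does (a sketch only,
    not the wording of any published folding specification), the Python
    below maps every character whose General Category is Pd to U+002D
    HYPHEN-MINUS. The name fold_dashes and its extra parameter are
    invented for this example. Whether U+05BE is caught by the Pd test
    depends on the version of the Unicode Character Database bundled with
    the interpreter, which is why the sketch also accepts an explicit set
    of additional characters to fold.

```python
import unicodedata

HYPHEN_MINUS = "\u002D"
MAQAF = "\u05BE"  # HEBREW PUNCTUATION MAQAF


def fold_dashes(text, extra=frozenset()):
    """Replace gc=Pd characters, plus any in `extra`, with HYPHEN-MINUS.

    `extra` is a hypothetical escape hatch for characters that the
    bundled Unicode data does not (or did not yet) classify as Pd.
    """
    return "".join(
        HYPHEN_MINUS
        if unicodedata.category(ch) == "Pd" or ch in extra
        else ch
        for ch in text
    )


# A search key typed with HYPHEN-MINUS can then match text written with MAQAF.
folded = fold_dashes("Tel" + MAQAF + "Aviv", extra={MAQAF})
```

    Applying the same folding to both the search key and the searched
    text is what lets a user without a MAQAF key find it.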

    > This would be beneficial, as the Hebrew-Israeli keyboard
    > doesn't have a key for MAQAF and therefore users cannot easily search for it.
    >
    > Comments?

    There are two distinct potential changes you need to address.

    Change #1:

       General Category: gc=Po (current) --> gc=Pd
       Relevant data file: UnicodeData.txt
          
    Change #2:

       Binary property Dash: False (current) --> True
       Relevant data file: PropList.txt
       
    I believe that all gc=Pd characters also have Dash=True, but the
    converse is not the case: there are Dash characters that have
    gc=Po or gc=Sm. So it would be possible to change the Dash property
    for MAQAF without changing its General Category -- and, in
    fact, I suspect that would be a little easier to persuade the UTC
    to do.
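
    To make the distinction between the two properties concrete, here is
    a minimal sketch in Python. The standard unicodedata module exposes
    the General Category but not the binary Dash property, so the sketch
    parses a few lines written in the PropList.txt format. The
    DASH_EXCERPT lines are a hand-picked excerpt for illustration, not a
    complete or version-specific copy of the file, and the helper names
    are hypothetical.

```python
import unicodedata

# Hand-picked lines in PropList.txt format (illustrative, not the full Dash list).
DASH_EXCERPT = """\
002D          ; Dash # Pd       HYPHEN-MINUS
2010..2015    ; Dash # Pd   [6] HYPHEN..HORIZONTAL BAR
2053          ; Dash # Po       SWUNG DASH
2212          ; Dash # Sm       MINUS SIGN
"""


def parse_property(data, prop):
    """Return the set of code points that `data` assigns to `prop`."""
    result = set()
    for line in data.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        cp_range, name = (field.strip() for field in line.split(";"))
        if name != prop:
            continue
        first, _, last = cp_range.partition("..")
        result.update(range(int(first, 16), int(last or first, 16) + 1))
    return result


dash = parse_property(DASH_EXCERPT, "Dash")

# Dash=True characters whose General Category is something other than Pd.
dash_not_pd = {
    f"U+{cp:04X}": unicodedata.category(chr(cp))
    for cp in sorted(dash)
    if unicodedata.category(chr(cp)) != "Pd"
}
```

    The General Category values come from whatever Unicode data the
    interpreter bundles, so a character's entry can differ between
    versions; the Dash set here comes only from the excerpt.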

    > How do I go about submitting such a proposal?

    At this point, the most straightforward way to submit such a
    proposal is to use the online contact form:

    http://www.unicode.org/reporting.html

    with the category "Public Review Issue", noting that this is
    feedback for the Unicode 5.0.0 beta review:

    http://www.unicode.org/versions/beta.html

    State the issue succinctly and your proposal clearly, and make
    reference to the exact properties I have cited above. That will
    make it a lot easier for the UTC to consider and decide upon
    the issue.

    --Ken


