Re: New Public Review Issue posted

From: Mark Davis (mark.davis@jtcsv.com)
Date: Tue May 25 2004 - 19:10:31 CDT


    I don't think the "fold to base" is as useful as some other information. For
    those characters with a canonical decomposition, the decomposition carries more
    information, since you can combine it with a "remove combining marks"
    folding to get the folding to base.
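A minimal sketch of that composition, assuming Python's standard `unicodedata` module as the data source: canonical decomposition (NFD) followed by a "remove combining marks" step yields a fold-to-base. The helper name `fold_to_base` is hypothetical, and treating every character of general category M* as a removable mark is an assumption.

```python
import unicodedata


def fold_to_base(text: str) -> str:
    """Fold to base: canonical decomposition, then drop combining marks."""
    # NFD splits precomposed letters into base + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Assumption: any character whose general category starts with "M"
    # (Mn, Mc, Me) is a combining mark to be removed.
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.category(ch).startswith("M"))
    # NFC whatever remains, for consistency with NFC'd data.
    return unicodedata.normalize("NFC", stripped)


print(fold_to_base("Crème brûlée"))  # Creme brulee
print(fold_to_base("\u00D8"))        # Ø: no canonical decomposition, unchanged
```

The second call shows the limitation: U+00D8 has no canonical decomposition, so this pipeline leaves it alone.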

    For my part, what would be more interesting would be a "full" decomposition of
    the characters that don't have a canonical decomposition, e.g.

    LATIN CAPITAL LETTER O WITH STROKE => O + /
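One way such a supplementary "full" decomposition could be sketched, assuming a small hand-maintained table. The entries and the choice of overlay marks (U+0338 COMBINING LONG SOLIDUS OVERLAY for the "/", U+0335 COMBINING SHORT STROKE OVERLAY for the bar on D) are illustrative assumptions, not data from any published file, and `EXTRA_DECOMPOSITIONS` and `full_decompose` are hypothetical names.

```python
import unicodedata

# Hypothetical supplementary table for letters that have no canonical
# decomposition. The overlay marks chosen here are an assumption.
EXTRA_DECOMPOSITIONS = {
    "\u00D8": "O\u0338",  # LATIN CAPITAL LETTER O WITH STROKE -> O + long solidus overlay
    "\u00F8": "o\u0338",  # LATIN SMALL LETTER O WITH STROKE
    "\u0110": "D\u0335",  # LATIN CAPITAL LETTER D WITH STROKE -> D + short stroke overlay
    "\u0111": "d\u0335",  # LATIN SMALL LETTER D WITH STROKE
}


def full_decompose(text: str) -> str:
    """Canonical decomposition, then expand characters listed in the table."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(EXTRA_DECOMPOSITIONS.get(ch, ch) for ch in decomposed)


def fold_to_base(text: str) -> str:
    """Drop combining marks (general category M*) after the full decomposition."""
    return "".join(ch for ch in full_decompose(text)
                   if not unicodedata.category(ch).startswith("M"))


print(fold_to_base("Øresund"))  # Oresund
```

Because the table only covers characters NFD leaves untouched, the two steps do not interfere; the "remove combining marks" folding then works uniformly on both kinds of decomposition.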

    BTW, I had posted some commentary on TR30, which I will repeat here.

    ... I found these files almost
    impossible to assess in code point form, so I ran them through a quick ICU
    transform to add comments with the real characters and names. I also NFC'd the
    forms, just for consistency. These files, generated from Asmus's, are at
    http://www.macchiato.com/utc/tr30/.
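The annotation step can be approximated with Python's `unicodedata` standing in for the ICU transform that was actually used. The input line format ("source; target", hex code points) is assumed from the sample mapping line quoted later in this message, `annotate` is a hypothetical helper, and the names come from whatever Unicode version the local Python ships.

```python
import unicodedata


def parse_code_points(field: str) -> str:
    """Turn space-separated hex code points such as '0041 030A' into a string."""
    return "".join(chr(int(cp, 16)) for cp in field.split())


def describe(s: str) -> str:
    """Join character names with ' + ', falling back to U+XXXX for unnamed ones."""
    return " + ".join(unicodedata.name(ch, f"U+{ord(ch):04X}") for ch in s)


def annotate(line: str) -> str:
    """NFC the target and append a '# chars → chars NAMES → NAMES' comment."""
    data = line.split("#", 1)[0]  # drop any existing comment
    src_field, dst_field = (f.strip() for f in data.split(";")[:2])
    src = parse_code_points(src_field)
    dst = unicodedata.normalize("NFC", parse_code_points(dst_field))
    dst_hex = " ".join(f"{ord(ch):04X}" for ch in dst)
    return f"{src_field}; {dst_hex} # {src} → {dst} {describe(src)} → {describe(dst)}"


print(annotate("27E6; 301A"))
```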

    I suggested posting them in this form for public review of the TR, since
    others will have the same difficulty in assessing the quality of the data.

    Here are some quick comments.

    http://www.macchiato.com/utc/tr30/HiraganaFolding-new.txt

    Adding digraph expansions seems quite odd.

    http://www.macchiato.com/utc/tr30/KatakanaFolding-new.txt

    When in NFC, whole batches of these mappings are NOPs. Don't know why they are
    there; they are also not consistent in the use of composed vs. decomposed forms.
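Assuming "NOP" means a mapping whose source and target become identical once both are put in NFC, a pruning pass might look like this. The names `is_nfc_nop` and `prune_nops` are hypothetical, and the sample entries are illustrative rather than taken from KatakanaFolding-new.txt.

```python
import unicodedata


def is_nfc_nop(source: str, target: str) -> bool:
    """True if the mapping changes nothing once both sides are in NFC."""
    return unicodedata.normalize("NFC", source) == unicodedata.normalize("NFC", target)


def prune_nops(mappings: dict[str, str]) -> dict[str, str]:
    """Keep only the mappings that still do something on NFC text."""
    return {s: t for s, t in mappings.items() if not is_nfc_nop(s, t)}


sample = {
    "\u30AC": "\u30AB\u3099",  # GA -> KA + COMBINING VOICED SOUND MARK: a NOP under NFC
    "\u30AB": "\u304B",        # KATAKANA KA -> HIRAGANA KA: a real folding
}
print(prune_nops(sample))
```

Normalizing both sides before comparing also sidesteps the composed vs. decomposed inconsistency, since either spelling of a target compares equal after NFC.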

    This file combines half-width katakana folding. I think it is much more useful
    if that is separated out. Someone can apply a sequence of two transforms if they
    want both.
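A sketch of the "sequence of two transforms" approach, assuming each folding is a plain string-to-string function. Half-width katakana folding is approximated here by applying NFKC only to characters in U+FF61..U+FF9F, and katakana-to-hiragana folding by the fixed offset between U+30A1..U+30F6 and U+3041..U+3096. All function names are hypothetical.

```python
import unicodedata
from functools import reduce
from typing import Callable

Fold = Callable[[str], str]


def fold_halfwidth_katakana(text: str) -> str:
    """Map half-width katakana (U+FF61..U+FF9F) to their standard forms."""
    widened = "".join(
        unicodedata.normalize("NFKC", ch) if "\uFF61" <= ch <= "\uFF9F" else ch
        for ch in text
    )
    # NFC the result, which recomposes pairs such as KA + U+3099 into GA.
    return unicodedata.normalize("NFC", widened)


def fold_katakana_to_hiragana(text: str) -> str:
    """Shift U+30A1..U+30F6 down by 0x60 onto U+3041..U+3096."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30A1" <= ch <= "\u30F6" else ch for ch in text
    )


def chain(*folds: Fold) -> Fold:
    """Apply the given foldings left to right."""
    return lambda text: reduce(lambda acc, f: f(acc), folds, text)


both = chain(fold_halfwidth_katakana, fold_katakana_to_hiragana)
print(fold_halfwidth_katakana("\uFF76\uFF9E"))  # ガ
print(both("\uFF76\uFF9E"))                     # が
```

Keeping the two functions separate means a caller who wants only width folding does not also get kana folding.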

    http://www.macchiato.com/utc/tr30/SuperscriptFolding-new.txt

    This feels like a real potpourri of stuff. Why superscripts and not subscripts?
    Why annotation characters? Why modifier letters -- those are not really
    superscripts. Waw?

    http://www.macchiato.com/utc/tr30/WidthFolding-new.txt

    This file would be MUCH more useful if split into two separate files:

    Full-width to half-width
    Half-width to full-width
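A sketch of how the two files might be derived separately, assuming the `<wide>` and `<narrow>` compatibility tags in the decomposition data are an acceptable definition of width variants: "full-width to half-width" is read as folding the `<wide>` characters, and "half-width to full-width" as folding the `<narrow>` ones. The function names are hypothetical, and the results depend on the Unicode version of the local Python.

```python
import sys
import unicodedata


def _parse(hex_fields: str) -> str:
    """Turn '0041' or '30AB' style fields into a string."""
    return "".join(chr(int(h, 16)) for h in hex_fields.split())


def width_tables() -> tuple[dict[str, str], dict[str, str]]:
    """Build the two directions separately from <wide>/<narrow> tags."""
    full_to_half: dict[str, str] = {}
    half_to_full: dict[str, str] = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        tag, _, rest = unicodedata.decomposition(ch).partition(" ")
        if tag == "<wide>":      # e.g. U+FF21 FULLWIDTH LATIN CAPITAL LETTER A -> A
            full_to_half[ch] = _parse(rest)
        elif tag == "<narrow>":  # e.g. U+FF76 HALFWIDTH KATAKANA LETTER KA -> U+30AB
            half_to_full[ch] = _parse(rest)
    return full_to_half, half_to_full


full_to_half, half_to_full = width_tables()
for table_name, table in (("FullToHalf", full_to_half), ("HalfToFull", half_to_full)):
    print(table_name, len(table))
```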

    Again, remove the NFC mappings.

    27E6; 301A # ⟦ → 〚 MATHEMATICAL LEFT WHITE SQUARE BRACKET → LEFT WHITE SQUARE BRACKET

    These don't appear to be a width issue.
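One way to check such entries mechanically, under the assumption that width variants are exactly the characters carrying `<wide>` or `<narrow>` compatibility decompositions, and that mapping entries are single characters. `width_counterpart` and `is_width_mapping` are hypothetical names.

```python
import unicodedata
from typing import Optional

WIDTH_TAGS = ("<wide>", "<narrow>")


def width_counterpart(ch: str) -> Optional[str]:
    """Return the <wide>/<narrow> decomposition of ch, or None if it has none."""
    if len(ch) != 1:
        return None  # only single characters carry decomposition data
    tag, _, rest = unicodedata.decomposition(ch).partition(" ")
    if tag not in WIDTH_TAGS:
        return None
    return "".join(chr(int(h, 16)) for h in rest.split())


def is_width_mapping(source: str, target: str) -> bool:
    """True if either side is the width variant of the other."""
    return width_counterpart(source) == target or width_counterpart(target) == source


print(is_width_mapping("\u27E6", "\u301A"))  # False
print(is_width_mapping("\uFF21", "A"))       # True
```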

    Note that I have not checked these new data tables for completeness; these were
    just some quick observations.

    Mark
    __________________________________
    http://www.macchiato.com
    ► शिष्यादिच्छेत्पराजयम् ◄ ("From one's student, one should wish for defeat")

    ----- Original Message -----
    From: <jcowan@reutershealth.com>
    To: <unicode@unicode.org>
    Sent: Tue, 2004 May 25 14:57
    Subject: Re: New Public Review Issue posted

    > Rick McGowan scripsit:
    > > The Unicode Technical Committee has posted a new issue for public
    > > review and comment. Details are on the following web page:
    > >
    > > http://www.unicode.org/review/
    >
    > I have prepared a draft DiacriticFolding.txt file for this issue; it is
    > temporarily available at http://www.ccil.org/~cowan/DiacriticFolding.txt .
    > This was prepared by looking for lines in UnicodeData that matched
    > the regex '(GREEK|LATIN|CYRILLIC|HEBREW).*WITH'. (I added Hebrew to the
    > set of scripts specified by the current draft of #30.)
    >
    > Characters with decompositions were mapped into the base character of the
    > decomposition; characters without decompositions were mapped by name.
    > The file http://www.ccil.org/~cowan/DiacriticFoldingExceptions.txt contains
    > a list of 32 characters matching the pattern which did not seem to me
    > to be suitable for diacritic folding.
    >
    > I have posted a short version of this note to the Unicode comment form.
    >
    > Comments?
    >
    > --
    > John Cowan  jcowan@reutershealth.com
    > http://www.ccil.org/~cowan  http://www.reutershealth.com
    > A rabbi whose congregation doesn't want to drive him out of town isn't a rabbi,
    > and a rabbi who lets them do it isn't a man.  --Jewish saying
    >
    >
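The procedure described in the quoted message can be approximated with Python's `unicodedata`. Assumptions: the regex is matched against character names only (not whole UnicodeData.txt lines), the local Python's Unicode version stands in for the 2004 data, "base character of the decomposition" is read as the first character of the NFD form, and "mapped by name" is read as dropping everything from " WITH " onward and looking the remainder up. `fold_char` and `diacritic_folding` are hypothetical names.

```python
import re
import sys
import unicodedata
from typing import Dict, Optional

PATTERN = re.compile(r"(GREEK|LATIN|CYRILLIC|HEBREW).*WITH")


def fold_char(ch: str) -> Optional[str]:
    """Fold one character to its base, or return None if no base is found."""
    decomposed = unicodedata.normalize("NFD", ch)
    if decomposed != ch:
        # Has a canonical decomposition: take its base (first) character.
        return decomposed[0]
    # No decomposition: map by name, e.g. "... O WITH STROKE" -> "... O".
    name = unicodedata.name(ch, "")
    base_name = name.split(" WITH ", 1)[0]
    if base_name == name:
        return None
    try:
        return unicodedata.lookup(base_name)
    except KeyError:
        return None  # the shortened name is not an existing character name


def diacritic_folding() -> Dict[str, str]:
    """Collect a folding table for every character whose name matches PATTERN."""
    table = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if PATTERN.search(unicodedata.name(ch, "")):
            base = fold_char(ch)
            if base is not None and base != ch:
                table[ch] = base
    return table


table = diacritic_folding()
print(table["\u00D8"], table["\u04A2"])  # O Н
```

Characters for which `fold_char` returns None are natural candidates for a separate exceptions list.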


