Re: New Public Review Issue posted

From: Mark Davis (
Date: Tue May 25 2004 - 19:10:31 CDT


    I don't think the "fold to base" is as useful as some other information. For
    those characters with a canonical decomposition, the decomposition carries
    more information, since you can combine it with a "remove combining marks"
    folding to get the folding to base.
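
To illustrate the point above, here is a minimal sketch in Python of getting a fold to base by chaining canonical decomposition with a "remove combining marks" folding. The function names are hypothetical, and treating every character of general category M* as a combining mark is an assumption; a production folding might restrict this to nonspacing marks or to particular scripts.

```python
import unicodedata


def remove_combining_marks(text: str) -> str:
    """Drop every character whose general category is a mark (Mn, Mc, Me).

    Assumption: "combining mark" is taken to mean general category M*.
    """
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("M")
    )


def fold_to_base(text: str) -> str:
    """Canonical decomposition (NFD) followed by mark removal."""
    return remove_combining_marks(unicodedata.normalize("NFD", text))


if __name__ == "__main__":
    # U+00E9 decomposes canonically, so it folds to its base letter.
    print(fold_to_base("\u00e9"))
    # U+00F8 (o with stroke) has no canonical decomposition, so this
    # chain leaves it unchanged.
    print(fold_to_base("\u00f8"))
```

Characters such as U+00F8 pass through untouched, which is why data for characters without a canonical decomposition would have to come from somewhere else.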

    For my part, what would be more interesting would be a "full" decomposition of
    the characters that don't have a canonical decomposition, e.g.


    BTW, I had posted some commentary on TR30, which I will repeat here.

    ... I found these files almost
    impossible to assess in code point form, so I ran them through a quick ICU
    transform to add comments with the real characters and names. I also NFC'd the
    forms, just for consistency. These files generated from Asmus's are in

    I had suggested posting them in this form for public review of the TR, since
    others will have the same difficulty in assessing the quality of the data.

    Here are some quick comments.

    Adding digraph expansions seems quite odd.

    When in NFC, whole batches of these mappings are NOPs. Don't know why they are
    there; they are also not consistent in the use of composed vs. decomposed forms.
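
One way to spot such mappings mechanically is sketched below in Python. It assumes one particular reading of "NOP": a mapping whose source and target become identical once both are put into NFC. The helper names and the sample pairs are hypothetical and are not taken from the data files under review.

```python
import unicodedata


def is_nop_under_nfc(source: str, target: str) -> bool:
    """True when source and target are the same string after NFC."""
    return (
        unicodedata.normalize("NFC", source)
        == unicodedata.normalize("NFC", target)
    )


def drop_nfc_nops(mappings):
    """Keep only the (source, target) pairs that still change NFC text."""
    return [(s, t) for s, t in mappings if not is_nop_under_nfc(s, t)]


if __name__ == "__main__":
    # Hypothetical sample pairs, for illustration only.
    sample = [
        ("e\u0301", "\u00e9"),  # decomposed -> composed: a NOP under NFC
        ("\u00e9", "e"),        # composed -> base letter: a real folding
    ]
    print(drop_nfc_nops(sample))
```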

    This file combines half-width katakana folding. I think it is much more useful
    if that is separated out. Someone can apply a sequence of two transforms if they
    want both.
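
As a rough sketch of what keeping the transforms separate could look like, the Python below defines a half-width katakana folding on its own and a small helper that chains any two transforms. The names are hypothetical. Using NFKC on runs in the range U+FF65..U+FF9F is an assumption about what the folding should cover, and `str.casefold` is only a stand-in for whatever second transform someone wants to apply.

```python
import re
import unicodedata

# Assumption: half-width katakana means U+FF65..U+FF9F.
HALFWIDTH_KATAKANA_RUN = re.compile("[\uff65-\uff9f]+")


def fold_halfwidth_katakana(text: str) -> str:
    """Replace runs of half-width katakana with their NFKC form.

    Normalizing whole runs lets a letter and a following half-width
    voiced sound mark compose into a single full-width letter.
    """
    return HALFWIDTH_KATAKANA_RUN.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), text
    )


def compose(first, second):
    """Return a transform that applies `first`, then `second`."""
    return lambda text: second(first(text))


if __name__ == "__main__":
    both = compose(fold_halfwidth_katakana, str.casefold)
    print(both("ABC \uff76\uff9e"))
```

With this split, a caller who wants only one of the two foldings simply does not compose them.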

    This feels like a real potpourri of stuff. Why superscripts and not subscripts?
    Why annotation characters? Why modifier letters -- those are not really
    superscripts. Waw?

    This file would be MUCH more useful if split into two separate files:

    Full-width to half-width
    Half-width to full-width

    Again, remove the NFC mappings.


    These don't appear to be a width issue.

    Note that I have not checked these new data tables for completeness; these were
    just some quick observations.

    ► शिष्यादिच्छेत्पराजयम् ◄

    ----- Original Message -----
    From: <>
    To: <>
    Sent: Tue, 2004 May 25 14:57
    Subject: Re: New Public Review Issue posted

    > Rick McGowan scripsit:
    > > The Unicode Technical Committee has posted a new issue for public
    > > review and comment. Details are on the following web page:
    > >
    > >
    > I have prepared a draft DiacriticFolding.txt file for this issue; it is
    > temporarily available at .
    > This was prepared by looking for lines in UnicodeData that matched
    > the regex '(GREEK|LATIN|CYRILLIC|HEBREW).*WITH'. (I added Hebrew to the
    > set of scripts specified by the current draft of #30.)
    > Characters with decompositions were mapped into the base character of the
    > decomposition; characters without decompositions were mapped by name.
    > The file contains
    > a list of 32 characters matching the pattern which did not seem to me
    > to be suitable for diacritic folding.
    > I have posted a short version of this note to the Unicode comment form.
    > Comments?
    > --
    > A rabbi whose congregation doesn't want John Cowan
    > to drive him out of town isn't a rabbi,
    > and a rabbi who lets them do it
    > isn't a man. --Jewish saying
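
The selection and mapping procedure described in the quoted message can be approximated as follows. This is a sketch under several assumptions, not the script that produced the draft file: it reads character names from Python's `unicodedata` module (whose Unicode version differs from the 2004 UnicodeData file), it treats only canonical decompositions as "decompositions", and it guesses that "mapped by name" means looking up the part of the name before " WITH ". The function name `fold_target` is hypothetical.

```python
import re
import unicodedata

# The regex given in the quoted message, applied here to character names.
PATTERN = re.compile(r"(GREEK|LATIN|CYRILLIC|HEBREW).*WITH")


def fold_target(ch: str):
    """Return a guessed diacritic-folding target for ch, or None."""
    name = unicodedata.name(ch, "")
    if not PATTERN.search(name):
        return None
    decomposition = unicodedata.decomposition(ch)
    # Assumption: only canonical decompositions count (no "<tag>" prefix).
    if decomposition and not decomposition.startswith("<"):
        # First character of the full canonical decomposition is the base.
        return unicodedata.normalize("NFD", ch)[0]
    # Assumption: "mapped by name" means dropping " WITH ..." from the name.
    try:
        return unicodedata.lookup(name.split(" WITH ")[0])
    except KeyError:
        return None


if __name__ == "__main__":
    for ch in ("\u00e9", "\u00f8", "\ufb31", "a"):
        print(f"U+{ord(ch):04X}", repr(fold_target(ch)))
```

A list of exclusions, like the 32 characters mentioned in the message, would still need to be maintained by hand on top of something like this.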

    This archive was generated by hypermail 2.1.5 : Tue May 25 2004 - 19:12:20 CDT