Re: writing Chinese dialects

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Feb 05 2007 - 11:26:03 CST


    From: <vunzndi@vfemail.net>
    > Dear Arne,
    >
    > I would certainly welcome help putting the data into standard IDS
    > format. The file is exported from a database of mine that uses a
    > format similar to IDS (close enough for a fuzzy search as described
    > below). I do have a more recent version, which I think is too big for
    > the mailing list, so I will send it to you separately. Briefly, the
    > ideas are:
    > 1. ? and ?? for a missing or uncertain character/data (similar to
    > ids_irg.txt, where ? usually denotes a missing character)
    > 2. + , - and brackets with obvious usage
    > 3. A+B combinations, as opposed to Mr Taichi Kawabata's reverse
    > Polish +AB ordering
    > 4. A-B permitted where the part/radical is not in Unicode

    You have forgotten to mention:
    * the use of parentheses: A/(B+C)
    * the use of ideographic description characters (IDC) as binary operators:
    ** A surrounds/encloses B
    ** A borders B (on several sides)
    ** A overlaps B (several overlapping positions)

    Why not use the IDC symbols instead of "+" and "/" for horizontal and vertical stacking?
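
    As a rough illustration of that suggestion, here is a minimal Python sketch that rewrites an infix description such as A/(B+C) into prefix order using IDC symbols. It rests on assumptions not confirmed in this thread: that "+" means left-to-right (U+2FF0), that "/" means above-to-below (U+2FF1), that both operators have equal precedence and associate to the left, and that every component is a single character. The function name infix_to_ids is hypothetical, and the "-" operator is deliberately rejected because IDS has no equivalent.

```python
# Assumed mapping (not confirmed by the thread):
# "+" = left-to-right (U+2FF0), "/" = above-to-below (U+2FF1).
IDC = {"+": "\u2FF0", "/": "\u2FF1"}


def infix_to_ids(expr: str) -> str:
    """Convert an infix description such as "A/(B+C)" to prefix IDS order."""
    tokens = [ch for ch in expr if not ch.isspace()]
    pos = 0

    def parse_operand() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            inner = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return inner
        # Operators, ")" and "-" cannot start an operand in this sketch.
        if tok in IDC or tok in ")-":
            raise ValueError(f"unexpected token {tok!r}")
        return tok

    def parse_expr() -> str:
        nonlocal pos
        left = parse_operand()
        # Equal precedence, left-associative (an assumption of this sketch).
        while pos < len(tokens) and tokens[pos] in IDC:
            op = IDC[tokens[pos]]
            pos += 1
            right = parse_operand()
            left = op + left + right  # prefix order: operator first
        return left

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


print(infix_to_ids("A/(B+C)"))
```

    With these assumptions, A/(B+C) becomes the IDC for above-to-below, then A, then the IDC for left-to-right, then B and C. The enclosing, bordering and overlapping relations listed above would need further operators that this sketch does not define.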

    I note that the use of "-" is quite smart (better than not using it and displaying a "?" for a missing radical).

    The database, however, does not clearly define how the component strokes or radicals are altered in composition, notably when A surrounds/encloses or borders B. Sometimes A is modified to leave more space for B, for example by changing a diagonal stroke to a vertical or horizontal one, or by dropping part of a stroke. When the glyphs are just rescaled to fit the square box, there is probably no need to give this information in the database.

    Such indications would help reduce the number of internal subglyphs really needed in a font, and so compact its total size: where no such glyph transformation is needed, the font would just need to rescale the component glyph boxes to create the composed ideograph. (In fact, the same technique can also be used to greatly reduce the size of a Hangul font. However, these composition patterns are more strictly defined in Hangul by the canonical decomposition of syllables into jamos: each jamo has a single, well-known horizontal or vertical composition rule, which makes binary operators like those above unnecessary.)
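
    For reference, the canonical decomposition of a precomposed Hangul syllable into jamos is plain arithmetic defined by the Unicode Standard, which is why no explicit operators are needed there. The sketch below shows that arithmetic in Python; the constants come from the standard, while the function name decompose_hangul is only illustrative. It says nothing about how a font would actually position the jamos.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable decomposition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables


def decompose_hangul(syllable: str) -> str:
    """Return the conjoining jamos (L, V and optional T) of one syllable."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    leading = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    t_index = s_index % T_COUNT  # 0 means no trailing consonant
    jamos = chr(leading) + chr(vowel)
    if t_index:
        jamos += chr(T_BASE + t_index)
    return jamos


# The arithmetic agrees with the library's canonical decomposition (NFD).
sample = "\uD55C"  # HANGUL SYLLABLE HAN
print(decompose_hangul(sample) == unicodedata.normalize("NFD", sample))
```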


