Re: writing Chinese dialects

Date: Mon Feb 05 2007 - 19:04:23 CST

    Dear Philippe,

    I know there are definite shortcomings with the system as it stands.
    I particularly mentioned the Extension C part of my data because, even
    in this form, it contains information that is useful and of interest
    to others.

    The underlying rationale is to use +, -, /, ( and ) as in mathematics,
    which makes the descriptions easy for people to visualise and
    therefore, for the same amount of time spent writing and checking,
    less likely to contain errors than standard IDS. A minor advantage is
    that no extra input method is needed for the IDC symbols. In version
    one of the database many of the other IDS symbols are also replaced
    by + and /; in this version of the data, however, the main reason for
    leaving them in is to make it obvious that the IDS are not standard.
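Because standard IDS puts each operator before its operands and every IDC takes a fixed number of operands, a program can tell a well-formed standard description from the bracketed "mathematical" form without needing any brackets. A minimal sketch, assuming only the twelve IDCs U+2FF0 to U+2FFB (later Unicode versions add more), with the two ternary ones taking three operands and the rest two; the function name `is_well_formed_ids` is hypothetical:

```python
# Ideographic Description Characters U+2FF0..U+2FFB (assumed set; later
# Unicode versions define additional IDCs that this sketch ignores).
IDCS = {chr(cp) for cp in range(0x2FF0, 0x2FFC)}
TERNARY_IDCS = {"\u2ff2", "\u2ff3"}  # left-middle-right, above-middle-below
MATH_OPERATORS = set("+-/()")


def is_well_formed_ids(desc: str) -> bool:
    """Return True if desc is a complete prefix IDS built only from IDCs.

    Every non-IDC character (an ideograph, or '?' for a missing component)
    is treated as a leaf operand.
    """
    if any(ch in MATH_OPERATORS for ch in desc):
        return False  # bracketed "mathematical" notation, not standard IDS
    needed = 1  # operand slots still open in the description tree
    for ch in desc:
        if needed == 0:
            return False  # extra characters after a complete description
        needed -= 1  # this character fills one open slot
        if ch in IDCS:
            needed += 3 if ch in TERNARY_IDCS else 2
    return needed == 0
```

For example, `is_well_formed_ids("⿰女子")` is true, while `"女+子"` and the truncated `"⿰女"` are both rejected.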

    It would be fair to say that, taken down to just +, -, /, ( and ),
    there are a few relationships not covered that the IDC symbols make
    clear; in many cases, however, the IDC symbols are almost redundant,
    as the shape of the parts defines the relationship. As you rightly
    observe, the use of - adds something that standard IDS cannot do.

    In maths the difference between (a+b)/c and a+b/c is not only one of
    convention, but also an acknowledgement of the associative and
    distributive laws. The solution used there is brackets; one could of
    course also resolve the uncertainty by insisting that (a+b)/c always
    be written as a/c + b/c. Standard IDS addresses this by putting the
    operator first, i.e. Polish (prefix) order, the mirror image of the
    reverse Polish order some of us remember from pocket calculators.
    That is well suited to machines but hard for people: in this notation
    (a+b)/c becomes /+abc, whereas a+b/c becomes +a/bc. At the end of the
    day, IMHO, standard IDS avoids brackets by using prefix ordering but
    makes extra work for people; including brackets makes this easy, and
    in fact (a+b)/c and a+(b/c) are both clear.

    High-level programming languages are popular for this reason: they
    reflect human thinking and leave the rest to the compiler. When I put
    lots of brackets into long if clauses, they usually work first time;
    when, either by choice or by the constraints of the programming
    language, brackets are not used to clarify, I know it means I have
    quite a long checking/debugging session ahead.

    As an illustration, (a+b/c+d)/(e/f+g) is something I can visualise
    straight away, even when the parts are Chinese radicals. If required
    to produce a large table in prefix order, I would write the table in
    the order I know best and then write a script to convert it to prefix
    order. The rules from "mathematical" IDS to standard IDS are a little
    more complicated, but for various reasons I have been considering
    doing this, one being to allow efficient searching without false
    results caused simply by ordering mistakes made while writing in
    prefix order, mistakes that would be fewer when writing in
    mathematical order.
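A conversion script of the kind described here can be sketched as a small recursive-descent parser followed by an operator-first writer. This is an illustration under stated assumptions, not the database's actual converter: it assumes '/' binds tighter than '+' and both group from the left (as in arithmetic), treats every other character (including '?') as a single component, leaves '-' unhandled, and maps '+' to ⿰ (U+2FF0) and '/' to ⿱ (U+2FF1). The names `Parser`, `to_prefix`, `IDC_FOR` and `math_to_ids` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Node:
    op: str        # '+' = side by side, '/' = stacked vertically
    left: "Tree"
    right: "Tree"


Tree = Union[str, Node]  # a leaf is one component character (or '?')


class Parser:
    """Recursive-descent parser for bracketed infix descriptions.

    Assumed precedence: '/' binds tighter than '+', both group from the
    left. '-' (component minus part) is deliberately not handled.
    """

    def __init__(self, text: str) -> None:
        self.tokens = [ch for ch in text if not ch.isspace()]
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of description")
        self.pos += 1
        return tok

    def parse(self) -> Tree:
        tree = self.expr()
        if self.peek() is not None:
            raise ValueError(f"unexpected {self.peek()!r} at position {self.pos}")
        return tree

    def expr(self) -> Tree:  # expr := term ('+' term)*
        tree = self.term()
        while self.peek() == "+":
            self.take()
            tree = Node("+", tree, self.term())
        return tree

    def term(self) -> Tree:  # term := factor ('/' factor)*
        tree = self.factor()
        while self.peek() == "/":
            self.take()
            tree = Node("/", tree, self.factor())
        return tree

    def factor(self) -> Tree:  # factor := component | '(' expr ')'
        tok = self.take()
        if tok == "(":
            tree = self.expr()
            if self.take() != ")":
                raise ValueError("missing ')'")
            return tree
        if tok in "+/)-":
            raise ValueError(f"unexpected {tok!r}")
        return tok


def to_prefix(tree: Tree, ops: Optional[dict] = None) -> str:
    """Write the tree operator-first, optionally renaming each operator."""
    if isinstance(tree, str):
        return tree
    op = ops[tree.op] if ops else tree.op
    return op + to_prefix(tree.left, ops) + to_prefix(tree.right, ops)


# Assumed mapping onto the two binary IDCs: '+' -> U+2FF0 (left to right),
# '/' -> U+2FF1 (above to below).
IDC_FOR = {"+": "\u2ff0", "/": "\u2ff1"}


def math_to_ids(text: str) -> str:
    return to_prefix(Parser(text).parse(), IDC_FOR)
```

With this sketch, `to_prefix(Parser("(a+b)/c").parse())` gives `/+abc`, `a+b/c` gives `+a/bc`, and `math_to_ids("女+子")` gives `⿰女子`. It emits only binary operators; folding a three-part chain such as a+b+c into a ternary IDC like ⿲ is one of the extra rules a fuller converter would need.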

    John Knightley

    PS: My congratulations to anyone who can convert (a+b/c+d)/(e/f+g)
    into prefix order in their head in less than five seconds.
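Worked through by hand, assuming '/' binds tighter than '+' and that a chain of '+' groups from the left, the conversion posed in the PS comes out as follows (one binary operator per step; a three-part chain could instead be folded into a ternary IDC, which this ignores):

```latex
\begin{aligned}
a + b/c + d &= \bigl(a + (b/c)\bigr) + d &&\longmapsto \texttt{++a/bcd} \\
e/f + g     &= (e/f) + g                 &&\longmapsto \texttt{+/efg} \\
\frac{a + b/c + d}{e/f + g} &            &&\longmapsto \texttt{/}\;\texttt{++a/bcd}\;\texttt{+/efg}
                                           = \texttt{/++a/bcd+/efg}
\end{aligned}
```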

    Quoting Philippe Verdy <>:

    > From: <>
    >> Dear Arne,
    >> I would certainly welcome help putting the data into standard IDS
    >> format. The file is exported from a database of mine that uses a
    >> format similar to IDS (close enough for a fuzzy search, as described
    >> below). I do have a more recent version, which I think is too big for
    >> the mailing list, so I will send it to you separately. Briefly, the
    >> ideas are:
    >> 1. ? and ?? for missing or uncertain characters/data (similar to
    >> ids_irg.txt, where ? usually denotes a missing character)
    >> 2. +, - and brackets with their obvious usage
    >> 3. A+B combinations, as opposed to Mr Taichi Kawabata's reverse
    >> Polish +AB ordering
    >> 4. A-B permitted where the part/radical is not in Unicode
    > You have forgotten to mention:
    > * the use of parentheses: A/(B+C)
    > * the use of ideographic description characters (IDC) as binary operators:
    > ** A surrounds/encloses B
    > ** A borders B (on several sides)
    > ** A overlaps B (several overlapping positions)
    > Why not use the IDC symbols instead of "+" and "/" for horizontal
    > and vertical stacking?
    > I note that the use of "-" is quite smart (better than not using it
    > and displaying a "?" for a missing radical).
    > The database, however, does not clearly define how the component
    > strokes or radicals are altered (notably when A surrounds/encloses or
    > borders B: sometimes A is modified so that it leaves more space for
    > B, for example by changing angles from diagonal to vertical or
    > horizontal, or by dropping some parts of a stroke); when the glyphs
    > are just rescaled to fit the square box, there's probably no need to
    > give this information in the database.
    > Such indications would help reduce the number of internal
    > subglyphs really needed in a font, and so its total size: without
    > such glyph transformations, the font would just need to rescale the
    > component glyph box to create the composed ideograph. (In fact the
    > same technique can also be used to greatly reduce the size of a
    > Hangul font; however, these composition patterns are more strictly
    > defined in Hangul by the canonical decomposition of syllables into
    > jamos, because each jamo has a single, well-known horizontal or
    > vertical composition rule, making the use of binary operators like
    > those above unnecessary.)
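The plain-rescaling case in the quoted message can be sketched as a recursive layout that gives each leaf component a box inside the unit em square. This is a rough illustration, not how any particular font works: it assumes equal shares for every operand (real designs weight components unequally) and handles only the four IDCs that simply divide the box; surround and overlay IDCs raise an error, since those are the cases where a component is usually reshaped rather than just rescaled. The names `layout` and `SPLITS` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # x, y, width, height; y grows downward

# IDCs that only divide the box (assumed equal shares). Other IDCs in
# U+2FF0..U+2FFB (surround, overlay) are rejected below.
SPLITS = {
    "\u2ff0": ("h", 2),  # left to right
    "\u2ff1": ("v", 2),  # above to below
    "\u2ff2": ("h", 3),  # left to middle and right
    "\u2ff3": ("v", 3),  # above to middle and below
}


def layout(ids: str) -> List[Tuple[str, Box]]:
    """Assign each leaf component of a prefix IDS a box in the unit em square."""
    placed: List[Tuple[str, Box]] = []

    def place(pos: int, x: float, y: float, w: float, h: float) -> int:
        if pos >= len(ids):
            raise ValueError("incomplete description")
        ch = ids[pos]
        if ch in SPLITS:
            direction, n = SPLITS[ch]
            pos += 1
            for i in range(n):  # lay out each operand in its own slice
                if direction == "h":
                    pos = place(pos, x + i * w / n, y, w / n, h)
                else:
                    pos = place(pos, x, y + i * h / n, w, h / n)
            return pos
        if 0x2FF0 <= ord(ch) <= 0x2FFB:
            raise NotImplementedError(f"{ch!r} needs reshaping, not just rescaling")
        placed.append((ch, (x, y, w, h)))  # leaf: its glyph is scaled into this box
        return pos + 1

    end = place(0, 0.0, 0.0, 1.0, 1.0)
    if end != len(ids):
        raise ValueError("trailing characters after a complete description")
    return placed
```

For `⿰女⿱木子`, this places 女 in the left half and stacks 木 above 子 in the right half, each in a quarter of the square.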


    This archive was generated by hypermail 2.1.5 : Mon Feb 05 2007 - 19:07:08 CST