From: vunzndi@vfemail.net
Date: Mon Feb 05 2007 - 19:04:23 CST
Dear Philippe,
I know there are definite shortcomings with the system as it stands.
I particularly mentioned the Extension C part of my data because, even
in this form, it contains information that is useful and of interest to
others.
The underlying rationale is to use +, -, /, ( and ) as in mathematics,
which makes the descriptions easy for people to visualise, and therefore
less likely to contain errors than standard IDS, given the same amount of
time spent writing and checking. A minor advantage is that one needs no
extra input method for IDC symbols. In version one of the database many
of the other IDC symbols are also replaced by + and /; in this version of
the data, however, the main reason for leaving them in is to make it
obvious that the IDS are not standard.
It would be fair to say that, reduced to just +, -, /, ( and ), there
are a few relationships not covered that the IDC symbols make clear;
however, in many cases the IDC symbols are almost redundant, as the
shapes of the parts define the relationship. As you rightly observe,
the use of - adds something that standard IDS cannot do.
In mathematics the difference between (a+b)/c and a+b/c is not only one
of convention, but also an acknowledgement of the principles of
associativity and distributivity. The solution used there is brackets;
one could of course also resolve the ambiguity by saying that one must
always write (a+b)/c as 'a/c + b/c'. Standard IDS addresses this by
using Polish (prefix) order, a close relative of the reverse Polish order
some of us remember from pocket calculators; it is well suited to
machines but hard for people. In this case (a+b)/c becomes /+abc,
whereas a+b/c becomes +a/bc.
At the end of the day, IMHO, standard IDS avoids brackets by using
prefix ordering, but at the cost of extra work for people. Including
brackets makes this easy; in fact (a+b)/c and a+(b/c) are both clear.
High-level programming languages are popular for this reason: they
reflect human thinking and leave the rest to the compiler. When I put
lots of brackets into long if clauses, they usually work first time;
when brackets are not used to clarify, either by choice or because of
the constraints of the programming language, I know that I have quite a
long checking/debugging session ahead of me. As an illustration,
(a+b/c+d)/(e/f+g) is something I can visualise straight away, even when
the parts are Chinese radicals.
If required to produce a large table in prefix order, the way I would do
it is to write the table in the order I know best and then write a
script to convert that table to prefix order. The rules for converting
"mathematical" IDS to standard IDS are a little more complicated, but I
have been considering doing this for various reasons. One is to allow
efficient searching without getting a false result simply because
someone made an ordering mistake while writing in prefix order, when
such mistakes would be fewer when writing in mathematical order.
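A conversion script of the kind described above might look something like the following minimal Python sketch. It is not the script used for the database; it assumes single-character operands (a component character or ?), treats / as binding tighter than + and - as in ordinary arithmetic, and makes all operators left-associative. The names infix_to_prefix, parse, to_prefix and TO_IDC are hypothetical.

```python
# Hypothetical sketch: convert "mathematical" IDS (infix, with + - / and
# brackets) into prefix (Polish) order, as used by standard IDS.
# Assumptions: every operand is a single character; '/' (vertical stacking)
# binds tighter than '+' and '-'; all operators are left-associative.

PRECEDENCE = {"+": 1, "-": 1, "/": 2}
# Optional mapping to Ideographic Description Characters:
# U+2FF0 (left to right) and U+2FF1 (above to below).
# '-' has no IDC equivalent in this sketch, so it is left unchanged.
TO_IDC = {"+": "\u2ff0", "/": "\u2ff1"}


def tokenize(text):
    """Split the input into single-character tokens, ignoring spaces."""
    return [ch for ch in text if not ch.isspace()]


def parse(tokens):
    """Precedence-climbing parser; returns a nested (op, left, right) tree."""
    pos = 0

    def primary():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = expression(1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return node
        if tok in PRECEDENCE or tok == ")":
            raise ValueError(f"unexpected {tok!r}")
        return tok  # an operand: a component character or '?'

    def expression(min_prec):
        nonlocal pos
        left = primary()
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            # Left-associative: the right operand must bind strictly tighter.
            right = expression(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left

    tree = expression(1)
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return tree


def to_prefix(tree, use_idc=False):
    """Emit the operator first, then its two operands, recursively."""
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    symbol = TO_IDC.get(op, op) if use_idc else op
    return symbol + to_prefix(left, use_idc) + to_prefix(right, use_idc)


def infix_to_prefix(text, use_idc=False):
    return to_prefix(parse(tokenize(text)), use_idc)
```

For example, infix_to_prefix("(a+b)/c") returns "/+abc", infix_to_prefix("a+b/c") returns "+a/bc", and infix_to_prefix("(a+b)/c", use_idc=True) returns "⿱⿰abc". A chain such as a+b+c comes out as ++abc; standard IDS also offers the three-part operators ⿲ and ⿳ for such chains, which this sketch does not produce.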
John Knightley
PS: My congratulations to anyone who can convert (a+b/c+d)/(e/f+g)
into Polish (prefix) order in their head in less than five seconds.
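For anyone wanting to check an answer to that challenge, the reverse direction is also easy to mechanise. The minimal Python sketch below makes the same assumptions as ordinary arithmetic notation (single-character operands, / binding tighter than + and -, left-associativity); the name prefix_to_infix is hypothetical. It turns a prefix string back into bracketed notation, adding brackets only where they are needed.

```python
# Hypothetical sketch: turn a prefix (Polish-order) IDS string that uses
# + - / back into bracketed "mathematical" notation, e.g. to check a
# conversion done by hand. Assumptions: single-character operands,
# '/' binds tighter than '+' and '-', all operators left-associative.

PRECEDENCE = {"+": 1, "-": 1, "/": 2}


def prefix_to_infix(text):
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def node():
        """Read one subtree; return (infix_text, precedence)."""
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("incomplete expression: an operand is missing")
        tok = tokens[pos]
        pos += 1
        if tok not in PRECEDENCE:
            return tok, 3  # operands never need brackets
        prec = PRECEDENCE[tok]
        left, left_prec = node()
        right, right_prec = node()
        # Bracket a left operand only if it binds more loosely; bracket a
        # right operand if it binds as loosely or more loosely, because
        # the operators are treated as left-associative.
        if left_prec < prec:
            left = f"({left})"
        if right_prec <= prec:
            right = f"({right})"
        return left + tok + right, prec

    result, _ = node()
    if pos != len(tokens):
        raise ValueError("extra symbols after a complete expression")
    return result
```

For example, prefix_to_infix("/++a/bcd+/efg") returns "(a+b/c+d)/(e/f+g)", and prefix_to_infix("+a/bc") returns "a+b/c". A string with too few or too many operands raises ValueError instead of silently producing a wrong description.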
Quoting Philippe Verdy <verdy_p@wanadoo.fr>:
> From: <vunzndi@vfemail.net>
>> Dear Arne,
>>
>> I would certainly welcome help putting the data into standard IDS
>> format. The file is exported from a database of mine that uses a
>> format similar to IDS (close enough for a fuzzy search, as described
>> below). I do have a more recent version, which I think is too big for
>> the mailing list, so I will send it to you separately. Briefly, the
>> ideas are:
>> 1. ? and ??: missing or uncertain character/data (similar to
>> ids_irg.txt, where ? usually denotes a missing character)
>> 2. +, - and brackets, with the obvious usage
>> 3. A+B combinations, as opposed to Mr Taichi Kawabata's Polish
>> (prefix) +AB ordering
>> 4. A-B permitted where the part/radical is not in Unicode
>
> You have forgotten to mention:
> * the use of parentheses: A/(B+C)
> * the use of ideographic description characters (IDC) as binary operators:
> ** A surrounds/encloses B
> ** A borders B (on several sides)
> ** A overlaps B (several overlapping positions)
>
> Why not use the IDC symbols instead of "+" and "/" for horizontal
> and vertical stacking?
>
> I note that the use of "-" is quite smart (better than not using it
> and displaying a "?" for a missing radical).
>
> The database, however, does not clearly define how the component
> strokes or radicals are altered (notably when A surrounds/encloses or
> borders B: sometimes A is modified to leave more space for B, for
> example by changing angles from diagonal to vertical or horizontal,
> or by dropping some parts of a stroke); when the glyphs are just
> rescaled to fit the square box, there is probably no need to give
> this information in the database.
>
> Such indications would help reduce the number of internal subglyphs
> really needed in a font, and so compact its total size: where no such
> glyph transformation applies, the font would just need to rescale the
> component glyph boxes to create the composed ideograph. (In fact the
> same technique can also be used to greatly reduce the size of a
> Hangul font; however, these composition patterns are more strictly
> defined in Hangul by the canonical decomposition of syllables into
> jamos, because each jamo has a single, well-known horizontal or
> vertical composition rule, making the use of binary operators like
> those above unnecessary.)
This archive was generated by hypermail 2.1.5 : Mon Feb 05 2007 - 19:07:08 CST