RE: Composition of not included Chinese characters

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Sep 25 2007 - 14:04:59 CDT


    vunzndi@vfemail.net wrote:
    > Sent: Tuesday, September 25, 2007 07:33
    > To: John H. Jenkins
    > Cc: Unicode Mailing List
    > Subject: Re: Composition of not included Chinese characters
    >
    > Quoting "John H. Jenkins" <jenkins@apple.com>:
    >
    > >
    > > On Sep 24, 2007, at 9:08 PM, Philippe Verdy wrote:
    > >
    > >> One related question: do IDS have to be composed according to
    > >> semantic (i.e. grouping according to the relations between
    > >> components), or according to the ideograph glyph layout?
    > >>
    > >>
    > >
    > > Whichever floats your boat.
    > >
    >
    > And also on which boat you want to float, be it generating the glyph
    > or checking for duplicates.

    Do you mean the kind of duplicates that multiple IDS encodings of the same layout can generate? For example:
    * <IDS-3radicals-row, radical1, radical2, radical3>
    * <IDS-2radicals-row, IDS-2radicals-row, radical1, radical2, radical3>
    * <IDS-2radicals-row, radical1, IDS-2radicals-row, radical2, radical3>
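    As an illustration of such duplicates, here is a minimal Python sketch that parses an IDS in prefix notation into a tree and flattens nested IDS characters of the same direction, so that the three encodings above compare equal. It assumes the basic Ideographic Description Characters U+2FF0..U+2FFB (⿰ and ⿲ for side-by-side parts, ⿱ and ⿳ for stacked parts) and uses 街, described as 彳 + 圭 + 亍, as a sample; `parse` and `flat_key` are hypothetical helpers, not an existing library. Flattening like this is only valid when the grouping carries no semantic meaning.

```python
# Basic Ideographic Description Characters U+2FF0..U+2FFB: all binary
# except U+2FF2 and U+2FF3, which take three parts.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2ff2"] = 3  # ⿲ left to middle and right
IDC_ARITY["\u2ff3"] = 3  # ⿳ above to middle and below

# IDCs that simply split the square into a row or a column of parts.
DIRECTION = {"\u2ff0": "row", "\u2ff2": "row", "\u2ff1": "col", "\u2ff3": "col"}


def parse(ids: str):
    """Parse a prefix-notation IDS into a nested tuple (idc, child, ...)."""
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(ids):
            raise ValueError("truncated IDS")
        ch = ids[pos]
        pos += 1
        if ch in IDC_ARITY:
            return (ch,) + tuple(node() for _ in range(IDC_ARITY[ch]))
        return ch  # any other character is a component

    tree = node()
    if pos != len(ids):
        raise ValueError("trailing characters after IDS")
    return tree


def flat_key(tree):
    """Merge nested same-direction IDCs into (direction, parts) so that
    different groupings of the same row or column compare equal."""
    if isinstance(tree, str):
        return tree
    idc, *children = tree
    direction = DIRECTION.get(idc)
    if direction is None:  # ⿴..⿻: keep as-is, only normalize children
        return (idc,) + tuple(flat_key(c) for c in children)
    parts = []
    for child in children:
        k = flat_key(child)
        if isinstance(k, tuple) and k[0] == direction:
            parts.extend(k[1])  # same direction: splice its parts in
        else:
            parts.append(k)
    return (direction, tuple(parts))


variants = ["⿲彳圭亍", "⿰⿰彳圭亍", "⿰彳⿰圭亍"]
keys = {flat_key(parse(v)) for v in variants}
print(keys)  # {('row', ('彳', '圭', '亍'))}
```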

    If one wants to detect and remove duplicates (when the distinctions above carry no semantic significance), then IDS strings would need to be "canonicalized" using rules such as:
    * multipart descriptions should use the IDS character with the largest number of parts. This means that <IDS-3radicals-row> will be preferred to <IDS-2radicals-row> whenever possible;
    * sequences of multipart IDS characters should be grouped starting from the left, so that radicals move toward the end of the string and IDS characters toward the start;
    * when a grid-like layout can be described either by a vertical or by a horizontal subdivision, use a grid radical if one is available; otherwise prefer a vertical subdivision into stacked rows if there are fewer rows than columns, and a horizontal subdivision into aligned columns in the other case.

    With such rules, not only will you reduce the size of the IDS string, but you will also be able to find equivalent representations.
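    The first two rules could be sketched as follows. This is a hypothetical Python canonicalizer, not an existing tool: runs of same-direction IDS characters are flattened, then re-emitted with the 3-part IDS character preferred, and any leftover group is nested on the left so that IDS characters come first and components last. The grid rule of the third bullet is not implemented here, and the helper names are invented for illustration.

```python
# Basic IDCs U+2FF0..U+2FFB; only U+2FF2 and U+2FF3 take three parts.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2ff2"] = 3
IDC_ARITY["\u2ff3"] = 3
DIRECTION = {"\u2ff0": "row", "\u2ff2": "row", "\u2ff1": "col", "\u2ff3": "col"}
BINARY = {"row": "\u2ff0", "col": "\u2ff1"}   # ⿰ ⿱
TERNARY = {"row": "\u2ff2", "col": "\u2ff3"}  # ⿲ ⿳


def parse(ids: str):
    """Parse a prefix-notation IDS into a nested tuple (idc, child, ...)."""
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(ids):
            raise ValueError("truncated IDS")
        ch = ids[pos]
        pos += 1
        if ch in IDC_ARITY:
            return (ch,) + tuple(node() for _ in range(IDC_ARITY[ch]))
        return ch

    tree = node()
    if pos != len(ids):
        raise ValueError("trailing characters after IDS")
    return tree


def collect(tree, direction):
    """Canonical strings of every part in a same-direction run."""
    if isinstance(tree, str):
        return [tree]
    idc, *children = tree
    if DIRECTION.get(idc) != direction:
        return [canonical(tree)]  # a different layout is one opaque part
    parts = []
    for child in children:
        parts.extend(collect(child, direction))
    return parts


def emit(direction, parts):
    """Prefer the 3-part IDC; extra parts are grouped on the left."""
    if len(parts) == 1:
        return parts[0]
    if len(parts) == 2:
        return BINARY[direction] + parts[0] + parts[1]
    if len(parts) == 3:
        return TERNARY[direction] + "".join(parts)
    # More than three: keep the last two here, nest the rest on the left.
    return TERNARY[direction] + emit(direction, parts[:-2]) + parts[-2] + parts[-1]


def canonical(tree):
    if isinstance(tree, str):
        return tree
    idc, *children = tree
    direction = DIRECTION.get(idc)
    if direction is None:
        return idc + "".join(canonical(c) for c in children)
    return emit(direction, collect(tree, direction))


for ids in ["⿲彳圭亍", "⿰⿰彳圭亍", "⿰彳⿰圭亍"]:
    print(ids, "->", canonical(parse(ids)))  # each line ends with ⿲彳圭亍
```

    Both duplicate encodings shrink from five characters to four, which is the size reduction mentioned above.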

    Now, IDS strings are still not enough to produce a legible composition of the ideographic square. An IDS does not specify how the component radicals are adapted and fitted to look good within the composition square. This is not a simple subdivision, because it largely depends on the complexity (in strokes) of the radicals that make up the ideograph: simple radicals like 刂 can be narrowed more easily than 長, which would keep a larger width, so a larger weight must be taken into account when adjusting the relative widths.

    This could easily be specified by assigning horizontal and vertical complexity weights to each radical, and by creating rules to compute new weights when the radicals are composed in an IDS; a simple rule would be to add them, so that:
    * 刂 would have composition weights (2,1) (a higher horizontal complexity)
    * 長 would have composition weights (4,6) (a higher vertical complexity)
    * if these two are packed into a single ideograph in a row, the result is an ideograph with composition weights (6,6): the left part will have a relative width of 2 and the right part a relative width of 4, meaning that 刂 will use one third of the total width. The horizontal complexity is then the sum of the horizontal complexities, and the vertical complexity is just the maximum of the vertical complexities of its two parts.
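    A minimal Python sketch of these weight rules, using exact fractions for the relative widths. The weights for 刂 and 長 are the ones from the example above; the column rule (vertical weights add, horizontal weight takes the maximum) is an assumed mirror image of the row rule, not something stated in the message, and all names are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Weights:
    h: int  # horizontal complexity
    v: int  # vertical complexity


def compose_row(parts):
    """Side-by-side composition (⿰/⿲): horizontal weights add, the vertical
    weight is the maximum. Also returns each part's share of the width."""
    total_h = sum(p.h for p in parts)
    combined = Weights(total_h, max(p.v for p in parts))
    widths = [Fraction(p.h, total_h) for p in parts]
    return combined, widths


def compose_column(parts):
    """Stacked composition (⿱/⿳), ASSUMED to mirror the row rule:
    vertical weights add, the horizontal weight is the maximum."""
    total_v = sum(p.v for p in parts)
    combined = Weights(max(p.h for p in parts), total_v)
    heights = [Fraction(p.v, total_v) for p in parts]
    return combined, heights


DAO = Weights(2, 1)    # 刂
CHANG = Weights(4, 6)  # 長

combined, widths = compose_row([DAO, CHANG])
print(combined)  # Weights(h=6, v=6)
print(widths)    # [Fraction(1, 3), Fraction(2, 3)]
```

    Because the combined weights are again a `Weights` value, the same functions can be applied recursively along an IDS tree.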

    When radicals (and their strokes) are narrowed or reduced in height, they are not simply scaled linearly: the internal angles change, and this influences the weight (boldness) of the component strokes, which can also move slightly within the compacted radical itself. Some optional details of the strokes will also tend to disappear, notably those at the ends of strokes, when they are not needed to identify the radical that contains them.
    * For example, 月 can easily be narrowed by dropping the small angled terminations of the two strokes at its bottom (or these strokes can become vertical).
    * The variable weights of strokes can become more linear (more uniform), so that parts of strokes that are normally thinner on one side become (relatively) bolder when the stroke is reduced in width (shorter stroke length) or in height, in order to keep a legible stroke weight. It is difficult to express these transformations of internal strokes (I think that even native writers adapt their style, depending on their skill, to produce legible ideographs, and some glyphic distinctions may or may not be preserved by adapting the strokes in a smarter way). Various stylistic conventions may exist for this, and it also depends on the type of pen or brush used to draw the glyph (notably brushes, which can more easily produce variable stroke widths and very thin details).
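    The two adjustments above could be modeled, very roughly, as below. This is a hypothetical Python sketch, not an established font technique: each stroke has a nominal weight and an `optional` flag for terminations such as the bottom ends of 月; when the radical is compressed below a threshold, optional terminations are dropped, and stroke weight shrinks sub-linearly (here `scale ** 0.5`, an arbitrary choice) with a minimum floor, so that thin strokes stay legible. The stroke names, units, and numbers are invented for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Stroke:
    name: str
    width: float            # nominal stroke weight, hypothetical em units
    optional: bool = False  # a termination that may be dropped when narrowed


def narrow(strokes, scale, drop_below=0.6, min_width=0.03, exponent=0.5):
    """Hypothetical narrowing model: drop optional terminations when the
    radical is compressed below `drop_below`, and reduce stroke weight
    sub-linearly (scale ** exponent), never below `min_width`."""
    if not 0 < scale <= 1:
        raise ValueError("scale must be in (0, 1]")
    kept = [s for s in strokes if not (s.optional and scale < drop_below)]
    return [replace(s, width=max(min_width, s.width * scale ** exponent))
            for s in kept]


# Invented stroke inventory for 月, for illustration only.
moon = [
    Stroke("left downstroke", 0.06),
    Stroke("bottom-left sweep", 0.05, optional=True),
    Stroke("top and right side", 0.06),
    Stroke("bottom-right hook", 0.05, optional=True),
    Stroke("inner horizontal 1", 0.05),
    Stroke("inner horizontal 2", 0.05),
]

narrowed = narrow(moon, scale=0.5)
print([s.name for s in narrowed])  # the two optional terminations are gone
```

    Halving the width here reduces stroke weight only by a factor of about 0.71, so strokes end up relatively bolder, as described above; the change in stroke contrast (thick and thin parts) is not modeled.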



    This archive was generated by hypermail 2.1.5 : Tue Sep 25 2007 - 14:06:53 CDT