Re: Phetsarat font, Lao unicode

From: Brian Wilson (bountonw@gmail.com)
Date: Wed Jul 11 2007 - 11:01:50 CDT

    I was in Laos last week and purchased the two latest dictionaries. I
    also looked at elementary school primers. These all list the consonants
    and vowels separately, as Thai does. In Thai, the convention is to use a
    hyphen-like symbol as the base character. In Lao, it is to use an x-like
    symbol.

    I do not see the point in opening the door to an unlimited number of
    possible base characters. As Thai and Lao are close cousins, I would go
    for overkill and allow vowels in both languages to attach to either a
    "-" or an "x" base character.
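A minimal sketch of that rule in Python, to make the proposal concrete. It assumes the "-" is U+002D and the "x" is U+0078 (the primers may well use other glyphs), and it treats any nonspacing mark in the Thai or Lao block as an attachable vowel or tone sign. The function names are hypothetical.

```python
import unicodedata

# Assumption: the two placeholder bases are plain ASCII hyphen and letter x.
PLACEHOLDER_BASES = {"-", "x"}


def is_thai_or_lao_mark(ch):
    """True for a nonspacing mark (category Mn) in the Thai or Lao block."""
    cp = ord(ch)
    in_block = 0x0E00 <= cp <= 0x0E7F or 0x0E80 <= cp <= 0x0EFF
    return in_block and unicodedata.category(ch) == "Mn"


def is_placeholder_cluster(text):
    """True if text is one placeholder base followed only by Thai/Lao marks."""
    if len(text) < 2 or text[0] not in PLACEHOLDER_BASES:
        return False
    return all(is_thai_or_lao_mark(ch) for ch in text[1:])
```

Under this sketch, "x" followed by U+0EB4 (Lao vowel sign I) and "-" followed by U+0E34 (Thai sara I) are both accepted, and so are the crossed combinations. Spacing vowels such as U+0EC0 are not combining marks and are not matched.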

    Brian

    On 7/11/07, Philippe Verdy <verdy_p@wanadoo.fr> wrote:
    >
    > James Kass wrote:
    > > Sent: Tuesday, July 10, 2007 05:16
    > > To: unicode@unicode.org
    > > Subject: RE: Phetsarat font, Lao unicode
    > >
    > >
    > > Philippe Verdy wrote,
    > >
    > > > One problem is that fonts (at least with TrueType/OpenType) are not
    > > > designed to support reordering and positioning with an unbounded
    > > > number of base characters.
    > >
    > > Font engines handle reordering.
    >
    > Not completely, and not always. Some fonts do have to use GSUB for local
    > reordering according to style rather than just the script properties.
    >
    > > > For example the GSUB/GPOS tables in TrueType require listing
    > > > somewhere the complete list of codepoints where such reordering and
    > > > positioning may be applied, ...
    > >
    > > A listing of glyph IDs is stored in the font. Fonts only store
    > > codepoints in the "cmap" table. The listing of glyph IDs may
    > > be a complete list of every glyph ID involved, or it may be
    > > done using ranges in order to minimize table size.
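To illustrate the point about listing glyph IDs either individually or as ranges, here is a small Python sketch. It is not the binary OpenType layout, only the idea behind it. The toy "cmap" values are made-up glyph IDs, and the function names are hypothetical.

```python
# Hypothetical cmap: the only place where code points appear.
# The glyph IDs on the right are invented for illustration.
TOY_CMAP = {0x0EB4: 3, 0x0EB5: 4, 0x0EB6: 5, 0x0EB8: 9, 0x0EB9: 10}


def to_ranges(glyph_ids):
    """Collapse glyph IDs into sorted, inclusive (start, end) runs."""
    ranges = []
    for gid in sorted(set(glyph_ids)):
        if ranges and gid == ranges[-1][1] + 1:
            # Extend the current run by one glyph ID.
            ranges[-1] = (ranges[-1][0], gid)
        else:
            # Start a new run.
            ranges.append((gid, gid))
    return ranges


def covers(ranges, gid):
    """True if the glyph ID falls inside any of the runs."""
    return any(start <= gid <= end for start, end in ranges)
```

With the toy table above, to_ranges(TOY_CMAP.values()) gives two runs instead of five separate entries, which is the table-size saving being described.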
    > > > ... something that can't be performed in fonts with
    > > > the current format, because they don't allow defining character
    > > > classes in them,
    > >
    > > The OpenType GDEF table format requires assignment of
    > > glyphs to various character classes. These classes are neither
    > > user- nor developer-definable, though. Unicode also assigns
    > > character classes, but only to characters. Complex script
    > > fonts generally have scads of "presentation form" glyphs
    > > which aren't characters in the Unicode sense.
    >
    > I said "somewhere". You misunderstand what I mean here. I was speaking
    > about the possibility of creating a group of code points (even if they
    > are remapped internally to glyph IDs within a "cmap" table or other
    > tables) and assigning them a single identifier that can be used in
    > GSUB/GPOS rule tables. Without it, you'll have to create as many rules
    > as there are pairs in the product of the possible base characters in
    > one class and the possible combining vowel signs in another class. As
    > there may be many candidate base characters for which such combinations
    > are needed, this will rapidly exhaust the maximum size allowed for such
    > GSUB/GPOS tables.
    >
    > Creating GSUB/GPOS tables so that their selector can include a
    > pseudo-glyph ID mapped to a class of code points would simplify the
    > design a lot for fonts that need to contain lots of characters
    > (possibly from several or many scripts).
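The counting argument above can be sketched in a few lines of Python. This only models the arithmetic (one rule per base/mark pair versus one rule over two classes). It does not model any actual GSUB/GPOS table format, and the example base symbols and function names are assumptions chosen for illustration.

```python
from itertools import product

# Assumed placeholder bases: hyphen, x, dotted circle, multiplication sign.
EXAMPLE_BASES = ["-", "x", "\u25cc", "\u00d7"]
# Lao combining vowel signs and related marks (all category Mn).
EXAMPLE_MARKS = [chr(cp) for cp in (0x0EB1, 0x0EB4, 0x0EB5, 0x0EB6, 0x0EB7,
                                    0x0EB8, 0x0EB9, 0x0EBB, 0x0EBC)]


def pairwise_rules(bases, marks):
    """One rule per (base, mark) pair: grows as len(bases) * len(marks)."""
    return list(product(bases, marks))


def class_rule(bases, marks):
    """A single rule that names two classes instead of every pair."""
    return (frozenset(bases), frozenset(marks))


def class_rule_matches(rule, base, mark):
    """True if the pair is covered by the class-based rule."""
    base_class, mark_class = rule
    return base in base_class and mark in mark_class
```

Adding one more base symbol costs one extra class member in the second form, but one extra rule per mark in the first.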
    >
    > > > ... and assigning them pseudo-glyph IDs that can be used in GSUB
    > > > tables.
    > >
    > > Pseudo-glyph ID might be a misleading phrase. A Glyph ID is
    > > simply the number of the position of a glyph's data in a font.
    > > The first glyph, contrary to conventional counting methods,
    > > is given the glyph ID of zero. And so forth.
    >
    > It was not misleading. I really intended a special ID that can be used
    > to designate a class of glyphs (mapped from a class of characters) as
    > if it were a single glyph ID, to create a single composition rule
    > instead of one composition rule per element of the product of the two
    > classes. It would certainly be more useful in GPOS than in GSUB.
    >
    > > > ... the renderer
    > > > for example could be looking for rules based on the dotted circle
    > > > symbol, and automatically infer the other applicable rules for other
    > > > Common symbols,
    > >
    > > Does this assume that the dotted circle is part of the encoded text?
    >
    > Yes, for the intended purpose of showing the diacritic in isolation,
    > with an arbitrary base symbol.
    >
    > > It normally isn't, it's inserted (to the display only) by (at least) one
    > > popular font engine.
    >
    > No. A renderer should not have to do this unless explicitly instructed
    > to do so, or unless there's no other way to display the diacritic in
    > combination with a base character it can attach to.
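For readers unfamiliar with the behaviour under discussion, here is a rough Python sketch of display-time dotted circle insertion. It is a simplification built on assumptions: a mark counts as orphaned only at the start of the text or after a separator or control character, whereas real shaping engines apply script-specific rules. The function names are hypothetical.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"


def is_mark(ch):
    """True for any combining mark (general category Mn, Mc or Me)."""
    return unicodedata.category(ch).startswith("M")


def can_carry_mark(ch):
    """Assumption: separators (Z*) and control/format (C*) cannot carry marks."""
    return unicodedata.category(ch)[0] not in "ZC"


def insert_dotted_circles(text):
    """Return a display string with U+25CC placed before orphaned marks."""
    out = []
    prev = None
    for ch in text:
        if is_mark(ch) and (prev is None or not can_carry_mark(prev)):
            out.append(DOTTED_CIRCLE)
        out.append(ch)
        prev = ch
    return "".join(out)
```

The stored text is untouched. Only the returned display string gains the extra base, which matches the "inserted to the display only" description quoted above.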
    >
    > But even in the case of, for example, a combining cedilla occurring
    > after a base Hebrew letter, for which it is very unlikely that a font
    > would implement a composition rule, and for which the renderer will be
    > of no help, displaying the uncomposable combining cedilla with a dotted
    > circle is not the ultimate solution. Many renderers will instead
    > attempt some reasonable default positioning, for example by centering
    > the diacritic horizontally on the center of the base letter. (The
    > renderer will probably not be able to move the cedilla vertically, or
    > find a more appropriate place, given that this would depend on the
    > exact style of each base glyph, which does not necessarily specify
    > attachment points for general Latin diacritics.)
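The fallback centering described above comes down to one subtraction. The Python sketch below assumes both bounding boxes are given as (xmin, xmax) in font units relative to the same origin. The numbers in the usage note are invented, and vertical placement is left alone, as in the description.

```python
def centered_mark_offset(base_bbox, mark_bbox):
    """Horizontal shift that puts the mark's center over the base's center.

    Each bbox is an (xmin, xmax) pair in the same coordinate space.
    """
    base_xmin, base_xmax = base_bbox
    mark_xmin, mark_xmax = mark_bbox
    base_center = (base_xmin + base_xmax) / 2
    mark_center = (mark_xmin + mark_xmax) / 2
    return base_center - mark_center
```

For a base spanning 0 to 600 units and a mark drawn from -200 to 0, the mark would be shifted right by 400 units. Nothing here uses anchor points, which is why the result can look wrong on bases with unusual shapes.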
    >
    > > Regardless, other symbols will most always
    > > have completely different metrics. It's unlikely that a font engine
    > > will calculate the different heights, advance widths, and so forth,
    > > in order to approximate a correct placement of the combining
    > > character glyph. It's probably equally unlikely that a font developer
    > > will add a potentially infinite number of GPOS rules to a font's tables
    > > in order to accomplish this with every conceivable arbitrary base
    > > character glyph.
    >
    > For notational purposes, this is still what renderers are doing when
    > positioning diacritics on a dotted circle, given that fonts themselves
    > are not specifying such advanced positioning (or substitution for
    > resizing).
    >
    > What is the difference between positioning a diacritic (like a Lao
    > vowel) on a base dotted circle, and positioning the same diacritic on
    > another base symbol like a cross (or something else like a circle,
    > square, dotted square, cross-hatch, horizontal stroke, or checkered
    > grid)? I've seen various symbols used to denote the absence of a
    > specific base letter in Latin-written texts. Why would this not exist
    > for Lao too?
    >
    > Is the proposer of the x-like cross sure that this convention is not
    > arbitrary and specific to some authors? What is clear is that the
    > chosen symbol should not be confusable with another existing letter
    > (that's why choosing a simple circle or a cross was not appropriate as
    > the base symbol for denoting the position of a base Latin, Greek, or
    > Cyrillic letter).
    >
    > But could the Unicode convention of using a dotted circle for such
    > notational use be the best option for all scripts? Isn't there a script
    > where a dotted circle character has semantics beyond those of a purely
    > symbolic graphical feature, so that the conventional dotted circle
    > could become confusable in that script? I have not seen anything in
    > Unicode that says that using a dotted circle for this case is
    > normative, and this is a good reason for not implementing this feature
    > within fonts, but only in renderers that have better knowledge of the
    > context of use, to see whether the symbol really needs to be displayed,
    > and which symbolic glyph will be the most appropriate.
    >

    -- 
    Brian Wilson, Director
    Mission College Translation Center
    P.O. Box 4
    Muaklek, Saraburi 18180
    THAILAND
    Tel: 66-36-344-777 ext 1221
    Mobile: 66-86-921-0108
    Fax:  66-36-341-629
    


    This archive was generated by hypermail 2.1.5 : Wed Jul 11 2007 - 11:04:12 CDT