PUA glyphs in fonts (was: Is it true that Unicode is insufficient for Oriental languages?)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat May 24 2003 - 09:23:53 EDT

    With much respect for your desire to offer designs for ligatures in TrueType fonts, I think you are going in the wrong direction: ligatures are now better handled by composition rules applied directly to standard Unicode codepoints, using more recent technology such as OpenType, which allows mapping sequences of one or more Unicode codepoints to one or more glyph variants.

    The key feature of OpenType is that ligatures can be designed in a font without impacting the handling of the abstract text encoding. The few ligatures that do have their own Unicode codepoints were introduced only to support round-trip compatibility with legacy encodings based on older technologies.
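
    As a rough illustration of that separation, here is a minimal Python sketch of ligature substitution at rendering time. It is not how an OpenType engine is actually implemented: the LIGATURES table, the glyph names and the shape() function are all hypothetical, chosen only to show that the encoded text stays untouched while the glyph sequence changes.

```python
# Hypothetical ligature table: sequences of characters -> a single glyph name.
# A real OpenType font stores comparable rules in its GSUB table; this dict is
# only an illustrative stand-in.
LIGATURES = {
    ("f", "f", "i"): "f_f_i",
    ("f", "i"): "f_i",
    ("f", "l"): "f_l",
}


def default_glyph(ch):
    """Name the default glyph for one character, e.g. 'uni0066' for 'f'."""
    return "uni%04X" % ord(ch)


def shape(text):
    """Return a list of glyph names for text, preferring the longest ligature."""
    glyphs = []
    max_len = max(len(seq) for seq in LIGATURES)
    i = 0
    while i < len(text):
        # Try the longest candidate sequence first, down to length 2.
        for n in range(max_len, 1, -1):
            seq = tuple(text[i:i + n])
            if len(seq) == n and seq in LIGATURES:
                glyphs.append(LIGATURES[seq])
                i += n
                break
        else:
            # No ligature starts here: fall back to the one-to-one glyph.
            glyphs.append(default_glyph(text[i]))
            i += 1
    return glyphs


text = "office"
glyphs = shape(text)
# The stored text is still the six characters of "office"; only the glyph
# list contains the ligature.
```

    Searching, sorting or spell-checking the string never sees the ligature, because it exists only in the glyph list.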

    With OpenType, you can safely represent all the ligatures needed to write the Brahmic scripts, and you can even render many Han ideographic characters by combining glyphs for strokes, or create good-looking layouts for many combinations of a base character and one or more diacritics.

    Of course this does not apply to the symbols for which you designed some glyphs, but those have nothing to do with information interchange needs, where text must be transmitted and used independently of its exact final rendering.

    You are not reasoning in terms of abstract characters: you don't define semantics, only glyphs, and no codepoint standard is needed for those, since the fonts you create are mostly for artistic purposes. They also lack another graphic/artistic feature that you cannot encode in a font: color information. Color has its own processable semantics, but it is only a concrete representation of what abstract characters represent in Unicode, and it is very different from the artistic design you want to implement in a font.

    I don't think that font technology is what you should use. There are better alternatives for representing graphic designs, notably bitmap or vector image formats (such as Windows Metafiles, PostScript instructions, or other similar formats), which are easier to integrate and reuse than fonts with custom assignments and no color information.

    Integrating a vector design into a font becomes useful only when there is a need to represent abstract text (in that case one can easily reuse the glyphs to produce code charts, even if no font currently includes them). The reverse should not be done: don't assign codepoints to what is really a sequence of unrelated glyphs with no semantics other than their graphic design. The PUA was created for another purpose: as a way to perform internal conversion steps from abstract information to a final rendering or final treatment of the encoded information (where some partial transcription is needed that mixes intermediate codes with original, unmodified codes), but certainly not for abstract information interchange.
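
    Since PUA codepoints carry no standard semantics, a process that receives text may want to detect them before treating that text as interchangeable. The following Python sketch does so under stated assumptions: the three ranges are the Private Use ranges defined by the Unicode Standard, while the names PUA_RANGES, is_private_use() and find_private_use() are hypothetical helpers made up for this example.

```python
# Private Use ranges defined by the Unicode Standard.
PUA_RANGES = (
    (0xE000, 0xF8FF),      # Private Use Area in the BMP
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
)


def is_private_use(ch):
    """Return True if the single character ch lies in a Private Use range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def find_private_use(text):
    """List (index, 'U+XXXX') for every private-use character in text."""
    return [
        (i, "U+%04X" % ord(ch))
        for i, ch in enumerate(text)
        if is_private_use(ch)
    ]


# U+E2F6 is one of the privately defined codes mentioned in the quoted
# message; nothing in the text itself says what it is supposed to mean.
sample = "A\uE2F6B"
hits = find_private_use(sample)
```

    The check can only report that such codes are present. Their interpretation depends on a private agreement between sender and receiver, which is exactly why they are unsuitable for general interchange.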

    Unicode will standardize only abstract characters as semantic entities, and will offer only some (non-mandatory) recommendations regarding their concrete rendering, as a guide (or correct example) to avoid creating confusion about the semantics of the rendered text.

    With Unicode, the graphic or artistic design is NOT encoded. I hope it never will be, or it would break the whole edifice of the separation between (1) modern font designs, layout renderers and styles, and (2) text as a collection of abstract elements needed to represent semantics, in a way that can be reused with the same semantics independently of its initial creation context and independently of technical rendering constraints where style is not relevant or not available.

    -- Philippe.
    ----- Original Message -----
    From: "William Overington" <WOverington@ngo.globalnet.co.uk>
    To: <unicode@unicode.org>
    Cc: <archive@ngo.globalnet.co.uk>
    Sent: Saturday, May 24, 2003 12:53 PM
    Subject: Re: Is it true that Unicode is insufficient for Oriental languages?

    > Michael Kaplan wrote as follows.
    > quote
    > I'll take some of that action, too. Not since W.O. have we had someone
    > around who has been so insistent that Unicode is missing the requirements of
    > its users, without really understanding what The "Unicode way" is....
    > end quote
    > Well, as I remember it, when I put forward some of my ideas in this forum, a
    > well-known and much-respected linguist referred me to an ISO document about
    > characters which document defined character as in an appendix of the same
    > document and that I then found that my ideas were entirely in accordance
    > with that definition.
    > What is the Unicode way?
    > Is it a road? Is there an intersection with Antitrust Avenue and is the
    > barrelhead on the pavement near "The Restriction of Progress"?
    > However, as the matter of matrices has been raised and I have been mentioned
    > by Mr Kaplan, then I feel that it is only fair that I should be allowed the
    > opportunity to mention a Private Use Area solution which I devised some time
    > ago and published on the web.
    > In relation to matrices, are codes similar to the following what are needed?
    > U+E2F6, U+E2F7, U+E2F8 as defined in the following document.
    > http://www.users.globalnet.co.uk/~ngo/ast07101.htm
    > These are my own definitions in the Private Use Area.
    > The following pages might also be of interest.
    > The first two are about eutocode graphics. The second needs a Java enabled
    > browser.
    > http://www.users.globalnet.co.uk/~ngo/ast03000.htm
    > http://www.users.globalnet.co.uk/~ngo/eutocodegraphics.htm
    > http://www.users.globalnet.co.uk/~ngo/ast00000.htm
    > http://www.users.globalnet.co.uk/~ngo
    > Yesterday afternoon I updated the version of the Quest text font which is
    > available at the following web page.
    > http://www.users.globalnet.co.uk/~ngo/font7007.htm
    > This version includes visible glyphs for 28 code points in the Private Use
    > Area to do with expressing a sequential multimedia display in a Unicode
    > plain text file, such as when using the Microsoft WordPad program to author
    > a Unicode text file to customize a Java program which will produce a display
    > on an interactive television. That is, the code points display as a symbol
    > on the screen of a PC during content authorship yet would be acted upon to
    > change colour, wait for a button push and so on, when acted upon by a Java
    > program written for the purpose. I am hoping that a Java program suitable
    > for broadcasting will become available.
    > I am hoping to add some further symbols for a more advanced multimedia
    > system which has a programmed learning capability. However, the four Object
    > Replacement Character Synonym glyphs are all available in the present 1.04
    > version of the Quest text font.
    > The place to discuss these matters is the Digital Television Interactive
    > Broadcasting forum which is run from the http://www.cenelec.org webspace.
    > These codes will hopefully open up content authorship for the digital
    > interactive broadcasting platform so that people may prepare content in
    > Unicode plain text files on PCs and send them in so that they can be
    > broadcast and used to customize a generic software package which produces a
    > multimedia display by interpreting the text and the Private Use Area codes
    > in the Unicode plain text customizing file.
    > The Private Use Area is entirely suitable for the needs of my research and
    > for its application. Certainly, if the Unicode Consortium decides to encode
    > such features into regular Unicode then it has the opportunity to do so and
    > such items could be codified. However, if that is not what the Unicode
    > Consortium decides to do, then that is a matter for it, the Private Use Area
    > is available and much can be achieved by using it. The possibility of
    > publishing the code points for the eutocode system in a book exists and
    > indeed if the Unicode Consortium really wishes to decline the chance to
    > include such codes with the emphasis of the people who are keen to accept a
    > bet, then the opportunity for publishing a private specification for such
    > features is available. My view is that it is better to just get something
    > going using the Private Use Area and not to spend time on trying to persuade
    > the Unicode Consortium to consider such technology.
    > In relation to the Private Use Area, it was said that there would be a
    > change in the wording of the Unicode Standard and I am looking forward to
    > finding out exactly what the new wording is to be!
    > William Overington
    > 24 May 2003
