Re: Query on Unicode equivalent of iso-pub

From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Mon Apr 21 2003 - 07:56:24 EDT

    >Hi All,
    >
    >Can anyone please let me know why there is no
    >equivalent Unicode value for the character &fjlig (small f j ligature) of
    >iso-pub character set?
    >
    >Regards,
    >Sourav
    >

    As far as I know this is because there is a desire to avoid using ligature
    code points in stored documents, as they would cause problems for spell
    checking software, for software which searches for sequences of characters
    and for software which places words into dictionary order.

    Suppose, for the sake of argument, that the fj ligature were encoded in
    Unicode at some value such as U+FBXY, where X and Y are each a hexadecimal
    digit. I include the U+FB.. part because seven such ligatures, for ff, fi,
    fl, ffi, ffl, long s t and st, are included in that block at U+FB00 to
    U+FB06, yet their use is discouraged for the same reasons that fj is not
    included. Why are some ligatures included yet not others? It appears to be
    a matter of historical legacy: some were included, then a decision was made
    not to include any more. This matter of ligature characters has recently
    been discussed again, because I had raised the matter and had asked for it
    to be considered, and a decision was made by the Unicode Technical
    Committee.

    http://www.unicode.org/consortium/utc-minutes/UTC-092-200208.html

    Normally the sequence fj is U+0066 U+006A, so a word such as fjord is
    encoded as U+0066 U+006A U+006F U+0072 U+0064, using five Unicode
    characters. If U+FBXY were used for an fj ligature, then the word fjord
    could also be stored as U+FBXY U+006F U+0072 U+0064, using four Unicode
    characters. So searching for a word like fjord in a document would need a
    search for both forms.
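
    The searching problem can be sketched in Python. Since U+FBXY is only
    hypothetical, the sketch uses the fi ligature U+FB01, which really is
    encoded, as a stand-in; the variable names are just for illustration. A
    hypothetical fj ligature would have no decomposition of this kind, so the
    sketch shows the general problem rather than a remedy for fj.

```python
import unicodedata

# The plain spelling: five code points, as listed in the message.
plain = "fjord"
plain_code_points = [f"U+{ord(ch):04X}" for ch in plain]

# U+FB01 LATIN SMALL LIGATURE FI is one of the seven ligatures that are
# already encoded; it stands in here for the hypothetical U+FBXY.
stored = "\ufb01nd"  # "find" stored with the fi ligature, three code points

# A naive search fails because the ligature hides the letters f and i.
naive_hit = "find" in stored

# NFKC normalization replaces U+FB01 by its compatibility decomposition "fi",
# so a search on the normalized text succeeds.
folded = unicodedata.normalize("NFKC", stored)
folded_hit = "find" in folded

print(plain_code_points)
print(naive_hit, folded_hit)
```

    Searching software therefore has to either look for both forms or fold
    the text to one form before comparing.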

    Suppose, however, hypothetically, that one is transcribing an old printed
    book into a computer system, that in some places individual f and j type
    sorts have been used and in other places an fj ligature has been used, and
    that one wishes to preserve in the computer transcription the information
    about how the original book was printed. I have no knowledge as to whether
    such a book with a mixed way of printing fj exists; I am simply suggesting
    a scenario for a thought experiment about encoding.

    In that circumstance one could use U+0066 U+200D U+006A to explicitly encode
    an fj ligature. The U+200D character is the ZERO WIDTH JOINER.
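
    A minimal sketch of such a transcription in Python follows. The helper
    name strip_zwj is hypothetical. It shows that the joiner records the
    printing information while remaining easy to ignore when searching.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# One occurrence printed with the ligature, one with separate f and j sorts.
transcription = "the f" + ZWJ + "jord and the fjord"


def strip_zwj(text):
    """Return the text with every ZERO WIDTH JOINER removed."""
    return text.replace(ZWJ, "")


# A plain search only sees the occurrence without the joiner.
plain_count = transcription.count("fjord")

# After removing the joiner, both occurrences are found.
folded_count = strip_zwj(transcription).count("fjord")

# The ligature information is still recoverable from the stored text.
ligature_count = transcription.count("f" + ZWJ + "j")

print(plain_count, folded_count, ligature_count)
```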

    One then has issues with displaying such text correctly. There are two
    approaches. One could use a platform which supports an advanced font
    format such as OpenType, together with a font which recognizes U+0066
    U+200D U+006A as an fj ligature yet does not turn plain U+0066 U+006A into
    an fj ligature. Alternatively, one could preprocess the incoming text
    stream from the filing system so that a sequence such as U+0066 U+200D
    U+006A is converted, for purposes of font access only, into a Private Use
    Area character which accesses a glyph for an fj ligature. For example, the
    eutocode typography file format mentioned in the following web page could
    be used for such a purpose.

    http://www.users.globalnet.co.uk/~ngo/ast03300.htm

    One would need a font which supports such a Private Use Area character.
    However, one would not be obliged to use one of those computer systems which
    can support advanced format fonts.
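
    The preprocessing approach might look something like the following Python
    sketch. It is not the eutocode format itself, only an illustration of the
    idea. The function names and the mapping table are hypothetical, and the
    code point U+E70B is the one used for fj in the golden ligatures
    collection mentioned later in this message; any other Private Use Area
    code point supported by the font would do.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Hypothetical table: stored sequence -> Private Use Area character.
# U+E70B is an assumption taken from one private agreement, not from Unicode.
LIGATURE_TO_PUA = {
    "f" + ZWJ + "j": "\ue70b",
}


def for_font_access(text):
    """Convert explicit ligature sequences to PUA characters for display."""
    # Longest sequences first, in case the table is later extended.
    for sequence in sorted(LIGATURE_TO_PUA, key=len, reverse=True):
        text = text.replace(sequence, LIGATURE_TO_PUA[sequence])
    return text


def for_storage(text):
    """Convert PUA ligature characters back to the stored sequences."""
    for sequence, pua in LIGATURE_TO_PUA.items():
        text = text.replace(pua, sequence)
    return text


stored = "f" + ZWJ + "jord and fjord"
display = for_font_access(stored)
print([f"U+{ord(ch):04X}" for ch in display[:4]])
```

    Only the copy sent to the font contains the Private Use Area character;
    the stored document keeps the f, joiner, j sequence, and a plain fj is
    passed through unchanged.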

    Suppose, however, in a different scenario, that one does not wish to
    produce an archive or a scholarly record of an old printed book, but simply
    to produce a poster for an entertainment called "Music of the fjords", at
    which a band from Norway is to play their music. One simply wishes to key
    characters and produce a printed poster which looks stylish.

    Supposing that one has a desktop publishing package which can accept Unicode
    characters, a Unicode compatible way to achieve the result would be to use a
    Private Use Area character for the fj ligature, together with a font which
    implements that ligature within the Private Use Area.

    I know that the Code2000 font produced by James Kass has an fj ligature
    encoded within the Private Use Area. There may be other fonts which include
    an fj ligature encoded within the Private Use Area. There may also be
    non-Unicode fonts which include an fj ligature encoded somewhere between
    hexadecimal 00 and hexadecimal FF, probably somewhere in the range
    hexadecimal 80 to hexadecimal FF.

    Some time ago I produced a collection of Private Use Area encodings for
    ligatures, introduced and indexed from the following web page. I called
    this the golden ligatures collection.

    http://www.users.globalnet.co.uk/~ngo/golden.htm

    The following web page has an fj ligature at U+E70B.

    http://www.users.globalnet.co.uk/~ngo/ligature.htm

    The following web page has an ffj ligature at U+E773.

    http://www.users.globalnet.co.uk/~ngo/ligatur2.htm

    I emphasise that the use of these particular code points for these ligature
    characters is not part of the Unicode Standard and that the use is not an
    exclusive use of those code points. However, the collection is a consistent
    set and if people making fonts which include an fj ligature glyph choose to
    have that ligature glyph accessible as character U+E70B then that is a
    choice which is open to them to make, though they are perfectly entitled to
    make a different choice if they so choose.
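
    As a small check in Python, one can confirm that the two code points
    mentioned above lie in the Private Use Area, which in the Basic
    Multilingual Plane runs from U+E000 to U+F8FF. The function name is
    hypothetical; the check itself relies on the general category "Co"
    which Unicode gives to private use code points.

```python
import unicodedata


def is_private_use(ch):
    """True if the character has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"


FJ_GOLDEN = "\ue70b"   # fj in the golden ligatures collection
FFJ_GOLDEN = "\ue773"  # ffj in the golden ligatures collection

print(is_private_use(FJ_GOLDEN), is_private_use(FFJ_GOLDEN))

# By contrast, the encoded fi ligature U+FB01 is an ordinary lowercase letter.
print(unicodedata.category("\ufb01"))
```

    Software cannot learn from the Unicode character data what a Private Use
    Area character means; that knowledge has to come from the private
    agreement and from the font.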

    Such a choice need not be made only for fonts where the intended use is
    for an fj ligature glyph to be accessed directly as a Private Use Area
    character. Someone producing an advanced format font, such as an OpenType
    font, and including an fj ligature glyph within it for access using a
    character sequence such as U+0066 U+006A or U+0066 U+200D U+006A could, if
    he or she so chooses, also map the glyph to a Private Use Area character
    as a secondary matter, so that people with equipment which cannot access
    the glyph using a sequence of characters can nonetheless display it using
    a Private Use Area code point.

    The character U+200C ZERO WIDTH NON-JOINER is mentioned for completeness,
    as it can be used to request that a ligature not be formed.

    On the matter of whether an OpenType font would process a sequence to
    produce an fj ligature, there was some discussion in this group a while ago
    about the possibility of a convention as to how an OpenType font might or
    might not process a sequence of characters into a ligature glyph, if the
    font actually contained the ligature glyph within it, depending upon the
    wishes of the author of the document which is to be displayed.

    John Hudson made a specific suggestion in the thread "Proposal: Ligatures w/
    ZWJ in OpenType" of Saturday 6 July 2002. It would be interesting to know
    whether Mr Hudson's suggestion has been widely taken up. The suggestion is
    a comprehensive opportunity to allow a document to specify any one of the
    following to an OpenType font, for each potential ligature occurrence.

    1. Use a ligature.

    2. Do not use a ligature.

    3. Use a ligature at your discretion.

    Mr Hudson uses fj and ffj as two of the examples within his document.
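
    The three choices can be sketched in Python from the document's side.
    This is not a reproduction of Mr Hudson's proposal, whose OpenType details
    are not given here. It assumes that the choices correspond to a ZERO WIDTH
    JOINER, a ZERO WIDTH NON-JOINER, or nothing between the letters, and all
    of the names in the sketch are hypothetical.

```python
import re
from enum import Enum

ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


class LigatureRequest(Enum):
    USE = "use a ligature"
    AVOID = "do not use a ligature"
    DISCRETION = "use a ligature at your discretion"


# Assumed correspondence between what sits between f and j and the request.
REQUEST_FOR = {
    ZWJ: LigatureRequest.USE,
    ZWNJ: LigatureRequest.AVOID,
    "": LigatureRequest.DISCRETION,
}

# An f, then an optional joiner or non-joiner, then a j.
FJ_PATTERN = re.compile("f([" + ZWJ + ZWNJ + "]?)j")


def fj_requests(text):
    """List the request made at each potential fj ligature, in order."""
    return [REQUEST_FOR[m.group(1)] for m in FJ_PATTERN.finditer(text)]


sample = "f" + ZWJ + "jord, fjord, f" + ZWNJ + "jord"
for request in fj_requests(sample):
    print(request.value)
```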

    My own font making adventures do not extend to OpenType at this time; I am
    using the Softy shareware program to get started in font making.

    Some of my initial fonts are available at the following web page.

    http://www.users.globalnet.co.uk/~ngo/font7001.htm

    Those are all small specialist fonts. My work on a Unicode compatible
    letters font, Quest text, is proceeding. I have now produced all of the 26
    uppercase and the 26 lowercase characters, the digits and some punctuation
    characters, so that the font displays without any "not defined" glyphs
    appearing on the page when the basic PC font viewer is used to display a
    font synopsis. In addition I have produced AE, thorn and eth in both
    uppercase and lowercase, a lowercase long s and a character for U+FFFD.
    I am hoping to add the yogh character, U+021C for uppercase and U+021D for
    lowercase, when I have learned more about how it should line up with
    respect to other uppercase and lowercase letters. Quest text is intended
    to be a font which has some of the rarer
    characters which are often not supported in Unicode compatible fonts other
    than in general full Unicode fonts, such as characters for Old English and
    for Esperanto. I am hoping to add Private Use Area characters for various
    ligatures, including the fj and ffj ligatures, mapped as in the golden
    ligatures collection above. Quest text is thus a font intended for niche
    uses, though with the flexibility to have a few extra characters added
    whenever the need arises. However, it could be used as a display font for
    ordinary English text as it has a distinctive look which might be quite
    useful in some circumstances, such as providing signs which combine style
    with clarity. I am hoping to be able to learn how to make OpenType fonts in
    due course.

    William Overington

    21 April 2003


