Re: Greek curled beta in Unicode code chart

From: Peter Kirk
Date: Sun Jul 03 2005 - 13:02:20 CDT

    On 03/07/2005 13:06, David Perry wrote:

    > ...
    >More important, these characters should be used with great caution, if at
    >all. Consider the fact that if a user searches a document for the word
    >"biblion" and types a standard beta at both positions, he won't find the
    >word if it is encoded using the "curly" beta in the middle. ...

    But he or she would find the word if the search is based on the
    Unicode Collation Algorithm and the two betas are treated as equal at
    the primary (top) level, as they certainly should be (though I haven't
    checked this). So the issue with using a different character is not as
    serious as you suggest.
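
    As a rough illustration of the point (not the Unicode Collation
    Algorithm itself, which Python's standard library does not implement),
    the sketch below approximates a primary-level comparison with
    compatibility normalization, removal of combining marks, and case
    folding. The names search_key and contains are made up for this example.

```python
import unicodedata


def search_key(text: str) -> str:
    """Fold text to a crude search key (a stand-in for UCA primary weights)."""
    # NFKD applies compatibility decompositions, so U+03D0 GREEK BETA SYMBOL
    # becomes U+03B2 GREEK SMALL LETTER BETA, and accented letters split
    # into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks (accents, breathings); the UCA treats these as
    # lower-level differences.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case folding removes case differences.
    return base.casefold()


def contains(document: str, query: str) -> bool:
    """Return True if query occurs in document after folding both."""
    return search_key(query) in search_key(document)


# "biblion" typed with two standard betas.
standard = "\u03b2\u03b9\u03b2\u03bb\u03af\u03bf\u03bd"
# The same word encoded with the curled beta (U+03D0) in the middle.
curled = "\u03b2\u03b9\u03d0\u03bb\u03af\u03bf\u03bd"

print(standard in curled)          # plain code point search: False
print(contains(curled, standard))  # folded search: True
```

    A real implementation would more likely use a UCA library with the
    comparison strength set to primary; the folding above only shows why
    the two betas need not defeat a search.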

    The issue is of course very similar to that of final sigma, which is
    already clearly encoded as a separate character (although it could
    have been encoded as a positional variant), and which is in fact
    sometimes used in the middle of a word, e.g. at the end of a prefix (I
    have certainly seen this usage in 19th-century works). The correct
    approach here is for searches to treat the two forms of sigma as
    equivalent, rather than to expect the user to choose the correct form
    in the search box. And the same should happen for the two forms of
    beta, and of some other letters.
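
    One way a search box could treat such forms as equivalent is an
    explicit folding table applied to both the document and the query. The
    table below is a hypothetical, deliberately incomplete example, and the
    function names are invented for this sketch.

```python
# Hypothetical equivalence table: variant form -> ordinary letter.
VARIANT_FOLD = str.maketrans({
    "\u03c2": "\u03c3",  # final sigma  -> sigma
    "\u03d0": "\u03b2",  # curled beta  -> beta
    "\u03d1": "\u03b8",  # script theta -> theta
})


def fold_variants(text: str) -> str:
    """Replace each variant letter form with its ordinary counterpart."""
    return text.translate(VARIANT_FOLD)


def find_folded(document: str, query: str) -> int:
    """Return the index of query in document, ignoring variant forms.

    Every mapping is one code point to one code point, so an index into
    the folded text is also a valid index into the original text.
    Returns -1 if there is no match.
    """
    return fold_variants(document).find(fold_variants(query))


# "prosphero" written with the final-form sigma at the end of the prefix.
old_spelling = "\u03c0\u03c1\u03bf\u03c2\u03c6\u03ad\u03c1\u03c9"
# The same word as a user would normally type it.
typed_query = "\u03c0\u03c1\u03bf\u03c3\u03c6\u03ad\u03c1\u03c9"

document = "\u03ba\u03b1\u03b9 " + old_spelling
print(document.find(typed_query))          # exact search fails: -1
print(find_folded(document, typed_query))  # folded search succeeds: 4
```

    Because the user never has to pick the "correct" form, the same query
    finds both the medial final-form sigma and the ordinary spelling.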


    >The best way to get alternate letter shapes is to use advanced font
    >technologies such as AAT or OpenType that allow the display of alternate
    >glyphs without modifying the underlying Unicode values.
    >I know that support for these technologies is still limited, but it is
    >improving and will probably be more widespread with the next release of
    >Windows, which will support OT features at the system level. ...

    An alternative which could be considered is to encode the special
    variant form as a variation sequence. Such sequences are already
    supported by OpenType fonts in an application-independent way. And the
    searching problem disappears if the variation selector is ignored, as
    it should be in all searches. But I doubt that the UTC would accept a
    variation sequence for something which is already encoded as a
    character.
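
    To show what "ignoring the variation selector" could look like in a
    search, here is a minimal sketch. It assumes a purely hypothetical
    sequence of beta followed by VARIATION SELECTOR-1 (U+FE00); no such
    sequence is registered, and the function name is invented.

```python
import re

# Variation selectors: VS1..VS16 (U+FE00..U+FE0F) and the supplementary
# range VS17..VS256 (U+E0100..U+E01EF).
VARIATION_SELECTORS = re.compile("[\ufe00-\ufe0f\U000e0100-\U000e01ef]")


def strip_variation_selectors(text: str) -> str:
    """Remove variation selectors so they cannot affect matching."""
    return VARIATION_SELECTORS.sub("", text)


# "biblion" as typed in a search box.
query = "\u03b2\u03b9\u03b2\u03bb\u03af\u03bf\u03bd"
# The same word with a hypothetical <beta, VS1> sequence for the curled form.
document = "\u03b2\u03b9\u03b2\ufe00\u03bb\u03af\u03bf\u03bd"

print(query in document)                             # False
print(query in strip_variation_selectors(document))  # True
```

    The base letter stays an ordinary beta, so once the selector is
    dropped the text is identical to what the user typed.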

    Peter Kirk

    This archive was generated by hypermail 2.1.5 : Sun Jul 03 2005 - 13:07:54 CDT