Re: Combining diacriticals and Cyrillic

From: Philippe Verdy
Date: Wed Jul 16 2003 - 04:45:14 EDT

    On Wednesday, July 16, 2003 8:55 AM, William Overington wrote:

    > Peter Constable wrote as follows.
    > > William Overington wrote on 07/15/2003 07:22:22 AM:
    > >
    > > > No, the Private Use Area codes would not be used for interchange,
    > > > only locally for producing an elegant display in such
    > > > applications as chose to use them. Other applications could
    > > > ignore their existence.
    > >
    > > Then why do you persist in public discussion of suggested
    > > codepoints for such purposes? If it is for local, proprietary use
    > > internal to some implementation, then the only one who needs to
    > > know, think or care about these codepoints is the person creating
    > > that implementation.
    > The original enquiry sought advice about how to proceed. I posted
    > some ideas of a possible way to proceed. If the idea of using a
    > eutocode typography file is taken up and software which uses it is
    > produced, then it would be reasonable to have a published list of
    > Private Use Area code points for the precomposed characters which are
    > to be available, as in that way the output stream from the processing
    > could be viewed with a number of fonts from a variety of font makers
    > without needing to change the eutocode typography file if one changed
    > font.
    > I have not published many of my suggested code points in this forum
    > precisely because a few people do not want them published here. For
    > example, there is the ViOS-like system for a three-dimensional visual
    > indexing system for use in interactive broadcasting.
    > > > Publishing a list of Private Use Area code points would
    > >
    > > have absolutely no purpose at all.
    > >
    > >
    > > > mean that such
    > > > display could be produced using a choice of fonts from various
    > > > font makers using the same software
    > >
    > > Now you are talking interchange. Interchange means more than just
    > > person A sends a document to person B. It means that person A's
    > > document works with person B's software using person C's font. (An
    > > alternate term that is often used, interoperate, makes this
    > > clearer.)
    > Exactly. This is why publishing the list of Private Use Area code
    > point assignments for the precomposed characters is a good idea.
    > Person B can display the document and then wonder if it might look
    > better with that font made by person D and have a try with that font.
    > If the list of Private Use Area code point assignments for the
    > precomposed characters has been published and both C and D have used
    > the list to add the extra Cyrillic characters into their fonts, then
    > the published list of Private Use Area code point assignments for the
    > precomposed characters has helped to achieve interoperability.
    > > > I feel that an important thing to remember is the dividing line
    > > > between what is in Unicode and what is in particular advanced
    > > > format font technology solutions
    > >
    > > And best practice for advanced format font technologies eschews PUA
    > > codepoints for glyph processing.
    > Who decides upon what is best practice?
    > > You've been told that several times by
    > > people who have expertise in advanced font technologies, an area in
    > > which you are not deeply knowledgeable or experienced, by your own
    > > admission.
    > Well, it is not a matter of an "admission" as if dragged out of me
    > under examination by counsel in a courtroom. I openly stated the
    > limits of my knowledge in that area, not as a retrospective defence
    > yet as an up-front expression of the limitation of my knowledge when
    > putting forward ideas, specifically so as not to produce any
    > incorrect impression as to expertise in that area.
    > > > yet they are not suitable for platforms such as Windows 95 and
    > > > Windows 98, whereas a eutocode typography file approach would be
    > > > suitable for those platforms and for various other platforms.
    > >
    > > Wm, if someone wanted, they could create an advanced font
    > > technology to work on DOS, but why bother? Who's going to create
    > > all the new software that works with that technology, and make it
    > > to work within the limitations of a DOS system?
    > Yet I am not suggesting a system to work on DOS.
    > > Your idea is at best a mental exercise, and even if you or
    > > someone else built an implementation, what is not needed is some
    > > public agreement on PUA codepoints for use in glyph processing.
    > When you say "agreement" I am not suggesting agreement in some formal
    > manner. It is more like the authorship of a story where people may
    > read it or not as they choose. Yet if people do read the story, or
    > watch a television or movie implementation of it, a common culture
    > may come to exist amongst the readers which can be applied in other
    > circumstances.
    > For example, "it's as if on a holodeck and a character says 'arch'
    > and ...." is something which people who have watched Star Trek The
    > Next Generation may use as a cultural way of expressing something.
    > The original enquiry referred as if a number of people are trying to
    > solve the problem. If a list of the characters is published with
    > Private Use Area code points from U+EF00 upwards, then they could
    > all, if they so choose, use that set of code points and it might help
    > in font interoperability, certainly if they choose to implement a
    > eutocode typography file system and maybe in some other
    > implementations. I suggested U+EF00 specifically so that if Vladimir
    > and his colleagues take up my suggestion then the characters will be
    > well placed for compatibility with my suggestions regarding
    > interactive broadcasting.

    For the case of TrueType fonts, this is not needed if they are
    migrated to use OpenType table extensions. Each font then defines its
    glyph substitution rules locally, and there is no need for such an
    encoding.
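
To illustrate the point about font-local substitution, here is a minimal toy model in Python. It is not the OpenType API or its table format; the names `CMAP`, `LIGATURES` and `shape`, and all glyph IDs, are assumptions made up for this sketch. It only shows that a precomposed glyph can be reached through a glyph ID private to the font, without assigning it any code point.

```python
from typing import Dict, List, Tuple

# Hypothetical glyph IDs, private to one font. They are NOT code points.
CMAP: Dict[str, int] = {
    "\u0430": 10,  # CYRILLIC SMALL LETTER A
    "\u0301": 20,  # COMBINING ACUTE ACCENT
}

# Hypothetical font-local rule: (a, acute) -> one precomposed glyph.
LIGATURES: Dict[Tuple[int, ...], int] = {(10, 20): 300}


def shape(text: str) -> List[int]:
    """Map characters to glyph IDs, then apply the font's own pair rules."""
    glyphs = [CMAP[ch] for ch in text]
    out: List[int] = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])  # substitute the pair by one glyph
            i += 2
        else:
            out.append(glyphs[i])        # no rule applies, keep the glyph
            i += 1
    return out
```

In this sketch the text stream keeps only standard characters; glyph 300 exists solely inside the font.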

    The eutocode approach would only be useful for font formats that
    cannot index glyphs locally by anything other than a single code
    that serves as both the character code point and the glyph ID. But
    why not add a supplementary description file to such a font to list
    these substitutions? Why do you want all fonts to use the same glyph
    substitution rules, given that one font may list ligatures or
    alternate forms which will not be valid (or simply not needed at
    all for precomposed letters) in another font style?
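
A per-font supplementary description file could look something like the following sketch. The file syntax (`hex hex -> glyph_name`, with `#` comments), the glyph names, and the function `parse_substitutions` are all hypothetical and invented here for illustration; no existing font format is implied.

```python
from typing import Dict, Tuple


def parse_substitutions(text: str) -> Dict[Tuple[int, ...], str]:
    """Parse lines such as '0430 0301 -> a_acute' into a rule table."""
    rules: Dict[Tuple[int, ...], str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        left, right = line.split("->")
        sequence = tuple(int(token, 16) for token in left.split())
        rules[sequence] = right.strip()
    return rules


# Hypothetical sidecar file shipped with one font style.
SERIF_RULES = """
# substitutions for a hypothetical serif font
0430 0301 -> a_acute
0066 0069 -> f_i      # a ligature only this style provides
"""

# Hypothetical sidecar file for another style, with a shorter list.
SANS_RULES = """
0430 0301 -> a_acute
"""
```

Each font carries its own list, so a ligature that makes sense in one style need not be declared for another.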

    In my view, PUA code points are to be used locally and must not be
    agreed upon across vendors. Their assignment must be part of a local
    software installation, and it must work with user-defined characters
    (important for Asian users who create their own ideographs, usable
    in embeddable fonts, so that these fonts do not cause
    interoperability problems).
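
As a small aside, software that wants to treat PUA characters as purely local can detect them by their general category. The sketch below uses Python's standard `unicodedata` module; the helper name `is_private_use` is an assumption of this example.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """Return True if the character has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"
```

For instance, `is_private_use("\uEF00")` is true, since U+EF00 lies in the BMP Private Use Area (U+E000 to U+F8FF), while `is_private_use("\u0430")` is false.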

    Working draft proposals for Unicode use PUA code points for the
    demonstration fonts embedded in PDF documents. This is still valid,
    as this usage is purely local to the document or to the user who
    defined the custom font (often limited because it is not hinted for
    all point sizes).

    The usage of PUA is also valid for Web-embedded fonts (the user
    browsing such web pages should be informed if the web-embedding
    format is not supported by their browser, or if it requires an
    add-on component): here also the usage is private to the restricted
    domain in which the embedding format has been defined. The same
    would also be true for embedded fonts in Word documents, where this
    usage of PUA in private fonts is valid. Consider this "interchange"
    of PUA characters as a demonstration for a strictly limited context.

    Perhaps the W3C could consider defining ways to also transport
    private-use normalization rules/tables for these PUA fonts, if
    character processing is needed. I see this as a tailoring of the
    Unicode algorithms, useful in limited contexts where the default
    standard tables are not enough to handle PUA characters
    correctly, due to the absence of accurate character properties
    for these PUA code points.
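
One way to picture such a tailoring is a private table layered on top of standard normalization. The sketch below is only an assumption of how it might work: the table `PRIVATE_COMPOSE`, the choice of U+EF00 for "Cyrillic a with acute", and the functions `to_private` and `to_standard` are hypothetical. Only `unicodedata.normalize` is a real library call.

```python
import unicodedata

# Hypothetical private table: (base, combining mark) -> PUA code point.
PRIVATE_COMPOSE = {("\u0430", "\u0301"): "\uEF00"}
PRIVATE_DECOMPOSE = {pua: "".join(seq) for seq, pua in PRIVATE_COMPOSE.items()}


def to_private(text: str) -> str:
    """Apply standard NFC, then the private compositions (local use only)."""
    text = unicodedata.normalize("NFC", text)
    out = []
    i = 0
    while i < len(text):
        pair = tuple(text[i:i + 2])
        if pair in PRIVATE_COMPOSE:
            out.append(PRIVATE_COMPOSE[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def to_standard(text: str) -> str:
    """Expand private characters back to standard sequences, then NFC."""
    expanded = "".join(PRIVATE_DECOMPOSE.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFC", expanded)
```

A document would call `to_standard` before leaving the private context, so that no PUA character escapes without the table that gives it meaning.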
