Re: Combining diacriticals and Cyrillic

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Jul 16 2003 - 04:45:14 EDT


    On Wednesday, July 16, 2003 8:55 AM, William Overington <WOverington@ngo.globalnet.co.uk> wrote:

    > Peter Constable wrote as follows.
    >
    > > William Overington wrote on 07/15/2003 07:22:22 AM:
    > >
    > > > No, the Private Use Area codes would not be used for interchange,
    > > > only locally for producing an elegant display in such
    > > > applications as chose to use them. Other applications could
    > > > ignore their existence.
    > >
    > > Then why do you persist in public discussion of suggested
    > > codepoints for such purposes? If it is for local, proprietary use
    > > internal to some implementation, then the only one who needs to
    > > know, think or care about these codepoints is the person creating
    > > that implementation.
    >
    > The original enquiry sought advice about how to proceed. I posted
    > some ideas of a possible way to proceed. If the idea of using a
    > eutocode typography file is taken up and software which uses it is
    > produced, then it would be reasonable to have a published list of
    > Private Use Area code points for the precomposed characters which are
    > to be available, as in that way the output stream from the processing
    > could be viewed with a number of fonts from a variety of font makers
    > without needing to change the eutocode typography file if one changed
    > font.
    >
    > I have not published many of my suggested code points in this forum
    > precisely because a few people do not want them published here. For
    > example, there is the ViOS-like system for a three-dimensional visual
    > indexing system for use in interactive broadcasting.
    >
    > > > Publishing a list of Private Use Area code points would
    > >
    > > have absolutely no purpose at all.
    > >
    > >
    > > > mean that such
    > > > display could be produced using a choice of fonts from various
    > > > font makers using the same software
    > >
    > > Now you are talking interchange. Interchange means more than just
    > > person A sends a document to person B. It means that person A's
    > > document works with person B's software using person C's font. (An
    > > alternate term that is often used, interoperate, makes this
    > > clearer.)
    >
    > Exactly. This is why publishing the list of Private Use Area code
    > point assignments for the precomposed characters is a good idea.
    > Person B can display the document and then wonder if it might look
    > better with that font made by person D and have a try with that font.
    > If the list of Private Use Area code point assignments for the
    > precomposed characters has been published and both C and D have used
    > the list to add the extra Cyrillic characters into their fonts, then
    > the published list of Private Use Area code point assignments for the
    > precomposed characters has helped to achieve interoperability.
    >
    > > > I feel that an important thing to remember is the dividing line
    > > > between what is in Unicode and what is in particular advanced
    > > > format font technology solutions
    > >
    > > And best practice for advanced format font technologies eschews PUA
    > > codepoints for glyph processing.
    >
    > Who decides upon what is best practice?
    >
    > > You've been told that several times by
    > > people who have expertise in advanced font technologies, an area in
    > > which you are not deeply knowledgeable or experienced, by your own
    > > admission.
    >
    > Well, it is not a matter of an "admission" as if dragged out of me
    > under examination by counsel in a courtroom. I openly stated the
    > limits of my knowledge in that area, not as a retrospective defence
    > yet as an up-front expression of the limitation of my knowledge when
    > putting forward ideas, specifically so as not to produce any
    > incorrect impression as to expertise in that area.
    >
    > > > yet they are not suitable for platforms such as Windows 95 and
    > > > Windows 98, whereas a eutocode typography file approach would be
    > > > suitable for those platforms and for various other platforms.
    > >
    > > Wm, if someone wanted, they could create an advanced font
    > > technology to work on DOS, but why bother? Who's going to create
    > > all the new software that works with that technology, and make it
    > > work within the limitations of a DOS system?
    >
    > Yet I am not suggesting a system to work on DOS.
    >
    > > Your idea is at best a mental exercise, and even if you or
    > > someone else built an implementation, what is not needed is some
    > > public agreement on PUA codepoints for use in glyph processing.
    >
    > When you say "agreement" I am not suggesting agreement in some formal
    > manner. It is more like the authorship of a story where people may
    > read it or not as they choose. Yet if people do read the story, or
    > watch a television or movie implementation of it, a common culture
    > may come to exist amongst the readers which can be applied in other
    > circumstances.
    >
    > For example, "it's as if on a holodeck and a character says 'arch'
    > and ...." is something which people who have watched Star Trek The
    > Next Generation may use as a cultural way of expressing something.
    >
    > The original enquiry read as if a number of people are trying to
    > solve the problem. If a list of the characters is published with
    > Private Use Area code points from U+EF00 upwards, then they could
    > all, if they so choose, use that set of code points and it might help
    > in font interoperability, certainly if they choose to implement a
    > eutocode typography file system and maybe in some other
    > implementations. I suggested U+EF00 specifically so that if Vladimir
    > and his colleagues take up my suggestion then the characters will be
    > well placed for compatibility with my suggestions regarding
    > interactive broadcasting.
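
    In practice such a published list would amount to no more than a
    shared mapping table from base letter plus combining mark to a PUA
    code point, applied just before display. A minimal sketch in Python,
    assuming purely hypothetical assignments from U+EF00 upwards (the
    exact characters and values are illustrative only):

        # Hypothetical published list: PUA code points from U+EF00 upwards
        # for precomposed Cyrillic letters with diacritics. None of these
        # values is agreed or registered anywhere; they merely echo the
        # quoted suggestion.
        PUA_PRECOMPOSED = {
            ("\u0430", "\u0301"): "\uEF00",   # Cyrillic a + combining acute
            ("\u0435", "\u0301"): "\uEF01",   # Cyrillic ie + combining acute
            ("\u043E", "\u0301"): "\uEF02",   # Cyrillic o + combining acute
        }

        def to_display_form(text):
            """Replace base + combining mark pairs with the listed PUA codes."""
            out, i = [], 0
            while i < len(text):
                pair = (text[i], text[i + 1]) if i + 1 < len(text) else None
                if pair in PUA_PRECOMPOSED:
                    out.append(PUA_PRECOMPOSED[pair])
                    i += 2
                else:
                    out.append(text[i])
                    i += 1
            return "".join(out)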

    For the case of TrueType fonts, this is not needed if they are
    migrated to use OpenType table extensions. Each font then defines
    its glyph substitution rules locally, and there is no need for such
    an encoding.
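
    As a rough sketch of what "locally defined" means here: with the
    fontTools library one can compile such a rule straight into a font's
    GSUB table. The font and glyph names below (SomeCyrillicFont.ttf,
    uni0430, uni0301, uni0430_uni0301) are placeholders and must match
    glyphs that actually exist in the font:

        from fontTools.ttLib import TTFont
        from fontTools.feaLib.builder import addOpenTypeFeaturesFromString

        # A 'ccmp' composition rule: base letter + combining acute is
        # replaced by one precomposed glyph, entirely inside this font.
        # No PUA code point is assigned anywhere at the character level.
        FEATURES = """
            languagesystem DFLT dflt;
            feature ccmp {
                sub uni0430 uni0301 by uni0430_uni0301;
            } ccmp;
        """

        font = TTFont("SomeCyrillicFont.ttf")          # placeholder path
        addOpenTypeFeaturesFromString(font, FEATURES)  # compiles a GSUB table
        font.save("SomeCyrillicFont-ccmp.ttf")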

    The eutocode approach would only be usable for font formats that do
    not allow glyphs to be indexed locally by anything other than a
    single code serving as both the character code point and the glyph
    ID. But why not add a supplementary description file alongside such
    a font to list these substitutions? Why should all fonts use the
    same glyph substitution rules, given that one font may list
    ligatures or alternate forms which will not be valid (or simply not
    needed at all for precomposed letters) with another font style?
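
    Such a description file could be as simple as a plain-text table
    shipped next to the font, and each font could ship a different one.
    A minimal sketch, assuming a hypothetical one-line-per-substitution
    format (nothing of this kind is standardized anywhere):

        # Hypothetical per-font description file, e.g. "SomeFont.subs":
        #
        #     0430 0301 -> EF00    ; a + combining acute -> precomposed slot
        #     0435 0301 -> EF01    ; ie + combining acute -> precomposed slot
        #
        # Each font could ship its own table, so no cross-vendor agreement
        # on the right-hand side values is ever required.

        def load_substitutions(path):
            """Parse the hypothetical substitution file into a lookup table."""
            table = {}
            with open(path, encoding="utf-8") as f:
                for line in f:
                    line = line.split(";", 1)[0].strip()   # strip comments
                    if not line:
                        continue
                    left, right = line.split("->")
                    base, mark = left.split()
                    key = (chr(int(base, 16)), chr(int(mark, 16)))
                    table[key] = chr(int(right, 16))
            return table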

    For me, PUA code points are to be used locally and must not be
    agreed across vendors. They must be part of a local software
    installation and must work with user-defined characters (important
    for Asian users who create their own ideographs, usable in
    embeddable fonts, so that these fonts do not cause interoperability
    problems).

    Working draft proposals for Unicode use the PUA for the
    demonstration fonts embedded in PDF documents. That usage is still
    valid because it is purely local to the document, or to the user
    who defined the custom font (often of limited quality because it is
    not hinted for all point sizes).

    The usage of the PUA is also valid for Web-embedded fonts (users
    browsing such web pages should be informed if the web-embedding
    format is not supported by their browser, or if it requires an
    add-on component): here too the usage is private to the restricted
    domains in which the embedding format has been defined. The same
    would also be true for embedded fonts in Word documents, where this
    usage of the PUA in private fonts is valid. Consider such
    "interchange" of PUA characters as a demonstration within a
    strictly limited context.

    Maybe the W3C could think about defining ways to also transport
    private-use normalization rules/tables for these PUA fonts, if
    character processing is needed. I see this as a tailoring of the
    Unicode algorithms, useful in limited contexts where the default
    standard tables are not enough to handle PUA characters correctly,
    due to the absence of accurate character properties for these PUA
    code points.
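
    As a sketch of what such a tailoring could look like, a private
    decomposition table can be layered on top of the standard algorithm;
    the Python below assumes the same hypothetical U+EF00 assignment
    used earlier, which is not an agreed value:

        import unicodedata

        # Private-use tailoring: decompositions for PUA code points that
        # the standard Unicode tables know nothing about.
        PUA_DECOMPOSITIONS = {
            "\uEF00": "\u0430\u0301",   # hypothetical: a-cy + combining acute
        }

        def tailored_nfd(text):
            """Standard NFD, then expand the locally defined PUA characters."""
            text = unicodedata.normalize("NFD", text)
            return "".join(PUA_DECOMPOSITIONS.get(ch, ch) for ch in text)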
