Re: FW: A product compatibility question

From: Mark Davis (mark@macchiato.com)
Date: Wed Oct 17 2001 - 10:45:27 EDT


below
—————

Δός μοι ποῦ στῶ, καὶ κινῶ τὴν γῆν — Ἀρχιμήδης
[http://www.macchiato.com]

----- Original Message -----
From: "Gary P. Grosso" <gpg@arbortext.com>
To: "Mark Davis" <mark@macchiato.com>; <unicode@unicode.org>
Sent: Wednesday, October 17, 2001 6:50 AM
Subject: Re: FW: A product compatibility question

> Hi Mark,
>
> You don't say what application you're running when you do this, but
> clearly it's not just an ascii editor. If you save/export a file
> in a unicode format, say UTF-8 (or any other), you would lose these
> font changes you've specified. My point was that once such font
> changes are removed, there is no certain way to reconstruct them,
> or even to unambiguously determine which portions would best be
> displayed in what "ethnic style" of font. (Please forgive "ethnic
> style" and I'd be interested to hear more suitable phrases.)

I still must be missing something. If you drop all of the fonts from a
document, then you don't have them any more. Similarly, if you drop styles
(italic, bold), size (10pt, 12pt), etc. then you don't have them anymore
either -- and those are even harder to reconstruct!

>
> My understanding of the responses was something like:
>
> yes, there can be ambiguities but
>
> - you could do something legible and probably acceptable, though
> not high-quality typography by using a "compromise" or "generic"
> font;
>
> - it would be possible for "smart" software to make some guesses
> about language boundaries. (I remain somewhat skeptical how well
> this would work in general practice.)
>
> - Finally, this would only be of critical importance in a single
> document containing more than one language (in particular both
> Traditional and Simplified Chinese) which is probably rare.

The requirement for language information embedded in plain text data is
often overstated. See Section 5.11, Language Information in Plain Text, in
The Unicode Standard, Version 3.0. (This is online at
http://www.unicode.org/unicode/uni2book/ch05.pdf; the index is messed up, so
open up "Disclaimer".)

If you really had to have language information in plain text, you could use
the language tags in http://www.unicode.org/unicode/reports/tr27. However,
these are strongly discouraged.
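
For illustration only, here is a minimal Python sketch of how such a plain-text
language tag is spelled: U+E0001 LANGUAGE TAG followed by tag characters that
mirror the ASCII letters of the language code at an offset of U+E0000. The
function names tag_for and strip_tags are hypothetical, not from any library,
and none of this is a recommendation to use the tags.

```python
# Sketch only: helper names are hypothetical, not part of any standard API.
LANGUAGE_TAG = 0xE0001   # U+E0001 LANGUAGE TAG, introduces a language tag
TAG_BASE = 0xE0000       # tag characters mirror ASCII at this offset
TAG_BLOCK = range(0xE0000, 0xE0080)  # the whole Tags block


def tag_for(lang: str) -> str:
    """Return the tag-character sequence for an ASCII code such as 'zh-TW'."""
    if not lang or not lang.isascii() or not lang.isprintable():
        raise ValueError("language code must be non-empty printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_BASE + ord(c)) for c in lang)


def strip_tags(text: str) -> str:
    """Drop every character from the Tags block, leaving ordinary text."""
    return "".join(c for c in text if ord(c) not in TAG_BLOCK)


tagged = tag_for("zh-TW") + "\u4e9e"   # tag sequence, then U+4E9E
plain = strip_tags(tagged)             # only U+4E9E remains
```

A process that does not understand the tags can filter them out as strip_tags
does, since the tag characters never overlap with ordinary text.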

>
> Since the company I work for saves documents in SGML or XML rather
> than a proprietary format, this is of potential interest to us.
> I sometimes wonder if XML or some other standard will evolve toward
> some standard use of markup to denote different languages. It is
> also ambiguous to try to intuit boundaries between most Western
> European languages (how about Portuguese versus Spanish) and yet
> they must be hyphenated differently.

XML (and HTML) already give you the capability of marking language. Look at
xml:lang. If you are using XML, you should definitely not use the language
tags. See http://www.unicode.org/unicode/reports/tr20/. (Martin Dürst
(duerst@w3.org) and Asmus Freytag (asmus@unicode.org) are preparing an
update which will be out some day.)
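
As a rough sketch of what xml:lang gives you, the Python below walks a small
XML document and pairs each run of text with the language in effect, with
children inheriting the value from their parent. The sample document and the
function name text_runs are made up for this example; only the xml:lang
attribute and the standard xml.etree.ElementTree module are real.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: English plus Traditional and Simplified Chinese.
SAMPLE = """<doc xml:lang="en">
  <p>Asia</p>
  <p xml:lang="zh-TW">\u4e9e</p>
  <p xml:lang="zh-CN">\u4e9a</p>
</doc>"""


def text_runs(elem, inherited=None):
    """Yield (language, text) pairs; xml:lang is inherited by child elements."""
    lang = elem.get(XML_LANG, inherited)
    if elem.text and elem.text.strip():
        yield lang, elem.text.strip()
    for child in elem:
        yield from text_runs(child, lang)
        # Text after a child's end tag belongs to the parent's language.
        if child.tail and child.tail.strip():
            yield lang, child.tail.strip()


runs = list(text_runs(ET.fromstring(SAMPLE)))
for lang, text in runs:
    print(lang, text)
```

An application could then choose a font or hyphenation rules per run from the
language value, without anything being embedded in the character data itself.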

>
> At 09:19 PM 10/16/01 -0700, Mark Davis wrote:
> >I must be misunderstanding the question. If I want different segments of a
> >document to be in different fonts, I select the text, go to the font menu,
> >and pick the fonts I want. I don't need to know the language of the text to
> >do that.
> >
> >Yes, in very specific cases the font might be tuned to have a different
> >display (French vs Polish) for different languages, but that is not the
> >principal mechanism for display. In practical terms, I would be more likely
> >to simply pick a font that is tuned for Polish for the text that I wanted
> >displayed in that way.
> >
> >Mark
> >—————
> >
> >Δός μοι ποῦ στῶ, καὶ κινῶ τὴν γῆν — Ἀρχιμήδης
> >[http://www.macchiato.com]
> >
> >----- Original Message -----
> >From: "Gary P. Grosso" <gpg@arbortext.com>
> >To: <unicode@unicode.org>
> >Sent: Tuesday, October 09, 2001 2:00 PM
> >Subject: Re: FW: A product compatibility question
> >
> >
> > > I appreciate these responses. I am certainly not an expert in Han
> > > unification. I am trying to reconcile what John says with what
> > > appears at http://www.unicode.org/charts/unihan.html. For example,
> > > there appear to be stylistic differences, at least, in a character
> > > such as:
> > > http://charts.unicode.org/unihan/unihan.acgi$0x4E9E
> > > between fonts designed for different languages.
> > >
> > > Regarding Asmus' contribution, I would assume that such products use
> > > different fonts depending on what "block" the character is from, as
> > > shown, e.g., at:
> > > http://www.unicode.org/Public/3.0-Update/Blocks-3.txt
> > >
> > > Since I don't see any definition at the level of Traditional Chinese
> > > versus Simplified Chinese in the blocks, I don't see how an
> > > application could properly switch fonts in this case. Perhaps
> > > the answer is "it doesn't need to" but I'll admit to being a bit
> > > skeptical on that point. I'm open to being convinced.
> > >
> > > At 03:21 PM 10/9/01 -0400, John Cowan wrote:
> > >
> > > >Gary P. Grosso wrote:
> > > >
> > > >>Because of Unicode's Han unification, I was under the impression that
> > > >>to get both Traditional Chinese and Simplified Chinese to really look
> > > >>right would require using different fonts for each.
> > > >
> > > >
> > > >Han unification does *not* unify traditional and simplified
> > > >characters.
> > >
> > > At 01:02 PM 10/9/01 -0700, Asmus Freytag wrote:
> > >
> > > >At 01:43 PM 10/9/01 -0400, Gary P. Grosso wrote:
> > > >>Because of Unicode's Han unification, I was under the impression that
> > > >>to get both Traditional Chinese and Simplified Chinese to really look
> > > >>right would require using different fonts for each. To have different
> > > >>fonts for the same characters in a single document would seem to
> > > >>require use and recognition of language tagging.
> > > >>
> > > >>Am I just showing my ignorance on this subject?
> > > >
> > > >
> > > >If you want to show English and Chinese in the same document, unless (or
> > > >even) if the English is strictly for Chinese audiences, you will most
> > > >likely want to use different fonts. Standard office automation suppliers
> > > >like Microsoft have behind the scenes support for that, so that many users
> > > >don't even know that they are actually using a different font for Latin
> > > >than Han.
> > > >
> > > >>>We are working with a client who is a publisher of Chinese medical
> > > >>>textbooks.
> > > >>>Our goal is to set up a configuration that will allow layout of English,
> > > >>>Simplified Chinese, and Traditional Chinese characters in a single
> > > >>>document.
> > > >
> > > >
> > >
> > > ---
> > > Gary Grosso
> > > ggrosso@arbortext.com
> > > Arbortext, Inc.
> > > Ann Arbor, MI, USA
> > >
> > >
> > >
> >
>
> ---
> Gary Grosso
> ggrosso@arbortext.com
> Arbortext, Inc.
> Ann Arbor, MI, USA
>
>
>



This archive was generated by hypermail 2.1.2 : Wed Oct 17 2001 - 11:42:47 EDT