Re: Encoding alternate character sets in tEXt/zTXt strings

From: Chris Lilley (chris@w3.org)
Date: Thu Mar 19 1998 - 08:21:26 EST

Next message: Keld Jørn Simonsen: "Re: Greek accents: sorting order - opinions sought"
Previous message: Adrian Havill: "Last message regarding libpng"
In reply to: Adrian Havill: "Re: Encoding alternate character sets in tEXt/zTXt strings"

Adrian Havill wrote:

> > 0: UTF-7
> > 1: UTF-8
>
> This is redundant. If libpng can handle a 8-bit NULL terminated string,

It can; Latin-1 is exactly that.

> it should stick with UTF-8.

Agreed
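
The point about 8-bit null-terminated strings can be checked with a small sketch. It is only an illustration in Python, not libpng code; the sample text and the helper name c_string_view are made up for the example. UTF-8 produces a zero byte only for U+0000 itself, so ordinary text passes through C-style string handling intact, while a 16-bit encoding would be cut short.

```python
# Sketch: UTF-8 text contains no 0x00 bytes unless the text itself contains
# U+0000, so code that stops at the first NUL still sees the whole string.
text = "Grüße, 日本語, Ελληνικά"  # arbitrary sample text (assumption)

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")


def c_string_view(data: bytes) -> bytes:
    """Return the bytes a strlen-style reader would see (up to the first NUL)."""
    return data.split(b"\x00", 1)[0]


# The UTF-8 form survives a NUL terminator unchanged.
print(c_string_view(utf8 + b"\x00") == utf8)
# The UTF-16 form is truncated, because ASCII characters carry a zero byte.
print(c_string_view(utf16 + b"\x00\x00") == utf16)
```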

> > language code (see RFC 1766)
>
> You'll run into trouble with this... The text could be in multiple languages.

Yes, true, but it is still needed. Monolingual text is still by far the
most common; to do more, structured markup is needed and I would not
propose adding that!

> The language code and such is really only appropriate for cultural
> rendering and sorting, and its correct implementation is very complicated.

Not at all. It is just information; the implementation can do as much or
as little as it needs to. For example, far eastern implementations will
rely on the language code to differentiate Japanese, Chinese (simplified
and traditional) and Korean ideographs - these all have the same
character codes.
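
As a rough illustration of "as much or as little as it needs to", a viewer might use the language code only to choose a font. The sketch below assumes exactly that; the table, the function pick_font and every font name in it are hypothetical placeholders, and the matching rule (exact tag, then primary subtag) is just one simple choice.

```python
# Hypothetical font table; the names are placeholders, not real fonts.
FONT_BY_LANGUAGE = {
    "ja": "Example-Japanese",
    "zh-cn": "Example-SimplifiedChinese",
    "zh-tw": "Example-TraditionalChinese",
    "ko": "Example-Korean",
}
DEFAULT_FONT = "Example-Default"


def pick_font(language_tag: str) -> str:
    """Pick a font for an RFC 1766 style tag: exact match, then primary subtag."""
    tag = language_tag.lower()
    if tag in FONT_BY_LANGUAGE:
        return FONT_BY_LANGUAGE[tag]
    primary = tag.split("-", 1)[0]
    return FONT_BY_LANGUAGE.get(primary, DEFAULT_FONT)


# One code point shared by Japanese, Chinese and Korean text; only the
# language tag says which glyph style the reader expects.
ideograph = "\u76f4"
for tag in ("ja", "zh-CN", "zh-TW", "ko", "en"):
    print(f"U+{ord(ideograph):04X} tagged {tag}: render with {pick_font(tag)}")
```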

Notice that search engines like AltaVista now offer the choice of
searching restricted to a particular language. It is useful to indicate
the language.

> Best to leave it as just UTF-8.

It is orthogonal to the character encoding.

> > null byte
> > keyword (translated into the specified language and charset, not compressed)
>
> Again... you'll run into problems with "translated" keywords because translation
> into Latin-1 is often ambiguous for certain languages. Example: The word "Sushi"
> in Japanese can be romanized into "Sushi" or "susi" and Mount "Fuji" can be
> romanized into "Huzi" or "Fuji", depending on the method of "romanization"

Yes, but I think the intent with the dual keyword thing was that the
English keyword would be one from a list of registered keywords.
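
A minimal sketch of that reading of the dual keyword idea follows. It assumes the English keyword must come from the keywords registered in the PNG specification, while the translated keyword is free text carried alongside it; the function make_keyword_pair and the sample translation are hypothetical. Software matches on the English keyword, so the romanization question never arises.

```python
# Keywords registered in the PNG specification for text chunks.
REGISTERED_KEYWORDS = frozenset({
    "Title", "Author", "Description", "Copyright", "Creation Time",
    "Software", "Disclaimer", "Warning", "Source", "Comment",
})


def make_keyword_pair(english: str, translated: str) -> tuple:
    """Pair a registered English keyword with a free-form translated keyword."""
    if english not in REGISTERED_KEYWORDS:
        raise ValueError(f"not a registered keyword: {english!r}")
    return english, translated


# The decoder keys on the English keyword; the translation is for display only.
english, translated = make_keyword_pair("Title", "題名")
print(english == "Title", translated)
```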

> What happens if the person doesn't know how to translate the word? Or if there
> is no direct translation for a word? (i.e. a word in a language has to be
> translated into a phrase in order to represent the meaning)

Phrases are OK; the "keyword" in PNG is already actually a phrase.
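
For illustration, here is a sketch of the keyword rules as I understand them from the PNG specification: 1 to 79 bytes of printable Latin-1, with single interior spaces allowed but no leading, trailing or consecutive spaces. The function name is_valid_png_keyword is made up for this example. A phrase such as "Creation Time" passes.

```python
def is_valid_png_keyword(keyword: str) -> bool:
    """Check a text chunk keyword against the PNG keyword rules (sketch)."""
    try:
        raw = keyword.encode("latin-1")
    except UnicodeEncodeError:
        return False  # not representable in Latin-1
    if not 1 <= len(raw) <= 79:
        return False
    # Printable Latin-1 only: 32-126 and 161-255 (no controls, no NBSP).
    if any(not (32 <= b <= 126 or 161 <= b <= 255) for b in raw):
        return False
    # No leading, trailing or consecutive spaces.
    if raw.startswith(b" ") or raw.endswith(b" ") or b"  " in raw:
        return False
    return True


for candidate in ("Creation Time", "Title", " Title", "Two  spaces", ""):
    print(repr(candidate), is_valid_png_keyword(candidate))
```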

--
Chris

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:39 EDT