Date: Fri Jul 04 1997 - 16:55:52 EDT

On Jul 4, 3:08pm, Jonathan Rosenne wrote:
> At 05:37 04/07/97 -0400, Martin J. Duerst wrote:
> >On Thu, 3 Jul 1997, Markus Kuhn wrote:
> >
> >> I expect problems like this to be many orders of magnitude worse
> >> once Unicode starts to get widely used on the Web. The above
> >> problem is at least well-defined, the people using the
> >> 0x80-0x9f characters in HTML are clearly wrong, the HTML specification
> >> leaves no doubt about this. The problem is just that the authors
> >> of HTML export filters of one very popular word processor have been
> >> ignorant about the problem (I won't mention names here).
> >
> >Some software makers have been ignorant in the past, but they
> >have caught up. If you think this one hasn't, please tell me
> >their name in private, and I will contact them.

I have already done so. They have promised to fix it (a combination of
labelling content correctly and upgrading their HTML export options).
Also, there are now SGML character entities for the offending
characters, defined to produce the correct Unicode characters, which
they can transparently map back to the 0x80-0x9f range in the meantime
on the platform that supports them. Ahem. See

> This would be allowed if the HTML charset were coded correctly as
> CP1250.


> I guess authoring tools will gradually get over producing a
> misleading 8859-1 specification, which many do now.
> A note to authoring tools producers: If you do not know for sure that it is
> 8859-1 don't produce this charset specification. Either get the correct
> data from the operating system or ask the user,

or convert to character entities if available, or convert to utf-8

> and if this is not possible it is better you do nothing!

Doing nothing is equivalent to labelling as 8859-1, according to HTTP.
I don't see how the application can not know what character set it
is using.
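
As a rough sketch of the two conversions suggested above (character
entities or utf-8), here is a small Python example. The function names
are made up for illustration, and it assumes the tool already knows its
text is in Windows code page 1252, which is only one possible source
encoding; it uses numeric character references rather than named
entities.

```python
# Illustrative sketch only: function names are hypothetical, and the
# source encoding (cp1252) is an assumption about the authoring tool.

def to_char_refs(raw: bytes, source_encoding: str = "cp1252") -> str:
    """Decode platform bytes, then replace every non-ASCII character
    with a numeric character reference such as &#8220;."""
    text = raw.decode(source_encoding)
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode platform bytes and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# 0x93, 0x94 and 0x96 are curly quotes and an en dash in cp1252; they
# fall in the 0x80-0x9f range that is not valid in 8859-1 HTML.
sample = b"\x93quoted\x94 \x96 caf\xe9"

print(to_char_refs(sample))  # &#8220;quoted&#8221; &#8211; caf&#233;
print(to_utf8(sample))
```

Either output can then be labelled honestly: the first is plain ASCII,
so an 8859-1 label is harmless, and the second should be sent with
charset=utf-8.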

