"Visually approximate" conversion from Unicode to Windows-1251 (or similar code page)

From: Paul Johnston (paj@pajhome.org.uk)
Date: Wed Oct 04 2006 - 10:52:56 CST

Hi,

I am using Unicode throughout my system (a web-based database for
tracking work). I am forced to use a tool (htmldoc, for HTML-to-PDF
conversion) that does not support Unicode at all. This should not
be a significant problem in practice, as all the data is in English.
However, I am having problems with a few characters, primarily an
apostrophe-like character (I don't know the code point offhand; it's not
in Latin-1).

If I encode the output as Windows-1251, this character causes an error.
If I use UTF-8, it produces visual garbage in the output. What would be
ideal is a "visually approximate" conversion to Windows-1251, which
would replace this character with a regular apostrophe. I am happy to
accept the risks that such an approximation carries.
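One way to get this kind of lossy conversion, sketched in Python under stated assumptions: the `APPROX` table, the `"approximate"` error-handler name and the `to_legacy` helper are all hypothetical, and the table is a hand-picked list, not Windows' own best-fit mapping. Known typographic characters are first swapped for ASCII look-alikes with `str.translate`; anything the target code page still cannot encode is passed to a custom codec error handler that tries a compatibility (NFKD) decomposition with accents stripped, and otherwise emits `?`.

```python
import codecs
import unicodedata

# Hypothetical look-alike table: typographic characters mapped to plain ASCII.
# Extend it as new problem characters turn up in the data.
APPROX = {
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK (typographic apostrophe)
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2013": "-",    # EN DASH
    "\u2014": "-",    # EM DASH
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
    "\u00a0": " ",    # NO-BREAK SPACE
}
_TABLE = str.maketrans(APPROX)


def _approximate(err):
    """Codec error handler: replace each unencodable character with a look-alike."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    out = []
    for ch in err.object[err.start:err.end]:
        # Compatibility decomposition, then drop combining marks,
        # e.g. "\u00e9" -> "e", "\ufb01" (fi ligature) -> "fi".
        base = unicodedata.normalize("NFKD", ch)
        base = "".join(c for c in base if not unicodedata.combining(c))
        # Only accept pure-ASCII results; everything else becomes "?".
        out.append(base if base and base.isascii() else "?")
    # The replacement is ASCII, so any ASCII-compatible code page can encode it.
    return "".join(out), err.end


codecs.register_error("approximate", _approximate)


def to_legacy(text, codepage="cp1251"):
    """Hypothetical helper: visually approximate `text` in a legacy code page."""
    # Step 1: always replace curly quotes, dashes, etc., even if the code page
    # happens to contain them, so the output uses plain ASCII punctuation.
    text = text.translate(_TABLE)
    # Step 2: encode, letting the error handler approximate anything left over.
    return text.encode(codepage, errors="approximate")
```

For example, `to_legacy("don\u2019t")` returns `b"don't"`. Since the data is English, `codepage="cp1252"` (Windows Western European) may be a better fit than the Cyrillic `cp1251`; the translation step turns curly quotes into plain ones either way.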

I know Windows can do this, since retrieving values from controls through
a non-Unicode interface performs exactly this conversion. However, I have
not been able to find out how to perform the conversion on demand. I
apologise if this is not the most appropriate forum for this question,
but I have been searching long and hard without success.

Many thanks for any help you can offer,

Paul

P.S. If someone can suggest a Unicode-compatible replacement for
htmldoc, that would satisfy me too!