"Visually approximate" conversion from Unicode to Windows-1251 (or similar code page)

From: Paul Johnston (paj@pajhome.org.uk)
Date: Wed Oct 04 2006 - 10:52:56 CST

    Hi,

    I am using Unicode throughout my system (a web-based database for
    tracking work). I am forced to use a tool (htmldoc, for HTML-to-PDF
    conversion) that does not support Unicode in any manner. This should not
    be a significant problem in practice, as all the data is in English.
    However, I am having problems with a few characters, primarily an
    apostrophe-like character (don't know the code offhand; it's not in
    Latin-1).

    If I encode the output as Windows-1251, the character causes an error.
    If I use UTF-8, it causes visual garbage in the output. What would be
    ideal is to perform a "visually approximate" conversion to Windows-1251,
    which would replace this with a regular apostrophe. I am happy to accept
    the risks that such an approximation carries.
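To make the request concrete, here is a minimal Python sketch of what such a "visually approximate" conversion might look like. It is only an illustration under stated assumptions, not what Windows does internally: the function name `approximate_encode` and the `APPROXIMATIONS` table are hypothetical, and the table is a small hand-picked sample rather than a complete mapping. Characters the code page can represent are kept as they are; anything else goes through the table, then through Unicode compatibility decomposition (NFKD) with combining marks stripped, and finally falls back to "?".

```python
import unicodedata

# Hypothetical, hand-picked look-alike table; extend it as needed.
APPROXIMATIONS = {
    "\u02bc": "'",   # modifier letter apostrophe
    "\u2032": "'",   # prime
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2212": "-",   # minus sign
}


def approximate_encode(text, codepage="cp1251"):
    """Encode text to a legacy code page, approximating unsupported characters."""
    out = []
    for ch in text:
        try:
            # Keep every character the code page can represent exactly.
            out.append(ch.encode(codepage))
            continue
        except UnicodeEncodeError:
            pass
        candidate = APPROXIMATIONS.get(ch)
        if candidate is None:
            # Fall back to compatibility decomposition, dropping accents.
            decomposed = unicodedata.normalize("NFKD", ch)
            candidate = "".join(
                c for c in decomposed if not unicodedata.combining(c)
            )
        # Anything still unrepresentable becomes "?".
        out.append(candidate.encode(codepage, errors="replace"))
    return b"".join(out)
```

For example, `approximate_encode("It\u02bcs")` yields the plain ASCII bytes for "It's", which could then be handed to a tool that expects single-byte input. Checking the code page first means characters it already contains are never degraded unnecessarily.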

    I know Windows can do this, as retrieving values from controls using a
    non-Unicode interface does exactly this conversion. However, I have not
    been able to find out how I can perform the conversion at will. I
    apologise if this is not the most appropriate forum for this question,
    but I have been looking long and hard for this without success.

    Many thanks for any help you can offer,

    Paul

    P.S. If someone can suggest a Unicode-compatible replacement for
    htmldoc, that would satisfy me too!