"Visually approximate" conversion from unicode to Windows-1252 [summary]

From: Paul Johnston (paj@pajhome.org.uk)
Date: Fri Oct 06 2006 - 10:48:58 CST


    Thanks for all the helpful responses. To clarify, 1251 was my error;
    I meant Windows-1252. The troublesome character was U+2019, RIGHT
    SINGLE QUOTATION MARK.

    Using WideCharToMultiByte does exactly what I need. For those
    interested, here is the Python code I'm using:

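    from ctypes import create_string_buffer, windll

    def de_unicode(instr):
      # instr must be a Unicode string. One byte per character plus a
      # terminating NUL is enough room for code page 1252 output.
      outstr = create_string_buffer(len(instr) + 1)
      # Arguments: code page, flags, source string, source length,
      # destination buffer, destination size, default char, used-default flag.
      windll.kernel32.WideCharToMultiByte(1252, 0, instr, len(instr),
                                          outstr, len(instr) + 1, None, None)
      return outstr.value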
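
    For anyone not on Windows, a rough portable sketch of the same idea
    follows. It does not reproduce the best-fit tables that
    WideCharToMultiByte uses; it is a different technique that relies on
    Python's built-in cp1252 codec and, as an assumption on my part, falls
    back to Unicode decomposition (NFKD) with combining marks dropped. The
    function name to_cp1252_approx and the "?" fallback are made up for
    illustration.

```python
import unicodedata

def to_cp1252_approx(text, fallback="?"):
    """Encode text as Windows-1252, approximating characters it lacks."""
    out = bytearray()
    for ch in text:
        try:
            # Characters that exist in Windows-1252 encode directly.
            out += ch.encode("cp1252")
            continue
        except UnicodeEncodeError:
            pass
        # Otherwise decompose the character and drop combining marks,
        # which often leaves a visually similar base letter.
        decomposed = unicodedata.normalize("NFKD", ch)
        stripped = "".join(c for c in decomposed
                           if not unicodedata.combining(c))
        try:
            out += stripped.encode("cp1252")
        except UnicodeEncodeError:
            # Nothing similar is available: emit the fallback instead.
            out += fallback.encode("cp1252")
    return bytes(out)
```

    U+2019 exists in Windows-1252 as byte 0x92, so it passes through
    unchanged; only characters outside the code page reach the fallback
    path.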

    The suggestion to use HTML entities, e.g. &#8217;, was a good idea.
    Unfortunately, htmldoc doesn't support Unicode at all: such characters
    simply do not appear in the output.

    In my application, I am generating the PDF files in a CGI script.
    Htmldoc is handy as it can do the conversion from the command line. Most
    PDF generators are printer drivers, and I haven't (so far) managed to
    make one work from a CGI script. It's something I may investigate
    further in the future as htmldoc has other problems, e.g. not supporting
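
    As a minimal sketch of the command-line conversion described above,
    something like the following could be called from a CGI script. It
    assumes the htmldoc executable is on PATH and that the input HTML has
    already been written out in Windows-1252; the helper names
    htmldoc_command and html_to_pdf are hypothetical.

```python
import subprocess

def htmldoc_command(html_path, pdf_path):
    # --webpage treats the input as a plain web page rather than a book;
    # -f names the output file.
    return ["htmldoc", "--webpage", "-f", pdf_path, html_path]

def html_to_pdf(html_path, pdf_path):
    # Raises CalledProcessError if htmldoc exits with a nonzero status.
    subprocess.run(htmldoc_command(html_path, pdf_path), check=True)
```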

    I am aware of solutions like iText that let you generate PDFs without
    using HTML at all, but I've got a feeling that would be hard work. I
    already have well-established systems for building HTML documents.

    Thanks again for all the useful suggestions,

