"Visually approximate" conversion from unicode to Windows-1252 [summary]

From: Paul Johnston (paj@pajhome.org.uk)
Date: Fri Oct 06 2006 - 10:48:58 CST

Next message: Adam Twardoch: "Re: "Visually approximate" conversion from unicode to Windows-1252 [summary]"

Previous message: Neil Harris: "Re: Unicode and RFC 4690"
Next in thread: Adam Twardoch: "Re: "Visually approximate" conversion from unicode to Windows-1252 [summary]"
Reply: Adam Twardoch: "Re: "Visually approximate" conversion from unicode to Windows-1252 [summary]"

Hi,

Thanks for all the helpful responses. To clarify: 1251 was my error; I
meant Windows-1252. The troublesome character was U+2019, RIGHT SINGLE
QUOTATION MARK.

Calling the Win32 function WideCharToMultiByte through ctypes does
exactly what I need. For those interested, here is the Python code I'm
using:

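from ctypes import create_string_buffer, c_char_p, windll

def de_unicode(instr):
    # WideCharToMultiByte expects UTF-16LE ("wide") input
    wide = instr.encode('utf-16le')
    nchars = len(wide) // 2  # UTF-16 code units, not Python characters
    # Zero-filled buffer, so the result is always NUL-terminated
    outstr = create_string_buffer(nchars + 1)
    # 1252 = Windows-1252; flags=0 allows best-fit mapping, and characters
    # with no mapping become the default character (usually '?')
    windll.kernel32.WideCharToMultiByte(1252, 0, c_char_p(wide), nchars,
                                        outstr, nchars + 1, None, None)
    return outstr.value  # bytes in Windows-1252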
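WideCharToMultiByte exists only on Windows. For comparison, a rough portable approximation can be built from Python's own cp1252 codec plus Unicode NFKD decomposition. This is a minimal sketch under that assumption, not a reproduction of the Windows best-fit mapping: the function name de_unicode_portable is hypothetical, and characters that cannot be approximated become '?'.

```python
import unicodedata

def de_unicode_portable(instr):
    """Approximate conversion of a str to Windows-1252 bytes (sketch only)."""
    out = bytearray()
    for ch in instr:
        try:
            # Characters that exist in cp1252 (e.g. U+2019 -> 0x92) map directly.
            out += ch.encode('cp1252')
            continue
        except UnicodeEncodeError:
            pass
        # Otherwise decompose (e.g. U+0101 -> 'a' + combining macron) and keep
        # only the parts cp1252 can represent; fall back to '?' if nothing is left.
        approx = unicodedata.normalize('NFKD', ch).encode('cp1252', errors='ignore')
        out += approx if approx else b'?'
    return bytes(out)
```

For example, de_unicode_portable("\u0100\u2019s") gives b'A\x92s': the A with macron loses its accent, while the right single quotation mark keeps its native Windows-1252 byte.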

The suggestion to use HTML entities, e.g. &#8217;, was a good idea.
Unfortunately, htmldoc doesn't support Unicode at all: such characters
simply do not appear in the output.
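For tools that do understand character references, the HTML-entity approach can be produced with Python's standard xmlcharrefreplace error handler, which writes any character the target encoding lacks as a decimal reference. This is a small sketch; the helper name to_html_entities is hypothetical, and it does not escape &, < or > (use html.escape first if the input is not already HTML).

```python
def to_html_entities(text):
    # Every non-ASCII character becomes a decimal reference such as &#8217;.
    return text.encode('ascii', errors='xmlcharrefreplace').decode('ascii')
```

For example, to_html_entities("it\u2019s") returns "it&#8217;s".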

In my application, I generate the PDF files in a CGI script. Htmldoc is
handy because it can do the conversion from the command line. Most PDF
generators are printer drivers, and I haven't (so far) managed to make
one work from a CGI script. It's something I may investigate further,
as htmldoc has other problems, such as its lack of CSS support.

I am aware of solutions like iText that let you generate PDFs without
using HTML at all, but I suspect that would be hard work. I already have
well-established systems for building HTML documents.

Thanks again for all the useful suggestions,

Paul

This archive was generated by hypermail 2.1.5 : Fri Oct 06 2006 - 11:02:10 CST