RE: New on list

From: Paul Dempsey (Exchange) ([email protected])
Date: Tue Jan 12 1999 - 13:47:50 EST

Next message: Addison Phillips: "RE: New on list"
Previous message: Alfinito, Charles: "New on list"
Maybe in reply to: Alfinito, Charles: "New on list"
Next in thread: Addison Phillips: "RE: New on list"

Get the RTF specification for complete details of the RTF format:
http://support.microsoft.com/download/support/mslfiles/GC0165.EXE

In summary:

Word (or the application in question) stores text internally in Unicode.

The file header contains the \ansicpg control word that specifies the
codepage.
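A minimal Python sketch of reading that header value, assuming the RTF is already in memory as a str. The name header_codepage is a hypothetical helper. It takes the first \ansicpgN it finds, which in a normal file is the one in the header.

```python
import re
from typing import Optional

# \ansicpgN: a backslash, the letters "ansicpg", then the code page number.
_ANSICPG = re.compile(r"\\ansicpg(\d+)")

def header_codepage(rtf: str) -> Optional[int]:
    """Return N from the first \\ansicpgN control word, or None if there is none."""
    match = _ANSICPG.search(rtf)
    return int(match.group(1)) if match else None

print(header_codepage(r"{\rtf1\ansi\ansicpg1252\deff0 Hello}"))  # 1252
```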

For characters that are in the codepage, the normal RTF is written: the
character itself if it is ASCII, otherwise \'hh, the byte value as two hex
digits.

For characters that do not exist in the codepage, the Unicode value is
written as \uN, where N is the value in decimal (values above 32767 appear as
negative numbers), followed by an approximation of the character in the
codepage. For double-byte characters, it gets a little more complicated.
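The writer's rules in the two paragraphs above can be sketched in Python roughly as follows. Assumptions: the codec name defaults to cp1252; rtf_escape is a hypothetical helper; and where the real writer picks a best-fit approximation (such as '~'), this sketch always writes '?' as the single fallback byte, which matches the default of one fallback byte per \uN. Characters outside the 16-bit range become two \uN values (a UTF-16 surrogate pair). Line breaks and other ASCII control characters are passed through unchanged, although a real writer would emit \par and friends.

```python
def rtf_escape(text: str, codec: str = "cp1252") -> str:
    """Escape text for an RTF body whose \\ansicpg matches `codec` (hypothetical helper)."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)               # RTF's own special characters
        elif ord(ch) < 0x80:
            out.append(ch)                      # plain ASCII is written as-is
        else:
            try:
                encoded = ch.encode(codec)      # one byte, or two for DBCS code pages
            except UnicodeEncodeError:
                # Not in the code page: one \uN per UTF-16 code unit, N as a
                # signed 16-bit decimal, each followed by one fallback byte ('?').
                units = ch.encode("utf-16-le")
                for i in range(0, len(units), 2):
                    n = int.from_bytes(units[i:i + 2], "little")
                    if n > 0x7FFF:
                        n -= 0x10000
                    out.append(f"\\u{n}?")
            else:
                out.append("".join(f"\\'{b:02x}" for b in encoded))
    return "".join(out)

print(rtf_escape("caf\u00e9 \u2248 {x}"))   # caf\'e9 \u8776? \{x\}
```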

What you've described is this case. The Unicode character doesn't exist in
the codepage, and the best approximation the RTF writer came up with is '~'.
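Working through the numbers from the question makes this concrete. The sketch assumes the file's header said \ansicpg1252, which the question doesn't state. Because N is decimal, \u8776 is U+2248 ALMOST EQUAL TO; INFINITY is U+221E, which would have been written \u8734. The \'98 that follows is the one-byte fallback, and in code page 1252 it is U+02DC SMALL TILDE, the '~' the reader ended up showing.

```python
import unicodedata

n = 8776                                          # N from \u8776: decimal, not hex
unicode_char = chr(n)
fallback_char = bytes([0x98]).decode("cp1252")    # the \'98 fallback, assuming cp1252

print(hex(n), unicodedata.name(unicode_char))                     # 0x2248 ALMOST EQUAL TO
print(hex(ord(fallback_char)), unicodedata.name(fallback_char))   # 0x2dc SMALL TILDE
print(ord("\N{INFINITY}"))                                        # 8734
```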

You'll need to teach your program to recognize the Unicode control words.
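A minimal Python sketch of a reader that understands \uN and \ucN. It assumes the code page has already been taken from \ansicpg and is passed in as a Python codec name. In the RTF specification, \ucN gives the number of fallback bytes that follow each \uN (1 by default) and is scoped to the current group. The reader outputs the Unicode character and skips that many following characters or \'hh escapes. Consecutive \'hh bytes are collected and decoded together so that double-byte code pages work. rtf_text is a hypothetical helper, and every other control word is ignored, so text inside destinations such as \fonttbl would leak through; a real reader has to skip those groups.

```python
import re

# One token: control word (\word with optional signed number and one optional
# delimiting space), hex escape (\'hh), other control symbol, brace, or plain char.
_TOKEN = re.compile(
    r"\\([a-zA-Z]{1,32})(-?\d+)? ?|\\'([0-9a-fA-F]{2})|\\(.)|([{}])|(.)", re.S
)

def rtf_text(rtf: str, codec: str = "cp1252") -> str:
    """Extract text from RTF, honoring \\uN and \\ucN (hypothetical helper)."""
    out = []
    pending = bytearray()   # run of \'hh bytes; decoded together for DBCS code pages
    uc_stack = [1]          # \ucN value per group; RTF's default is 1
    skip = 0                # fallback units still to drop after a \uN

    def flush():
        if pending:
            out.append(pending.decode(codec, errors="replace"))
            pending.clear()

    for m in _TOKEN.finditer(rtf):
        word, num, hexbyte, symbol, brace, char = m.groups()
        if brace:
            flush()
            skip = 0
            if brace == "{":
                uc_stack.append(uc_stack[-1])
            elif len(uc_stack) > 1:
                uc_stack.pop()
        elif skip:
            skip -= 1       # drop one fallback unit: plain char, \'hh or control
        elif hexbyte:
            pending.append(int(hexbyte, 16))
        elif word:
            flush()
            if word == "u" and num is not None:
                n = int(num)
                out.append(chr(n + 0x10000 if n < 0 else n))
                skip = uc_stack[-1]
            elif word == "uc" and num is not None:
                uc_stack[-1] = int(num)
            elif word == "par":
                out.append("\n")
            # every other control word is ignored in this sketch
        elif symbol:
            flush()
            if symbol in "\\{}":
                out.append(symbol)
        elif char not in "\r\n":   # raw line breaks in the file are not text
            flush()
            out.append(char)
    flush()
    # Join UTF-16 surrogate pairs that arrived as two separate \uN values.
    joined = "".join(out)
    return joined.encode("utf-16-le", "surrogatepass").decode("utf-16-le", "surrogatepass")

print(rtf_text(r"{\rtf1\ansi\ansicpg1252 a\u8776\'98b}"))   # a≈b
```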

On Windows, you can just use WideCharToMultiByte and MultiByteToWideChar to
map the text between Unicode and the codepage specified in the file.
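Outside Windows, or just to experiment, Python's built-in cpNNNN codecs can play the same role. The sketch below is rough: the function names mirror the Win32 calls but are hypothetical Python helpers, and Python's tables are its own, so results may differ from Windows in edge cases. In particular, WideCharToMultiByte can substitute "best fit" characters unless WC_NO_BEST_FIT_CHARS is passed, whereas this encoder simply reports that a character has no exact mapping.

```python
from typing import Optional

def multibyte_to_wide(data: bytes, codepage: int) -> str:
    """Code-page bytes -> Unicode, in the spirit of MultiByteToWideChar."""
    return data.decode(f"cp{codepage}")

def wide_to_multibyte(text: str, codepage: int) -> Optional[bytes]:
    """Unicode -> code-page bytes, in the spirit of WideCharToMultiByte with no
    best-fit substitution: returns None if any character has no exact mapping."""
    try:
        return text.encode(f"cp{codepage}")
    except UnicodeEncodeError:
        return None

print(multibyte_to_wide(b"caf\xe9", 1252))   # café
print(wide_to_multibyte("\u2248", 1252))     # None: needs \uN in RTF
```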

The Unicode web site has mapping tables for many codepages and encodings.
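A sketch of loading one of those tables. It assumes the plain-text layout of the vendor tables, for example the Windows CP1252.TXT file: a code-page value in hex, a tab, the Unicode value in hex, a tab and a '#' comment, with the Unicode field left blank for undefined positions. The sample lines are illustrative and not copied from a particular file, and parse_mapping_table is a hypothetical helper. The reversed dictionary answers the writer's question of whether a character exists in the code page, and with what byte value.

```python
def parse_mapping_table(lines):
    """Return {code-page value: Unicode character} from mapping-table lines."""
    table = {}
    for line in lines:
        data = line.split("#", 1)[0].split()   # drop the comment, split on whitespace
        if len(data) < 2:
            continue                            # blank, comment-only, or undefined
        table[int(data[0], 16)] = chr(int(data[1], 16))
    return table

sample = [
    "#   Name: cp1252 to Unicode table",
    "0x41\t0x0041\t#LATIN CAPITAL LETTER A",
    "0x81\t      \t#UNDEFINED",
    "0x98\t0x02DC\t#SMALL TILDE",
]
to_unicode = parse_mapping_table(sample)
from_unicode = {ch: code for code, ch in to_unicode.items()}

print(hex(from_unicode["\u02dc"]))   # 0x98
print("\u2248" in from_unicode)      # False: not in this (partial) table
```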

--- Paul Chase Dempsey
[email protected]
Microsoft Visual Studio Text Editor Developer

-----Original Message-----
From: Alfinito, Charles [mailto:[email protected]]
Sent: Tuesday, January 12, 1999 9:11 AM
To: Unicode List
Subject: New on list

...

Unicode is presenting a problem. For example, a ~ may be the character in a
file. Normally in RTF this would be shown as \'98. Recently I had a file
with the Unicode sequence \u8776\'98. This character should have been an
"infinity". Since my program can't handle the Unicode RTF (\u8776), it
ignores it and turns the \'98 into a ~, which is obviously wrong.

Does anyone know how the Unicode number is derived (as in \u8776)? I know
it has to do with the ANSI code page, but I can't figure out whether there is
any rhyme or reason to the Unicode numbers being assigned, or to the
combination of Unicode and RTF (\u8776\'98). If I knew, I could program the
Unicode characters. I've been looking for some sort of table.

...


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:44 EDT