Re: Unicode 4.0 is online at last!

From: Peter Kirk
Date: Thu Aug 14 2003 - 15:30:19 EDT


    On 14/08/2003 11:44, Jim Allan wrote:

    > Peter Kirk posted:
    >> The documentation is great, but I have had some problems copying text
    >> from it (with Acrobat Reader 5), in particular with text in small
    >> capitals e.g. Unicode character names. For example, I get the following
    >> from p.44:
    >> The sequence of Unicode characters U+0061 “a” 
    >>    + U+0308 “!”  + U+0075 “u”  
    >> 
    >>  unambiguously encodes “äu” not “aü”.
    > This came out perfectly on my Windows 98 system as browsed by me in
    > the Unicode list archives through Mozilla 1.3 and also after I pasted
    > it into the Mozilla Compose window as quoted text.
    > The characters, small capital or others, are displayed with no problems.
    > Jim Allan
    What seems to be happening, in Windows 2000, is that the text on the
    clipboard is made up of Private Use Area (PUA) character codes U+F7XX,
    where XX seems to be the ASCII code of the corresponding lowercase
    letter. For example, small caps "LATIN" comes out as F76C F761 F774
    F769 F76E. At some point Windows 98 simply strips off the F7 prefix,
    giving you the correct text. But Windows 2000, which is Unicode based,
    keeps the full PUA code points, which my Mozilla 1.4 renders as strange
    combinations of base characters with combining marks. For example,
    "LATIN" comes out as those five PUA characters, which appear on my
    screen (in Mozilla mail and when browsing the archives with Mozilla) as
    N diaeresis, M macron, o vertical-line-below, n macron, o acute
    dot-below. When I browse the archives in IE6 or paste the text into
    Word, I get square boxes.
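    A minimal Python sketch of the pattern described above, assuming the
    observed rule holds: each small-caps character arrives as U+F700 plus the
    ASCII code of its lowercase letter, so subtracting 0xF700 recovers the
    plain text, roughly what Windows 98 appears to do. The helper names
    describe and strip_f7_prefix are hypothetical, and limiting the mapping
    to the printable ASCII range U+F720..U+F77E is an assumption.

```python
# Hypothetical helpers; the U+F7xx rule is the pattern observed on the
# clipboard, not a documented Windows or Acrobat API.
PUA_BASE = 0xF700


def describe(text: str) -> str:
    """Return the code points of text as space-separated uppercase hex."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


def strip_f7_prefix(text: str) -> str:
    """Map U+F720..U+F77E back to ASCII U+0020..U+007E.

    Characters outside that range (assumed limit) are left unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if PUA_BASE + 0x20 <= cp <= PUA_BASE + 0x7E:
            out.append(chr(cp - PUA_BASE))  # drop the F7 "prefix"
        else:
            out.append(ch)
    return "".join(out)


copied = "\uf76c\uf761\uf774\uf769\uf76e"  # small-caps "LATIN" as pasted
print(describe(copied))         # F76C F761 F774 F769 F76E
print(strip_f7_prefix(copied))  # latin
```

    Running it prints the same five code points quoted above and then
    "latin"; characters outside the assumed range, such as U+0308, pass
    through unchanged.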

    Peter Kirk

    This archive was generated by hypermail 2.1.5 : Thu Aug 14 2003 - 16:00:08 EDT