Re: Œœ on IBM AIX

From: Addison Phillips (addison@yahoo-inc.com)
Date: Fri Jun 01 2007 - 09:37:30 CDT

    Actually, your problem probably has nothing to do with the database. The
    magic word in your email below is "Java".

    Let me explain. Java, internally, uses Unicode and the UTF-16 encoding
    to store textual data. The Oracle JDBC driver uses UTF-8 to communicate
    with the database server, which then converts the data to the local
    database encoding (windows-1252 in your case). This means that, indeed,
    the characters are correct in the database.
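
    To make the encoding point concrete, here is a minimal sketch (not taken
    from your application; the class and method names are only illustrative)
    that encodes the affected characters in three encodings. It assumes your
    JVM provides the windows-1252 and ISO-8859-15 charsets, which most do.

```java
import java.nio.charset.Charset;

class EncodeCheck {
    // Encode the text in the named charset and return the bytes as hex.
    static String hex(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u0152\u0153\u20ac"; // OE ligature, oe ligature, Euro
        System.out.println(hex(text, "windows-1252")); // 8c 9c 80
        System.out.println(hex(text, "ISO-8859-15"));  // bc bd a4
        // ISO 8859-1 has none of these characters, so Java substitutes '?'.
        System.out.println(hex(text, "ISO-8859-1"));   // 3f 3f 3f
    }
}
```

    The last line shows how a character can turn into a question mark
    without the database being involved at all.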

    Furthermore, when you retrieve them, your Java program should be getting
    the right values back. You can check this by looking at the character
    values for the characters in the retrieved string.

    For example, assuming you have a string with the Euro symbol in it:

       char c = myEuroContainingString.charAt(0);
       System.out.println(Integer.toHexString(c));

    You should see "20ac" (the Unicode value for the Euro).
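
    To check a whole string rather than a single character, something like
    the following sketch may help. The "retrieved" value is a hard-coded
    stand-in (an assumption for illustration); in your program it would be
    whatever your JDBC code returns.

```java
class CodeUnitDump {
    // Return the UTF-16 code units of a string as space-separated hex.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a value read from the database.
        String retrieved = "\u20ac\u0152\u0153";
        System.out.println(dump(retrieved)); // 20ac 152 153
    }
}
```

    If you see "bf" (the inverted question mark itself) or "fffd" in place
    of those values, the character was probably replaced before it reached
    your string. If you see the values above, the data is fine and only the
    display is at fault.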

    You didn't say whether the inverted question marks show up in your
    command shell (where System.out.println() uses the default encoding for
    your JVM) or in your Web application (where your servlet or JSP encoding
    has to be set correctly).
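
    For the command shell case, a rough sketch of printing with an explicit
    encoding instead of the JVM default follows. The class and method names
    are made up for illustration, and the choice of ISO-8859-15 assumes your
    terminal is running in one of the 8859-15 locales; use whatever encoding
    your terminal actually expects.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

class ExplicitConsole {
    // Wrap a stream in a PrintStream that encodes text with the named
    // encoding rather than the JVM default.
    static PrintStream withEncoding(OutputStream target, String encoding) {
        try {
            return new PrintStream(target, true, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
    }

    // Inspection helper: the bytes that printing 'text' would produce.
    static byte[] bytesPrinted(String text, String encoding) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = withEncoding(sink, encoding);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Shows what System.out.println() would use implicitly.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // Assumption: the terminal is in an ISO 8859-15 locale.
        PrintStream out = withEncoding(System.out, "ISO-8859-15");
        out.println("\u0152\u0153\u20ac");
    }
}
```

    In a servlet or JSP the idea is the same, but the encoding is set on the
    response rather than on a PrintStream.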

    If the check above does not print "20ac" and your string really contains
    "FFFD" for the Euro symbol (and other affected characters), it *may* be
    your JDBC driver's configuration or another configuration issue. But I'd
    guess that, since the data made it into the database okay, your problem
    is more likely to be how you are viewing the data from your Java program
    later. It is quite common for the implicit conversion used by the JVM
    to make it look like the data is not correctly stored. Specifying the
    encoding used for conversion explicitly, or configuring your JVM and its
    environment properly, could address your problem.
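
    As an illustration of how much the choice of conversion matters, this
    sketch decodes the same three windows-1252 bytes with three different
    encodings. The bytes are hard-coded for the example (they are not read
    from your database), and the class name is only illustrative.

```java
import java.nio.charset.Charset;

class DecodeCheck {
    // Decode the bytes with the named charset and return the resulting
    // UTF-16 code units as space-separated hex.
    static String decodeToHex(byte[] bytes, String charsetName) {
        String s = new String(bytes, Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // OE ligature, oe ligature, and Euro as windows-1252 bytes.
        byte[] cp1252 = { (byte) 0x8c, (byte) 0x9c, (byte) 0x80 };

        System.out.println(decodeToHex(cp1252, "windows-1252")); // 152 153 20ac
        // Wrong guess: the bytes become C1 control characters.
        System.out.println(decodeToHex(cp1252, "ISO-8859-1"));   // 8c 9c 80
        // Wrong guess: the bytes are invalid UTF-8 and become U+FFFD.
        System.out.println(decodeToHex(cp1252, "UTF-8"));        // fffd fffd fffd
    }
}
```

    Only the first conversion names the encoding the bytes were actually
    written in, which is why leaving the choice to a default is risky.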

    Hope that helps,

    Addison

    -- 
    Addison Phillips
    Globalization Architect -- Yahoo! Inc.
    Chair -- W3C Internationalization Core WG
    Internationalization is an architecture.
    It is not a feature.

    Ankit Jain wrote:
    > Hi Addison
    >  
    > 
    > My configuration is :
    > 1. IBM AIX version 5.1 with Oracle 9.2 client. (My web application and
    > my Java programs run continuously in the background on this machine.)
    > 2. Oracle 10g server on windows 2003 Server Edition.
    > 3.
    >  
    > I am saving the following characters: "Àà Ââ Ææ Çç Éé Èè Êê Ëë Îî Ïï Œœ 
    > Ôô Ùù Ûû Üü Ÿ ÿ" in the database,
    > 
    > These characters are stored correctly in the Oracle database by my Web
    > Application using page encoding 'ISO-8859-15'. I tested the database
    > using PL/SQL and found that they are correctly stored in the database.
    > But when my continuously running Java programs retrieve these characters
    > from the database, "€" and "Œœ" become inverted question marks...
    > 
    > please find my settings and then tell me where i could be wrong
    > 
    > SQL> select * from V$NLS_PARAMETERS
    > PARAMETER                VALUE
    > -----------------------  ------------------------------
    > NLS_LANGUAGE             AMERICAN
    > NLS_TERRITORY            AMERICA
    > NLS_CURRENCY             $
    > NLS_ISO_CURRENCY         AMERICA
    > NLS_NUMERIC_CHARACTERS   .,
    > NLS_CALENDAR             GREGORIAN
    > NLS_DATE_FORMAT          DD-MON-RR
    > NLS_DATE_LANGUAGE        AMERICAN
    > NLS_CHARACTERSET         WE8MSWIN1252
    > NLS_SORT                 BINARY
    > NLS_TIME_FORMAT          HH.MI.SSXFF AM
    > NLS_TIMESTAMP_FORMAT     DD-MON-RR HH.MI.SSXFF AM
    > NLS_TIME_TZ_FORMAT       HH.MI.SSXFF AM TZR
    > NLS_TIMESTAMP_TZ_FORMAT  DD-MON-RR HH.MI.SSXFF AM TZR
    > NLS_DUAL_CURRENCY        $
    > NLS_NCHAR_CHARACTERSET   AL16UTF16
    > NLS_COMP                 BINARY
    > NLS_LENGTH_SEMANTICS     BYTE
    > NLS_NCHAR_CONV_EXCP      FALSE
    > 
    > 19 rows selected.
    > 
    > 
    > NLS_LANG was not set on AIX machine (client machine-Oracle 9.2.0.1)
    > LANG parameter earlier was en_US for all
    > 
    > I tried setting NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P15 for both the
    > oracle user and the myapp user,
    > and I also set LANG (for the oracle user and the myapp user) to
    > en_US.8859-15.
    >
    > Even after making these changes, I still get inverted question marks.
    >
    > So please advise: what other configuration could be revised?
    > 
    > Regards, Ankit
    > 
    > 
    > 
    >  
    > On 5/30/07, *Addison Phillips* <addison@yahoo-inc.com> wrote:
    > 
    >     You don't say what you see for the characters after making the changes,
    >     so it is difficult to diagnose.
    > 
    >     Please note that you may already have lost the characters in the
    >     database. Once they are stored as question marks or other replacement
    >     characters, that's what they are forever. Did you try (with one or
    >     another of these configurations) inserting known-good characters and
    >     retrieving them? Or are you just trying to view existing data (how do
    >     you know it is good??) Can you see/type the characters into the shell
    >     where you are planning on viewing them? It is important to check each
    >     stage of your configuration to see that it works appropriately.
    > 
    >     Note that your database encoding is the windows code page 1252 encoding.
    >     In this encoding, the "oe" ligature character's byte values are 0x8C
    >     (capital) and 0x9C (lowercase), which are in the C1 control range for
    >     ISO 8859-15. A conversion must take place between the database and the
    >     shell in this case. If your connection isn't set correctly, you may not
    >     have correctly encoded data to view. (Of course, it could also be a font
    >     issue or some other problem.)
    > 
    >     Addison
    > 
    >     Ankit Jain wrote:
    >      > Hi
    >      > please find my settings and then guide me where i am wrong
    >      > SQL> select * from V$NLS_PARAMETERS
    >      > PARAMETER                VALUE
    >      > -----------------------  ------------------------------
    >      > NLS_LANGUAGE             AMERICAN
    >      > NLS_TERRITORY            AMERICA
    >      > NLS_CURRENCY             $
    >      > NLS_ISO_CURRENCY         AMERICA
    >      > NLS_NUMERIC_CHARACTERS   .,
    >      > NLS_CALENDAR             GREGORIAN
    >      > NLS_DATE_FORMAT          DD-MON-RR
    >      > NLS_DATE_LANGUAGE        AMERICAN
    >      > NLS_CHARACTERSET         WE8MSWIN1252
    >      > NLS_SORT                 BINARY
    >      > NLS_TIME_FORMAT          HH.MI.SSXFF AM
    >      > NLS_TIMESTAMP_FORMAT     DD-MON-RR HH.MI.SSXFF AM
    >      > NLS_TIME_TZ_FORMAT       HH.MI.SSXFF AM TZR
    >      > NLS_TIMESTAMP_TZ_FORMAT  DD-MON-RR HH.MI.SSXFF AM TZR
    >      > NLS_DUAL_CURRENCY        $
    >      > NLS_NCHAR_CHARACTERSET   AL16UTF16
    >      > NLS_COMP                 BINARY
    >      > NLS_LENGTH_SEMANTICS     BYTE
    >      > NLS_NCHAR_CONV_EXCP      FALSE
    >      >
    >      >
    >      > 19 rows selected.
    >      >
    >      >
    >      > NLS_LANG is not set on AIX machine (client machine-Oracle 9.2.0.1)
    >      > LANG parameter earlier was en_US for all
    >      >
    >      > i tried setting NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P15 on both
    >      > oracle user and myapp user
    >      > now i set (for oracle user , myapp user) it to en_US.8859-15
    >      >
    >      > Even then no success. what else configuration could be revised?
    >      >
    >      > Regards, Ankit
    >      >
    >      > On 5/10/07, *Addison Phillips* <addison@yahoo-inc.com> wrote:
    >      >
    >      >     One of the locales Ankit lists is:
    >      >
    >      >       en_US.8859-15
    >      >
    >      >     Any of the 8859-15 locales should be able to display the
    >      >     character, but that is only part of the problem.
    >      >
    >      >     It is important to ensure that the character in question can
    >      >     survive the whole round trip to and from the database. The
    >      >     terminal display locale is only a small part of this.
    >      >
    >      >     First, check the encoding used by the Oracle database:
    >      >
    >      >     SELECT * FROM V$NLS_PARAMETERS;
    >      >
    >      >     Look for NLS_CHARACTERSET in the results. And then look at the
    >      >     connection character encoding used by the client (usually
    >      >     part of the NLS_LANG environment variable, such as
    >      >     "AMERICAN_AMERICA.WE8ISO8859P15").
    >      >
    >      >     These all need to be compatible. For example, if your
    >      >     database uses the encoding AL32UTF8 (that is, UTF-8), then the
    >      >     database can store the characters you want---encoded as UTF-8.
    >      >     If your connection encoding is WE8ISO8859P15 (that is, ISO
    >      >     8859-15), then you will get bytes consistent with that
    >      >     encoding in your queries---the database converts them to/from
    >      >     UTF-8 for storage using the specified encoding. It is very
    >      >     likely that your connection encoding is set to ISO 8859-1,
    >      >     though, and this is a possible source of your woes (otherwise,
    >      >     instead of inverted question marks, you'd see random garbage
    >      >     bytes until you set your locale to use 8859-15).
    >      >
    >      >     Oracle does conversions to match the client character set. If
    >      >     you choose "WE8ISO8859P1" as the encoding, the oe ligature
    >      >     character will be converted to the substitution character,
    >      >     regardless of the locale you set. So be sure you set all of
    >      >     your settings (locale, NLS_LANG, database encoding) to match
    >      >     or at least support the characters you need.
    >      >
    >      >     Note that there is no requirement to use a French or
    >      >     France-specific locale to do this. Those locales are necessary
    >      >     only if you want French behavior such as date formats, number
    >      >     formats, sorting, and so forth. For character display all you
    >      >     need is the correct character encoding.
    >      >
    >      >     Best Regards,
    >      >
    >      >     Addison
    >      >
    >      >     ====
    >      >
    >      >     For further reading, check out the Oracle Globalization Guide,
    >      >     especially Chapter 3:
    >      >
    >      >     http://tinyurl.com/kwyql
    >      >
    >      >     You might also find this configuration guide to be of some
    >      >     small use:
    >      >
    >      >     http://www.inter-locale.com/whitepaper/learn/learn_to_type.html
    >      >
    >      >     --
    >      >     Addison Phillips
    >      >     Globalization Architect -- Yahoo! Inc.
    >      >
    >      >     Internationalization is an architecture.
    >      >     It is not a feature.
    >      >
    >      >     Philippe Verdy wrote:
    >      >      > I see that you are trying to store exactly the list of non
    >      >      > ASCII letters needed for writing French, but your system does
    >      >      > not list support for any French locale, only English US and
    >      >      > default POSIX…
    >      >      >
    >      >      > Second: œ and Œ are not in any of the encodings supported in
    >      >      > the lists of locales.
    >      >      >
    >      >      > Yes they are not in ISO 8859-1 (only in ISO 8859-15 which
    >      >      > made a few changes for the Euro, and for French) but in
    >      >      > Windows-1252 (the Windows "ANSI" codepage for Western
    >      >      > European Latin).
    >      >      >
    >      >      > Maybe this will work if you install the locales for French…
    >      >      >
    >      >      >
    >      >      >
    >      >      >
    >      >      > ------------------------------------------------------------
    >      >      >
    >      >      > *From:* unicode-bounce@unicode.org
    >      >      > *On behalf of* Ankit Jain
    >      >      > *Sent:* Wednesday, 9 May 2007 15:06
    >      >      > *To:* unicode@unicode.org
    >      >      > *Subject:* Œœ on IBM AIX
    >      >      >
    >      >      >
    >      >      >
    >      >      > Hi All
    >      >      >
    >      >      >
    >      >      >
    >      >      > I am using IBM AIX version 5.1, oracle 9.2 client and
    >      >      > Oracle 10g server.
    >      >      >
    >      >      > I am passing the following characters: Àà Ââ Ææ Çç Éé Èè
    >      >      > Êê Ëë Îî Ïï Œœ Ôô Ùù Ûû Üü Ÿ ÿ. these characters get stored
    >      >      > in the database, but when i retrieve them, "Œœ " becomes
    >      >      > inverted question mark...
    >      >      >
    >      >      >
    >      >      >
    >      >      > I checked the locales of the IBM AIX and found the following:
    >      >      >
    >      >      >
    >      >      >
    >      >      > C
    >      >      > POSIX
    >      >      > (..)
    >      >      >
    >      >      > en_US
    >      >      > (….)
    >      >      >
    >      >      > en_US@alt.lftkeymap
    >      >      >
    >      >      >
    >      >      > One can see that there is not UTF-8 locale here.
    >      >      >
    >      >      >
    >      >      >
    >      >      > I suppose that "Œœ" character can be viewed using UTF-8
    >      >      > encoding.
    >      >      >
    >      >      > if that supposition is right, then how to install UTF-8
    >      >      > locale here or if i am wrong, how to retrieve them
    >      >      >
    >      >
    >      >
    >      >
    >      >
    >      > --
    >      > Regards/ Ankit
    > 
    >     --
    >     Addison Phillips
    >     Globalization Architect -- Yahoo! Inc.
    > 
    >     Internationalization is an architecture.
    >     It is not a feature.
    > 
    > 
    > 
    > 
    > -- 
    > Regards/ Ankit
    

