RE: Œœ on IBM AIX

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Jun 01 2007 - 02:48:07 CDT

  • Next message: Addison Phillips: "Re: Œœ on IBM AIX"

    Did you install the Oracle package in your server that contains the additional mappings?

     

    Your database server is setup only to support Windows-1252 (named “WE8MSWIN1252” by Oracle) for [VAR]CHAR and UTF-16 (named “AL16UTF16” by Oracle) for N[VAR]CHAR.
    So it may just be installed with the single mapping of Windows1252 with Unicode, and will still not support ISO-8859-15 (“WE8ISO8859P15”). (Note that all Oracle servers also support UTF-8 and US-ASCII, even if this is not visible here, so the server would honor connection requests using UTF-8 and US-ASCII client settings).

    It is not enough to specify an alternate encoding in the client settings, given that Oracle will still need to convert your data to the database supported encodings before storage. So setting “WE8ISO8859P15” in your connection environment is ignored, and this is what is shown in your Oracle session. Despite you wanted to connect using ISO-8859-15, the session is open but is still using Windows-1252.

    As a consequence, the Euro symbol and “Œœ” characters must still be sent to the server using Windows-1252, or the client must convert the encodings itself (by installing similar mappings packages on the client side, so that the Oracle client can use it, when the conversion can’t be performed by the server itself. If the client doesnot have this package, it is still saying to the server that it uses Windows-1252.)

     

    Note also that the Euro symbol and “Œœ” are encoded DIFFERENTLY between ISO-8859-15 (where they are in the 0xA0-0xFF range) and Windows-1252 (they are in the 0x80-0x9F range which is assigned to C1 controls in ISO-8859-15), despite most characters are encoded identically within the 0xA0-0xFF range, and all characters are mapped identically in the 0x00-0x7F range (same mappings as US-ASCII). No C1 controls (mapped in the 0x80-9F range in all ISO-8859-* charsets) will be convertible to Windows-1252 so they will be replaced by quotation marks. And most of the characters, punctuations and symbols mapped within the 0x80-9F range in Windows-1252 will NOT be convertible to ISO-8859-15, with a few exceptions. The range 0xA0-0xFF of Windows-1252 is identical to the same range of ISO-8859-1 (Latin 1), but NOT to the same range of ISO-8859-15 (Latin 9, which includes some differences)

     

    The conversion between your desired ISO-8859-15 encoding and the Windows-1252 encoding currently supported by the server requires an intermediate mapping to Unicode of ISO-8859-15. Nothing indicates that this support ISO-8859-15 is present in your server or in your Oracle local client installation, or that the support of a Windows-1252 mapping is present on the client side.

     

    Note that Oracle Server indicates to the client that the requested mapping is not honoured, by returning the effective encoding it uses within the session parameters, but the Server will not reject the connection, as it assumes that the client will adapt itself to this situation by converting its data to Windows-1252 in the SQL requests or to UTF-16 for query parameters bound to a N[VAR]CHAR variable; note also that every text indicated in PL/SQL is still encodeded using the [VAR]CHAR mappings, not the N[VAR]CHAR mappings which can only be sent to the server through query parameter bindings.

     

    When you perform a SELECT query within an interactive (or shell) PL/SQL session, all the data displayed on screen will be displayed using the [VAR]CHAR semantics, except tabular data that is returned by the server through column bindings: it is the interactive client that formats the data on screen within columns, by converting the N[VAR]CHAR data into [VAR]CHAR.

    By default, a Oracle database is installed with just the minimum packages for encodings. For others, you need to load the extra packages on the server from a SYSTEM Oracle account.
    Oracle has an extensive documentation (a complete book) about its NLS support, and it documents the necessary extra packages that are needed to support more characters.

     

    So install the extra package for supporting the ISO-8869-15 in the system database and you’ll solve your problem: Oracle Server will then accept and honor client sessions specifying it, and will convert the characters itself.

     

    Note however that this will not change the query semantics, notably for the sort order and text match, which is still performed in all SQL queries using the server internal mappings and order. Independently of the fact that the server may still convert the sorted data by converting it to the charset used in the client session. This is important to understand because this may affect what the client expects to be distinct unique keys (if you wanted to store “œuf” or “Œuf” as distinct keys in some database table, the server will still see “?uf” as a single key and the server will sort it as “?uf”; on the opposite the server may already contain rows with distinct keys “œuf” and “Œuf”, and SQL queries will return them as two rows (and COUNT(*) will evaluate to 2) even if your client performing a select receives two rows apparently containing the same “?uf” (generated by the unmappable characters).

     

    Finally, note that the character used as a substitute for unmappable characters can be specified in the session NLS parameters. Personnally, I prefer maiking my Oracle applications using a C0 control for this substitute, rather than the default question mark.

     

    Philippe.

     

      _____

    De : unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] De la part de Ankit Jain
    Envoyé : vendredi 1 juin 2007 07:20
    À : unicode@unicode.org
    Cc : Addison Phillips
    Objet : Re: Œœ on IBM AIX

     

    Hi Addison

     

    My configuration is :
    1. IBM AIX version 5.1 with oracle 9.2 client. (my web application and my java programs running continously in the background on this machine)
    2. Oracle 10g server on windows 2003 Server Edition.
    3.
     
    I am saving the following characters: "Àà Ââ Ææ Çç Éé Èè Êê Ëë Îî Ïï Œœ Ôô Ùù Ûû Üü Ÿ ÿ" in the database,

    These characters are stored correclty in the Oracle database by my Web Application using page encoding 'ISO-8859-15'. I tested the database usign PL/SQL and found that it is correctly stored in the database.
    but my continously running java programs when retrieve these characters from the database, "€" and "Œœ " becomes inverted question mark...

    please find my settings and then tell me where i could be wrong

    SQL> select * from V$NLS_PARAMETERS
    PARAMETER VALUE
    ---------------------------------------------------------------- ----------------------------------------------------------------
    NLS_LANGUAGE AMERICAN
    NLS_TERRITORY AMERICA
    NLS_CURRENCY $
    NLS_ISO_CURRENCY AMERICA
    NLS_NUMERIC_CHARACTERS
    NLS_CALENDAR GREGORIAN
    NLS_DATE_FORMAT DD-MON-RR
    NLS_DATE_LANGUAGE AMERICAN
    NLS_CHARACTERSET WE8MSWIN1252
    NLS_SORT BINARY NLS_TIME_FORMAT HH.MI.SSXFF AM
    NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
    NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
    NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
    NLS_DUAL_CURRENCY $
    NLS_NCHAR_CHARACTERSET AL16UTF16
    NLS_COMP BINARY
    NLS_LENGTH_SEMANTICS BYTE
    NLS_NCHAR_CONV_EXCP FALSE
    19 rows selected.

    NLS_LANG was not set on AIX machine (client machine-Oracle <http://9.2.0.1> 9.2.0.1)
    LANG parameter earlier was en_US for all

    i tried setting NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P15 on both oracle user and myapp user
    and also i set (for oracle user , myapp user) it to en_US.8859-15

    Even after making no changes, i get inverted question marks.

    Hence please guide me what else configuration could be revised?

    Regards, Ankit



    This archive was generated by hypermail 2.1.5 : Fri Jun 01 2007 - 02:51:53 CDT