From: Philippe Verdy (firstname.lastname@example.org)
Date: Sat Mar 21 2009 - 12:10:32 CST
> [mailto:email@example.com] On behalf of Petr Tomasek
> Sent: Saturday, 21 March 2009 17:42
> To: Unicode@unicode.org
> Subject: Does OpenOffice 3.0 handle unicode?
> Can someone, please, confirm whether the new version of
> OpenOffice can handle unicode? OpenOffice 2.0 unfortunately
> can handle only the BMP, while I need characters from the SMP.
That's quite a stupid question: if OpenOffice can "handle" the BMP
characters, it means that it "handles" Unicode.
Apparently you are unaware that OpenOffice was designed with Unicode as
a goal, using file formats that require correct support of Unicode.
This support has always been part of the file format specifications (that
are based on XML files compressed within a zipped archive).
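To illustrate the "XML files compressed within a zipped archive" point, here
is a minimal sketch in Python. It does not read a real OpenOffice file: it
builds a tiny in-memory zip with a single UTF-8 XML member and reads it back.
The member name content.xml matches what OpenDocument packages use for the
document body, but the XML layout and the helper names (make_fake_package,
read_member) are hypothetical and heavily simplified.

```python
import io
import zipfile

# U+1D11E MUSICAL SYMBOL G CLEF is a character from the SMP (plane 1).
SMP_TEXT = "clef: \U0001D11E"


def make_fake_package(text: str) -> bytes:
    """Build an in-memory zip with one UTF-8 XML member (simplified layout)."""
    xml = f'<?xml version="1.0" encoding="UTF-8"?><doc><p>{text}</p></doc>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", xml.encode("utf-8"))
    return buf.getvalue()


def read_member(package: bytes, name: str = "content.xml") -> str:
    """Extract one member from the zip and decode it as UTF-8."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read(name).decode("utf-8")


package = make_fake_package(SMP_TEXT)
recovered = read_member(package)
print(SMP_TEXT in recovered)
```

Nothing in the zip or UTF-8 layers cares which plane a character belongs to,
so the SMP character comes back unchanged whether or not any font can show it.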
I can perfectly open Chinese documents containing characters from the SIP,
with OpenOffice (all versions, including those before 2.0).
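For readers unsure about the BMP/SMP/SIP terminology, the plane of a
character is simply its code point divided by 0x10000. A small Python sketch
(the names plane_of and plane_name are mine, for illustration only):

```python
# Planes 0, 1 and 2 are the BMP, SMP and SIP respectively.
PLANE_NAMES = {0: "BMP", 1: "SMP", 2: "SIP"}


def plane_of(ch: str) -> int:
    """Return the plane number (0..16) of a single character."""
    return ord(ch) >> 16


def plane_name(ch: str) -> str:
    """Return a short plane name, falling back to 'plane N' for other planes."""
    p = plane_of(ch)
    return PLANE_NAMES.get(p, f"plane {p}")


print(plane_name("A"))            # U+0041
print(plane_name("\U0001D11E"))   # U+1D11E MUSICAL SYMBOL G CLEF
print(plane_name("\U00020000"))   # U+20000, first CJK Extension B ideograph
```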
This is not a problem of the OpenOffice version but of support for displaying
the characters and scripts (for complex scripts) in the system's or
application's renderer. If you don't have any font for the SMP scripts you
want to render, all you'll get is a set of empty boxes.
But even in that case, OpenOffice will not destroy the document
if it contains sequences of characters that it cannot render with the
available fonts.
OpenOffice ships with a limited set of fonts, which do not cover all
characters and scripts found in Unicode. Complex scripts that require a
specific layout engine for correct rendering (because a simple one-to-one
mapping from a character to a glyph does not work as expected, or results in
very poor layout and missing contextual forms) will also need an upgrade,
either in your system or in your (MS/Open/Star-)Office application.
So, on the same system, if I can open a document containing non-BMP
characters with MS Office, I can just as well open it with OpenOffice (or Sun
StarOffice). Conversely, I can save a document with OpenOffice in the legacy
format supported by MS Office and open it in MS Office. This makes no
difference to the rendering and support of characters (there may be some
differences in the support of specific macros, advanced stylesheets, or
specific page layouts, but the text itself is not affected and is equally
readable in both applications).
Note that if you can already display the characters you want in a web
browser or in an email client, you'll be able to see them in an Office app.
The reverse is not always true: some texts that can be edited and displayed
correctly in an Office application may be rendered poorly or not at all in
your local web browser when converted to HTML. Nor does it hold if your
"Office" application is just a legacy Notepad or a similar application
designed for simple plain-text documents only.
For example, Notepad++ is one of those "advanced" editors that work even
worse than Notepad for characters outside the local "ANSI" legacy 8-bit code
page of Windows. It still does not really support Unicode internally; it just
contains an external converter to/from UTF-8, in a VERY lossy conversion
scheme. Its support for larger character sets is a bit better in the latest
version, but most of its tools are still not compliant and can only handle
characters that round-trip through the local ANSI code page (and some of them
only work correctly if this code page is a specific one, like 1250 or 1252).
It also does not work at all with the BiDi algorithm. It should not be used
to edit XML or HTML documents containing any RTL script or complex script
(some of its destructive actions are irrevocable and performed silently,
without any warning).
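To make "round-trip conversion with the local ANSI codepage" concrete, here
is a small Python sketch. It is not how any particular editor is implemented;
it only shows the kind of silent loss that happens when text is forced
through an 8-bit code page. The function names are hypothetical, and cp1252
is assumed as the local code page.

```python
def survives_round_trip(text: str, codepage: str = "cp1252") -> bool:
    """True if text can be encoded to the code page and decoded back unchanged."""
    try:
        return text.encode(codepage).decode(codepage) == text
    except UnicodeEncodeError:
        return False


def lossy_convert(text: str, codepage: str = "cp1252") -> str:
    """Force text through the code page, replacing unencodable characters."""
    return text.encode(codepage, errors="replace").decode(codepage)


latin = "caf\u00e9"                   # fits in code page 1252
hebrew = "\u05e9\u05dc\u05d5\u05dd"   # four Hebrew letters, not in 1252

print(survives_round_trip(latin))
print(survives_round_trip(hebrew))
print(lossy_convert(hebrew))
```

Once the replacement has happened there is no way to recover the original
letters from the result, which is why such a conversion is destructive when
it is applied without warning.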
By contrast, working on those XML or HTML documents in OpenOffice is very
safe: the fact that a character or string cannot be displayed as anything
other than empty boxes does not mean that OpenOffice will replace the
characters with others (of its choice) without notice. OpenOffice accepts and
respects the whole UCS (i.e. code points in the range U+0000..U+10FFFF),
possibly with restrictions on some of them (see the strict XML
specifications about permanently forbidden characters within this range: the
forbidden characters are most controls, like U+0000, or code points
permanently reserved as noncharacters, like U+FFFE or U+FFFF; there are not
many forbidden characters, and they do not include any unassigned code
points, because those may be assigned to valid characters at any time in the
future, or may already be assigned in a version unknown at the time when
your application was last written and delivered to
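The XML restriction mentioned above can be sketched in Python from the Char
production of XML 1.0, which allows tab, LF, CR, and the ranges
U+0020..U+D7FF, U+E000..U+FFFD and U+10000..U+10FFFF. The function name
is_xml10_char is mine; XML 1.1 uses somewhat different rules that are not
covered here.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)          # the only C0 controls allowed
        or 0x20 <= cp <= 0xD7FF        # excludes the surrogate range
        or 0xE000 <= cp <= 0xFFFD      # excludes U+FFFE and U+FFFF
        or 0x10000 <= cp <= 0x10FFFF   # all supplementary planes
    )


for cp in (0x0000, 0xFFFE, 0xFFFF, 0x0378, 0x1D11E):
    print(f"U+{cp:04X}", is_xml10_char(cp))
```

U+0378 is used here as an example of a code point that, to my knowledge, is
unassigned; the rule accepts it all the same, since it is defined on ranges
and not on the list of currently assigned characters.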
This archive was generated by hypermail 2.1.5 : Sat Mar 21 2009 - 12:14:41 CST