RE: Mixing languages on a Web site

Date: Fri Jun 30 2000 - 07:02:02 EDT wrote:

> 1) How do I convert Latin-* text to UTF-8 text?
> 2) How do I convert Shift-JIS text to UTF-8 text?

There are plenty of ways, including writing your own tools.
Latin-1 to UTF-8 conversion is completely algorithmic, because each Latin-1
byte has the same value as its Unicode code point, so it can be as simple as:

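        #include <stdio.h>
        void Latin1ToUtf8(void) {   /* stdin Latin-1 -> stdout UTF-8 */
            int c;
            while ((c = getchar()) != EOF) {
                if (c < 0x80)
                    putchar(c);                  /* ASCII: 1 byte */
                else {
                    putchar(0xC0 | (c >> 6));    /* lead: 110xxxxx */
                    putchar(0x80 | (c & 0x3F));  /* trail: 10xxxxxx */
                }
            }
        }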
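
Since every Latin-1 byte becomes at most two UTF-8 bytes, the output is never more than twice as long as the input. A buffer-based sketch of the same loop (the name latin1_to_utf8 is only illustrative) makes the arithmetic easy to check: é (0xE9) becomes 0xC3 0xA9, and ÿ (0xFF) becomes 0xC3 0xBF.

```c
#include <stddef.h>

/* Hypothetical buffer-based variant: converts n Latin-1 bytes from in to
   UTF-8 in out, which must hold at least 2*n bytes. Returns bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[o++] = c;                                  /* ASCII unchanged */
        } else {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));   /* 0xC2 or 0xC3 */
            out[o++] = (unsigned char)(0x80 | (c & 0x3F)); /* low six bits */
        }
    }
    return o;
}
```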

Other encodings (including Latin-2, etc.) cannot be converted without
looking up a conversion table.
Shift-JIS requires a three-step process:

- Decode Shift-JIS to get the binary ku/ten (row/column) JIS code;
- Convert JIS to Unicode by looking up a conversion table;
- Encode Unicode as UTF-8.
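
The three steps above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not a real converter: all function names are hypothetical, bytes below 0x80 are treated as plain ASCII (strictly they are JIS X 0201 Roman, where 0x5C is the yen sign), and the step 2 "table" is a stand-in covering only JIS rows 4 and 5 (hiragana and katakana), whose Unicode code points happen to be contiguous. Everything else becomes U+FFFD; a real tool would use a complete JIS X 0208 to Unicode mapping table.

```c
#include <stddef.h>

/* Step 1: split a Shift-JIS double-byte pair into JIS X 0208 ku/ten
   (row/cell, each 1..94). Returns 0 if the bytes are not a valid pair. */
static int sjis_to_kuten(unsigned s1, unsigned s2, int *ku, int *ten)
{
    int row;
    if (s1 >= 0x81 && s1 <= 0x9F)
        row = (int)(s1 - 0x81) * 2 + 1;
    else if (s1 >= 0xE0 && s1 <= 0xEF)
        row = (int)(s1 - 0xC1) * 2 + 1;
    else
        return 0;
    if (s2 < 0x40 || s2 == 0x7F || s2 > 0xFC)
        return 0;
    if (s2 >= 0x9F) {            /* second (even) row of the pair */
        *ku = row + 1;
        *ten = (int)(s2 - 0x9E);
    } else {                     /* first (odd) row; 0x7F is skipped */
        *ku = row;
        *ten = (int)(s2 < 0x7F ? s2 - 0x3F : s2 - 0x40);
    }
    return 1;
}

/* Step 2: JIS -> Unicode. ASSUMPTION: tiny stand-in for a real table,
   covering only row 4 (hiragana) and row 5 (katakana). 0 = unknown. */
static unsigned long kuten_to_unicode(int ku, int ten)
{
    if (ku == 4 && ten >= 1 && ten <= 83) return 0x3040UL + ten;
    if (ku == 5 && ten >= 1 && ten <= 86) return 0x30A0UL + ten;
    return 0;
}

/* Step 3: encode a BMP code point as UTF-8; returns the byte count. */
static size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Convert n Shift-JIS bytes to UTF-8; out needs room for 3*n bytes. */
size_t sjis_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        unsigned b = in[i];
        unsigned long cp;
        int ku, ten;
        if (b < 0x80) {                        /* treated as ASCII */
            cp = b; i += 1;
        } else if (b >= 0xA1 && b <= 0xDF) {   /* half-width katakana */
            cp = 0xFF61UL + (b - 0xA1); i += 1;
        } else if (i + 1 < n && sjis_to_kuten(b, in[i + 1], &ku, &ten)) {
            cp = kuten_to_unicode(ku, ten); i += 2;
            if (cp == 0) cp = 0xFFFD;          /* not in the stand-in table */
        } else {
            cp = 0xFFFD; i += 1;               /* invalid or truncated byte */
        }
        o += utf8_encode(cp, out + o);
    }
    return o;
}
```

For example, あ (Shift-JIS 0x82 0xA0) decodes to ku 4, ten 2, maps to U+3042, and is written as 0xE3 0x81 0x82. The 3*n output bound comes from half-width katakana, where one input byte expands to three UTF-8 bytes.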

It is much simpler to use ready-made tools. A great one is the UniConv demo
by Basis Technology:

> 3) How do I mark text as UTF-8?

In your <head> section:

        <meta http-equiv="content-type" content="text/html; charset=utf-8">

Theoretically, you don't need this: Unicode (UTF-16 or UTF-8) is the
default for the web. In practice, however, each browser behaves in a
slightly different way, so it is a good idea to use the explicit declaration.

> 4) Will people actually be able to SEE BOTH the Japanese AND
> the Turkish?

Yes, provided they have a UTF-8 enabled browser and a font with all
necessary glyphs.

If the text is just a language name, used as a link leading to a single
language version, a common alternative is to use pictures containing the
language name or a symbol for it (a flag, a map, etc.).

> 6) Is there a "Unicode Help" site so people like me don't
> have to post these questions on lists like these?

I think this mailing list is the proper place (if not, oooops!, it means I
have misused it for months...).
There are other mailing lists, not open to people like us, to discuss more
technical and political things.

_ Marco

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:05 EDT