> 1) How do I convert Latin-* text to UTF-8 text?
> 2) How do I convert Shift-JIS text to UTF-8 text?
There are plenty of ways, including writing your own tools.
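Latin-1 to UTF-8 is completely algorithmic, and it would be as simple as
(with c declared as an int and <stdio.h> included):
    while ((c = getchar()) != EOF)
        if (c < 0x80)
            putchar(c);                    /* ASCII: copy unchanged */
        else {
            putchar(0xC0 | (c >> 6));      /* lead byte: top 2 bits */
            putchar(0x80 | (c & 0x3F));    /* continuation: low 6 bits */
        }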
Other encodings (including Latin-2, etc.) cannot be converted without a
conversion table lookup.
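As a rough sketch of what a table-driven conversion could look like, here
is a deliberately tiny example. The names utf8_encode, latin2_to_ucs and
latin2_sample are made up for illustration, and the table holds only four
Latin-2 entries; a real converter would need the complete ISO 8859-2
mapping table. Bytes missing from the sample come back as U+FFFD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point up to U+FFFF as UTF-8; returns the byte count.
   (Surrogates are not rejected here; this is only a sketch.) */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(cp <= 0xFFFF);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Hypothetical sample: only four of the Latin-2 entries, for illustration. */
struct map_entry { unsigned char byte; uint16_t ucs; };

static const struct map_entry latin2_sample[] = {
    { 0xA1, 0x0104 },  /* LATIN CAPITAL LETTER A WITH OGONEK */
    { 0xA3, 0x0141 },  /* LATIN CAPITAL LETTER L WITH STROKE */
    { 0xB1, 0x0105 },  /* LATIN SMALL LETTER A WITH OGONEK */
    { 0xB3, 0x0142 },  /* LATIN SMALL LETTER L WITH STROKE */
};

/* Map one Latin-2 byte to a code point; U+FFFD if not in the sample. */
uint32_t latin2_to_ucs(unsigned char b)
{
    size_t i;
    if (b < 0x80)
        return b;  /* the ASCII half is identical */
    for (i = 0; i < sizeof latin2_sample / sizeof latin2_sample[0]; i++)
        if (latin2_sample[i].byte == b)
            return latin2_sample[i].ucs;
    return 0xFFFD;  /* REPLACEMENT CHARACTER: entry missing from the sample */
}
```

Converting a byte is then utf8_encode(latin2_to_ucs(b), buf). With a full
256-entry array indexed by the byte, the linear search would become a single
array access.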
Shift-JIS would require a 3-step process:
- Decode Shift-JIS to get binary ku/ten (row/column) encoded JIS;
- Convert JIS to Unicode by looking it up in a conversion table;
- Encode Unicode as UTF-8.
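The first two steps above might be sketched as follows. This is only an
illustration: sjis_to_kuten, kuten_to_ucs and jis_sample are invented names,
the code handles only two-byte sequences that map into JIS X 0208 (no
single-byte ASCII or half-width katakana, no vendor extensions), and the
lookup table contains just three entries where a real one has thousands.
The third step is ordinary UTF-8 encoding of the resulting code point and is
not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Step 1: two-byte Shift-JIS -> ku/ten, each in the range 1..94.
   Returns 0 on success, -1 if the bytes are not a valid two-byte pair. */
int sjis_to_kuten(unsigned char s1, unsigned char s2, int *ku, int *ten)
{
    int lead_ok  = (s1 >= 0x81 && s1 <= 0x9F) || (s1 >= 0xE0 && s1 <= 0xEF);
    int trail_ok = (s2 >= 0x40 && s2 <= 0xFC) && s2 != 0x7F;
    int j1, j2;

    if (!lead_ok || !trail_ok)
        return -1;

    /* Each lead byte covers two JIS rows; start from the odd row. */
    j1 = (s1 - (s1 <= 0x9F ? 0x71 : 0xB1)) * 2 + 1;
    /* Close the gap left by the unused trail byte 0x7F. */
    j2 = s2 - (s2 > 0x7F ? 1 : 0);
    if (j2 >= 0x9E) {   /* upper half of the trail range: even row */
        j2 -= 0x7D;
        j1 += 1;
    } else {
        j2 -= 0x1F;
    }

    *ku  = j1 - 0x20;
    *ten = j2 - 0x20;
    return 0;
}

/* Step 2: hypothetical three-entry sample of a JIS X 0208 table. */
struct kuten_entry { int ku, ten; uint16_t ucs; };

static const struct kuten_entry jis_sample[] = {
    {  1, 1, 0x3000 },  /* IDEOGRAPHIC SPACE */
    {  4, 2, 0x3042 },  /* HIRAGANA LETTER A */
    { 16, 1, 0x4E9C },  /* first kanji of JIS X 0208 */
};

/* Look up a ku/ten pair; U+FFFD if it is not in the sample table. */
uint32_t kuten_to_ucs(int ku, int ten)
{
    size_t i;
    for (i = 0; i < sizeof jis_sample / sizeof jis_sample[0]; i++)
        if (jis_sample[i].ku == ku && jis_sample[i].ten == ten)
            return jis_sample[i].ucs;
    return 0xFFFD;
}
```

For instance, the Shift-JIS bytes 0x82 0xA0 decode to ku 4, ten 2, which the
sample table maps to U+3042. A full table would normally be a 94 x 94 array
indexed by (ku - 1, ten - 1) rather than a searched list.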
It is much simpler to use ready-made tools. A great one is the UniConv demo
by Basis Technology:
> 3) How do I mark text as UTF-8?
In your <head> section:
<meta http-equiv="content-type" content="text/html; charset=utf-8">
Theoretically, you don't need this: Unicode (UTF-16 or UTF-8) is the
default for the web. In practice, however, each browser behaves in
a slightly different way, so it can be a good idea to use the explicit
declaration.
> 4) Will people actually be able to SEE BOTH the Japanese AND
> the Turkish?
Yes, provided they have a UTF-8 enabled browser and a font with all
If the text is just a language name, used as a link leading to a single
language version, a common alternative is to use pictures containing the
language name or a symbol for it (a flag, a map, etc.).
> 6) Is there a "Unicode Help" site so people like me don't
> have to post these questions on lists like these?
I think this mailing list is the proper place (if not, oooops!, it means I
have misused it for months...).
There are other mailing lists, not open to people like us, to discuss more
technical and political things.
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:05 EDT