Detecting UTF-8 Locale Question

From: Edward H Trager (ehtrager@umich.edu)
Date: Tue Mar 25 2003 - 12:01:11 EST

Next message: Kent Karlsson: "RE: Several BOMs in the same file"

Previous message: Doug Ewell: "Re: Several BOMs in the same file"
Next in thread: Noah Levitt: "Re: Detecting UTF-8 Locale Question"
Reply: Noah Levitt: "Re: Detecting UTF-8 Locale Question"
Reply: James H. Cloos Jr.: "Re: Detecting UTF-8 Locale Question"
Maybe reply: Muhammad Asif: "Re: Detecting UTF-8 Locale Question"
Reply: Otto Stolz: "Re: Detecting UTF-8 Locale Question"

Hope some of the gurus with programming experience who read this list can
provide me with some additional insight or pointers to resources about the
following (NOTE: I've already looked at Markus Kuhn's FAQ):

QUESTIONS:

(1) Is examination of the LC_CTYPE environment variable on UNIX-like
environments a sufficient way of detecting locale?

(2) Are there any UTF-8 competent terminals available for OS X/Darwin, or
should one just use an X-based terminal like mlterm or xterm?

(3) Aside from xterm or mlterm running under Cygwin, are there other UTF-8
competent terminals available on Win32? Which ones are "the best"?

I don't mind subjective responses regarding which are "the best"
terminals. For example, I would personally rank mlterm as much more
capable than xterm since it handles Arabic, Hebrew, and Indic scripts.

DETAILS:

I'm writing an interactive console-based program (i.e., started from
xterm, mlterm, or other terminal emulator) for UNIX-like environments
(this would include Cygwin on Win32 and Mac OS X/Darwin in addition to the
obvious other ones like Solaris and Linux) which will support just two
"locales": the ASCII subset of UTF-8, and UTF-8. That's it! For UTF-8,
initially the program will support plane 0 (BMP). Support beyond plane 0
probably won't ever be necessary.
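
To make the two supported cases concrete, here is a minimal sketch of how
text could be classified as ASCII-only or BMP-only once it has been decoded
to code points. The function names are hypothetical, chosen only for
illustration; they are not part of the program described here.

```python
def highest_plane(text):
    """Return the highest Unicode plane number used by any character in text.

    A code point's plane is its value divided by 0x10000; plane 0 is the BMP.
    An empty string is treated as plane 0.
    """
    return max((ord(ch) >> 16 for ch in text), default=0)


def is_ascii_only(text):
    """True if every character is in the ASCII subset (U+0000..U+007F)."""
    return all(ord(ch) < 0x80 for ch in text)


def is_bmp_only(text):
    """True if every character lies in plane 0 (U+0000..U+FFFF)."""
    return highest_plane(text) == 0
```

ASCII-only text is by definition also BMP-only, so a caller would test
is_ascii_only first and fall back to is_bmp_only.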

My initial plan for finding out about the current locale is that the
program will, at start up, look at the LC_CTYPE environment variable. If
that variable is defined and contains the substring "UTF-8" or regex-able
variants thereof (like "utf8" on Linux), then everything is fine. If the
variable is unset or does not match, the program prints a warning message
to the user suggesting they set the locale to a UTF-8 locale and provides
an example of how to do that. If the locale is not set properly, the
program still functions, but
of course any UTF-8 encoded data will not be displayed properly on the
terminal.
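
The start-up check described above could be sketched roughly as follows.
This is only an illustration of the plan, not a tested implementation: the
function names, the regular expression, and the example locale name
en_US.UTF-8 in the warning text are all assumptions. It deliberately looks
at LC_CTYPE alone, as in the plan, and does not consult LC_ALL or LANG.

```python
import os
import re

# Assumed pattern: accepts "UTF-8", "utf-8", "UTF8" and "utf8" anywhere
# in the value of LC_CTYPE.
_UTF8_RE = re.compile(r"utf-?8", re.IGNORECASE)


def lc_ctype_is_utf8(environ=os.environ):
    """Return True if LC_CTYPE is set and its value mentions UTF-8."""
    value = environ.get("LC_CTYPE", "")
    return bool(_UTF8_RE.search(value))


def startup_warning(environ=os.environ):
    """Return a warning string, or None if LC_CTYPE looks like UTF-8."""
    if lc_ctype_is_utf8(environ):
        return None
    # The locale name below is only an example; the right name depends on
    # which locales are installed on the user's system.
    return ("warning: LC_CTYPE does not name a UTF-8 locale; "
            "try, for example: export LC_CTYPE=en_US.UTF-8")
```

Passing the environment in as a mapping makes the check easy to exercise
without changing the real environment, for example
lc_ctype_is_utf8({"LC_CTYPE": "de_DE.utf8"}).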

(Of course, even if a locale *is* set to a UTF-8 locale, it doesn't
guarantee that UTF-8 data will be displayed properly because (1) glyphs
still may not be available in the fonts on the system (2) the terminal may
not handle the script properly (e.g., when I last checked, xterm didn't
handle Indic or RTL scripts)).

If anybody sees limitations to this approach (Actually, I'm hoping you
will!), please let me know.

This approach seems sufficient using xterm under Cygwin and mlterm on
Linux and OpenBSD, but I haven't got around to testing with Solaris yet.
There might, for example, be much better ways to do it on Cygwin/Win32
that I don't know about. Also, I don't have a clue how to do it on OS X.

This archive was generated by hypermail 2.1.5 : Tue Mar 25 2003 - 12:55:56 EST