Re: locale-*independent* vi editor supporting UTF-8

From: Jungshik Shin
Date: Thu Nov 26 1998 - 06:30:18 EST

On Thu, 26 Nov 1998, Markus Kuhn wrote:

  Thank you for the answer.

> Jungshik Shin wrote on 1998-11-26 01:17 UTC:
> > I'm wondering if there's any vi (or clone) that supports UTF-8 in a
> > locale-*independent* way so that it can be used to edit UTF-8 files on a
> > system (mostly Unix) without a UTF-8 locale.
> No, there isn't. The only freely available UTF-8 editors are

   Of course, I know of both of them :-) and I've been using Yudit (and
have contributed a little to it) for a while. Yudit is nice (and its two
accompanying programs, uniprint and uniconv, are great), but it's obviously
not as feature-rich as vi or Emacs, so I was asking about UTF-8-enabled
vi clones.

> AFAIK, nobody is working on extending one of the many vi clones for
> UTF-8. I also do not know about work on making emacs UTF-8 capable,
> except that Richard Stallman has said that that would be a good thing to
> do. Feel free to make your contribution here!

  I guess you have received the following message as well (perhaps
after writing the above paragraph). For the sake of other Unicoders,
I'm enclosing below what was sent via the GNU Emacs-Unicode list.
I guess it's time for me to upgrade Emacs to 20.3 or later on my computer
(currently Emacs 20.2).

>Otfried I have made a small package that allows GNU Emacs 20.3 to read
>Otfried and write files encoded in UTF-8 (as far as that is possible).
>Otfried It is based on a small C program utf2mule that converts between
>Otfried UTF-8 and Emacs-Mule encoding (it supports most character sets
>Otfried supported by Emacs, with the exception of only Ethiopic,
>Otfried Tibetan, Indian, and the small Sisheng character set), and some
>Otfried Emacs lisp code that registers a new encoding "unicode-utf8"
>Otfried that is implemented by calling the external converter when
>Otfried loading/saving files.
>Otfried A small webpage with a description and everything you need to
>Otfried install it is at "".

> The changes necessary would not be too significant. The major change is
> that in order to count the number of characters in a string, you have to
> count the bytes with (x & 0xc0) != 0x80 (i.e., skip UTF-8 continuation
> bytes) instead of all bytes. The only other significant problem with
> UTF-8 is the [] operator in regular expressions, which currently
> assumes 1 byte = 1 character.
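The counting trick above can be sketched in a few lines of C. UTF-8 continuation bytes all have the bit pattern 10xxxxxx, so skipping bytes where (x & 0xc0) == 0x80 leaves exactly one counted byte per character. (The function name utf8_strlen is mine, just for illustration.)

```c
#include <stddef.h>

/* Count characters (code points) in a NUL-terminated UTF-8 string.
 * A UTF-8 continuation byte matches 10xxxxxx, i.e. (b & 0xC0) == 0x80;
 * every byte that is NOT a continuation byte starts a new character. */
size_t utf8_strlen(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For example, "caf\xC3\xA9" ("café") is five bytes but four characters, since 0xA9 is a continuation byte and is skipped.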

  That's what I thought, and Mark Leisher's ucdata would be sufficient
for the job (well, regular-expression handling would be beyond its
scope, for sure), but the author of one of several vi clones for the
Korean EUC encoding (EUC-KR) and JOHAB (another popular one-byte/two-byte
encoding for Korean, which encodes all modern complete *and*
_incomplete/partial_ syllables) claimed differently. I'll try to
figure out .....

     Jungshik Shin

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:43 EDT