RE: Devanagari

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Tue Jan 22 2002 - 11:25:55 EST


David Starner wrote:
> On Mon, Jan 21, 2002 at 02:20:17PM +0100, Marco Cimarosti wrote:
> > What this means in practice for website developers is:
> >
> > 1) SCSU text can only be edited with a text editor which properly decodes
> > the *whole* file on load and re-encodes it on save. On the other hand, UTF-8
> > text can also be edited using an encoding-unaware editor, although non-ASCII
> > text is invisible.
>
> True for users of Latin-based writing systems. Probably of little
> comfort to users of Indic or Chinese-based writing systems.

I was referring to the task of editing *source* files in HTML, XML, or other
computer languages and formats. Most of the time, programmers and webmasters
are interested in changing the "ASCII" part of the file (mark-up,
instructions), which is the part most likely to contain bugs to be fixed, or
to need changes unrelated to the linguistic content.
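
This works because UTF-8 never uses a byte below 0x80 inside a multi-byte
sequence, so an ASCII-only edit made at the byte level cannot land in the
middle of a non-ASCII character. A minimal sketch of that, assuming a made-up
HTML fragment and a hypothetical helper named edit_markup:

```python
# Hypothetical page fragment: ASCII mark-up around Devanagari content.
page = '<b>देवनागरी</b>'.encode('utf-8')

def edit_markup(raw: bytes) -> bytes:
    """Byte-level edit, roughly what an encoding-unaware editor does.

    Only ASCII byte sequences are searched for and replaced; the
    non-ASCII bytes are never decoded or inspected.
    """
    return raw.replace(b'<b>', b'<strong>').replace(b'</b>', b'</strong>')

edited = edit_markup(page)

# The result is still valid UTF-8 and the content is untouched.
text = edited.decode('utf-8')
print(text)
```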

Of course, the people in charge of writing the *content* need tools that
can display the actual characters. And this is true for users of Latin-based
writing systems as well: imagine writing in French or German with all
occurrences of é, è, ä, ö, ü, etc. transformed into pairs of funny bytes.
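
To make the "pairs of funny bytes" concrete, here is a small sketch. It
assumes the encoding-unaware editor shows each byte as one Latin-1 character;
that is only a stand-in, and the helper name is hypothetical.

```python
def as_seen_by_unaware_editor(text: str) -> str:
    """Encode as UTF-8, then display each byte as one Latin-1 character.

    Latin-1 is an assumption standing in for "one byte, one glyph".
    """
    return text.encode('utf-8').decode('latin-1')

sample = 'é è ä ö ü'
garbled = as_seen_by_unaware_editor(sample)
print(garbled)  # Ã© Ã¨ Ã¤ Ã¶ Ã¼

# Nothing is lost: reversing the two steps restores the original text.
restored = garbled.encode('latin-1').decode('utf-8')
```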

> Better to stick with editors that are aware of your encoding.

Of course. Provided that one exists on your platform, and that you are not
bound to development tools which don't support it.

> > 2) SCSU text cannot be built by assembling binary pieces coming from
> > external sources.
>
> It's not really designed for that. If you're assembling things, just
> run the output through a UTF-8 to SCSU converter.

Which translates to: SCSU is not appropriate for dynamic HTML pages, or for
encoding text inside any other kind of application.
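
For contrast, the UTF-8 side of this can be sketched in a few lines. UTF-8 is
stateless, so independently encoded fragments can be spliced as raw bytes; an
SCSU stream depends on encoder state (such as the currently selected window)
carried from one character to the next, so pieces encoded separately cannot in
general be glued together like this. The fragments below are invented for
illustration, and only the UTF-8 half is shown.

```python
# Hypothetical fragments from different sources, each already UTF-8 bytes.
header = '<p lang="hi">'.encode('utf-8')
body = 'नमस्ते'.encode('utf-8')
footer = '</p>'.encode('utf-8')

# Plain byte concatenation, with no decoding or re-encoding step.
page = header + body + footer

# Same bytes as encoding the fully assembled text in one go.
whole = ('<p lang="hi">' + 'नमस्ते' + '</p>').encode('utf-8')
print(page == whole)  # True
```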

More generally, SCSU is not appropriate as a text encoding, but only as a
compression method for documents in their final form.

Ciao.
_ Marco


