Re: texteditors that can process and save in different encodings

From: Doug Ewell <doug_at_ewellic.org>
Date: Wed, 17 Oct 2012 23:03:46 -0600

Philippe Verdy wrote:

> But the most basic converters between encodings (not syntax
> transformers such as converting characters into escape sequences for
> specific computer languages) should be integrated (this includes
> standard UTFs, notably UTF-8 and probably UTF-16,

So far so good.

> ASCII,

A strict subset of UTF-8, so no need to support this separately.
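The subset claim is easy to check. The Python sketch below is only an illustration: the function names are hypothetical, and Python's built-in codecs stand in for whatever converter an editor would use. It verifies that all 128 ASCII code units decode to the same code points under both codecs and re-encode to the same bytes. It also checks the property that makes this safe in practice: every byte of a multi-byte UTF-8 sequence is 0x80 or above, so no ASCII byte ever appears inside one.

```python
def ascii_is_utf8_subset() -> bool:
    """Return True if all 128 ASCII code units mean the same thing in UTF-8."""
    ascii_bytes = bytes(range(0x80))  # 0x00..0x7F: the whole ASCII repertoire
    as_ascii = ascii_bytes.decode("ascii")
    as_utf8 = ascii_bytes.decode("utf-8")
    # Same code points when decoding, same bytes when re-encoding.
    return as_ascii == as_utf8 and as_ascii.encode("utf-8") == ascii_bytes


def multibyte_avoids_ascii(text: str) -> bool:
    """Return True if every byte of each non-ASCII character's UTF-8 form is >= 0x80."""
    return all(
        byte >= 0x80
        for ch in text
        if ord(ch) >= 0x80
        for byte in ch.encode("utf-8")
    )


print(ascii_is_utf8_subset())
print(multibyte_avoids_ascii("café €"))
```

Because both checks hold, a pure-ASCII file is already valid UTF-8, and a UTF-8 reader needs no separate "ASCII mode".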

> and most probably ISO 8859-1,

People outside of the Americas and Western Europe might disagree with
this "obvious" default SBCS choice.

> and its Windows-1252 extension which replaces the deprecated C1
> controls from ISO 8859, as agreed now in HTML5 and most common
> practices;

C1 controls are deprecated in HTML5, probably in other versions of
HTML, and in XML. Even in 2012, other types of text files are
rumored to exist. Until C1 controls are formally deprecated in
ISO 6429 and/or ECMA-48, it is incorrect to declare them "deprecated"
in general.
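The practical stake is narrow but real. ISO 8859-1 and Windows-1252 agree on every byte except 0x80-0x9F: ISO 8859-1 puts the C1 controls there, while Windows-1252 puts mostly printable characters there. HTML5's encoding rules treat the "iso-8859-1" label as windows-1252. An editor that did the same for every file would silently misread any file that really uses C1 controls. The Python sketch below shows the difference. The function name is hypothetical. One detail is specific to Python's cp1252 codec: it leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so the sketch substitutes U+FFFD for them instead of raising an error.

```python
def decode_c1_byte(b: int) -> tuple[str, str]:
    """Decode one byte in 0x80..0x9F as ISO 8859-1 and as Windows-1252."""
    raw = bytes([b])
    latin1 = raw.decode("latin-1")  # always the C1 control U+0080..U+009F
    # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined;
    # errors="replace" turns those into U+FFFD instead of raising.
    cp1252 = raw.decode("cp1252", errors="replace")
    return latin1, cp1252


for b in (0x80, 0x85, 0x93, 0x81):
    latin1, cp1252 = decode_c1_byte(b)
    print(f"0x{b:02X}: latin-1 U+{ord(latin1):04X}  cp1252 U+{ord(cp1252):04X}")
```

For example, byte 0x85 is NEL (NEXT LINE) in ISO 6429, but it comes back as U+2026 HORIZONTAL ELLIPSIS when the file is read as Windows-1252.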

> this should also include the integrated support for local encodings
> that are already natively integrated in the OS for its legacy 8-bit
> encoding, which should be supported by using local OS APIs,

Step by step, this started with "the most basic converters" and has
evolved into something much more extensive. The .NET Framework supports
dozens of non-Unicode encodings. Once you go down this path, users will
reasonably expect your app to provide all kinds of character processing,
like CRLF conversion and \Uxxxx conversion and trailing-space stripping
and tab/space conversion and maybe normalization. This is the situation
we are in today.
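To show how quickly that list grows, here is a minimal Python sketch of such a post-processing step. Every function and option name in it is hypothetical; none is taken from any real editor. A few scope limits apply. It handles only the four-digit \uXXXX escape used by Java and JSON, not the eight-digit \UXXXXXXXX form or surrogate pairs. It treats only CRLF, CR and LF as line endings. Normalization is off unless requested.

```python
import re
import unicodedata
from typing import Optional

# Any line ending: CRLF, lone CR, or lone LF. (str.splitlines() would also
# split on NEL, VT, FF, U+2028 and others, which is itself a conversion.)
_NEWLINE = re.compile(r"\r\n|\r|\n")
# A Java/JSON-style \uXXXX escape; this sketch handles BMP code points only.
_U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def process_text(
    text: str,
    newline: str = "\n",
    tab_width: Optional[int] = 4,
    strip_trailing: bool = True,
    unescape: bool = False,
    normalization: Optional[str] = None,
) -> str:
    """Hypothetical editor post-processing step; every option is illustrative."""
    if unescape:
        # \uXXXX conversion: replace each escape with the character it names.
        text = _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    if normalization:
        # Unicode normalization, e.g. "NFC" composes e + U+0301 into U+00E9.
        text = unicodedata.normalize(normalization, text)
    lines = _NEWLINE.split(text)
    if tab_width:
        # Tab-to-space conversion, honoring tab stops every tab_width columns.
        lines = [line.expandtabs(tab_width) for line in lines]
    if strip_trailing:
        # Trailing-space stripping (spaces and tabs only).
        lines = [line.rstrip(" \t") for line in lines]
    # CRLF conversion: rejoin every line with one chosen line ending.
    return newline.join(lines)


print(repr(process_text("a\tb  \r\nc\r")))
print(process_text(r"caf\u00e9", unescape=True))
```

Each step is nearly a one-liner. Each one is also a user-visible setting with its own edge cases and defaults, and users expect that setting once the app starts converting encodings.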

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell  
Received on Thu Oct 18 2012 - 00:09:02 CDT