"textels" (was: Default character encoding for each operating system?)

From: Janusz S. Bień <jsbien_at_mimuw.edu.pl>
Date: Thu, 15 Sep 2016 21:12:53 +0200

On Thu, Sep 15 2016 at 16:36 CEST, john.w.kennedy_at_gmail.com writes:


> In the new Swift programming language, which is white-hot in the Apple
> community, Apple is moving toward a model of a transparent, generic
> Unicode that can be “viewed” as UTF-8, UTF-16, or UTF-32 if necessary,
> but in which a “character” contains however many code points it needs
> (“e” with a stacked macron, acute accent, and dieresis is
> algorithmically one “character” in Swift). Moreover,
> e-with-an-acute-accent and e followed by a combining acute accent, for
> example, compare as equal. At present, the underlying code is still
> UTF-16LE.

For several years I have been using the name "textel" (text element, in
Polish "tekstel") for such objects. I use it mostly orally, in
presentations for my students, but I have also used it in writing, e.g. in
http://bc.klf.uw.edu.pl/118/, unfortunately without a proper
definition. I provided a rudimentary definition only in my
recent paper in Polish: http://bc.klf.uw.edu.pl/480/. It states simply
(on p. 69) "an elementary text element independently of its Unicode
representation" (meaning in particular composed vs. precomposed). I still
hope to formulate a more satisfactory definition sooner or later :-)

I think Swift confirms that such a notion is really needed.
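
A minimal sketch of the idea, written in Python rather than Swift and not
taken from either paper cited above. The function name textels() is
hypothetical, and the segmentation rule is an assumption made for
illustration: a textel is approximated as a base character plus the
combining marks that follow it, which is much cruder than the extended
grapheme clusters Swift uses (it ignores emoji sequences, Hangul jamo and
similar cases). Each textel is normalized to NFC, so the composed and
precomposed spellings give the same result.

```python
import unicodedata


def textels(text: str) -> list[str]:
    """Split text into approximate textels (hypothetical helper).

    A textel here is a base character plus any following combining marks,
    normalized to NFC so that its Unicode representation does not matter.
    """
    result: list[str] = []
    current = ""
    # Decompose first, so precomposed characters become base + marks.
    for ch in unicodedata.normalize("NFD", text):
        if current and unicodedata.combining(ch) != 0:
            # A combining mark attaches to the textel being built.
            current += ch
        else:
            # Any other character starts a new textel.
            if current:
                result.append(unicodedata.normalize("NFC", current))
            current = ch
    if current:
        result.append(unicodedata.normalize("NFC", current))
    return result


precomposed = "\u00e9t\u00e9"      # "été" with precomposed e-acute
decomposed = "e\u0301te\u0301"     # same word, e + combining acute

print(len(precomposed), len(decomposed))             # 3 5 (code points)
print(precomposed == decomposed)                     # False (raw strings)
print(textels(precomposed) == textels(decomposed))   # True
print(len(textels("e\u0304\u0301\u0308")))           # 1 (macron, acute, dieresis)
```

The raw strings differ in length and compare as unequal, while the lists
of textels are identical, which is the behaviour described for Swift
"characters" in the quoted message.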

Best regards


Prof. dr hab. Janusz S. Bien -  Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsbien@uw.edu.pl, jsbien@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/
Received on Thu Sep 15 2016 - 14:15:52 CDT