Re: Concise term for non-ASCII Unicode characters

From: Daniel Bünzli <daniel.buenzli_at_erratique.ch>
Date: Sun, 20 Sep 2015 20:57:10 +0100

On Sunday, 20 September 2015 at 18:59, Steve Swales wrote:
> Exactly. I think the reason that non-ASCII feels non-concise is that there is widespread confusion between ASCII and Latin-1/ISO 8859-1 (which in turn is widely confused with Windows-1252).

For this reason I usually use the term US-ASCII, which is the IANA name for the 7-bit ASCII character set [1].

Someone referring to the non-US-ASCII scalar values of Unicode would make precise sense to me. But then maybe Peter's very last suggestion is actually the most precise you can get.

Also, if you are talking about UTF-8, I would use the term "scalar values" rather than "characters" or "code points", since surrogates can't be encoded in UTF-8.
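
To make the distinction concrete, here is a minimal Python sketch of how these terms could be checked. The function names are hypothetical, chosen only for illustration; the numeric ranges are the usual ones (code points U+0000..U+10FFFF, surrogates U+D800..U+DFFF, US-ASCII U+0000..U+007F).

```python
def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value: a code point that is not a surrogate."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def is_non_us_ascii_scalar(cp: int) -> bool:
    """True if cp is a scalar value outside the 7-bit US-ASCII range."""
    return is_scalar_value(cp) and cp > 0x7F


def can_encode_utf8(cp: int) -> bool:
    """True if Python's strict UTF-8 encoder accepts the code point."""
    try:
        chr(cp).encode("utf-8")
        return True
    except ValueError:
        # chr() raises ValueError for out-of-range integers; the encoder
        # raises UnicodeEncodeError (a ValueError subclass) for surrogates.
        return False
```

With these helpers, can_encode_utf8(0xD800) is false even though U+D800 is a code point, which is why "scalar values" is the more precise term when UTF-8 is involved.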

Best,

Daniel

[1] http://www.iana.org/assignments/character-sets
Received on Sun Sep 20 2015 - 14:58:48 CDT
