RE: Concise term for non-ASCII Unicode characters

From: Peter Constable <petercon_at_microsoft.com>
Date: Sun, 20 Sep 2015 19:24:14 +0000

Well, if the point is to refer to characters that would require two or more code units in UTF-8, then _accurate_ expressions would be, "Unicode characters beyond the Basic Latin block" or "Unicode characters above U+007F".

Peter
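Peter's point that "two or more UTF-8 code units" and "above U+007F" pick out the same set can be checked mechanically. The following is a minimal sketch that assumes the standard UTF-8 length ranges from RFC 3629. The function names `utf8_octets`, `is_above_ascii`, and `is_supplementary` are hypothetical and chosen only for illustration. The last one mirrors the Unicode "supplementary code point" range (U+10000 to U+10FFFF) that comes up later in the thread.

```python
# Hypothetical helpers illustrating the ranges discussed in this thread.

def utf8_octets(cp: int) -> int:
    """Octets UTF-8 needs for a code point (length ranges per RFC 3629)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    if cp <= 0x7F:
        return 1  # Basic Latin / US-ASCII
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4  # supplementary planes


def is_above_ascii(cp: int) -> bool:
    """True for U+0080..U+10FFFF, i.e. everything outside Basic Latin."""
    return 0x80 <= cp <= 0x10FFFF


def is_supplementary(cp: int) -> bool:
    """True for U+10000..U+10FFFF (Unicode 'supplementary code point')."""
    return 0x10000 <= cp <= 0x10FFFF


if __name__ == "__main__":
    for cp in (0x41, 0x7F, 0x80, 0xE9, 0x3042, 0xFFFF, 0x1F600, 0x10FFFF):
        # Cross-check against Python's own UTF-8 encoder
        # (surrogates are deliberately excluded from this sample).
        assert utf8_octets(cp) == len(chr(cp).encode("utf-8"))
        print(f"U+{cp:04X}: {utf8_octets(cp)} octet(s), "
              f"above ASCII={is_above_ascii(cp)}, "
              f"supplementary={is_supplementary(cp)}")
```

The predicate `is_above_ascii` never mentions UTF-8. That is the property Sean asks for below: a definition stated purely in terms of code points that happens to coincide with "needs more than one UTF-8 octet".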

-----Original Message-----
From: Steve Swales [mailto:steve_at_swales.us]
Sent: Sunday, September 20, 2015 11:00 AM
To: Phillips, Addison <addison_at_lab126.com>
Cc: Peter Constable <petercon_at_microsoft.com>; Sean Leonard <lists+unicode_at_seantek.com>; unicode_at_unicode.org
Subject: Re: Concise term for non-ASCII Unicode characters

Exactly. I think the reason that non-ASCII feels non-concise is that there is widespread confusion between ASCII and Latin-1/ISO 8859-1 (which in turn is widely confused with Windows-1252).

-steve

Sent from my iPhone
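The confusion Steve describes can be made concrete with Python's standard codecs `ascii`, `latin-1`, and `cp1252`. ASCII rejects every byte above 0x7F. ISO 8859-1 and Windows-1252 agree on bytes 0xA0 to 0xFF but differ in 0x80 to 0x9F, where Latin-1 has C1 control codes and Windows-1252 has printable characters such as curly quotes. This is a minimal sketch, and the helper name `decode_all` is hypothetical.

```python
# Hypothetical helper: decode the same bytes with three codecs that are often
# confused with one another, recording None wherever decoding fails.
def decode_all(data: bytes) -> dict:
    results = {}
    for codec in ("ascii", "latin-1", "cp1252"):
        try:
            results[codec] = data.decode(codec)
        except UnicodeDecodeError:
            # ASCII fails on any byte above 0x7F; Python's cp1252 also fails
            # on its undefined bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D).
            results[codec] = None
    return results


if __name__ == "__main__":
    # 0x93/0x94 are C1 controls in Latin-1 but curly double quotes in Windows-1252.
    for codec, text in decode_all(b"\x93quoted\x94").items():
        print(codec, ascii(text))
```

A byte such as 0xE9 decodes to the same character ("é") under both Latin-1 and Windows-1252. That overlap is a large part of why the two are so easily mistaken for each other, and for "ASCII".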

> On Sep 20, 2015, at 10:05 AM, Phillips, Addison <addison_at_lab126.com> wrote:
>
> I agree, although I note that sometimes the additional (redundant) specificity of "non-7-bit-ASCII characters" is needed when talking to people unclear on what "ASCII" means.
>
> Addison
>
>> -----Original Message-----
>> From: Unicode [mailto:unicode-bounces_at_unicode.org] On Behalf Of Peter Constable
>> Sent: Sunday, September 20, 2015 9:52 AM
>> To: Sean Leonard; unicode_at_unicode.org
>> Subject: RE: Concise term for non-ASCII Unicode characters
>>
>> You have already been using "non-ASCII Unicode", which is about as
>> concise and sufficiently accurate as you'll get. There's no term
>> specifically defined in any standard or conventionally used for this.
>>
>>
>> Peter
>>
>> -----Original Message-----
>> From: Unicode [mailto:unicode-bounces_at_unicode.org] On Behalf Of Sean Leonard
>> Sent: Sunday, September 20, 2015 7:48 AM
>> To: unicode_at_unicode.org
>> Subject: Concise term for non-ASCII Unicode characters
>>
>> What is the most concise term for characters or code points outside
>> of the US-ASCII range (U+0000 - U+007F)? Sometimes I have referred to
>> these as "extended characters" or "non-ASCII Unicode" but I do not
>> find those terms precise. We are talking about the code points
>> U+0080 - U+10FFFF. I suppose that this also refers to code points/scalar
>> values that are not formally Unicode characters, such as U+FFFF.
>> Basically, I am looking for a concise term for values that would
>> require multiple UTF-8 octets if encoded in UTF-8 (without referring to UTF-8 encoding specifically).
>> "Non-ASCII" is not precise enough since character sets like Shift-JIS
>> are non-ASCII.
>>
>> Also a citation to a relevant standard (whether Unicode or otherwise)
>> would be helpful.
>>
>> The terms "supplementary character" and "supplementary code point"
>> are defined in the Unicode standard, referring to characters or code
>> points above U+FFFF. I am looking for something like those, but for
>> characters or code points above U+007F.
>>
>> Thank you,
>>
>> Sean
>
>
Received on Sun Sep 20 2015 - 14:25:14 CDT
