Re: Concise term for non-ASCII Unicode characters

From: Sean Leonard <>
Date: Tue, 29 Sep 2015 10:30:59 -0700

On 9/29/2015 9:40 AM, Daniel Bünzli wrote:
> I would say there's already enough terminology in the Unicode world to add more to it. This thread already hinted at enough ways of expressing what you'd like, the simplest one being "scalar values greater than U+001F". This is the clearest you can come up with and anybody who has basic knowledge of the Unicode standard
Uh...I think you mean U+007F? :)
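To make that concrete: ASCII covers U+0000 through U+007F, so "non-ASCII" starts at U+0080, while U+001F is merely the last C0 control. Here is a minimal Python sketch. It assumes that "non-ASCII character" means a Unicode scalar value (any code point in U+0000..U+10FFFF except the surrogates U+D800..U+DFFF) above U+007F. The function name is hypothetical and not taken from any standard.

```python
# Hypothetical helper illustrating one reading of "non-ASCII Unicode
# character": a Unicode scalar value greater than U+007F.
def is_non_ascii_scalar(cp: int) -> bool:
    """Return True if cp is a scalar value greater than U+007F."""
    in_codespace = 0 <= cp <= 0x10FFFF      # U+0000..U+10FFFF
    is_surrogate = 0xD800 <= cp <= 0xDFFF   # surrogates are not scalar values
    return in_codespace and not is_surrogate and cp > 0x7F


print(is_non_ascii_scalar(0x41))     # False: "A" is above U+001F but still ASCII
print(is_non_ascii_scalar(0x7F))     # False: U+007F (DELETE) is the last ASCII code point
print(is_non_ascii_scalar(0xE9))     # True: U+00E9 LATIN SMALL LETTER E WITH ACUTE
print(is_non_ascii_scalar(0xD800))   # False: a surrogate code point
print(is_non_ascii_scalar(0x1F600))  # True: a supplementary-plane scalar value
```

The surrogate exclusion is what separates "scalar values greater than U+007F" from the looser "code points greater than U+007F".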

Perhaps it's because I'm writing to the Unicode crowd, but honestly
there are a lot of very intelligent software engineers and standards
folks who do not have the "basic knowledge of the Unicode standard" that
is being presumed. They want to focus on other parts of their systems or
protocols, and when it comes to the "text part", they just hand-wave,
say "Unicode!", and call it a day. In particular, there is a flow-down
effect where terms from one standards body don't match those of another,
perhaps because they were redefined over time for various reasons. The
distinction between "characters", "abstract characters", "code points",
and "scalar values" is not intuitively obvious to people without
specialized knowledge of text processing. For example, the fact that
(modern implementations of) UTF-8 encoders and decoders are not supposed
to process surrogate code points is a rather advanced topic: it presumes
knowledge of how UTF-16 and UTF-8 interact, what surrogate code points
actually are, and the security implications of doing so (UTR #36).
Furthermore, one has to grasp the distinction between "well-formed" and
"ill-formed".
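For readers without that background, here is a short Python 3 illustration of the surrogate point. It reflects the behavior of CPython's built-in UTF-8 codec, not text quoted from the Unicode Standard, and the two helper names are hypothetical. The strict codec refuses to encode a lone surrogate code point. It also rejects, as ill-formed, the three-byte sequence that a permissive encoder would produce for one.

```python
def strict_utf8_encodes(text: str) -> bool:
    """Hypothetical helper: True if Python's strict UTF-8 codec can encode text."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


def strict_utf8_decodes(data: bytes) -> bool:
    """Hypothetical helper: True if Python's strict UTF-8 codec accepts data."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


lone_surrogate = "\ud800"          # code point U+D800: not a scalar value
surrogate_bytes = b"\xed\xa0\x80"  # 3-byte pattern a permissive encoder emits for U+D800

print(strict_utf8_encodes("\u00e9"))         # True: an ordinary non-ASCII scalar value
print(strict_utf8_encodes(lone_surrogate))   # False: surrogates are refused
print(strict_utf8_decodes(surrogate_bytes))  # False: ill-formed UTF-8
# The opt-in "surrogatepass" error handler restores the permissive behavior.
print(lone_surrogate.encode("utf-8", "surrogatepass") == surrogate_bytes)  # True
```

The fact that the permissive behavior requires an explicit opt-in is exactly the kind of detail that someone who just says "Unicode!" is unlikely to know about.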

In the twenty minutes since my last post, I received two different
responses, and as you pointed out, there are a lot of ways to express
what one would like. I would prefer one uniform way (hence,
"standardized way"). Just surveying the various standards that have
tried to tackle this distinction with their own organic terminology will
probably be revealing. Evidence should be the yardstick.

Best regards,

Received on Tue Sep 29 2015 - 12:32:43 CDT

