Re: Concise term for non-ASCII Unicode characters

From: Daniel Bünzli <daniel.buenzli_at_erratique.ch>
Date: Tue, 29 Sep 2015 19:02:54 +0100

On Tuesday, 29 September 2015 at 18:30, Sean Leonard wrote:
> Uh...I think you mean U+007F? :)

Yes… see how easy it was to point out that the definition was wrong. It would have been just as easy if this were code and we were talking about a protocol whose specification used this notation rather than a new Unicode concept.

> Perhaps it's because I'm writing to the Unicode crowd, but honestly
> there are a lot of very intelligent software engineers/standards folks
> who do not have the "basic knowledge of the Unicode standard" that is
> being presumed. They want to focus on other parts of their systems or
> protocols, and when it comes to the "text part", they just hand-wave and
> say "Unicode!" and call it a day.

Introducing more terminology and jargon is not going to help in this case. Make the definitions as obvious as possible and strive for minimality in the exposed concepts.

> The fact that (modern implementations of) UTF-8 encoders and decoders are not supposed to process the surrogate code points (arbitrarily), for example, is a
> rather advanced topic

I wouldn't call this advanced knowledge; it is basic knowledge that any programmer dealing with Unicode text should have. FWIW, this [1] is the absolute minimum I think programmers should know about Unicode (the last section can be skipped, as it is specific to one programming language). It amounts to maybe 3 to 4 A4 pages. If your programmers are not able to grok this small amount of knowledge, hire better ones.
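
To make the point concrete, here is a rough sketch in Python of the ranges under discussion. The function names are made up for illustration and do not come from any specification or from this thread. It assumes "non-ASCII" means a Unicode scalar value above U+007F, and it relies on the fact that Python's strict UTF-8 codec refuses to encode surrogate code points.

```python
# Illustrative sketch only: the names below are hypothetical helpers,
# not terminology defined by the Unicode standard.
ASCII_MAX = 0x007F                           # last ASCII code point
SURROGATE_MIN, SURROGATE_MAX = 0xD800, 0xDFFF
UNICODE_MAX = 0x10FFFF                       # last Unicode code point


def is_scalar_value(cp: int) -> bool:
    """True for U+0000..U+D7FF and U+E000..U+10FFFF (no surrogates)."""
    return 0 <= cp <= UNICODE_MAX and not (SURROGATE_MIN <= cp <= SURROGATE_MAX)


def is_non_ascii_scalar_value(cp: int) -> bool:
    """Assumed reading of "non-ASCII": a scalar value above U+007F."""
    return is_scalar_value(cp) and cp > ASCII_MAX


def encodes_as_utf8(cp: int) -> bool:
    """Ask Python's strict UTF-8 encoder whether it accepts this code point."""
    try:
        chr(cp).encode("utf-8")
        return True
    except (ValueError, OverflowError):
        # chr() raises for out-of-range integers; the strict codec raises
        # UnicodeEncodeError (a ValueError subclass) for surrogates.
        return False
```

Checking the boundaries (U+007F, U+0080, U+D7FF, U+D800, U+DFFF, U+E000, U+10FFFF) against both `is_scalar_value` and `encodes_as_utf8` is a quick way to catch an off-by-one in a range definition, which is exactly the kind of mistake pointed out above.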

Best,

Daniel

[1] http://erratique.ch/software/uucp/doc/Uucp.html#uminimal
Received on Tue Sep 29 2015 - 13:03:53 CDT
