Re: Code point vs. scalar value

From: Philippe Verdy <>
Date: Thu, 19 Sep 2013 02:00:15 +0200

The UCD is the "Unicode Character Database", not the "Unicode Code Point
Database", and we have very frequently used the term "character
properties". The expression is also found outside TUS, in the names of many
APIs, even if their input is a code point, a "character" in the meaning
of the programming language, or one or two code units.

APIs exist that are not limited to taking ONLY code points as input;
frequently they also take pointers or references to streams of code units,
and they can return properties from them (even if this requires an internal
conversion of the input). This is what the standard "string" APIs have
always done in C, C++, Java, JavaScript, BASIC, COBOL, Fortran, Lisp,
Prolog, PHP, Ruby, Python, Pascal, Ada, SQL, Eiffel... and many of their
dialects (in fact probably every programming language we have ever heard of
that is capable of handling some text), and even in assembly languages.
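To make the "internal conversion" concrete, here is a minimal sketch in
Python of a property lookup that takes a stream of UTF-16 code units plus an
index rather than a code point. The function name category_at and its
signature are hypothetical, chosen only for illustration; the only library
call used is the standard unicodedata.category.

```python
import unicodedata

def category_at(units, i):
    """Return the General_Category of the code point starting at units[i].

    `units` is a sequence of 16-bit code units (ints). A well-formed
    surrogate pair is combined into one supplementary code point; a lone
    surrogate is looked up as its own code point.
    """
    u = units[i]
    # Lead surrogate followed by a trail surrogate: decode the pair.
    if 0xD800 <= u <= 0xDBFF and i + 1 < len(units):
        lo = units[i + 1]
        if 0xDC00 <= lo <= 0xDFFF:
            cp = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00)
            return unicodedata.category(chr(cp))
    # BMP code unit or unpaired surrogate: the unit is the code point.
    return unicodedata.category(chr(u))
```

With this sketch, category_at([0x0041], 0) gives "Lu", while a lone
surrogate such as category_at([0xD800], 0) still gets a property value,
namely "Cs".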

But none of them has been designed to take only "Unicode scalar values" as
input (this could conceivably exist in OO or functional programming, if the
language relies ONLY on strong type safety at compile time, to avoid
constant runtime checks of value ranges by means of internal debugging
assertions, extra return values, or events).
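As a rough illustration of the idea, the sketch below wraps a code point in
a dedicated type that is validated once, at construction. The names
is_scalar_value and ScalarValue are hypothetical, and Python performs the
check at runtime; a language with strong compile-time typing would enforce
the same distinction statically, so that functions accepting the type would
need no further range checks.

```python
def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value: a code point in
    0..0x10FFFF that is not a surrogate (0xD800..0xDFFF)."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)

class ScalarValue:
    """Hypothetical wrapper type: holds only valid scalar values."""
    __slots__ = ("value",)

    def __init__(self, cp: int):
        # The range check happens once, here, instead of in every API.
        if not is_scalar_value(cp):
            raise ValueError(f"not a Unicode scalar value: {cp:#x}")
        self.value = cp
```

An API declared to accept ScalarValue could then never be handed U+D800,
whereas an API taking a plain code point has to define a result for it.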

2013/9/19 Markus Scherer <>

> On Wed, Sep 18, 2013 at 3:52 PM, Philippe Verdy <> wrote:
>> But the UCD and contents of the standard text are listing... oh well...
>> only the so-called "character properties"
> Untrue. There are definitely code point properties, and surrogates have
> non-trivial property values for Block, Derived_Age, General_Category,
> Grapheme_Cluster_Break, and Line_Break.
> APIs for Unicode properties normally take Unicode code points.
Received on Wed Sep 18 2013 - 19:02:42 CDT
