RE: Continue: Glaring Mistake in the Code List of South Asian Script, Reply to Daug [sic] Ewell and Others

From: Doug Ewell <>
Date: Mon, 12 Sep 2011 07:18:47 -0700

First, Michael and Mark and others are correct: this is just a block
name, and is subject to stability restrictions, and as such will not be
changed.

Beyond that, though, as Christoph observed, Delex is taking several
different concepts and treating them as though they were the same, to

- alphabets
- scripts
- Unicode block names

(Let's ignore the alphabet/abjad/abugida distinction for now.)

An "alphabet" is the set of letters used to write a specific language,
such as the English or Italian or Bengali or Assamese languages. This
is what most people who are NOT professionally involved in writing
systems or character standardization are focused on. This is what
children learn in school.

There is an "English alphabet" which is not typically thought of as
including, say, the letter U+00C8 LATIN CAPITAL LETTER E WITH GRAVE,
except in loanwords. Likewise, the "Italian alphabet" is not typically
thought of as including U+0059 LATIN CAPITAL LETTER Y, except in
loanwords.
There is a "Bengali alphabet" which apparently does not include U+09F0
BENGALI LETTER RA WITH MIDDLE DIAGONAL, and an "Assamese alphabet" which
apparently does.

A "script" is a set of related letters that belong to a common writing
tradition. It is the superset of all alphabets that use some or all of
those related letters. This is what experts in writing systems care
about.
A script does not necessarily correlate 1-to-1 with any one alphabet.
Furthermore, the name of a script is not necessarily the same as that of
the "most important" or "most influential" or "most widespread" alphabet
belonging to that script, and it may not be the name of any alphabet at
all (e.g. "Cyrillic").

The English and Italian alphabets both belong to the Latin script. They
both contain, for example, the letter U+0045 LATIN CAPITAL LETTER E, and
it would be ridiculous to consider 'E' to be two different letters
because it is used to write both English and Italian. There is nothing
magic about the name "Latin script" here; it is just a name.

Likewise, the Bengali and Assamese alphabets both belong to the Bengali
script. There is nothing magic about the name "Bengali script" here; it
is just a name.
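
To make the alphabet-versus-script point concrete, here is a minimal
Python sketch. The two letter sets are simplified assumptions for
illustration (uppercase only, no accented forms), and the helper names
shared_letters and only_in are hypothetical. Only unicodedata.name comes
from the standard library.

```python
import unicodedata

# Simplified, illustrative letter sets (assumptions, not authoritative
# definitions of either alphabet).
ENGLISH_ALPHABET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
ITALIAN_ALPHABET = set("ABCDEFGHILMNOPQRSTUVZ")


def shared_letters(a, b):
    """Letters that both alphabets draw from the same script."""
    return sorted(a & b)


def only_in(a, b):
    """Letters used by alphabet a but not by alphabet b."""
    return sorted(a - b)


# 'E' is one character, no matter which alphabet is using it.
print("E shared:", "E" in shared_letters(ENGLISH_ALPHABET, ITALIAN_ALPHABET))
print("Only in English:", only_in(ENGLISH_ALPHABET, ITALIAN_ALPHABET))

# The character name carries the script name, not the name of any one
# language's alphabet.
print(unicodedata.name("E"))
print(unicodedata.name("\u09F0"))
```

Both alphabets are subsets of one script, and the character names say
"LATIN" or "BENGALI" regardless of which language is being written.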

A "Unicode block" is neither an alphabet nor a script. It is simply a
contiguous block of characters that are encoded next to each other, in
terms of code points.

Typically, a set of one or more Unicode blocks aligns more-or-less
closely with a script, but this is not always the case. For example,
the "Basic Latin" and "Latin-1 Supplement" blocks contain many
punctuation marks and other symbols that are used with many other
alphabets and scripts.
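
A block, by contrast, is nothing more than a code point range. The
sketch below assumes a hand-written table of three blocks (ranges as
published in the Unicode block list) and a hypothetical helper block_of;
Python's standard library does not expose block names, so the table is
typed in by hand here.

```python
import unicodedata

# Hand-copied subset of the Unicode block list: (first, last, name).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0980, 0x09FF, "Bengali"),
]


def block_of(ch):
    """Return the block name for ch, or None if it is not in the table."""
    cp = ord(ch)
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return None


# A letter, a punctuation mark, a symbol, and the Assamese ra.
for ch in ("E", "!", "\u00A9", "\u09F0"):
    print(f"U+{ord(ch):04X}", block_of(ch),
          unicodedata.category(ch), unicodedata.name(ch))
```

The exclamation mark and the copyright sign sit in "Latin" blocks without
being Latin letters at all, which is the sense in which a block name is
just a label on a range.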

There is nothing "nefarious" about the naming of the "Bengali" Unicode
block or any other block. It is JUST A NAME that was considered to be
representative of the script that its characters belong to. It makes
absolutely NO statement about the identity of the communities that speak
the Bengali or Assamese languages, or about their sovereignty, or
loyalty, or allegiance, or pride, or anything else. Unicode is not in
the business of doing that.

I'm sure I'm not the only one who resents being accused of favoring one
South Asian language over another, or being the unwitting pawn of those
who do. This will be my last post in this thread.

Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14 | | @DougEwell
Received on Mon Sep 12 2011 - 09:23:28 CDT
