Re: Unicode block for programming related symbols and codepoints?

From: Alfred Zett <>
Date: Mon, 09 Feb 2015 13:55:02 +0100

OK, I will now try to answer all of you in one mail, otherwise it gets
hard to keep track...

Shervin Afshar:
> All of the requirements mentioned here can be (and are) implemented in
> higher levels of software (like IDEs). IMO, there isn't any need for
> adding new characters to Unicode to address these issues.
But then it would be incompatible from IDE to IDE, just as Python code is
incompatible when it mixes 2 spaces, 4 spaces and tabs.
It's the data that is important, not the software.
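
To illustrate the Python comparison, here is a minimal sketch, assuming
CPython 3. The helper name `compiles` is made up for this example. It
shows that a block is accepted with spaces only or tabs only, but
rejected when one line is indented with a tab and the next with spaces:

```python
def compiles(source: str) -> bool:
    """Return True if Python accepts the source, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:  # TabError is a subclass of IndentationError
        return False


spaces_only = "if True:\n    x = 1\n    y = 2\n"
tabs_only = "if True:\n\tx = 1\n\ty = 2\n"
mixed = "if True:\n\tx = 1\n        y = 2\n"  # one tab, then eight spaces

print(compiles(spaces_only), compiles(tabs_only), compiles(mixed))
```

The same visible layout is therefore valid or invalid depending on which
whitespace characters an editor happened to write.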
> Additionally, people tend to forget that simply because Unicode is
> doing emoji out of compatibility (or other) requirements, it does not
> mean that "now anything goes". I refer folks to TR51[1] (specifically
> sections 1.3, 8, and Annex C).
> [1]:
You know, the fact that this consortium ever took emoji into
consideration immediately justifies including everything everyone ever
wanted. There is no such thing as important data including emoji. :)

Jean-Francois Colson:
> I need a few tens of characters for a conlang I’m developing. ☺
Except two or three control characters don't make a con language.
Also, if you don't like con languages in Unicode, what's this:

> The problem is that Unicode only encodes characters which are
> effectively used today or which have been used in the past. It doesn’t
> encode characters which could perhaps be used in a hypothetical new
> programming language in the future.
So you want the font encoding scheme to be a limiting factor for new

Pierpaolo Bernardi:
> How would your proposed character be displayed as plain text?
There is no such thing as plain text.
Even line breaks and tabs are a matter of interpretation. It's just that
they usually have typographic semantics, even in programming editors,
with all the side effects.

In very simple (and with that I mean shitty or not even remotely
programming oriented) editors, it may show like a control character, like ␄.

Browsers and any editor passing the "based on scintilla" complexity mark
of course should display something that makes more sense, like an arrow
or ⍈ plus surrounding space.

> Unicode is a standard for plain text. If you require a special IDE
> for your programming language then why use plain text at all?
Because binary custom encoded databases or blob files are the death of

Konstantin Ritt:
> Easier than latin1, a layout one could find on [almost] every
> keyboard? Good luck.

Jean-Francois Colson:
> Hard to input? Not harder than the new symbols you’d like to propose.
> That’s only a matter of keyboard layout and input method.

Indent by pressing tab and insert the literal thing by pressing ".
Nothing changes, the IDE/editor does the work on the fly.
Just that you have clean semantics, interoperability and customizability.

Beat that, APL. Where you would need >10 key bindings or an annoying software

> I’ve never used APL so I don’t remember the meanings of its symbols,
> but couldn’t ⍘ U+2358 APL FUNCTIONAL SYMBOL QUOTE UNDERBAR or ⍞ U+235E
> APL FUNCTIONAL SYMBOL QUOTE QUAD work as “string literal quotes” in a
> new programming language?
That's a good idea.
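
A rough sketch of how that could look, assuming ⍞ (U+235E) is used as the
only string delimiter of some hypothetical language. The function name
`split_literals` and the two piece kinds are invented for illustration:

```python
QUOTE = "\u235e"  # ⍞ APL FUNCTIONAL SYMBOL QUOTE QUAD, used here as a delimiter


def split_literals(source: str) -> list[tuple[str, str]]:
    """Split source into ("code", text) and ("string", text) pieces."""
    parts = source.split(QUOTE)
    # An even number of delimiters yields an odd number of parts.
    if len(parts) % 2 == 0:
        raise ValueError("unterminated string literal")
    pieces = []
    for i, part in enumerate(parts):
        kind = "string" if i % 2 else "code"
        if part or kind == "string":  # keep empty strings, drop empty code
            pieces.append((kind, part))
    return pieces


example = 'print(' + QUOTE + 'He said "hi"' + QUOTE + ')'
print(split_literals(example))
```

Because the delimiter never occurs in ordinary text, the ASCII quote
inside the literal needs no escaping.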

That still leaves the indentation character, which is harder than that,
because one would want a control character with certain semantics.
E.g.: For programming editors it would make sense to only allow it after
line breaks and convert other occurrences into tabs.
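
As a sketch of that rule: the code below assumes a private-use code point
(U+E000) as a stand-in for the proposed indentation character, since no
such character is assigned, and treats only "\n" as a line break. The
names `INDENT` and `normalize` are hypothetical:

```python
INDENT = "\ue000"  # hypothetical stand-in (private-use code point), not a real assignment


def normalize(text: str) -> str:
    """Keep INDENT only in the leading run of each line; turn the rest into tabs."""
    out = []
    for line in text.split("\n"):
        body = line.lstrip(INDENT)            # everything after the leading run
        lead = line[: len(line) - len(body)]  # the leading run itself
        out.append(lead + body.replace(INDENT, "\t"))
    return "\n".join(out)


sample = INDENT * 2 + "a" + INDENT + "b\n" + INDENT + "c"
print(repr(normalize(sample)))
```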

> If the IDE inputs your new character when you press tab, then your new
> character is a tab…
Not if it detects the beginning of a line.
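
A minimal sketch of that detection, again assuming a private-use code
point as a stand-in for the proposed character. The handler name
`on_tab_key` and its arguments are invented, not taken from any editor:

```python
INDENT = "\ue000"  # hypothetical stand-in (private-use code point), not a real assignment


def on_tab_key(line: str, cursor: int) -> str:
    """Return the character an editor could insert when Tab is pressed."""
    before = line[:cursor]
    # "Beginning of a line" here means: nothing but INDENT left of the cursor.
    at_line_start = all(ch == INDENT for ch in before)
    return INDENT if at_line_start else "\t"


print(repr(on_tab_key("", 0)), repr(on_tab_key("x = 1", 5)))
```

The key is the same in both cases; only the stored character differs.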

Best regards

A. Z.

Unicode mailing list
Received on Mon Feb 09 2015 - 06:56:42 CST
