Re: Unicode block for programming related symbols and codepoints?

From: Alfred Zett <>
Date: Sun, 08 Feb 2015 23:07:52 +0100

Hi Jean-Francois Colson,

I hope this doesn't mess up the mailing list.

>> - Indentation codepoint, with no fixed defined graphical
>> representation. For indentation based programming languages.
> That wouldn’t be compliant with existing languages and future
> languages might use any existing character.

This was meant for new languages. Creators of future languages mostly
orient themselves on whatever is available and makes sense, so I may as
well make this proposal, so that they don't have to choose the
half-assed workarounds they use now.

Also, as long as there is stuff like it still makes more sense.

>> Because:
>> -- specific clients may want to show it different (for example as
>> arrows, lines etc., using another color):
> Can’t good editors display tabs in a different color when required ?
Not as reliably or as flexibly as with a special codepoint. For example,

>> --- browsers could let the web page creator let decide the visual
>> representation (character and size) via CSS

can't be done, and on-the-fly copy-and-paste conversion with JavaScript
is horrid and broken for security reasons.
But it's an issue even in good editors. You need a lexing plugin that
may or may not work, and the size and other factors are still fixed.
After all, tabs have whitespace semantics and may appear anywhere in
the text.
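
To illustrate the lexing point, here is a minimal Python sketch. It
assumes a hypothetical indentation codepoint, placed at U+E000 in the
Private Use Area purely for illustration; no such character exists in
Unicode today, and the names indent_level and render are made up for
this example.

```python
# Hypothetical indentation marker: U+E000 is a Private Use Area code
# point chosen only for this sketch; Unicode assigns no such character.
INDENT = "\uE000"

def indent_level(line: str) -> int:
    """Count leading INDENT markers; one marker means one level."""
    return len(line) - len(line.lstrip(INDENT))

def render(line: str, glyph: str = "→   ") -> str:
    """Show each leading marker with whatever glyph the client prefers."""
    level = indent_level(line)
    return glyph * level + line[level:]
```

Because the marker is not whitespace and cannot be confused with a tab
inside the text, the level is a plain count and needs no
language-specific lexer; the glyph and its width stay a display choice.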

>> --- the same with editors, independent from the actual font
>> --- in case of visual impairment, the user could even change the
>> accoustical representation if the editor allows it
>> -- unlike a space symbol, it wouldn't need more than one character
>> per indentation
>> -- unlike tabs or space, it wouldn't be whitespace
>> -- unlike normal arrow characters, one could customize the length in
>> an editor and wouldn't have to insert extra spaces for a better
>> visual imagery
>> - A codepoint for string literal quotes, that would spare one the
>> escaping.
> I rarely escape quotes.
> In a text, I use ’ (U+2019) as an apostrophe and «»“”‘’ as quotes, so
> I don’t need to escape them.
> When I use PHP to generate some HTML code, I try to alternate simple
> and double quotes as much as possible. That way I rarely need to
> escape them.
OK, but that's just your scenario, with a language design from the past
and probably an editor from the past that allows non-Unicode encodings.
In a better world, inserting code points manually would be a last resort.

Imagine someone wants to make his text look as if it were written on a
typewriter. Or something else.
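
A rough Python sketch of what I mean by sparing the escaping. The
delimiter here is an assumption: U+E001 from the Private Use Area
stands in for the proposed string-quote codepoint, and string_literals
is a made-up helper name.

```python
# Hypothetical string-literal delimiter: U+E001 is a Private Use Area
# code point used only as a stand-in for the proposed character.
QUOTE = "\uE001"

def string_literals(source: str) -> list:
    """Return the contents of all QUOTE-delimited literals in source."""
    parts = source.split(QUOTE)
    # n delimiters give n + 1 parts, so an even part count means an
    # odd number of delimiters, i.e. an unterminated literal.
    if len(parts) % 2 == 0:
        raise ValueError("unterminated string literal")
    # Odd-indexed parts are the text between an opening and closing mark.
    return parts[1::2]
```

Typewriter quotes and apostrophes inside the literal are ordinary
characters here, so nothing has to be escaped or alternated.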

>> - A statement separator symbol.
> To replace the semicolon in C and the languages based on its syntax?
Again, for future use. To be honest, this might sound questionable, but
it could blur the line between visual line breaks and visible
characters like semicolons.
Comments ended by a line break would then be comments ended by a
separator.
Of course, that's the least necessary of the three proposed characters,
but for the sake of completeness I thought it shouldn't be missing.

Come to think of it, two sets of opening and closing block symbols
couldn't hurt either. And a continue-after-linebreak symbol as well.

>> - Other ideas?
> Aren’t you trying to reinvent APL?
No. APL puts a lot of characters into your code that are hard to input
and that look alien and annoying to anyone except mathematicians. In
particular from the context.

My proposal, on the other hand - if implemented right - introduces some
really intuitive-looking and easy-to-input characters, because a bold
arrow at the left doesn't need further explanation and your IDE of the
future can easily place it when you press tab in the right position.
Unicode mailing list
Received on Sun Feb 08 2015 - 16:08:47 CST
