Re: Unicode block for programming related symbols and codepoints?

From: Jean-François Colson <jf_at_colson.eu>
Date: Sun, 08 Feb 2015 23:45:27 +0100

On 08/02/15 23:07, Alfred Zett wrote:
> Hi Jean-Francois Colson,
>
> I hope this doesn't mess up the mailing list.
>
>>>
>>> - Indentation codepoint, with no fixed defined graphical
>>> representation. For indentation based programming languages.
>>
>> That wouldn’t be compliant with existing languages and future
>> languages might use any existing character.
>
> This was for new languages. Creators of future languages mostly orient
> themselves on whatever is available and makes sense, so I may as well
> make this proposal, so they don't have to choose the half-assed
> workarounds they use now.

I need a few tens of characters for a conlang I’m developing. ☺

The problem is that Unicode only encodes characters which are actually
in use today or which have been used in the past. It doesn’t encode
characters which could perhaps be used in a hypothetical new programming
language in the future.

>
> Also, as long as there is stuff like
> https://github.com/sferik/active_emoji it still makes more sense.
>
>>> Because:
>>> -- specific clients may want to show it differently (for example as
>>> arrows, lines etc., using another color):
>>
>> Can’t good editors display tabs in a different color when required?
> Not as reliable and customizable as a special codepoint. For example
>
>>
>>> --- browsers could let the web page creator decide the visual
>>> representation (character and size) via CSS
>
> can't be done and on-the-fly copy and paste conversion with JavaScript
> is horrid and broken for security reasons.
> But it's an issue even in good editors. You need a lexing plugin that
> may or may not work. And the size and other factors are still fixed.
> After all, tabs have whitespace semantics and may appear anywhere in
> the text.
>
>>> --- the same with editors, independent from the actual font
>>> --- in case of visual impairment, the user could even change the
>>> acoustic representation if the editor allows it
>>> -- unlike a space symbol, it wouldn't need more than one character
>>> per indentation
>>> -- unlike tabs or space, it wouldn't be whitespace
>>> -- unlike normal arrow characters, one could customize the length in
>>> an editor and wouldn't have to insert extra spaces for a better
>>> visual appearance
>>>
>>> - A codepoint for string literal quotes, that would spare one the
>>> escaping.
>>
>> I rarely escape quotes.
>> In a text, I use ’ (U+2019) as an apostrophe and «»“”‘’ as quotes, so
>> I don’t need to escape them.
>> When I use PHP to generate some HTML code, I try to alternate single
>> and double quotes as much as possible. That way I rarely need to
>> escape them.
> OK, but that's just your scenario. With a language design from the
> past. With probably an editor from the past that allows non-Unicode
> encodings. In a better world, manually inserting code points would be
> a last resort.
>
> Imagine someone wants to make his text look as if it were written
> with a typewriter. Or something else.
>
>>
>>> - A statement separator symbol.
>>
>> To replace the semicolon in C and the languages based on its syntax?
> Again, for future uses. To be honest, this might sound questionable,
> but this could blur the line between visual line breaks and visual
> characters like semicolons.
> Line-break ended comments are separator ended comments.
> Of course, that's the least necessary of those three proposed
> characters, but I thought that, for the sake of completeness, it
> shouldn't be missing.
>
> Come to think of it, two sets of opening and closing block symbols
> couldn't hurt either. And a continue-after-linebreak symbol as well.
>
>>
>>> - Other ideas?
>>
>> Aren’t you trying to reinvent APL?
>>
> No. APL puts a lot of characters into your code that look alien and
> annoying to anyone except mathematicians and that are hard to input.
> In particular from the context.
>
> My proposal on the other hand - if implemented right - introduces some
> really intuitive looking and easy to input characters, because a bold
> arrow at the left doesn't need further explanation and your IDE of the
> future can easily place them when pressing tab in the right position.
> _______________________________________________
> Unicode mailing list
> Unicode_at_unicode.org
> http://unicode.org/mailman/listinfo/unicode

Received on Sun Feb 08 2015 - 16:46:38 CST

This archive was generated by hypermail 2.2.0 : Sun Feb 08 2015 - 16:46:38 CST