Re: Oxford proposes a leaner alphabet

From: Hans Aberg (
Date: Thu Apr 09 2009 - 08:09:40 CDT


    On 9 Apr 2009, at 08:47, William_J_G Overington wrote:

    >> In fact I would like to see a
    >> clear distinction between ASCII and Unicode so that
    >> characters like "&" if typed as text (i.e.
    >> Unicode) would NOT be interpreted as an ASCII interrupt
    >> character in HTML/Java/PHP/.... etc.
    > The problem is because the Unicode characters are used to mean
    > something other than the Unicode meaning in what is regarded as a
    > mark-up format.
    > There is a similar problem with XML where the Unicode < and >
    > characters are used to mean things other than the defined Unicode
    > meanings.
    > However, as I understand the situation, at least in former times -
    > maybe still now, the Unicode Technical Committee does not want to
    > encode anything which could be regarded as mark-up and simply states
    > that things considered as mark-up should be encoded using higher
    > level protocols.
    > A U-turn on this policy could be worth considering seriously. If an
    > "escape ampersand open" and an "escape semicolon close" and an "xml
    > bubble open" and an "xml bubble close" were encoded as regular
    > Unicode characters, then various edge effects could be resolved for
    > the future.

    This problem is one of computer language design rather than of the
    character set used. There is a tendency to treat certain character
    combinations (tokens) as reserved, context-independent symbols. For
    example, C++ introduced "<" and ">" as matching pairs (like
    parentheses) for templates. So one can write
       template<class T, class Comp> struct Sort {
         static void sort(std::vector<T>& v);
       };

       void f(std::vector<int>& vi) {
         Sort< int, Comparator<int> >::sort(vi);
       }

    Now, the problem is that one cannot write, as would be natural for
    nested pairs,
         Sort< int, Comparator<int>>::sort(vi);
    because ">>" is always lexed as a single token, the right-shift
    operator, regardless of context. This is a legacy of C syntax.
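    To make the tokenization point concrete, here is a minimal C++ sketch of
    a toy lexer. The function "lex" is hypothetical and far simpler than any
    real compiler front end; it only follows the longest-match ("maximal
    munch") rule that C-family lexers use. Because it always prefers the
    two-character operator, the closing brackets in "Comparator<int>>" come
    out as one ">>" token before any parser gets to see them.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy lexer (hypothetical, not a real C++ front end) applying the
// "maximal munch" rule: at each position it takes the longest operator
// that matches, so ">>" is always one token, whatever the context.
std::vector<std::string> lex(const std::string& src) {
    // Longer operators are listed first so that ">>" is tried before ">".
    static const std::vector<std::string> ops = {">>", "::", "<", ">", "(", ")", ",", ";"};
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }  // whitespace only separates tokens
        if (std::isalnum(c) || c == '_') {        // identifier or number
            std::size_t j = i;
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '_'))
                ++j;
            tokens.push_back(src.substr(i, j - i));
            i = j;
            continue;
        }
        bool matched = false;
        for (const auto& op : ops) {
            if (src.compare(i, op.size(), op) == 0) {  // op occurs at position i
                tokens.push_back(op);
                i += op.size();
                matched = true;
                break;
            }
        }
        if (!matched) tokens.push_back(std::string(1, src[i++]));  // any other character
    }
    return tokens;
}
```

    For example, lex("Comparator<int> >") yields two separate ">" tokens,
    while lex("Comparator<int>>") yields a single ">>"; the space is what
    lets a C++98 parser see two closing brackets.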

    But from the point of view of computer language design, such problems
    are easy to fix, and this one is on the agenda for the next C++
    revision (C++0x). But any such fix has to be reconciled with legacy
    code.
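    One way to fix it, sketched here under simplifying assumptions, is to
    make the parser context-sensitive: while it is inside a template
    argument list, a ">>" token is read as two closing ">" brackets. This is
    close to the rule in the C++0x drafts. The function name
    "split_closing_angles" is hypothetical, and the sketch assumes every "<"
    opens a template argument list, whereas a real parser only does so after
    a name it knows to be a template.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical post-lexing pass illustrating the context-sensitive fix.
// Simplifying assumption: every "<" opens a template argument list.
// A real C++ parser only does so after a name known to be a template.
std::vector<std::string> split_closing_angles(const std::vector<std::string>& tokens) {
    std::vector<std::string> out;
    int depth = 0;  // number of template argument lists currently open
    for (const auto& t : tokens) {
        if (t == "<") {
            ++depth;
            out.push_back(t);
        } else if (t == ">" && depth > 0) {
            --depth;  // closes the innermost argument list
            out.push_back(t);
        } else if (t == ">>" && depth > 0) {
            // Inside an argument list, read ">>" as two closing brackets.
            // If only one list is open, the second ">" falls back to the
            // enclosing context (e.g. a greater-than comparison).
            out.push_back(">");
            out.push_back(">");
            depth = depth >= 2 ? depth - 2 : 0;
        } else {
            // At depth 0, ">>" keeps its ordinary meaning as right shift.
            out.push_back(t);
        }
        assert(depth >= 0);  // invariant: never close more lists than were opened
    }
    return out;
}
```

    Applied to the tokens of "Sort< int, Comparator<int>>", the final ">>"
    becomes two ">" tokens, so the natural spelling parses, while "x >> 1"
    outside any argument list is left alone. The legacy cost is code that
    relied on ">>" meaning shift inside an argument list: under C++98 rules,
    "X< 16 >> 2 >" means X<4>, but under this rule it needs parentheses,
    "X< (16 >> 2) >".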

    So introducing special Unicode characters will not, as such, solve the
    problem of poor computer language syntax. And once a computer language's
    syntax has been fixed, it may be difficult to change, since changes
    may break legacy code. One then has to assess how much code will
    break, how important that code is, and how likely it is to be
    rewritten.


    This archive was generated by hypermail 2.1.5 : Thu Apr 09 2009 - 08:13:05 CDT