Re: Character found in national standard not defined in Unicode?

From: George W Gerrity
Date: Fri Apr 25 2008 - 10:30:29 CDT


    On 2008-04-25, at 19:47, JFC Morfin wrote:

    > At 02:41 25/04/2008, Asmus Freytag wrote:
    >> If the character doesn't violate a principle in the standard,
    >> there's no reason why it couldn't be encoded; however, if its
    >> presence in the standard is not correlated with it showing up in
    >> actual documents (for example, because of the way systems and fonts
    >> have implemented the standard) then there's perhaps no need to
    >> encode the character based on its presence in a code chart.
    >> On the other hand, perhaps the standard did base the design on a
    >> real character. If sufficient information can be assembled to
    >> define that character, it would open up an avenue to encode it,
    >> which would be independent of the standard.
    > This is the problem I have already reported: the difference we
    > encounter between the concepts of norms and standards. In French we
    > initially emphasised the norm, which describes the world, while in
    > English the emphasis was on the standard, which rules the world.
    > With globalisation, norms and standards are no longer locally
    > interoperable, with the standard influencing both the way the world
    > is and its normative description; instead, the norm is global and
    > the standard is local.

    To people writing specifications for Programming Languages, the
    difficulty of specifying meaning (or correct behaviour) in a Natural
    Language is well known. That is why specifications for newer
    Programming Languages are written in a meta-language, whose semantics
    and syntax are defined abstractly and Mathematically. SGML is
    powerful enough to act as a meta-language, and that is one reason why
    it has been used as a specification language for HTML, XHTML, and
    XML; there are others.
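    To see why a formally defined syntax leaves less room for argument
    than a natural-language description, consider a minimal sketch (a toy
    grammar of my own invention, not taken from any real specification):
    once the grammar Expr ::= Term ("+" Term)*, Term ::= digit+ is
    written down mathematically, any implementation can be checked
    against it mechanically.

```python
# Toy illustration of a formally specified grammar. The rule
#   Expr ::= Term ("+" Term)*   Term ::= digit+
# is stated precisely, so "does this string conform?" has a single
# answer, independent of the natural language of the reader.
import re

def parse_expr(s: str) -> bool:
    """Return True iff s conforms to Expr ::= Term ('+' Term)*."""
    return re.fullmatch(r"\d+(\+\d+)*", s) is not None

print(parse_expr("1+23+4"))  # True  -- conforms to the grammar
print(parse_expr("1+"))      # False -- trailing '+' has no Term
```

    The point is not the regular expression itself but that conformance
    is decided by the formal rule, not by anyone's reading of prose.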

    While you may have a valid argument that French is a more precise
    language to use for a Standard Specification, so that it may more
    precisely represent the Norm (i.e., the understanding or semantics of
    a specification), the idea of multi-linguistics (or is it multi-
    lateralisation?) is doomed to failure. The Semantics of different
    natural languages simply do not overlap completely: every natural
    language is capable of expressing ideas not expressible in some other
    language without reference to the culture and environment in which it
    is grounded. Subtleties in one language that cannot be mapped into
    another simply do not exist for the target language: that is always
    the dilemma of translators, especially when the work being translated
    is a cultural object such as a novel or a religious text.

    For instance, colour names in most languages simply do not map well,
    even between Indo-European Languages. Try to map Slavic or Greek
    colour words into English or French. We get around this in Scientific
    circles by using Psycho-Physical Language, in which colour is defined
    by physical measurements of colours whose difference can be perceived
    by normally-sighted humans. We even steal the Greek word for dark
    blue (Green?), κυανος, to name the blue-green colour we
    perceive “cyan”. The linguistic defect is even more obvious when
    trying to agree on colour with a person with Red-Green Confusion
    Colour Blindness. Thus, the best we can hope for in defining a
    standard is to be as precise as possible in conveying the normative
    meaning in whatever language the standard is written. For the cases
    where exact semantics are required, we must provide an algorithmic
    definition, preferably in a well-understood algorithmic language. If
    no universe of discourse is available to specify meaning, then there
    is not even any way to prove that meaning is possible to assign to a
    concept: it is a thought that cannot be uttered.
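    The psycho-physical escape from colour-naming can be sketched in a
    few lines. This is only an illustration: the sRGB triples below are
    illustrative stand-ins for true psycho-physical measurements (such as
    CIE coordinates), and the names are arbitrary dictionary keys, not a
    claim about any language's colour vocabulary.

```python
# Toy sketch: replace culture-bound colour names with numeric
# coordinates, so two speakers of different languages can compare
# colours by measurement rather than by translation.
import math

srgb = {  # illustrative sRGB triples, not an official standard
    "cyan":  (0, 255, 255),
    "blue":  (0, 0, 255),
    "green": (0, 128, 0),
}

def colour_distance(a: str, b: str) -> float:
    """Euclidean distance between two named colours' sRGB triples."""
    return math.dist(srgb[a], srgb[b])

# Whatever each speaker's language calls these colours, the numbers
# place "cyan" between "blue" and "green" unambiguously.
print(colour_distance("cyan", "blue"))   # 255.0
```

    A proper treatment would use a perceptually uniform space rather
    than raw sRGB, but the principle is the same: measurement replaces
    vocabulary.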

    I repeat: the problem of assigning semantics to a grammatical
    structure is a well-known one and one that in the case of semantic
    mapping between natural languages is known to be insoluble. That is
    why we try to use algorithmic or artificially-defined meta languages
    in parts of standards where natural language is too imprecise to
    specify semantics. That has always been the approach in the Unicode
    Standard. If the semantics is too subtle to express in an artificial
    language, then it is most certainly impossible to express in every
    natural language.

    As a postscript, I find your arguments and statements confusing,
    especially when you suggest that maybe Chinese is a better language
    for writing standards. My knowledge of Mandarin is pretty shallow,
    but in fact the spoken language is extremely vague in specifying
    classification, using the same sounds and tones for multiple
    meanings, so that modern Mandarin uses compound sounds to clarify
    meaning, although the written characters can be more precise. A simple
    example is the words for he and she: same sound, different
    characters. However, even the written language has perceived
    weaknesses (to those who speak Indo-European languages) because of
    its complete lack of inflection for both verb and noun forms, and
    indeed, the lack of distinction between verbs and nouns. We now
    perceive similar problems in Modern English because we have dropped
    most of our inflection apparatus. (How many meanings can you find in
    the simple English Sentence “Time flies like an arrow”?) Note that
    I am not stating that Mandarin is incapable of great subtlety of
    meaning, but rather that, as we now have to do in Modern English, so
    in Mandarin, we have to overload sentence structure and particles and
    context to provide subtlety in temporal and physical descriptions.

    > This has two main consequences:
    > - the alternative between standard internationalisation
    > (interoperability in using the same rules) and normative
    > multilateralisation (interoperability based upon the same
    > understanding);
    > - metalanguage development introducing analysis and often (as you
    > mention) leading to constraints, i.e. complication, to address
    > the resulting complexity.
    > If adding a code is subject to a metalanguage's limitations (for
    > example, because of the way systems and fonts have implemented the
    > standard), this means that Unicode is a Standard and not a Norm. The
    > ambiguity is that it is mostly understood as being both.

    See above. There is never any doubt in the minds of most standards
    users that they are Standards, not necessarily Norms. The intention
    is that the standard be readable to those people understanding the
    language in which it is written so as to convey the Norm (Semantics).
    If the text of the standard fails to do so, then it is either faulty
    and needs to be revised, or the possible differences in meaning are
    not cogent to the universe of discourse to which the standard is
    directed. The problem of which language to use for specifying a
    standard is clearly pragmatic: choose the language that the majority
    of the educated people of the world understand as either a first or
    second language. That language today is English; whether or not
    English is the best language for concise expression is at best moot,
    and in any case is totally beside the point.

    If then there is a need to translate the standard into another
    language, then clearly it should be done by someone who speaks both
    languages fluently and who is also fully conversant in the universe
    of discourse to which the standard is directed. I still remember with
    disdain dropping a class in Scientific Russian, taught by a Ukrainian
    with no scientific background whatsoever, after he translated
    “parsec” as the distance of a stellar or planetary object from the
    Earth at which the angle of parallax “under the feet of” the radius
    of the Earth's orbit is one arc second, or some such, rather than
    using the technical English terminology: the angle of parallax
    subtended by the radius of the Earth's orbit.
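    The correct definition is exactly the kind of thing that survives
    translation when stated algorithmically rather than in prose. A
    minimal sketch, assuming only the standard value of the astronomical
    unit:

```python
# One parsec: the distance at which one astronomical unit (the radius
# of the Earth's orbit) subtends a parallax angle of one arc second.
import math

AU = 1.495978707e11              # astronomical unit, in metres
one_arcsec = math.radians(1 / 3600)  # one arc second, in radians

parsec = AU / math.tan(one_arcsec)   # metres
print(f"{parsec:.4e} m")             # approx 3.0857e+16 m
```

    Stated this way, the definition needs no translator at all: any
    reader who can follow the arithmetic recovers the same number.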

    Finally, there is the question of whether or not a translation is in
    some sense equal or equivalent to the standard as written in the
    language of its conception. The short answer is obviously no, since
    it is usually the case that the persons proposing the original
    standard, who know all the subtleties, usually do not all speak the
    second language. That seems to be the position of ISO, and it is a
    reasonable one. Your idea of writing a standard simultaneously in
    more than one language simply isn't practical: you won't be able to
    find enough multilingual people qualified and interested in preparing
    such a standard.

    Dr George W Gerrity Ph: +61 2 156 0286
    GWG Associates Fax: +61 2 156 0286
    4 Coral Place Time: +10 hours (ref GMT)
    Campbell, ACT 2612 PGP RSA Public Key Fingerprint:
    AUSTRALIA 73EF 318A DFF5 EB8A 6810 49AC 0763 AF07

    This archive was generated by hypermail 2.1.5 : Fri Apr 25 2008 - 10:35:10 CDT