Re: Character found in national standard not defined in Unicode?

From: Asmus Freytag (
Date: Thu Apr 24 2008 - 19:41:25 CDT


    On 4/24/2008 2:53 PM, André Szabolcs Szelp wrote:
    > Oh,
    > it was not a real concern, I don't work with Armenian data, nor do I
    > read it, I just came across that piece of information and was
    > wondering why. I thought, best place to ask was here. Who else would
    > know, if not the people here? :-)
    > Also I was puzzled, because I thought there was a guideline to create
    > one-to-one mappings of pre-existing (including national) standards
    ...that aim was always limited to a certain more-or-less well-defined
    set. Especially for lesser-used standards, an attempt was made not to
    encode any characters that were questionable, whereas for really widely
    used standards even things that appear to be outright mistakes had a
    shot at being encoded. The thinking behind that was to get the maximum
    coverage of existing *data* with the smallest number of problematic
    characters added to Unicode.

    Over the years, experience and additional information that became
    available have led to a gradual adaptation to better reach the
    underlying goal. To a very limited extent, even standards created after
    Unicode was initially published have been covered. That's a tricky
    thing to do, because on the one hand, you don't want to exclude
    potentially large user communities that got used to characters in their
    standard, while at the same time, you don't want to make this a sure
    ticket to force characters into Unicode by going outside the process.
    > (that's why Dutch ij is included as a single character, while its
    > encoding as two characters is recommended, that's why the alphabetical
    > presentation forms of fi, fl etc. are included, ...)
    > So this does not hold?
    If the character doesn't violate a principle in the standard, there's no
    reason why it couldn't be encoded; however, if its presence in the
    standard is not correlated with it showing up in actual documents (for
    example, because of the way systems and fonts have implemented the
    standard), then there's perhaps no need to encode the character based
    on its presence in a code chart.
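
    As a small illustration of the compatibility characters mentioned in
    the quoted text (the Dutch ij and the fi/fl presentation forms), the
    sketch below uses Python's standard unicodedata module to show how
    NFKC normalization folds them to the recommended two-character
    sequences. The helper name compat_fold is made up for this example,
    and nothing here applies to the Armenian character under discussion,
    which has no Unicode code point to normalize.

```python
import unicodedata


def compat_fold(text: str) -> str:
    """Apply NFKC, which replaces compatibility characters with their
    compatibility decompositions (hypothetical helper for illustration)."""
    return unicodedata.normalize("NFKC", text)


# U+0133 (ij), U+FB01 (fi ligature), U+FB02 (fl ligature)
samples = ["\u0133", "\ufb01", "\ufb02"]
folded = [compat_fold(s) for s in samples]

for original, result in zip(samples, folded):
    # Print the code point, its formal name, and the folded sequence.
    print(f"U+{ord(original):04X} {unicodedata.name(original)} -> {result!r}")
```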

    On the other hand, perhaps the standard did base the design on a real
    character. If sufficient information can be assembled to define that
    character, it would open up an avenue to encode it, which would be
    independent of the standard.

    >> > (I indeed did not find the character in the Armenian block, but it
    >> > could hide somewhere among the dingbats (but if so without an
    >> > annotation saying "eternity sign")).
    > There isn't an exact match, but something in the U+274x range can
    > serve as a good approximation.
    > Leo
    If the standard is in use and if there's an indication that people are
    using this particular character, then the last thing we would want to do
    is to map it "approximately", especially not to something in the 274X
    range. That range, by design, was supposed to have somewhat lesser
    variability in glyph design than other blocks. But even without the
    special nature of this range, the damage from having mapped characters
    "approximately" (esp. ASCII characters) is still with us today.

    If this thing is real and someone can prove it, code it; if not, wait
    for users of the standard to speak up and say that they need it for
    compatibility.
