three characters?

From: Roozbeh Pournader (roozbeh@sharif.edu)
Date: Tue Apr 24 2001 - 17:18:23 EDT


Hi all!
 
While trying to decipher the file format of an old word processor named
Zarnegar, probably the most widely used one here in Iran, I came across
three characters in its main codepage that are not encoded in Unicode.
I'm seeking advice on these:
 
1. The first one is an "Arabic Subscript Alef", used for marking the /i:/
sound in Korans and Arabic texts published in Iran. Although this is not
in a national charset, it's available in some codepages as old as 1990,
and can be seen even in school books. Writing a proposal is at the top of
my Unicode todo list.
 
2. The second is something like a "Combining Arabic Two Dots Below". It
is used for converting a "U+06CC Arabic Letter Farsi Yeh" to a "U+064A
Arabic Letter Yeh" (that has some uses, like quoting Arabic text). This is
a font hack for sure, and I'm not considering proposing it. But it may
have other uses; does anyone recommend it in any way?
 
3. The weirdest of all was that, after finding all the other dingbats and
odd shapes in Unicode, one was missing: a "White Square Containing White Small
Square" (compare with "U+25A3 White Square Containing Black Small
Square").
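
On item 2, the conversion I have in mind can be sketched in a few lines
of Python. This is only an illustration under assumptions: the private-use
code point U+E000 below is a hypothetical stand-in for Zarnegar's "two dots
below" mark (I am not giving its real codepage value here), and fold_yeh is
a name made up for this example.

```python
import unicodedata

FARSI_YEH = "\u06CC"   # ARABIC LETTER FARSI YEH
ARABIC_YEH = "\u064A"  # ARABIC LETTER YEH

# Hypothetical placeholder for the legacy "two dots below" mark.
# A private-use code point is used because the mark is not in Unicode.
LEGACY_TWO_DOTS_BELOW = "\uE000"


def fold_yeh(text: str) -> str:
    """Replace 'Farsi Yeh + legacy two dots below' with Arabic Yeh.

    A Farsi Yeh that is not followed by the mark is left untouched.
    """
    return text.replace(FARSI_YEH + LEGACY_TWO_DOTS_BELOW, ARABIC_YEH)


if __name__ == "__main__":
    for ch in (FARSI_YEH, ARABIC_YEH):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    converted = fold_yeh(FARSI_YEH + LEGACY_TWO_DOTS_BELOW)
    print(f"U+{ord(converted):04X}")
```

With a mapping like this the mark never needs a code point of its own,
which is part of why I am not considering a proposal for it.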
 
What should I do regarding this? Would you say that it can be encoded as
U+25AB U+20DE ("White Small Square"+"Combining Enclosing Square")? If so,
what is the reason U+25A3 doesn't have a decomposition of U+25AA U+20DE?
Also, what are the requirements for "Geometric Shapes" to be encoded? Should
they have some semantics in some text? Are they there because they were in
some legacy code set? Again, what are the guidelines for encoding such
things? What is the meaning of
 
        "Some symbols mark the transition between pictorial items and
         text elements; because they do not have a well-defined place
         in plain text, they are not encoded here." (TUS 3.0, p. 295)?
 
And after all, do you recommend writing a proposal? I do have a use case:
users will be happy when, some day, they can see their old documents
displayed exactly the same in a future browser, for example (it's also nice
to tell them that all their old documents are fully supported in Unicode).
They have had the character available for at least four years (the oldest
version of the application I have is dated 1996), and I'm sure I will be
able to find it in some magazines and books (perhaps even in some semantic
usages) ...
 
Does anyone see a strong case for writing a proposal? I need some
reasons to get started.
 
BTW, I would love to see a decomposition of U+25A3 into U+25AA U+20DE. It may
encourage people to implement combining enclosing marks. (Really?!) It
will also encourage me to do something more useful...
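
For anyone who wants to check the decomposition question directly, here is
a minimal sketch using Python's standard unicodedata module. It assumes
whatever version of the Unicode Character Database ships with the
interpreter, and the helper names (describe, has_decomposition,
canonically_equivalent) are only for this example.

```python
import unicodedata

ENCLOSED = "\u25A3"        # WHITE SQUARE CONTAINING BLACK SMALL SQUARE
SEQUENCE = "\u25AA\u20DE"  # BLACK SMALL SQUARE + COMBINING ENCLOSING SQUARE
PROPOSED = "\u25AB\u20DE"  # WHITE SMALL SQUARE + COMBINING ENCLOSING SQUARE


def describe(text: str) -> list[str]:
    """Return the character name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


def has_decomposition(ch: str) -> bool:
    """True if the character database lists a decomposition mapping for ch."""
    return unicodedata.decomposition(ch) != ""


def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    print(describe(ENCLOSED))
    print(describe(SEQUENCE))
    print(describe(PROPOSED))
    print(has_decomposition(ENCLOSED))
    print(canonically_equivalent(ENCLOSED, SEQUENCE))
```

Since U+25A3 carries no decomposition mapping, the precomposed character
and the two-character sequence are not canonically equivalent, so software
treats them as distinct strings even where they would render alike.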

--roozbeh


