From: Kent Karlsson (email@example.com)
Date: Tue Mar 28 2006 - 15:59:33 CST
Antoine Leca wrote:
> > Yes, and they already are. U+0308 COMBINING DIAERESIS vs. U+030B
> > COMBINING DOUBLE ACUTE. There is no "umlaut" character...
> I did use Umlaut to clearly (at least I thought) denote the characteristic
> German *feature*, NOT the codepoints.
For typeset modern German text DIAERESIS is consistently used (though
most often via precomposed letters).
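
As a minimal sketch of the "precomposed letters" point, assuming only
Python's standard unicodedata module (the variable names are illustrative):
the precomposed letter decomposes canonically to a base letter plus
U+0308, and U+030B is a separate character with its own name.

```python
import unicodedata

# Precomposed "u with diaeresis" as a single code point (U+00FC).
precomposed = "\u00fc"

# Canonical decomposition (NFD) splits it into base letter + combining mark.
decomposed = unicodedata.normalize("NFD", precomposed)
names = [unicodedata.name(ch) for ch in decomposed]

# Canonical composition (NFC) restores the precomposed letter.
round_trip = unicodedata.normalize("NFC", decomposed) == precomposed

# U+030B is a different combining character, not a variant of U+0308.
double_acute = unicodedata.name("\u030b")

print(names)
print(round_trip)
print(double_acute)
```

Whether a text stores the precomposed or the decomposed form, the two are
canonically equivalent; neither form encodes an "umlaut" as such.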
> > And m² is not at all the same as m2.
> I guess no, although I am not completely sure (particularly since I
> expect the second to read "m<SUP>2</SUP>" instead,
No. While that is a good approach in the general case (for arbitrary
power-of *math* expressions), I think it is a bad idea for the SI unit powers.
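
A small sketch of the code point difference between the two spellings,
again assuming only Python's standard unicodedata module: "m²" uses
U+00B2 SUPERSCRIPT TWO, which canonical normalization (NFC) keeps, while
compatibility normalization (NFKC) folds it to a plain digit.

```python
import unicodedata

si_unit = "m\u00b2"   # m followed by U+00B2 SUPERSCRIPT TWO
plain = "m2"          # m followed by U+0032 DIGIT TWO

# The two strings are different character sequences.
distinct = si_unit != plain

superscript_name = unicodedata.name("\u00b2")

# NFC leaves the superscript alone; NFKC replaces it with a plain digit.
nfc = unicodedata.normalize("NFC", si_unit)
nfkc = unicodedata.normalize("NFKC", si_unit)

print(distinct, superscript_name, nfc == si_unit, nfkc == plain)
```

So a process that applies compatibility normalization loses exactly the
distinction under discussion here; plain canonical normalization does not.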
> >> So, if the original encoder does NOT make a distinction in
> >> meaning between the two forms, why would Unicode require
> >> him to encode this difference at codepoint level?
> > How do you know if the "original encoder" makes the difference or
> > not?
> Because *I* am the original encoder, in this stanza. :-)
So you only read your own texts. Interesting... ;-)
> Because my feeling (in fact, my interpretation of the Unicode and ISCII
> description) is that the Indic codepoints are abstract characters, not
> those elements which combine in defined ways to produce some glyphic
> intermediate elements, which only remain to be actually drawn by the
> font, as it seems you are thinking.
I do not see why characters in Indic scripts should be more "abstract"
than for other scripts.
> I base that view, first on the fact that the virama concept forces a
> need for some abstraction layer (reordering, combination, so-called
> backstore, etc.) which is absent even from Thai, and even more from
> Western scripts; and secondly because of the underlying nature of the
> Brahmi-derived scripts, with the sounds associated, the sandhi
> phenomena, etc.
The "sounds associated" are completely and totally irrelevant.
Unicode encodes scripts, not sounds.
> when the author is supposed to add some precision; this is much like
> the character styles used in Western typography (rendered as HTML
> spanning styles, for example).
That does not apply to different spellings. I would not expect any
kind of style span (HTML or otherwise) to say "display 'š' as 'sh'".
Nor do I expect any acceptable font to have an "sh" glyph for "š".
> > I have a really hard time understanding why apparent spell changes
> > should be mediated by font changes for Indic scripts. It is not
> > done that way for any other script.
See reply above.
> If I want a rounded 'a' in Latin, I am required to select a font with
> such a design. Similar for a z or a J with descender, or a low-striked
> q. I do not expect to be forced to use the "alternative" codepoints,
> that have been added for special purposes, like U+0251 or U+0292, for
> an illustrative use where I do NOT want to add specific meaning.
> The difference here is that you are saying changing a
Will you please stop putting words in my mouth!
> z-shaped 'a' to a
> rounded one (etc.) is *not* a spelling change, while writing
> the i matra in
> one or other place *is*. My wild guess is that some Indians may see it
> exactly reversed...
Some characters do have overlapping glyph shapes. However:
*You* are saying that there are two "camps" (your word) for at least one
of the Indic scripts as to how to display some letters. That sounds very
much like a difference worthy of more than a font change. Likewise for
the changes in Indic writing that are referred to as "old orthography" vs.
"new orthography"; they are even CALLED spell changes, why not treat
them as such then?
> It is certainly such a difference (not purely aesthetic, I mean). See
> attached image.
I think that difference may be worthy of at least a ZWJ/ZWNJ...
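
A hedged sketch of what a ZWJ/ZWNJ distinction looks like at the code
point level. The attached image is not reproduced here, so the Devanagari
sequence KA + VIRAMA + SSA below is only an assumed stand-in, chosen
because the joiner conventions for it are well documented; it is not the
example from this thread. The code shows only that the three spellings
are distinct strings and that normalization keeps the joiners; how each
one is drawn depends on the font and the rendering engine.

```python
import unicodedata

# Assumed illustration: Devanagari KA, VIRAMA, SSA (not the thread's example).
KA, VIRAMA, SSA = "\u0915", "\u094d", "\u0937"
ZWJ, ZWNJ = "\u200d", "\u200c"

plain = KA + VIRAMA + SSA              # default: the font may form a conjunct
with_zwnj = KA + VIRAMA + ZWNJ + SSA   # requests a visible (explicit) virama
with_zwj = KA + VIRAMA + ZWJ + SSA     # requests a half form of KA

# Three different spellings in the backing store.
variants = {plain, with_zwnj, with_zwj}

# Canonical normalization does not strip the joiners.
survives_nfc = all(unicodedata.normalize("NFC", s) == s for s in variants)

joiner_names = [unicodedata.name(c) for c in (ZWJ, ZWNJ)]

print(len(variants), survives_nfc, joiner_names)
```

The point of encoding the choice this way is that it travels with the
text itself, rather than depending on which font happens to be selected.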
> Should emphasis be recorded as different Unicode codepoints?
> My reading was it should not...
No, and I did not say that.
> The best I can find is the acknowledgement (in the Indic OpenType
> specifications) that there is a need to distinguish two genuinely
> different "styles" in Uniscribe and related, one named "old style"
> encoded MAL as "language system", the other "reformed" encoded MLR.
That does not seem (to me) to be anywhere near the ideal way of
dealing with this.