From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Wed Mar 29 2006 - 02:25:58 CST
On Tuesday, March 28, 2006 21:59Z, Kent Karlsson wrote:
> Antoine Leca wrote:
>>> Yes, and they already are. U+0308 COMBINING DIAERESIS vs. U+030B
>>> COMBINING DOUBLE ACUTE. There is no "umlaut" character...
>> I did use Umlaut to clearly (at least I thought) denote the
>> characteristic German *feature*, NOT the codepoints.
> For typeset modern German text DIAERESIS is consistently used (though
> most often via precomposed letters).
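For reference, the relation between the two combining marks named above and the precomposed letters can be checked with Python's standard unicodedata module. This is only a minimal sketch; the variable names are illustrative and not taken from the thread.

```python
import unicodedata

# The two combining marks named in the quoted text.
for cp in (0x0308, 0x030B):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")

# A precomposed letter and its canonical decomposition (NFD).
precomposed = "\u00E4"  # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = unicodedata.normalize("NFD", precomposed)
print([f"U+{ord(c):04X}" for c in decomposed])

# NFC recomposes the sequence into the precomposed letter again.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == precomposed)
```

Both spellings are canonically equivalent, so neither of them says anything about whether the mark is drawn as dots or as strokes.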
So, does it mean I am allowed to have/design a font that draws diaeresis as
two strokes (not dots), for example to give some script-style look? Or am I
And if I am, am I furthermore allowed to have some option which allows me to
select, at *presentation* level, the stroke vs the dots, for the same
Finally, if I am also allowed that, how is it different for the position of
the I matra in the rendering of Nagari conjunct NG.K.I ङ्कि?
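To make the Nagari case concrete, here is a small sketch (Python, standard unicodedata module only; the variable names are illustrative) that lists the codepoints behind NG.K.I in their stored order.

```python
import unicodedata

# NG.K.I in logical (stored) order: NGA + VIRAMA + KA + VOWEL SIGN I.
conjunct = "\u0919\u094D\u0915\u093F"

names = [unicodedata.name(c) for c in conjunct]
for c, n in zip(conjunct, names):
    print(f"U+{ord(c):04X} {n}")
```

The vowel sign is stored last, although it is drawn to the left; the stored sequence is the same wherever the renderer chooses to place it relative to the conjunct.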
>>> And m² is not at all the same as m2.
>> I guess no, although I am not completely sure (particularly
>> since I expect
>> the second to read "m<SUP>2</SUP>" instead,
> No. While that is a good approach in the general case (for arbitrary
> power-to *math* expression), I think it is a bad idea for the SI unit
"No" what? No to the idea to make the 2 a superscript in m2?
Or "no" to "I am not completely", in other words, you know that I am sure?
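As a side note on m² versus m2, the following sketch (Python, standard unicodedata module; variable names are illustrative) shows how the two strings relate under normalization. It does not settle which encoding is preferable for SI units.

```python
import unicodedata

unit = "m\u00B2"  # m followed by U+00B2 SUPERSCRIPT TWO

nfc = unicodedata.normalize("NFC", unit)    # canonical form: superscript kept
nfkc = unicodedata.normalize("NFKC", unit)  # compatibility form: folded to "m2"

print(nfc == unit, nfkc == "m2")

# U+00B2 carries only a compatibility (<super>) decomposition to DIGIT TWO.
print(unicodedata.decomposition("\u00B2"))
```

So the two strings are distinct under canonical equivalence and only coincide once compatibility distinctions are discarded.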
>>>> So, if the original encoder does NOT make a distinction in
>>>> meaning between the two forms, why would Unicode require
>>>> him to encode this difference at codepoint level?
>>> How do you know if the "original encoder" makes the difference or
>> Because *I* am the original encoder, in this stanza. :-)
> So you only read your own texts. Interesting... ;-)
I noticed the smileys, but... It happens that sometimes, I _also_ read my
texts, and then I _do_ know what the original encoder meant then. :-)))
My sentence was not to mean that the original encoder /always/ means that; it
was merely to point out that _when_ she means that, then... (both drawings
are "correct", at least in her eyes.)
> I do not see why characters in Indic scripts should be more "abstract"
> than for other scripts.
I was pointing out they should not be less abstract (for example, have an
impact on the neighbours, which characters from other alphabets do not
The real point here is how to define abstractness; it seems clear to me
that the notion is not easy to define across scripts, particularly since
there are critical differences between say alphabets and abugidas.
> The "sounds associated" are completely and totally irrelevant.
What is the basic difference between A (Α,А) and E (Ε,Е)? a sound difference
(in old Greek, whose exact difference I cannot explain).
What is the basic difference between क KA (ক,ਕ,ક,க,ක,က,ก,ក, etc.) and ख KHA?
a sound difference (in "Vedic", or whichever name you give to the
language spoken in the VIIth-Vth c. India), whose almost-exact difference
every Sanskrit student can explain.
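The letters listed in parentheses above are all separate codepoints, one per script, however similar some of them look. A minimal sketch to list them (Python, standard unicodedata module; the variable names are illustrative):

```python
import unicodedata

# Latin, Greek and Cyrillic capitals that usually share one glyph shape.
lookalikes = "A\u0391\u0410"
names = [unicodedata.name(c) for c in lookalikes]
for c, n in zip(lookalikes, names):
    print(f"U+{ord(c):04X} {n}")

# The KA letters quoted above, one per script.
ka_letters = "कকਕકகකကกក"
for c in ka_letters:
    print(f"U+{ord(c):04X} {unicodedata.name(c)}")
```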
> Unicode encodes scripts, not sounds.
I thought it was characters.
> Some characters do have overlapping glyph shapes.
And why could it *not* be the case for the Indian scripts?
Moreover, why should it be determined by the reading of a book published in
(And yes, I know quite a big number of the most skilled Indians *are* living
> *You* are saying that there are two "camps" (your word) for at least
> one of the Indic scripts as to how to display some letters. That
> sounds very much like a difference worthy of more than a font
I agree it sounds very much worth that.
However, TUS4, pp. 248 ff., describes those differences (without much
detail), but does not mention any difference in codepoints.
So as a result, I am confused when one says to me that such differences
SHALL be recorded at codepoint level.
> Likewise for
> the changes in Indic writing that are referred to as "old
> orthography" vs. "new orthography"; they are even CALLED spell
> changes, why not treat them as such then?
Because it was (not) how I was reading the Unicode/10646 Standards, at least
I am not saying this cannot be changed.
I am just saying that *if* that change had occurred (for example, in
the *current* [sic] version of the 5.0 Standard *wink*), I am not aware of it.
> That does not seem (to me) to be anywhere near the ideal way of
> dealing with this.
As I said, with Indic scripts encodings, I feel we are still in the
experimentation phase; stabilization will come later (or is beginning, I do
not know for sure).
It would be very lucky if the encoding reached the ideal state (in all
aspects) on this try.