RE: No Invisible Character - NBSP at the start of a word

From: Jony Rosenne (rosennej@qsm.co.il)
Date: Tue Nov 30 2004 - 00:36:30 CST

Next message: Allen Haaheim: "RE: Radicals and Ideographs"

Previous message: Doug Ewell: "Re: Relationship between Unicode and 10646"
In reply to: Peter Constable: "RE: No Invisible Character - NBSP at the start of a word"
Next in thread: Kenneth Whistler: "Re: No Invisible Character - NBSP at the start of a word"

> -----Original Message-----
> From: unicode-bounce@unicode.org
> [mailto:unicode-bounce@unicode.org] On Behalf Of Peter Constable
> Sent: Tuesday, November 30, 2004 1:20 AM
> To: Unicode Mailing List
> Subject: RE: No Invisible Character - NBSP at the start of a word
>
>
> > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]
> > On Behalf Of Jony Rosenne
>
...

>
> Jony, where you and I have had a different worldview is that, it seems
> to me, you view characters as encoding language, and I view characters
> as encoding letterforms; or, put another way, for you, text is
> necessarily linguistic, whereas for me text is text, independent of
> linguistic interpretation. To make this concrete, the fact that a qere
> sequence involves the vowel points of word A rather than word B is
> linguistically interesting, but irrelevant as far as encoding is
> concerned. If the displayed letterforms consist of a lamed with two
> vowel points, then the encoded character sequence IMO should be lamed
> with two vowel points -- and I would not consider that a hack.

When I look at the text, even with a magnifying glass, I do not see a Lamed
with two points. The displayed form, from my point of view, is a Lamed with
a single point and another point without a base character. The Hiriq is not
under the Lamed, it is between the Lamed and the Mem. The linguistic
approach is just the explanation; the displayed letterforms are quite clear.

Even when I look at old Latin manuscripts, which I did once again when I
visited the flea market in Milan a few months ago, they are not plain text
and they cannot be faithfully reproduced in Unicode without markup. Although
the nature of Hebrew manuscripts is different, I do not understand the
desire to make Hebrew different, and I cannot accept it if it makes the
computerized handling of Hebrew unnecessarily more complicated than it is
already.

To make it very clear: The use of CGJ approved by the UTC is fine by me, and
I have no objection to anyone using it, but it is not required for Hebrew,
and we do not have a standard plain text solution for Qere and Ketiv and for
Yerushala(y)im. Regarding the latter, the UTC discussion was based on a
mistaken or incomplete presentation of the problem. Yes, for those who need two
vowels for a single letter, CGJ would do it, but since this is not my
question, CGJ is not the answer. The hack needed here is an invisible base
character.
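
To make the difference between the encodings concrete, here is a minimal
Python sketch using only the standard unicodedata module. It is an
illustration under stated assumptions, not an agreed solution: the choice of
Patah as the Lamed's own vowel and of NBSP as the invisible base (taken from
the subject line of this thread) are assumptions, and the variable names are
hypothetical.

```python
import unicodedata

# Code points involved in the Yerushala(y)im example.
LAMED, FINAL_MEM = "\u05DC", "\u05DD"
PATAH, HIRIQ = "\u05B7", "\u05B4"   # assumption: Patah is the Lamed's own vowel
CGJ, NBSP = "\u034F", "\u00A0"      # assumption: NBSP plays the invisible base

# 1. Lamed carrying both points directly.
two_points = LAMED + PATAH + HIRIQ + FINAL_MEM
# 2. Lamed with both points, CGJ between them.
with_cgj = LAMED + PATAH + CGJ + HIRIQ + FINAL_MEM
# 3. Lamed with one point; the Hiriq sits on a separate invisible base.
with_nbsp = LAMED + PATAH + NBSP + HIRIQ + FINAL_MEM

def names(text):
    """Return the Unicode character names of a string, one per code point."""
    return [unicodedata.name(ch) for ch in text]

for label, seq in [("two points", two_points),
                   ("with CGJ", with_cgj),
                   ("with NBSP", with_nbsp)]:
    nfc = unicodedata.normalize("NFC", seq)
    status = "order kept" if nfc == seq else "reordered by normalization"
    print(label, "->", status)
    print("   ", names(nfc))
```

Hiriq has canonical combining class 14 and Patah has 17, so normalization
reorders the first sequence to Lamed, Hiriq, Patah, Final Mem. CGJ and NBSP
both have combining class 0, so the second and third sequences survive
unchanged; only the third gives the Hiriq a base character of its own.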

If anyone wants to use CGJ or any other Unicode characters that are not
included in the standard Hebrew subset (Unicode does not define subsets, but
other bodies do and implementers necessarily have to) to encode Hebrew
texts, they should do their users a favor and explain to them that they
require specific implementations, operating systems and fonts.

Jony

...

>
> Peter Constable


This archive was generated by hypermail 2.1.5 : Tue Nov 30 2004 - 00:36:54 CST