L2/00-026

 

 

These are 5 comments from Lloyd Anderson on ZWJ and ZWL in this document

 

ZWJ - Consistent Contexts

ZWJ contradictions DO REMAIN

ZWL still no new properties

? ZWJ "doubling" as ZWLigator

Fixing ZWJ contradictions

 

 

 

From: ECOLING@aol.com

Sent: Tuesday, January 25, 2000 12:03 PM

Subject: ZWJ - Consistent Contexts

 

Since Ken's discussion today does not deal with the following material,

and would almost lead a reader to think it did not exist, I repeat here

as a separate message my compilations of the contexts of use of ZWJ,

which together also assist in showing that the suggestion to completely

disregard ZWJ is an irregularity (an inconsistency) which leads to the

need for a countervailing unnecessarily elaborate sequence

 

ZWJ + ZWNJ + ZWJ

to get the results which a simple

ZWJ

would yield if the semantics are kept consistent with the basic statement

on Unicode 2.0 page 6-70 (the interpretation as "tiny letter"

still valid for Unicode 3.0 according to Ken).

 

***

 

 

4.  ZWJ is needed in several scripts (Devanagari, Arabic, even Latin)

     for its basic function as defined by Unicode 2.0 page 6-70.

     (Thus answering Mark's request for information.)

 

Mark asked on January 14th:

 

>If anyone has any actual evidence of frequent usage of

>ZWJ  between cursive characters,

>please let us know the details before the UTC meeting!

 

First, "frequent" is not an appropriate part of the question,

because by definition ZWJ and ZWNJ are used on

 

     "occasions where an author may wish to override the normal

     automatic selection of joining glyphs"

 

Since these are not intended for normal automatic situations,

they are really quite by definition rare occurrences.

 

Here is my summary of the distribution of occurrences,

using "before", "after", and "between" as contexts:

 

***

 

Arabic Script:

 

ZWJ before, ZWJ after, ZWNJ+ZWJ between,

(and fitting the pattern, it would be dangerous to prohibit

ZWJ between; these uses were foreseen in the basic definition)

 

Use before or after a cursively linkable character to show the

cursively linkable form in isolation (meta-commentary,

citation of forms for instructional purposes, etc.).

 

Use between in the combination ZWNJ + ZWJ for

Persian or Mongolian, before certain suffixes which look as if they

are joined to a word stem which does not look as if it is joined

to the suffixes.

 

Use between if one wishes to block a ligature but keep

a cursive linkage on *both* sides, not merely on the following

side as for the Persian and Mongolian special suffixes.

This was foreseen in the original definition of ZWJ,

and is considered more basic than the Persian and Mongolian

uses, hence requires only one code ZWJ not two ZWNJ+ZWJ.

Various writing systems based on Arabic

Script certainly differ in their obligatory and optional ligatures.

This can be handled to some degree by differing fonts,

so a font for Persian language will have different sets of

obligatory, optional, and rare ligatures than will a font for

Arabic language.  Etc.

It would be very bad design to abolish the capability of using ZWJ

to block a single ligature yet permit the cursive linking to remain

(consistent with the basic definition on page 6-70).
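
As a minimal sketch of the four Arabic contexts listed above, the sequences can be written out as code points in Python. The letters lam and meem are an illustrative choice (an assumption, not taken from the Unicode text), and the sketch only builds and names the character sequences; how they are rendered depends on the font and on protocols external to the text.

```python
import unicodedata

ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
LAM = "\u0644"   # ARABIC LETTER LAM (illustrative choice)
MEEM = "\u0645"  # ARABIC LETTER MEEM (illustrative choice)

# The four contexts of use discussed for Arabic script.
contexts = {
    "ZWJ before": ZWJ + MEEM,                     # linking form cited in isolation
    "ZWJ after": LAM + ZWJ,                       # linking form cited in isolation
    "ZWNJ+ZWJ between": LAM + ZWNJ + ZWJ + MEEM,  # Persian-style suffix case
    "ZWJ between": LAM + ZWJ + MEEM,              # link on both sides, no ligature
}

def describe(text):
    """List each character of text as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

for label, seq in contexts.items():
    print(label, describe(seq))
```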

 

***

 

Devanagari:

 

ZWJ after

ZWJ between

 

Use of ZWJ after a sequence of Consonant + Virama

to render the linked half-form of the Consonant, as if another

Consonant were following, but when there is none such.

Exactly analogous to the Arabic case.

 

Uses of ZWJ before a YA might be a method of

indicating the special combining form of YA after another

consonant plus Virama, but without the preceding consonant

or Virama present.  Certainly for some Indic alphabets

for which the normal rendering of /ky/ involves a change

in the <y> rather than in the <k>.  Generally for Telugu

and Kannada subscript consonants this might be done.

Less obvious for Devanagari, but citation of isolated

subscript forms there too might be done this way.

Exactly analogous to the Arabic case.

 

Use of ZWJ between, exemplified in section 3. just above

(and illustrated on Unicode 2.0 page 6-37)

to show a half-consonant + following consonant

instead of permitting the two to combine into a conjunct.

Exactly analogous to the case of blocking an Arabic ligature

yet keeping characters linked (cursively in the case of Arabic).
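
The Devanagari contexts can be sketched the same way, as code point sequences. KA and SSA are an illustrative choice of consonants (an assumption, not a claim about the illustration on page 6-37). The comments restate the renderings described above; the code itself only builds the sequences.

```python
ZWJ = "\u200D"     # ZERO WIDTH JOINER
KA = "\u0915"      # DEVANAGARI LETTER KA (illustrative choice)
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
SSA = "\u0937"     # DEVANAGARI LETTER SSA (illustrative choice)

# ZWJ after: half-form of the consonant with no following consonant.
zwj_after = KA + VIRAMA + ZWJ

# ZWJ between: half-consonant + following consonant, instead of a conjunct.
zwj_between = KA + VIRAMA + ZWJ + SSA

# No ZWJ: the two consonants may combine into a conjunct if the font has one.
plain = KA + VIRAMA + SSA

def code_points(text):
    """Format text as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

print(code_points(zwj_after))
print(code_points(zwj_between))
print(code_points(plain))
```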

 

***

 

Latin

ZWJ after, ZWJ before,

(perhaps ZWJ between in a cursive font).

 

These are illustrated on page 6-71.

The uses of ZWNJ between, ZWNJ+ZWJ between,

and ZWJ+ZWNJ between all produce different results.

Only, irregularly, the use of mere ZWJ between

does not produce a result different from that of

having nothing between the <f> and the <i>.

Why this irregularity, what on earth caused its introduction?

It certainly makes implementations more complex.

 

========================================================================

 

From: ECOLING@aol.com

Sent: Tuesday, January 25, 2000 12:03 PM

Subject: ZWJ contradictions DO REMAIN

 

Thanks to Ken for giving us the full history of the ZWJ (etc.) wordings.

Most of them I had seen, and remembered correctly.

The additional information does help to interpret the history,

but does not change the existence of the contradiction,

rather it reinforces the fact that the contradiction is there,

especially when Ken says that the interpretation of Unicode 1.1

continues through to Unicode 3.0.  Unicode 1.1 (Ken's citations)

seems to have been more fully explicit than other versions 1.0 or 2.0,

as indeed Ken says noting the "consolidation" relative to 1.1.

Retaining the explicitness of 1.1 would have been very helpful.

 

 

The contradiction remains, as I will explain here.

The analysis in my original message on this topic REMAINS VALID.

I am sorry if the demands of comprehensiveness made my previous message

necessarily long.  This one is focused only on the contradiction.

 

I will suggest in a separate message two ways of handling the

contradiction.

 

***

First a question as to whether Ken's presentation is totally complete.

 

When the "fish" example was first introduced in Unicode 1.1,

did it contain the special exception that

[f+ZWJ+i] would or should *not* block a ligature

(at least of the type which would be automatic by smart fonts,

handled in those sorts of protocols external to the text)?

If so, it would have been in contradiction to the wording quoted by Ken.

 

Ken also writes in discussion:

 

"The "fish" example was to show, however, that the mere

presence of either character *could* result in a non-ligation, by

breaking up the sequence expected by "protocols or resources [[i.e. fonts]]

external to the text sequence" for a ligature to be formed."

 

The version of the "fish" example in Unicode 2.0 does *not* show

that for ZWJ, because [f+ZWJ+i] is shown as forming a ligature

just as if the ZWJ were not present.

 

So I have to differ here, at least in the version of the "fish" example

we have in Unicode 2.0 (though perhaps not for the earlier version,

which Ken did not quote in full).

 

***

 

Now on to the explicit statements and the contradiction.

 

The relevant portion of 1.1 I here repeat from Ken's posting:

 

"The intent of these characters is to address cursive graphical connections

between the glyphs of a script, e.g. in scripts like Arabic whose printed

form emulated handwriting. ZWNJ and ZWJ are best thought of as behaving

like tiny letters that neighboring glyphs may connect to (ZWJ) or avoid

connecting to (ZWNJ). They are thus processed as ordinary cursive letters

rather than as control characters.

 

"ZWNJ and ZWJ affect how the two neighboring glyphs connect to *them*, not

to *each other*. As such, they have no direct relationship with ligature

formation; in particular, ZWJ does not in any way request that its two

neighbors be ligatures to each other. Indeed, both ZWNJ and ZWJ may break up

ligatures by interrupting the character sequence required to form the

ligature."

 

So Unicode 1.1 states that ZWJ may break up a ligature,

and the special exception which is at least now in the "fish" example

clearly says that ZWJ does not break up a ligature, thus behaving

UNLIKE mere tiny letters that neighboring glyphs

may connect to (ZWJ) or not (ZWNJ).

In the absence of some overriding protocols external to the text,

any letter intervening would do exactly that, as stated in Unicode 1.1.

 

And it could not be more explicit in the following, where the idea that

[f+ZWJ+i] would or should form a ligature is explicitly rejected

(barring some resources external to the text sequence):

 

"f + ZWJ + i will not form the ligature fi. Instead, if cursive versions

of the f and i are available in the font, each will independently connect

to the ZWJ on the appropriate side (having the same appearance as f + i).

 

[LA: This is the crux of the evidence for the contradiction.]

 

"Usage of optional ligatures such as fi is not currently controlled by

any codes within the Unicode standard, but is determined by protocols or

resources external to the text sequence."
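
A minimal sketch of the two readings, assuming a hypothetical font ligature table keyed on exact character sequences (the table, the glyph names, and the `shape` function are all invented for illustration). Under the "tiny letter" reading of Unicode 1.1, ZWJ is an ordinary character in the sequence and so interrupts the lookup; under the reading in which ZWJ is disregarded, it is stripped first and the ligature forms.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

# Hypothetical ligature table: exact character sequence -> glyph name.
LIGATURES = {"fi": "fi_ligature"}

def shape(text, ignore_zwj=False):
    """Map text to glyph names, preferring the longest matching ligature.

    ignore_zwj=False models ZWJ as a "tiny letter" that stays in the sequence.
    ignore_zwj=True models ZWJ being disregarded before ligature lookup.
    """
    if ignore_zwj:
        text = text.replace(ZWJ, "")
    glyphs = []
    i = 0
    while i < len(text):
        for seq in sorted(LIGATURES, key=len, reverse=True):
            if text.startswith(seq, i):
                glyphs.append(LIGATURES[seq])
                i += len(seq)
                break
        else:
            # No ligature starts here: emit the character itself.
            glyphs.append("zwj" if text[i] == ZWJ else text[i])
            i += 1
    return glyphs

print(shape("fish"))
print(shape("f" + ZWJ + "ish"))
print(shape("f" + ZWJ + "ish", ignore_zwj=True))
```

In this toy model the second call leaves f and i as separate glyphs around the ZWJ, while the third call gives the same result as plain "fish".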

 

Ken states the following:

 

>The semantics of ZWNJ and ZWJ has subsequently been inherited

>without change from Unicode 1.1, through 2.0, and 3.0. Unicode 2.0

>consolidated the text and intent from Unicode 1.1. Nothing that happened

>in Unicode 3.0 has touched that intent in any way.

 

Therefore I would presume that the demonstration of the contradiction

above still holds valid, and do not need to pursue it here unless further

wrinkles are introduced by others.

 

***

 

I am clearly not using terminology sufficiently precise for some readers,

so hope we can get past this:

 

Ken replies to my statement

 

>> In a cursively linkable Latin font, however, it could be used, consistent

>> with the wording above, as a means of blocking the ligature while still

>> permitting the cursive linking

>> (under the basic default that it is merely a linkable "neighbor"

>> character).

 

as follows:

 

>No, it could not be used "as a means of blocking the ligature..."

>A ZWJ in such a context might, on the other hand,

>have the (unintended) side-effect of blocking a ligation.

 

In that case, it could be used to block a ligation.

I am not sure of the difference between "ligation" and "ligature",

except perhaps this difference might refer to other protocols outside

of the text stream.

 

So how about if I rephrase it (consistent with the clear text of Unicode 1.1

whose interpretation Ken says still holds):

 

"The use of a ZWJ, consistent with its interpretation as simply a tiny letter

to which neighbors can cursively join, normally *will* have the effect of

interrupting the sequence of characters which might, without ZWJ,

be rendered as a ligature by protocols outside of the text stream,

if such protocols are in use."

 

Ken's statement on Arabic misunderstands me:

 

>No. The actual effect depends on the implementation. The ZWJ is not

>*intended* to interrupt ligating sequences, but processes that are

>unaware of this may do the wrong thing. As you pointed out below, for

>the purposes of ligation, a ZWJ in the midst of an Arabic sequence,

>for example, should be handled effectively like an Arabic voweling --

>it should not disrupt the choices of the basic consonant outline

>(ligated or not) from the font.

 

That is most definitely not my intent, and I think not what I said at all.

Sorry if my wording was too *short* :-) and therefore not explicit

enough. 

 

What I said was that a ZWL (*not* a ZWJ) could, by acting like

an Arabic voweling, have no effect on ligaturing (via external protocols

or whatever precision needs to be added here) without invoking

any kinds of character properties not already known.

 

A ZWJ (*not* a ZWL, now, just the reverse), by acting as a tiny letter

to which other letters can join, is *not* acting like an Arabic voweling

when used to show connecting forms in isolation, it is rather acting

just like any other Arabic base letter.  (If ZWJ were acting like an

Arabic voweling, then it would not cause the adjacent letters to take

on a connecting form, and we would have a contradiction with the

explicit statements in the section on Arabic.)

 

A propos of the earlier discussion with Mark, I wrote:

 

>> Mark was thus incorrect in stating that this older interpretation had

>> been rejected.  It is still the basic wording, placed first, and must

>> govern

>> other interpretations until changed.

 

Ken replied:

 

>The "older interpretation" that Mark was stating had been rejected was

>that of Unicode 1.0, in which the ZWJ could be conceived of as a

>join-requester,

>including a request of a ligature.

 

Perhaps Mark can speak for himself on this,

but in context, I thought Mark was rejecting my insistence

that the original interpretation

(by which I meant the "tiny letters, not controls" interpretation,

intended in 1.0 but only fully clarified in 1.1) was still valid.

According to Ken's comments, it is still valid.

 

>Under the new interpretation introduced (by Mark, primarily) in Unicode

>1.1,  I do not see a contradiction.

 

Since my demonstration of the contradiction above depends on the wording

of Unicode 1.1,  according to Ken still valid for Unicode 2.0 and 3.0

in this respect, I ask that people deal with the contradiction.

 

I attempt to suggest two ways of doing so in a separate message.

 

***

 

Lloyd Anderson

Ecological Linguistics

 

***************************************************

 

Some other matters concern wordings and understandings,

which are not as central to establishment of the contradiction

I am pointing to, but which may clarify previous or future discussions,

for those interested.

 

First, concerning Devanagari, where I do not yet have an explanation

of what Mark Davis considers inconsistent in the handling of ZWJ.

Second, other matters on history of interpretation.

 

Ken expresses his agreement with the following

(but believes I have misunderstood Mark about inconsistency

in the use of ZWJ in Devanagari):

 

[LA]

 

>>There is currently nothing special

>> about the use of ZWJ for Devanagari.  It has the basic

>> interpretation of a linkable neighbor character for which

>> conjunct combinations are not defined by fonts, exactly as

>> in the basic wording of page 6-70.

 

to which Ken responds:

 

>This latter statement I agree with. In Devanagari, the use of

>the ZWJ creates the context for the explicit half-form, which

>is a "right-linking" form of the consonant. It is then the

>presence of the right-linking form of the consonant that blocks

>the (otherwise automatic) conjunct formation (if the font

>supports it). In this sense, the use of a ZWJ in Devanagari

>can have the indirect effect of breaking a conjunct (i.e. ligature),

>and such usage is intentional in Devanagari. But the breaking

>of the conjunct is secondary -- and not the direct implication

>of a ZWJ requesting a ligature blocking.

 

Exactly the same wording would I suppose apply to Arabic,

because the sequence [q + ZWJ + l]

would cause [q] to take its link-to-following form,

and [l] to take its link-to-preceding form,

and the presence of these forms would have the indirect effect

of breaking a ligature in Arabic. 

 

Of course I agree that in all cases the effects on ligatures are

secondary, because ZWJ is merely a "tiny letter".

That has been my point the entire time.

 

If Ken believes there is no inconsistency (I believe Mark's word)

in the usage of ZWJ in Devanagari, at least not in the respect

we have discussed most recently, then I would very much

appreciate explanation of what inconsistency in Devanagari

Mark was referring to.  Is there something else we need to be

looking at to ensure consistency?  Could we please have this

discussion publicly and not only at the UTC meeting?

 

I also do not understand this from Ken's message:

 

> (b)  The other aspect of Mark's statement quoted above is this:

>

>      "just as current fonts that don't fully support ZWJ

>      cause it to break ligatures."

>

> This is incorrect.  Current fonts which fully support ZWJ

> *do* cause it to break ligatures, that is in fact a usage made

> entirely explicit on Unicode 2.0 page 6-37 for Devanagari:

>

>This is entirely different from Mark's intent in this statement.

 

Since the statement above *is a quote* from Mark,

I would very much appreciate knowing what Mark's intent

was in the statement quoted.  I simply took it at what I assumed

was face value.  Since there was other discussion which seemed

to reinforce it, I saw no reason not to do so.

 

Ken's agreement that

 

>Any number of other

>invisible format controls -- if not properly ignored by a rendering process

>--

>could have the same unintended and user-inexplicable effect.

 

seems to be agreeing with my position that this is not something

special about ZWJ nor about a proposed ZWL.

 

***

 

 

My wording was clearly insufficiently precise in one point.

When I wrote concerning the history of ZWJ that

 

>people tended to interpret the ZWJ as a ligature request.

 

I was referring to the interpretation which was explicitly

rejected in Unicode 1.1.  I believed at the time that such

an interpretation had crept into the *text* of Unicode 1.0

partially by accident, and that the original (Becker's) interpretation had

always been there. 

I stand corrected if in fact some people (other than Becker)

originally intended that ZWJ could "request" a ligature,

but my understanding of what some other people (including

Becker) originally intended, enough to cause the correction

to the wording in 1.1, was correct.

 

Ken wrote:

 

>No. This was explicitly allowed as part of the semantics of ZWJ in

>Unicode 1.0. It was explicitly defined out of the semantics of ZWJ

>in Unicode 1.1. Read the text.

 

***

 

Second, concerning Devanagari, Ken's clarification is very helpful.

 

>When the Devanagari section

>was edited and rewritten for consistency and incorporation into

>Unicode 2.0, it was noted that the text in Unicode 1.0, Volume 2

>regarding ZWJ used this way was inconsistent with the new, restricted

>interpretation; ZWJ could not be used to cause a "virama [to] be

>absorbed into the half consonant form". Instead, the more rigorous

>model of the C + virama --> Cd and Cd + ZWJ --> Ch for Devanagari was

>introduced.

 

In fact, I was one of the primary, and perhaps the first,

advocate of coding in the order

 

C + Virama + ZWJ + C

 

rather than

 

C + ZWJ + Virama + C.

 

So I am aware of most of this history.

 

========================================================================

 

From: ECOLING@aol.com

Sent: Tuesday, January 25, 2000 12:04 PM

Subject: ZWL still no new properties

 

In the context of a general consideration of alternatives

for handling ZWLigator, either by changes to ZWJ semantics

or via a new ZWL character, I pointed out that:

 

>> If we did introduce a ZWL character, it need have no new or unique

>> character properties at all.

 

and Ken responded:

 

>Incorrect. The most important issue is precisely that it *does* introduce

>a new character property: ligation request. That property is new, because

>no character currently has it. That is the whole reason for requesting

>the encoding of such a character in the first place.

 

I completely fail to understand the point here.

"Ligation request" is not a character property,

the ZWL is merely a "tiny invisible letter" without

additional special properties.

 

ZWL might be *used in* triples of the form

[A + ZWL + B]

which (I hope I am choosing words sufficiently specific

to fit everyone's technical cup of tea here)

are used in protocols external to the text as the contexts

for rendering via ligatures.

 

But that is true of the character <f> also!

It is used in pairs and triples such as [fi] and [ffi]

which protocols external to the text use as the contexts

for rendering via ligatures.

 

No difference at all.
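
A minimal sketch of this point, assuming a hypothetical table of ligature contexts held by a font. ZWL was only a proposal and has no code point, so a private-use character stands in for it here; that placeholder, the table, and the glyph names are all assumptions made for illustration.

```python
# Placeholder only: ZWL is a proposed character with no assigned code point,
# so a private-use code point stands in for it in this sketch.
ZWL = "\uE000"

# Hypothetical ligature contexts a font might list. The character <f> and the
# ZWL both appear simply as members of the listed sequences.
LIGATURE_CONTEXTS = {
    "fi": "fi_ligature",
    "ffi": "ffi_ligature",
    "A" + ZWL + "B": "AB_ligature",
}

def ligature_for(sequence):
    """Return the glyph name for an exact sequence, or None if not listed."""
    return LIGATURE_CONTEXTS.get(sequence)

print(ligature_for("fi"))
print(ligature_for("A" + ZWL + "B"))
print(ligature_for("AB"))
```

In this model the lookup treats the triple containing ZWL exactly as it treats [fi] and [ffi]: the sequence is either listed or it is not.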

 

Still, ZWL just like ZWJ would merely be a tiny invisible

letter (or, more precisely, have properties like the Arabic vowelings).

As I in fact stated immediately:

 

>> It should be expected to work exactly like an

>> Arabic floating vowel in not interrupting cursive linking or ligatures.

>>

>> Such a ZWL would be distinctively special *only* to the extent that fonts

>> could use it by treating it as a dummy character with no other uses

>> than to be part of triples of the type 

>> Hungarian Runes [d + ZWL + d] to be rendered as <dd> ligature.

 

Ken responds:

 

>This is how a font might implement the ligatures involving ZWL, but that

>is not the end of the story. The software itself has to be cognizant

>of the property in some way.... Also, the software will need to

>have hierarchies of interaction between global ligation settings and

>local ligation requests (or blockages), so that the appropriate thing(s)

>can be done to ensure that local preferences correctly override global

>settings, and so on.

 

Of course, true of any approach to global ligation settings and

local ligation requests.  That is necessary independent of any *particular*

choice of means to inform external protocols that a ligature should be

used if available.  It is simply a consequence of the mere existence of

both global and local cues for ligation.  (Whether using ZWJ makes

that easier than adding ZWL, or the reverse, is another question entirely.)

 

>> What if we use ZWJ to do double duty as ZWL?

>>

>> The thing to watch out for here is overloading a character

>> with non-analogous uses, to the extent that a contradiction might arise.

>

>This concern should be addressed as Mark has -- with an explicit

>listing of all the contrastive possibilities, matched up against

>the expected outcomes. If there are more expected outcomes than

>can reasonably be handled by judicious expanding of the semantics

>of ZWJ, then perhaps an independent ZWL is warranted. If not,

>then not.

 

Ken should have noted that I *did* provide a listing of contrastive

possibilities, in response to Mark's request for information,

(Mark had not lined those ones up explicitly in his request),

and no doubt did not include all of the ones Mark would include.

 

There is no need to repeat that listing

which was at the end of my message

 

"ZWJ contradictions; ZWL".

 

Readers should go there.

 

Sincerely,

Lloyd Anderson

Ecological Linguistics

 

=======================================================================

 

From: ECOLING@aol.com

Sent: Tuesday, January 25, 2000 12:05 PM

Subject: ? ZWJ "doubling" as ZWLigator

 

Ken did not today address the implications of the following

example, and since my earlier message was taken as too long

by Mark, I here highlight this single example in a separate message.

 

What if we use ZWJ to do double duty as ZWL?

 

The thing to watch out for here is overloading a character

with non-analogous uses, to the extent that a contradiction might arise.

 

A contradiction in this case might take the form of a script

with a cursive rendering, in which ZWJ was needed to block

a ligature yet leave the rendering cursive, and ZWL was needed

to request a ligature in a particular local spelling of one word

which was not part of the default set of  ligatures.

 

The second usage could be triggered only if the font contained

a triple [A + ZWL + B].  Otherwise it would default to the first.

 

The first usage would be triggered only if the font did not contain

a triple [A + ZWL + B].

 

If we used the same ZWJ (not ZWL) character for both functions,

then the inputter would have to know what ligatures the font would

contain, and might get opposite results if the contents of the font

were not what the inputter expected.
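
The dependence on font contents can be sketched as follows, assuming a toy model in which ZWJ does double duty and a font is represented only by the set of triples it lists (the `render` function and both fonts are hypothetical).

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER, here doing double duty as ZWL

def render(a, b, font_triples):
    """Overloaded-ZWJ model: ligate if the font lists the triple, else link only."""
    if a + ZWJ + b in font_triples:
        return "ligature"
    return "cursive link, no ligature"

# Two hypothetical fonts that differ only in whether they list the triple.
font_with_triple = {"l" + ZWJ + "m"}
font_without_triple = set()

# The same encoded text gives opposite results depending on the font.
print(render("l", "m", font_with_triple))
print(render("l", "m", font_without_triple))
```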

 

I believe that is a sufficient reason,

other than care not to risk problems farther down the road by

using characters without keeping to a consistent semantics,

to not take this second route.

 

Sincerely,

Lloyd Anderson

Ecological Linguistics

 

========================================================================

 

From: ECOLING@aol.com

Sent: Tuesday, January 25, 2000 12:04 PM

Subject: Fixing ZWJ contradictions

 

I *have* read every word in Ken's reply received this morning.

Another message today confirms that the contradictions I pointed

to really are there.   It is focused *only* on that matter.

 

This message concerns only what to do about it.

 

In what follows, I will attempt to adjust my exact wording to what

others prefer, so that we are talking about substance and not

preferences in wording.

 

Two obvious choices, either of them possible

(and there may of course be third and fourth general

paths to solution which I have not thought of), are:

 

1.  Accept the irregularity in [f+ZWJ+i] page 6-71,

     and compensate for that as necessary,

     and add protections to avoid the propagation of further such

     irregularities

 

or

 

2. Fix the wording which created this irregularity,

     bring it back into line with the general interpretations of ZWJ.

 

I will treat these in reverse order,

because things will be much clearer that way.

 

*********************************************************

 

2.  Here are the changes needed to restore consistency, to keep a

      consistent interpretation for ZWJ as merely a tiny invisible letter

     (one linkable to neighboring letters, in contexts where

      protocols external to the text handle linking, ligaturing, etc.).

 

2A. 

 

Taking route 2, the simplest because it removes the only irregularity

(inconsistency) I have been able to detect, we would have the following 

statement instead of the one currently on Unicode 2.0 page 6-71

which suggests ZWJ should be disregarded:

 

"The use of a ZWJ, consistent with its interpretation as simply a tiny letter

to which neighbors can cursively join, will normally have the effect of

interrupting the sequence of characters which might, without ZWJ,

be rendered as a ligature by protocols outside of the text stream,

if such protocols are in use."

 

2B.

 

And consistent with that, a further statement should be added

something as follows.  I am pleased that Ken agrees something

like this would be useful:

 

>>      "ZWJ should not normally be introduced between characters which

>>      form a ligature in fonts which are not cursively linking.  ZWJ has

>>     no legitimate function there.  Its introduction there is a spelling

>>     error

>>     and will usually produce exactly the opposite effect from that

>>     intended,

>>      by breaking the sequence of characters and preventing their rendering

>>      as a ligature"

 

Ken writes:

 

>I concur that something similar to this might be usefully added -- although

>I don't think it is resolving a contradiction. Any number of other

>invisible format controls -- if not properly ignored by a rendering process

>--

>could have the same unintended and user-inexplicable effect.

 

I particularly appreciate Ken's pointing out that any number of other

characters (invisible format controls) could have the same unintended and

user-inexplicable effect.   I would add more explicitly that presence

of wrong spellings cannot be said itself to "break" any implementation.

 

Treating ZWJ as an ignorable character would be in contradiction

with the basic statement (Unicode 1.1, continued through to 3.0

according to Ken) that ZWJ acts as a tiny letter.  A tiny letter is not an

ignorable character, it is a letter.  It is not ignored when it is used

to get connecting forms of Arabic letters which otherwise would

appear as isolated forms (in external protocols).

It should not be introduced where its normal default effects

(with or without protocols external to the text) are not desired.

 

2C.

 

The current example at the bottom of Unicode 2.0 page 6-71

should be changed into a non-Latin font, since (as Ken agrees),

ZWJ would not normally have a use in the dominant type of Latin

text.

Readers have a difficult time squaring their assumptions about

the most familiar kinds of Latin text with the assumptions

necessary to make use of ZWJ relevant or even interpretable.

 

I would suggest Arabic script, but not just any Arabic letters will do.

Those for use in an example for non-Arabic users should be ones

whose forms are most recognizably related in isolating and various

linking contexts.  So *not* using Arabic letters "t,d,n,c,j,h,s,n,y"

but using such as "f,q,k,l,<t-underdot>".

 

If a Latin cursive script is used, it should be one which does contain

ligatures, so the difference between ligatured and merely linking

is meaningful to the reader.

 

2D.

 

In addition to using the Arabic script so readers will understand,

or else using a real Latin cursive *and* ligaturing font...

 

The current example of "fish" at the bottom of Unicode 2.0 page 6-71

would then be corrected to show the behavior of ZWJ not irregular

(having no effect) but having an effect consistent with its effect

in other contexts, allowing linking of neighboring characters to it.

For the total patterns showing that this is an irregular exception,

please see the separate message today on that subject.

 

Mark Davis suggested use of the following sort of sequence

to get the effect of linking but not ligaturing (now I substitute

the example with Arabic letters)

 

[l + ZWJ + ZWNJ + ZWJ + m]

would yield cursively linked but not ligatured renderings.

 

By contrast, if the inconsistency is removed,

we simply use

 

[l + ZWJ + m] to get the same result.

 

Mark's solution is unnecessarily elaborate,

is required only by the irregular interpretation of simple ZWJ

that external protocols should do something more than and

different than treating it simply as a "tiny invisible" linkable letter,

the irregular interpretation that they should in addition ignore it

except in a fixed *listing* of special cases;

and the irregularity that they should treat the dependency

between [ZWJ + m] differently depending on whether

there is a preceding ZWNJ or another letter.

 

In the treatment without the irregularity,

the dependency between [ZWJ + m] is treated exactly the

same, whether or not there is a preceding ZWNJ or space

or letter or anything else. 

Respecting precisely that ZWJ is merely a "tiny letter" (linkable).
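
The comparison can be sketched as a toy classifier for [l + between + m] under the two readings. It assumes a cursive font that has a ligature for the bare pair; the `outcome` function and its result strings are invented for illustration and model only the behavior described in this message.

```python
ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

def outcome(between, zwj_is_tiny_letter):
    """Classify the rendering of [l + between + m] in this toy model.

    Assumes a cursive font with a ligature for the bare pair [l + m].
    zwj_is_tiny_letter=True is the consistent reading; False is the
    irregular reading in which a lone ZWJ between letters is disregarded.
    """
    if between == "":
        return "ligature"
    if between == ZWJ and not zwj_is_tiny_letter:
        return "ligature"
    # Each letter links only to an adjacent ZWJ, never to an adjacent ZWNJ.
    left = "linked" if between.startswith(ZWJ) else "unlinked"
    right = "linked" if between.endswith(ZWJ) else "unlinked"
    return f"no ligature, left {left}, right {right}"

for tiny_letter in (True, False):
    for between in (ZWJ, ZWJ + ZWNJ + ZWJ):
        print(tiny_letter, len(between), outcome(between, tiny_letter))
```

Under the consistent reading the one-character and three-character sequences classify identically, so the longer sequence is needed only under the irregular reading.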

 

************************************************************

 

1.  If we take the other alternative, and keep the irregular

     interpretation of ZWJ in special contexts definable

     as an "elsewhere" case (when no adjacent ZWNJ, for example), then

 

1A.  Keep the irregular exception, but note it as such so that further

     irregular interpretations do not cascade from it.

 

1B.  As 2B, appropriate in any event.

 

>>      "ZWJ should not normally be introduced between characters which

>>      form a ligature in fonts which are not cursively linking.  ZWJ has

>>     no legitimate function there.  Its introduction there is a spelling

>>     error

>>     and will usually produce exactly the opposite effect from that

>>     intended,

>>      by breaking the sequence of characters and preventing their rendering

>>      as a ligature"

 

1C.  As for 2C, change the "fish" example into an appropriate cursive font,

          both linkable and with ligatures, so the example makes sense.

 

1D.  If keeping the irregularity that

           [letter + ZWJ + letter] "should" normally have the ZWJ disregarded,

           so ligatures would still be formed by the protocols external to

           the text,

          then add the example with Mark's sequence

          [letter + ZWJ + ZWNJ + ZWJ + letter]

          to indicate how one CAN get the effect of linking on both sides

          but without ligatures being rendered by the external protocols.

 

************************************************************

 

If Devanagari semantics for ZWJ are consistent with the semantics for

ZWJ elsewhere, then alter the statements on Unicode 2.0 p.6-71 to make

that interpretation unambiguous.

 

If Ken's explicit agreement with the "last statement" in the paragraph quoted

from me just below includes both of these sentences, rather than merely

the last sentence, then it might be helpful to alter the wording in Unicode

2.0

page 6-71 so that it does not leave *open* any interpretation that the use

of ZWJ in Devanagari is irregular.

 

>>There is currently nothing special

>> about the use of ZWJ for Devanagari.  It has the basic

>> interpretation of a linkable neighbor character for which

>> conjunct combinations are not defined by fonts, exactly as

>> in the basic wording of page 6-70.

 

I pointed out the possibly misinterpretable phrasing:

 

>> but the wording now on page 6-71 treats this as if it were exceptional:

>>

>>      "The function of the ZWJ may also have a particular interpretation

>>      in specific scripts.  For example, in Indic scripts it provides

>>      an

>>      invisible neighbor to which a dead consonant may join in order to

>>      induce a half-consonant form. ..."

>>

>> This is in fact no different than the Arabic case, and both are

>> completely consistent with the basic wording, Unicode 2.0 p.70.

 

Ken responded:

 

>This is incidental. The text of page 6-71 points to a particular,

>script-specific usage. Yes, it is consistent with the generic sense

>of ZWJ, but in the context of Devanagari, the specific rules regarding

>half-consonant formation are invoked. Those *are* script-specific.

 

My point was that "may also have a particular interpretation"

suggests something more and quite different from

"may also have particular uses", in that it allows and even suggests

"may also have a particular semantics (different from that in specific

other scripts)".  An interpretation of the wording which I think we would

all want to avoid.

 

I do not even see that the uses (yielding a linking form of an adjacent

consonant) are any different in Devanagari.  So it would be better to

remove the possibly misleading suggestion, by rewording.

 

*******************************************************

 

Sincerely,

Lloyd Anderson

Ecological Linguistics