From: Peter Kirk (firstname.lastname@example.org)
Date: Mon Aug 04 2003 - 18:57:07 EDT
On 04/08/2003 14:59, Kenneth Whistler wrote:
>Peter Kirk asked:
>>>In other words, if what you need is to glue things together,
>>>i.e. a zero width no-break space *function*, then use
>>>U+2060. If what you need is a BOM for the encoding scheme
>>>specifications, then use U+FEFF.
>>>What is *discouraged*, but not prohibited, of course, is
>>>using U+FEFF for a zero width no-break space *function*,
>>>precisely because that interacts so confusingly with
>>And what if you need a ZWNBS function for something other than gluing
>>things together? For example, as a carrier for a string or line initial
>>diacritical mark when no spacing is required?
>This is not something sanctioned by the standard.
>The carrier for a combining mark that is to display in isolation without
>a base character is U+0020 SPACE. If you want to also indicate the
>absence of a line break opportunity, then the carrier is U+00A0
>NO-BREAK SPACE (NBSP).
Neither of these is appropriate to the case I have in mind (described in
greater detail below) as they are not zero width and therefore give an
unwanted indent at the start of a line. U+200B ZERO WIDTH SPACE might be
appropriate, but this has the problem that it is a break opportunity,
which is not always appropriate.
>Despite its name, U+FEFF ZWNBS is *NOT* a space character. It is
>formally gc=Cf, not gc=Zs. It also does not have the White_Space
>property.
>So "a ZWNBS function for something other than gluing things together"
>is a contradiction in terms of the current definition of the standard.
>The *meaning* of the "ZWNBS function" is its behavior in the
>context of UAX #14, Line Breaking Properties. See the WJ Word joiner
>entry (normative) of UAX #14:
Thank you, Ken, and also Mark. I didn't know where to find these
details. Mark wrote:
>names may be misleading; people intending to use them for any other
>function should carefully read the sections of the Unicode Standard
>that discuss their usage.
But which sections? Where is the index, online? It is unfortunate that
there are no links from the character charts or the database to the
various places where the uses of the characters are explained. All there
is is a character name, and, as I have found quite often, this name is
seriously misleading if not actually incorrect. It is highly
unfortunate that it is not permitted to change these misleading names.
As it is, the note at U+FEFF in the character charts reads "use as an
indication of non-breaking is deprecated...", although you wrote that
this was not deprecated. But there is no note that use of ZERO WIDTH
NO-BREAK SPACE as a zero width no-break space is deprecated or "a
contradiction in terms of the current definition of the standard". Are
you surprised that I am confused?
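The general category values Ken cites can at least be checked mechanically. What follows is only a minimal sketch, assuming Python and its standard unicodedata module, whose data follows whatever Unicode version the interpreter ships with rather than Unicode 4.0. The helper name describe is hypothetical. As far as I know the module does not expose White_Space or the UAX #14 line breaking classes, so those still have to be looked up in the data files.

```python
import unicodedata

# The characters discussed in this thread as possible carriers or joiners.
CANDIDATES = "\u0020\u00A0\u200B\u2060\uFEFF"


def describe(ch):
    """Return (code point, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


# General category keyed by character, for quick comparison.
categories = {ch: unicodedata.category(ch) for ch in CANDIDATES}

if __name__ == "__main__":
    for ch in CANDIDATES:
        code_point, name, gc = describe(ch)
        print(f"{code_point}  gc={gc}  {name}")
```

U+FEFF and U+2060 both report gc=Cf, while U+0020 and U+00A0 report gc=Zs, which matches the distinction Ken draws.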
>>This is one of the
>>suggestions for some of the Hebrew problems, but I have had no response
>>to my suggestion of using U+2060, which is inappropriately named for the
>>function I have in mind.
>The function I think you have in mind is not isolated display of
>a combining mark, but rather trying to find a mechanism for
>getting around the conformance strictures of the standard, to
>get a combining mark to apply to a *following* base
>character, rather than to a *preceding* base character.
If by "apply" in the above you mean "be positioned adjacent to", there
is already a problem with the standard: the EXISTING Hebrew page of the
standard is in contravention of its conformance strictures. This is
because under the existing standard (irrespective of any changes being
proposed) and in legacy encodings, the combining mark holam, which is
usually graphically positioned above the preceding base character, is in
certain environments, specifically when followed by a silent alef (holam
male is a separate issue), graphically positioned above the following
base character. But the standard has anticipated this kind of difficulty
by recognising that positioning is not always consistent with logical
ordering; see the note on Indic vowel signs in The Unicode Standard 4.0
section 2.10, subsection "Sequence of Base Characters and Diacritics",
http://www.unicode.org/book/preview/ch02.pdf. This is a documented
special case. Hebrew holam followed by silent alef is also a special
case, whether you like it or not; it just hasn't been documented. It
could be removed, but that would require changes to every existing
(ancient or modern) pointed Hebrew text.
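The logical-order rule at issue can be sketched as follows. This is a minimal illustration only, assuming Python's unicodedata module and a sample sequence of lamed, holam, alef that I have chosen purely for illustration. The function name marks_with_bases is hypothetical. It pairs each combining mark with the nearest preceding base character in stored order, which is how the standard attaches the mark, and it says nothing about where a font actually draws it.

```python
import unicodedata


def marks_with_bases(text):
    """Pair each combining mark with the nearest preceding non-mark character.

    Returns a list of (mark name, base name) tuples. The base name is None
    when a mark appears before any base character.
    """
    pairs = []
    base = None
    for ch in text:
        if unicodedata.category(ch).startswith("M"):
            # A combining mark: in logical order it belongs to the last base seen.
            base_name = unicodedata.name(base) if base is not None else None
            pairs.append((unicodedata.name(ch), base_name))
        else:
            base = ch
    return pairs


# Illustrative sequence (an assumption): lamed, holam, alef in stored order.
sample = "\u05DC\u05B9\u05D0"
pairs = marks_with_bases(sample)

if __name__ == "__main__":
    for mark, base_name in pairs:
        print(f"{mark} is attached to {base_name}")
```

In stored order the holam is attached to the lamed, whatever the graphical position relative to the following alef may be.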
>Trying to use U+FEFF *or* U+2060 to do this would be inappropriate.
Understood. I await alternative suggestions.
-- Peter Kirk email@example.com http://web.onetel.net.uk/~peterkirk/