Re: The Cent & Florin Signs VS. C-Slash & Left-Tailed F

From: Richard Gillam (rgillam@jtcsv.com)
Date: Wed Jan 19 2000 - 18:31:03 EST


Peter Constable wrote:
>
> >F-hook should also be representable with U+0066 LATIN SMALL
> LETTER F followed by (I'm guessing here) U+0321 COMBINING
> PALATALIZED HOOK BELOW. [If I'm wrong, then that'd explain why
> f-hook is separately encoded and doesn't have a canonical
> decomposition.] I wonder if you'd get a different glyph from
> the florin sign with some fonts by using this combination
> instead of U+0192.
>
> U+0192 doesn't have a decomposition, as you've noted, and so
> U+0066 U+0321 *can not* be used to represent this.

Okay, I guessed wrong.
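
For anyone who wants to check this against the Unicode Character Database, here is a minimal sketch using Python's standard `unicodedata` module. It assumes only that module; the exact answers depend on the UCD version bundled with the interpreter.

```python
import unicodedata

F_HOOK = "\u0192"        # LATIN SMALL LETTER F WITH HOOK
F_PLUS_HOOK = "f\u0321"  # f + COMBINING PALATALIZED HOOK BELOW

# U+0192 has an empty decomposition field in the UCD.
print(repr(unicodedata.decomposition(F_HOOK)))  # ''

# With no canonical decomposition linking the two spellings,
# normalization never converts one into the other.
print(unicodedata.normalize("NFD", F_HOOK) == F_HOOK)       # True
print(unicodedata.normalize("NFC", F_PLUS_HOOK) == F_HOOK)  # False
```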

> >At any rate, to me the big argument in favor of adding a
> "florin sign" character to the standard would be if the glyph
> shapes for these two characters aren't always identical. Is
> that true, and if so, how do they differ?
>
> There is another possible argument: differing semantics....
> If they remain
> unified, then implementers would face problems with knowing
> what to do about case mapping or bidi behaviour.

I understand this issue and glossed over it on purpose. I'm wondering whether
the issue is really that serious. The reason is that there are already many
regular letters that also get used for currency symbols. Unicode has a special
French franc symbol, but based on our own locale data, they actually just use a
regular capital F as the franc sign most of the time. A process operating on a
piece of text can't easily tell whether an isolated F in a French document is a
currency symbol or a letter, so it usually punts and just treats it as a letter.

This happens so often already (and I at least haven't heard lots of complaints
about it) that adding one more instance of ambiguous currency-symbol/letter
semantics scarcely makes a difference.

Furthermore, a process could probably tell whether a capital F in French was a
currency symbol by looking at the context and using a simple heuristic (e.g.,
"if the previous non-white-space character was a digit and the following
character is not a letter, it's a currency sign; otherwise, it's a letter").
This would work equally well for U+0192.
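
Just to show how little machinery that heuristic needs, here's a rough Python sketch of it. The function names `is_currency_sign` and `currency_positions` are made up for illustration; nothing like this exists in our locale code.

```python
def is_currency_sign(text: str, i: int) -> bool:
    """Guess whether text[i] is a currency sign rather than a letter.

    Rule of thumb: it's a currency sign if the previous non-white-space
    character is a digit and the next character is not a letter
    (end of text counts as "not a letter").
    """
    # Walk backwards over white space to the previous significant character.
    j = i - 1
    while j >= 0 and text[j].isspace():
        j -= 1
    prev_is_digit = j >= 0 and text[j].isdigit()
    # Look only at the character immediately after the candidate.
    next_is_letter = i + 1 < len(text) and text[i + 1].isalpha()
    return prev_is_digit and not next_is_letter


def currency_positions(text: str, sign: str) -> list:
    """Indices of every occurrence of `sign` that the rule treats as currency."""
    return [i for i, ch in enumerate(text) if ch == sign and is_currency_sign(text, i)]


print(currency_positions("Prix : 100 F, le F majuscule", "F"))  # [11]
print(currency_positions("total 25 \u0192", "\u0192"))          # [9]
```

As written, the rule only recognizes a sign that follows its amount; a sign written before the number would need the mirror-image check.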
 
> Options:
>
> 1) leave it as it is
> 2) disunify; U+0192 used for IAI bilabial f, and new character
> assigned for florin
> 3) disunify; U+0192 used for florin, and leave semantics of
> U+0192 as is
> 4) disunify; U+0192 used for florin, but change semantics of
> U+0192 to name = FLORIN SIGN, cat = Sc, bidi = Et
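
For comparison, this sketch prints the properties U+0192 has today next to those of the existing U+20A3 FRENCH FRANC SIGN, which already has the category and bidi class option 4 asks for ("Et" above is the UCD's "ET", European Number Terminator). It uses only Python's standard `unicodedata` module; `describe` is a hypothetical helper, and the values come from whatever UCD version the interpreter ships.

```python
import unicodedata

def describe(ch: str) -> str:
    """Summarize code point, name, general category and bidi class."""
    return (f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
            f"gc={unicodedata.category(ch)}, bidi={unicodedata.bidirectional(ch)}")

for ch in ("\u0192", "\u20A3"):
    print(describe(ch))
# U+0192 LATIN SMALL LETTER F WITH HOOK: gc=Ll, bidi=L
# U+20A3 FRENCH FRANC SIGN: gc=Sc, bidi=ET

# Case mapping follows the letter semantics: uppercasing a price written
# with U+0192 also changes the "currency sign" to U+0191.
price = "\u0192 25"
print(price.upper() == "\u0191 25")  # True
```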

You're forgetting option 5, which is how this situation has been dealt with in
the past (consider the ASCII hyphen-minus character, U+002D, which was left
alone while U+2010 HYPHEN and U+2212 MINUS SIGN were added). Leave U+0192
the way it is and add TWO new characters: a new character that unambiguously
means "f-hook" and has the right properties gets added somewhere in the Latin
blocks, and a new character that unambiguously means "florin sign" and has the
right properties gets added to the currency-symbols block.

This leaves U+0192 messed up, but it provides good alternatives to it, and all
existing applications continue working.

(Although, because of my reasoning above, I'm still personally leaning toward
option 1.)

--Rich Gillam
  Unicode Technology geek
  IBM



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:58 EDT