From: Pim Blokland (firstname.lastname@example.org)
Date: Wed Mar 19 2003 - 17:44:42 EST
Doug Ewell wrote:
> There have been lots of attempts to define short mnemonic names or
> "entities" for Unicode. SGML names are one. The "i18nrep
> repertoiremap," originally defined in RFC 1345 and more recently in
> ISO/IEC TR 14652, is another. These schemes work well for a
> small number of characters, say a thousand, but become unwieldy and
> anti-mnemonic when applied to a larger set of characters. There
> aren't enough short mnemonic names to go around.
Yes, but my suggestion was to read character names on a per-font
basis. So when an HTML file contains something like <FONT
face="Symbol">, only the Symbol font has to be read and scanned for
names. This removes the need to keep the complete list of entity
names in memory at all times.
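A minimal sketch of the per-font idea, in Python: glyph-name tables are loaded lazily and cached per font face, so only the faces a page actually references are ever read. The font data is a hypothetical in-memory stand-in (FAKE_FONTS, glyph_names_for and resolve are illustrative names, not any browser's API); a real implementation would have to parse the names out of the font file itself, for example from a TrueType 'post' table, which this sketch does not attempt.

```python
from __future__ import annotations

from functools import lru_cache

# Hypothetical stand-in for installed fonts: face name -> {glyph name: codepoint}.
# The values are illustrative only; nothing here reads a real font file.
FAKE_FONTS = {
    "Symbol": {"integralex": 0xF0F4, "alpha": 0xF061},
    "Times": {"A": 0x0041, "a": 0x0061},
}

LOADED: list[str] = []  # records which faces have actually been read


@lru_cache(maxsize=None)
def glyph_names_for(face: str) -> dict[str, int]:
    """Read one face's glyph-name table the first time it is needed, then cache it."""
    LOADED.append(face)
    return dict(FAKE_FONTS.get(face, {}))


def resolve(face: str, glyph_name: str) -> int | None:
    """Look a glyph name up in the given face only; None if the face lacks it."""
    return glyph_names_for(face).get(glyph_name)


# Usage: a page that only uses <FONT face="Symbol"> never touches Times.
print(hex(resolve("Symbol", "integralex")))  # 0xf0f4
print(resolve("Symbol", "alpha") is not None)  # True, served from the cache
print(LOADED)  # ['Symbol']
```

The cache is keyed on the face name, so each font is scanned at most once no matter how many names a page asks for.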
> (..) the scenario Pim describes might work (although
> asking a browser to interpret the internal structure of a font
> seems excessive to me). But the same mechanism is less likely to
Is it really so excessive? Browsers already have to retrieve a lot
of font info, like the line height and such. Why not character
names? I could easily envision a function call like
and John Hudson wrote:
> You are presuming that all fonts contain name strings for glyphs.
As a matter of fact, I know they don't, and even when they do, the
names are often things like "uniF10C", which aren't very helpful.
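To make the "uniF10C" point concrete, here is a sketch that decodes names of that form. It follows a simplified reading of the Adobe Glyph List convention, in which "uni" followed by four uppercase hex digits denotes a code point; multi-group "uni" names and "uXXXX" names are deliberately not handled. The function names are hypothetical. For "uniF10C" the result lands in the Private Use Area, so the name tells a browser nothing about which character the glyph actually represents.

```python
from __future__ import annotations

import re
import unicodedata

# Simplified "uniXXXX" pattern: exactly four uppercase hex digits after "uni".
UNI_NAME = re.compile(r"uni([0-9A-F]{4})")


def codepoint_from_uni_name(name: str) -> int | None:
    """Return the code point encoded in a uniXXXX glyph name, or None."""
    m = UNI_NAME.fullmatch(name)
    if m is None:
        return None
    cp = int(m.group(1), 16)
    if 0xD800 <= cp <= 0xDFFF:  # surrogate code points are not valid here
        return None
    return cp


def is_private_use(cp: int) -> bool:
    """True for the BMP Private Use Area (U+E000..U+F8FF); other PUA planes ignored."""
    return 0xE000 <= cp <= 0xF8FF


for glyph in ("uniF10C", "uni222B", "integralex"):
    cp = codepoint_from_uni_name(glyph)
    if cp is None:
        print(f"{glyph}: not a uniXXXX name")
    elif is_private_use(cp):
        print(f"{glyph}: U+{cp:04X}, Private Use, no standard meaning")
    else:
        print(f"{glyph}: U+{cp:04X} {unicodedata.name(chr(cp), '<unnamed>')}")
```

A name like "uni222B" at least identifies a standard character; a Private Use one only restates the font's own internal position.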
Naturally, my scheme would break down if the given fonts do not
contain the character names we expect. But the scheme currently in
use goes wrong just as easily when the fonts don't contain the
numerical indexes we expect! As a real-life example, if the browser
assumes that a character like "INTEGRAL EXTENSION" is in the Symbol
font at codepoint U+23AE, but its index in the font is really
U+F0F4, the lookup fails. If the browser instead looked for a
character named "integralex" in the font, it would find the
character no matter which codepoint it was on.
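The INTEGRAL EXTENSION example can be sketched with a toy font model. The Font class and its methods are hypothetical, and a real font maps code points to glyph IDs rather than directly to names. The data mirrors the case described above: the glyph named "integralex" sits at U+F0F4, so a lookup by the standard code point U+23AE misses, while a lookup by glyph name succeeds.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Font:
    """Toy font: code point -> glyph name (a simplification of a real cmap)."""
    face: str
    cmap: dict[int, str] = field(default_factory=dict)

    def find_by_codepoint(self, cp: int) -> str | None:
        """Codepoint lookup: the character must be exactly where we expect it."""
        return self.cmap.get(cp)

    def find_by_name(self, glyph_name: str) -> int | None:
        """Name lookup: find the glyph wherever it was placed in the font."""
        for cp, name in self.cmap.items():
            if name == glyph_name:
                return cp
        return None


# Hypothetical data matching the example: INTEGRAL EXTENSION lives at U+F0F4.
symbol = Font("Symbol", {0xF0F4: "integralex"})

INTEGRAL_EXTENSION = 0x23AE
print(symbol.find_by_codepoint(INTEGRAL_EXTENSION))  # None
print(hex(symbol.find_by_name("integralex")))  # 0xf0f4
```

For large fonts, building a reverse name-to-codepoint index once per font would be better than scanning; the linear scan just keeps the sketch short.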
Anyway, I'm aware this is getting a bit out of scope for the Unicode
Mailing List, so I propose we leave it at that. If someone is
willing to rewrite an Internet browser, though, I'll be happy to
continue this discussion by private e-mail...
This archive was generated by hypermail 2.1.5 : Wed Mar 19 2003 - 18:29:32 EST