From: Doug Ewell (firstname.lastname@example.org)
Date: Sat Aug 29 2009 - 12:09:46 CDT
"William_J_G Overington" <wjgo underscore 10009 at btinternet dot com> wrote:
> Allocation of the sequence of two codepoints for each glyph would be
> done either by the Unicode Consortium and ISO jointly,
Then it's "encoding," as Asmus said.
> or maybe by delegated registration centres, such as government trade
> mark offices and national libraries in the various jurisdictions of
> the world, each being allocated a block of encoding space to use for
> specific purposes.
Then it's sort of like ISO 2022: still encoding, but decentralized so
that implementations pick and choose only the pieces they care about and
ignore the rest.
> I know that various items such as logos and personal gaiji are not
> encoded at present due to rules, not due to space considerations.
> However, I feel that progress in information technology needs a way of
> encoding all such items in plain text for electronic library
You may be in a bit of luck, because the encoding of emoji as used in
Japanese cell phones shows that previous attitudes within WG2 and UTC
against encoding novel and short-lived symbols, and even logos, may be
relaxing. (A set of corporate logos, with representative glyphs
excised, was included in early emoji proposals.)
But the items to be encoded still need to be proposed explicitly, even
if part of a block of hundreds. Even with emoji, there was never a
proposal to set aside an empty block and let others populate it as they
saw fit.
You are far better off thinking of the graphical thingies of the world
as belonging to one of two categories, just as UTC and WG2 do:
1. The ones that merit formal encoding can be proposed and accepted
into Unicode/10646, and will get their own code points, and can be used
in any conformant Unicode/10646 implementation (preferably with a font
that supports them).
2. The others can be represented with one of the 137,468 available
private-use code points, or with an inline image (as people used to
represent ordinary characters outside their code page, before Unicode
came along), or with a higher-level escape sequence, as Asmus described.
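The figure of 137,468 comes from adding up the three Private Use Areas defined in the Unicode Standard (the BMP PUA plus the two supplementary PUAs on planes 15 and 16). A quick sketch in Python, using those published ranges:

```python
# The three private-use ranges defined by the Unicode Standard.
pua_ranges = [
    (0xE000, 0xF8FF),      # BMP Private Use Area (6,400 code points)
    (0xF0000, 0xFFFFD),    # Plane 15, Supplementary PUA-A (65,534)
    (0x100000, 0x10FFFD),  # Plane 16, Supplementary PUA-B (65,534)
]

total = sum(end - start + 1 for start, end in pua_ranges)
print(total)  # 137468
```

Note that the last two ranges stop at ...FFFD because the final two code points of every plane (...FFFE and ...FFFF) are noncharacters, not private-use.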
You are *not* better off proposing a third category between 1 and 2,
things that Unicode "encodes but doesn't encode." Several people over
the years have commented on how unlikely that is to happen.
As a matter of fact, from time to time people in positions of authority
make statements that private-use mechanisms (in Unicode or elsewhere)
are inherently evil and problematic, and should not be used. I find
such statements frustrating and counterproductive: they lead some
people to seek non-private-use solutions to essentially private-use
problems. Perhaps we should be grateful that Unicode has not deprecated
the PUAs altogether.
> Hopefully this thread will raise awareness of this issue and hopefully
> some people reading this will post agreement that something needs to
> be done, not necessarily using my encoding idea, but that something
> needs to be done. I feel that it is not very good if copied and
> pasted text from archived documents needs strings with ampersand or
> colon in them to signify the meaning of characters.
Propose the characters. Don't just state that there are gazillions of
characters that need to be represented in documents and are not.
--
Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages
This archive was generated by hypermail 2.1.5 : Sat Aug 29 2009 - 12:13:10 CDT