I'm preparing some mappings of teletext character sets to Unicode. You can
see my results so far at
[hope that URL doesn't get split..] This is a LARGE page, btw (150k). In
IE5+, hover over the character to get its name.
As you can see, I have some ambiguous characters and unknowns, and am
wondering whether anyone would like to answer these questions :)
1) I'm not sure about the forms in G0_ARABIC. I've had some excellent help
from an Arabic-speaker, but am wondering whether it could be further
refined. I've uploaded the tables in the teletext spec to
so you can make a comparison. I haven't finished G2_ARABIC yet, so there are
a few gaps.
2) Hyphens or dashes - what's the difference?
3) Which to use: 0x2016 DOUBLE VERTICAL LINE, or 0x2225 PARALLEL TO, or
0x2551 BOX DRAWING DOUBLE VERTICAL, or 0x01C1 LATIN LETTER LATERAL CLICK?
4) Turkish Lira - the teletext spec represents this with a combined ligature
'TL', which I can't find a Unicode character for. I've put in 20A4 LIRA
SIGN, but I don't think this is what the teletext designers had in mind. Is
this a case for a new Unicode character?
5) G0_LATIN_LETTISH_LITHUIAN looks to have a LATIN SMALL LETTER I WITH
CEDILLA, which I can't find in Unicode (so I've stuck in i with ogonek
instead). Is this missing?
6) Is there a 041F CYRILLIC CAPITAL LETTER PE with a curved top, like 0x22C2
N-ARY INTERSECTION, in both uppercase and lowercase forms? Perhaps this is a
particular glyph of the PE character, represented as a separate entry in the
7) Misc. other characters: Couldn't decide between
a) 2126: OHM SIGN or GREEK CAPITAL LETTER OMEGA, 03A9
b) 0110: LATIN CAPITAL LETTER D WITH STROKE, or LATIN CAPITAL LETTER
ETH, 00D0
c) 00DF: LATIN SMALL LETTER SHARP S, or GREEK SMALL LETTER BETA, 03B2
d) 0251: LATIN SMALL LETTER ALPHA, or GREEK SMALL LETTER ALPHA, 03B1
e) 00B0: DEGREE SIGN, or MASCULINE ORDINAL INDICATOR, 00BA
8) And some others I'm not sure of:
a) Character 0x28 of G2_GREEK, looks like a colon
b) Character 0x6e of G2_LATIN, looks like a tall Greek eta
c) Character 0x7e of G2_LATIN, looks like an eta
d) Character 0x52 of G0_GREEK, I've put it in as 0374 GREEK NUMERAL SIGN
but can't be sure
Perhaps there are some 7-bit sets knocking about which the teletext ones were
based on, which would help. The full teletext spec is available from
http://www.etsi.org, named ETSI 300 706 (you'll have to register to
download, but it's free). I suspect the designers of the spec would use a
single glyph to represent two characters in some cases, e.g. D with a stroke
would mean both 0110 and 00D0, seeing as both lowercase forms are further up
in the same set.
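If one glyph really can stand for more than one character, the mapping table probably needs to allow several candidates per position. A rough sketch of that idea in Python follows; the MAPPING layout and the needs_review name are my own invention, and the only entries filled in are the ones from question 8. An entry for D with stroke would hold both 0x0110 and 0x00D0 in the same way.

```python
# Keys are (teletext set name, 7-bit code); values are candidate code points.
# An empty list means "unknown so far"; more than one means "ambiguous".
MAPPING = {
    ("G0_GREEK", 0x52): [0x0374],  # GREEK NUMERAL SIGN, not confirmed
    ("G2_GREEK", 0x28): [],        # looks like a colon
    ("G2_LATIN", 0x6E): [],        # looks like a tall Greek eta
    ("G2_LATIN", 0x7E): [],        # looks like an eta
}


def needs_review(mapping):
    """Return the positions that do not have exactly one candidate."""
    return sorted(key for key, cands in mapping.items() if len(cands) != 1)


if __name__ == "__main__":
    for set_name, code in needs_review(MAPPING):
        print("%s 0x%02X still needs a decision" % (set_name, code))
```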
Hope I haven't asked too much in my first posting to this list :)