Michael Everson wrote on 1999-04-06 12:10 UTC:
> I got one answer, what to paste in the header. The other question, what
> CAPITAL LETTER D WITH DOT ABOVE is, remains unanswered....
Michael,
Is it essential that you use UTF-8?
Even without UTF-8 or any special headers, you can always specify these
characters in HTML via a numeric character reference (NCR), in either
decimal or hexadecimal form.
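As a minimal sketch of what such a reference looks like (the helper name
ncr_dec is only an illustration), a decimal reference is simply the code
point printed in base 10 between "&#" and ";":

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format a code point as a decimal
# numeric character reference.
sub ncr_dec {
    my ($code_point) = @_;
    return sprintf("&#%d;", $code_point);
}

# U+1E0A is LATIN CAPITAL LETTER D WITH DOT ABOVE.
print "<p>", ncr_dec(0x1E0A), "</p>\n";   # prints <p>&#7690;</p>
```

The resulting line is pure ASCII, so it should survive any editor or
transport regardless of the charset declared for the page.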
The following small table lists, for every Unicode character named
"LATIN * WITH DOT ABOVE", the decimal numeric character reference, the
hexadecimal numeric character reference, and the UTF-8 encoded character:
&#266;    &#x10a;   Ċ LATIN CAPITAL LETTER C WITH DOT ABOVE
&#267;    &#x10b;   ċ LATIN SMALL LETTER C WITH DOT ABOVE
&#278;    &#x116;   Ė LATIN CAPITAL LETTER E WITH DOT ABOVE
&#279;    &#x117;   ė LATIN SMALL LETTER E WITH DOT ABOVE
&#288;    &#x120;   Ġ LATIN CAPITAL LETTER G WITH DOT ABOVE
&#289;    &#x121;   ġ LATIN SMALL LETTER G WITH DOT ABOVE
&#304;    &#x130;   İ LATIN CAPITAL LETTER I WITH DOT ABOVE
&#379;    &#x17b;   Ż LATIN CAPITAL LETTER Z WITH DOT ABOVE
&#380;    &#x17c;   ż LATIN SMALL LETTER Z WITH DOT ABOVE
&#7682;   &#x1e02;  Ḃ LATIN CAPITAL LETTER B WITH DOT ABOVE
&#7683;   &#x1e03;  ḃ LATIN SMALL LETTER B WITH DOT ABOVE
&#7690;   &#x1e0a;  Ḋ LATIN CAPITAL LETTER D WITH DOT ABOVE
&#7691;   &#x1e0b;  ḋ LATIN SMALL LETTER D WITH DOT ABOVE
&#7710;   &#x1e1e;  Ḟ LATIN CAPITAL LETTER F WITH DOT ABOVE
&#7711;   &#x1e1f;  ḟ LATIN SMALL LETTER F WITH DOT ABOVE
&#7714;   &#x1e22;  Ḣ LATIN CAPITAL LETTER H WITH DOT ABOVE
&#7715;   &#x1e23;  ḣ LATIN SMALL LETTER H WITH DOT ABOVE
&#7744;   &#x1e40;  Ṁ LATIN CAPITAL LETTER M WITH DOT ABOVE
&#7745;   &#x1e41;  ṁ LATIN SMALL LETTER M WITH DOT ABOVE
&#7748;   &#x1e44;  Ṅ LATIN CAPITAL LETTER N WITH DOT ABOVE
&#7749;   &#x1e45;  ṅ LATIN SMALL LETTER N WITH DOT ABOVE
&#7766;   &#x1e56;  Ṗ LATIN CAPITAL LETTER P WITH DOT ABOVE
&#7767;   &#x1e57;  ṗ LATIN SMALL LETTER P WITH DOT ABOVE
&#7768;   &#x1e58;  Ṙ LATIN CAPITAL LETTER R WITH DOT ABOVE
&#7769;   &#x1e59;  ṙ LATIN SMALL LETTER R WITH DOT ABOVE
&#7776;   &#x1e60;  Ṡ LATIN CAPITAL LETTER S WITH DOT ABOVE
&#7777;   &#x1e61;  ṡ LATIN SMALL LETTER S WITH DOT ABOVE
&#7786;   &#x1e6a;  Ṫ LATIN CAPITAL LETTER T WITH DOT ABOVE
&#7787;   &#x1e6b;  ṫ LATIN SMALL LETTER T WITH DOT ABOVE
&#7814;   &#x1e86;  Ẇ LATIN CAPITAL LETTER W WITH DOT ABOVE
&#7815;   &#x1e87;  ẇ LATIN SMALL LETTER W WITH DOT ABOVE
&#7818;   &#x1e8a;  Ẋ LATIN CAPITAL LETTER X WITH DOT ABOVE
&#7819;   &#x1e8b;  ẋ LATIN SMALL LETTER X WITH DOT ABOVE
&#7822;   &#x1e8e;  Ẏ LATIN CAPITAL LETTER Y WITH DOT ABOVE
&#7823;   &#x1e8f;  ẏ LATIN SMALL LETTER Y WITH DOT ABOVE
&#7835;   &#x1e9b;  ẛ LATIN SMALL LETTER LONG S WITH DOT ABOVE
You can try to cut and paste the characters from this table, in any of
the three forms, into your raw HTML document with any 8-bit plain text
editor.
I can easily send you the entire Unicode table in this form if that
would be of any help.
I've just spent the last 3 minutes writing the following tiny Perl
program that produced this table. Perl is extremely useful for
transforming the Unicode database into anything in a few minutes.
#!/usr/bin/perl
use strict;
use warnings;

# Convert an integer code point into a UTF-8 byte string.
# Only code points up to U+FFFF are encoded.
sub utf8 {
    my $c = shift(@_);
    if ($c < 0x80) {
        return sprintf("%c", $c);
    } elsif ($c < 0x800) {
        return sprintf("%c%c", 0xc0 | ($c >> 6), 0x80 | ($c & 0x3f));
    } elsif ($c < 0x10000) {
        return sprintf("%c%c%c",
                       0xe0 | ($c >> 12),
                       0x80 | (($c >> 6) & 0x3f),
                       0x80 | ($c & 0x3f));
    } else {
        # Anything larger is replaced by U+FFFD REPLACEMENT CHARACTER.
        return utf8(0xfffd);
    }
}

# Read the list of all Unicode names (UnicodeData-Latest.txt) and
# output a list with NCRs (dec and hex) as well as UTF-8 and the name.
while (<>) {
    chomp;
    # Fifteen semicolon-separated fields; the first is a four-digit hex code.
    if (/^([0-9A-F]{4});([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*)$/) {
        next if ($2 eq "<control>");
        my $ncr_dec = sprintf("&#%d;", hex($1));
        my $ncr_hex = sprintf("&#x%x;", hex($1));
        # Pad both NCR columns to a width of 10 characters.
        printf("%s%s%s %s\n",
               $ncr_dec . (" " x (10-length($ncr_dec))),
               $ncr_hex . (" " x (10-length($ncr_hex))),
               utf8(hex($1)), $2);
    } else {
        # $ARGV holds the name of the file currently read via <>.
        die("Syntax error in line '$_' in file '$ARGV'");
    }
}
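To make the bit shuffling in the conversion routine concrete, here is a
rough sketch of the same idea extended to four-byte sequences, so that
code points above U+FFFF are encoded rather than replaced by U+FFFD. The
name utf8_full and the surrogate check are assumptions added for this
illustration and are not part of the program above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variant of the encoder covering U+0000..U+10FFFF.
sub utf8_full {
    my ($c) = @_;
    # Surrogates and out-of-range values become U+FFFD.
    return utf8_full(0xfffd)
        if $c > 0x10ffff || ($c >= 0xd800 && $c <= 0xdfff);
    return pack("C", $c) if $c < 0x80;
    return pack("C2", 0xc0 | ($c >> 6), 0x80 | ($c & 0x3f))
        if $c < 0x800;
    return pack("C3", 0xe0 | ($c >> 12),
                      0x80 | (($c >> 6) & 0x3f),
                      0x80 | ($c & 0x3f))
        if $c < 0x10000;
    return pack("C4", 0xf0 | ($c >> 18),
                      0x80 | (($c >> 12) & 0x3f),
                      0x80 | (($c >> 6) & 0x3f),
                      0x80 | ($c & 0x3f));
}

# Show the individual bytes for U+1E0A (CAPITAL LETTER D WITH DOT ABOVE).
my @bytes = map { sprintf("%02x", ord($_)) } split(//, utf8_full(0x1E0A));
print join(" ", @bytes), "\n";   # prints e1 b8 8a
```

For U+1E0A the top four bits (0x1) go into the lead byte 0xe1, the middle
six bits (0x38) into 0xb8, and the low six bits (0x0a) into 0x8a.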
Markus
-- Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>