From: Jonathan Pool (firstname.lastname@example.org)
Date: Sat May 17 2008 - 00:09:27 CDT
> I'd expect the reason to be very simple: U+0027 APOSTROPHE is easy to key in.
True, but most characters are difficult for most users to enter. I was asking
about the consortium’s own pages, and I surmised that entering the correct
characters isn’t difficult for the consortium.
But even the mail from this list seems not to be sent with a Unicode encoding;
> you mean normalization in the sense of transforming to a Normalization
Sorry, I meant normalizing according to a set of Unicode-inspired orthographic
> Regarding in particular U+0027 APOSTROPHE in existing data, I strongly suggest
> that if you do not know absolutely and provably what it really “stands for,”
> don’t touch it. When reading text in a natural language that you know
> well, you can usually know what U+0027 should be changed to, but if it’s
> anything that might be a foreign name or code-like notation, it’s easy
> to go wrong.
I appreciate your caution. On the other hand, not touching it is a decision,
too. If different sources represent the same lexeme with different apostrophes
and we refrain from touching them, then we’re asserting (in our project) that
these lexemes are distinct, and this interferes with our discovery of
translation paths through the lexeme.
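One way to reconcile the two positions is to leave the stored text untouched
and fold apostrophe variants only in a comparison key used for matching. The
sketch below assumes a hypothetical fold table (APOSTROPHE_VARIANTS) and a
hypothetical helper (match_key); choosing U+02BC as the key's target character
is also an assumption made for illustration. It shows, too, that Unicode
normalization (NFKC) by itself does not unify these characters.

```python
import unicodedata

# Hypothetical fold table: apostrophe-like code points mapped to one target
# character, used ONLY to build a comparison key. The set of variants and the
# choice of U+02BC as the target are illustrative assumptions.
APOSTROPHE_VARIANTS = {
    "\u0027": "\u02BC",  # APOSTROPHE
    "\u2018": "\u02BC",  # LEFT SINGLE QUOTATION MARK (often a mis-keyed apostrophe)
    "\u2019": "\u02BC",  # RIGHT SINGLE QUOTATION MARK
}
FOLD = str.maketrans(APOSTROPHE_VARIANTS)


def match_key(lexeme: str) -> str:
    """Return a key for comparing lexemes; the source text itself is not modified."""
    # NFC makes canonically equivalent sequences compare equal,
    # but it does not touch the apostrophe variants, so fold them afterwards.
    return unicodedata.normalize("NFC", lexeme).translate(FOLD)


# Ukrainian 'name' as three sources might spell it, one per apostrophe choice.
sources = ["ім'я", "ім’я", "імʼя"]
print(len(set(sources)))                     # 3 distinct strings
print(len({match_key(s) for s in sources}))  # 1 shared key
print(unicodedata.normalize("NFKC", "ім’я") == "ім'я")  # False: NFKC keeps U+2019
```

Because the folding happens in the key rather than in the data, the caution
about code-like notation and foreign names still applies only to matching
decisions, and the original characters remain available for inspection.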
> This was discussed some time ago on this list when I raised the issue.
> Check the list archives if you are interested in people’s views as
> they expressed them, but my impression was that this was not regarded as
> important enough to be done right.
Apparently so, though some did consider it important (e.g., James Kass, who
argued, against the view of Asmus Freytag, that Web pages are more, not less,
subject to an expectation of standard conformity than are paper-printed works,
and who concluded: “Web pages on the Unicode site should be exemplary”). For my
purposes, it would certainly help if they were exemplary, and it casts doubt on
the standard’s claimed practicality when the standardizing authority itself
doesn’t comply. Thanks for the reference.
> There is this problem in the Ukrainian language, where the apostrophe means a hard sign.
> How can it be reproduced in the original Cyrillic script? It would not be a "diacritic"
> character like the apostrophe, but it is really the original Cyrillic character at
> the moment (the Ukrainian National Library takes it as an apostrophe, U+0027).
> Same as in the Latin script: U+2019
Why? This seems to conflict with the standard as I understand it. I believe
the Ukrainian apostrophe is a letter with a phonological value, not a
punctuation mark, so I understand the standard to specify U+02BC (MODIFIER
LETTER APOSTROPHE). I believe that this is argued for at
in message 87. If I’m incorrect, I’d appreciate an explanation. Thanks.