Fwd: Character Identity and Font Selection

From: André Szabolcs Szelp <a.sz.szelp_at_gmail.com>
Date: Sat, 11 Jun 2011 07:19:16 +0200

I'm posting this message just for the record. The original message was
bounced by the unicode list; I fell victim to the server transition.
(Elisabeth has already replied to it, quoting parts of the message in her reply.)

---------- Forwarded message ----------
From: André Szabolcs Szelp <a.sz.szelp_at_gmail.com>
Date: 2011/6/10
Subject: Re: Character Identity and Font Selection
To: ejp10 <ejp10_at_psu.edu>
Cc: Ecartis Unicode <ecartis_at_unicode.org>, unicode_at_unicode.org


Thank you for your attempt.

2011/6/9 ejp10 <ejp10_at_psu.edu>

> I do agree with Everson's comment that IPA is essentially the Latin
> alphabet with lots of additional characters. Specifically, the common sounds (e.g.
> /a,e,i,o,u/ and /p,t,k, s,f,x, b,d,g, m,n, l,r.../) are all from the old ASCII
> range. Many more are from the Latin-1 Supplement block (the top half of
> the old Latin-1 encoding). This is by design since many of the most common
> sounds are represented by characters from ASCII,

> Any phonetic transcription a linguist makes will be generated by inputting
> standard Latin characters with a smaller percentage of characters coming
> from the "IPA phonetic block" or one of the other more extended blocks (see
> this paragraph transcribed below)
> /ɛni fənɛtɪk trænskrɪPtʃn ə lɪngwɪst meks wɪl bi dʒɛnɨretɨd baj InpUtɪŋ
> stændərd lætɪn wɪθ ə smɔlər pərsɛntɛdʒ ʌv kærɨtətrz kʌmɪŋ frʌm.../
> (phonemic, USA standard Mid Atlantic)

Well, I counted the characters in your own example:
Other 41
Assumptions for this count: your capital P was taken as [p], your capital I
as [ɪ], and your capital U as the correct IPA (non-ASCII) character. <by>,
which you transcribed as [baj], was corrected to the more common practice of
[baɪ], just for the record.
For your claimed pronunciation, a great number of your <r>s would probably
be written with a symbol other than [r], further shifting the ratio (to
ca. 66:49).

That's a whopping 57% ASCII vs. 43% non-ASCII!
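For anyone who wants to redo the tally, here is a minimal Python sketch. It is only an illustration, not the procedure used for the count above: it applies the stated substitutions (P as [p], I as [ɪ], [baj] as [baɪ]) plus one assumption of mine, that the capital U stands for [ʊ], then ignores spaces and counts code points inside and outside the ASCII range. The names QUOTED, apply_corrections, count_ascii and percentages are hypothetical helpers.

```python
# Sketch: tally ASCII vs. non-ASCII characters in the quoted transcription.
# Assumption: capital U is read as [ʊ]; the other substitutions follow the text.

QUOTED = (
    "ɛni fənɛtɪk trænskrɪPtʃn ə lɪngwɪst meks wɪl bi dʒɛnɨretɨd baj InpUtɪŋ "
    "stændərd lætɪn wɪθ ə smɔlər pərsɛntɛdʒ ʌv kærɨtətrz kʌmɪŋ frʌm"
)


def apply_corrections(text: str) -> str:
    """Apply the reading assumptions stated for the count."""
    text = text.replace("baj", "baɪ")  # <by> respelled as [baɪ]
    return text.replace("P", "p").replace("I", "ɪ").replace("U", "ʊ")


def count_ascii(text: str) -> tuple[int, int]:
    """Return (ascii, non_ascii) character counts, ignoring whitespace."""
    chars = [ch for ch in text if not ch.isspace()]
    ascii_count = sum(1 for ch in chars if ord(ch) < 128)
    return ascii_count, len(chars) - ascii_count


def percentages(ascii_count: int, other_count: int) -> tuple[int, int]:
    """Round each share of the total to a whole percent."""
    total = ascii_count + other_count
    return round(100 * ascii_count / total), round(100 * other_count / total)


if __name__ == "__main__":
    ascii_count, other_count = count_ascii(apply_corrections(QUOTED))
    print("ASCII", ascii_count, "Other", other_count)
    print(percentages(66, 49))  # the adjusted ca. 66:49 ratio
```

Under these assumptions the sketch reproduces the 41 non-ASCII characters given above, and percentages(66, 49) returns (57, 43).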

Clearly, IPA stands in a similar relation to Latin (even in your example!)
as Cyrillic does to Greek, or Latin to Greek.

> not to mention that no linguist wants to use exotic input methods for
> common sounds, and typesetting non-standard characters has always been tricky

What you describe sounds very much like a problem of input methods rather
than of character encoding.

> If phonetic transcription were separated out as a "separate script", there
> would be (another) duplication of glyphs from the Latin block (probably
> almost all of the ASCII range).

That's a common situation with related scripts, and it's nothing unheard of.
That's why we have three (graphically identical) A's, three K's, three H's,
three E's, etc., and two a's, two e's, two y's, etc.
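To make the look-alike point concrete, the short Python sketch below (illustrative only) uses the standard unicodedata module to print the code point and official character name of each look-alike: the capital A, K, H and E in Latin, Greek and Cyrillic, and the small a, e and y in Latin and Cyrillic. The LOOKALIKES table and the describe helper are hypothetical names chosen for this example.

```python
import unicodedata

# Letters that look the same in most fonts but are separately encoded.
LOOKALIKES = {
    "A": ["A", "\u0391", "\u0410"],  # Latin, Greek, Cyrillic
    "K": ["K", "\u039A", "\u041A"],
    "H": ["H", "\u0397", "\u041D"],
    "E": ["E", "\u0395", "\u0415"],
    "a": ["a", "\u0430"],            # Latin, Cyrillic
    "e": ["e", "\u0435"],
    "y": ["y", "\u0443"],
}


def describe(chars: list[str]) -> list[str]:
    """Return 'U+XXXX NAME' for each character."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in chars]


if __name__ == "__main__":
    for glyph, chars in LOOKALIKES.items():
        print(glyph, "->", "; ".join(describe(chars)))
```

Each row prints different code points and names (e.g. GREEK CAPITAL LETTER ETA and CYRILLIC CAPITAL LETTER EN next to the Latin H), even though the glyphs coincide.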

> While texts could be converted to a "proper" separate script for
> transcription, I suspect it would very rarely happen.

This really is an input method problem. In fact, I'm quite sure a keyboard
layout containing all IPA symbols (mapped to the correct characters) would
be readily used. Switching to it for IPA input is less bothersome than
typing on the language-specific Latin keyboard, switching to a character map
program every second or third character, and constantly jumping between
character ranges there (as you mentioned, Latin-1, e.g. for æ, and _several_
phonetic extension blocks). I know what I'm talking about, as I have done it
several times.
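As a rough illustration of the input-method argument, here is a hypothetical Python sketch of a toy converter. It maps a small, X-SAMPA-style subset of ASCII key sequences to IPA characters (the key table is an assumption for this example, not a real keyboard layout), then reports which Unicode block each resulting character belongs to. That shows how a single transcription draws on Basic Latin, Latin-1 Supplement, Latin Extended-A, IPA Extensions and even Greek. The block ranges are hard-coded because Python's unicodedata module does not expose block names.

```python
# Hypothetical X-SAMPA-style key table (a small subset, for illustration only).
KEYMAP = {
    "E": "\u025B",    # ɛ
    "@": "\u0259",    # ə
    "I": "\u026A",    # ɪ
    "{": "\u00E6",    # æ
    "S": "\u0283",    # ʃ
    "Z": "\u0292",    # ʒ
    "1": "\u0268",    # ɨ
    "N": "\u014B",    # ŋ
    "T": "\u03B8",    # θ
    "O": "\u0254",    # ɔ
    "V": "\u028C",    # ʌ
    "U": "\u028A",    # ʊ
    "r\\": "\u0279",  # ɹ (two-key sequence)
}

# Unicode block ranges relevant to this example (inclusive bounds).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0250, 0x02AF, "IPA Extensions"),
    (0x0370, 0x03FF, "Greek and Coptic"),
]


def convert(keys: str) -> str:
    """Translate typed ASCII keys to IPA, preferring the longest matching sequence."""
    longest = max(len(k) for k in KEYMAP)
    out, i = [], 0
    while i < len(keys):
        for size in range(longest, 0, -1):
            chunk = keys[i:i + size]
            if chunk in KEYMAP:
                out.append(KEYMAP[chunk])
                i += size
                break
        else:
            out.append(keys[i])  # unmapped keys pass through unchanged
            i += 1
    return "".join(out)


def block_of(ch: str) -> str:
    """Name the Unicode block of ch, or 'other' if it is outside BLOCKS."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "other"


if __name__ == "__main__":
    ipa = convert("l{tIn wIT kVmIN")
    print(ipa)
    print(sorted({block_of(ch) for ch in ipa if not ch.isspace()}))
```

Typing "l{tIn wIT kVmIN" yields "lætɪn wɪθ kʌmɪŋ", whose characters come from five different blocks; with a layout like this the user never has to know that.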

> A final issue is that linguists are notorious for inventing new
> transcription symbols informally.

Yes, and once these informal signs become widespread enough, for whatever
reason, they may be included in Unicode. That is how it works right now,
independently of the question of IPA's script identity.

> That block would be getting updated a lot in parallel with the Latin block
> or lots of Unicode machinery would need to be added if the scripts were
> formally separated.

Why would the block be updated in parallel with Latin? I doubt any more
plain Latin characters would be added to IPA. And if one were, the Latin
block would not need to be updated. What you describe corresponds to the
unlikely scenario that someone invents an ad-hoc character, it becomes
widespread, and it becomes part not only of official IPA but of some Latin
orthography as well... Yeah, this happens all the time.

Hope this helps,


Szelp, André Szabolcs
+43 (650) 79 22 400
Received on Sat Jun 11 2011 - 00:23:15 CDT