From: Marco Cimarosti (firstname.lastname@example.org)
Date: Fri Oct 17 2003 - 10:42:42 CST
John Cowan wrote:
> You persist in misunderstanding. Suppose I came along and told you
> I wanted to create a Unicode codepoint for each word in every language
> on Earth. Would you blithely allocate me a 24-billion-codepoint
> private space?
Why? 200 million should be more than enough: that's more than 30,000 words
for each living language.
Of course, you should only encode abstract words, such as <ENGLISH VERB
JOKE>, and combining morphemes such as <ENGLISH COMBINING INFLECTION PAST
TENSE>, <ENGLISH COMBINING INFLECTION PRESENT PARTICIPLE>, etc.
It will be the task of the uttering engine to utter a sequence like <ENGLISH
VERB JOKE> + <ENGLISH COMBINING INFLECTION PRESENT PARTICIPLE> with the
ligature "joking". Of course, this will only happen with OpenLex-enabled
uttering engines: naive uttering engines based on the old TrueLex would render
it with the fallback uttering "joke -ing".
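The two rendering paths could be sketched roughly as follows. This is only an illustration: code points are modelled as plain strings, the two lookup tables and the `utter` function are hypothetical, and the `openlex` flag merely stands in for the difference between an OpenLex-enabled engine and a naive one.

```python
# Hypothetical sketch: code points are modelled as plain strings.
JOKE = "ENGLISH VERB JOKE"
PRESENT_PARTICIPLE = "ENGLISH COMBINING INFLECTION PRESENT PARTICIPLE"

# Fallback utterings for isolated code points (assumed table).
FALLBACK = {
    JOKE: "joke",
    PRESENT_PARTICIPLE: "-ing",
}

# Ligatures known only to an "OpenLex-enabled" engine (assumed table).
LIGATURES = {
    (JOKE, PRESENT_PARTICIPLE): "joking",
}


def utter(code_points, openlex=True):
    """Render a sequence of code points as text."""
    out = []
    i = 0
    while i < len(code_points):
        pair = tuple(code_points[i:i + 2])
        if openlex and pair in LIGATURES:
            # The engine knows a ligature for root + combining inflection.
            out.append(LIGATURES[pair])
            i += 2
        else:
            # Naive path: utter each code point on its own.
            out.append(FALLBACK[code_points[i]])
            i += 1
    return " ".join(out)


print(utter([JOKE, PRESENT_PARTICIPLE]))                 # ligature path
print(utter([JOKE, PRESENT_PARTICIPLE], openlex=False))  # fallback path
```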
To make it more interesting, you could also encode a few useless
compatibility presentation inflected forms such as <ENGLISH VERB SPEAK PAST
TENSE FORM>, which will get decomposed to <ENGLISH VERB SPEAK> + <ENGLISH
COMBINING INFLECTION PAST TENSE>, and finally be rendered as "spoke",
"speaked" or "speak -ed", depending on the platform.
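A minimal sketch of that decomposition step and the three platform-dependent outcomes might look like this. The platform names "full", "regular" and "naive" are invented for the illustration, as are the tables and function names; the sketch only handles a single root followed by a single inflection.

```python
# Hypothetical sketch: code points are modelled as plain strings.
SPEAK = "ENGLISH VERB SPEAK"
PAST = "ENGLISH COMBINING INFLECTION PAST TENSE"
SPEAK_PAST_FORM = "ENGLISH VERB SPEAK PAST TENSE FORM"

# Compatibility decompositions (assumed table).
DECOMPOSITION = {SPEAK_PAST_FORM: [SPEAK, PAST]}

ROOTS = {SPEAK: "speak"}
SUFFIXES = {PAST: "ed"}
# Irregular forms, known only to the most complete platform.
IRREGULAR = {(SPEAK, PAST): "spoke"}


def decompose(code_points):
    """Replace each compatibility form with its root + inflection."""
    out = []
    for cp in code_points:
        out.extend(DECOMPOSITION.get(cp, [cp]))
    return out


def render(code_points, platform):
    """Render one root plus one inflection on the given platform."""
    root, inflection = decompose(code_points)
    if platform == "full" and (root, inflection) in IRREGULAR:
        return IRREGULAR[(root, inflection)]
    if platform in ("full", "regular"):
        # Knows how to attach suffixes, but not the irregular forms.
        return ROOTS[root] + SUFFIXES[inflection]
    # Naive platform: fallback uttering of each code point.
    return f"{ROOTS[root]} -{SUFFIXES[inflection]}"


for platform in ("full", "regular", "naive"):
    print(platform, render([SPEAK_PAST_FORM], platform))
```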
Notice that a few words will need contextual forms, such as <ENGLISH
INDETERMINATE ARTICLE>, which will display as "a" or "an" depending on the
following code point.
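Contextual selection of this kind could be sketched as below. The noun code points and the `WORDS` table are made up for the example, and the rule is deliberately simplified to look at the first letter of the following word's spelling; real English chooses by sound ("an hour", "a university"), which this sketch does not attempt.

```python
# Hypothetical sketch: code points are modelled as plain strings.
ARTICLE = "ENGLISH INDETERMINATE ARTICLE"

# Invented code points and their utterings, for illustration only.
WORDS = {
    "ENGLISH NOUN APPLE": "apple",
    "ENGLISH NOUN JOKE": "joke",
}

VOWELS = ("a", "e", "i", "o", "u")


def utter(code_points):
    """Render code points, picking "a" or "an" from the next one."""
    out = []
    for i, cp in enumerate(code_points):
        if cp == ARTICLE:
            # Look ahead at the uttering of the following code point.
            following = ""
            if i + 1 < len(code_points):
                following = WORDS.get(code_points[i + 1], "")
            # Simplified rule: spelling-based, not sound-based.
            out.append("an" if following[:1] in VOWELS else "a")
        else:
            out.append(WORDS[cp])
    return " ".join(out)


print(utter([ARTICLE, "ENGLISH NOUN APPLE"]))
print(utter([ARTICLE, "ENGLISH NOUN JOKE"]))
```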
Languages, such as Swahili, which use prefixes instead of suffixes will be
encoded in "logical order", i.e. with the combining prefix after the root.
It will be the task of the uttering engine to reorder the prefix. E.g., the
Swahili word "watu" (plural of "mtu" = "man") will be encoded as <SWAHILI
NOUN TU> + <SWAHILI COMBINING INFLECTION PLURAL FOR PEOPLE> and, in theory,
it will be rendered as "watu". In practice, it will always be rendered as
"-tu wa-" because no one will invest in implementing Swahili rendering.
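The reordering from logical order to uttered order could be sketched like this. The tables, the `utter` function and the `reorders_prefixes` flag are all hypothetical; the flag stands in for whether anyone has invested in implementing Swahili rendering.

```python
# Hypothetical sketch: code points are modelled as plain strings.
TU = "SWAHILI NOUN TU"
PLURAL_PEOPLE = "SWAHILI COMBINING INFLECTION PLURAL FOR PEOPLE"

ROOTS = {TU: "tu"}
PREFIXES = {PLURAL_PEOPLE: "wa"}


def utter(code_points, reorders_prefixes):
    """Render a root followed (in logical order) by its combining prefix."""
    root, prefix = code_points
    if reorders_prefixes:
        # Move the combining prefix in front of the root.
        return PREFIXES[prefix] + ROOTS[root]
    # No reordering support: utter the pieces in stored order.
    return f"-{ROOTS[root]} {PREFIXES[prefix]}-"


print(utter([TU, PLURAL_PEOPLE], reorders_prefixes=True))   # in theory
print(utter([TU, PLURAL_PEOPLE], reorders_prefixes=False))  # in practice
```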