Ken,

Why not keep the essential meaning of "grapheme" as a minimal unit of written script, just as a "phoneme" is a minimal unit of sound in a language? We could then use "graphemic joiner" for the mark that forces together units which could stand separately, but which have different meanings or functions in the (sub)script when they do stand separately.

Occam's razor suggests to me that "grapheme cluster," when used in this way, overcomplicates the problem by clouding the meaning of "grapheme."

The concept of a "phoneme cluster" has been used in speech recognition, but there it serves to find "words" rather than to define another minimal unit of sound in a language. For example:

Detection of Unregistered-Words Using Phoneme Cluster Models
Hiroyuki SAKAMOTO** and Shoichi MATSUNAGA***
(* ATR Interpreting Telecommunications Research Laboratories, Kyoto-fu, 619-02 Japan)
(** Presently, Nitsuko Corporation Central Research & Development Laboratories)
(*** Presently, NTT Human Interface Laboratories)
(Vol. J80-D-II, No. 9, pp. 2261-2269)

This paper proposes a method of detecting unregistered words using phoneme cluster models, which are generated from phoneme clusters that group all phonemes into clusters. We compare cluster models that 1) take into account the Japanese syllabic construction, 2) automatically split a single model, and 3) unify all phonemes. In sentence recognition experiments that included unregistered words, the cluster models that take the Japanese syllabic construction into account halved the processing time while achieving equivalent word accuracy, compared with previous processing using phoneme models. We confirmed the effectiveness of the proposed method in reducing the amount of processing needed for unregistered-word detection. In addition, a penalty based on a cluster N-gram was effective in improving the scores of unregistered words.
Key words: continuous speech recognition, phoneme cluster model, unregistered word, garbage HMM, N-gram


Jim Caldwell


On 1/11/02 10:44, "Kenneth Whistler" <kenw@sybase.com> wrote:

> Kent,
>
>
>>> Which is why I didn't suggest we call it a "conjunct". The Unicode
>>> Standard already uses "conjunct" in a specific meaning. It does
>>
>> Hmmm, is that the usual terminology for that specific meaning?
>
> Yes. That term is well-established in Indic graphology. Which
> means that...
>
>> my vote is for "conjunct", use another word for
>> hard (and Brahmic) ligatures;
>
> is not a viable option.
>
> --Ken
>
>

--
"Seek Harmony, Cherish Diversity, Enjoy Paradox"
James T. Caldwell, Ph.D.
Multilingual Communications Consultant

Pacific Rim Connections, Inc. http://www.pacrim.net
Computing Solutions for a Multilingual World
3030 Atwater Drive
Burlingame, CA 94010-5128
Phone: 1-650-692-7182  Mobile: 1-650-678-2493 email: jtc@pacrim.net