Ken,
Why not keep the essential meaning of "grapheme" as a minimal unit
of written script, just as a "phoneme" is a minimal unit of sound in
a language,
and use "graphemic joiner" for the mark which forces together units
that could stand separately, but which have different meanings/functions
in the (sub)script when they stand separately?
Occam's razor suggests to me that "grapheme cluster," when used in
this way, overly complicates the problem by clouding the meaning of "grapheme."
The concept of "phoneme cluster" has been used in speech recognition,
but there it is used to find "words" rather than to define another
minimal unit of sound in a language, e.g.,
Detection of Unregistered-Words Using
Phoneme Cluster Models
Hiroyuki SAKAMOTO** and Shoichi MATSUNAGA***
(* ATR Interpreting Telecommunications Research Laboratories, Kyoto-fu, 619-02 Japan)
(** Presently, Nitsuko Corporation Central Research & Development Laboratories)
(*** Presently, NTT Human Interface Laboratories)
(Vol. J80-D-II, No. 9, pp. 2261-2269)
This paper proposes a method of detecting unregistered words using phoneme
cluster models, which are generated from phoneme clusters that classify all
phonemes under some kind of cluster. We make a comparative study of cluster
models which 1) take into account the Japanese syllabic construction, 2)
automatically split a single model and 3) unify all phonemes. In sentence
recognition experiments including unregistered words, the cluster models that take
into account the Japanese syllabic construction reduced the processing
time by half and achieved equivalent word accuracy, compared with past processing
using phoneme models. We confirmed the effectiveness of the proposed method
in suppressing the amount of processing for unregistered word detection.
And to improve the score of unregistered words, a penalty using a cluster N-gram was effective.
Key words: continuous speech recognition, phoneme cluster model, unregistered-word, garbage HMM, N-gram
Jim Caldwell
On 1/11/02 10:44, "Kenneth Whistler" <kenw@sybase.com> wrote:
> Kent,
>
>
>>> Which is why I didn't suggest we call it a "conjunct". The Unicode
>>> Standard already uses "conjunct" in a specific meaning. It does
>>
>> Hmmm, is that the usual terminology for that specific meaning?
>
> Yes. That term is well-established in Indic graphology. Which
> means that...
>
>> my vote is for "conjunct", use another word for
>> hard (and Brahmic) ligatures;
>
> is not a viable option.
>
> --Ken
>
>
--
"Seek Harmony, Cherish Diversity, Enjoy Paradox"
James T. Caldwell, Ph.D.
Multilingual Communications Consultant
Pacific Rim Connections, Inc. http://www.pacrim.net
Computing Solutions for a Multilingual World
3030 Atwater Drive
Burlingame, CA 94010-5128
Phone: 1-650-692-7182 Mobile: 1-650-678-2493 email: jtc@pacrim.net