Finite State Automata for Unicode
Thomas Emerson - Basis Technology Corporation
The literature on finite state automata generally assumes a relatively small alphabet, often 128 or fewer characters. With a small alphabet, an FSA can be implemented efficiently and in a straightforward manner.
Large alphabets, of which Unicode is a prime example, can make an efficient and compact implementation of automata difficult. This talk presents the problems encountered when handling large alphabets in an FSA implementation and describes some methods for dealing with them.
The presentation presumes some knowledge of automata, though it begins with a brief overview of the necessary theory.
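As a rough illustration of the size problem the abstract describes, the sketch below compares the number of cells in a full state-by-symbol transition table for a 128-character alphabet and for the full Unicode code space, then shows one common way to avoid the large table: labeling transitions with code point ranges and finding the matching range by binary search. This is an assumption chosen for illustration, not necessarily one of the methods covered in the talk. The names `dense_table_cells`, `RangeState`, and `accepts`, and the sample automaton, are hypothetical.

```python
from bisect import bisect_right

ASCII_SIZE = 128
UNICODE_SIZE = 0x110000  # code points U+0000 through U+10FFFF


def dense_table_cells(num_states, alphabet_size):
    """Number of cells in a full state-by-symbol transition table."""
    return num_states * alphabet_size


class RangeState:
    """A state whose outgoing arcs are labeled with code point ranges.

    Illustrative only: arcs are (low, high, target) triples with
    inclusive, non-overlapping ranges.
    """

    def __init__(self, arcs):
        self.arcs = sorted(arcs)
        self.lows = [low for low, _, _ in self.arcs]

    def step(self, ch):
        cp = ord(ch)
        # Find the last arc whose lower bound is <= cp.
        i = bisect_right(self.lows, cp) - 1
        if i >= 0:
            low, high, target = self.arcs[i]
            if cp <= high:
                return target
        return None  # no matching arc


def accepts(states, start, finals, text):
    """Run the automaton over text; True if it ends in a final state."""
    state = start
    for ch in text:
        state = states[state].step(ch)
        if state is None:
            return False
    return state in finals


# Hypothetical automaton: one or more hiragana or CJK Unified Ideographs.
arcs = [(0x3040, 0x309F, 1), (0x4E00, 0x9FFF, 1)]
states = {0: RangeState(arcs), 1: RangeState(arcs)}

if __name__ == "__main__":
    print(dense_table_cells(2, ASCII_SIZE))
    print(dense_table_cells(2, UNICODE_SIZE))
    print(accepts(states, 0, {1}, "日本ご"))
```

In this sketch each state stores two ranges instead of more than a million table cells, at the cost of a binary search per input character rather than a single array lookup.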
International Unicode Conferences are organized by Global Meeting Services, Inc. (GMS).
21 February 2002, Webmaster