Unicode and Homeland Security

Carl Hoffman - Basis Technology Corporation

Intended Audience:	Managers, Software Engineers, Systems Analysts, Marketers
Session Level:	Intermediate

Government agencies involved in homeland security are slowly coming to appreciate the need to build information systems capable of handling multilingual data. Historically, these agencies have translated written text into English and transliterated foreign names into the Roman alphabet before entering such data into an information system. The limitations of these approaches are now painfully evident, and the volume of incoming data has climbed to the point where there is no choice but to build systems capable of handling such data in its native script.

Adopting Unicode as the underlying representation for electronic text is a major step forward, but integrating software modules that can intelligently process Unicode is also essential. This talk will survey a number of topics pertaining to building multilingual information systems for homeland security, including document summarization and triage; intelligent and reversible transliteration; fuzzy name matching; cross-script search; and cross-lingual search. It will also discuss ways in which terrorist organizations are exploiting the Internet, and measures that intelligence and law enforcement agencies are taking to stay one step ahead of the bad guys.