Creating a Software Internationalization Requirements Taxonomy


Andrea Vine - Sun Microsystems, Inc.


Sun has created a comprehensive software internationalization requirements document and checklist, called a taxonomy, which is available to the public. How did we do it? What were the considerations and pitfalls? How can you use it to improve international support in your software products? The requirements taxonomy itself will be described, along with its history and basis. Its use for education, roadmapping, and internationalization status will be demonstrated. I will give you ideas on customizing this tool for your products, such as including implementation specifics for standards and technologies.





Localization in Microsoft .NET Using Multiple and Shared Sources


Bill Hall - MLM Associates, Inc.


The localization model in Microsoft .NET differs significantly from the older Windows model. At first glance, .NET localization seems to be very similar to Java, especially if your only experience has been with .NET's file-based (loose) resources. Indeed, you may come away with the idea that you only have to replace underscores in file names with dashes and you are practically done. In fact, .NET localization has a number of unique features that can add versatility to your application, whether it be console, form, or web based. In this talk, we show how to use single and multiple source files for a single language and how to share resources across multiple languages. We will also explain the ensuing directory structure, required naming conventions, and access methods. Illustrations will come from console programs, Windows forms, and if time permits, web forms. For the latter, we shall also demonstrate using resources and inner html to modify the contents within opening and closing tags of HTML server controls.
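The idea of sharing resources across languages can be pictured as a fallback lookup: a specific culture such as "fr-CA" is tried first, then its parent "fr", then the neutral resources. The following is a conceptual sketch in Python only; it does not use or reproduce any .NET API, and the names fallback_chain, lookup, and RESOURCES are hypothetical.

```python
def fallback_chain(culture):
    """Return the lookup order for a dash-separated culture name.

    "fr-CA" -> ["fr-CA", "fr", ""], where "" stands for the neutral
    (default) resources. Hypothetical helper for illustration only.
    """
    parts = culture.split("-") if culture else []
    chain = ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]
    chain.append("")
    return chain


def lookup(resources, culture, key):
    """Return the first value found for key along the fallback chain."""
    for name in fallback_chain(culture):
        table = resources.get(name, {})
        if key in table:
            return table[key]
    raise KeyError(key)


# Illustrative resource tables; the strings are made up for this sketch.
RESOURCES = {
    "": {"greeting": "Hello", "farewell": "Goodbye"},
    "fr": {"greeting": "Bonjour", "farewell": "Au revoir"},
    "fr-CA": {"greeting": "Allo"},
}

if __name__ == "__main__":
    print(lookup(RESOURCES, "fr-CA", "greeting"))
    print(lookup(RESOURCES, "fr-CA", "farewell"))
    print(lookup(RESOURCES, "de-DE", "greeting"))
```

In this sketch "fr-CA" overrides only the greeting and shares the farewell with "fr", while a culture with no tables at all falls through to the neutral strings.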





Categorizing, Describing, and Evaluating Software for Human Language Technology


Jennifer DeCamp - MITRE Corporation


In the last few years, there has been a rapid increase in the development of human language technologies and in the use of these technologies in new applications. For instance, Machine Translation (MT) is now a common element in search tools and collaborative computing. There has also been an increase both in the separation of tools (e.g., morphological analyzers that used to be sold as part of a larger application are now sold separately) and in integration into larger solutions. There are also more types of tools. With such a proliferation of types of tools and types of applications, it becomes very challenging to provide categorization, description, and assessment for these tools. In addition, the field of language technology evaluation has been rapidly evolving.


MITRE has been dealing with these issues in their work for the U.S. Government to provide Internet information on language tools. They would like to share their approach and results, and obtain input from marketers, systems engineers, standards developers, and others on how best to represent both products and user needs.





New Internationalization Features of the Java(TM) Platform


Naoto Sato - Sun Microsystems, Inc. &

Craig Cummings - Oracle Corporation


At past Unicode Conferences, Rich Gillam, Tom McFarland and others have presented tutorials on the topic of Java Internationalization. Their presentations covered internationalization APIs in the Java Class Libraries in great detail. This tutorial is intended to take up where they left off - covering the latest and greatest enhancements in Java internationalization.

In particular, this tutorial will focus on the new enhancements for J2SE v5.0 including Unicode supplementary character support, new Vietnamese locale data, font support improvements, I/O printf API internationalization, true Unicode application handling, and more.





XML and Localization


Yves Savourel - ENLASO Corporation


The purpose of this session is to examine the different aspects of XML from the viewpoint of localization.


After a very brief summary of the basics of XML, the tutorial explores in more detail the different internationalization aspects of the markup language. It discusses how to develop XML document types for an easier localization process, and the different ways XML can be used during such a process, even with non-XML data. Translation of XML documents is also discussed. The whole session is illustrated by many concrete examples and demonstrations.


Conclusion of the session: Today, despite a few issues, XML and XML-related technologies offer one of the best environments to store, manipulate, localize, and present data in various languages.


Who will benefit: Localization engineers and managers dealing with XML projects, developers and authors of localizable data, and anyone who is interested in better understanding how to take advantage of XML in the localization process.


Benefits: The attendees will get a better vision of how XML technologies can be integrated within their localization processes, sometimes going back to the authoring and development process. They will also get a feel for the various problems that may occur when using XML for localization.





Introduction to Internationalization in Microsoft .NET


Bill Hall - MLM Associates, Inc.


The .NET platform is Microsoft's new computing environment designed to consolidate its plethora of disparate technologies developed over the years into a single object-oriented programming paradigm with end-to-end support of Unicode. Unlike many past initiatives of this type, .NET arrived with a rich set of classes and a programming paradigm designed to facilitate development that meets worldwide requirements. In this presentation, an overview of internationalization and localization support in .NET will be explained and illustrated by client and web demonstrations. Emphasis will be on the vital role that Unicode plays in .NET combined with a discussion of the members of the Globalization namespace including language and regional information, calendars, common formats (dates, times, numbers, currencies, and percentages), textual analysis, collation, and the ease of separating code and user interface data in both client and web applications. Notes will be supplied from a book on .NET internationalization written by the presenter and published by MultiLingual Computing.





Web Internationalization: Standards and Practice


Tex Texin - XenCraft &

Yves Savourel - ENLASO Corporation



• Attendees will be able to design and implement Web documents and Web applications that will work properly for users around the world.

• Attendees will also be able to design and implement multinational and multilingual Web documents and Web applications.

• Attendees will be able to identify standards and features that should be included in Requests for Proposals (RFPs) or Requests for Quotes (RFQs), or that should be evaluated before purchase of software for use on the Web.


Attendees of this tutorial will learn about the architecture of the Web with respect to character processing and the facilities of markup languages for internationalization. The tutorial also identifies which features are currently implemented by browsers and which are not.


The Web can be considered a single application, all parts of which must work together. To be a world-wide web, these parts must work for every country, language, and culture. Internationalization is important to ensure that users world-wide can equally benefit from Web technology.


This tutorial is an introduction to internationalization on the World Wide Web. The audience will learn about the standards that provide for global interoperability and come away with an understanding of how to work with multilingual data on the web.

Character representation and the Unicode-based Reference Processing Model are described in detail. HTML, XHTML, XML (eXtensible Markup Language; for general markup), and CSS (Cascading Style Sheets; for styling information) are given particular emphasis. The tutorial addresses language identification and selection, character encoding models and negotiation, text presentation features, and more. The design and implementation of multilingual Web sites and localization considerations are also introduced.


Topics covered include:

• Unicode Character Encoding Model

• Character Encoding Negotiation

• W3C Reference Processing Model

• Language Identifiers

• Character Representation and Transformation: Quotes, Casing, Escaping

• Choices: Unicode versus Markup

• W3C Normalization

• String Indexing

• Numbered Listings

• Sorting

• Bidirectional And Vertical Text

• Ruby Text

• Multinational and Multilingual File and Directory Organization

• Language Selection
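As a small illustration of the escaping topic listed above, the sketch below turns non-ASCII characters into numeric character references so that text survives an ASCII-only channel. It is an assumption-laden example, not material from the tutorial: the helper names are hypothetical, and it relies only on Python's built-in "xmlcharrefreplace" error handler and the standard html module.

```python
import html


def to_decimal_ncrs(text):
    """Replace every non-ASCII character with a decimal reference (&#NNN;)."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_hex_ncrs(text):
    """Replace every non-ASCII character with a hexadecimal reference (&#xHH;)."""
    return "".join(
        ch if ord(ch) < 128 else "&#x{:X};".format(ord(ch)) for ch in text
    )


if __name__ == "__main__":
    sample = "caf\u00e9"
    escaped = to_decimal_ncrs(sample)
    print(escaped)
    print(to_hex_ncrs(sample))
    # html.unescape reverses both forms of reference.
    print(html.unescape(escaped) == sample)
```

Note that neither helper escapes markup-significant ASCII characters such as "&" and "<"; that is a separate step.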


Continually Updated, Refreshed and Reviewed.


This tutorial is continually updated to reflect the most recent versions of Web standards and the actual behaviors of the latest browsers (particularly Internet Explorer, Netscape Navigator/Mozilla and Opera).


The Web Internationalization Tutorial is reviewed by Web standards and internationalization experts on the Unicode Conference Technical Review Committee. The tutorial has been delivered at Unicode Conferences in Dublin (IUC21), San Jose, CA (IUC22), Prague (IUC23), Atlanta, GA (IUC24), and Washington, D.C. (IUC25).





The Dao of Unihan


Deborah Goldsmith - Apple Computer, Inc.


Over half of the characters in the Unicode Standard are ideographs. This ideographic repertoire, termed Unihan, is intended to provide complete coverage for all the characters in current or past use in all varieties of Chinese, Japanese, Korean, and Vietnamese.


In this talk, we will give an overview of the structure of the current repertoire of Unihan and its organization. We will discuss some practical implementation issues and how to deal with them.


We will also provide an overview of the Unihan database. This is a large body of normative and informative data which is maintained by the Unicode Consortium and included among the data files which are a part of each release of the standard. We will discuss the nature of the data in the database and how it can be used.
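To give a feel for how the database can be consumed, here is a minimal sketch that parses lines in the tab-separated layout used by the Unihan data files (code point, field name, value). The function name parse_unihan and the embedded SAMPLE lines are assumptions for illustration; a real program would read the files shipped with the release it targets.

```python
from collections import defaultdict

# A few sample lines in the "U+XXXX<TAB>field<TAB>value" layout.
SAMPLE = (
    "# comment lines start with a hash sign\n"
    "U+4E00\tkTotalStrokes\t1\n"
    "U+4E00\tkMandarin\ty\u012b\n"
    "U+4E8C\tkTotalStrokes\t2\n"
)


def parse_unihan(lines):
    """Map each character to a dict of its field names and values."""
    data = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code, field, value = line.split("\t", 2)
        char = chr(int(code[2:], 16))  # "U+4E00" -> the character itself
        data[char][field] = value
    return dict(data)


if __name__ == "__main__":
    table = parse_unihan(SAMPLE.splitlines())
    for char, fields in table.items():
        print("U+{:04X}".format(ord(char)), fields)
```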





An Introduction to Writing Systems: A review of script characteristics affecting computer-based script support and Unicode


Richard Ishida - W3C


The tutorial will provide you with a good understanding of the many unique characteristics of non-Latin writing systems, and illustrate the problems involved in implementing such scripts in products. It does not provide detailed coding advice, but does provide the essential background information you need to understand the fundamental issues related to Unicode deployment, across a wide range of scripts. It has also proved to be an excellent orientation for newcomers to the conference, providing the background needed to assist understanding of the other talks!


The tutorial goes beyond encoding issues to discuss characteristics related to input of ideographs, combining characters, context-dependent shape variation, text direction, vowel signs, ligatures, punctuation, wrapping and editing, font issues, sorting and indexing, keyboards, and more.


The concepts are introduced through the use of examples from Chinese, Japanese, Korean, Arabic, Hebrew, Thai, Hindi/Tamil, Russian and Greek.


While the tutorial is perfectly accessible to beginners, it has also attracted very good reviews from people at an intermediate and advanced level, due to the breadth of scripts discussed. No prior knowledge is needed.





Authentic Arabic


Thomas Milo - DecoType


The first part of this tutorial, subtitled "Backgrounds", presents relevant historical information about the development of the alphabet in general and the Arabic alphabet in particular.

The second part, subtitled "Aesthetic and Technical Challenges", dwells on the problems of, and solutions for, the mechanical reproduction of Arabic. During the talk I shall give a more precise account of the excellent Middle Eastern typographic technologies and why they vanished during the first half of the 20th century.

Together they will serve as a case study of cross-cultural technology.





Unicode 4.0 Tutorial: Unicode Algorithms


Asmus Freytag - ASMUS, Inc.


The Unicode Standard and related specifications by the Unicode Consortium specify a number of algorithms. The specification of these algorithms in the Unicode Standard depends on the Unicode Character Properties. Part III surveys the algorithms specified in the Unicode Standard, and extends the discussion of Unicode character properties as they relate to each algorithm. Part III provides answers to these questions:


• What is a Unicode Algorithm?

• How is an abstract algorithm different from an actual implementation?

• How does it relate to Unicode Character Properties?

• What is Unicode Normalization?

• What requirements does it address?

• What is a Unicode Normalization form?

• What is the actual specification of NFC, NFD, NFKC, NFKD?

• What do I need to know in applying normalization?

• How does Normalization interact with the web?

• What is the Unicode Bidirectional Algorithm?

• How is it defined and how does it interact with other text layout tasks?

• When do I need to support it?

• How do I determine text boundaries and line breaks?

• What are the issues?

• What resources does the Unicode Standard provide?

• Is any specific type of support required?

• What are character foldings?

• How do character transformations interact with Normalization?

Part III is very detailed and will touch on the description of algorithms and other material that may require some familiarity with technical concepts.
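The four normalization forms named in the questions above can be tried out with Python's standard unicodedata module. This is only a small sketch of observable behavior, not the specification itself; the helper name all_forms is hypothetical.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def all_forms(text):
    """Return the text normalized to each of the four forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


def code_points(text):
    """Render a string as a list of U+XXXX labels for easy comparison."""
    return ["U+{:04X}".format(ord(ch)) for ch in text]


if __name__ == "__main__":
    # "e" followed by COMBINING ACUTE ACCENT, then the "fi" ligature.
    sample = "e\u0301\ufb01"
    for form, value in all_forms(sample).items():
        print(form, code_points(value))
```

The canonical forms (NFC, NFD) compose or decompose the accented letter but leave the ligature alone, while the compatibility forms (NFKC, NFKD) also replace the ligature with the plain letters "f" and "i".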





Associating Character Encoding and Language Information with HTML, XHTML and CSS Files


Richard Ishida - W3C


This short tutorial explains how to associate information about character encodings and natural language with XHTML, HTML and CSS pages. Exploration of these topics has uncovered considerable divergences of opinion. Things are not so straightforward as they may seem. The tutorial incorporates the latest thinking and developments from the W3C. It also takes into account whether XHTML is served as text/html or XML, and the distinction between HTML/XHTML served in 'standards' vs 'quirks' mode. These approaches have an important bearing on the question. The tutorial assumes a basic familiarity with HTML, XHTML and CSS.





Creating Bidi XHTML/HTML Pages


Richard Ishida - W3C


This short tutorial explains how to go about creating XHTML and HTML pages containing text written in the Arabic or Hebrew scripts.


The tutorial examines how best to achieve the correct effect for these bidirectional scripts using appropriate markup, CSS properties and Unicode code points or entities. It covers the basics, and goes beyond to provide recommended techniques for some of the tricky situations that even native speakers can struggle with.


The tutorial assumes a basic familiarity with the bidirectional characteristics of Arabic and Hebrew, as well as a basic knowledge of HTML and CSS.



Conference Day 1 Keynote


Becoming a Business without Borders


Donald DePalma - Common Sense Advisory, Inc.


We often lose sight of just why we spend as much time as we do in building internationalized applications — it's to make money in a global economy. The presentation will define the concept of the "real world enterprise," outline the requirements, and provide some actionable steps on the journey to this more global enterprise. I raise the issues of internationalization, translation, localization, and market adaptation to the level of corporate investment, governance, trust, and competitiveness.


Companies like Airbus, GE, IBM, Merck, and Toyota have approached this transcendent global state and now operate on a planetary level. No longer just multinational, these "world enterprises" gird the planet with development labs, centers of competence, adaptive manufacturing plants, and customer service centers. Ideally they develop goods and services from inception with the goal of meeting the needs of many markets; they rely on in-country subsidiaries to identify the local value proposition and adapt their world-ready products to appeal to national tastes. They design their wares, manufacture them, and manage supporting data and document repositories to comply with national legislation and conventions.


To be successful, other companies will have to follow their lead. The demands placed on these aspiring world enterprises go far beyond translating documentation and localizing software, but extend into adapting internal and outward-facing transactional systems, tailoring global marketing through websites and other means, and managing the vast amounts of collaborative, support, and other business information required for legal operation in their various markets.



Conference Day 2 Keynote


The Windows Language Roadmap or When Do We Get Rongo-Rongo?


John McConnell - Microsoft Corporation


The confluence of new technologies, especially Unicode and the worldwide web, and new business opportunities, especially in China and southern Asia, has created enormous demand for language support in Microsoft products. This session describes how Microsoft is responding to this demand, how we prioritize, how we research, what our long-term goals are, and what obstacles we face. I will also try to provide some perspective by comparing today's business with the business when I first became involved in globalization, and by speculating on how the business may evolve.



Conference Welcome Address


Welcome Address


Lisa Moore - Software Engineer, IBM Corporation, Co-Chair, Internationalization and Unicode Conferences


Lisa Moore, Software Engineer at IBM Corporation and Conference Board Co-Chair of the Internationalization and Unicode Conferences, will give the opening address and welcome all attendees, speakers and exhibitors to this Twenty-sixth Internationalization and Unicode Conference in San Jose, CA.



Special Event Dinner


Dave & Buster's


Thursday, September 9, 2004


All conference attendees are invited to join us at Dave & Buster's on September 9. D&B signature Attractions are a collection of diverse experiences that are surprising, amusing and always irresistibly cool. Join us at the Million Dollar Midway, featuring everything from classic arcade games to cutting-edge simulators. There's something for everyone. Dinner will be served.


Schedule of events:


• 6:30 p.m. - departure to venue by motor coach


• Then sit back and enjoy your buffet dinner


• Approximately 9:00 p.m. - return to hotel


All conference participants are welcome to attend.


We invite you to bring a guest (or guests) for a nominal fee of $65.00.





Unicode 4.0 Tutorial: Core Concepts in Action


Asmus Freytag - ASMUS, Inc.


The first part of the Unicode Tutorial is a uniquely accessible and entertaining way to visualize the core concepts of the Unicode standard. In this part you will find answers to these questions:

• What is a Unicode character?

• How are Unicode characters represented?

• How do Unicode character codes fit into a modern computing environment?

• How are Unicode characters interchanged?

• What is the interaction between Unicode and rich text (markup)?

• What are Unicode character properties, and why are they important?

• How do end-users experience Unicode?


Throughout, this part of the Unicode Tutorial gives typical examples of how the Unicode Standard interacts with the other elements of an internationalized software architecture. With the help of concrete scenarios for the use of Unicode characters you will become familiar with the role the Unicode Standard plays and the benefits of supporting it.


This part is accessible to and recommended for audiences of all backgrounds.
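Several of the questions above, in particular what a character is and what its properties are, can be explored interactively. The sketch below uses Python's standard unicodedata module to list a few properties of a character; the helper name describe is hypothetical, and the property values reflect whichever Unicode version the Python build ships with.

```python
import unicodedata


def describe(ch):
    """Return a few Unicode character properties for a single character."""
    return {
        "code_point": "U+{:04X}".format(ord(ch)),
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),          # general category
        "bidi_class": unicodedata.bidirectional(ch),   # bidirectional class
        "combining_class": unicodedata.combining(ch),  # canonical combining class
    }


if __name__ == "__main__":
    # Latin capital A, Hebrew alef, Arabic alef, combining acute accent.
    for ch in ("A", "\u05d0", "\u0627", "\u0301"):
        print(describe(ch))
```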





Unicode 4.0 Tutorial: Fundamental Specifications


Asmus Freytag - ASMUS, Inc.


Part II of the Unicode Tutorial builds on the concepts introduced in Part I and systematically presents the details of fundamental specifications that are part of the Unicode Standard.


• What is the organization of the Unicode Code Space?

• What are the principles used to allocate and unify characters?

• What is Han Unification?

• What is a Unicode Encoding form?

• What is the actual definition of UTF-8, UTF-16, UTF-32?

• What is a byte order mark?

• Which encoding form should I select?

• What is the Unicode Character Property Model?

• Where are all the pieces that make up the Unicode Standard?


Part II is recommended for anyone interested in more detailed information.
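To make the encoding-form questions above concrete, the following sketch prints the bytes that one character occupies in UTF-8, UTF-16 and UTF-32, and shows the byte order mark. It uses only Python's built-in codecs; the helper name encoding_forms is hypothetical, and big-endian byte order is chosen here simply so the output reads left to right.

```python
import codecs


def encoding_forms(ch):
    """Return the bytes of ch in three encodings as upper-case hex strings."""
    result = {}
    for name in ("utf-8", "utf-16-be", "utf-32-be"):
        encoded = ch.encode(name)
        result[name] = " ".join("{:02X}".format(b) for b in encoded)
    return result


if __name__ == "__main__":
    # EURO SIGN (in the Basic Multilingual Plane) and U+10400 (supplementary).
    for ch in ("\u20ac", "\U00010400"):
        print("U+{:04X}".format(ord(ch)), encoding_forms(ch))

    # U+FEFF serialized in big-endian UTF-16 is the big-endian byte order mark.
    print("\ufeff".encode("utf-16-be") == codecs.BOM_UTF16_BE)
```

The supplementary character takes four bytes in all three forms, but for different reasons: a four-byte sequence in UTF-8, a surrogate pair in UTF-16, and a single 32-bit unit in UTF-32.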



B11 and B12


Common Locale Data: Process, Issues and Challenges


Steven Loomis (Moderator) &

George Rhoten - IBM, San Jose Globalization Center of Competency


In the internationalization arena, Unicode has provided a lingua franca for communicating textual data. But there remain differences in the locale data used for a variety of tasks, such as formatting dates and times according to the conventions of different languages. Many of those differences are simply gratuitous; all within acceptable limits for human beings, but resulting in software failure. In many other cases there are outright errors.


The Common Locale Data Repository is a project for the exchange of culturally sensitive (locale) information used in application and system development, and to gather, store, and make available data generated in that format. The Repository is intended to become a source for such data. By pooling resources, the time and expense of collecting good data can be minimized. As well, minority languages and small countries will have a focal point for submitting data.


In this panel, the speakers will first present an overview of the CLDR project and associated XML data formats. Then, the recent transition of the project to the auspices of the Unicode Consortium will be discussed. Finally, the panel will discuss some of the issues and difficulties in determining the correct 'common' data from among conflicting options.


The panel will consist of persons from multiple vendors involved in the data gathering and vetting process. Comments and questions will be welcomed from the audience.





Exhibitor's Panel


Tex Texin (Moderator) - XenCraft


Vendors of tools, products and services for internationalization and localization will present the latest advances in their technologies and methodologies and describe the challenges that face the industry.


The panel format provides attendees an opportunity to ask questions of vendors in a public forum without fear of sales pressure. Vendors' responses can be compared and attendees also benefit from the questions and comments of others.


Discussion among the audience is also promoted. Ideas and tactics for writing Requests for Proposals (RFPs), how to conduct evaluations of software against requirements, and other topics relevant to industry, government and academic users or those responsible for software acquisitions or hiring contractors, are open for discussion.


See the Invitation to Exhibit.


Tex Texin will moderate the panel.


The panel presentations and presenters are as follows: (tbd)



C8 and C9


The University and Unicode: Bridging the Gap


Deborah Anderson (Moderator) - Dept. of Linguistics, UC Berkeley


While the university has educated the leaders of the Unicode Consortium, its presence among Consortium members is currently lacking. Indeed, as the historic and lesser-known scripts remain to be encoded, an increasing presence of institutions of higher education would seem to be a natural fit. Unicode is also playing an increasingly important role in digital library projects and in the development of free online scholarly publication.


Representatives from the university who work on projects that involve lesser-known scripts will make up a panel to discuss the following questions:

• Why is the university's presence missing in Unicode?

• The university's presence is needed for input on the scripts remaining to be encoded; how can it participate more fully?

• The university educates the next generation of engineers and scholars. Is Unicode being taught and do scholars understand its importance?

• What potential is there for forging closer ties between universities and industry, specifically for scholars working on lesser-studied scripts? How can these ties be encouraged?


A question and answer period will conclude the session.





WS-I18N: Why Web Services are Not Internationalized.


Addison Phillips - webMethods, Inc.


Web services have been touted as the key to the next generation of Internet products: the machine-to-machine equivalent of, say, HTML in terms of impact. This presentation examines the author's conclusions from studying Web services at webMethods and W3C: what's wrong and what needs fixing? It also presents a proposal for how a "WS-I18N" might look that addresses some of these problems, based on work with products such as webMethods Glue, a Web services product.





What's in a Name? Handling Personal Names and Information in a Global Application


Addison Phillips - webMethods, Inc.


People's names, their presentation, collection, collation, and validation, are rich in cultural and linguistic variation and nuance. Handling people's personal information (which may also include gender, age, and other related information, as well as regulatory concerns) is a key problem when internationalizing an application that deals with this type of information. This presentation gives an introduction to the variations in name handling and demonstrates some different approaches to designing multilingual, multiculturally capable systems.





Mapping Text in Unspecified Character Sets to Unicode as a Canonical Representation in a Hostile Environment


David Clarke - Dragon Thoughts Ltd & Sheffield University


Content filtering of emails and web content, even from well-formed sources, can be difficult because of the proliferation of conventions for encoding non-ASCII text in email clients and web pages.


One manner of easing the burden is to remap content from the received character set to a Unicode representation before applying the filtering rules. In the real world, many email clients and web page authors fail to accurately declare the character sets used.


In a security environment the situation can be even worse, as hostile applications (including scripted viruses and spam engines) may intentionally make false declarations about their character sets to conceal information from scanners and filters.

The paper describes the author's approach to dealing with automated recognition of character sets during the development of content filtering software. There is a specific emphasis on the issues surrounding the handling of Japanese text from untrusted sources, drawn from the author's work as a consultant to an IT security company. The unique feature of this paper is that it describes a software engineering approach to systematically decoding text representations where the MIME header information regarding the encoding is incorrect or absent. The same techniques can be applied outside the security environment for other applications which display or process content aimed at human consumers.
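The general idea of not trusting a declared character set can be sketched as follows. This is not the author's method, which the abstract does not detail; it is a deliberately naive fallback chain that tries the declared charset with strict decoding and then a fixed list of candidates. The function name decode_untrusted and the candidate list are assumptions, and a successful strict decode is only a heuristic, not proof that the guess is right.

```python
# Candidate encodings tried after the declared one, in order (an assumption).
FALLBACKS = ("utf-8", "euc_jp", "shift_jis")


def decode_untrusted(data, declared=None):
    """Decode bytes to str, returning (encoding_used, text).

    The declared charset is tried first; if it is unknown or the bytes
    are not valid in it, each fallback is tried with strict decoding.
    """
    candidates = ([declared] if declared else []) + list(FALLBACKS)
    for name in candidates:
        try:
            return name, data.decode(name)  # strict: invalid bytes raise
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown charset, try the next candidate
    # Last resort: Latin-1 maps every byte, so this never fails.
    return "latin-1", data.decode("latin-1")


if __name__ == "__main__":
    japanese = "\u65e5\u672c\u8a9e"  # the word for "Japanese language"
    # EUC-JP bytes falsely declared as US-ASCII.
    print(decode_untrusted(japanese.encode("euc_jp"), "us-ascii"))
    # UTF-8 bytes with a charset name that does not exist.
    print(decode_untrusted(japanese.encode("utf-8"), "no-such-charset"))
```

A production scanner would need more than this, for example statistical checks to separate encodings whose valid byte ranges overlap.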





Pattern Matching with Multilingual Regular Expressions


Weiran Zhang - Oracle Corporation


Regular expressions have long enjoyed general popularity in most computing environments as a powerful tool for text and data pattern matching and manipulation. They offer a tremendous amount of processing power to a broad range of applications through a versatile and concise syntax that can be used to solve large and small problems alike. However, regular expression implementations are traditionally designed to support Western European data only; it follows that certain match concepts are not well-defined when extended to support multiple languages. It is therefore highly desirable to have a universal regular expression model that can work with all languages with different linguistic characteristics and be able to perform pattern matching in a locale-sensitive manner. The Unicode Regular Expression Guidelines (UTR#18) document the general guidelines for adapting regular expression engines to support Unicode and describe the levels of support possible.


This paper explores the design and development of a multilingual regular expression engine capable of handling an arbitrary number of languages and character sets. We will cover the support for locale-sensitive features such as Unicode character support, character properties, linguistic ranges, special collation elements, equivalence classes, common optimization techniques, performance considerations, and so on. We will survey the multilingual capabilities in the existing major regular expression packages and utilities, including Perl 5, Java, GNU, XML, etc. In conclusion, we will illustrate the ideas discussed by introducing the new multilingual regular expression features in the Oracle 10g release, which brings the power of complete multilingual regular expression search to Oracle database through native support in SQL and PL/SQL environments.
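The gap between ASCII-only and Unicode-aware matching is easy to demonstrate. The sketch below uses Python's standard re module, which is not one of the engines surveyed in the paper and covers only a small part of what UTR#18 describes (for instance, it has no \p{...} property escapes). The helper name words is hypothetical.

```python
import re

# "naive" with i-diaeresis, "cafe" with e-acute, and a Greek word.
TEXT = "na\u00efve caf\u00e9 \u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac"


def words(text, ascii_only=False):
    """Find runs of word characters, with or without Unicode awareness."""
    flags = re.ASCII if ascii_only else 0
    return re.findall(r"\w+", text, flags)


if __name__ == "__main__":
    # Unicode-aware matching keeps each word whole.
    print(words(TEXT))
    # ASCII-only matching splits at accented letters and drops the Greek word.
    print(words(TEXT, ascii_only=True))
```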





Supplementary Characters in the Java(TM) Platform


Norbert Lindenberg - Sun Microsystems, Inc.


This paper describes how supplementary characters are supported in the Java(TM) platform. Supplementary characters are characters in the Unicode standard whose code points are above U+FFFF, and which therefore cannot be described as single 16-bit entities such as the char data type in the Java programming language. Such characters are generally rare, but some are used, for example, as part of Chinese and Japanese personal names, and so support for them is commonly required for government applications in East Asian countries.


The Java platform has been enhanced to enable processing of supplementary characters with minimal impact on existing applications. New low-level APIs enable operations on individual characters where necessary. Most text-processing APIs, however, use character sequences, such as the String class or character arrays. These are now interpreted as UTF-16 sequences, and the implementations of these APIs have been changed to correctly handle supplementary characters.

Besides explaining these enhancements in detail, the paper also provides guidelines for application developers for determining and implementing necessary changes to enable use of the complete Unicode character set.
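Why a supplementary character cannot fit in one 16-bit unit can be shown with a little arithmetic. The sketch below is written in Python rather than Java and does not use the Java APIs the paper describes; it computes the UTF-16 surrogate pair for a code point by hand and counts 16-bit code units. The helper names are hypothetical.

```python
def to_surrogates(code_point):
    """Split a supplementary code point into its UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000          # a 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def utf16_length(text):
    """Count the 16-bit code units needed to store text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    # U+20BB7, a CJK ideograph outside the Basic Multilingual Plane.
    ch = "\U00020BB7"
    high, low = to_surrogates(ord(ch))
    print("{:04X} {:04X}".format(high, low))
    print(len(ch), utf16_length(ch))
```

One code point becomes two code units, which is why code that treats each 16-bit unit as a character can miscount or split such characters.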





Telling Time Internationally


Bill Hall - MLM Associates, Inc.


Developers as a group would rather not think about the problems of writing code that supports world wide use. Yet, some of the most interesting problems occur here, and those of us that teach this subject try to combine "ordinary" development steps with internationalization coding to show that the two are closely intertwined and are essentially indistinguishable. The topic becomes particularly interesting when the same development problem combines multiple development issues such as a web or platform service and a client that uses the service to accomplish its task.


In this talk, we show how to build a simple program that introduces date, time, and calendar handling capabilities on Microsoft .NET. Specifically, we show how to write a locale-neutral platform or web service that talks to the National Institute of Standards and Technology's very accurate clock to obtain the current time and date as well as the status of daylight or standard time. Along the way, we encounter a bit of socket programming and learn how to exploit Julian dates to eliminate century ambiguities. We also explain how to build a client to use the service and show how to render all categories of dates and times in all supported .NET cultures, using not only the Gregorian calendar but two lunar and several era calendars as well. You will see how dates and times are classified in .NET, basic localization techniques, examples of several localized custom controls, and a demonstration of mirroring for bi-directional languages. As a final bonus, we will also show the application running on a hand-held device.
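A continuous day count sidesteps two-digit-year ambiguity because the count alone identifies the date. The sketch below assumes the Modified Julian Date (days since 17 November 1858), which is one common form of Julian date; the abstract does not say which form the talk uses. It is written in Python, not .NET, contacts no time server, and its helper names are hypothetical.

```python
from datetime import date

# Ordinal of the Modified Julian Date epoch, 17 November 1858.
_MJD_EPOCH_ORDINAL = date(1858, 11, 17).toordinal()


def date_to_mjd(d):
    """Convert a Gregorian calendar date to its Modified Julian Date."""
    return d.toordinal() - _MJD_EPOCH_ORDINAL


def mjd_to_date(mjd):
    """Convert a Modified Julian Date back to a Gregorian calendar date."""
    return date.fromordinal(mjd + _MJD_EPOCH_ORDINAL)


if __name__ == "__main__":
    # A two-digit year "04" is ambiguous, but the day count is not.
    print(mjd_to_date(53257))
    print(date_to_mjd(date(2000, 1, 1)))
```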





Building Global Internet Applications using the Oracle Globalization Development Kit


Simon Law - Oracle Corporation


Designing and developing a globalized J2EE application can be a daunting task, even for the most experienced developers. This presentation introduces and demonstrates the new Oracle Globalization Development Kit (GDK), which can simplify and speed the task of globalizing your J2EE applications to support multiple languages and locales.


Oracle Database 10g introduces the Oracle Globalization Development Kit (GDK), which provides a framework for accelerating the development of globalized internet applications. The Oracle GDK complements existing features in Java while removing the complexity of developing global applications. It also provides key globalization features, brings them to the middle tier, and handles compatibility between the middle-tier application and the database server seamlessly.





Transliteration-Based Unicode Input Methods for OS X


Kenneth Beesley - Xerox Research Centre Europe


Transliteration-based Unicode input methods occupy a middle ground between the complicated input methods required for CJK and the keyboard layouts typically proposed for entering most alphabetic scripts. The paper provides an introduction to transliteration theory, OS X input methods, and the implementation of n-graph input sequences in Apple's XML "keylayout" language. Finally, the paper introduces a new, more intuitive XML language for defining transliteration-based input methods and a program that translates them into the Apple XML language.
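
To give a feel for n-graph input sequences, here is a minimal Java sketch of greedy longest-match transliteration over a whole string. It is neither Apple's keylayout language nor the paper's XML language, both of which process keystrokes incrementally; the class `NGraphTransliterator` and the small Latin-to-Cyrillic table are hypothetical and chosen only to show a digraph ("sh") and a tetragraph ("shch") taking precedence over their single-letter prefixes.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: the class and its mapping table are hypothetical.
final class NGraphTransliterator {
    private final Map<String, String> table = new HashMap<>();
    private int maxKeyLength = 0;

    // Register an input sequence (an n-graph) and the text it should produce.
    NGraphTransliterator add(String keys, String output) {
        table.put(keys, output);
        maxKeyLength = Math.max(maxKeyLength, keys.length());
        return this;
    }

    // At each position, try the longest registered sequence first.
    String transliterate(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            boolean matched = false;
            int longest = Math.min(maxKeyLength, input.length() - i);
            for (int len = longest; len >= 1 && !matched; len--) {
                String replacement = table.get(input.substring(i, i + len));
                if (replacement != null) {
                    out.append(replacement);
                    i += len;
                    matched = true;
                }
            }
            if (!matched) {
                out.append(input.charAt(i)); // unmapped input passes through unchanged
                i++;
            }
        }
        return out.toString();
    }

    // A tiny demonstration table; escapes are used to avoid source-encoding issues.
    static NGraphTransliterator cyrillicDemo() {
        return new NGraphTransliterator()
                .add("a", "\u0430")     // CYRILLIC SMALL LETTER A
                .add("i", "\u0438")     // CYRILLIC SMALL LETTER I
                .add("s", "\u0441")     // CYRILLIC SMALL LETTER ES
                .add("h", "\u0445")     // CYRILLIC SMALL LETTER HA
                .add("ch", "\u0447")    // CYRILLIC SMALL LETTER CHE
                .add("sh", "\u0448")    // CYRILLIC SMALL LETTER SHA
                .add("shch", "\u0449"); // CYRILLIC SMALL LETTER SHCHA
    }
}
```

With this table, `cyrillicDemo().transliterate("shchi")` yields the two letters U+0449 U+0438 rather than four separate single-letter mappings, because the four-character sequence is tried before "sh" or "s".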





The Microsoft Windows Core Fonts - Recent Extensions and Updates


Simon Daniels - Microsoft Corporation


Contrary to recent reports in the German press, Arial is not dead. In fact, Microsoft has spent the last few years extending Arial and the other Windows core fonts, including Times New Roman, Tahoma and Microsoft Sans Serif, to include complete support for every code-point in the Latin, Greek, Cyrillic, Hebrew and Arabic ranges of Unicode 4.0, and has added OpenType Layout tables to support combining diacritics across the board. This paper will describe the processes involved in this project and the technical challenges faced by the developers and type designers involved, and will include a demonstration of the final fonts.


We hope the details of this project will serve as a useful model for those making and shipping core system-fonts, and that Microsoft's customers will see value in the investment made in these work-horse fonts.


Microsoft made the business decision to fill out the complete ranges rather than add individual code-points based on customer requests. Although in the short term this may have resulted in a longer and more expensive project, in the medium to long term this decision will reduce support costs and on-going maintenance of the fonts.





Displaying Indic in Java


Craig Cummings - Oracle Corporation


With the releases of J2SE v1.4.X, Java added support for some Indic scripts and languages. This is a practical look at the how-tos of Indic display in Java.


In this session, I will talk about which platforms best support Java Indic display, what Indic fonts are available that work with Java, and how to configure Java to use these fonts -- including insights into modification. At previous Unicode conferences, this presentation focused on the Devanagari and Telugu scripts. This time, it will take a look at Gujarati, Punjabi, Tamil and possibly Bengali, Malayalam, and Oriya.





ICU Overview: The Open-Source Unicode Library, v3.0


Markus Scherer - IBM Corporation


In today's global market, it is crucial to develop and maintain programs that support a wide variety of national languages. Unicode is the foundation for dealing with text world-wide: it has been adopted by ALL major software vendors and modern standards such as XML, Java, ECMAScript (JavaScript), LDAP, and CORBA.


ICU is the premier Unicode-enablement software library, providing a full range of services for supporting internationalization - especially in server environments. ICU is principally developed by IBM, and used in IBM products, but is also freely available as open-source. It provides cross-platform C, C++ and Java APIs, with a thread-safe programming model. The ICU project is licensed under the X License, which is compatible with GPL but non-viral; it can be freely incorporated into any product.


This paper will provide an overview of the ICU features, with special emphasis on the new features in the ICU 3.0 release (June 2004), and also discuss the planned features in the upcoming ICU 3.2 release (2004 Q4).



International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to
