DTD downloading from Unicode.org

From: Rick McGowan (rick@unicode.org)
Date: Mon Jan 19 2009 - 14:19:19 CST

Next message: Werner LEMBERG: "Ossetian letter missing in Unicode?"

Previous message: Kenneth Whistler: "Re: UCD.html and simple titlecase"

This message is relevant to anyone who uses tools that download any of
the DTDs from the unicode.org site, which includes browsers that read
any of the XML files on the unicode.org site.

If you don't know what I'm talking about, you can safely skip this message.

DTDs are used in XML parsing. DTDs relevant to CLDR are found on the
Unicode web site, and appear in declarations at the tops of CLDR files,
for example:

<!DOCTYPE ldml SYSTEM "http://www.unicode.org/cldr/dtd/1.6/ldml.dtd">

The DTD for a specific version of LDML never changes. Thus, if you have
a specific released version of LDML (currently 1.6 or earlier) and have
downloaded its DTD once, you can cache the DTD and reuse it whenever you
parse CLDR data from that version.

However, some processes apparently download DTD files repeatedly during
parsing. Repeated and unnecessary DTD downloading currently accounts for
a significant percentage of the outbound network bandwidth on the
unicode.org server. We sometimes see in the log files a single machine
downloading one of the DTD files hundreds of times in quick succession.
This places an undue burden on the server and significantly slows its
responses, making our website slower than necessary.

We strongly encourage people to use a caching implementation, such as
the one used by CLDR itself (see
http://unicode.org/cldr/data/tools/java/org/unicode/cldr/util/CachingEntityResolver.java).
If this continues to be a problem, we may have to resort to some kind of
throttling to limit or block downloading from specific IP addresses.
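
For readers unsure what a caching implementation might look like, here
is a minimal sketch in Java. It is not the CLDR CachingEntityResolver
linked above; the class name CachingDtdResolver, the helper
cacheFileFor, and the cache directory layout are hypothetical and chosen
only for illustration. It relies on the fact stated earlier that a
released DTD never changes, so a copy on disk can be reused indefinitely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical sketch: serves CLDR DTDs from a local directory, downloading each at most once. */
public class CachingDtdResolver implements EntityResolver {
    // Assumption: only URLs under this prefix are cached; everything else is left to the parser.
    private static final String CLDR_DTD_PREFIX = "http://www.unicode.org/cldr/dtd/";

    private final Path cacheDir;

    public CachingDtdResolver(Path cacheDir) {
        this.cacheDir = cacheDir;
    }

    /** Maps a system ID (the DTD URL) to a flat file name inside the cache directory. */
    Path cacheFileFor(String systemId) {
        return cacheDir.resolve(systemId.replaceAll("[^A-Za-z0-9._-]", "_"));
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        if (systemId == null || !systemId.startsWith(CLDR_DTD_PREFIX)) {
            return null; // not a CLDR DTD: fall back to the parser's default behavior
        }
        Path cached = cacheFileFor(systemId);
        if (!Files.exists(cached)) {
            // First request for this DTD: download to a temporary file, then
            // move it into place so a failed download never leaves a partial copy.
            Files.createDirectories(cacheDir);
            Path tmp = Files.createTempFile(cacheDir, "dtd", ".tmp");
            try (InputStream in = URI.create(systemId).toURL().openStream()) {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
                Files.move(tmp, cached, StandardCopyOption.REPLACE_EXISTING);
            } finally {
                Files.deleteIfExists(tmp);
            }
        }
        // Every later request is answered from disk, with no network traffic.
        InputSource source = new InputSource(Files.newInputStream(cached));
        source.setSystemId(systemId);
        return source;
    }
}
```

A resolver like this would be installed on the parser before parsing,
for example with DocumentBuilder.setEntityResolver or
XMLReader.setEntityResolver, so that each DTD is fetched from
unicode.org once per machine rather than once per parsed file.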

Regards,
Rick McGowan
Unicode, Inc.

This archive was generated by hypermail 2.1.5 : Mon Jan 19 2009 - 14:21:52 CST