DTD downloading from Unicode.org

From: Rick McGowan (rick@unicode.org)
Date: Mon Jan 19 2009 - 14:19:19 CST

    This message is relevant to anyone who uses tools that download any of
    the DTDs from the unicode.org site, which includes browsers that read
    any of the XML files on the unicode.org site.

    If you don't know what I'm talking about, you can safely skip this message.

    DTDs are used in XML parsing. DTDs relevant to CLDR are found on the
    Unicode web site, and appear in declarations at the tops of CLDR files,
    for example:

      <!DOCTYPE ldml SYSTEM "http://www.unicode.org/cldr/dtd/1.6/ldml.dtd">

    The DTD for a specific version of LDML never changes. Thus, if you have
    a specific released version of LDML (currently 1.6 and before) and have
    downloaded its DTD once, you can cache it and reuse it whenever you
    parse CLDR data from the same version.

    However, some processes apparently download DTD files repeatedly during
    parsing. Repeated and unnecessary DTD downloading currently accounts for
    a significant percentage of the outbound network bandwidth on the
    unicode.org server. We sometimes see in the log files a single machine
    downloading one of the DTD files hundreds of times in quick succession.
    This places an undue burden on the server and slows its responses
    significantly, making our website slower than necessary.

    We strongly encourage people to use a caching implementation, such as
    the one used by CLDR itself (see
    http://unicode.org/cldr/data/tools/java/org/unicode/cldr/util/CachingEntityResolver.java).
    If this continues to be a problem, we may have to resort to some kind of
    throttling operation to limit or block downloading from specific IP
    addresses.
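
    As a rough illustration of the caching idea, here is a minimal Python
    sketch. It is not the CLDR CachingEntityResolver linked above; the
    class name CachingDtdResolver, the cache directory, and the download
    helper are all hypothetical. It assumes the DTD is referenced by an
    http(s) system identifier ending in ".dtd" and relies on the fact that
    a versioned DTD never changes, so a local copy can be reused forever.

```python
import hashlib
import tempfile
import urllib.request
from pathlib import Path
from xml.sax.handler import EntityResolver


def download(url):
    """Fetch the raw bytes at `url` over the network."""
    with urllib.request.urlopen(url) as response:
        return response.read()


class CachingDtdResolver(EntityResolver):
    """Hypothetical resolver: fetch each remote DTD once, then reuse the local copy."""

    def __init__(self, cache_dir=None, fetch=download):
        # Assumption: the default cache lives in the system temp directory;
        # any persistent directory would work just as well.
        self.cache_dir = Path(cache_dir or Path(tempfile.gettempdir()) / "dtd-cache")
        self.fetch = fetch

    def local_copy(self, url):
        """Return the path of the cached file for `url`, downloading it if absent."""
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # A versioned DTD URL always names the same content, so a hash of
        # the URL is a sufficient cache key.
        name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".dtd"
        path = self.cache_dir / name
        if not path.exists():
            path.write_bytes(self.fetch(url))
        return path

    def resolveEntity(self, publicId, systemId):
        # Redirect remote DTD references to the local cache; leave every
        # other system identifier untouched.
        if systemId.startswith(("http://", "https://")) and systemId.endswith(".dtd"):
            return str(self.local_copy(systemId))
        return systemId
```

    With Python's xml.sax, such a resolver would be installed via
    parser.setEntityResolver(CachingDtdResolver()). Note that recent Python
    versions do not load external entities unless
    xml.sax.handler.feature_external_ges is enabled on the parser.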

    Regards,
       Rick McGowan
       Unicode, Inc.
