Re: ldml dtd

From: Theo Veenker (Theo.Veenker@let.uu.nl)
Date: Wed Aug 24 2005 - 04:09:52 CDT


    Mark Davis wrote:
    > Some responses below.
    >
    > Theo Veenker wrote:
    >
    >> Mark Davis wrote:
    >>
    >>> Let me explain what is going on. Quite a bit of the structure of (and
    >>> constraints on) LDML are in the specification, and cannot be
    >>> encapsulated in the DTD. For most elements in LDML, we allow for
    >>> alternate elements. So you could have the following, for example.
    >>>
    >>> <week>
    >>> <minDays count="1"/>
    >>> <firstDay day="sun"/>
    >>> <firstDay day="mon" alt="financial" draft="true"/>
    >>> <weekendStart day="sat"/>
    >>> <weekendEnd day="sun"/>
    >>> </week>
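Here is a minimal Python sketch of how an application might choose among such alternates. It assumes that an element without an `alt` attribute is the default form, and that a requested alt form falls back to the default when it is absent. The helper `pick` is hypothetical and is not part of any CLDR tool. It uses only the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The <week> fragment from the example; alt="financial" is illustrative only.
WEEK_XML = """
<week>
  <minDays count="1"/>
  <firstDay day="sun"/>
  <firstDay day="mon" alt="financial" draft="true"/>
  <weekendStart day="sat"/>
  <weekendEnd day="sun"/>
</week>
"""

def pick(parent: ET.Element, tag: str, alt: Optional[str] = None) -> Optional[ET.Element]:
    """Return the direct child <tag> whose alt attribute equals `alt`.

    Assumption of this sketch: a child with no alt attribute is the
    default form, used as a fallback when the requested alt is absent.
    """
    default = None
    for child in parent.findall(tag):      # direct children named `tag`
        child_alt = child.get("alt")
        if child_alt == alt:
            return child                   # exact match (alt=None means default)
        if child_alt is None:
            default = child                # remember the default form
    return default                         # fallback, or None if tag is missing

week = ET.fromstring(WEEK_XML)
print(pick(week, "firstDay").get("day"))               # sun
print(pick(week, "firstDay", "financial").get("day"))  # mon
print(pick(week, "firstDay", "variant").get("day"))    # sun (falls back to default)
```

The fallback rule is a design choice made for the sketch. An application could just as well treat a missing alt form as an error.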
    >>
    >>
    >>
    >> I see. But how does one know which alternate forms exist (in this
    >> particular case, for example)? It isn't a key/type option in a localeID.
    >> Of course I see the alternate forms when I parse them, but my application
    >> still wouldn't know which one applies.
    >
    >
    > Sorry, the above example is illustrative; 'financial' doesn't actually
    > occur. The available alt values are in Appendices K and L.
    >
    > The metadata we currently have is in
    > <http://unicode.org/cldr/data/common/supplemental/supplementalData.xml>:
    > search for "<metadata>"
    >
    > We have an RFE to add more metadata
    > <http://dev.icu-project.org/cgi-bin/locale-bugs?findid=641>; we had
    > originally intended to add it in 1.3, but delayed it to 1.4. (If you
    > have any comments on 641 you can add a reply.)
    >
    >>
    >>>
    >>> You may ask: how about XML Schema? While this would be better than a DTD
    >>> in describing more of the structure, it would still be far from
    >>> complete. So it hasn't been a high priority because it wouldn't buy
    >>> us that much.
    >>

    What sort of things couldn't be described in an XML Schema approach?

    >>
    >> Looks like it's best to just ignore the DTD. <mumble>I hope to wake up one
    >> morning to find out that XML and associated crap had never been invented.
    >> Do we really have to XML-ize everything? Apparently yes, because the
    >> format is there and everybody else does.</mumble>
    >
    >
    > The design of XML could have been significantly simplified, e.g. by not
    > having CDATA or entities (except NCRs), only using UTF-8, etc. That
    > being said, it is a huge improvement in terms of having a standardized
    > format that all tools can use and interchange. So despite the few warts
    > (many for historical reasons), it's definitely the way forward. Reminds
    > me of a few other technologies...

    Well, if the DTD isn't sufficient to describe LDML as accurately as you
    want, and you (the CLDR team) feel the need to add information in a
    supplemental XML file just to be able to check the main XML files, then
    I seriously doubt whether XML is the way forward. I would say it is holding
    us back. But that may be just a matter of opinion. The CLDR data isn't to
    be modified by users or interchanged between people. It's just a set
    of system files, which basically can have any desired format as long as
    it is formally defined and machine-readable. Never mind; choices had to
    be made, fine.

    >
    >>
    >>> What we have been doing is adding metadata to the supplemental data
    >>> file so that particular areas can be mechanically checked. There are
    >>> undoubtedly still areas where the description of the structure can be
    >>> improved in the spec (see the working draft for the next release at
    >>> http://unicode.org/cldr/data/docs/web/tr35.html) or where metadata
    >>> can be added; if you have suggestions for improvements, you can file
    >>> them at http://unicode.org/cldr/filing_bug_reports.html.
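The kind of mechanical check described here can be sketched in Python as follows. The two rules and the `ALLOWED_ALTS` set are hypothetical illustrations. They are not taken from the spec or from supplementalData.xml. Treating `type` and `alt` as the attributes that distinguish siblings is also an assumption. The point is only that such constraints are easy to state in code, and the first rule in particular is beyond what a DTD can express.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist standing in for the alt values listed in the spec.
ALLOWED_ALTS = {"variant", "proposed"}

def check_alternates(root: ET.Element, allowed_alts: set) -> list:
    """Collect violations of two hypothetical rules:

    1. Among siblings, (tag, type, alt) must be unique, so there is at most
       one default (alt-less) form of each element.
    2. Any alt value must come from `allowed_alts`.
    """
    problems = []
    for parent in root.iter():            # every element, including the root
        seen = set()
        for child in parent:              # direct children only
            alt = child.get("alt")
            key = (child.tag, child.get("type"), alt)
            if key in seen:
                problems.append(
                    f"duplicate <{child.tag}> type={key[1]!r} alt={alt!r} in <{parent.tag}>")
            seen.add(key)
            if alt is not None and alt not in allowed_alts:
                problems.append(f"unknown alt={alt!r} on <{child.tag}>")
    return problems

doc = ET.fromstring("""
<week>
  <firstDay day="sun"/>
  <firstDay day="mon"/>
  <firstDay day="mon" alt="financial"/>
</week>
""")
for p in check_alternates(doc, ALLOWED_ALTS):
    print(p)
# duplicate <firstDay> type=None alt=None in <week>
# unknown alt='financial' on <firstDay>
```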
    >>>
    >>> BTW, we are planning to move this particular element into the
    >>> supplemental data in the next release. The goal is to have only
    >>> language-based data in the locale files, such as
    >>> http://unicode.org/cldr/data/common/main/fr.xml, and all other data
    >>> (information about territories, currencies, scripts, timezones, etc.)
    >>> in the supplemental data file
    >>> (http://unicode.org/cldr/data/diff/supplemental/supplemental.html).
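Once such data lives in the supplemental file, a lookup is keyed by territory rather than by language. The following is a minimal Python sketch. It assumes a hypothetical layout in which each `firstDay` element lists the territories it applies to, with `001` (World) as the default. The element names, layout, and sample values are illustrative only and are not the contents of the file linked above.

```python
import xml.etree.ElementTree as ET

# Hypothetical supplemental layout with illustrative sample values.
SUPPLEMENTAL_XML = """
<supplementalData>
  <weekData>
    <firstDay day="mon" territories="001"/>
    <firstDay day="sun" territories="US CA"/>
  </weekData>
</supplementalData>
"""

def first_day(supplemental: ET.Element, territory: str) -> str:
    """Return the first day of the week for `territory`, falling back to 001."""
    by_territory = {}
    for el in supplemental.iterfind("weekData/firstDay"):
        # Each element applies to a space-separated list of territory codes.
        for code in el.get("territories", "").split():
            by_territory[code] = el.get("day")
    return by_territory.get(territory, by_territory["001"])

supplemental = ET.fromstring(SUPPLEMENTAL_XML)
print(first_day(supplemental, "US"))  # sun
print(first_day(supplemental, "FR"))  # mon (no entry of its own, so 001 applies)
```

With this layout, the language part of a locale ID (the "fr" in fr_FR) plays no role in the lookup. Only the territory selects the value.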
    >>
    >>
    >>
    >> Sounds good. Will this move have taken place before the 1.4 Phase 2 Beta
    >> Release, or will the restructuring continue until the final 1.4 release?
    >
    >
    > Our goal is to get all the structural changes in early, and have the
    > final phase be only gathering data.
    >
    >>
    >> Theo
    >>
    >>



    This archive was generated by hypermail 2.1.5 : Wed Aug 24 2005 - 04:13:37 CDT