Re: ldml dtd

From: Mark Davis (mark.davis@icu-project.org)
Date: Tue Aug 23 2005 - 12:27:15 CDT

    Some responses below.

    Theo Veenker wrote:

    > Mark Davis wrote:
    >
    >> Let me explain what is going on. Quite a bit of the structure of (and
    >> constraints on) LDML are in the specification, and cannot be
    >> encapsulated in the DTD. For most elements in LDML, we allow for
    >> alternate elements. So you could have the following, for example.
    >>
    >> <week>
    >> <minDays count="1"/>
    >> <firstDay day="sun"/>
    >> <firstDay day="mon" alt="financial" draft="true"/>
    >> <weekendStart day="sat"/>
    >> <weekendEnd day="sun"/>
    >> </week>
    >
    >
    > I see. But how does one know which alternate forms exist (in this
    > particular case, for example)? It isn't a key/type option in a
    > localeID. Of course I see the alternate forms when I parse them, but
    > my application still wouldn't know which one applies.

    Sorry, the above example is only illustrative; 'financial' doesn't
    actually occur. The available alt values are listed in Appendices K
    and L.
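
    To make the default-versus-alternate distinction concrete, here is a
    minimal Python sketch, not code from CLDR or ICU. It parses the
    illustrative <week> example from above and treats the element without
    an alt attribute as the default. The helper names split_alternates and
    pick are hypothetical, and falling back to the default when a requested
    alternate is missing is an assumption of this sketch, not something the
    specification is quoted as requiring here. It also groups elements by
    tag name only, which is a simplification.

```python
import xml.etree.ElementTree as ET

# The illustrative example from the thread ('financial' is not a real alt value).
WEEK_XML = """
<week>
  <minDays count="1"/>
  <firstDay day="sun"/>
  <firstDay day="mon" alt="financial" draft="true"/>
  <weekendStart day="sat"/>
  <weekendEnd day="sun"/>
</week>
"""


def split_alternates(parent):
    """Group child elements by tag into a default and a dict of alternates.

    Hypothetical helper: keys alternates by the value of their alt attribute.
    """
    result = {}
    for child in parent:
        entry = result.setdefault(child.tag, {"default": None, "alt": {}})
        alt = child.get("alt")
        if alt is None:
            entry["default"] = child      # no alt attribute: the default form
        else:
            entry["alt"][alt] = child     # alternate form, keyed by alt value
    return result


def pick(parent, tag, alt=None):
    """Return the requested alternate for tag, or the default if it is absent.

    The fallback-to-default policy is an assumption made for this sketch.
    """
    entry = split_alternates(parent).get(tag)
    if entry is None:
        return None
    if alt:
        return entry["alt"].get(alt, entry["default"])
    return entry["default"]


week = ET.fromstring(WEEK_XML)
print(pick(week, "firstDay").get("day"))               # sun
print(pick(week, "firstDay", "financial").get("day"))  # mon
```

    An application that never asks for an alternate simply gets the
    element without alt; which alternates are meaningful still has to
    come from the specification, not from the DTD.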

    The metadata we currently have is in
    <http://unicode.org/cldr/data/common/supplemental/supplementalData.xml>:
    search for "<metadata>"

    We have an RFE to add more metadata
    <http://dev.icu-project.org/cgi-bin/locale-bugs?findid=641>; we had
    originally intended to add it in 1.3, but delayed it to 1.4. (If you
    have any comments on 641 you can add a reply.)

    >
    >>
    >> You may ask: how about XML Schema? While this would be better than a DTD
    >> in describing more of the structure, it would still be far from
    >> complete. So it hasn't been a high priority because it wouldn't buy
    >> us that much.
    >
    >
    > Looks like it's best to just ignore the DTD. <mumble>I hope to wake up
    > one morning to find out that XML and associated crap had never been
    > invented. Do we really have to XML-ize everything? Apparently yes,
    > because the format is there and everybody else does.</mumble>

    The design of XML could have been significantly simplified, e.g. by not
    having CDATA or entities (except NCRs), using only UTF-8, etc. That
    being said, it is a huge improvement in terms of having a standardized
    format that all tools can use and interchange. So despite the few warts
    (many for historical reasons), it's definitely the way forward. Reminds
    me of a few other technologies...

    >
    >> What we have been doing is adding metadata to the supplemental data
    >> file so that particular areas can be mechanically checked. There are
    >> undoubtedly still areas where the description of the structure can be
    >> improved in the spec (see the working draft for the next release at
    >> http://unicode.org/cldr/data/docs/web/tr35.html) or where metadata
    >> can be added; if you have suggestions for improvements, you can file
    >> them at http://unicode.org/cldr/filing_bug_reports.html.
    >>
    >> (BTW, we are planning to move this particular element into the
    >> supplemental data in the next release; the goal is to only have
    >> language-based data in the locale files such as
    >> http://unicode.org/cldr/data/common/main/fr.xml, and all other data
    >> in the supplemental data file
    >> (http://unicode.org/cldr/data/diff/supplemental/supplemental.html:
    >> information about territories, currencies, scripts, timezones, etc.).)
    >
    >
    > Sounds good. Will this move have taken place before the 1.4 Phase 2 Beta
    > Release, or will the restructuring continue until the final 1.4 release?

    Our goal is to get all the structural changes in early, and have the
    final phase be only gathering data.

    >
    > Theo


