Configuration File Design

Ram Viswanadha
2005-09-26
Draft

The tools that convert LDML data to other formats are currently laden with special cases needed to generate correct data. These special cases should move out of the code and into a configuration file, which the tools would read before generating the data. The configuration file can be included in build.xml. An Ant plug-in would read the appropriate tags and generate the data according to the rules they spell out. The syntax for the plug-in in build.xml is given below:

<target name="icu-locales" description="builds locale files in ICU text format">
    <cldr-build toolName="LDML2ICUConverter">
        <!-- launch the tool and generate the data after reading the config file -->
        <run type="locales">
            <args>
                <arg name="sourcedir" value="../../common/main" />
                <arg name="destdir" value="${env.ICU4C_LOCALES_DIR}"/>
                <arg name="extras-dir" value="${env.ICU4C_LOCALES_DIR}../xml"/>
                ....
            </args>
            <!-- http://ant.apache.org/faq.html#xml-entity-include -->
            <import file="./icu-config.xml"/>
        </run>
        <run type="collation">
            <args>
                <arg name="sourcedir" value="../../common/collation" />
                <arg name="destdir" value="${env.ICU4C_LOCALES_DIR}"/>
                <arg name="extras-dir" value="${env.ICU4C_LOCALES_DIR}../xml"/>
                ....
            </args>
            <!-- http://ant.apache.org/faq.html#xml-entity-include -->
            <import file="./icu-config.xml"/>
        </run>
        <!-- launch the tool and create colfiles.mk, refiles.mk and others -->   
        <run type="makefiles">
            ....
        </run>
    </cldr-build>
</target>
<target name="posix-locales">
    <cldr-build toolName="GeneratePOSIX">
        <run type="locales">
            <args>
               ...
            </args>
            <import file="./posix-config.xml"/>
        </run>
    </cldr-build>
</target>
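
As a rough illustration of what the plug-in would have to extract from such a target, the sketch below reads a cut-down copy of the cldr-build element and collects the arguments of each run. It is written in Python purely for illustration and is not the proposed Ant plug-in; the function name collect_runs is hypothetical, and Ant property expansion (the ${...} references) and the imported config files are deliberately left out.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the build.xml fragment above (elided parts omitted).
BUILD_FRAGMENT = """
<cldr-build toolName="LDML2ICUConverter">
    <run type="locales">
        <args>
            <arg name="sourcedir" value="../../common/main"/>
            <arg name="destdir" value="${env.ICU4C_LOCALES_DIR}"/>
        </args>
    </run>
    <run type="collation">
        <args>
            <arg name="sourcedir" value="../../common/collation"/>
            <arg name="destdir" value="${env.ICU4C_LOCALES_DIR}"/>
        </args>
    </run>
</cldr-build>
"""


def collect_runs(xml_text):
    """Return (tool name, [(run type, {arg name: arg value}), ...]).

    Hypothetical helper: property references such as ${env.X} are
    returned verbatim, since expanding them is Ant's job.
    """
    root = ET.fromstring(xml_text)
    runs = []
    for run in root.findall("run"):
        args = {arg.get("name"): arg.get("value")
                for arg in run.findall("./args/arg")}
        runs.append((run.get("type"), args))
    return root.get("toolName"), runs


tool, runs = collect_runs(BUILD_FRAGMENT)
for run_type, args in runs:
    print(tool, run_type, args["sourcedir"])
```

Each (type, args) pair corresponds to one launch of the named tool; the config file imported inside the run would then be handed to that launch.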

The configuration file can be organized in the following ways:
  1. Configuration for generating a list of locales or all non-draft locales.
    e.g.:
    1. Generate all non-draft locales
      <config>
      <locales>
      <include locale=".*" allDraft="false"/>
      <!--
      allDraft=true|false|.* - the rule matches if the locale regex matches and ALL items have draft=true/false.
      '.*' is the default and means the draft status is ignored. allDraft is a regex.
      locale=.* is a Perl-compatible regular expression.
      -->
      </locales>
      </config>
    2. Generate the non-draft locales for the given list
      <config>
      <locales>
      <include locale="de.*" allDraft="false"/>
      <exclude locale="fi"/>
      <include locale="f.*"/>
      <exclude locale=".*"/>
      <!--
      The rules are additive, so the rules above mean: include all 'de' locales,
      exclude the 'fi' locale but include the other locales starting with 'f', and
      exclude all other locales.
      -->
      </locales>
      </config>
    3. Generate the locales for the given list irrespective of the draft status
      <config>
      <locales>
      <include locale="de.*" allDraft=".*"/> <!-- ignore the draft status -->
      <exclude locale="fi"/>
      <include locale="f.*"/>
      <exclude locale=".*"/>
      </locales>
      </config>
  2. Ignore the draft status of certain nodes in the LDML document. In LDML2ICUConverter we currently ignore the draft status of the following nodes:
      1. exemplarCharacters
      2. Min days, first day
      3. Weekend data
      4. Symbols data
      5. Currencies

      Examples:
      1. Ignore the draft status on specified nodes for all locales
        <config>
        <locales>
        <include locale=".*"/>
        </locales>
        <paths>
        <include xpath="//ldml/.*" draft="false"/>
        <!--
        draft=true|false|.*
        Only include data from nodes that are not marked draft.
        -->
        <exclude xpath="//ldml/.*/weekendData/.*" draft=".*"/>
        </paths>
        </config>
      2. Ignore the draft status on specified nodes for specified locales
        <config>
        <locales>
        <include locale=".*"/>
        </locales>
        <paths>
        <include xpath="//ldml/.*" draft="false"/>
        <exclude xpath="//ldml/.*/weekendData/.*" draft=".*" locale="de be_IN bla" />
        <!--
        Ignore draft status on node weekendData but only for the locales specified
        -->
        </paths>
        </config>
  3. Filter certain nodes
    1. Ignore the specified node for all locales irrespective of the draft status
      <config>
      <locales>
      <include locale=".*" allDraft="false"/>
      </locales>
      <paths>
      <exclude xpath="//ldml/.*/fields/.*"/>
      <include xpath="//ldml/.*"/>
      </paths>
      </config>
    2. Ignore the specified node for specified locales irrespective of the draft status
      <config>
      <locales>
      <include locale=".*"/>
      </locales>
      <paths>
      <exclude xpath="//ldml/.*/fields/.*" draft=".*" locale="en fi"/>
      <include xpath="//ldml/.*" draft="false"/>
      </paths>
      </config>
  4. Prefer the nodes that have the alternate attribute set.
    1. Prefer the nodes that are marked alt irrespective of the draft status
      <config>
      <locales>
      <include locale=".*" allDraft="false"/>
      </locales>
      <paths>
      <exclude xpath="//ldml/.*/fields/.*"/>
      <include xpath="//ldml/.*" preferAlt="variant proposed default" draft=".*"/>
      <!-- preferAlt accepts variant, proposed, default, or .*
      The value of this attribute is a space-separated preference list.
      The order of the values determines which node is picked.
      -->
      </paths>
      </config>
    2. Prefer only the nodes that are marked alt.
      <config>
      <locales>
      <include locale=".*"/>
      </locales>
      <paths>
      <exclude xpath="//ldml/.*/languages/.*" preferAlt="variant" draft=".*"/>
      <exclude xpath="//ldml/.*/countries/.*" preferAlt="proposed" draft=".*"/>
      <include xpath="//ldml/.*" draft="false"/> 
      </paths>
      </config>
  5. Override mechanism for the default fallback or explicit fallback in the data at build time.
    <config>
    <locales>
    <include locale=".*"/>
    </locales>
    <paths>
    <exclude xpath="//ldml/.*/delimiters/.*" preferAlt="variant" draft=".*"/>
    <exclude xpath="//ldml/.*/measurements/.*" preferAlt="proposed" draft=".*"/>
    <include xpath="//ldml/.*" draft="false"/>
    </paths>
    <overrideFallback fallback="se_NO no_NO">
    <locales>
    <include locale="sms_FI" allDraft="false"/>
    </locales>
    <paths>
    <include xpath="//ldml/displayNames"/>
    <include xpath="//ldml/dates/timeZoneNames"/>
    </paths>
    </overrideFallback>
    </config>
  6. For generating stub files for deprecated locales, we hard-code the alias data in the configuration file. A test will be written to verify that the list of aliases is in sync with the list in supplementalData.xml.
    <!-- collation aliases -->
    <deprecates type="collation">
    <alias from="de__PHONEBOOK" to="de@collation=phonebook" xpath="//ldml/collations/default[@type='phonebook']"/>
    <alias from="es__TRADITIONAL" to="es@collation=traditional" xpath="//ldml/collations/default[@type='traditional']"/>
    <alias from="in" to="id" />
    ...
    </deprecates>
    <!-- locale aliases (main) -->
    <deprecates type="main" >
    <alias from="in" to="id" />
    <alias from="in_ID" to="id_ID" />
    <alias from="iw" to="he" />
    ...
    </deprecates>
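
The include/exclude rules in the locales examples of item 1 can be read as an ordered rule list. The Python sketch below shows one possible reading and is not the tools' actual behavior. It assumes that the first rule whose patterns match decides the outcome (which agrees with the comment on the 'de'/'fi'/'f.*' example), that a pattern must match the whole locale ID, and that a locale matched by no rule is not generated. The names Rule and is_selected, and the representation of a locale's draft state as the string "true" or "false", are hypothetical. Python's re module stands in for the Perl-compatible regular expressions mentioned above.

```python
import re
from typing import List, NamedTuple


class Rule(NamedTuple):
    action: str            # "include" or "exclude"
    locale: str            # regular expression over locale IDs
    all_draft: str = ".*"  # regex over "true"/"false"; ".*" ignores draft status


def is_selected(locale_id: str, draft_state: str, rules: List[Rule]) -> bool:
    """Return True if the locale should be generated.

    draft_state is "true" if all items in the locale are draft and
    "false" if none are (hypothetical encoding of allDraft).
    Assumption: the first matching rule wins.
    """
    for rule in rules:
        if (re.fullmatch(rule.locale, locale_id)
                and re.fullmatch(rule.all_draft, draft_state)):
            return rule.action == "include"
    return False  # assumption: unmatched locales are not generated


# Rules from example 2 of item 1: non-draft 'de' locales, no 'fi',
# other locales starting with 'f', and nothing else.
RULES = [
    Rule("include", "de.*", "false"),
    Rule("exclude", "fi"),
    Rule("include", "f.*"),
    Rule("exclude", ".*"),
]

for locale_id in ["de_DE", "fi", "fr_FR", "en"]:
    print(locale_id, is_selected(locale_id, "false", RULES))
```

Under these assumptions a 'de' locale whose items are all draft falls through the first rule and is caught by the final exclude, so reordering the rules changes the result.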

PROS:
CONS: