Files
XML files
The source form of our translations is XML. We have
designed a simple set of “tags” to capture the
organization of the UDHR, and we describe them with a RELAX NG schema,
available in both the compact syntax (rnc) and the XML syntax (rng).
Our files are encoded in UTF-8, and may or may not use numeric
character references for some characters.
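Because a numeric character reference is resolved by the XML parser, a file that writes a character literally in UTF-8 and one that writes it as a reference carry the same text. The sketch below illustrates this with Python's standard `xml.etree.ElementTree`; the `<p>` element and the helper `text_of` are hypothetical and are not part of our tag set.

```python
import xml.etree.ElementTree as ET

# The same hypothetical fragment, written two ways:
# once with a literal "é" (U+00E9) encoded as UTF-8,
# once with a hexadecimal numeric character reference.
literal = '<p>Déclaration</p>'.encode('utf-8')
with_ncr = b'<p>D&#xE9;claration</p>'


def text_of(xml_bytes: bytes) -> str:
    """Parse a UTF-8 XML document and return the text of its root element."""
    return ET.fromstring(xml_bytes).text


# Both forms parse to identical text.
print(text_of(literal) == text_of(with_ncr))  # True
```

Tools that consume the XML therefore never need to care which form a given file uses.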
Charcount files
The “charcount” files help ensure
that no unexpected character makes it into the data, and make it easy
to spot ambiguous characters. Rather than counting individual
characters, we count clusters of characters, which roughly correspond
to combining character sequences.
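The following is a minimal sketch of counting clusters rather than code points. It is not the procedure used to build our charcount files; it assumes a simplified rule in which any character whose Unicode general category is Mn, Mc or Me (a combining mark) attaches to the preceding character. The function names are hypothetical. The sample shows why clusters help: a decomposed “e + U+0301” and a precomposed “é” look identical but appear as separate entries.

```python
import unicodedata
from collections import Counter


def clusters(text: str) -> list[str]:
    """Split text into base characters, each followed by its combining marks.

    Simplified rule: a character in general category Mn, Mc or Me is
    appended to the previous cluster; anything else starts a new one.
    """
    result: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch) in ('Mn', 'Mc', 'Me')
        if is_mark and result:
            result[-1] += ch
        else:
            result.append(ch)
    return result


def charcount(text: str) -> Counter:
    """Count how often each cluster occurs in text."""
    return Counter(clusters(text))


# Two decomposed "é" (e + COMBINING ACUTE ACCENT), a space, one precomposed "é".
counts = charcount('e\u0301e\u0301 \u00e9')
for cluster, n in sorted(counts.items()):
    print(' '.join(f'U+{ord(c):04X}' for c in cluster), n)
# U+0020 1
# U+0065 U+0301 2
# U+00E9 1
```

Listing each cluster by its code points makes the two visually identical spellings easy to tell apart.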
Plain text files
The plain text files are encoded in UTF-8 and are produced from
the XML files by applying an XSLT stylesheet.
HTML files
The HTML files are encoded in UTF-8 and are produced from
the XML files by applying an XSLT stylesheet.
PDF files
For the PDF files, we selected fonts that we believe are
available free of charge to anyone working on this project. Many
thanks to the individuals and foundries who generously made those
fonts available. If we have misinterpreted a license, please let us know
and accept our apologies. This site does not provide the fonts
themselves, but here is where we found them: