Re: "minus" usage

From: Philippe Verdy
Date: Wed Aug 10 2005 - 08:05:24 CDT

    From: "JFC (Jefsey) Morfin"
    >I need to create a key for a multilingual database (terms entered in
    >various scripts). The key will therefore to be numeric (0-9).However I need
    >a separator "999-778". I suppose there may be dash homographs in some
    >scripts. May I trust there is a single "minus" being used as part of the
    >numbering sequence in every script and specify the format as "4 numerics,
    >minus, 5 numerics" ?
    > Thank you for the help.

    Do you mean that your translation resources may use distinct keys according
    to the preferred script used to write a language?
    This is rather unusual. All modern languages recognize the Arabo-European
    digits, and writers can mix them easily and unambiguously when entering text.

    So you'd use a format like:
    0000=Some text
    for each resource property, where only "Some text" is to be translated, and
    the key is immutable.

    Here I use the "=" separator between the key and value. You may
    alternatively use a simple space, which is present in all scripts (even where
    spaces are not typically used within localized phrases to separate words, they
    are typically present to separate human language from numbers or subphrases
    in other scripts).

    I see little reason why you would want to accept localized resource keys. It
    is possible, however. But then, if you want an additional separator within the
    key, it cannot be a digit, and it cannot be a letter either (letters of one
    script are absent from the others), so there will be few choices for the
    punctuation or symbol to use. The minus is good, but in effect you may accept
    any "dash" character (including the underscore).

    One solution is then to consider any character which is not a decimal digit
    (possibly localized) as a separator. So when parsing the resource file, you
    first split each line on the first space to get the localized key, then you
    transcode the localized digits into default digits, and all other characters
    which are not digits into a standard dash. You then have a common key under
    which to store the resource value found after the first space.
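
    The parsing steps just described could be sketched as follows in Python.
    This is only an illustration under assumptions: the function names are
    hypothetical, "default digits" is taken to mean ASCII 0-9, and "decimal
    digit" is taken to mean any character for which the Unicode database
    gives a decimal value. Each non-digit character becomes one dash, so a
    run of several separators yields several dashes.

```python
import unicodedata


def normalize_key(localized_key):
    """Map every decimal digit to ASCII 0-9 and every other character to '-'."""
    out = []
    for ch in localized_key:
        # unicodedata.decimal returns the digit value, or the default (None)
        # when the character is not a decimal digit in any script.
        value = unicodedata.decimal(ch, None)
        if value is not None:
            out.append(str(value))
        else:
            out.append("-")  # any non-digit is treated as a separator
    return "".join(out)


def parse_resource_line(line):
    """Split on the first space; return (common key, resource value)."""
    key, _, value = line.rstrip("\n").partition(" ")
    return normalize_key(key), value
```

    With this sketch, a key typed with Arabic-Indic digits and an en dash
    maps to the same common key as one typed with ASCII digits and a
    hyphen-minus.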

    Optionally, you may be stricter about separator characters and only accept
    those that have a dash property, so that you can use other punctuation as
    additional semantically distinct separators for your resource keys.
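
    A stricter variant might look like the sketch below. It rests on an
    assumption: Python's unicodedata module does not expose the Unicode
    "Dash" property directly, so the general category Pd (dash punctuation)
    is used as an approximation, with U+2212 MINUS SIGN added by hand since
    its category is Sm. Under this approximation the underscore (category
    Pc) is not a dash and is kept as a distinct separator. The function
    names are hypothetical.

```python
import unicodedata

# Characters treated as dashes although their general category is not Pd.
# U+2212 MINUS SIGN is a math symbol (Sm) but carries the Dash property.
EXTRA_DASHES = {"\u2212"}


def is_dash(ch):
    """Approximate the Unicode Dash property with category Pd plus extras."""
    return unicodedata.category(ch) == "Pd" or ch in EXTRA_DASHES


def normalize_key_strict(localized_key):
    """Normalize digits and dashes; leave other characters untouched."""
    out = []
    for ch in localized_key:
        value = unicodedata.decimal(ch, None)
        if value is not None:
            out.append(str(value))   # localized digit -> ASCII digit
        elif is_dash(ch):
            out.append("-")          # any dash -> standard hyphen-minus
        else:
            out.append(ch)           # other punctuation stays distinct
    return "".join(out)
```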
