Getting A Newb Started

From: J (
Date: Mon Jul 07 2008 - 13:00:29 CDT

    Greetings all.

    I apologize if this is the wrong forum for this kind of question.
    I'll try to keep this email as clear and concise as possible.

    I'm trying to write an app that will be both open source (GPL) and
    possibly sold closed source (on Windows), using the same internal engine.
    I want i18n and l10n support, but after about a month of research I'm
    somewhat at a loss.

    Things I've learned:
    The definition of wchar_t is platform dependent (16 bits on Windows,
    32 bits on most Unix-like systems). That makes it useless for portable
    code, and the standard wide-char functions along with it.

    Using wchar_t would double or quadruple the memory my app needs for
    mostly-ASCII text.
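
    To make the storage difference concrete, here is a minimal sketch that
    compares the bytes needed for the same text as a char string and as a
    wchar_t string. The function names narrow_bytes and wide_bytes are made
    up for illustration, and the result depends on the platform's
    sizeof(wchar_t).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Bytes needed to store s as a char string, excluding the terminator. */
size_t narrow_bytes(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Bytes needed to store ws as a wchar_t string, excluding the terminator.
   The result scales with sizeof(wchar_t), which differs between platforms. */
size_t wide_bytes(const wchar_t *ws)
{
    assert(ws != NULL);
    return wcslen(ws) * sizeof(wchar_t);
}
```

    For "hello", narrow_bytes gives 5, while wide_bytes(L"hello") gives
    5 * sizeof(wchar_t): typically 10 on Windows and 20 on Linux.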

    Only Asian countries benefit directly from defaulting to a wide
    character for internal app use, whereas the West would benefit from
    using UTF-8 internally.

    There seem to be *no* standard or de facto standard libraries for
    dealing with Unicode. Support is either added ad hoc to existing
    libraries or written from scratch for each app.

    I've seen ICU from IBM, which seems pretty good, but it was built with
    the "Java mentality", doesn't conform very well to Linux practices, and
    seems to be overkill for most apps (a 20 MB runtime download?!?!). It
    also doesn't have a compiler for its resource files. Also, sticking all
    character strings in a resource file doesn't sit well with me.

    And so, my questions to you all are:
    If ICU is big, bloated and doesn't follow conventions, and GLib doesn't
    handle everything I need (string manipulation), is there a good
    library that handles Unicode well and doesn't come with megabytes of
    unnecessary things? (GLib has tons of stuff I wouldn't be using.)

    I would like to use UTF-8 internally within my app as it seems much less
    memory intensive, but that would also mean I have to rewrite every
    single string and char manipulation function myself to deal with UTF-8
    (wow what a chore!). Is there a better way to deal with that?
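
    As a rough illustration of how small some of these UTF-8 helpers can
    be, here is a sketch that counts code points by skipping continuation
    bytes (those of the form 10xxxxxx). The name utf8_count is made up for
    this example, and the function assumes its input is already valid
    UTF-8; it does no validation.

```c
#include <assert.h>
#include <stddef.h>

/* Count the code points in a NUL-terminated UTF-8 string.
   Every code point starts with exactly one byte that is NOT a
   continuation byte, so counting those bytes counts code points.
   Assumes the input is valid UTF-8. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

    Byte-oriented functions such as strcpy, strcat and strstr operate on
    UTF-8 data unchanged, since they only care about the terminating NUL,
    which never occurs inside a multi-byte sequence. Only code that
    counts, indexes or splits by character needs helpers like this one.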

    Am I barking up the wrong tree here? A lot of people say to use UTF-16
    internally and convert to UTF-8 for output...
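
    For reference, the UTF-16 to UTF-8 conversion that approach implies is
    not much code. The sketch below is only an illustration: the names
    utf8_encode and utf16_to_utf8 are invented here, the input is taken as
    an array of native-endian 16-bit code units, and the caller is assumed
    to supply an output buffer of at least 3 * len + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp is not a scalar value. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not scalar values */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Convert len UTF-16 code units to NUL-terminated UTF-8.
   out must hold at least 3 * len + 1 bytes.
   Returns the bytes written (excluding the NUL), or (size_t)-1 if the
   input contains an unpaired surrogate. */
size_t utf16_to_utf8(const uint16_t *in, size_t len, unsigned char *out)
{
    size_t i = 0, o = 0;
    assert(in != NULL && out != NULL);
    while (i < len) {
        uint32_t cp = in[i++];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i >= len || in[i] < 0xDC00 || in[i] > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i++] - 0xDC00u);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1; /* low surrogate without a high one */
        }
        o += utf8_encode(cp, out + o);
    }
    out[o] = '\0';
    return o;
}
```

    Characters outside the BMP take two code units in UTF-16 and four
    bytes in UTF-8, so neither encoding gives one unit per character;
    whichever is used internally, code that walks strings has to handle
    multi-unit sequences.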

    Please forgive my ignorance; I've simply been unable to find a *good*
    how-to on Unicode programming that doesn't contradict other guides.

    I appreciate any help you can give me.


