Re: Getting A Newb Started

From: J (j@intuitivecreations.com)
Date: Mon Jul 07 2008 - 15:28:09 CDT


    Thank you Kenneth, William and Noah for your help and insight!

    It sounds like ICU is the best approach as it covers all the bases. I'll
    just have to get used to the Java-ey feel of it and of course the 20 MB
    run-time download ;)

    It uses UTF-16 internally, which they claim is "enough for everyone" (and
    who am I to question the minds at IBM??), so I guess 2 bytes per char
    (4 for the rarer ones that need a surrogate pair; it's UTF-32 that always
    uses 4) instead of 1 byte for each ASCII char really isn't that bad
    considering the ease of programming it allows.
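
To make the byte counts above concrete, here is a minimal sketch in C that
reports how many bytes a single Unicode code point needs in each encoding
form. The function names (utf8_bytes, utf16_bytes, utf32_bytes) are made up
for illustration and are not part of ICU or any other library; each returns 0
for values that are not valid Unicode scalar values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: surrogates and values above U+10FFFF cannot be encoded. */
static int is_scalar_value(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Bytes needed for one code point in UTF-8 (1 to 4), or 0 if invalid. */
size_t utf8_bytes(uint32_t cp)
{
    if (!is_scalar_value(cp)) return 0;
    if (cp <= 0x7F)   return 1;   /* ASCII */
    if (cp <= 0x7FF)  return 2;   /* Latin supplements, Greek, Cyrillic, ... */
    if (cp <= 0xFFFF) return 3;   /* rest of the BMP, including most CJK */
    return 4;                     /* supplementary planes */
}

/* Bytes needed for one code point in UTF-16 (2 or 4), or 0 if invalid. */
size_t utf16_bytes(uint32_t cp)
{
    if (!is_scalar_value(cp)) return 0;
    return cp <= 0xFFFF ? 2 : 4;  /* 4 means a surrogate pair */
}

/* Bytes needed for one code point in UTF-32 (always 4), or 0 if invalid. */
size_t utf32_bytes(uint32_t cp)
{
    return is_scalar_value(cp) ? 4 : 0;
}
```

For example, utf8_bytes(0x20AC) (the euro sign) gives 3 while
utf16_bytes(0x20AC) gives 2, and plain ASCII such as 0x41 costs 1 byte in
UTF-8 against 2 in UTF-16.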

    Heh, I just gotta say one thing:
    "Of course some might argue you shouldn't be writing programs in C
     anymore unless you have a really good reason."

    Who the heck would argue that?!?!
    GIVE ME NAMES!!
    hehe

    I write system stuff in C (daemons, command line apps) and GUI programs
    in C++.

    I never drank the Java Kool-Aid, it just wasn't my path...

    Again, thanks for all your help!

    j

    On Mon, 2008-07-07 at 12:31 -0700, Noah Levitt wrote:
    > Take a look at glib -- "GLib is the low-level core library that forms
    > the basis for projects such as GTK+ and GNOME". All strings are
    > regular char* in UTF-8. There's this stuff:
    > http://library.gnome.org/devel/glib/unstable/glib-Unicode-Manipulation.html
    > but the whole library is built on UTF-8, all the I/O abstractions and
    > everything.
    >
    > Of course some might argue you shouldn't be writing programs in C
    > anymore unless you have a really good reason.
    >
    > Noah
    >
    > On Mon, Jul 7, 2008 at 11:00, J <j@intuitivecreations.com> wrote:
    > > Greetings all.
    > >
    > > I apologize if this is the incorrect forum for this kind of question.
    > > I'll try to keep this email as clear and concise as possible.
    > >
    > > I'm trying to write an app that will be both open-source (GPL) and
    > > possibly sold closed-source (on Windows) using the same internal engine.
    > > I wish to have i18n and l10n but after about a month of research I'm
    > > somewhat at a loss.
    > >
    > > Things I've learned:
    > > The wchar_t definition is platform dependent (commonly 2 bytes on
    > > Windows and 4 bytes on Linux). That makes it useless for portable
    > > code, which also rules out the standard wide-char functions.
    > >
    > > Using wchar_t would double or quadruple my app's string memory
    > > compared with one byte per character.
    > >
    > > Only text in Asian scripts benefits directly from defaulting to a wide
    > > character for internal app use (CJK takes 3 bytes per character in
    > > UTF-8 versus 2 in UTF-16), whereas Western text is more compact in
    > > UTF-8.
    > >
    > > There seem to be *no* standard or de facto standard libraries to deal
    > > with Unicode. Support is either added ad hoc to existing libraries or
    > > written from scratch for each app.
    > >
    > > I've seen ICU from IBM, which seems pretty good, but it was built with
    > > the "Java mentality", doesn't conform very well to Linux practices and
    > > seems to be overkill for most apps (20 MB runtime download?!?!). It also
    > > doesn't seem to have a compiler for its resource files. Also, sticking
    > > all character strings in a resource file doesn't sit well with me.
    > >
    > > And So,
    > > My question(s) to you all are:
    > > If ICU is big, bloated and doesn't follow conventions, and glib doesn't
    > > handle all the things necessary (string manipulation), is there a good
    > > library that handles Unicode well and doesn't come along with megs of
    > > unnecessary things? (GLib has tons of stuff I wouldn't be using.)
    > >
    > > I would like to use UTF-8 internally within my app as it seems much less
    > > memory intensive, but that would also mean I have to rewrite every
    > > single string and char manipulation function myself to deal with UTF-8
    > > (wow what a chore!). Is there a better way to deal with that?
    > >
    > > Am I barking up the wrong tree here? A lot of people say to use UTF-16
    > > internally and convert to UTF-8 for output...
    > >
    > > Please forgive my ignorance, I've simply been unable to find a *good*
    > > howto on Unicode programming that doesn't contradict other guides.
    > >
    > > I appreciate any help you can give me.
    > >
    > > Cheers!
    > >
    > > j
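
On the worry above about rewriting every string function for UTF-8: a rough
sketch, assuming the input is already valid, NUL-terminated UTF-8, suggests
the amount of new code can be small. Byte-oriented functions such as strlen,
strcpy and strstr keep working on UTF-8 data unchanged; only operations that
think in "characters" need a helper. The names utf8_count and utf8_next are
hypothetical and do not come from glib, ICU or any other library, and neither
function validates its input.

```c
#include <stddef.h>
#include <string.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int is_continuation(char c)
{
    return ((unsigned char)c & 0xC0) == 0x80;
}

/* Count code points in a string assumed to be valid UTF-8.
 * Every byte that is not a continuation byte starts a new code point. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (!is_continuation(*s))
            n++;
    }
    return n;
}

/* Return a pointer to the start of the next code point, or to the
 * terminating NUL if s already points at the end of the string. */
const char *utf8_next(const char *s)
{
    if (*s == '\0')
        return s;
    s++;                              /* step over the lead byte */
    while (is_continuation(*s))       /* then over any continuation bytes */
        s++;
    return s;
}
```

With the string "h\xC3\xA9llo" (the word "héllo" encoded as UTF-8), strlen
reports 6 bytes while utf8_count reports 5 code points, and strstr still
finds "llo" because a valid UTF-8 substring can only match on code point
boundaries.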



    This archive was generated by hypermail 2.1.5 : Mon Jul 07 2008 - 15:29:20 CDT