Re: Unix Codes for Diacritics

From: Krishna Birth (krishnabirth@gmail.com)
Date: Mon Sep 20 2010 - 17:42:24 CDT

    Hi

    Would you be able to help with the Xmodmap project?
    http://www.unicode.org/mail-arch/unicode-ml/y2010-m09/0042.html

    Best,

    Meeकu

    On Sat, Sep 18, 2010 at 10:42 AM, Richard Wordingham <
    richard.wordingham@ntlworld.com> wrote:

    > On Sat, 18 Sep 2010 00:06:07 +0100
    > Krishna Birth <krishnabirth@gmail.com> wrote:
    >
    > > Could someone please tell me the correct codes to use on Unix operating
    > > systems to produce the diacritics below:
    > >
    > > A
    > > Ā = http://www.fileformat.info/info/unicode/char/0100/index.htm
    > ...
    >
    > > I need to find this out for a project/coder's question.
    >
    > If you are asking how to type these precomposed letters at a keyboard,
    > we need to know which Unix operating system you have in mind, and the
    > X-terminal model may be relevant. For example, if the X-terminal is
    > a Windows PC running Exceed, this may reduce to a Windows question.
    >
    > My answer is directed to what one would write in a program. It is
    > possible that more detail is required as to the coder's problem.
    >
    > The codepoint (i.e. number encoding the character) for these letters
    > is part of the name of the links you gave, e.g. the code for Ā is 0100
    > in hex.
    >
    > If you are simply trying to produce the single, precomposed character
    > in a program, the information is given in the table headed 'Encodings'
    > in the pages you referenced. It may be worth also giving the
    > information for the plain letter 'A' at
    > http://www.fileformat.info/info/unicode/char/0041/index.htm so that the
    > coder may understand the information better. UTF-8 is the encoding
    > which for most purposes can work on Unix in exactly the same fashion as
    > 8-bit codes (ASCII, ISO-8859, ISCII, TSCII), though multibyte EUC
    > encodings are a better analogy. (If the coder doesn't understand EUC,
    > it's not worth explaining.)
    >
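To make the relationship between the codepoint and the 'Encodings' table concrete, here is a minimal sketch in Python (assuming Python 3 is available; the helper name `utf8_hex` is made up for illustration). It prints the codepoint and the UTF-8 bytes for plain 'A' and for Ā.

```python
# Sketch only: relate a codepoint to its UTF-8 byte sequence.
# utf8_hex is an illustrative helper name, not part of any library.
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of one character as space-separated hex bytes."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))

a_plain = chr(0x0041)   # 'A', one byte in UTF-8, same as ASCII
a_macron = chr(0x0100)  # 'Ā', two bytes in UTF-8

for ch in (a_plain, a_macron):
    print(f"U+{ord(ch):04X}", utf8_hex(ch))
```

The two bytes reported for U+0100 are the same ones the explicit `printf "\xc4\x80"` form uses.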
    > For example, when I run a terminal window using the locale en_GB.utf8,
    > I can have the letter printed to the terminal by a bash script using
    > the command
    > % printf "\xc4\x80" # Use UTF-8 form explicitly
    > The printf of bash version 4.1.5(1) does not understand escape codes
    > using '\u'.
    >
    > On the other hand, /usr/bin/printf on the Linux system I'm using does,
    > and I could achieve the same effect using
    > % /usr/bin/printf "\u0100" # What happens in non-UTF-8 locales?
    >
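The question in the comment above is left open in the message. The sketch below does not show what /usr/bin/printf does; it only illustrates, using Python's own codecs as a stand-in, which character encodings can represent Ā at all. The helper name `try_encode` and the choice of encodings are assumptions made for the example.

```python
# Sketch only: which encodings have a representation for U+0100?
# This uses Python's codecs, not the C library's locale machinery.
def try_encode(ch: str, encoding: str):
    """Return the encoded bytes, or None if the encoding lacks the character."""
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        return None

for enc in ("utf-8", "iso-8859-4", "iso-8859-1", "ascii"):
    print(enc, try_encode("\u0100", enc))
```

ISO-8859-4 has a single-byte code for Ā, while ISO-8859-1 and ASCII have none, so a program running in such a locale has to substitute, drop, or reject the character.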
    > If you want the codes for the diacritics themselves, so that the
    > letters you listed may be entered as a plain Roman letter plus a
    > diacritic mark, the information you need is in
    > http://www.unicode.org/Public/UNIDATA/UnicodeData.txt , with an
    > explanation in http://www.unicode.org/reports/tr44/#UnicodeData.txt .
    > As an example, consider the line for U+0100:
    >
    > 0100;LATIN CAPITAL LETTER A WITH MACRON;Lu;0;L;0041 0304;;;;N;LATIN
    > CAPITAL LETTER A MACRON;;;0101;
    >
    > The data items are separated by semicolons. The first field is the
    > codepoint, the number for the character, expressed in hexadecimal
    > notation. The second field gives the character name. The interesting
    > field for you may be the sixth field, which, unless it starts with
    > '<', gives another way of expressing the same character - in this case
    > as the sequence <U+0041 LATIN CAPITAL LETTER A, U+0304
    > COMBINING MACRON>.
    >
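Reading the sixth field can be sketched in a few lines of Python. This is a minimal illustration, assuming the line has been joined back onto one line; the function name `decomposition` is invented for the example. The last step cross-checks the result against the copy of the Unicode data bundled with Python.

```python
import unicodedata

# The UnicodeData.txt line for U+0100, rejoined onto a single line.
LINE = ("0100;LATIN CAPITAL LETTER A WITH MACRON;Lu;0;L;0041 0304;;;;N;"
        "LATIN CAPITAL LETTER A MACRON;;;0101;")

def decomposition(line: str):
    """Return (character, decomposed sequence or None) for one data line."""
    fields = line.split(";")
    ch = chr(int(fields[0], 16))   # first field: codepoint in hex
    mapping = fields[5]            # sixth field: decomposition mapping
    if not mapping or mapping.startswith("<"):
        return ch, None            # empty, or tagged with '<...>'
    return ch, "".join(chr(int(cp, 16)) for cp in mapping.split())

ch, seq = decomposition(LINE)
for c in seq:
    print(f"U+{ord(c):04X}", unicodedata.name(c))

# Cross-check: canonical decomposition (NFD) should give the same sequence.
print(unicodedata.normalize("NFD", ch) == seq)
```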
    > If you want to write the diacritics themselves without attaching them
    > to a letter, there are three methods. Firstly, you can write them on
    > a hard space, e.g. <U+00A0 NO-BREAK SPACE, U+0304>. This will not
    > always work. Secondly, and more safely, you can use the corresponding
    > spacing modifier letter, which you will find in the Spacing Modifier
    > Letters code chart (U+02B0..U+02FF). For the macron, you will use
    > <U+02C9 MODIFIER LETTER MACRON>. The third method is to use the
    > ISO-8859 characters, in this case <U+00AF MACRON>. The drawback with
    > the third method is that this is a symbol, not a letter, and you may
    > encounter bad line-breaking or the macron may be combined with a
    > preceding letter.
    >
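The difference between the three ways of writing a free-standing macron shows up in the characters' general categories. The sketch below is only an illustration, assuming Python 3 and its bundled Unicode data; the dictionary labels and the helper `describe` are made up for the example.

```python
import unicodedata

# The three candidate ways of writing a stand-alone macron.
candidates = {
    "no-break space + combining macron": "\u00a0\u0304",
    "modifier letter macron": "\u02c9",
    "ISO-8859 macron": "\u00af",
}

def describe(text: str):
    """List (codepoint, name, general category) for each character in text."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.category(c))
            for c in text]

for label, text in candidates.items():
    print(label, describe(text))
```

U+02C9 has category Lm (a modifier letter), whereas U+00AF has category Sk (a modifier symbol), which is the "symbol, not a letter" distinction mentioned above.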
    > Richard.
    >
    >
    >


