Re: Unix Codes for Diacritics

From: Krishna Birth (krishnabirth@gmail.com)
Date: Mon Sep 20 2010 - 17:42:24 CDT

In reply to: Richard Wordingham: "Re: Unix Codes for Diacritics"

Would you be able to help with the Xmodmap project described at
http://www.unicode.org/mail-arch/unicode-ml/y2010-m09/0042.html

Best,

Meeकu

On Sat, Sep 18, 2010 at 10:42 AM, Richard Wordingham <richard.wordingham@ntlworld.com> wrote:

> On Sat, 18 Sep 2010 00:06:07 +0100
> Krishna Birth <krishnabirth@gmail.com> wrote:
>
> > Could someone please correctly tell the codes to use on Unix operating
> > systems to produce the below diacritics:
> >
> > A
> > Ā = http://www.fileformat.info/info/unicode/char/0100/index.htm
> ...
>
> > I need to find this for a project/coder's question?
>
> If you are asking how to type these precomposed letters at a keyboard,
> we need to know which Unix operating system you have in mind, and the
> X-terminal model may be relevant. For example, if the X-terminal is
> a Windows PC running Exceed, this may reduce to a Windows question.
>
> My answer is directed to what one would write in a program. It is
> possible that more detail is required as to the coder's problem.
>
> The codepoint (i.e. the number encoding the character) for each of these
> letters appears in the URL of the link you gave for it, e.g. the code
> for Ā is 0100 in hex.
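A minimal sketch of going between the hex codepoint in those URLs and the character itself. Python is an assumption here, since the coder's language is not stated, and `char_from_hex` / `hex_from_char` are hypothetical helper names:

```python
def char_from_hex(code: str) -> str:
    """Return the character whose codepoint is the hex string `code`, e.g. "0100"."""
    return chr(int(code, 16))


def hex_from_char(ch: str) -> str:
    """Return the codepoint of `ch` as a 4-digit uppercase hex string."""
    return f"{ord(ch):04X}"


a_macron = char_from_hex("0100")
print(hex_from_char(a_macron))   # 0100
print(a_macron == "\u0100")      # True
print(hex_from_char("A"))        # 0041
```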
>
> If you are simply trying to produce the single, precomposed character
> in a program, the information is given in the table headed 'Encodings'
> in the pages you referenced. It may be worth also giving the
> information for the plain letter 'A' at
> http://www.fileformat.info/info/unicode/char/0041/index.htm so that the
> coder may understand the information better. UTF-8 is the encoding
> which for most purposes can work on Unix in exactly the same fashion as
> 8-bit codes (ASCII, ISO-8859, ISCII, TSCII), though multibyte EUC
> encodings are a better analogy. (If the coder doesn't understand EUC,
> it's not worth explaining.)
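To show where the UTF-8 bytes in the 'Encodings' table come from, here is a hand-rolled Python sketch for the two-byte range only (U+0080 to U+07FF, which includes U+0100), checked against Python's built-in encoder. `utf8_two_byte` is a hypothetical helper for illustration; real code would simply call `str.encode("utf-8")`.

```python
def utf8_two_byte(cp: int) -> bytes:
    """Encode a codepoint in U+0080..U+07FF as UTF-8 by hand.

    Two-byte layout: 110xxxxx 10xxxxxx, carrying the codepoint's 11 bits.
    """
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("only the two-byte UTF-8 range is handled here")
    lead = 0xC0 | (cp >> 6)      # top 5 bits of the codepoint
    trail = 0x80 | (cp & 0x3F)   # low 6 bits of the codepoint
    return bytes([lead, trail])


encoded = utf8_two_byte(0x0100)
print(encoded)                                # b'\xc4\x80'
print(encoded == "\u0100".encode("utf-8"))    # True
print("A".encode("utf-8"))                    # b'A' (ASCII bytes are unchanged in UTF-8)
```

For U+0100, the top bits give 0x100 >> 6 = 4, so the lead byte is 0xC0 | 4 = 0xC4, and the low six bits are 0, so the trail byte is 0x80.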
>
> For example, when I run a terminal window using the locale en_GB.utf8,
> I can have the letter printed to the terminal by a bash script using
> the command
> % printf "\xc4\x80" # Use UTF-8 form explicitly
> The printf builtin of bash version 4.1.5(1) does not understand '\u'
> escape codes.
>
> On the other hand, /usr/bin/printf on the Linux system I'm using does,
> and I could achieve the same effect using
> % /usr/bin/printf "\u0100" # What happens in non-UTF-8 locales?
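A Python sketch of the same comparison (again assuming Python only for illustration): spelling out the UTF-8 bytes and encoding the codepoint give identical output, while in an 8-bit encoding the character gets a different byte, or none at all. ISO-8859-4 and ISO-8859-1 are chosen here purely as examples of non-UTF-8 locale encodings, and `bytes_for` is a hypothetical helper.

```python
from typing import Optional

explicit_bytes = b"\xc4\x80"                 # like: printf "\xc4\x80"
from_codepoint = "\u0100".encode("utf-8")    # like: /usr/bin/printf "\u0100" in a UTF-8 locale
print(explicit_bytes == from_codepoint)      # True


def bytes_for(text: str, encoding: str) -> Optional[bytes]:
    """Return the bytes `text` becomes in `encoding`, or None if it has no code there."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


for enc in ("utf-8", "iso8859_4", "iso8859_1"):
    print(enc, bytes_for("\u0100", enc))
# utf-8 b'\xc4\x80'
# iso8859_4 b'\xc0'
# iso8859_1 None
```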
>
> If you want the codes for the diacritics themselves, so that the
> letters you listed may be entered as plain Roman letter plus diacritic
> mark, the information you need
> is in http://www.unicode.org/Public/UNIDATA/UnicodeData.txt , with an
> explanation in http://www.unicode.org/reports/tr44/#UnicodeData.txt .
> As an example, consider the line for U+0100:
>
> 0100;LATIN CAPITAL LETTER A WITH MACRON;Lu;0;L;0041 0304;;;;N;LATIN CAPITAL LETTER A MACRON;;;0101;
>
> The data items are separated by semicolons. The first field is the
> codepoint, the number for the character, expressed in hexadecimal
> notation. The second field gives the character name. The interesting
> field for you may be the sixth field, which, unless it starts with
> '<', gives another way of expressing the same character: in this case,
> as the sequence <U+0041 LATIN CAPITAL LETTER A, U+0304 COMBINING
> MACRON>.
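A small Python sketch of reading those fields from the U+0100 line quoted above and turning the sixth field into the letter-plus-diacritic sequence. `parse_unicodedata_line` is a hypothetical helper; the standard `unicodedata` module is used only as a cross-check, and its answers depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

LINE = ("0100;LATIN CAPITAL LETTER A WITH MACRON;Lu;0;L;0041 0304;;;;N;"
        "LATIN CAPITAL LETTER A MACRON;;;0101;")


def parse_unicodedata_line(line: str):
    """Return (character, name, decomposed string or None) for one UnicodeData.txt line."""
    fields = line.split(";")
    char = chr(int(fields[0], 16))    # first field: codepoint in hex
    name = fields[1]                  # second field: character name
    decomposition = fields[5]         # sixth field: decomposition mapping
    if not decomposition or decomposition.startswith("<"):
        # Empty, or a tagged compatibility mapping such as "<compat> 0020 0304";
        # neither spells out a canonically equivalent sequence.
        return char, name, None
    decomposed = "".join(chr(int(cp, 16)) for cp in decomposition.split())
    return char, name, decomposed


char, name, decomposed = parse_unicodedata_line(LINE)
print(name)                                              # LATIN CAPITAL LETTER A WITH MACRON
print([f"U+{ord(c):04X}" for c in decomposed])           # ['U+0041', 'U+0304']
# Cross-check: NFD splits the precomposed letter, NFC joins it back.
print(unicodedata.normalize("NFD", char) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == char)  # True
```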
>
> If you want to write the diacritics themselves without attaching them
> to a letter, there are three methods. Firstly, you can put the
> combining mark on a no-break space, e.g. <U+00A0 NO-BREAK SPACE,
> U+0304 COMBINING MACRON>. This will not always work. Secondly, and
> more safely, you can use the corresponding spacing modifier letter;
> for this you need to look at the Spacing Modifier Letters code chart
> (U+02B0 to U+02FF). For the macron, you would use <U+02C9 MODIFIER
> LETTER MACRON>. The third method is to use the ISO-8859 characters, in
> this case <U+00AF MACRON>. The drawback of the third method is that
> this is a symbol, not a letter, and you may encounter bad line-breaking
> or the macron may be combined with a preceding letter.
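The three stand-alone options can be compared with a short Python sketch that lists each sequence's codepoints, names and general categories (Zs space separator, Mn combining mark, Lm modifier letter, Sk modifier symbol). The names and categories come from Python's bundled `unicodedata` module; how each sequence actually renders still depends on the font and terminal, which this does not test.

```python
import unicodedata

OPTIONS = {
    "combining macron on a no-break space": "\u00a0\u0304",
    "spacing modifier letter": "\u02c9",
    "ISO-8859 macron": "\u00af",
}

for label, text in OPTIONS.items():
    print(label)
    for ch in text:
        # e.g. "  U+02C9 MODIFIER LETTER MACRON (Lm)"
        print(f"  U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})")
```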
>
> Richard.
>
>
>

This archive was generated by hypermail 2.1.5 : Mon Sep 20 2010 - 17:45:18 CDT