Re: Letters missing for 19th century Latvian orthography?

From: Kenneth Whistler
Date: Wed Aug 03 2005 - 14:13:56 CDT


    > According to sources (1) and (2), Latvian used some letters
    > with diagonal stroke in its 19th century orthography. These are
    > G,g, K,k, L,l, N,n, R,r, S,s, long s.

    Equivalent to the modern orthography:

    G-cedilla (0122, 0123), K-cedilla (0136, 0137), L-cedilla (013B, 013C),
    N-cedilla (0145, 0146), R-cedilla (0156, 0157), and presumably
    S- and Z-hacek.

    > See attached scans from (1) p.231 (Faulmann-p231.png) and from (2),
    > p.595 (Allen-p595.png).
    > Of these, only L,l are encoded in Unicode 4.1 (unless I overlooked
    > something; I doubt that G,g with diagonal stroke can be treated as
    > font variants of U+01E4, U+01E5).

    Yes. It would be unrelated to that, I think.

    > Is this sufficient evidence for encoding the missing ones?

    Sufficient evidence of the existence of the letterforms, sure.

    Sufficient reason for encoding as characters, no.

    > (As I have no special knowledge of Latvian, I don't consider myself
    > qualified to write a proposal.)

    Karl continued:

    > Of course. But I remember somehow that combinations of letters with
    > things which cross or cover them will be treated as new encodeable characters,
    > unlike combinations of letters with diacritics which are attached to the
    > letter (like ogonek) or do not touch the letter at all (like macron).

    As *potentially* encodable characters. The existence of all kinds of
    overstruck forms of Latin letters doesn't automatically give them
    a free pass into the standard. The *need* for encoding them still
    has to be presented.

    > Is this assumption correct? If yes, is this documented somewhere?

    In UTC decisions. In particular, the three recent additions of
    slash-overstruck letters (A-slash, C-slash, T-slash) were justified
    on the basis of their use in a contemporary orthography (of the
    Sencoten language). In such cases the likelihood of font glitches
    from attempts to compose glyphs on the fly is more problematic
    for users than it is for scholars working on digital representations
    of historic texts in obsolete orthographies. I think there is
    an arguable difference in the needs factor here.

    For the representation of historic texts in obsolete orthographies,
    there is clearly the need to be able to represent the plain text
    content, but I think that is already covered, as Chris Jacobs
    suggested, by the existence of the overstruck combining marks
    (U+0338 is most appropriate for the Latvian case, I think).
    Specialized fonts may be needed to render such texts in fine
    detail, but that was *already* going to be true, since a
    lot of these texts for Latvian (Lettisch) were printed in
    Fraktur style, and would need special glyph treatment for the
    overstruck forms in *any* case.
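    As a rough sketch of that plain-text representation (in Python; the
    helper name is my own, and the letter inventory is taken from the
    quoted sources), the overstruck forms can be spelled as a base letter
    followed by U+0338:

    ```python
    import unicodedata

    # U+0338 COMBINING LONG SOLIDUS OVERLAY, the mark suggested above
    # for the 19th-century Latvian slashed letters.
    OVERLAY = "\u0338"

    def slashed(base: str) -> str:
        """Return the base letter followed by the combining slash overlay."""
        return base + OVERLAY

    # The letters cited from Faulmann and Allen, plus long s (U+017F).
    for ch in "GgKkLlNnRrSs\u017F":
        seq = slashed(ch)
        print(seq, [unicodedata.name(c) for c in seq])
    ```

    Whether the result actually renders as a single struck-through glyph
    is then up to the font, which is exactly the "specialized fonts"
    caveat above.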

    Redactors of 18th and 19th century Lettisch texts have a couple
    of options, it seems to me. On the one hand, the texts can simply
    be represented using the modern orthography -- which would probably
    be of most use to their readers. It is not beyond the realm of
    possibility, in such an approach, to develop a special display font
    that simply maps the modern characters to glyphs for the
    struck-through letterforms, displaying the text roughly as it was
    originally printed. There might be some issues with the long-s
    forms, which don't seem to map one-to-one, but I'm sure smart
    people can figure out the context rules for that.

    The other choice is to represent the content of the Lettisch texts
    in terms of the letters used *directly* -- namely, encoding with
    <g, combining-slash-overlay> sequences, and so on. This would be
    truer to the surface content of the texts. And again, a specialized
    display font could be built that would display the text as it
    would be read in the modern orthography.

    Some of these tasks might be a little easier if single code
    points were encoded for each of the old-orthography slashed
    characters, but all that would amount to is eliminating a
    few instances of 2-1 mappings in the processing. All the rest
    of the editorial preparation and presentation issues would
    end up being about the same.
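    Those 2-1 mappings can be sketched as a small table (in Python; the
    function names are my own, lowercase only is shown, and the long-s
    cases are deliberately omitted since they don't map one-to-one):

    ```python
    OVERLAY = "\u0338"  # COMBINING LONG SOLIDUS OVERLAY

    # Modern Latvian code point -> old-orthography slashed sequence.
    # Uppercase pairs would be handled the same way.
    MODERN_TO_OLD = {
        "\u0123": "g" + OVERLAY,  # g with cedilla
        "\u0137": "k" + OVERLAY,  # k with cedilla
        "\u013C": "l" + OVERLAY,  # l with cedilla
        "\u0146": "n" + OVERLAY,  # n with cedilla
        "\u0157": "r" + OVERLAY,  # r with cedilla
        "\u0161": "s" + OVERLAY,  # s with caron
    }
    OLD_TO_MODERN = {old: new for new, old in MODERN_TO_OLD.items()}

    def to_old(text: str) -> str:
        """Map modern-orthography letters to <base, overlay> sequences (1-2)."""
        return "".join(MODERN_TO_OLD.get(c, c) for c in text)

    def to_modern(text: str) -> str:
        """Map <base, overlay> sequences back to modern letters (2-1)."""
        out, i = [], 0
        while i < len(text):
            pair = text[i:i + 2]
            if pair in OLD_TO_MODERN:   # the 2-1 case
                out.append(OLD_TO_MODERN[pair])
                i += 2
            else:
                out.append(text[i])
                i += 1
        return "".join(out)
    ```

    With single code points for the slashed letters, `to_modern` would
    shrink to a character-for-character table lookup; everything else in
    the pipeline stays the same.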

    In short, I don't see a compelling needs case demonstrated
    here, yet.


    This archive was generated by hypermail 2.1.5 : Wed Aug 03 2005 - 14:14:58 CDT