RE: Is there any unambiguous vowel length mark code point for classicists?

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Oct 27 2010 - 19:40:25 CDT


    Gy. Dobner asked:

    > But my original question was not how to encode a combining macron in one
    > more possible way but how to encode a length mark that would display as
    > something _visually_ _distinguishable_ _from_ _a_ _macron_ (because the
    > macron is functionally ambiguous and hence unsuitable for my purposes).
    > Is it e.g. possible (i.e. is it Unicode-compliant) to combine a macron with
    > some non-displaying character for this purpose, and if so, with which
    > non-displaying character? I understand that ZERO-WIDTH JOINER is not
    > supposed to be used in this way (or am I mistaken?).

    But this is the wrong question.

    The Unicode Standard encodes characters for scripts (and writing
    systems).

    It doesn't provide a standard for the representation of syllabic
    structure or other phonological constructs per se.

    Even if you are using some phonetic transcription system like
    IPA, which is used as a technical system for representing
    sounds, the Unicode Standard's encoding of that is one step
    removed. It is the International Phonetic Association that
    defines how IPA characters, marks, and other conventions are
    used to specify linguistic sounds. What the Unicode Standard
    does, in turn, is encode those characters and marks for
    digital representation on computers.

    So I think you have the cart before the horse here.

    What you (or the Classicist community in general) need to
    do is specify orthographical conventions for the representation
    of whatever length distinctions you are trying to systematically
    distinguish.

    That could be with a colon. It could be with the IPA length
    mark. It could be with a doubled macron. It could be with
    some entirely different diacritic. It could be with some
    other visible convention.

    Once you know *what* you want to write for this, *then* you
    ask, how can this written text be represented in Unicode
    characters, so I can enter, transmit, print, and otherwise
    process it on computers.
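
    As a rough illustration of that second step, here is a small Python
    sketch that spells out a few candidate conventions for a long "a" as
    sequences of already-encoded characters. The candidate list and the
    describe() helper are assumptions made up for this example, not a
    recommendation: "doubled macron" is read here as two stacked
    COMBINING MACRONs, and the acute accent merely stands in for "some
    entirely different diacritic".

```python
import unicodedata

# Hypothetical candidates for writing a long "a". Which convention to
# adopt is for the Classicist community to decide; Unicode only supplies
# the characters used to represent the choice.
CANDIDATES = {
    "colon": "a:",
    "IPA length mark": "a\u02D0",        # U+02D0
    "doubled macron": "a\u0304\u0304",   # assumption: two stacked U+0304
    "different diacritic": "a\u0301",    # assumption: acute as a stand-in
}


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for label, text in CANDIDATES.items():
        print(f"{label}: {text}")
        for line in describe(text):
            print(f"    {line}")
```

    Every entry resolves to characters that are already in the standard,
    so none of these conventions would need a new encoding proposal.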

    It isn't a matter of some hidden format code in Unicode that
    normatively denotes lengthiness. Rather, you decide what you
    want to write and print for the distinction you need to make.

    Hint: Pick some *other* diacritic that already exists in
    the Unicode Standard. That way, you won't need to spend two
    years hassling with the character encoding process to add some
    newly invented mark which isn't yet encoded.

    Hint #2: Pick some diacritic mark that is already widely
    supported in system fonts. That way you won't need to spend
    years hassling system vendors to add the glyphs you need,
    or scouring the web looking for custom fonts, in order to
    be able to easily display your research on the web and
    with easily available tools.

    --Ken


