Re: "textels"

From: Janusz S. Bień <jsbien_at_mimuw.edu.pl>
Date: Thu, 15 Sep 2016 21:56:32 +0200

On Thu, Sep 15 2016 at 21:27 CEST, eliz_at_gnu.org writes:

[...]

> Isn't "grapheme cluster" the definition you are looking for?

I don't think so.

On Thu, Sep 15 2016 at 21:27 CEST, leoboiko_at_namakajiri.net writes:
> Isn't the Swift "character" and the "textel" merely the same thing as
> what Unicode already named "grapheme clusters"? (Well, technically UAX
> #29[1] defines them as "user-perceived characters", but then says
> grapheme clusters approximate user-perceived characters
> algorithmically).
>
> And, indeed, Swift "Characters" are explicitly defined as "extended
> grapheme clusters" (also from UAX #29):
>
> https://developer.apple.com/library/content/documentation/Swift/Conceptual/Swift_Programming_Language/StringsAndCharacters.html
>
> Such a notion is indeed needed, but it has been always there.
>
> [1] http://unicode.org/reports/tr29/

Perhaps I don't properly understand the rather obscure definitions, such as

        An extended grapheme cluster is the same as a legacy grapheme
        cluster, with the addition of some other characters.

However:

1. Graphemes, if I understand correctly, are language-dependent; textels
are not.

2. Textel "ń" means both U+0144 and <U+006E,U+0301>, so it is a notion
at a higher level of abstraction than a grapheme cluster.
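
Point 2 can be illustrated with a minimal Python sketch. It assumes that
"the same textel" can be approximated by Unicode canonical equivalence,
which is only one reading of the term; the helper name same_textel is
hypothetical and is not part of Unicode or of any library.

```python
import unicodedata

precomposed = "\u0144"       # LATIN SMALL LETTER N WITH ACUTE
decomposed = "\u006E\u0301"  # LATIN SMALL LETTER N, COMBINING ACUTE ACCENT


def same_textel(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization.

    Assumption: canonically equivalent sequences stand for the same textel.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The two encodings differ as code point sequences ...
print(precomposed == decomposed)            # False
print(len(precomposed), len(decomposed))    # 1 2

# ... but compare equal once both are normalized to the same form.
print(same_textel(precomposed, decomposed))  # True
```

Under this assumption a textel behaves like an equivalence class of code
point sequences, whereas a grapheme cluster is a segment of one particular
sequence.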

Moreover I don't want to call <U+006E,U+0301> (LATIN SMALL LETTER N,
COMBINING ACUTE ACCENT) an extended grapheme cluster for at least 2
reasons:

1. there is nothing extended about it
2. U+0301 is not a grapheme according to Polish linguistics terminology

Regards

Janusz

-- 
Prof. dr hab. Janusz S. Bień -  Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsbien@uw.edu.pl, jsbien@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/
Received on Thu Sep 15 2016 - 14:57:02 CDT
