    As a programmer myself, I actually followed that explanation. But I wonder if
    it's the right approach. Would it not be a more ... interesting ... approach,
    to forget Flex, and instead write a brand new Unicode lexer generator which
    generates a lexer that processes characters (not bytes)?

    Just a thought

    A lexer generator like Flex does not process Unicode directly; it generates a
    lexer that processes bytes.
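
    To make the byte/character distinction concrete, here is a small Python
    sketch. It assumes the input is UTF-8, and the helper name
    utf8_byte_pattern is made up for illustration (it is not part of Flex).
    It shows that one character can be several bytes, and prints the byte
    sequence a byte-oriented lexer would have to match for that character,
    written as \xHH escapes.

```python
def utf8_byte_pattern(ch: str) -> str:
    """Hypothetical helper: spell out the UTF-8 bytes of `ch` as \\xHH escapes.

    Assumes UTF-8 is the input encoding of the byte-oriented lexer.
    """
    return "".join(f"\\x{b:02X}" for b in ch.encode("utf-8"))


if __name__ == "__main__":
    for ch in ("a", "\u00e9", "\u20ac"):  # 'a', e-acute, euro sign
        encoded = ch.encode("utf-8")
        # One character each time, but one, two, and three bytes respectively.
        print(f"U+{ord(ch):04X}: {len(ch)} character, {len(encoded)} byte(s), "
              f"pattern {utf8_byte_pattern(ch)}")
```

    A lexer that processes characters would see each of these as a single
    input symbol; a lexer that processes bytes sees one, two, or three
    symbols and needs its rules expressed over those byte sequences.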

