A C++14 Lambda Library

In C++11 and later, we have language support for lambda expressions, so we can write this:

int count_even( int const * first, int const * last )
{
    return std::count_if( first, last, []( int x )
        { return x % 2 == 0; } );
}

and this:

char const * find_whitespace( char const * first, char const * last )
{
    return std::find_if( first, last, []( char ch )
        { return ch == ' ' || ch == '\t'
            || ch == '\r' || ch == '\n'; } );
}

but people aren’t quite happy. Compared to other languages, that’s a verbose way to express a simple inline function object, they say.

And you know what? They are right. It is. Here’s how these looked like in the dark ages of C++03, when we had to use primitive tools such as Boost.Lambda:

int count_even( int const * first, int const * last )
{
    return std::count_if( first, last, _1 % 2 == 0 );
}

char const * find_whitespace( char const * first, char const * last )
{
    return std::find_if( first, last,
        _1 == ' ' || _1 == '\t' || _1 == '\r' || _1 == '\n' );
}

We kind of went backwards here. How about we recreate the Boost.Lambda experience in C++14, using modern technology; how many lines would that take?

I asked a highly representative sample of C++ developers (two people chosen at random) and they answered “3,000” and “5,000”, respectively.

They couldn’t have been more wrong if they tried. The correct answer is 52.

Yes, implementing a lambda library that makes the above two examples compile and work literally fits into a blog post, as I shall demonstrate:

#include <functional>
#include <type_traits>

template<class T, class T2 = std::remove_reference_t<T>>
    using is_lambda_expression = std::integral_constant<bool,
        std::is_placeholder<T2>::value ||
        std::is_bind_expression<T2>::value>;

template<class A> using enable_unary_lambda =
    std::enable_if_t<is_lambda_expression<A>::value>;

#define UNARY_LAMBDA(op, fn) \
    template<class A, class = enable_unary_lambda<A>> \
    auto operator op( A&& a ) \
    { \
        return std::bind( std::fn<>(), std::forward<A>(a) ); \
    }

template<class A, class B> using enable_binary_lambda =
    std::enable_if_t<is_lambda_expression<A>::value ||
        is_lambda_expression<B>::value>;

#define BINARY_LAMBDA(op, fn) \
    template<class A, class B, class = enable_binary_lambda<A, B>> \
    auto operator op( A&& a, B&& b ) \
    { \
        return std::bind( std::fn<>(), std::forward<A>(a), \
            std::forward<B>(b) ); \
    }

BINARY_LAMBDA(+, plus)
BINARY_LAMBDA(-, minus)
BINARY_LAMBDA(*, multiplies)
BINARY_LAMBDA(/, divides)
BINARY_LAMBDA(%, modulus)
UNARY_LAMBDA(-, negate)

BINARY_LAMBDA(==, equal_to)
BINARY_LAMBDA(!=, not_equal_to)
BINARY_LAMBDA(>, greater)
BINARY_LAMBDA(<, less)
BINARY_LAMBDA(>=, greater_equal)
BINARY_LAMBDA(<=, less_equal)

BINARY_LAMBDA(&&, logical_and)
BINARY_LAMBDA(||, logical_or)
UNARY_LAMBDA(!, logical_not)

BINARY_LAMBDA(&, bit_and)
BINARY_LAMBDA(|, bit_or)
BINARY_LAMBDA(^, bit_xor)
UNARY_LAMBDA(~, bit_not)

This even generates reasonably efficient code.

What’s the trick here?

The trick is that the author of std::bind included, quite consciously, a few foundational bits that make implementing the above lambda library trivial, as long as we have function objects corresponding to all the operators we want to support; and then, in C++14 a different unrelated proposal added the exact function objects we need.

In the corporate circles this is known as “synergy”.

Implementing visit_by_index

When manipulating variants, one often needs to perform different actions depending on the type in the variant. For example, when a std::variant<int, float> contains an int x, we want to perform

std::printf( "int: %d\n", x );

but in the case it contains a float x, we want

std::printf( "float: %f\n", x );

One way to do this is to define an appropriate function object

struct V
{
    void operator()(int x) const
    {
        std::printf( "int: %d\n", x );
    }

    void operator()(float x) const
    {
        std::printf( "float: %f\n", x );
    }
};

and then call std::visit with it:

void f( std::variant<int, float> const & v )
{
    std::visit( V(), v );
}

This works, but having to define a dedicated function object is not as convenient as just using a lambda. Lambdas, however, can’t have more than one operator().

One solution that has been suggested is to have an overload function that, when passed several lambdas, creates a composite function object with an overloaded operator() out of its arguments.

However, this doesn’t strike me as very elegant. We combine our several actions into one, pass it to std::visit, which then needs to undo this composition in order to call the right overload depending on the containing type.

How about we throw all this composition and decomposition away and just define a visitation function that takes the lambdas directly, and invokes the correct one based on the containing type? Or better yet, based on the variant index, so that we no longer need to depend on the types being unique?

template<class V, class... F> void visit_by_index( V&& v, F&&... f );

// Effects: if v.index() is I, calls the I-th f with get<I>(v).

This would allow us to just do

void f( std::variant<int, float> const & v )
{
    visit_by_index( v,

        [](int x)
        {
            std::printf( "int: %d\n", x );
        },

        [](float x)
        {
            std::printf( "float: %f\n", x );
        }
    );
}

Is visit_by_index a one liner using Mp11?

template<class V, class... F> void visit_by_index( V&& v, F&&... f )
{
    constexpr auto N = std::variant_size<
        std::remove_reference_t<V> >::value;

    boost::mp11::mp_with_index<N>( v.index(), [&]( auto I )
    {
      std::tuple<F&&...> tp( std::forward<F>(f)... );
      std::get<I>(std::move(tp))( std::get<I>(std::forward<V>(v)) );
    } );
}

Almost, at least in its initial iteration. To make this more suitable for production use, we need to handle the valueless_by_exception() case, and check that the number of function objects matches the number of variant alternatives:

template<class V, class... F> void visit_by_index( V&& v, F&&... f )
{
    if( v.valueless_by_exception() )
    {
        throw std::bad_variant_access();
    }

    constexpr auto N = std::variant_size<
        std::remove_reference_t<V> >::value;

    static_assert( N == sizeof...(F),
        "Incorrect number of function objects" );

    boost::mp11::mp_with_index<N>( v.index(), [&]( auto I )
    {
      std::tuple<F&&...> tp( std::forward<F>(f)... );
      std::get<I>(std::move(tp))( std::get<I>(std::forward<V>(v)) );
    } );
}

How does this work?

mp_with_index<N>(i, f) (which requires i to be between 0 and N-1 inclusive) calls f with std::integral_constant<std::size_t, i> (by generating a big switch over the possible values of i.)

f in our case is a lambda object of the form [&](auto I){ ... }, so I gets to be a variable of type std::integral_constant<std::size_t, i>.

Since std::integral_constant<T, K> has a constexpr conversion operator to T returning K, we can use I in places where a constant expression of type size_t is expected, such as std::get<I>.

std::tuple<F&&...> tp( std::forward<F>(f)... ); creates a tuple referencing the function objects passed as arguments. std::get<I>(std::move(tp)) returns the I-th one of them, properly forwarded.

std::get<I>(std::forward<V>(v)), since I is v.index(), returns a reference to the type contained into v, again properly forwarded. Therefore, std::get<I>(std::move(tp))( std::get<I>(std::forward<V>(v)) ); calls the correct function object with the correct argument. QED.

Combinations with Mp11

I was asked how can Mp11 be used to implement a combinations metafunction, one that, given a list of types and a number, produces all sublists of the given length.

For instance, combinations<2, std::tuple<int, void, float>> needs to produce std::tuple<int, void>, std::tuple<int, float>, and std::tuple<void, float>.

This is not a one liner. Far from it, in fact:

struct Q1;
struct Q2;
struct Q3;

template<std::size_t N, class L> using combinations =
    mp_invoke_q<
        mp_cond<
            mp_bool<N == 0>, Q1,
            mp_bool<N == mp_size<L>::value>, Q2,
            mp_true, Q3>,
        mp_size_t<N>, L>;

// N == 0
struct Q1
{
    template<class N, class L> using fn = mp_list<mp_clear<L>>;
};

// N == mp_size<L>
struct Q2
{
    template<class N, class L> using fn = mp_list<L>;
};

// else
struct Q3
{
    template<class N, class L, class R = mp_pop_front<L>> using fn =
        mp_append<
            mp_transform_q<
                mp_bind_back<mp_push_front, mp_front<L>>,
                combinations<N::value - 1, R>
            >,
            combinations<N::value, R>
        >;
};

The general idea is to “simply” take all the combinations of length N that contain the first element, by recursively taking all combinations of length N-1 from the elements excluding the first, then add to that all combinations not containing the first element, by recursively taking all combinations of length N from the elements excluding the first.

However, some metaprogramming limitations make the code more complicated than it needs to be. First, template aliases can’t be forward-declared, and can’t be used in their own definitions, so it’s not possible to define a recursive alias template in one go.

Second, the eager nature of template aliases makes it hard to write “short-circuiting” metafunctions. We want to break the recursion when N is 0 or when N is equal to the length of the list, and in these two special cases, we don’t want to perform the above recursive steps.

What we need to do in order to sidestep these limitations is to use qualified metafunctions (Q1, Q2, Q3 above.) Since qualified metafunctions are ordinary structs, we can forward-declare them, and the act of selecting one of them using mp_cond doesn’t evaluate the ones that remain unselected.

LLVM and memchr

LLVM, the back end of Clang, does something remarkable with this ordinary-looking call to memchr:

bool is_whitespace( char ch )
{
    return std::memchr( " \t\r\n", ch, 4 ) != 0;
}

It turns it into this:

is_whitespace(char):
        cmp     dil, 64
        setb    cl
        movabs  rax, 4294977024
        bt      rax, rdi
        setb    al
        and     al, cl
        ret

What’s going on here? This doesn’t bear any resemblance to what we wrote!

Well, the first step of the explanation is that LLVM recognizes this memchr(...) != 0 pattern and understands that it’s equivalent to

bool is_whitespace_2( char ch )
{
    return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
}

It so happens that LLVM generates something slightly different for is_whitespace_2, because the pattern in is_whitespace_2 engages a different code path in the optimizer.

Nevertheless, if we use MSVC instead, we can get something very close to the code produced for is_whitespace:

bool is_whitespace_2(char) PROC
        cmp     cl, 32
        ja      SHORT $LN5@is_whitesp
        mov     rax, 4294977024
        bt      rax, rcx
        jae     SHORT $LN5@is_whitesp
        mov     al, 1
        ret     0
$LN5@is_whitesp:
        xor     al, al
        ret     0
bool is_whitespace_2(char) ENDP

The constant 4294977024 here is 100002600 in hex, and has bits 9 ('\t'), 10 ('\r'), 13 ('\n'), and 32 (' ') set.

So, basically, what this does is

bool is_whitespace_3( unsigned char ch )
{
    uint64_t mask = (1ull << ' ') | (1ull << '\t')
        | (1ull << '\r') | (1ull << '\n');

    return ch <= 32 && ((1ull << ch) & mask);
}

instead of doing four comparisons.

Fine, but why does LLVM recognize the above memchr use at all? Nobody would write the is_whitespace function this way.

Well, it turns out that the find_first_not_of member function of std::string and std::string_view is commonly implemented as something along the lines of

std::size_t string_view::find_first_not_of( string_view s ) const
{
    char const * p = data();
    std::size_t n = size();

    for( std::size_t i = 0; i < n; ++i )
    {
        if( std::memchr( s.data(), p[i], s.size() ) == 0 ) return i;
    }

    return npos;
}

Note the memchr(...) == 0 pattern in the loop. So when someone compiles

std::size_t f( std::string_view s )
{
    return s.find_first_not_of( " \t\r\n" );
}

with Clang, the memchr optimization kicks in under both libstdc++ and libc++.

Cheaters, all of them.

Implementing tuple_all_true

Suppose you need a function tuple_all_true that, when invoked on a tuple, returns whether all of its elements are true when interpreted as booleans.

Why? Well, you might have a std::tuple<std::optional<X>, std::optional<Y>, std::optional<Z>> and want to check whether all the optionals are engaged.

Can you do that with Mp11? Yes, you can.

template<class... T> bool tuple_all_true( std::tuple<T...> const& tp )
{
    bool r = true;
    boost::mp11::tuple_for_each( tp, [&](auto&& e){ r = r && e; } );
    return r;
}

mp11::tuple_for_each invokes the supplied function object on each element of the tuple, so we use a lambda that checks whether the element evaluates as true.

If we could use a for loop here, we would probably have used an early return, something like (using a hypothetical for... syntax)

    for...(auto e: tp)
    {
        if( !e ) return false;
    }

    return true;

Unfortunately, if we use return in our lambda, it would just return from the lambda, not the containing function, so we need to use the less intuitive approach of accumulating the result into a variable r.

This may look inefficient – the loop will visit every e instead of stopping early – but today’s compilers are smart enough to figure out that r && e does not evaluate e when r is false, and generate reasonably efficient code.

A tip when using Compiler Explorer: the MSVC output is significantly cleaner when you pass the -Zc:inline option, as the compiler doesn’t then emit the unreferenced inline functions.