22 Jul 2020
In C++11 and later, we have language support for lambda expressions,
so we can write this:
int count_even( int const * first, int const * last )
{
return std::count_if( first, last, []( int x )
{ return x % 2 == 0; } );
}
and this:
char const * find_whitespace( char const * first, char const * last )
{
return std::find_if( first, last, []( char ch )
{ return ch == ' ' || ch == '\t'
|| ch == '\r' || ch == '\n'; } );
}
but people
aren’t quite happy.
Compared to other languages, that’s a verbose way to express a simple inline
function object, they say.
And you know what? They are right. It is. Here’s how these looked like
in the dark ages of C++03, when we had to use primitive tools such as
Boost.Lambda:
int count_even( int const * first, int const * last )
{
return std::count_if( first, last, _1 % 2 == 0 );
}
char const * find_whitespace( char const * first, char const * last )
{
return std::find_if( first, last,
_1 == ' ' || _1 == '\t' || _1 == '\r' || _1 == '\n' );
}
We kind of went backwards here. How about we recreate the Boost.Lambda
experience in C++14, using modern technology; how many lines would
that take?
I asked a highly representative sample of C++ developers (two people
chosen at random) and they answered “3,000” and “5,000”, respectively.
They couldn’t have been more wrong if they tried. The correct answer is 52.
Yes, implementing a lambda library that makes the above two examples
compile and work literally fits into a blog post, as I shall demonstrate:
#include <functional>
#include <type_traits>
template<class T, class T2 = std::remove_reference_t<T>>
using is_lambda_expression = std::integral_constant<bool,
std::is_placeholder<T2>::value ||
std::is_bind_expression<T2>::value>;
template<class A> using enable_unary_lambda =
std::enable_if_t<is_lambda_expression<A>::value>;
#define UNARY_LAMBDA(op, fn) \
template<class A, class = enable_unary_lambda<A>> \
auto operator op( A&& a ) \
{ \
return std::bind( std::fn<>(), std::forward<A>(a) ); \
}
template<class A, class B> using enable_binary_lambda =
std::enable_if_t<is_lambda_expression<A>::value ||
is_lambda_expression<B>::value>;
#define BINARY_LAMBDA(op, fn) \
template<class A, class B, class = enable_binary_lambda<A, B>> \
auto operator op( A&& a, B&& b ) \
{ \
return std::bind( std::fn<>(), std::forward<A>(a), \
std::forward<B>(b) ); \
}
BINARY_LAMBDA(+, plus)
BINARY_LAMBDA(-, minus)
BINARY_LAMBDA(*, multiplies)
BINARY_LAMBDA(/, divides)
BINARY_LAMBDA(%, modulus)
UNARY_LAMBDA(-, negate)
BINARY_LAMBDA(==, equal_to)
BINARY_LAMBDA(!=, not_equal_to)
BINARY_LAMBDA(>, greater)
BINARY_LAMBDA(<, less)
BINARY_LAMBDA(>=, greater_equal)
BINARY_LAMBDA(<=, less_equal)
BINARY_LAMBDA(&&, logical_and)
BINARY_LAMBDA(||, logical_or)
UNARY_LAMBDA(!, logical_not)
BINARY_LAMBDA(&, bit_and)
BINARY_LAMBDA(|, bit_or)
BINARY_LAMBDA(^, bit_xor)
UNARY_LAMBDA(~, bit_not)
This even generates reasonably
efficient code.
What’s the trick here?
The trick is that the author of std::bind
included, quite consciously,
a few foundational bits that make implementing the above lambda library
trivial, as long as we have function objects corresponding to all the
operators we want to support; and then, in C++14 a different unrelated proposal
added
the exact function objects we need.
In the corporate circles this is known as “synergy”.
21 Jul 2020
When manipulating variants, one often needs to perform different actions
depending on the type in the variant. For example, when a
std::variant<int, float>
contains an int x
, we want to perform
std::printf( "int: %d\n", x );
but in the case it contains a float x
, we want
std::printf( "float: %f\n", x );
One way to do this is to define an appropriate function object
struct V
{
void operator()(int x) const
{
std::printf( "int: %d\n", x );
}
void operator()(float x) const
{
std::printf( "float: %f\n", x );
}
};
and then call std::visit
with it:
void f( std::variant<int, float> const & v )
{
std::visit( V(), v );
}
This works, but having to define
a dedicated function object is not as convenient as just using a lambda.
Lambdas, however, can’t have more than one operator()
.
One solution that has been suggested
is to have an overload
function that, when passed several lambdas, creates
a composite function object with an overloaded operator()
out of its
arguments.
However, this doesn’t strike me as very elegant. We combine our
several actions into one, pass it to std::visit
, which then needs to
undo this composition in order to call the right overload depending on
the containing type.
How about we throw all this composition and decomposition away and
just define a visitation function that takes the lambdas directly, and
invokes the correct one based on the containing type? Or better yet,
based on the variant index, so that we no longer need to depend on
the types being unique?
template<class V, class... F> void visit_by_index( V&& v, F&&... f );
// Effects: if v.index() is I, calls the I-th f with get<I>(v).
This would allow us to just do
void f( std::variant<int, float> const & v )
{
visit_by_index( v,
[](int x)
{
std::printf( "int: %d\n", x );
},
[](float x)
{
std::printf( "float: %f\n", x );
}
);
}
Is visit_by_index
a one liner using Mp11?
template<class V, class... F> void visit_by_index( V&& v, F&&... f )
{
constexpr auto N = std::variant_size<
std::remove_reference_t<V> >::value;
boost::mp11::mp_with_index<N>( v.index(), [&]( auto I )
{
std::tuple<F&&...> tp( std::forward<F>(f)... );
std::get<I>(std::move(tp))( std::get<I>(std::forward<V>(v)) );
} );
}
Almost, at least in its initial iteration. To make this more suitable
for production use, we need to handle the valueless_by_exception()
case,
and check that the number of function objects matches the number
of variant alternatives:
template<class V, class... F> void visit_by_index( V&& v, F&&... f )
{
if( v.valueless_by_exception() )
{
throw std::bad_variant_access();
}
constexpr auto N = std::variant_size<
std::remove_reference_t<V> >::value;
static_assert( N == sizeof...(F),
"Incorrect number of function objects" );
boost::mp11::mp_with_index<N>( v.index(), [&]( auto I )
{
std::tuple<F&&...> tp( std::forward<F>(f)... );
std::get<I>(std::move(tp))( std::get<I>(std::forward<V>(v)) );
} );
}
How does this work?
mp_with_index<N>(i, f)
(which requires i
to be between 0
and N-1
inclusive) calls f
with std::integral_constant<std::size_t, i>
(by generating a big switch
over the possible values of i
.)
f
in our case is a lambda object of the form [&](auto I){ ... }
, so I
gets to be a variable of type std::integral_constant<std::size_t, i>
.
Since std::integral_constant<T, K>
has a constexpr
conversion operator
to T
returning K
, we can use I
in places where a constant expression
of type size_t
is expected, such as std::get<I>
.
std::tuple<F&&...> tp( std::forward<F>(f)... );
creates a tuple referencing
the function objects passed as arguments. std::get<I>(std::move(tp))
returns the I
-th one of them, properly forwarded.
std::get<I>(std::forward<V>(v))
, since I
is v.index()
, returns a
reference to the type contained into v
, again properly forwarded. Therefore,
std::get<I>(std::move(tp))( std::get<I>(std::forward<V>(v)) );
calls
the correct function object with the correct argument. QED.
20 Jul 2020
I was asked how can Mp11 be used to implement a combinations
metafunction, one that, given a list of types and a number,
produces all sublists of the given length.
For instance, combinations<2, std::tuple<int, void, float>>
needs
to produce std::tuple<int, void>
, std::tuple<int, float>
, and
std::tuple<void, float>
.
This is not a one liner. Far from it, in fact:
struct Q1;
struct Q2;
struct Q3;
template<std::size_t N, class L> using combinations =
mp_invoke_q<
mp_cond<
mp_bool<N == 0>, Q1,
mp_bool<N == mp_size<L>::value>, Q2,
mp_true, Q3>,
mp_size_t<N>, L>;
// N == 0
struct Q1
{
template<class N, class L> using fn = mp_list<mp_clear<L>>;
};
// N == mp_size<L>
struct Q2
{
template<class N, class L> using fn = mp_list<L>;
};
// else
struct Q3
{
template<class N, class L, class R = mp_pop_front<L>> using fn =
mp_append<
mp_transform_q<
mp_bind_back<mp_push_front, mp_front<L>>,
combinations<N::value - 1, R>
>,
combinations<N::value, R>
>;
};
The general idea is to “simply” take all the combinations of length N
that contain the first element, by recursively taking all combinations
of length N-1
from the elements excluding the first, then add to that
all combinations not containing the first element, by recursively taking
all combinations of length N
from the elements excluding the first.
However, some metaprogramming limitations make the code more complicated
than it needs to be. First, template aliases can’t be forward-declared,
and can’t be used in their own definitions, so it’s not possible to
define a recursive alias template in one go.
Second, the eager nature of template aliases makes it hard to write
“short-circuiting” metafunctions. We want to break the recursion when
N
is 0 or when N
is equal to the length of the list, and in these two
special cases, we don’t want to perform the above recursive steps.
What we need to do in order to sidestep these limitations is to use
qualified metafunctions (Q1
, Q2
, Q3
above.) Since qualified
metafunctions are ordinary structs, we can forward-declare them, and the
act of selecting one of them using mp_cond
doesn’t evaluate the ones
that remain unselected.
19 Jul 2020
LLVM, the back end of Clang, does something remarkable with this
ordinary-looking call to memchr
:
bool is_whitespace( char ch )
{
return std::memchr( " \t\r\n", ch, 4 ) != 0;
}
It turns it into this:
is_whitespace(char):
cmp dil, 64
setb cl
movabs rax, 4294977024
bt rax, rdi
setb al
and al, cl
ret
What’s going on here? This doesn’t bear any resemblance to what we wrote!
Well, the first step of the explanation is that LLVM recognizes this
memchr(...) != 0
pattern and understands that it’s equivalent to
bool is_whitespace_2( char ch )
{
return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
}
It so happens that LLVM generates something
slightly different for is_whitespace_2
,
because the pattern in is_whitespace_2
engages a different code path
in the optimizer.
Nevertheless,
if we use MSVC instead,
we can get something very close to the code produced for is_whitespace
:
bool is_whitespace_2(char) PROC
cmp cl, 32
ja SHORT $LN5@is_whitesp
mov rax, 4294977024
bt rax, rcx
jae SHORT $LN5@is_whitesp
mov al, 1
ret 0
$LN5@is_whitesp:
xor al, al
ret 0
bool is_whitespace_2(char) ENDP
The constant 4294977024 here is 100002600 in hex, and has bits 9 ('\t'
), 10
('\r'
), 13 ('\n'
), and 32 (' '
) set.
So, basically, what this does is
bool is_whitespace_3( unsigned char ch )
{
uint64_t mask = (1ull << ' ') | (1ull << '\t')
| (1ull << '\r') | (1ull << '\n');
return ch <= 32 && ((1ull << ch) & mask);
}
instead of doing four comparisons.
Fine, but why does LLVM recognize the above memchr
use at all? Nobody would
write the is_whitespace
function this way.
Well, it turns out that the find_first_not_of
member function of std::string
and std::string_view
is commonly implemented as something along the lines of
std::size_t string_view::find_first_not_of( string_view s ) const
{
char const * p = data();
std::size_t n = size();
for( std::size_t i = 0; i < n; ++i )
{
if( std::memchr( s.data(), p[i], s.size() ) == 0 ) return i;
}
return npos;
}
Note the memchr(...) == 0
pattern in the loop. So when someone compiles
std::size_t f( std::string_view s )
{
return s.find_first_not_of( " \t\r\n" );
}
with Clang, the memchr
optimization kicks in under both
libstdc++ and libc++.
Cheaters, all of them.
18 Jul 2020
Suppose you need a function tuple_all_true
that, when invoked on a tuple,
returns whether all of its elements are true
when interpreted as booleans.
Why? Well, you might have a
std::tuple<std::optional<X>, std::optional<Y>, std::optional<Z>>
and want
to check whether all the optionals are engaged.
Can you do that with Mp11? Yes, you can.
template<class... T> bool tuple_all_true( std::tuple<T...> const& tp )
{
bool r = true;
boost::mp11::tuple_for_each( tp, [&](auto&& e){ r = r && e; } );
return r;
}
mp11::tuple_for_each
invokes the supplied function object on each
element of the tuple, so we use a lambda that checks whether the
element evaluates as true
.
If we could use a for
loop here, we would probably have used an
early return
, something like (using a
hypothetical
for...
syntax)
for...(auto e: tp)
{
if( !e ) return false;
}
return true;
Unfortunately, if we use return
in our lambda, it would just return
from the lambda, not the containing function, so we need to use the less
intuitive approach of accumulating the result into a variable r
.
This may look inefficient – the loop will visit every e
instead of
stopping early – but today’s compilers are smart enough to figure out
that r && e
does not evaluate e
when r
is false
, and
generate reasonably efficient code.
A tip when using Compiler Explorer: the MSVC output is significantly
cleaner when you pass the -Zc:inline
option, as the compiler doesn’t
then emit the unreferenced inline functions.