Named Parameters in C++20

A programming language supports named parameters when one can call a function supplying the parameters by name, as in the following hypothetical example (using C++ syntax):

void f( int x, int y );

int main()
{
    f( x = 1, y = 2 );
}

C++ is obviously not such a language and there have been numerous proposals to rectify this omission, unfortunately none of them successful. The latest attempt is Axel Naumann’s paper Self-explanatory Function Arguments, which tries to attack the problem from another angle by just allowing normal function calls to be tagged with the parameter name, as in

    f( x: 1, y: 2 );

enabling compilers to issue helpful warnings when a name doesn’t match, but not allowing one to omit, or reorder, arguments.

Even in this limited form, named parameters would still be immensely useful, but this is not what this post is about. What this post is about is that we can already achieve something very close to named parameters in C++20, by using a C99 feature called designated initializers.

Designated initializers allow one to initialize structures by member name, as in the following example:

struct A
{
    int x;
    int y;
};

A a1 = { .x = 1, .y = 2 };
A a2 = { .x = 3 }; // a2.y == 0
A a3 = { .y = 4 }; // a3.x == 0
A a4 = { .y = 5, .x = 6 }; // valid C, invalid C++ (reorder)

C++ introduces a restriction C doesn’t have: the initializers must follow the declaration order, similarly to how class member initializers are executed in member declaration order. But in exchange, it allows us to supply default values:

struct A
{
    int x = 0;
    int y = 0;
};

A a3 = { .y = 4 }; // a3.x == 0, no warning

You can already see where this is going. Instead of

void f( int x, int y );

we declare

void f( A args );

and then call it like this:

int main()
{
    f({ .x = 1, .y = 2 });
}

This works under GCC and Clang even without -std=c++20 because they support designated initializers in earlier language modes as an extension, and it works under MSVC with -std:c++latest.
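To see the default values and the omitted arguments working together, here is a small self-contained sketch. The names connect_args, describe and demo_named_parameters are made up for illustration, and the code assumes a compiler that accepts designated initializers (C++20, or an earlier mode as an extension).

```cpp
#include <cassert>
#include <string>

// Hypothetical parameter struct; every member has a default value.
struct connect_args
{
    std::string host = "localhost";
    int port = 80;
    int retries = 3;
};

// Takes its "named parameters" by value, as a single struct.
std::string describe( connect_args args )
{
    return args.host + ":" + std::to_string( args.port )
        + " retries=" + std::to_string( args.retries );
}

// Exercises the idiom; meant to be called from main().
void demo_named_parameters()
{
    // All arguments omitted: every default applies.
    assert( describe( {} ) == "localhost:80 retries=3" );

    // Only the middle argument supplied.
    assert( describe( { .port = 8080 } ) == "localhost:8080 retries=3" );

    // Skipping a member is fine, as long as the rest stay in declaration order.
    assert( describe( { .host = "example.com", .retries = 1 } )
        == "example.com:80 retries=1" );
}
```

Writing { .retries = 1, .host = "example.com" } instead would be rejected in C++, because the initializers would no longer follow the declaration order.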

For a more realistic example, consider this snippet, taken from real code, that sets a 10 second timeout on a Boost.Beast websocket:

#include <boost/beast/websocket/stream.hpp>
#include <boost/beast/core/tcp_stream.hpp>
#include <chrono>

void f1(boost::beast::websocket::stream<boost::beast::tcp_stream>& ws)
{
    auto opt = boost::beast::websocket::stream_base::timeout();

    opt.keep_alive_pings = true;
    opt.idle_timeout = std::chrono::seconds(10);

    ws.set_option(opt);
}

Here’s how we can reformulate it by using the above idiom and <chrono> literals:

#include <boost/beast/websocket/stream.hpp>
#include <boost/beast/core/tcp_stream.hpp>
#include <chrono>

using namespace std::chrono_literals;

void f2(boost::beast::websocket::stream<boost::beast::tcp_stream>& ws)
{
    ws.set_option({ .idle_timeout = 10s, .keep_alive_pings = true });
}

Apart from the slightly awkward ({ ... }) syntax and the need to observe the right parameter order, that’s not that far from the ideal; and it’s considerably better than f1.

This also works for constructors. Consider this hypothetical vector class that is like std::vector, except with its various constructor overloads replaced with one taking named parameters:

#include <cstddef>
#include <memory>

template<class T, class A = std::allocator<T>> class vector
{
private:

    struct params
    {
        std::size_t size = 0;
        T element{};
        std::size_t capacity = 0;
        A allocator{};
    };

public:

    explicit vector( params p );
};

This is how it’s used:

auto f()
{
    vector<int> v{{ .size = 4, .element = 11, .capacity = 64 }};
    return v;
}

Again, apart from the odd {{ ... }} syntax, not that bad.
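The vector above only declares its constructor, so here is a rough, complete variant that can be compiled and run. It is a sketch under assumptions: int_buffer is a made-up class that stores its elements in a std::vector<int>, and the way it interprets size, element and capacity is my own choice, not something std::vector prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical class: a single constructor taking named parameters.
class int_buffer
{
private:

    // Private on purpose: callers never need to name this type.
    struct params
    {
        std::size_t size = 0;
        int element = 0;
        std::size_t capacity = 0;
    };

    std::vector<int> data_;

public:

    explicit int_buffer( params p )
    {
        // Reserve first, so that the requested capacity is respected.
        data_.reserve( p.capacity < p.size? p.size: p.capacity );
        data_.assign( p.size, p.element );
    }

    std::size_t size() const { return data_.size(); }
    std::size_t capacity() const { return data_.capacity(); }
    int operator[]( std::size_t i ) const { return data_[ i ]; }
};

// Exercises the constructor; meant to be called from main().
void demo_int_buffer()
{
    int_buffer b{{ .size = 4, .element = 11, .capacity = 64 }};

    assert( b.size() == 4 );
    assert( b[ 0 ] == 11 && b[ 3 ] == 11 );
    assert( b.capacity() >= 64 );

    // Trailing members may be omitted; their defaults apply.
    int_buffer c{{ .size = 2 }};

    assert( c.size() == 2 && c[ 1 ] == 0 );
}
```

Note that params can stay private: the braces initialize it without the caller ever spelling out its name.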

Why You Should Use the Boost Software License

Because it doesn’t require attribution for binaries.

All popular licenses – MIT, Apache, BSD – contain language similar to the following:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

And, in fact, so does the Boost license:

The copyright notices in the Software and this entire statement, including the above license grant, this restriction and the following disclaimer, must be included in all copies of the Software, in whole or in part, and all derivative works of the Software

except it continues with

unless such copies or derivative works are solely in the form of machine-executable object code generated by a source language processor.

and the others contain no such exemption.

For the purposes of copyright law, when you compile the source text, the resulting object code, library code or executable program is considered a derived work. That is, the original license terms still apply to it as they would have applied to a copy of the source code, processed in some other way (reformatted, for instance.)

What this means is that the requirement to include the copyright notice still applies. In practice, this is met by including the copyright notice in the documentation, by having a dialog box or a --license command line option that displays the license, or sometimes by both (lawyers like to be on the safe side.)

If you’re writing an open source C++ library, it’s much more convenient for your users if you don’t impose this attribution requirement for binaries. You still want it to apply to copies in source code form, just not to compiled code.

This is what the Boost Software License was created to enable, and this is why you should use it for your open source libraries.

The Boost Software License is not just for Boost libraries. Everyone can, and should, use it.

It’s true that it’s a requirement to get your code in Boost, but that’s not the only benefit. It can also get your code in standard library implementations. Microsoft’s STL, for example, is now open source on GitHub, but since Microsoft’s customers cannot abide by a binary attribution clause, code inside the STL can only use a license that doesn’t impose one. As explained by Stephan T. Lavavej in this Reddit comment, the two licenses that meet this requirement are the Boost Software License and the Apache 2.0 License with LLVM Exception, and the Boost license is simpler, clearer, better known, and already pre-approved in many organizations.

Use it. The C++ community will appreciate your generosity.

From SIMD to AST Extraction

Suppose we have the functions

float f( float x )
{
    return x * 2.0f + 1.0f;
}

float g( float x, float y )
{
    return f( x ) * 0.3f + f( y ) * 0.7f;
}

and we need to apply g to the two arrays x and y, storing the result in the array z:

void h( float const * x, float const * y, float * z, std::size_t n )
{
    for( std::size_t i = 0; i < n; ++i )
    {
        z[ i ] = g( x[ i ], y[ i ] );
    }
}

Nowadays, all major compilers automatically vectorize this code and generate SIMD instructions for it – at most, we need to pass -O3 instead of -O2 to GCC. But let’s suppose, for the sake of discussion, that it’s 2008, the compilers don’t autovectorize, and we still want to employ SIMD.

One elegant technique that allows us to keep our functions mostly unchanged is to convert them to templates:

template<class T> T f( T x )
{
    return x * 2.0f + 1.0f;
}

template<class T> T g( T x, T y )
{
    return f( x ) * 0.3f + f( y ) * 0.7f;
}

This still lets us call them with float as before, but it also lets us call them with a SIMD pack of four floats:

using m128 = __attribute__(( vector_size( 4*sizeof(float) ) )) float;

so that we can now rewrite h to work on four elements at a time:

#include <cstddef>
#include <cstring>

void h( float const * x, float const * y, float * z, std::size_t n )
{
    std::size_t i = 0;

    for( ; i + 3 < n; i += 4 )
    {
        m128 xi;
        std::memcpy( &xi, x + i, sizeof( m128 ) );

        m128 yi;
        std::memcpy( &yi, y + i, sizeof( m128 ) );

        m128 zi = g( xi, yi );

        std::memcpy( z + i, &zi, sizeof( m128 ) );
    }

    for( ; i < n; ++i )
    {
        z[ i ] = g( x[ i ], y[ i ] );
    }
}
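The m128 alias above relies on a GCC and Clang extension. To see what the templates actually require of T, here is a rough portable sketch with a made-up pack4 type: four floats with lane-wise operators and no real SIMD. The names pack4, lanes_match_scalar and demo_pack4 are invented for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a SIMD register: four floats.
struct pack4
{
    std::array<float, 4> v;
};

// Only the operators that f and g actually use are provided.
inline pack4 operator*( pack4 a, float s )
{
    for( float& e: a.v ) e *= s;
    return a;
}

inline pack4 operator+( pack4 a, float s )
{
    for( float& e: a.v ) e += s;
    return a;
}

inline pack4 operator+( pack4 a, pack4 const& b )
{
    for( std::size_t i = 0; i < 4; ++i ) a.v[ i ] += b.v[ i ];
    return a;
}

// The templated functions from the text, unchanged.
template<class T> T f( T x )
{
    return x * 2.0f + 1.0f;
}

template<class T> T g( T x, T y )
{
    return f( x ) * 0.3f + f( y ) * 0.7f;
}

// True when every lane of g(pack, pack) agrees with g(float, float).
bool lanes_match_scalar( pack4 const& x, pack4 const& y )
{
    pack4 z = g( x, y );

    for( std::size_t i = 0; i < 4; ++i )
    {
        // Small tolerance, in case the two paths round differently.
        if( std::fabs( z.v[ i ] - g( x.v[ i ], y.v[ i ] ) ) > 1e-4f ) return false;
    }

    return true;
}

// Meant to be called from main().
void demo_pack4()
{
    pack4 x{{ 1.0f, 2.0f, 3.0f, 4.0f }};
    pack4 y{{ 5.0f, 6.0f, 7.0f, 8.0f }};

    assert( lanes_match_scalar( x, y ) );
}
```

The templates need nothing more than T * float, T + float and T + T, which is why the same source works for float, for a compiler vector type, and for a plain struct like this one.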

OK, but what’s the point of all this in 2020?

Well, it turns out that templatizing our functions enables more than vectorization. We can pass other things to them. In particular, we can define a type that, when operators such as + and * are applied to it, builds an abstract syntax tree instead of doing calculations.

This means that when we call g with this type, instead of the value g(x, y) at some point (x, y), we can get a symbolic representation of the body of g.

To illustrate that, I will define a simple type Q that for reasons of brevity will build a string representation of the function body, instead of a proper syntax tree:

#include <iostream>
#include <string>

struct Q
{
    std::string s_;

    Q( std::string const & s ): s_( s ) {}
    Q( float x ): s_( std::to_string( x ) ) {}
};

Q operator+( Q const& q1, Q const& q2 )
{
    return { "(" + q1.s_ + " + " + q2.s_ + ")" };
}

Q operator*( Q const& q1, Q const& q2 )
{
    return { "(" + q1.s_ + " * " + q2.s_ + ")" };
}

std::ostream& operator<<( std::ostream& os, Q const& q )
{
    return os << q.s_;
}

Now, when I pass this type to our function g:

    std::cout << g( Q{"x"}, Q{"y"} ) << std::endl;

I get

((((x * 2.000000) + 1.000000) * 0.300000) + (((y * 2.000000) + 1.000000) * 0.700000))

which is exactly what g does.
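Q builds a string for brevity. For comparison, here is a rough sketch of what a "proper" tree might look like; it is one possible design, not the only one. The node layout, the E wrapper, and the helpers make_node, var, eval and count_nodes are all invented for this example, and variables are identified by index rather than by name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical tree node: a constant, a variable, or a binary operation.
struct node
{
    char kind;                        // 'c', 'v', '+' or '*'
    float value;                      // used when kind == 'c'
    std::size_t index;                // used when kind == 'v'
    std::shared_ptr<node const> lhs;  // used by '+' and '*'
    std::shared_ptr<node const> rhs;
};

using node_ptr = std::shared_ptr<node const>;

inline node_ptr make_node( char kind, float value, std::size_t index,
    node_ptr lhs, node_ptr rhs )
{
    return std::make_shared<node>(
        node{ kind, value, index, std::move( lhs ), std::move( rhs ) } );
}

// The type we pass to g; plays the role of Q.
struct E
{
    node_ptr n_;

    E( float x ): n_( make_node( 'c', x, 0, nullptr, nullptr ) ) {}
    explicit E( node_ptr n ): n_( std::move( n ) ) {}
};

inline E var( std::size_t i )
{
    return E{ make_node( 'v', 0.0f, i, nullptr, nullptr ) };
}

inline E operator+( E const& a, E const& b )
{
    return E{ make_node( '+', 0.0f, 0, a.n_, b.n_ ) };
}

inline E operator*( E const& a, E const& b )
{
    return E{ make_node( '*', 0.0f, 0, a.n_, b.n_ ) };
}

// Evaluates the tree; vars[i] supplies the value of variable i.
inline float eval( node const& n, float const* vars )
{
    switch( n.kind )
    {
    case 'c': return n.value;
    case 'v': return vars[ n.index ];
    case '+': return eval( *n.lhs, vars ) + eval( *n.rhs, vars );
    default:  return eval( *n.lhs, vars ) * eval( *n.rhs, vars );
    }
}

inline std::size_t count_nodes( node const& n )
{
    if( n.kind == 'c' || n.kind == 'v' ) return 1;
    return 1 + count_nodes( *n.lhs ) + count_nodes( *n.rhs );
}

// The templated functions from the text, unchanged.
template<class T> T f( T x ) { return x * 2.0f + 1.0f; }
template<class T> T g( T x, T y ) { return f( x ) * 0.3f + f( y ) * 0.7f; }

// Meant to be called from main().
void demo_ast()
{
    E e = g( var( 0 ), var( 1 ) );

    assert( e.n_->kind == '+' );          // the outermost operation of g
    assert( count_nodes( *e.n_ ) == 15 ); // 2 variables, 6 constants, 7 operations

    float const vars[] = { 1.0f, 2.0f };
    assert( std::fabs( eval( *e.n_, vars ) - g( 1.0f, 2.0f ) ) < 1e-5f );
}
```

Once the body of g is available as data, it can be evaluated, printed, differentiated or translated to something else entirely; eval here is just the simplest consumer.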

Compilers Do Static Analysis, They Just Don't Tell You

Let’s take the following code:

int * f()
{
    int x = 2;
    return &x;
}

int g( int * p  )
{
    return 3 + *( p? f(): p );
}

It has two errors in it. First, f returns a pointer to a local variable, which goes out of scope. Second, the check in g is reversed; in the case p is a null pointer, it dereferences it, instead of doing the opposite – dereference p only when it’s not null.
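For comparison, a version without either error might look like the sketch below. The _fixed names are mine, chosen so that they don't get confused with the broken f and g that the rest of this post keeps discussing, and falling back to the value of f when p is null is an assumption about what was intended.

```cpp
#include <cassert>

// Returns the value itself, so nothing refers to a dead local variable.
int f_fixed()
{
    int x = 2;
    return x;
}

// Dereferences p only when it is not null; otherwise uses f_fixed().
// (The fallback is an assumption about the intended behavior.)
int g_fixed( int * p )
{
    return 3 + ( p? *p: f_fixed() );
}

// Meant to be called from main().
void demo_fixed()
{
    int v = 4;

    assert( g_fixed( &v ) == 7 );
    assert( g_fixed( nullptr ) == 5 );
}
```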

This is what GCC emits for g:

g(int*):
        mov     eax, DWORD PTR ds:0
        ud2

That is, it can obviously see that g will either dereference a null pointer (undefined behavior), or an invalid pointer (undefined behavior), so it generates code that dereferences a null pointer (mov eax, [0]), which crashes, and then emits the guaranteed-invalid instruction ud2, which crashes.

(You can’t be too sure, is GCC’s motto.)

And this is what Clang emits for the same function:

g(int*):
        ret

Presumably, the logic here is that one of the branches leads to undefined behavior, and the other also leads to undefined behavior, so we might as well remove the whole thing, it’s not going to get called anyway in a correct program. (But if it does, let’s just return, as crashing would be gauche.)

Now, I don’t know about you, but I’m left wondering here; if the compilers can clearly see that all possible code paths through this function lead to undefined behavior, why don’t they tell us that? Something like “Warning: function invokes undefined behaviour on all control paths, you might want to check your code, mate”, except less formal.

Who cares about that, some might say, nobody writes such code anyway, this will catch no bugs. Well, I have obviously oversimplified a bit. Let’s take a slightly more realistic example, such as this one:

#include <cassert>
#include <vector>

int slice_sum( std::vector<int> const& v, int i, int n )
{
    int s = 0;

    for( int j = 0; j < n; ++j )
    {
        assert( i+j >= 0 );
        assert( i+j < v.size() );
        
        s += v[ i+j ];
    }

    return s;
}

int f()
{
    std::vector<int> v{ 1, 2, 3, 4 };
    return slice_sum( v, 3, 2 );
}

What does GCC do?

.LC0:
        .string "int slice_sum(const std::vector<int>&, int, int)"
.LC1:
        .string "./example.cpp"
.LC3:
        .string "i+j < v.size()"

f():
        sub     rsp, 8
        mov     ecx, OFFSET FLAT:.LC0
        mov     edx, 11
        mov     esi, OFFSET FLAT:.LC1
        mov     edi, OFFSET FLAT:.LC3
        call    __assert_fail

Goes straight to __assert_fail, without passing Go and collecting $200.

What does Clang do? Same thing.

Again, if the compilers can clearly see that calling f produces a guaranteed assertion failure, why don’t they tell us?

Because we are put here on this planet to suffer, that is why.

(Also because assert is a macro that the compilers do not even see, and they don’t know that __assert_fail is the assertion failure handler, and contracts, which would have allowed us to write [[assert: i+j < v.size()]], were removed from C++20 as being too useful, but those are just random second-order manifestations of the cosmic need for suffering.)
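There is one partial way to make the compiler tell us today: force the call into constant evaluation, where a failed assertion or an out-of-bounds access is not allowed to go unnoticed. The sketch below assumes the data can be a std::array known at compile time, which is a change from the std::vector in the example above, and it uses unsigned indices; it is an illustration of the idea, not a general fix.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Same loop as before, but usable in constant expressions.
template<std::size_t N>
constexpr int slice_sum( std::array<int, N> const& v, std::size_t i, std::size_t n )
{
    int s = 0;

    for( std::size_t j = 0; j < n; ++j )
    {
        assert( i + j < N );
        s += v[ i + j ];
    }

    return s;
}

constexpr std::array<int, 4> values{{ 1, 2, 3, 4 }};

// In range: computed by the compiler.
constexpr int in_range = slice_sum( values, 2, 2 );
static_assert( in_range == 7, "3 + 4" );

// Out of range: a constexpr variable requires a constant expression, and a
// failing assert (or reading past the end of the array) is not one, so
// uncommenting the next line should produce a compile-time error.
// constexpr int out_of_range = slice_sum( values, 3, 2 );
```

This only helps when both the data and the arguments are compile-time constants, which is of course the easy case.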

Tuple in a Tweet

Some time ago I tweeted the following mini-implementation of a tuple class template:

template<class I, class T> struct tuple_element_base
{
    T t_;
};

template<class... T> struct tuple: mp_apply<mp_inherit,
    mp_transform<tuple_element_base,
        mp_iota_c<sizeof...(T)>, mp_list<T...>>>
{
};

template<class... T> tuple(T...) -> tuple<T...>;

This is a functional aggregate tuple. You can create one, and pass it around:

template<class T> void f( T t );

int main()
{
    tuple tp{ 1, 2.1f, "3.14" };
    f( tp );
}

It’s missing an implementation of the basic tuple primitives tuple_size, tuple_element_t and get, so you can’t do much else with it yet. But before we add these, let’s first figure out how what we have so far works.

The basic idea is that we want to derive tuple<T1, T2, T3> from tuple_element_base<0, T1>, tuple_element_base<1, T2>, and tuple_element_base<2, T3>. Each of these base classes will hold the corresponding tuple element. We also want to keep the index as a template parameter, both to disambiguate the case when some of the types are identical, and to be able to look up an element by index in get<I>.

Since Mp11 likes types better than integers, we’ll declare tuple_element_base to have two type parameters, and will use mp_size_t<I> instead of just I as the first template parameter.

So now, given tuple<T...>, we need to somehow turn the parameter pack T... into tuple_element_base<mp_size_t<I>, T>....

First, we prepare type lists holding the two sequences we need, mp_size_t<I>... and T.... The second one is trivially mp_list<T...>; the first one is mp_iota_c<N>, where N is the number of types in T..., i.e. sizeof...(T).

Once we have these two lists, let’s call them L1 and L2, we need a way to create a new list such that for every element A1 from the first list and A2 from the second list the result has tuple_element_base<A1, A2>. This is what mp_transform does, more specifically mp_transform<tuple_element_base, L1, L2>. Call that final list L3.
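If you'd like to see these two steps without Mp11, here is a rough standard-library-only sketch. It is not how Mp11 implements them; type_list, size_constant, iota_list and transform_pairs are made-up stand-ins for mp_list, mp_size_t, mp_iota_c and the two-list form of mp_transform.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for mp_list and mp_size_t.
template<class... T> struct type_list {};

template<std::size_t I>
using size_constant = std::integral_constant<std::size_t, I>;

// iota_list<N> is type_list<size_constant<0>, ..., size_constant<N-1>>.
template<class Seq> struct iota_impl;

template<std::size_t... I> struct iota_impl<std::index_sequence<I...>>
{
    using type = type_list<size_constant<I>...>;
};

template<std::size_t N>
using iota_list = typename iota_impl<std::make_index_sequence<N>>::type;

// transform_pairs<F, L1, L2> applies F to corresponding elements of L1 and L2.
template<template<class...> class F, class L1, class L2> struct transform_impl;

template<template<class...> class F, class... A1, class... A2>
struct transform_impl<F, type_list<A1...>, type_list<A2...>>
{
    using type = type_list<F<A1, A2>...>;
};

template<template<class...> class F, class L1, class L2>
using transform_pairs = typename transform_impl<F, L1, L2>::type;

// The element base from the text.
template<class I, class T> struct tuple_element_base
{
    T t_;
};

// L1, L2 and L3 as described above, for tuple<int, float>.
using L1 = iota_list<2>;
using L2 = type_list<int, float>;
using L3 = transform_pairs<tuple_element_base, L1, L2>;

static_assert( std::is_same<L3, type_list<
    tuple_element_base<size_constant<0>, int>,
    tuple_element_base<size_constant<1>, float>>>::value, "L3 pairs index and type" );
```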

Now we need to make tuple<T...> derive from each element of L3. Mp11 provides mp_inherit<T...>, a type deriving from each element of the passed parameter pack T.... So, given our list L3, which is of the form mp_list<T...>, we need to obtain mp_inherit<T...> somehow.

This is done by using mp_apply<mp_inherit, L3>, or its equivalent mp_rename<L3, mp_inherit>. Using one or the other is a matter of personal style. Let’s go with mp_apply here, and the result is as above:

template<class... T> struct tuple: mp_apply<mp_inherit,
    mp_transform<tuple_element_base,
        mp_iota_c<sizeof...(T)>, mp_list<T...>>>
{
};
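The last two steps, inheriting from a pack and moving the elements of a list into another template, can be sketched without Mp11 as well. Again, this is an illustration under assumptions rather than Mp11's implementation; type_list, inherit and apply are made-up stand-ins for mp_list, mp_inherit and mp_apply, and A and B are placeholder bases.

```cpp
#include <type_traits>

// Hypothetical stand-in for mp_list.
template<class... T> struct type_list {};

// Derives from every type in the pack, like mp_inherit.
template<class... T> struct inherit: T... {};

// apply<F, L<T...>> is F<T...>, like mp_apply.
template<template<class...> class F, class L> struct apply_impl;

template<template<class...> class F, template<class...> class L, class... T>
struct apply_impl<F, L<T...>>
{
    using type = F<T...>;
};

template<template<class...> class F, class L>
using apply = typename apply_impl<F, L>::type;

// Two placeholder base classes.
struct A { int a; };
struct B { float b; };

using AB = apply<inherit, type_list<A, B>>;

static_assert( std::is_same<AB, inherit<A, B>>::value, "elements moved into inherit" );
static_assert( std::is_base_of<A, AB>::value, "derives from A" );
static_assert( std::is_base_of<B, AB>::value, "derives from B" );
```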

The odd-looking (if you haven’t seen one before) line

template<class... T> tuple(T...) -> tuple<T...>;

is a C++17 deduction guide. It allows us to use the template tuple directly as if it were a type:

tuple tp{ 1, 2.1f, "3.14" };

without supplying the template parameters (<int, float, char const*> in this case.)
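Deduction guides are easier to see in isolation. Here is a minimal sketch using a made-up two-member aggregate, pair_of, which is not part of the tuple; it assumes a C++17 compiler. Because the guide takes its parameters by value, the string literal decays to char const*, which is also why the tuple above ends up with char const* as its third type.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical aggregate with two members.
template<class T, class U> struct pair_of
{
    T first;
    U second;
};

// Deduction guide: pair_of{ a, b } names pair_of<decltype of a, decltype of b>,
// after the usual by-value decay.
template<class T, class U> pair_of( T, U ) -> pair_of<T, U>;

// Meant to be called from main().
void demo_deduction_guide()
{
    pair_of p{ 1, "3.14" };

    // The array char const[5] decays to a pointer.
    static_assert( std::is_same_v<decltype( p ), pair_of<int, char const*>> );

    assert( p.first == 1 );
    assert( p.second[ 0 ] == '3' );
}
```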

Now we just need to add tuple_size:

template<class T> using tuple_size = mp_size<std::remove_cv_t<T>>;

tuple_element_t:

template<std::size_t I, class T> using tuple_element_t =
    mp_at_c<std::remove_cv_t<T>, I>;

and get:

template<std::size_t I, class T>
  auto get( tuple_element_base<mp_size_t<I>, T> const & e )
    -> decltype((e.t_))
{
    return e.t_;
}

In get we take advantage of the fact that the compiler can perform template argument deduction on some base class when a derived type is passed. When get<1>(tp) is invoked, it sees that the parameter has a type of tuple_element_base<mp_size_t<1>, T>, where T is a free template parameter, and looks into the base classes of tp for one that would match.

Since all of the tuple_element_base base classes have a distinct first parameter, only one will match, the one we need. So we just return its member.

Our mini-tuple is now functionally complete, and we can use it:

int main()
{
    tuple tp{ 1, 2.1f, "3.14" };

    mp_for_each<mp_iota<tuple_size<decltype(tp)>>>( [&]( auto I )
    {
        std::cout << get<I>( tp ) << '\n';
    });
}
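For readers without Boost at hand, the same layout can be approximated with std::index_sequence. This is an alternative sketch, not the tweeted implementation: it uses a plain std::size_t index instead of mp_size_t<I>, the helper tuple_bases is a name I made up, get returns T const& directly, and tuple_size and tuple_element_t are left out. It assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// One base class per element, tagged with its index.
template<std::size_t I, class T> struct tuple_element_base
{
    T t_;
};

// Hypothetical helper: expands the indices and the types side by side.
template<class Seq, class... T> struct tuple_bases;

template<std::size_t... I, class... T>
struct tuple_bases<std::index_sequence<I...>, T...>: tuple_element_base<I, T>...
{
};

template<class... T>
struct tuple: tuple_bases<std::index_sequence_for<T...>, T...>
{
};

template<class... T> tuple( T... ) -> tuple<T...>;

// I is given explicitly; T is deduced from the one base class that matches.
template<std::size_t I, class T>
T const& get( tuple_element_base<I, T> const& e )
{
    return e.t_;
}

// Meant to be called from main().
void demo_mini_tuple()
{
    tuple tp{ 1, 2.1f, "3.14" };

    static_assert( std::is_same_v<decltype( tp ), tuple<int, float, char const*>> );

    assert( get<0>( tp ) == 1 );
    assert( get<1>( tp ) == 2.1f );
    assert( get<2>( tp )[ 0 ] == '3' );

    // Identical element types are fine, because the index tells them apart.
    tuple same{ 1, 2 };

    assert( get<0>( same ) == 1 );
    assert( get<1>( same ) == 2 );
}
```

The index is what makes tuple<int, int> work: without it, the two bases would be the same type, and the class could not derive from both.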