Compilers Do Static Analysis, They Just Don't Tell You
24 Jul 2020Let’s take the following code:
int * f()
{
int x = 2;
return &x;
}
int g( int * p )
{
return 3 + *( p? f(): p );
}
It has two errors in it. First, f
returns a pointer to a local variable,
which goes out of scope. Second, the check in g
is reversed; in the case
p
is a null pointer, it dereferences it, instead of doing the opposite –
dereference p
only when it’s not null.
This is what GCC emits for g
:
g(int*):
mov eax, DWORD PTR ds:0
ud2
That is, it can obviously see that g
will either dereference a null pointer
(undefined behavior), or an invalid pointer (undefined behavior), so it generates
code that dereferences a null pointer (mov eax, [0]
), which crashes, and then
emits the guaranteed-invalid instruction ud2
, which crashes.
(You can’t be too sure, is GCC’s motto.)
And this is what Clang emits for the same function:
g(int*):
ret
Presumably, the logic here is that one of the branches leads to undefined behavior, and the other also leads to undefined behavior, so we might as well remove the whole thing, it’s not going to get called anyway in a correct program. (But if it does, let’s just return, as crashing would be gauche.)
Now, I don’t know about you, but I’m left wondering here; if the compilers can clearly see that all possible code paths through this function lead to undefined behavior, why don’t they tell us that? Something like “Warning: function invokes undefined behaviour on all control paths, you might want to check your code, mate”, except less formal.
Who cares about that, some might say, nobody writes such code anyway, this will catch no bugs. Well, I have obviously oversimplified a bit. Let’s take a slightly more realistic example, such as this one:
int slice_sum( std::vector<int> const& v, int i, int n )
{
int s = 0;
for( int j = 0; j < n; ++j )
{
assert( i+j >= 0 );
assert( i+j < v.size() );
s += v[ i+j ];
}
return s;
}
int f()
{
std::vector<int> v{ 1, 2, 3, 4 };
return slice_sum( v, 3, 2 );
}
What does GCC do?
.LC0:
.string "int slice_sum(const std::vector<int>&, int, int)"
.LC1:
.string "./example.cpp"
.LC3:
.string "i+j < v.size()"
f():
sub rsp, 8
mov ecx, OFFSET FLAT:.LC0
mov edx, 11
mov esi, OFFSET FLAT:.LC1
mov edi, OFFSET FLAT:.LC3
call __assert_fail
Goes straight to __assert_fail
, without passing Go and collecting $200.
What does Clang do? Same thing.
Again, if the compilers can clearly see that calling f
produces a guaranteed
assertion failure, why don’t they tell us?
Because we are put here on this planet to suffer, that is why.
(Also because assert
is a macro that the compilers do not even see, and they
don’t know that __assert_fail
is the assertion failure handler, and contracts,
which would have allowed us to write [[assert: i+j < v.size()]]
, were removed
from C++20 as being too useful, but those are just random second-order
manifestations of the cosmic need for suffering.)