This patch adds a first set of tests for the memory runtime checks
generated by the vectorizer.
The test runs the scalar and vectorized versions of a loop requiring
runtime checks on the same inputs, with pointers into the same buffer at
various offsets. It fails if the two versions do not produce the same
results.
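To illustrate what such a comparison is meant to catch, here is a small C++ sketch. It assumes a simple loop `Dst[I] = Src[I] + 1` and replaces the compiler-generated vector code with a hand-written 4-wide model that has no runtime alias check. The names `scalarLoop`, `unsafeVectorLoop` and `resultsMatch` are hypothetical and not part of the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical loop under test: Dst[I] = Src[I] + 1.
// Scalar reference: iterations execute strictly in order.
void scalarLoop(int *Dst, const int *Src, unsigned N) {
  for (unsigned I = 0; I < N; ++I)
    Dst[I] = Src[I] + 1;
}

// Hand-written model of a 4-wide vectorized loop WITHOUT a runtime alias
// check: each chunk of 4 elements is loaded before any of it is stored.
void unsafeVectorLoop(int *Dst, const int *Src, unsigned N) {
  unsigned I = 0;
  for (; I + 4 <= N; I += 4) {
    int Tmp[4];
    for (unsigned L = 0; L < 4; ++L) // models one vector load
      Tmp[L] = Src[I + L];
    for (unsigned L = 0; L < 4; ++L) // models one vector store
      Dst[I + L] = Tmp[L] + 1;
  }
  for (; I < N; ++I) // scalar epilogue for leftover elements
    Dst[I] = Src[I] + 1;
}

// Runs both versions on identical zero-filled buffers, with Dst placed
// Offset elements after Src in the same buffer, and compares the results.
bool resultsMatch(unsigned Offset, unsigned N = 32) {
  std::vector<int> Ref(N + Offset, 0), Vec(N + Offset, 0);
  scalarLoop(Ref.data() + Offset, Ref.data(), N);
  unsafeVectorLoop(Vec.data() + Offset, Vec.data(), N);
  return Ref == Vec;
}
```

With this model, `resultsMatch(0)`, `resultsMatch(4)` and `resultsMatch(8)` return true, while `resultsMatch(1)` and `resultsMatch(3)` return false: when the destination starts fewer than 4 elements after the source, a chunk reads values that the scalar loop would already have overwritten. Those are the offsets where a correct runtime check has to fall back to the scalar loop.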
The test functions are provided as lambdas, which are passed to a
driver function that generates the inputs and calls the lambdas with
overlapping pointers into a single buffer. The driver functions are
marked as noinline, which should act as an optimization barrier, so the
lambdas in turn cannot be inlined and optimized without runtime checks.
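A minimal sketch of such a driver, assuming `int` buffers, a loop signature of `(Dst, Src, N)` and the GCC/Clang `noinline` attribute. `LoopFn`, `checkOverlapping` and `checkAddOne` are illustrative names, not the ones used in the patch.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <vector>

// Hypothetical signature for a loop under test: (Dst, Src, trip count).
using LoopFn = std::function<void(int *, int *, unsigned)>;

// noinline (GCC/Clang attribute) keeps the driver from being inlined into
// the caller that creates the lambdas, so the calls below stay opaque and
// the loop bodies cannot be specialized for these concrete pointers.
__attribute__((noinline)) bool checkOverlapping(LoopFn ScalarFn,
                                                LoopFn VectorFn,
                                                const char *Name) {
  const unsigned N = 100;
  for (unsigned Offset = 0; Offset != N; ++Offset) {
    // Two identical copies of the input data.
    std::vector<int> Ref(N), ToCheck(N);
    for (unsigned I = 0; I != N; ++I)
      Ref[I] = ToCheck[I] = static_cast<int>(I % 7);
    // Destination starts Offset elements into the source buffer.
    ScalarFn(Ref.data() + Offset, Ref.data(), N - Offset);
    VectorFn(ToCheck.data() + Offset, ToCheck.data(), N - Offset);
    if (Ref != ToCheck) {
      std::printf("%s: mismatch at offset %u\n", Name, Offset);
      return false;
    }
  }
  return true;
}

// Example use: the same loop written twice, differing only in the pragma.
bool checkAddOne() {
  auto ScalarFn = [](int *Dst, int *Src, unsigned N) {
#pragma clang loop vectorize(disable)
    for (unsigned I = 0; I < N; ++I)
      Dst[I] = Src[I] + 1;
  };
  auto VectorFn = [](int *Dst, int *Src, unsigned N) {
#pragma clang loop vectorize(enable)
    for (unsigned I = 0; I < N; ++I)
      Dst[I] = Src[I] + 1;
  };
  return checkOverlapping(ScalarFn, VectorFn, "add_one");
}
```

Passing the lambdas as `std::function` adds a level of indirection on top of `noinline`, which is one way to keep the loop bodies opaque to the driver. The `#pragma clang loop` hints are Clang-specific; other compilers may ignore them, possibly with a warning.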
Unfortunately, two separate lambdas need to be specified for the scalar
and vector versions, with the only difference being the pragma to disable
vectorization. If anybody knows a nice, generic and convenient way to
specify the loop once, that would be great.
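One possible way to write the loop only once is a preprocessor macro that pastes the same loop into both lambdas, differing only in the pragma; the standard `_Pragma` operator allows a pragma to appear inside a macro expansion. This is a sketch under assumptions: `DEFINE_SCALAR_AND_VECTOR_FN` and `sameResults` are hypothetical names, the macro is unhygienic (the loop must use the parameter names `Dst`, `Src` and `N`), and the `clang loop` hints only take effect with Clang.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: the loop is passed once and pasted into two lambdas.
// The loop text must refer to the lambda parameters Dst, Src and N.
#define DEFINE_SCALAR_AND_VECTOR_FN(ScalarName, VectorName, ...)         \
  auto ScalarName = [](int *Dst, int *Src, unsigned N) {                 \
    _Pragma("clang loop vectorize(disable)") __VA_ARGS__                 \
  };                                                                     \
  auto VectorName = [](int *Dst, int *Src, unsigned N) {                 \
    _Pragma("clang loop vectorize(enable)") __VA_ARGS__                  \
  };

// Runs both generated versions on zero-filled copies of one buffer,
// with Dst placed Offset elements after Src, and compares the results.
bool sameResults(unsigned Offset) {
  DEFINE_SCALAR_AND_VECTOR_FN(ScalarFn, VectorFn,
                              for (unsigned I = 0; I < N; ++I)
                                  Dst[I] = Src[I] + 1;)
  const unsigned Size = 64;
  std::vector<int> Ref(Size, 0), Vec(Size, 0);
  ScalarFn(Ref.data() + Offset, Ref.data(), Size - Offset);
  VectorFn(Vec.data() + Offset, Vec.data(), Size - Offset);
  return Ref == Vec;
}
```

Taking the loop through `__VA_ARGS__` means commas inside the loop body do not split the macro arguments. The trade-off is that errors in the loop are reported inside a macro expansion, which can make diagnostics harder to read.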
Can we let CMake select the flag?
https://cmake.org/cmake/help/latest/prop_tgt/CXX_STANDARD.html