This patch implements the idea that I posted as a reply to Matthias on D25277:
On Wed, Oct 5, 2016 at 5:33 PM, Matthias Braun <matze@braunis.de> wrote:
I was mainly wondering here whether there may be a sensible generic mechanism to combine a list of floating-point numbers.
For all the tests in polybench, I think we can do bisimulation.
What I mean by bisimulation is that we would copy the kernel()
function of each polybench/test.c, name the copy kernelNoFP(), and add
the flag attributes "-fno-fast-math -ffp-contract=off" to it (this can
be done with split compilation if flag attributes do not work).
main() will call kernel() and kernelNoFP() and compare their output
with FP_TOLERANCE.
Only the execution of kernel() will be timed for benchmark performance result.
main() will print only the output of kernelNoFP(), which will be
hashed and compared against the reference hash (we currently expect
an exact match of the output hash).
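
The scheme above can be sketched roughly as follows. This is only an
illustration, not the code in the patch: the toy loop body, the array
size N, the FP_TOLERANCE value, and the helpers count_mismatches() and
run_bisimulation() are assumptions made for the example, and the sketch
does not apply "-fno-fast-math -ffp-contract=off" to kernelNoFP(); that
would come from flag attributes or split compilation as described above.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>

#define N 8
#define FP_TOLERANCE 1e-5 /* assumed value, not taken from the test-suite */

/* Stand-in for a polybench kernel: built with the user's CFLAGS and timed. */
void kernel(int n, const double *in, double *out) {
  for (int i = 0; i < n; i++)
    out[i] = in[i] * 1.5 + 0.25;
}

/* Copy of kernel(). In the proposal this copy is built with
   -fno-fast-math -ffp-contract=off; this sketch does not apply the flags. */
void kernelNoFP(int n, const double *in, double *out) {
  for (int i = 0; i < n; i++)
    out[i] = in[i] * 1.5 + 0.25;
}

/* Count elements whose absolute difference exceeds tol (a NaN counts too). */
int count_mismatches(int n, const double *a, const double *b, double tol) {
  assert(n >= 0 && a && b);
  int bad = 0;
  for (int i = 0; i < n; i++)
    if (!(fabs(a[i] - b[i]) <= tol))
      bad++;
  return bad;
}

/* What main() would do: run both kernels, compare, print the reference. */
int run_bisimulation(void) {
  double in[N], out[N], ref[N];
  for (int i = 0; i < N; i++)
    in[i] = (double)i / N;

  kernel(N, in, out);     /* only this call would be timed */
  kernelNoFP(N, in, ref); /* one extra output array, inputs are shared */

  int bad = count_mismatches(N, out, ref, FP_TOLERANCE);
  for (int i = 0; i < N; i++)
    printf("%.2f ", ref[i]); /* only the reference output gets hashed */
  printf("\n");
  return bad;
}
```

A test's main() would call run_bisimulation() and fail if it returns a
nonzero mismatch count; the printed reference output is what gets hashed.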
The good things:
- no modifications to CMake and Makefiles
- no extra space to store the extra reference output
- tests both the mode given by the user-specified CFLAGS and the
"-fno-fast-math -ffp-contract=off" mode.
The bad things (because of the extra reference run of kernelNoFP()):
- compilation time will double: e.g., Polly will optimize both kernels,
- memory requirements on the device will almost double: one extra
output array is added; input arrays are not modified, so there is no
need to duplicate them,
- compute time on the device will more than double: the kernel runs
twice, plus an extra loop over both outputs to compare them with
FP_TOLERANCE.