LLVM has sanitizers for thread safety, memory errors, undefined behavior,
and more. We propose nsan, a new sanitizer for numerical (floating-point)
issues.
For each floating-point IR instruction, the instrumentation pass inserts
an equivalent instruction in double the precision (e.g. float -> double,
double -> fp128), called the "shadow".
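
As a rough illustration of the shadow idea, the sketch below mirrors a float
computation in double by hand. It is hand-written C++, not what the pass
emits (the pass works on IR); the names `Shadowed` and `instrumented` are
hypothetical, and it assumes float arithmetic is evaluated in float (as on
x86-64 with SSE, without -ffast-math).

```cpp
#include <cassert>

// Hypothetical illustration only: a float value paired with its shadow.
struct Shadowed {
  float value;    // result of the original float instructions
  double shadow;  // result of the equivalent double instructions
};

// Computes (a + b) - a, which is mathematically equal to b.
// Every float operation is followed by the same operation on the shadows.
Shadowed instrumented(float a, float b) {
  double shadowA = a;  // shadows of the inputs, extended from float
  double shadowB = b;

  float sum = a + b;                   // original instruction
  double shadowSum = shadowA + shadowB;  // shadow instruction

  float result = sum - a;                     // original instruction
  double shadowResult = shadowSum - shadowA;  // shadow instruction

  return {result, shadowResult};
}
```

With a = 1e8f and b = 1.0f, the float addition absorbs b (the spacing between
floats near 1e8 is 8), so the float result is 0 while the shadow is 1.
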
For any instruction that is observable outside of a function (return,
function call with float parameters, store, ...), the results of the
original and shadow computations are compared, and a warning is emitted
if the values do not match.
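
A minimal sketch of such a check at a return, assuming that "match" means
"agree within a relative tolerance". The function names, the tolerance
value, and the warning text are all assumptions made for illustration, not
the runtime's actual interface or output.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Hypothetical comparison: true if the float value agrees with its double
// shadow within maxRelativeError (an assumed default, chosen arbitrarily).
bool shadowMatches(float value, double shadow,
                   double maxRelativeError = 1e-5) {
  double v = static_cast<double>(value);
  if (std::isnan(v) && std::isnan(shadow)) return true;  // both NaN
  if (v == shadow) return true;  // also covers equal infinities and zeros
  double scale = std::fmax(std::fabs(v), std::fabs(shadow));
  return std::fabs(v - shadow) <= maxRelativeError * scale;
}

// Hypothetical hook for an observable point (here, a function return):
// compare, warn on mismatch, and pass the original value through unchanged.
float checkedReturn(float value, double shadow) {
  if (!shadowMatches(value, shadow)) {
    std::fprintf(stderr,
                 "warning: float result %g differs from shadow %g\n",
                 static_cast<double>(value), shadow);
  }
  return value;
}
```

The original value is always what the program sees; the shadow only drives
the diagnostic.
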
Original values in memory are shadowed in double the precision, with an
additional tag for each byte to represent the memory type (untyped,
float, long, double, long double, ...). Libc functions (and corresponding
LLVM intrinsics) are intercepted to copy or set shadow types accordingly.
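
The toy model below sketches how per-byte type tags can guard shadow values.
It is an assumption-laden illustration, not nsan's layout: the class
`ToyShadowMemory`, its methods, and the one-shadow-slot-per-address storage
are hypothetical, and only two tags are modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical per-byte tag; the real set of types is larger.
enum class Tag : std::uint8_t { Untyped, Float };

class ToyShadowMemory {
public:
  explicit ToyShadowMemory(std::size_t size)
      : app(size), tags(size, Tag::Untyped), shadow(size, 0.0) {}

  // Typed store: write the float, tag its bytes, and record its shadow.
  void storeFloat(std::size_t addr, float value, double shadowValue) {
    assert(addr + sizeof(float) <= app.size());
    std::memcpy(&app[addr], &value, sizeof(float));
    for (std::size_t i = 0; i < sizeof(float); ++i) tags[addr + i] = Tag::Float;
    shadow[addr] = shadowValue;
  }

  // Untyped store (models an intercepted memset/memcpy-like write):
  // the touched bytes lose their float tag.
  void storeUntyped(std::size_t addr, const void* src, std::size_t n) {
    assert(addr + n <= app.size());
    std::memcpy(&app[addr], src, n);
    for (std::size_t i = 0; i < n; ++i) tags[addr + i] = Tag::Untyped;
  }

  // Load the shadow of a float. If any byte is no longer tagged Float, the
  // stored shadow is stale, so fall back to extending the application value.
  double loadFloatShadow(std::size_t addr) const {
    assert(addr + sizeof(float) <= app.size());
    for (std::size_t i = 0; i < sizeof(float); ++i) {
      if (tags[addr + i] != Tag::Float) {
        float v;
        std::memcpy(&v, &app[addr], sizeof(float));
        return static_cast<double>(v);
      }
    }
    return shadow[addr];
  }

private:
  std::vector<unsigned char> app;  // application bytes
  std::vector<Tag> tags;           // one tag per application byte
  std::vector<double> shadow;      // toy layout: one slot per address
};
```

In this model, a shadow survives only as long as every byte of the value it
belongs to keeps its float tag.
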
Unit tests include some well-known examples of numerical instabilities.
nsan is still a work in progress, but this patch is able to run and detect
issues on several applications, including the whole SPECfp benchmark suite.
nsan-instrumented applications are typically 3-100x slower and take ~4x
more memory than the original.
More details can be found in this paper: https://arxiv.org/abs/2102.12782