
[RFC][nsan] A Floating-point numerical sanitizer.
Needs Review · Public

Authored by courbet on Mar 3 2021, 6:21 AM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

LLVM has sanitizers for thread safety, memory, undefined behavior, and more. We propose nsan
as a new sanitizer for numerical (floating-point) issues.

For each floating-point IR instruction, the instrumentation pass inserts
an equivalent instruction at twice the precision (e.g. float -> double,
double -> fp128), called the "shadow".

For any instruction that is observable outside of a function (return,
function call with float parameters, store, ...), the results of the
original and shadow computations are compared, and a warning is emitted
if the values do not match.

Original values in memory are shadowed at twice the precision, with an
additional tag for each byte that records the memory type (untyped,
float, long, double, long double, ...). Libc functions (and the corresponding
LLVM intrinsics) are intercepted to copy or set the shadow types accordingly.

Unit tests include some well-known examples of numerical instabilities.

nsan is still a work in progress, but this patch can already run on, and
detect issues in, several applications, including the whole SPECfp benchmark
suite.

nsan-instrumented applications are typically 3-100x slower and take ~4x
more memory than the original.

More details can be found in this paper: https://arxiv.org/abs/2102.12782

Diff Detail

Unit Tests: Failed

Time    Test
190 ms  x64 windows > LLVM.Instrumentation/NumericalStabilitySanitizer::basic.ll
        Script: -- : 'RUN: at line 2'; c:\ws\w1\llvm-project\premerge-checks\build\bin\opt.exe < C:\ws\w1\llvm-project\premerge-checks\llvm\test\Instrumentation\NumericalStabilitySanitizer\basic.ll -nsan -nsan-shadow-type-mapping=dqq -nsan-truncate-fcmp-eq=false -S | c:\ws\w1\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w1\llvm-project\premerge-checks\llvm\test\Instrumentation\NumericalStabilitySanitizer\basic.ll --check-prefixes=CHECK,DQQ
80 ms   x64 windows > LLVM.Instrumentation/NumericalStabilitySanitizer::fcmp.ll
        Script: -- : 'RUN: at line 2'; c:\ws\w1\llvm-project\premerge-checks\build\bin\opt.exe < C:\ws\w1\llvm-project\premerge-checks\llvm\test\Instrumentation\NumericalStabilitySanitizer\fcmp.ll -nsan -nsan-shadow-type-mapping=dqq -nsan-truncate-fcmp-eq=false -S | c:\ws\w1\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w1\llvm-project\premerge-checks\llvm\test\Instrumentation\NumericalStabilitySanitizer\fcmp.ll --check-prefixes=CHECK,DQQ
130 ms  x64 windows > LLVM.Instrumentation/NumericalStabilitySanitizer::memory.ll
        Script: -- : 'RUN: at line 2'; c:\ws\w1\llvm-project\premerge-checks\build\bin\opt.exe -mtriple=x86_64-linux-gnu < C:\ws\w1\llvm-project\premerge-checks\llvm\test\Instrumentation\NumericalStabilitySanitizer\memory.ll -nsan -nsan-shadow-type-mapping=dqq -S | c:\ws\w1\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w1\llvm-project\premerge-checks\llvm\test\Instrumentation\NumericalStabilitySanitizer\memory.ll

Event Timeline

courbet created this revision. (Mar 3 2021, 6:21 AM)
courbet requested review of this revision. (Mar 3 2021, 6:21 AM)
Herald added projects: Restricted Project, Restricted Project, Restricted Project. (Mar 3 2021, 6:21 AM)
Herald added subscribers: llvm-commits, Restricted Project, cfe-commits.

Bootstrapping LLVM with nsan reveals only a few issues.

Several of them stem from using double to measure elapsed time in seconds: we measure the start time and the end time, then subtract them. Both timestamps are absolute times since the epoch, so their rounding error scales with that arbitrary magnitude rather than with the interval being measured, and it grows as time passes. This is especially visible for short intervals (e.g. a few microseconds, which are tiny compared to the time since the epoch).

For example, one test shows more than 2% relative error:

WARNING: NumericalStabilitySanitizer: inconsistent shadow results while checking store to address 0x4b87860
double       precision  (native): dec: 0.00000858306884765625  hex: 0x1.20000000000000000000p-17
__float128   precision  (shadow): dec: 0.00000880600000000000  hex: 0x9.3bd7b64e9fe4fc000000p-20
shadow truncated to double      : dec: 0.00000880600000000000  hex: 0x1.277af6c9d3fca0000000p-17
Relative error: 2.53158247040370201937% (2^47 epsilons)
Absolute error: 0x1.debdb274ff27e0000000p-23
(131595325226954 ULPs == 14.1 digits == 46.9 bits)
    #0 0x119db71 in llvm::TimeRecord::operator-=(llvm::TimeRecord const&) [...]/llvm/llvm-project/llvm/include/llvm/Support/Timer.h:63:14
    #1 0x119db71 in llvm::Timer::stopTimer() [...]/llvm/llvm-project/llvm/lib/Support/Timer.cpp:176:8
    #2 0x108b1d2 in llvm::TimePassesHandler::stopTimer(llvm::StringRef) [...]/llvm/llvm-project/llvm/lib/IR/PassTimingInfo.cpp:248:14
    #3 0x108b1d2 in llvm::TimePassesHandler::runAfterPass(llvm::StringRef) [...]/llvm/llvm-project/llvm/lib/IR/PassTimingInfo.cpp:267:3
    #4 0x108e159 in llvm::TimePassesHandler::registerCallbacks(llvm::PassInstrumentationCallbacks&)::$_2::operator()(llvm::StringRef, llvm::Any, llvm::PreservedAnalyses const&) const [...]/llvm/llvm-project/llvm/lib/IR/PassTimingInfo.cpp:281:15
    #5 0x108e159 in void llvm::detail::UniqueFunctionBase<void, llvm::StringRef, llvm::Any, llvm::PreservedAnalyses const&>::CallImpl<llvm::TimePassesHandler::registerCallbacks(llvm::PassInstrumentationCallbacks&)::$_2>(void*, llvm::StringRef, llvm::Any&, llvm::PreservedAnalyses const&) [...]/llvm/llvm-project/llvm/include/llvm/ADT/FunctionExtras.h:204:12
    #6 0xa4f826 in llvm::unique_function<void (llvm::StringRef, llvm::Any, llvm::PreservedAnalyses const&)>::operator()(llvm::StringRef, llvm::Any, llvm::PreservedAnalyses const&) [...]/llvm/llvm-project/llvm/include/llvm/ADT/FunctionExtras.h:366:12
    #7 0xa4f826 in void llvm::PassInstrumentation::runAfterPass<llvm::Module, (anonymous namespace)::MyPass2>((anonymous namespace)::MyPass2 const&, llvm::Module const&, llvm::PreservedAnalyses const&) const [...]/llvm/llvm-project/llvm/include/llvm/IR/PassInstrumentation.h:227:9
    #8 0xa4f826 in (anonymous namespace)::TimePassesTest_CustomOut_Test::TestBody() [...]/llvm/llvm-project/llvm/unittests/IR/TimePassesTest.cpp:137:6
...
qiucf added a subscriber: qiucf. (Mar 4 2021, 4:20 AM)
scanon added a subscriber: scanon. (Mar 10 2021, 8:51 AM)

Is there a mechanism to instruct the sanitizer to ignore a specific expression or function? From a cursory reading, I am mildly concerned about a deluge of false positives from primitives that compute exact (or approximate) residuals; these act to eliminate or precisely control floating-point errors, but tend to show up as "unstable" in a naive analysis that isn't aware of them.

Yes. As with all sanitizers, the frontend (clang) sets an annotation on each function in the program, and the sanitizer can be disabled for a specific function with the no_sanitize attribute.

If nsan is disabled for a specific function, its return value is re-extended to shadow precision, and shadow computation resumes from there. This is equivalent to assuming that the function, its parameters, and any memory it reads are correct.

Matt added a subscriber: Matt. (Thu, Mar 25, 3:29 PM)