This is an archive of the discontinued LLVM Phabricator instance.

[msan] Implement -msan-pass-caller-to-runtime.
Needs Revision · Public

Authored by glider on May 25 2022, 8:10 AM.

Details

Summary

The Linux kernel has a concept of noinstr code, which is used to prevent
all kinds of instrumentation for annotated functions.
In particular, syscall and IRQ entry functions are implemented as
noinstr.

When these functions call KMSAN-instrumented functions, they fail to
properly set up the metadata for function arguments, potentially leading
to false positive reports.

In order to detect transitions from noinstr to instrumented code, we
introduce the -msan-pass-caller-to-runtime flag, which allows KMSAN to
call __msan_get_context_state_caller() at the beginning of functions
that take one or more parameters.
__msan_get_context_state_caller() accepts the caller address passed to
it by the instrumentation code.
That address can be used by the runtime to figure out whether a call
happened from a noinstr function, and wipe the context state, preventing
the error reports.
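
A minimal sketch of what the runtime side of this check might look like,
written as standalone C rather than kernel code. Everything here is
hypothetical and for illustration only: the sketch_ names, the layout of
the context state, and the address bounds of the noinstr section are
assumptions, not the actual KMSAN runtime.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified context state. The real KMSAN structure has
 * more fields; only the parameter shadow matters for this sketch. */
struct sketch_context_state {
    uint8_t param_shadow[64]; /* nonzero byte = argument byte is uninitialized */
};

static struct sketch_context_state sketch_state;

/* Assumed bounds of the noinstr text section. A kernel runtime would take
 * these from linker-provided symbols; fixed values are used here. */
static const uintptr_t sketch_noinstr_start = 0x1000;
static const uintptr_t sketch_noinstr_end = 0x2000;

/* Returns true if the address lies inside the assumed noinstr range. */
static bool sketch_is_noinstr(uintptr_t addr)
{
    return addr >= sketch_noinstr_start && addr < sketch_noinstr_end;
}

/* Stand-in for __msan_get_context_state_caller(): if the caller is
 * noinstr code, it could not have set up argument metadata, so stale
 * parameter shadow is wiped before the state is handed out. */
static struct sketch_context_state *
sketch_get_context_state_caller(uintptr_t caller)
{
    if (sketch_is_noinstr(caller))
        memset(sketch_state.param_shadow, 0,
               sizeof(sketch_state.param_shadow));
    return &sketch_state;
}
```

With this shape, a call arriving from an address inside the noinstr range
sees all-zero (initialized) parameter shadow, while calls from
instrumented code keep whatever shadow their caller stored.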

For backward compatibility with BSD systems that use KMSAN, we keep
-msan-pass-caller-to-runtime=0 as the default value.

Diff Detail

Event Timeline

glider created this revision. May 25 2022, 8:10 AM
Herald added a project: Restricted Project. May 25 2022, 8:10 AM
Herald added a subscriber: hiraditya.
glider requested review of this revision. May 25 2022, 8:10 AM

Do you have an estimate of how often this happens? How many different instrumented functions can be called from uninstrumented code?

llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:1279

Can you avoid ClPassCallerToRuntime and always pass the argument? I guess the function will just not use it.

Do you have an estimate of how often this happens? How many different instrumented functions can be called from uninstrumented code?

There are ~200 noinstr annotations in the Linux kernel. Some are used in macros (e.g. there are several dozen interrupt descriptor table entries that are implemented as noinstr functions).
It's hard to tell how many instrumented functions end up being called with consequences, but certainly annotating them manually is not an option.

We could probably optimize by not passing __builtin_return_address() to __msan_get_context_state() if the instrumented function doesn't take any arguments. Not sure it is worth the hassle (we'll need a separate version of __msan_get_context_state()).
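
A rough C illustration of that split, assuming GCC or Clang for __builtin_return_address(0). The sketch_ functions and counters are hypothetical stand-ins for the two runtime entry points; the real instrumentation is emitted by the compiler pass, not written by hand like this.

```c
#include <stdint.h>

/* Hypothetical counters so the effect of each entry point is observable. */
static int sketch_calls_with_caller;
static int sketch_calls_without_caller;
static uintptr_t sketch_last_caller;

/* Stand-in for __msan_get_context_state(): no caller address needed. */
static void sketch_get_context_state(void)
{
    sketch_calls_without_caller++;
}

/* Stand-in for __msan_get_context_state_caller(): records the caller. */
static void sketch_get_context_state_caller(uintptr_t caller)
{
    sketch_calls_with_caller++;
    sketch_last_caller = caller;
}

/* A function with parameters: its prologue passes the return address,
 * because argument shadow may be stale if the caller was noinstr. */
__attribute__((noinline)) static int sketch_takes_args(int x)
{
    sketch_get_context_state_caller(
        (uintptr_t)__builtin_return_address(0));
    return x + 1;
}

/* A function without parameters: there is no argument shadow to
 * validate, so the cheaper entry point without the address is enough. */
__attribute__((noinline)) static int sketch_takes_no_args(void)
{
    sketch_get_context_state();
    return 42;
}
```

The design trade-off is the one discussed in this thread: the caller-passing variant costs an extra argument at every function entry, so it is only used where parameter metadata could actually be wrong.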

llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:1279

This would cost us a register spill, sounds like a regression for *BSD systems that don't need the argument.

glider updated this revision to Diff 432956. May 30 2022, 9:50 AM

Introduced __msan_get_context_state_caller()

glider updated this revision to Diff 432957. May 30 2022, 9:52 AM

Updated a comment

FWIW, in my current configuration, out of 128K instrumented functions there are 120K calls to __msan_get_context_state_caller() and 8K calls to __msan_get_context_state().

glider edited the summary of this revision. May 31 2022, 4:25 AM
vitalybuka requested changes to this revision. Dec 7 2022, 1:40 PM

I assume it's abandoned.

This revision now requires changes to proceed. Dec 7 2022, 1:40 PM