The first version of origin tracking tracks only memory stores. Although this is sufficient for understanding correct flows, it is hard to figure out where an undefined value is read from. To find reading undefined values, we still have to do a reverse binary search from the last store in the chain with printing and logging at possible code paths. This is quite inefficient. Tracking memory load instructions can help this case. The main issues of tracking loads are performance and code size overheads. With tracking only stores, the code size overhead is 38%, memory overhead is 1x, and cpu overhead is 3x. In practice #load is much larger than #store, so both code size and cpu overhead increases. The first blocker is code size overhead: link fails if we inline tracking loads. The workaround is using external function calls to propagate metadata. This is also the workaround ASan uses. The cpu overhead is ~10x. This is a trade off between debuggability and performance, and will be used only when debugging cases that tracking only stores is not enough.
Details
- Reviewers
gbalats - Commits
- rG7fdf27096558: [dfsan] Track origin at loads
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
compiler-rt/test/dfsan/origin_track_ld.c | ||
---|---|---|
3 | ||
4 | This is redundant as it's the default. | |
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp | ||
256–263 | Using different integer values to encode the level of tracking is hard to understand without looking at this exact comment right here. Why can't we use an enum instead with descriptive names? E.g., enum OriginTrackingLevel { None, StoresOnly, LoadsAndStores }; https://llvm.org/docs/CommandLine.html#selecting-an-alternative-from-a-set-of-possibilities | |
645 | You mean InstAlignment? By the way, for documenting specific arguments of functions, I think the special \param Doxygen syntax would be clearer. https://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments | |
648 | ||
711 | ||
747 | What do you mean by this? The description doesn't indicate what's the different with loadShadowOrigin. Maybe loadShadowOriginSansLoadTracking? | |
2443–2444 | This function should be called only when tracking origins. Why not make that an assertion, instead of the if-stmt? | |
llvm/test/Instrumentation/DataFlowSanitizer/origin_track_load.ll | ||
2–3 |
updated
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp | ||
---|---|---|
256–263 | Thank you. This enum is helpful. It uses optimization levels as an example.
Changing this from int to enum will change existing test cases. if we wanted it to use enum, I prefer to doing so in a different CL. MSan's msan-track-origins is defined like dfsan-track-origins, with 0, 1 and 2 to control different levels. | |
645 | Thank you, |