After DFSan reports taint sinks, the next questions are "How did the taint get there?", "When did that happen?", "Who tainted the data originally?", etc. This change addresses these questions by adding origin tracking. This change will be split into small diffs for incremental review.

//////////// The Design ////////////

Inspired by MSan's origin tracking.

1) The new flag -dfsan-track-origins is added. It works only with 16-bit mode.
2) Each 4 contiguous user bytes share one 4-byte origin aligned by 4: the user byte at addr uses the origin at (addr & ~3UL) + origin_start_addr.
3) A 4-byte origin is a hash of an origin chain. An origin chain is a pair of a stack hash id and the hash of its previous origin chain. 0 means no previous origin chain exists. We limit the length of a chain to 16. With origin_history_size = 0, the limit is removed.
4) New chains are created only at store and memory transfer operations, when tainted data are written. This is to reduce chain lengths.
5) At each instruction with more than one operand, only one origin chain is propagated. This is to reduce chain widths.
6) Each customized function has two wrappers. The first one is for the normal shadow propagation. The second one is used when origin tracking is on. It calls the first one and does additional origin propagation. Which one to use can be decided at instrumentation time. This is to ensure minimal additional overhead when origin tracking is off.
7) Provide an API dfsan_print_origin_trace that reports the stack traces along an origin chain.
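Items 2 and 3 of the design can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the runtime code from this change: kOriginStartAddr is a made-up base address, OriginAddrFor, ChainNode and OriginTable are hypothetical names, and sequential ids stand in for the hashes that the design describes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: a made-up base for the origin region; the real value is
// platform-specific and is not taken from this change.
constexpr std::uint64_t kOriginStartAddr = 0x200000000000ULL;

// Item 2: every 4 contiguous user bytes share one 4-byte origin slot.
// Clear the two low address bits first, then add the base offset.
constexpr std::uint64_t OriginAddrFor(std::uint64_t addr) {
  return (addr & ~std::uint64_t{3}) + kOriginStartAddr;
}

// Item 3: a chain node pairs a stack id with the id of the previous chain.
struct ChainNode {
  std::uint32_t stack_id;
  std::uint32_t prev_origin;  // 0 means there is no previous chain
};

// Hypothetical table of origin chains. Ids are sequential (starting at 1)
// instead of hashes, purely to keep the sketch short.
class OriginTable {
 public:
  // history_size == 0 removes the length limit, as in the design.
  explicit OriginTable(unsigned history_size) : history_size_(history_size) {}

  // Extends the chain `prev` with a new stack id and returns the new origin.
  // When the chain is already at the limit, `prev` is returned unchanged.
  std::uint32_t Chain(std::uint32_t stack_id, std::uint32_t prev) {
    if (history_size_ != 0 && Depth(prev) >= history_size_) return prev;
    nodes_.push_back(ChainNode{stack_id, prev});
    return static_cast<std::uint32_t>(nodes_.size());
  }

  // Number of nodes reachable from `origin` by following prev_origin links.
  unsigned Depth(std::uint32_t origin) const {
    unsigned depth = 0;
    while (origin != 0) {
      ++depth;
      origin = nodes_[origin - 1].prev_origin;
    }
    return depth;
  }

 private:
  unsigned history_size_;
  std::vector<ChainNode> nodes_;
};
```

In this model, a store of tainted data would call Chain with the id of the current stack and the origin of the stored value (item 4), so a chain grows by one node per store until it reaches the limit.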
Details
- Reviewers
- morehouse 
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Thanks for a full diff. I'll be referring to it as I review the incremental changes.
I haven't looked at the code much yet, but the overall design SGTM.
Many of the tests added here are failing on the AArch64 buildbots (e.g. http://lab.llvm.org:8011/#/builders/7/builds/1974). Is this expected to work on AArch64, or should these tests be disabled for those bots?
This change supports only the x86_64 architecture on Linux. Testing on other architectures was disabled by https://github.com/llvm/llvm-project/commit/37520a0b2b2af025e40b17dbf99013cda9eb66a1
clang-tidy: warning: cast from 'const void *' to 'void *' drops const qualifier [clang-diagnostic-cast-qual]