This is an archive of the discontinued LLVM Phabricator instance.

DataFlowSanitizer; LLVM changes.
ClosedPublic

Authored by pcc on Jun 12 2013, 2:21 PM.

Details

Summary

DataFlowSanitizer is a generalised dynamic data flow analysis.

Unlike other Sanitizer tools, this tool is not designed to detect a
specific class of bugs on its own. Instead, it provides a generic
dynamic data flow analysis framework to be used by clients to help
detect application-specific issues within their own code.

Diff Detail

Event Timeline

kcc added a subscriber: eugenis.Jun 17 2013, 6:58 AM

eugenis@, who deals with msan during last months has better understanding of this kind of instrumentation code.
I'll also try to review it this week.

pcc updated this revision to Unknown Object (????).Jun 18 2013, 5:08 PM
  • Disable the use of IA_Args until function pointer issues are resolved
eugenis added inline comments.Jun 19 2013, 5:40 AM
include/llvm/Transforms/Instrumentation.h
96 ↗(On Diff #2380)

This does not seem to be used anywhere. And why is it guarded by GNUC?

lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
318 ↗(On Diff #2473)

Why not ++ui in the for() clause?

782 ↗(On Diff #2473)

Thsi is an optimization to skip calculation of shadow address on every load/store of allocas that are always accessed as a whole and don't escape, right?
Looks like it applies to MSan as well?

Btw, does it ever happen in optimized IR?

40 ↗(On Diff #2380)

A general description of the tool logic and shadow format would be nice to have somewhere above. Perhaps in the file comment.

60 ↗(On Diff #2380)

Does it mean you are passing shadow through argument list? Please comment.

190 ↗(On Diff #2380)

I wonder if you need to do something special with ByVal arguments, too.

pcc added inline comments.Jun 19 2013, 11:08 AM
include/llvm/Transforms/Instrumentation.h
96 ↗(On Diff #2380)

This does not seem to be used anywhere.

Right. The idea is that it can be used by JIT clients.

And why is it guarded by GNUC?

Because the functions getDFSanArgTLSPtrForJIT and getDFSanRetValTLSPtrForJIT depend on the availability of __thread and TLS model attibutes. Without this guard, the build will fail on (e.g.) Windows.

lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
318 ↗(On Diff #2473)

I wanted to be careful here and make sure that the RAUW wouldn't invalidate ui.

782 ↗(On Diff #2473)

Thsi is an optimization to skip calculation of shadow address on every load/store of allocas that are always accessed as a whole and don't escape, right?

Exactly.

Looks like it applies to MSan as well?

I don't see why not.

Btw, does it ever happen in optimized IR?

I don't think so. This was specifically implemented for the sake of -O0.

40 ↗(On Diff #2380)

OK, I'll add something.

60 ↗(On Diff #2380)

I implemented shadow through arguments (the argument ABI) as an alternative to shadow through TLS (the TLS ABI, like msan). This is generally only useful when we are instrumenting every function in the process from libc up, but since the greylist controls which functions do not get the argument ABI, it is possible in principle to get this to work by listing every libc function in the greylist, except that of course libc functions won't propagate labels. (Once libc is instrumented we can trim the greylist further to only include functions which need it, such as functions implemented in asm or functions which are called by the compiler backend). This is a work in progress at the moment, so by default we use TLS.

190 ↗(On Diff #2380)

Yes, this is on my todo list.

pcc updated this revision to Unknown Object (????).Jun 19 2013, 3:56 PM
  • Add some documentation comments
eugenis added inline comments.Jul 1 2013, 6:19 AM
lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
656 ↗(On Diff #2496)

You creating are a lot of basic blocks here. Would it be better to make a loop here an leave it to the loop unroller to make the decision?
On the other hand, in practice Size must be very limited.

723 ↗(On Diff #2496)

This looks like an optimization that belongs elsewhere. I wonder if there is a pass in LLVM that will do it for you?
Also, why 128 bit code here, and only 64 bit in LoadShadow?

pcc added inline comments.Jul 2 2013, 7:00 PM
lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
656 ↗(On Diff #2496)

Exactly. I expect generally about 1 iteration max through this loop. It was easier to code this without a loop and I don't think it's worth trying to optimise for the uncommon case where the load is large.

723 ↗(On Diff #2496)

This looks like an optimization that belongs elsewhere. I wonder if there is a pass in LLVM that will do it for you?

I don't think there is one. I tried optimising the following code with -O3; no improvement:

void f(short *p, short s) {

for (unsigned i = 8; i != 0; --i, ++p) {
  *p = s;
}

}

Also, why 128 bit code here, and only 64 bit in LoadShadow?

Because LoadShadow needs to be able to do equality comparisons and collapse the result into a single bit. It also needs to be able to compare every 16-bit element of the 128-bit vector against one another. There are some short instruction sequences for the former (on x86), but LLVM doesn't seem to be able to generate them, and I haven't thought too much about the latter yet.

eugenis accepted this revision.Jul 3 2013, 7:55 AM

Looks good.

lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
723 ↗(On Diff #2496)

Btw, this code is vectorized when trip count is raised to 16.

pcc closed this revision.Aug 7 2013, 3:49 PM

Closed by commit rL187923 (authored by @pcc).