This is an archive of the discontinued LLVM Phabricator instance.

Comprehensive Static Instrumentation (2/2): Clang flag
Needs Review, Public

Authored by tdenniston on Jun 27 2016, 8:04 AM.

Details

Summary

This diff implements a Clang -fcsi flag to support CSI. We will be submitting further diffs that incrementally add functionality to CSI.

Diff Detail

Repository
rL LLVM

Event Timeline

tdenniston updated this revision to Diff 61967. Jun 27 2016, 8:04 AM
tdenniston retitled this revision from to Comprehensive Static Instrumentation (2/2): Clang flag.
tdenniston updated this object.
tdenniston set the repository for this revision to rL LLVM.
tdenniston added a project: Restricted Project.
tdenniston added subscribers: cfe-commits, eugenis, pcc and 3 others.
bruening added inline comments. Jun 30 2016, 9:55 AM
docs/CSI.rst
75

See below: the sanitizers pass the -f flag (-fcsi here) on the link line and have their runtime library linked in automatically, which is a simpler usage model than requiring the user to name this static library explicitly.

78

This should not be necessary: as mentioned above, if -fcsi is passed on the link line, clang should be able to add the static CSI library automatically, just as it does for the sanitizers.

81

Hmm, see the comments above: is this already implemented, and was it deliberately split out of this diff for simplicity?

97

Wouldn't the after-hook here be the same as the after-hook for the function category? Generally the reason to have a post-function or function-exit hook would be to view or change the return value: couldn't that be done equally easily from a post-function hook at the instruction after the call site? I guess I'm asking why this is a separate category.

99

There seems to be some redundancy here: this looks very similar to the "functions" category. I'm curious why they are separate.

164

Some tools also need thread-local handling: do you plan to provide thread initialization and exit hooks in the future?

164

What about a fini or destructor function called at program exit? Many profiling or analysis tools gather data and want to report it or dump it to a file at program exit.
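
Until a dedicated fini hook exists, a tool can approximate one from its init hook by registering an exit-time report with the C library's atexit(). The C sketch below shows that workaround. tool_init, tool_func_entry, and tool_func_entry_count are hypothetical names and signatures standing in for CSI's init and function-entry hooks; they are assumptions, not the interface in docs/CSI.rst.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Tool state: function entries observed so far. Atomic because hooks may
 * fire from several application threads at once. */
static _Atomic uint64_t g_func_entries = 0;

/* Runs once at normal program exit (not on _exit() or a crash) and
 * dumps what the tool gathered. */
static void tool_report(void) {
    fprintf(stderr, "tool: %llu function entries\n",
            (unsigned long long)atomic_load(&g_func_entries));
}

/* Hypothetical stand-in for the CSI init hook: register the exit-time
 * report through libc's atexit(). */
void tool_init(void) {
    if (atexit(tool_report) != 0) {
        fputs("tool: could not register exit handler\n", stderr);
    }
}

/* Hypothetical stand-in for a function-entry hook; the real hook's name
 * and parameters may differ. */
void tool_func_entry(uint64_t func_id) {
    (void)func_id; /* this toy tool only counts entries */
    atomic_fetch_add_explicit(&g_func_entries, 1, memory_order_relaxed);
}

uint64_t tool_func_entry_count(void) {
    return atomic_load(&g_func_entries);
}
```

Per-thread teardown could be approximated in a similar way with a pthread_key_create() destructor, which POSIX runs when a thread exits holding a non-NULL value for the key. Both workarounds depend on libc, however, and dedicated thread-exit and fini hooks would be cleaner.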

173

Generally, tools that hook application functions want to examine the arguments. How does a hook access (or modify) the application function's arguments?

174

Similarly, how does a hook access or change the return value?

181

s/normally)/normally/

189

Grammar: provides, for

212

See above: I'm not sure why both call-site and function-entry hooks are needed. Perhaps this could be explained here.

221

Grammar: s/to CSI/for CSI/; s/ID/ID that/

258

Hmm, there seems to be a missing feature in this interface design in general: static analysis or static operation of some kind. Tools often want to take one action if a memory address is aligned, but a different one if it's not aligned (usually a fastpath when aligned and a slowpath when unaligned). The compiler often knows whether a load or store is aligned, yet this interface forces the code that checks alignment and acts on it to be executed every single time at runtime, rather than executed just once statically. I guess this is just an inherent limitation of this interface approach -- perhaps it could be discussed in the limitations section?
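
The following C sketch shows the cost being described: a tool with an aligned fastpath and an unaligned slowpath has to perform the alignment test inside the hook, on every load. tool_before_load and its parameter list are hypothetical stand-ins for a CSI before-load hook, and the sketch assumes num_bytes is a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* Counters for a toy tool whose design splits into fastpath/slowpath. */
static uint64_t g_aligned_loads = 0;
static uint64_t g_unaligned_loads = 0;

/* True if addr is a multiple of size; size must be a nonzero power of two. */
static bool is_aligned(const void *addr, uint32_t size) {
    return ((uintptr_t)addr & (uintptr_t)(size - 1)) == 0;
}

/* Hypothetical before-load hook. It receives only runtime values, so this
 * branch executes on every load, even when the compiler has already
 * proved the access aligned. */
void tool_before_load(uint64_t load_id, const void *addr, uint32_t num_bytes) {
    (void)load_id;
    if (is_aligned(addr, num_bytes)) {
        ++g_aligned_loads;   /* fastpath, e.g. a single shadow lookup */
    } else {
        ++g_unaligned_loads; /* slowpath, e.g. access may span two shadow cells */
    }
}
```

If the hook also received a static property (for example, "alignment proven at compile time"), this branch could be resolved once at instrumentation time. The sketch intentionally has no such parameter.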

264

Grammar: s/objects/object/

303

s/that,/that/

318

High-level comments:

Without inlining of the code in these hook calls, it is hard to see how a high-performance tool can be built. Something like ThreadSanitizer, which does little or no inlined instrumentation and lives with high overhead, could fit into this interface, but most tools will just not work well with this callout-only, no-static-analysis interface: I would expect a performance loss of an order of magnitude or more for other sanitizers or similar tools. Are there plans to extend the interface so that it could become a shared infrastructure for the existing sanitizers? That would seem to require large changes to the interface. Maybe this is more of a comment for the RFC.

If the interface is not concerned with performance and does not seem to leverage much compiler information or static analysis, I have to step back and ask: what advantage would a tool writer gain from using CSI versus a pure-dynamic tool like Pin or DynamoRIO? On these dynamic tool platforms every hook here is also available, and such dynamic tools operate on any binary, including third-party libraries not amenable to recompilation. I guess I would expect a compiler tool interface to take more advantage of the compiler, but I don't see much discussion here of future extensions to accomplish that. Should these docs discuss the advantages and disadvantages versus other tool platforms?

333

OK, so this is a partial answer to the long previous comment.

test/Lexer/has_feature_comprehensive_static_instrumentation.cpp
11

I think we also want a test/Driver/fcsi test that checks platform support by ensuring that -fcsi is reported as an unsupported option on targets other than Linux x86_64. If the instrumentation always adds a call to some symbol in the runtime library, the test could also include a sanity check for that.

mehdi_amini added inline comments. Jul 6 2016, 6:37 PM
docs/CSI.rst
46

The long thread on llvm-dev concluded that LTO should not be needed.

62

-emit-llvm should not be required. The user can use -flto but that would be orthogonal to CSI.

78

This is not clear to me: the sanitizers are auto-linking the clang supplied runtime. Here it seems to be about a user-supplied library.

bruening added inline comments. Jul 6 2016, 8:28 PM
docs/CSI.rst
78

No, the CSI runtime is not the user-supplied part: it is part of the clang build, just like the sanitizer runtime libraries (see line 79 below showing where it lives). The user-supplied part is "my-tool.o".

mehdi_amini added inline comments. Jul 6 2016, 8:32 PM
docs/CSI.rst
78

Oh you're totally right, I thought you were referencing the tool-specific implementation.

bruening added inline comments. Jul 14 2016, 5:02 PM
docs/CSI.rst
30

Are there any constraints on which libraries the tool library is allowed to use? Generally there are for tool code that runs in the same process as the application. The tool library will run at arbitrary points during application execution, so it should avoid using the same resources as the instrumented application, both because the application's routines are not all re-entrant and use global state, and to minimize perturbation of the application's behavior (such as heap layout patterns) relative to how it behaves with no tool present. A tool that uses standard libraries is more likely to cause problems when libc routines are being intercepted by the tool (see the related comment below) or when libc itself is instrumented. The existing LLVM instrumentation runtime libraries for the sanitizer tools avoid calling libc routines and cannot use the STL: they use their own custom implementations of all the data structures and algorithms they need, which is a small set. Dynamic tool platforms like Pin and DynamoRIO go to great lengths to isolate tool libraries by loading separate copies of libc. Has any thought been put into isolating the tool library and its imports from the application? I realize some of these may seem like longer-term topics, but if the idea is to create a framework for a wide range of tools, it is good to consider all the issues up front.
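
As a minimal C illustration of the "own implementations instead of libc" approach, a tool could serve its allocations from a fixed static arena. This leaves the application's malloc heap, and therefore its layout, untouched. tool_alloc and TOOL_ARENA_SIZE are hypothetical names, and the sanitizer runtimes' real internal allocators are more elaborate than this bump-pointer sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-size arena owned by the tool (lives in .bss, not on the app heap). */
#define TOOL_ARENA_SIZE ((size_t)1 << 20)
static _Alignas(max_align_t) unsigned char g_arena[TOOL_ARENA_SIZE];
static size_t g_arena_used = 0;

/* Bump-pointer allocation aligned to max_align_t. Returns NULL when the
 * arena is exhausted; memory is never freed. Not thread-safe: a real tool
 * would need an atomic bump pointer or per-thread arenas. */
void *tool_alloc(size_t size) {
    const size_t align = _Alignof(max_align_t);
    size_t start = (g_arena_used + align - 1) & ~(align - 1);
    if (start > TOOL_ARENA_SIZE || size > TOOL_ARENA_SIZE - start) {
        return NULL;
    }
    g_arena_used = start + size;
    return &g_arena[start];
}
```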

230

To observe loads and stores, compiler-based tools typically intercept libc's memcpy, memset, etc. (or, in some cases, build and instrument libc along with the application) so as not to miss many memory references. The existing LLVM sanitizer tools all intercept a large number of libc routines to ensure they see more than just the events happening in the application code proper. Has there been any thought about this for CSI?
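
The C sketch below shows what an interceptor does once control reaches it. Loads and stores inside libc's memcpy are invisible to compile-time instrumentation of the caller, so the wrapper reports them as byte ranges before it delegates. Getting calls routed to the wrapper in the first place is platform-specific (the sanitizers rely on their own interceptor machinery), and the sketch leaves that part out. tool_memcpy and the range callbacks are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tool callbacks for byte ranges read or written. */
static size_t g_bytes_read = 0;
static size_t g_bytes_written = 0;

static void tool_on_read_range(const void *addr, size_t n) {
    (void)addr;
    g_bytes_read += n;
}

static void tool_on_write_range(void *addr, size_t n) {
    (void)addr;
    g_bytes_written += n;
}

/* Report the source as read and the destination as written, then call
 * the real memcpy. */
void *tool_memcpy(void *dst, const void *src, size_t n) {
    tool_on_read_range(src, n);
    tool_on_write_range(dst, n);
    return memcpy(dst, src, n);
}
```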