This diff implements a Clang -fcsi flag to support CSI. We will be submitting further diffs that incrementally add functionality to CSI.
- Repository
- rL LLVM
docs/CSI.rst | ||
---|---|---|
75 | See below: the sanitizers pass the -f flag (-fcsi here) to the link line and have the library automatically linked in, which is a simpler usage model than the user having to name this static library explicitly. | |
78 | This should not be necessary: as mentioned above, if -fcsi is passed to the link line you should be able to have clang automatically add the static csi library, just like is done for the sanitizers. | |
81 | Hmm, see above comments: is this already implemented, and was it deliberately split out of this diff for simplicity? | |
97 | Wouldn't the after-hook here be the same as the after-hook for the function category? Generally the reason to have a post-function or function-exit hook would be to view or change the return value: couldn't that be done equally easily from a post-function hook at the instruction after the call site? I guess I'm asking why this is a separate category. | |
99 | It seems like there is some redundancy here? This seems very similar to the "functions" category: I'm curious as to why they are separate? | |
164 | Some tools also need thread-local handling: do you plan to provide thread initialization and exit hooks in the future? | |
164 | What about a fini or destructor function called at program exit? Many profiling or analysis tools gather data and want to report it or dump it to a file at program exit. | |
173 | Generally, tools that hook application functions want to examine the arguments. How does a hook access (or modify) the application function's arguments? | |
174 | Similarly, how does a hook access or change the return value? | |
181 | s/normally)/normally/ | |
189 | Grammar: provides, for | |
212 | See above: I'm not sure why both call sites and function entry hooks are needed? Perhaps there could be some explanation of that here. | |
221 | Grammar: s/to CSI/for CSI/; s/ID/ID that/ | |
258 | Hmm, there seems to be a missing feature in this interface design in general: static analysis or static operation of some kind. Tools often want to take one action if a memory address is aligned, but a different one if it's not aligned (usually a fastpath when aligned and a slowpath when unaligned). The compiler often knows whether a load or store is aligned, yet this interface forces the code that checks alignment and acts on it to be executed every single time at runtime, rather than executed just once statically. I guess this is just an inherent limitation of this interface approach -- perhaps it could be discussed in the limitations section? | |
264 | Grammar: s/objects/object/ | |
303 | s/that,/that/ | |
318 | High-level comments: Without inlining of code in these function calls, it is hard to see how a high-performance tool can be built. Something like ThreadSanitizer that does little or no inlined instrumentation and lives with high overhead could fit into this interface, but most tools will just not work well with this callout-only no-static-analysis interface: I would expect an order of magnitude performance loss or higher for other sanitizers or similar tools. Are there plans to extend the interface to allow it to become a shared infrastructure for the existing sanitizers? That would require large changes to the interface, it seems. Maybe this is more of a comment for the RFC. If the interface is not concerned with performance, and does not seem to be leveraging much compiler information or static analysis, I would have to step back and ask: what advantage would a tool writer gain from using CSI versus a pure-dynamic tool like Pin or DynamoRIO? In these dynamic tool platforms, every hook here is also available, and such dynamic tools will operate on any binary including third-party libraries not amenable to recompilation. I guess I would expect a compiler tool interface to be taking more advantage of the compiler, but I don't see much discussion here of future extensions to accomplish that. Should there be any discussion in these docs as to advantages and disadvantages versus other tool platforms? | |
333 | OK, so this is a partial answer to the long previous comment. | |
test/Lexer/has_feature_comprehensive_static_instrumentation.cpp | ||
11 | I think we also want a test/Driver/fcsi test that checks platform support by ensuring that -fcsi is reported as an unsupported option on targets other than Linux x86_64. If the instrumentation always adds a call to some symbol in the runtime library, the test could also include a sanity check for that. |
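
To make the alignment point from the comment on line 258 of docs/CSI.rst concrete, here is a minimal sketch in C of the runtime check that a callout-only hook has to repeat on every access. The hook name `tool_before_load`, its parameters, and the two counters are hypothetical and are not the CSI signature; the sketch only assumes that the hook receives the address and the access size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tool state: how often each path was taken. */
static uint64_t fast_path_hits = 0;
static uint64_t slow_path_hits = 0;

/* Hypothetical load hook (NOT the actual CSI signature).
 * The alignment test below runs on every call, even when the compiler
 * already knew statically whether the access was aligned. */
void tool_before_load(const void *addr, size_t num_bytes) {
    /* The mask trick below is only valid for power-of-two sizes. */
    assert(num_bytes != 0 && (num_bytes & (num_bytes - 1)) == 0);

    if (((uintptr_t)addr & (num_bytes - 1)) == 0) {
        fast_path_hits++;   /* aligned: cheap handling */
    } else {
        slow_path_hits++;   /* unaligned: more expensive handling */
    }
}
```

If the interface exposed the statically known alignment, the branch could in principle be resolved once at compile time instead of on each call; whether CSI plans to offer that is the open question in the comment.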
docs/CSI.rst | ||
---|---|---|
46 | The long thread on llvm-dev concluded that LTO should not be needed. | |
62 | -emit-llvm should not be required. The user can use -flto but that would be orthogonal to CSI. | |
78 | This is not clear to me: the sanitizers auto-link the clang-supplied runtime. Here it seems to be about a user-supplied library. |
docs/CSI.rst | ||
---|---|---|
78 | No, the CSI runtime is not the user-supplied part: it is part of the clang build, just like the sanitizer runtime libraries (see line 79 below showing where it lives). The user-supplied part is "my-tool.o". |
docs/CSI.rst | ||
---|---|---|
78 | Oh you're totally right, I thought you were referencing the tool-specific implementation. |
docs/CSI.rst | ||
---|---|---|
30 | Are there any constraints on what libraries the tool library is allowed to use? Generally there are, for tool code that runs in the same process as the application. The tool library will be operating at arbitrary points during application execution. This means that it should avoid using the same resources as the instrumented application, both because the application's routines are not all re-entrant and use global state, and to minimize perturbation of the application's behavior (such as heap layout patterns) relative to how it behaves with no tool present. A tool that uses standard libraries becomes more likely to cause issues when libc routines are being intercepted by the tool (see related comment below) or when libc itself is instrumented. The existing LLVM instrumentation runtime libraries for the sanitizer tools avoid calling libc routines and are not able to use the STL: they use their own custom implementations of all the data structures and algorithms they need, but this is a small set. Dynamic tool platforms like Pin and DynamoRIO go to great lengths to isolate tool libraries by loading separate copies of libc. Has any thought been put into isolating the tool library and its imports from the application? I realize that some of these may seem more like long-term topics, but if the idea is to create a framework for use with a wide range of tools, it is good to consider all issues up front. | |
230 | For observing loads and stores, typically compiler-based tools intercept libc's memcpy, memset, etc. (or in some cases libc is built and instrumented along with the application), to avoid missing many memory references. The existing LLVM sanitizer tools all intercept a large number of libc routines to ensure they see more than just events happening in application code proper. Has there been any thought about this for CSI? |
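
As an illustration of the libc-interception question above, here is a rough sketch in C of a copy routine that reports its source range as a load and its destination range as a store before copying. Every name here (`tool_memcpy`, `tool_on_load`, `tool_on_store`, the counters) is hypothetical; nothing in this review says how, or whether, CSI would redirect the application's `memcpy` to such a wrapper. The tool state is kept in static storage so that the wrapper does not touch the heap.

```c
#include <stddef.h>

/* Hypothetical tool state, statically allocated (no heap use). */
static size_t bytes_loaded = 0;
static size_t bytes_stored = 0;

/* Hypothetical hooks a tool might run for each observed memory range. */
static void tool_on_load(const void *addr, size_t n) {
    (void)addr;
    bytes_loaded += n;
}

static void tool_on_store(const void *addr, size_t n) {
    (void)addr;
    bytes_stored += n;
}

/* Hypothetical stand-in for an intercepted memcpy: report both ranges,
 * then copy byte by byte without calling into libc from the source. */
void *tool_memcpy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;

    tool_on_load(src, n);
    tool_on_store(dst, n);
    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];
    }
    return dst;
}
```

The byte loop is only illustrative: an optimizing compiler may itself lower such a loop to a `memcpy` call, which is one reason the sanitizer runtimes treat isolation from libc as a design concern and not an afterthought.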