Use CountingFunctionInserter both mcount and cygprofile calls, before and after inlining
ClosedPublic

Authored by hans on Wed, Oct 25, 7:08 AM.

Details

Summary

Clang implements the -finstrument-functions flag inherited from GCC, which inserts calls to __cyg_profile_func_{enter,exit} on function entry and exit.

This is useful for getting a trace of how the functions in a program are executed. Normally, the calls remain even if a function is inlined into another function, but it is useful to be able to turn this off for users who are interested in a lower-level trace, i.e. one that reflects what functions are called post-inlining. (We use this to generate link order files for Chromium.)

LLVM already has a pass for inserting similar instrumentation calls to mcount(), which it does after inlining. This patch renames and extends that pass to handle calls both to mcount and the cygprofile functions, before and/or after inlining as controlled by function attributes.

Diff Detail

Repository
rL LLVM
hans created this revision.Wed, Oct 25, 7:08 AM

If you only want these calls post-inlining (and otherwise wish to minimize the effect on the rest of the optimizer), I'd recommend a different approach. What we did for mcount, which has similar requirements, was that we had the frontend add an attribute, and then a late pass actually added the function calls (lib/CodeGen/CountingFunctionInserter.cpp). I recommend extending CountingFunctionInserter to enable it to add these functions (based on some additional attribute), and then just add whatever attribute is necessary in the frontend based on some new flag.

rnk added a comment.Wed, Oct 25, 8:57 AM

If you only want these calls post-inlining (and otherwise wish to minimize the effect on the rest of the optimizer), I'd recommend a different approach. What we did for mcount, which has similar requirements, was that we had the frontend add an attribute, and then a late pass actually added the function calls (lib/CodeGen/CountingFunctionInserter.cpp). I recommend extending CountingFunctionInserter to enable it to add these functions (based on some additional attribute), and then just add whatever attribute is necessary in the frontend based on some new flag.

+1. I know it's a big departure from how the existing feature is implemented, but it's consistent with every other major form of instrumentation that clang emits these days.

hans added a comment.Thu, Oct 26, 1:22 AM

If you only want these calls post-inlining (and otherwise wish to minimize the effect on the rest of the optimizer),

As I mentioned in the description, the calls should generally be inserted before inlining and preserved. Doing them post-inlining should be an optional mode that I'm trying to add. This is sort of the crux of the problem, otherwise inserting the instrumentation later would be the obvious way to go.

I'd recommend a different approach. What we did for mcount, which has similar requirements, was that we had the frontend add an attribute, and then a late pass actually added the function calls (lib/CodeGen/CountingFunctionInserter.cpp). I recommend extending CountingFunctionInserter to enable it to add these functions (based on some additional attribute), and then just add whatever attribute is necessary in the frontend based on some new flag.

Would it be OK to run CountingFunctionInserter both before and after inlining and inserting these calls depending on some flag?

In D39287#907578, @hans wrote:

If you only want these calls post-inlining (and otherwise wish to minimize the effect on the rest of the optimizer),

As I mentioned in the description, the calls should generally be inserted before inlining and preserved. Doing them post-inlining should be an optional mode that I'm trying to add. This is sort of the crux of the problem, otherwise inserting the instrumentation later would be the obvious way to go.

I understood. I figured that you would add them later in some mode. Otherwise, you would leave the logic as-is. As you suggest, you could have CountingFunctionInserter run both pre and post inlining and then not have duplicate logic.

I'd recommend a different approach. What we did for mcount, which has similar requirements, was that we had the frontend add an attribute, and then a late pass actually added the function calls (lib/CodeGen/CountingFunctionInserter.cpp). I recommend extending CountingFunctionInserter to enable it to add these functions (based on some additional attribute), and then just add whatever attribute is necessary in the frontend based on some new flag.

Would it be OK to run CountingFunctionInserter both before and after inlining and inserting these calls depending on some flag?

I'm not sure the logic of inserting the calls is complex enough to worry about reducing the duplication of having both logic in the frontend and in CountingFunctionInserter, but if you feel that it is, then I think this makes sense.

hans updated this revision to Diff 122360.Thu, Nov 9, 4:29 PM
hans retitled this revision from Add a flag to disable inlining calls to cygprofile functions generated by -finstrument-functions to Use CountingFunctionInserter both mcount and cygprofile calls, before and after inlining.
hans edited the summary of this revision. (Show Details)
hans edited reviewers, added: hfinkel; removed: thakis.

Here's a new approach which renames and extends the existing CountingFunctionInserter to run both before and after inlining (with the old PM it's two passes actually) and handle both cygprofile and mcount calls as controlled by function attributes.

I haven't hooked up the Clang side yet, but I'd be interested to hear your thoughts on this approach.

rnk added a comment.Thu, Nov 9, 4:34 PM

I think this is the way to go. Thanks! :)

lib/Transforms/Utils/EntryExitInstrumenter.cpp
21–23 ↗(On Diff #122360)

nit: mark these static, sink the anonymous namespace to the two pass structs.

56 ↗(On Diff #122360)

nit: static

hans updated this revision to Diff 122554.Fri, Nov 10, 4:51 PM
hans marked 2 inline comments as done.

Rebasing and addressing comments.

I've also added the pass in PassManagerBuilder::populateFunctionPassManager() because I realized that's how Clang sets up the passes.

One thing that worries me is that this set-up is pretty fragile against how the passes are run. I think it will currently work, but if the pass were to somehow run more than once, the instrumentation could get inserted multiple times. One way to avoid that would be to remove the attribute (or probably faster, give it an empty value?) when the pass "consumes" the attribute. Does that sound reasonable?

I also looked into consolidating this with the FEntryInserter pass which inserts calls to fentry. That turned out to be tricky however, because the calls are supposed to be inserted before the function prologue, which means the pass operates on MachineFunctions and inserts MachineInstrs. So I've left that out.

I've updated the Clang-side patch here: https://reviews.llvm.org/D39331

rnk accepted this revision.Mon, Nov 13, 10:50 AM
In D39287#922534, @hans wrote:

Rebasing and addressing comments.

I've also added the pass in PassManagerBuilder::populateFunctionPassManager() because I realized that's how Clang sets up the passes.

One thing that worries me is that this set-up is pretty fragile against how the passes are run. I think it will currently work, but if the pass were to somehow run more than once, the instrumentation could get inserted multiple times. One way to avoid that would be to remove the attribute (or probably faster, give it an empty value?) when the pass "consumes" the attribute. Does that sound reasonable?

What do other instrumentation passes do to defend against this? I guess they just don't have this problem because they don't appear twice in the pipeline?

I'm in favor of removing the function attribute when we instrument. AttributeLists are immutable, so nuking the whole attribute should be just as fast as setting the value.

I also looked into consolidating this with the FEntryInserter pass which inserts calls to fentry. That turned out to be tricky however, because the calls are supposed to be inserted before the function prologue, which means the pass operates on MachineFunctions and inserts MachineInstrs. So I've left that out.

Oh, wow, that's pretty special functionality. Seems best to leave it alone.

I've updated the Clang-side patch here: https://reviews.llvm.org/D39331

I think this is pretty uncontroversial, so let's remove the attribute on instrumentation, land this, and we can address any feedback from Hal or others in follow-ups.

This revision is now accepted and ready to land.Mon, Nov 13, 10:50 AM
hans added a comment.Mon, Nov 13, 2:00 PM
In D39287#923553, @rnk wrote:
In D39287#922534, @hans wrote:

Rebasing and addressing comments.

I've also added the pass in PassManagerBuilder::populateFunctionPassManager() because I realized that's how Clang sets up the passes.

One thing that worries me is that this set-up is pretty fragile against how the passes are run. I think it will currently work, but if the pass were to somehow run more than once, the instrumentation could get inserted multiple times. One way to avoid that would be to remove the attribute (or probably faster, give it an empty value?) when the pass "consumes" the attribute. Does that sound reasonable?

What do other instrumentation passes do to defend against this? I guess they just don't have this problem because they don't appear twice in the pipeline?

CountingFunctionInserter wouldn't appear in the pipeline twice becaues it runs so late, as part of the codegen passes (which is also why it lived under Codegen/). The sanitizer instrumentation passes avoid the problem by not being in the pipeline by default, but requiring e.g. Clang to insert them.

I'm in favor of removing the function attribute when we instrument. AttributeLists are immutable, so nuking the whole attribute should be just as fast as setting the value.

Done.

I've updated the Clang-side patch here: https://reviews.llvm.org/D39331

I think this is pretty uncontroversial, so let's remove the attribute on instrumentation, land this, and we can address any feedback from Hal or others in follow-ups.

Great, thanks. (I'll just wait until the Clang side is completely worked out)

This revision was automatically updated to reflect the committed changes.