This is an archive of the discontinued LLVM Phabricator instance.

[GCOV] Emit the writeout function as nested loops of global data.
ClosedPublic

Authored by chandlerc on May 2 2018, 5:12 AM.

Details

Summary

Prior to this change, LLVM would in some cases emit *massive* writeout
functions with many 10s of 1000s of function calls in straight-line
code. This is a very wasteful way to represent what are fundamentally
loops and creates a number of scalability issues. Among other things,
register allocating these calls is extremely expensive. While D46127 makes this
less severe, we'll still run into scaling issues with this eventually. If not
in the compile time, just from the code size.

Now the pass builds up global data structures modeling the inputs to
these functions, and simply loops over the data structures calling the
relevant functions with those values. This ensures that the code size is
a fixed and only data size grows with larger amounts of coverage data.

A trivial change to IRBuilder is included to make it easier to build
the constants that make up the global data.

Diff Detail

Repository
rL LLVM

Event Timeline

chandlerc created this revision.May 2 2018, 5:12 AM
davidxl added a subscriber: davidxl.May 2 2018, 8:36 AM

Can you add a new test case testing the new emitted code patterns?

wmi added a comment.May 2 2018, 9:56 AM

The generated code looks great. We may want to do the similar thing in @__llvm_gcov_flush. It generated a bunch of memset in straight line code in a loop of CountersBySP.
call void @llvm.memset.p0i8.i64(i8* align 16 bitcast ([2 x i64]* @__llvm_gcov_ctr to i8*), i8 0, i64 16, i1 false)
call void @llvm.memset.p0i8.i64(i8* align 16 bitcast ([12 x i64]* @__llvm_gcov_ctr.6 to i8*), i8 0, i64 96, i1 false)
call void @llvm.memset.p0i8.i64(i8* align 16 bitcast ([2 x i64]* @__llvm_gcov_ctr.7 to i8*), i8 0, i64 16, i1 false)
...

Some minor/idle comments

llvm/lib/Transforms/Instrumentation/GCOVProfiling.cpp
913 ↗(On Diff #144860)

Maybe worth using 'zip' here to iterate the CUNodes (it does support iterators rather than using getOperand(i))? Might be easier, I guess, if llvm::seq supported more defaults (like defaulting to int, starting at zero, and unbounded (no need to specify an end value if the thing you zip it with is constrained anyway))

973 ↗(On Diff #144860)

Is this tested? (the test case only seems to have function_args)

(also I'm not sure I really understand this bit of the code - it's creating a global variable with a fixed name, but it's doing so inside the loop over all CUs? (would this create duplicate GVs? Do they somehow get renamed to be unique?) Is this code assuming there will only be one non-module-skeleton CU?)

llvm/test/Transforms/GCOVProfiling/function-numbering.ll
22–24 ↗(On Diff #144860)

I'd expect this test should probably test the contents of the gcov_writeout function? Nothing else appears to be testing that code? (given that you changed it significantly & no other tests need updating)

chandlerc updated this revision to Diff 144939.May 2 2018, 2:58 PM
chandlerc marked 2 inline comments as done.

Updated based on review feedback.

In D46357#1085440, @wmi wrote:

The generated code looks great. We may want to do the similar thing in @__llvm_gcov_flush. It generated a bunch of memset in straight line code in a loop of CountersBySP.
call void @llvm.memset.p0i8.i64(i8* align 16 bitcast ([2 x i64]* @__llvm_gcov_ctr to i8*), i8 0, i64 16, i1 false)
call void @llvm.memset.p0i8.i64(i8* align 16 bitcast ([12 x i64]* @__llvm_gcov_ctr.6 to i8*), i8 0, i64 96, i1 false)
call void @llvm.memset.p0i8.i64(i8* align 16 bitcast ([2 x i64]* @__llvm_gcov_ctr.7 to i8*), i8 0, i64 16, i1 false)
...

Sure, but let's do that in a follow-up? It isn't hitting scaling limits nearly as rapidly from my testing.

Can you add a new test case testing the new emitted code patterns?

Yep (see my response to dblaikie's comment below too).

llvm/lib/Transforms/Instrumentation/GCOVProfiling.cpp
913 ↗(On Diff #144860)

I'd like to do this in a follow-up change if I can? I already did more surgery here than I probably should have. Happy to iterate some on cleaner loops though.

973 ↗(On Diff #144860)

I think we always have both? The test only looks at one. I honestly don't understand the *contents* of this well enough to write a test. I now test the writeout which at least clearly shows that the structure ends up being sound here.

Regarding the global variable -- I was assuming they get renamed like SSA values. Their names aren't significant as they don't have linkage. But I can just make the names unique constructively, as that seems cleaner anyways.

llvm/test/Transforms/GCOVProfiling/function-numbering.ll
22–24 ↗(On Diff #144860)

Yeah, I was trying to avoid adding large amounts of new testing but the testing seems super bad here...

I've added a full test of the writeout function now.

dblaikie accepted this revision.May 2 2018, 3:23 PM

Seems good to me - but feel free to wait for Wei's OK too if you like, of course.

This revision is now accepted and ready to land.May 2 2018, 3:23 PM
wmi accepted this revision.May 2 2018, 3:26 PM

LGTM. Thanks!

This revision was automatically updated to reflect the committed changes.