Page MenuHomePhabricator

[PM] keeping history when original SCC split and then merge into itself in the same round of SCC update .

Authored by wmi on Oct 4 2018, 4:36 PM.



I am not sure whether this is the right approach, just send it out to get comments and improve it on the way.

In, inline history is added to prevent infinite inlining across multiple run of inliner and SCC update, but the history will only be kept when new SCC is actually generated during SCC update.

We found a case that SCC can be split and then merge into itself in the same round of SCC update, so the same SCC will be pop out from UR.CWorklist and then added back immediately, without any new SCC generated, that is why the existing patch cannot catch the case.

The patch changes the condition when inline history should be kept. If only UR.CWorklist has ever increased after SCC update, it means split has happen and we should keep the inline history to prevent infinite loop.

Diff Detail


Event Timeline

wmi created this revision.Oct 4 2018, 4:36 PM

Just wanted to send a quick note that I saw this and am thinking about the approach and the problem. Thanks for looking at this (really tricky) problem!

I also think checking for the change in worklist size better captures the splitting of the SCC than checking C != OldC

41 ↗(On Diff #168401)

Please shorten this code by making use of -inline-threshold appropriately.

wmi updated this revision to Diff 169111.Oct 10 2018, 3:17 PM

Addressing Easwaran's comment by shortening the testcase.

wmi updated this revision to Diff 169121.Oct 10 2018, 3:52 PM

FileCheck is not needed in the testcase. Removed it.

Minor implementation improvement suggestion...

1164–1171 ↗(On Diff #169121)

Using the size seems a bit odd here -- what about instead checking whether C != OldC or if C is in CWorklist? This is efficient with the given data structure and seems like it captures the intent of what you're checking.

Basic suggestions on test case...

1–2 ↗(On Diff #169121)

We have multiple tests in cgscc-cycle.ll already -- why not add this one there?

I suspect all of those will work fine with whatever inlining threshold makes it easy to exercise.

28 ↗(On Diff #169121)

I strongly prefer to always include check patterns that verify the inlining we expect to happen does happen (or does not happen). Otherwise some other change may make this entire test irrelevant and we would never know.

100–102 ↗(On Diff #169121)

Please always minimize the attributes for test cases.

Similarly, use unmangled symbol names, avoid dso_local, and avoid unnecessary tail, align and other noise in the test case. The goal is to make the IR of the test case as minimal as possible while still exercising the behavior and being easy to read and modify.

Ok, also had some time to really think about the core problem and approach. Write-up below.

Sorry for the spam of different responses as I looked at this from different angles.

24–26 ↗(On Diff #169121)

FWIW, this description and ascii art are *awesome*. Thanks for that.

The only high level question I have here is: *should* we have put the SCC onto the worklist?

One fix (which you have here) is to detect when we will revisit the SCC and retain information to break any cycles in inlining.

Another fix in theory is to avoid putting the SCC onto the queue.

What I haven't yet figured out (but you may have already thought about) is if the split+re-merge operations happen in a context where we might expect interesting SCC-structure changes to have occurred and thus it to be profitable to re-enqueue the SCC on the worklist. If that would normally happen, then ythe fix you have here makes perfect sense. But if this is (for example) just an artifact of the order in which we apply updates, and the SCC is never actually going to be different, maybe we should find a way to avoid enqueuing it? It might be too hard to do that though and we'd still end up where your current patch is...

wmi added a comment.Oct 10 2018, 9:48 PM

Thanks for the review.

1164–1171 ↗(On Diff #169121)

That is clearer to check if OldC is inside of UR.CWorklist. Done.

1–2 ↗(On Diff #169121)

Ok. I move the test into cgscc-cycle.ll

24–26 ↗(On Diff #169121)

Yeah, Easwaran and I considered the fix to avoid putting the unchanged original SCC into the worklist, but we don't know how to efficiently check when the internal structure of the original SCC is unchanged so we can avoid enqueuing it.

28 ↗(On Diff #169121)

Ok, check added.

100–102 ↗(On Diff #169121)

Ok, done.

wmi updated this revision to Diff 169150.Oct 10 2018, 9:58 PM

Address Chandler's comments.

wmi updated this revision to Diff 169715.Oct 15 2018, 9:21 AM

update testcase monster_scc.ll. With the patch, less inlining is happening for monster_scc.ll because we block the inlining inside of the same function in the same SCC from happening more than once.

I do some performance testing on x86 for the patch: run google internal perf benchmark suite and one server benchmark, the performance result is neutral.

wmi added a comment.Oct 16 2018, 11:00 AM

Chandler, could you take another look? Although like the testcase monster_scc.ll shows, current solution using inline history for SCC could reduce the inlining in a SCC, it didn't show up as a problem in performance testing, so can we use the solution for now to unblock the infinite compilation issue until we get a better idea for how to keep the inline history?

chandlerc accepted this revision.Oct 23 2018, 4:02 PM

Sorry for delay...

I'm still curious if there is a good reason why we are re-adding the SCC to the worklist (there may be! just hard to think through it all), but it does seem fine to just directly catch the cycle and block it here, and is the minimal change that should fix things. LGTM.

1168 ↗(On Diff #169715)

nit: s/cache/catch/, s/case/cases/

This revision is now accepted and ready to land.Oct 23 2018, 4:02 PM
wmi marked an inline comment as done.Oct 23 2018, 4:06 PM

Thanks for the review!


This revision was automatically updated to reflect the committed changes.