This is an archive of the discontinued LLVM Phabricator instance.

[CGP] Duplicate addressing computation in cold paths if required to sink addressing mode
ClosedPublic

Authored by reames on Feb 26 2016, 11:09 AM.

Details

Summary

This patch teaches CGP to duplicate addressing mode computations into cold paths (detected via explicit cold attribute on calls) if required to let addressing mode be safely sunk into the basic block containing each load and store.

In general, duplicating code into cold blocks may result in code growth, but should not effect performance. In this case, it's better to duplicate some code than to put extra pressure on the register allocator by making it keep the address through the entirely of the fast path.

This patch only handles addressing computations, but in principal, we could implement a more general cold cold scheduling heuristic which tries to reduce register pressure in the fast path by duplicating code into the cold path. Getting the profitability of the general case right seemed likely to be challenging, so I stuck to the existing case (addressing computation) we already had.

Diff Detail

Repository
rL LLVM

Event Timeline

reames updated this revision to Diff 49209.Feb 26 2016, 11:09 AM
reames retitled this revision from to [CGP] Duplicate addressing computation in cold paths if required to sink addressing mode.
reames updated this object.
reames added a subscriber: llvm-commits.
davidxl edited edge metadata.Feb 29 2016, 4:17 PM

A general problem I see in address sinking optimization is the lack of global cost analysis -- the analysis should consider

  1. the cost of the folded instruction (if the address can be folded in)
  2. whether the original address instruction can be eliminated (for instance, only sunk into some of the uses)
  3. if the address can not be folded, what is the additional cost
  4. what is the estimated register pressure reduction with the address sinking

The above should be all weighted with the block frequency information.

lib/CodeGen/CodeGenPrepare.cpp
1767 ↗(On Diff #49209)

Why limiting this just to call instruction? How about any other uses of address expressions in cold blocks? Also why just check 'cold' attribute of the callsite? How about other cold blocks?

1772 ↗(On Diff #49209)

If we extend this to handle calls -- the optimizeMemoryInst's comments need to be fixed -- the first instruction may not just be memorInstr.

reames added a comment.Mar 2 2016, 4:12 PM

A general problem I see in address sinking optimization is the lack of global cost analysis -- the analysis should consider

  1. the cost of the folded instruction (if the address can be folded in)
  2. whether the original address instruction can be eliminated (for instance, only sunk into some of the uses)
  3. if the address can not be folded, what is the additional cost
  4. what is the estimated register pressure reduction with the address sinking

I'm confused by your comment. Are you talking about my changes, or the existing code? The existing code does most of this (except 1) and my extension preserves all of that except when duplicating into a cold call.

The above should be all weighted with the block frequency information.

While I agree, that's a separate change - particularly since CFG does not currently use profiling info at all.

reames updated this revision to Diff 49689.Mar 2 2016, 4:50 PM
reames edited edge metadata.

Update per David's comments - not some emails don't seem to have made it into the phabricator thread.

The new test cases are great.

test/Transforms/CodeGenPrepare/X86/sink-addrmode.ll
7 ↗(On Diff #49689)

nit: computatio -- computation

54 ↗(On Diff #49689)

Do we need to add a test in Code gen to make sure the desired behavior is produced?

77 ↗(On Diff #49689)

Add a negative case to make sure it is not enabled for optSize

129 ↗(On Diff #49689)

Add a comment for this test.

reames updated this revision to Diff 49848.Mar 4 2016, 1:26 PM

Address David's comments

davidxl accepted this revision.Mar 4 2016, 1:32 PM
davidxl edited edge metadata.

lgtm. I have never approved patches in this area before, so it might better to wait for one of the other reviewers to chime in.

This revision is now accepted and ready to land.Mar 4 2016, 1:32 PM
mehdi_amini added inline comments.Mar 4 2016, 2:11 PM
test/Transforms/CodeGenPrepare/X86/sink-addrmode.ll
10 ↗(On Diff #49848)

(I know you didn't create, but) having named value should be avoided for optimized build, we should fix CGP. Adding tests that relies on the named value is making it harder to fix this.

reames added inline comments.Mar 4 2016, 2:29 PM
test/Transforms/CodeGenPrepare/X86/sink-addrmode.ll
10 ↗(On Diff #49848)

Huh? This is the first I've heard of this advice. Named values are widespread in the optimizer. If we want to change that, it's definitely a separate change.

mehdi_amini accepted this revision.Mar 7 2016, 4:08 PM
mehdi_amini edited edge metadata.

Talking with Duncan, we are proposing a scheme that allows to strip out the named value with a runtime flag: http://reviews.llvm.org/D17946

Please don't hold back this revision.

reames updated this revision to Diff 50185.Mar 9 2016, 12:51 PM
reames edited edge metadata.

Update test case per Mehdi's comments.

reames added a subscriber: reames.Mar 9 2016, 1:00 PM

As a gesture of good faith, I've updated the patch under discussion to
avoid relying on the named instructions.

On the general topic of value naming, I've replied to the review Mehdi
mentioned (http://reviews.llvm.org/D17946) with my opinions on the
matter. My main concern is making sure that we don't introduce an
expectation that release mode code doesn't have value names. Having the
value names is invaluable from a debugging standpoint. It seems like the
existing proposal in that review would not break this use case, so my
concern is much less than I initially thought it might be.

Mehdi, just to confirm, are you still happy to see this patch land? The
discussion got a bit fragmented so I want to confirm before landing it.

Philip

This revision was automatically updated to reflect the committed changes.