This is an archive of the discontinued LLVM Phabricator instance.

[PostRASink]Sink spill to a block reachable to reload
Needs ReviewPublic

Authored by junbuml on Mar 30 2018, 2:44 PM.

Details

Summary

This change sink a spill into a successor, if reloads from the stack slot
are reachable only through the successor.

For the machine IR below, we will sink the store to %stack.0 in bb.0 into
bb.1 because there is no reload from stack.0 in bb.2.

bb.0:
  STRXui $x0, %stack.0, 0
  Bcc 11, %bb.2

bb.1:
  $x0 = LDRXui %stack.0, 0
  RET $x0

bb.2:
  $x0 = COPY $xzr
  RET   $x0

Diff Detail

Event Timeline

junbuml created this revision.Mar 30 2018, 2:44 PM
junbuml edited the summary of this revision. (Show Details)Mar 30 2018, 2:46 PM
junbuml edited the summary of this revision. (Show Details)
junbuml updated this revision to Diff 140850.Apr 3 2018, 1:10 PM
junbuml retitled this revision from [PostRASink][WIP]Sink spill to a block reachable to reload to [PostRASink]Sink spill to a block reachable to reload.
junbuml added a reviewer: rnk.Apr 3 2018, 1:11 PM

With/without this change, I collected llvm stats for spec2000/2006/2017 on AArch64. I observed +31.87% more loads from stores promoted in AArch64LoadStoreOptimizer pass, and minor improvement in shrink-wrapping in case spills are sunk from the entry.

Thanks for working on this.

I’m wondering why do we end up with spills placed like this in the first place?

My initial motivation case for this was when spilling the incoming argument register. This is somewhat related with the initial motivation of PostRASink pass because we do not sink the COPY for argument register before allocating. When RA try to spill the incoming argument register, it probably don't want to change the placement of the spill during RA, since sinking it down will extend the live range of the spilled value. So I do this after RA just like we did it for Copy in PostRASink pass.

Kindly ping?

rnk removed a reviewer: rnk.Apr 26 2018, 1:45 PM

Sorry for the wait, this looks very good! Did you see any compile time impact on this?

I'll check how the benchmarks are affected on my side next week and let you know!

lib/CodeGen/MachineSink.cpp
1288

From what I see, hasLoadFromStackSlot checks for MachineMemoryOperands attached to the MI. It looks for things like :: (load 8 from %stack.0), and if the load source is a fixed stack it grabs the FI.

I wonder what happens if an instruction:

  • MI.mayLoad == true
  • any_of(MI.operands(), isFI) == true
  • but MI.machinememoperands().empty() == true

I've been confused about MachineMemoryOperands for a while now, so I'm not sure if an instruction with no MMOs should act like a barrier or something else. Are there any other passes that rely on this?

What do you think? I think if we can rely on MMOs it would be great, and we should document it somewhere.

This shouldn't be blocking for this patch, but if you can document these assumptions it would be great.

1288

Slightly related, I've been wanting to add this to the MachineVerifier. It would be nice if mayLoad / mayStore instructions don't pass the verifier on some conditions like:

  • has no MMO
  • has MMO == FixedStack but no (fixed) FI operands
  • has MMO != FixedStack but has FI operands
  • probably more we can check here
1319–1320

Actually, why don't we sink across function calls? I would assume the regmasks and the implicit operands would be enough to keep it safe, but I might be wrong.

thegameg added inline comments.Apr 27 2018, 12:30 PM
lib/CodeGen/MachineSink.cpp
1288

Hmm actually, why not use isLoadFromStackSlot here as you do for stores?

junbuml updated this revision to Diff 144753.May 1 2018, 11:11 AM

Refactors this change based on current tip. Made the check for load from FI more conservative.

In my test for spec2000/2006/2017 on AArach64, I didn't see any compile time impact.

lib/CodeGen/MachineSink.cpp
1288

From what I see, hasLoadFromStackSlot checks for MachineMemoryOperands attached to the MI. It looks for things >like :: (load 8 from %stack.0), and if the load source is a fixed stack it grabs the FI.

I wonder what happens if an instruction:

MI.mayLoad == true
any_of(MI.operands(), isFI) == true
but MI.machinememoperands().empty() == true
I've been confused about MachineMemoryOperands for a while now, so I'm not sure if an instruction with no MMOs >should act like a barrier or something else. Are there any other passes that rely on this?

What do you think? I think if we can rely on MMOs it would be great, and we should document it somewhere.

This shouldn't be blocking for this patch, but if you can document these assumptions it would be great.

Thanks for bring this up. I can see similary code in InstructionStoresToFI() @MachineLICM.
If MMO is empty, I conservatively assume an instruction load from some FI.

Slightly related, I've been wanting to add this to the MachineVerifier.
It would be nice if mayLoad / mayStore instructions don't pass the verifier on some conditions like:

has no MMO
has MMO == FixedStack but no (fixed) FI operands
has MMO != FixedStack but has FI operands
probably more we can check here

I agree with this check and I will be happy to do it. I will update you for this.

1319–1320

I agree that we can use regmask and implicit operands, but I'm not perfectly clear if it's safe in all targets and calling conventions. I add FIXME about it. Another issue here is that this also skip DBG_VALUE, causing different code being generated with -g. A fix is being discussed in https://reviews.llvm.org/D45878.

Any chance the way FIs are tracked here might be prone to this bug ?