This is an archive of the discontinued LLVM Phabricator instance.

[WIP][shrink-wrap]Sink COPYs to CSR from entry to successors
AbandonedPublic

Authored by junbuml on Dec 1 2017, 3:19 PM.

Details

Summary

In AArch64, ISel create Copy instructions to move function parameters to virtual registers in the entry block. If the parameters are live across a function call, then RA would allocate CSRs for such Copys in the entry.

We should be able to sink such Copy instructions if there is no use in the entry block, but MachineSink (before RA) cannot sink such Copys because SreReg is an allocatable PhyReg. After RA, there is no pass which try to sink such Copy instruction, so shrink-wrapping pass often miss opportunity because the first def of CSR is in the entry block.

This change try to encourage more shrink-wrapping by sinking Copys of which DestReg is CSR from the entry to successors if there there is no use in the entry.

In the IR below, one of the parameter (%paramAcrossCall) live across a function call, but it's only used in %BB0. Currently, in AArch64, we create a "COPY %X19, %W0 " in the entry for this parameter, which prevent this function from being shrink-wrapped. Since this COPY in the entry block is not used in the entry, we can sink it to %BB0, then this function will be shrink-wrapped.

define i32 @shrinkwrapme(i32 %paramAcrossCall, i32 %paramNotAcrossCall) {
entry:
  %cmp5 = icmp sgt i32 %paramNotAcrossCall, 0
  br i1 %cmp5, label %BB0, label %Exit
BB0:
  %call = call i32 @fun()
  %add = add i32 %call, %paramAcrossCall
  ret i32 %add
Exit:
  ret i32 0
}

With this change I observed 10% more shrink-wrapping happened in spec2000/2006/2017.

For now, I perform this only in the entry block as a way to encourage shrink-wrapping, but we might be able to make this more general as a separate pass like Post-RA machine sink. Please let me know any thought.

Diff Detail

Event Timeline

junbuml created this revision.Dec 1 2017, 3:19 PM
thegameg edited edge metadata.

Thanks for working on this!

A few things I am wondering about:

  • What if we can sink through but in the end it's not useful for shrink-wrapping (something else is blocking) ? We should be able to revert, or anticipate this in the first place.
  • I liked the idea that shrink-wrapping is only analyzing and not modifying the code until PEI, is there any other pass where this would fit better?
  • Would a post-RA MachineSink pass make sense here ? (in a way that's what shrink-wrapping really is)
  • I wonder how much extra work the register allocator would do if we sink (during pre-RA MachineSink) the %vreg = COPY %arg into the successor and add %arg as a live-in.
test/CodeGen/AArch64/sink-copy-for-shrink-wrap.ll
2

I think a MIR test would show the use case of this patch much better here.

Do you have benchmark performance results for this change?

What if we can sink through but in the end it's not useful for shrink-wrapping (something else is blocking) ? We should be able to revert, or anticipate this in the first place.

  • This is certainly possible, but I'm not sure about the negative cases by sinking such Copys in post-RA.
  • By moving Copys into a successor, we can avoid executing instructions in case the result is not used.
  • If there is any undesirable impact by sinking with no use of shrink-wrappping, I may want to check if doing so can help shrink wrapping first instead of reverting.

I liked the idea that shrink-wrapping is only analyzing and not modifying the code until PEI, is there any other pass where this would fit better?

  • Actaully, I'm also not fully happy with doing this in shrink-wrapping pass. That's why I said this is WIP.
  • It should be done between RA and shrink-wrapping, but I cannot see any good existing pass where this can be placed.

Would a post-RA MachineSink pass make sense here ? (in a way that's what shrink-wrapping really is)

  • I don't sink this is a bad option if there is any other case which is not handled in pre-RA MachineSink pass.

I wonder how much extra work the register allocator would do if we sink (during pre-RA MachineSink) the %vreg = COPY %arg into the successor and add %arg as a live-in.

  • It sounds like another option. However, for me, based on the comment in FindSuccToSinkTo() in MachineSink, it seems require pretty fundamental changes in RA. Quentin may have opinion about this.

Do you have benchmark performance results for this change?

As I mentioned in summary, I observed 10% more shrink-wrapping happened in spec2000/2006/2017. However, I didn't see significant improvement in scores on AArch64.
One observation I want to share is that if this is performed after the copy source forwaring (like rL312328, which was reverted though at this moment), it will improve spec2006/astar score about 10%.

junbuml updated this revision to Diff 126201.Dec 8 2017, 2:31 PM

Added MIR tests which should show use cases of this patch better.
Again, this patch is preliminary, but I posted this to hear any early high level feedback. Please let me know any thought.

junbuml abandoned this revision.Dec 20 2017, 1:59 PM

Submitted https://reviews.llvm.org/D41463 to cover this.