Page MenuHomePhabricator

[ARM][LowOverheadLoops] Remove dead loop update instructions
ClosedPublic

Authored by SjoerdMeijer on Dec 4 2019, 5:10 AM.

Details

Summary

After creating a low-overhead loop, the loop update instruction was still lingering around hurting performance. This removes dead loop update instructions (SUBS instructions).

To support this, some helper functions were added to MachineLoopUtils and ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses before a particular loop instruction, respectively.

This is a first version that removes a SUBS instruction when there are no other uses inside and outside the loop block, but there are some more interesting cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which show that there is room for improvement. For example, we can't handle this case yet:

    
  ..
  dlstp.32  lr, r2
.LBB0_1:
  mov r3, r2
  subs  r2, #4
  vldrh.u32 q2, [r1], #8
  vmov  q1, q0
  vmla.u32  q0, q2, r0
  letp  lr, .LBB0_1
@ %bb.2:
  vctp.32 r3
  ..

which is a lot more tricky because r2 is not only used by the subs, but also by the mov to r3, which is used outside the low-overhead loop by the vctp instruction.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Dec 4 2019, 5:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 4 2019, 5:10 AM
samparker added inline comments.Dec 5 2019, 7:06 AM
llvm/lib/CodeGen/ReachingDefAnalysis.cpp
275

I would favour just returning a bool here, or renaming the method.

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
55

I think now would be a good time to add an assertion that we're always operating on a loop with a single block.

673

LastInstrBlock should be Def, right? Looks like you'll need to add another test case to catch this.

Comments addressed and thanks for catching that problem, a test case has been added.

samparker added inline comments.Dec 9 2019, 2:36 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
680

I'm not sure this is doing what you want.. wouldn't the vctp likely be a use before def..? We should also be checking that the Def isn't defining cpsr, and if it is, that the def is dead.

771

This should be in your new function, I don't think there's anything stopping us from generating a normal low-overhead loop with multiple blocks.

SjoerdMeijer updated this revision to Diff 233035.EditedDec 10 2019, 3:24 AM

I'm not sure this is doing what you want.. wouldn't the vctp likely be a use before def..?

Thanks for catching that one too. This was a off-by one error in that function, I wasn't considering the first instruction in the block. The check is now done in a loop, finding defs until we reached the beginning of the block, but ignoring it if is a VCTP instruction.

We should also be checking that the Def isn't defining cpsr, and if it is, that the def is dead.

Yep, cheers, and tests have been added too.

samparker added inline comments.Dec 10 2019, 5:52 AM
llvm/lib/CodeGen/ReachingDefAnalysis.cpp
282

How about using reverse iterators? Probably would have saved you from the the off-by-one?

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
695

Looks like it would be better to modify the RDA method to provide all the uses. Then we'll have similar functionality to getReachingLocalUses.

Thanks, comments addressed.

samparker accepted this revision.Dec 11 2019, 1:53 AM

I don't see the value in splitting getInstWithUseBefore and getAllInstWithUseBefore, I think the logic would be easier to understand if combined.

This revision is now accepted and ready to land.Dec 11 2019, 1:53 AM

Thanks for the reviews.

I don't see the value in splitting getInstWithUseBefore and getAllInstWithUseBefore

I thought the benefit is having these 2 primitives, as I see them both being useful. And since getAllInstWithUseBefore is not doing anything than just driving getInstWithUseBefore, I thought this was a nice abstraction. If I change my mind on this later, I will follow up on this.

This revision was automatically updated to reflect the committed changes.