This is an archive of the discontinued LLVM Phabricator instance.

[ARM][HWLoops] Create hardware loops for sibling loops
ClosedPublic

Authored by dmgreen on Apr 20 2020, 9:10 AM.

Details

Summary

Given a loop with two subloops, it should be possible for both to be converted to hardware loops. That's what this patch does, simply enough. It slightly alters the loop iterating order to try and convert all subloops. If one (or more) succeeds, it stops as before.

Diff Detail

Event Timeline

dmgreen created this revision.Apr 20 2020, 9:10 AM

Since the code change is target independent, better to add a IR test case in llvm/test/Transforms/HardwareLoops/?

dmgreen updated this revision to Diff 258914.Apr 21 2020, 1:05 AM

Good idea. Added an IR test.

samparker added inline comments.Apr 21 2020, 2:03 AM
llvm/test/CodeGen/Thumb2/mve-float16regloops.ll
1200

Looks like this loop shouls also be a hardware loop now?

shchenz accepted this revision as: shchenz.Apr 21 2020, 2:07 AM

hardware loop code change and IR case LGTM. Thanks for doing this.

May need ARM experts to have a look at the ARM related test case changes? @samparker @SjoerdMeijer

This revision is now accepted and ready to land.Apr 21 2020, 2:07 AM

oops, didn't realize @samparker already gave comments...

dmgreen marked an inline comment as done.Apr 21 2020, 2:12 AM

Thanks for the review! I'll try and see what's up with those fir tests.

llvm/test/CodeGen/Thumb2/mve-float16regloops.ll
1200

Yeah, I've noticed that we sometimes revert still. I will try and see what's going on in this case.

dmgreen marked an inline comment as done.Apr 27 2020, 3:29 AM
dmgreen added inline comments.
llvm/test/CodeGen/Thumb2/mve-float16regloops.ll
1200

I took a look. We are reverting because this was being create as a WLS and the required branch is now backwards. So we revert to a subs bne pair. I've seen the same things comes up in other places,so have created a ticket for it separately. Maybe we can force this to not happen, or soften WLS back to a compare/branch/DLS.

Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2020, 3:29 AM

Ok, thanks. LGTM

hi @dmgreen , could we get this be committed? We meet a similar issue need this fix. Thanks.

Hello. Sorry for the delay.

It turns out this patch, when run over all our benchmarks, was actually making things worse not better! The setup cost of the loop counter (from the expanded SCEV) can end up taking longer than you save for loops with a low iteration count. And I don't have a great way to mitigate that at the moment.

It's still something we obviously need though, and it's not the fault of this patch exactly that it's getting worse. I'll get the patch committed and see if there's anything we can do about the problems at a later point.

This revision was automatically updated to reflect the committed changes.

Hello. Sorry for the delay.

It turns out this patch, when run over all our benchmarks, was actually making things worse not better! The setup cost of the loop counter (from the expanded SCEV) can end up taking longer than you save for loops with a low iteration count. And I don't have a great way to mitigate that at the moment.

It's still something we obviously need though, and it's not the fault of this patch exactly that it's getting worse. I'll get the patch committed and see if there's anything we can do about the problems at a later point.

Thank you!