Given a loop with two subloops, it should be possible for both to be converted to hardware loops. That's what this patch does, simply enough. It slightly alters the loop iterating order to try and convert all subloops. If one (or more) succeeds, it stops as before.
Details
- Reviewers
samparker shchenz SjoerdMeijer - Group Reviewers
Restricted Project - Commits
- rG9e03547cab69: [ARM][HWLoops] Create hardware loops for sibling loops
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Since the code change is target independent, better to add a IR test case in llvm/test/Transforms/HardwareLoops/?
llvm/test/CodeGen/Thumb2/mve-float16regloops.ll | ||
---|---|---|
1200 | Looks like this loop shouls also be a hardware loop now? |
hardware loop code change and IR case LGTM. Thanks for doing this.
May need ARM experts to have a look at the ARM related test case changes? @samparker @SjoerdMeijer
Thanks for the review! I'll try and see what's up with those fir tests.
llvm/test/CodeGen/Thumb2/mve-float16regloops.ll | ||
---|---|---|
1200 | Yeah, I've noticed that we sometimes revert still. I will try and see what's going on in this case. |
llvm/test/CodeGen/Thumb2/mve-float16regloops.ll | ||
---|---|---|
1200 | I took a look. We are reverting because this was being create as a WLS and the required branch is now backwards. So we revert to a subs bne pair. I've seen the same things comes up in other places,so have created a ticket for it separately. Maybe we can force this to not happen, or soften WLS back to a compare/branch/DLS. |
hi @dmgreen , could we get this be committed? We meet a similar issue need this fix. Thanks.
Hello. Sorry for the delay.
It turns out this patch, when run over all our benchmarks, was actually making things worse not better! The setup cost of the loop counter (from the expanded SCEV) can end up taking longer than you save for loops with a low iteration count. And I don't have a great way to mitigate that at the moment.
It's still something we obviously need though, and it's not the fault of this patch exactly that it's getting worse. I'll get the patch committed and see if there's anything we can do about the problems at a later point.
Looks like this loop shouls also be a hardware loop now?