This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Recognise the hardwareloop "loop.decrement.reg" intrinsic
ClosedPublic

Authored by SjoerdMeijer on Dec 16 2019, 12:01 PM.

Details

Summary

Teach SCEV about the @loop.decrement.reg intrinsic, which has exactly the same semantics as a sub expression. The generic HardwareLoop pass introduced 3 intrinsics to model hardwareloops in IR; the @loop.decrement.reg is used to update the loop induction variable.

Teaching SCEV about @loop.decrement.reg means we can also use SCEV for hardwareloops. For example, we would like to rematerialize the loop iteration count value in loop exit blocks for hardwareloops, and this change allows us to do exactly that. This will enable us to remove any use of loop iteration counts in the hardware loop. I will follow up shortly to further support this (but that will be more ARM specific).

This "int_loop_decrement_reg" intrinsic is defined as "IntrNoDuplicate". Thus, while hardwareloops and tripcounts now become analysable by SCEV, this prevents the usual loop transformations from applying transformations on hardwareloops, which is what we want. I have added test cases for loopunrolling and IndVarSimplify and LFTR for this.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Dec 16 2019, 12:01 PM

llvm.loop.decrement.reg is not documented in the langref. Is this a target specific thing?

Also, is the entire gamut of optimizations this enables OK? For instance, would it be okay to LFTR replace the trip count of the loop such that it doesn't use the loop_decrement_reg intrinsic anymore?

Thanks for taking a look!

llvm.loop.decrement.reg is not documented in the langref. Is this a target specific thing?

No, it's a target independent thing, and is generated by the generic Hardwareloop pass. It is defined here:

https://github.com/llvm/llvm-project/blob/ff07fc66d9eef577f3b44716f72e581a18cd9ac9/llvm/include/llvm/IR/Intrinsics.td#L1304

Also, is the entire gamut of optimizations this enables OK? For instance, would it be okay to LFTR replace the trip count of the loop such that it doesn't use the loop_decrement_reg intrinsic anymore?

Hmmm, I don't have a good answer for that yet. First, just to be clear, the llvm.loop.decrement.reg intrinsic is exactly the same as a sub expression, so I think SCEV analysis functions should be okay. But yes, I think I can imagine that SCEV transformations that modify the loop update expression could potentially be problematic. But this is a bit of unexplored territory for me, so first I need to look into what LFTR exactly does. So, I will need to so some homework first, but if we assume for a moment that there will be problematic cases with SCEV transformation modifying the loop update expression (if they would ignore the loop.decremenent intrinsic if there is one present), does that make this approach completely unfeasible, or would you still see possibilities to use the SCEV analysis functions?

nikic added a comment.Dec 16 2019, 1:19 PM

LFTR is quite picky, it only handles add/sub/getelementptr instructions in the addrec. So at least that particular case shouldn't pose a problem.

Thanks for taking a look!

llvm.loop.decrement.reg is not documented in the langref. Is this a target specific thing?

No, it's a target independent thing, and is generated by the generic Hardwareloop pass. It is defined here:

https://github.com/llvm/llvm-project/blob/ff07fc66d9eef577f3b44716f72e581a18cd9ac9/llvm/include/llvm/IR/Intrinsics.td#L1304

Also, is the entire gamut of optimizations this enables OK? For instance, would it be okay to LFTR replace the trip count of the loop such that it doesn't use the loop_decrement_reg intrinsic anymore?

Hmmm, I don't have a good answer for that yet. First, just to be clear, the llvm.loop.decrement.reg intrinsic is exactly the same as a sub expression, so I think SCEV analysis functions should be okay. But yes, I think I can imagine that SCEV transformations that modify the loop update expression could potentially be problematic. But this is a bit of unexplored territory for me, so first I need to look into what LFTR exactly does. So, I will need to so some homework first, but if we assume for a moment that there will be problematic cases with SCEV transformation modifying the loop update expression (if they would ignore the loop.decremenent intrinsic if there is one present), does that make this approach completely unfeasible, or would you still see possibilities to use the SCEV analysis functions?

I think the right mental model is that SCEV's users can rewrite the loop exit condition in arbitrary ways once SCEV can "understand" it. It may not be so bad in practice, but this is how I think about it.

Thanks again for sharing your thoughts and comments. I've done a bit of my homework:

LFTR is quite picky, it only handles add/sub/getelementptr instructions in the addrec. So at least that particular case shouldn't pose a problem.

That's one way of phrasing it :-) The other way is that LFTR analyses expressions, and gracelessly bails if it doesn't understand the expression. So yes, I also don't see a problem here.

I think the right mental model is that SCEV's users can rewrite the loop exit condition in arbitrary ways once SCEV can "understand" it. It may not be so bad in practice, but this is how I think about it.

Yep, got it, cheers.

What do we think? Is this something we could add to SCEV?
If there are more places to investigate or change, I am of course more than willing to do this.

It's currently the backends responsibility to lower and handle any of the intrinsics that are inserted for hardware loops, so I don't think we should be concerned with other transforms triggering. I think the benefit of SCEV still working for these loops far outweighs the codegen effort - especially since this effort already has to be done because of isel and machine optimisations. That said, it probably wouldn't hurt to add some tests for common transforms especially ones where the loop body would be duplicated.

SjoerdMeijer edited the summary of this revision. (Show Details)

I have added test cases for LFTR and loopunrolling showing that the said transformations don't trigger as int_loop_decrement_reg is described as IntrNoDuplicate

samparker accepted this revision.Jan 9 2020, 8:45 AM

I reckon adding something to that unroll test that would consistently drive the unroller, like a triple, should be added. Otherwise, LGTM.

This revision is now accepted and ready to land.Jan 9 2020, 8:45 AM

Thanks, and I will add that before committing.

This revision was automatically updated to reflect the committed changes.

SCEV code looks fine, no comment on the expected semantics of the intrinsic.

llvm/lib/Analysis/ScalarEvolution.cpp
4512

This might be slightly cleaner using m_Intrinsic pattern match, but it's minor at best and a subjective judgement call. Take your pick.

4513

Rather than a single case switch, use a if please.

Thanks, and comment addressed in rG07028b5a8780.