This patch uses helper function rewriteLoopExitValues that is refactored in D72602 to rematerialise the iteration count in exit blocks, so that we can clean-up loop update expressions inside the hardware-loops later in ARMLowOverheadLoops.
Details
Diff Detail
Event Timeline
I had not realised that this required LCSSA form. That will likely cause many more changed than you really want. It's probably best not to create LCSSA form here without destroying it afterwards. Do you know if LoopSimpleForm is needed too? It looks from the code that it shouldn't (it can handle multiple exits).
You might be better off only creating LCSSA form for the loops that really need it. Otherwise I imagine a lot of tests change. There is formLCSSA and formLCSSARecursively. The first I one is probably fine for individual loops. I vaguely remember something that will go through and remove the one operand phis too, but I'm not sure what that is called (other than instcombine).
llvm/lib/Target/ARM/MVETailPredication.cpp | ||
---|---|---|
596 | It's probably better to be explicit about what is needed and preserved here. I don't think LoopSimpleForm would be needed, and you can probably get away with not requiring LCSSA either. |
I vaguely remember something that will go through and remove the one operand phis too, but I'm not sure what that is called (other than instcombine).
I'm hoping that this close to codegen, the phis won't be a problem..?
llvm/lib/Target/ARM/MVETailPredication.cpp | ||
---|---|---|
61 | Doesn't look like these need to be members. |
llvm/lib/Target/ARM/MVETailPredication.cpp | ||
---|---|---|
426 | And we should be able to just insert a phi here, with the vctp operand as the operand to the phi and the vctp then using the phi. |
Thanks for sharing your thoughts!
So I had also not realised that this requires LCSSA, and for a bit more context: in my first experiment, I was running the whole IndVarSimplify pass directly after MVETailPredicationPass, which takes care of bringing it in LCSSA form as this is a required pre-condition for a few things. The more focused approach, only using rewriteLoopExitValues() as a helper which is what this patch is about, and not running IndVarSimplify, this LCSSA assert was triggered.
I will now look into this, and thanks for your suggestions: if it is just calling formLCSSA() that would be ideal, but if that is a bit overkill I will just insert the Phi.
I have removed WIP and TODO from the patch subject/description, as this revision now demonstrates I think what this patch should be doing:
- I have removed LoopPass from this, as this indeed causes a few more passes to run, which we don't need, and causes (unrelated) changes elsewhere. Now, the only test changes are in the LowOverheadLoop tests, which is what we want.
- I use formLCSSA() to transform the loop into LCSSA form if it isn't.
This gives you a change to already look at the codegen changes and differences, while I will probably need to add some more opt tail-predication tests for this. The codegen changes demonstrate the benefit from moving loop update instruction out of the loop, at the cost of setting up the count in the preheader.
Good to see those looks finally loosing those subs and movs!
llvm/lib/Target/ARM/MVETailPredication.cpp | ||
---|---|---|
141 | Do we know if the loop entering the pass are in LCSSA already? If so, it seems a little excessive to call this when we only have one instruction to fix up... | |
228 | I'm pretty sure you only want to do this when you've cloned the vctp in an exit block. | |
llvm/test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll | ||
1000 | These changes seem unnecessary. |
llvm/lib/Target/ARM/MVETailPredication.cpp | ||
---|---|---|
141 | Do you need to test this? Will formLCSSA either just work or not by calling it? I just noticed an EXPENSIVE_CHECKS in formLCSSA that checks if the subloops are already in LCSSA form. Maybe use formLCSSARecursively instead? | |
144–145 | Space after the "," | |
147 | Does formLCSSA return false on failure, or nothing changed? |
I think this should cover it now, and address the comments:
- RematerializeIterCount() is only called when we've cloned the VCTP instruction in an exit block, this makes the unrelated test case go away.
- RematerializeIterCount() has become really simple with just a call to formLCSSARecursively() (and of course rewriteLoopExitValues()) because when there's nothing to do, formLCSSARecursively is a cheap operation, so we indeed don't need to check first if the loop is in LCSSA form.
- One of the motivating examples for this work and optimisation is a reduction in vector-reduce-mve-tail.ll. I decided to use `formLCSSARecursively, instead of manually inserting a phi, because there are a few values live-out, the ones that feed the final reduction. We need to insert 3 Phi nodes, thus I don't think it is worth optimising and calling formLCSSARecursively is best.
- This is tested by exiting tests. For example, the reductions vector-reduce-mve-tail.ll will bring it in LCSSA form first, and is testing that. Simpler tests elsewhere check when this doesn't need to happen first.
Seems good to me, if Sam is happy with the tail predication parts.
llvm/lib/Target/ARM/MVETailPredication.cpp | ||
---|---|---|
68–69 | This can be in the same block as the variables above. |
Doesn't look like these need to be members.