This patch combines a VCMP followed by a VPST into a single VPT instruction, which has the same semantics as the VCMP/VPST pair.
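Here is a minimal C++ sketch of the transformation. It uses a hypothetical, string-based instruction model: `Instr`, `combineVCMPVPST` and the operand layouts in the comments are illustrative assumptions, not LLVM's `MachineInstr` API or the code in this patch. The sketch folds each VCMP that is immediately followed by a VPST into one VPT. That VPT takes the VPST's block mask and the VCMP's operands and condition.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified instruction model for illustration only; this is
// not LLVM's MachineInstr API. Assumed operand layouts:
//   VCMP: {lhs, rhs, cond}   VPST: {mask}   VPT: {mask, lhs, rhs, cond}
struct Instr {
  std::string Opcode;
  std::vector<std::string> Ops;
};

// Fold each VCMP that is immediately followed by a VPST into one VPT that
// takes the VPST's block mask and the VCMP's operands. Assumes the VCMP's VPR
// result has no other users (a simplification of real legality checks).
std::vector<Instr> combineVCMPVPST(const std::vector<Instr> &Block) {
  std::vector<Instr> Out;
  for (std::size_t I = 0; I < Block.size(); ++I) {
    const Instr &MI = Block[I];
    if (MI.Opcode == "VCMP" && I + 1 < Block.size() &&
        Block[I + 1].Opcode == "VPST") {
      const Instr &VPST = Block[I + 1];
      assert(MI.Ops.size() == 3 && VPST.Ops.size() == 1);
      // One VPT both performs the compare and opens the predicated block.
      Out.push_back({"VPT", {VPST.Ops[0], MI.Ops[0], MI.Ops[1], MI.Ops[2]}});
      ++I;  // the VPST has been folded into the VPT, so skip it
      continue;
    }
    Out.push_back(MI);
  }
  return Out;
}
```

This sketch omits legality checks, such as whether the compare operands are redefined between the VCMP and the VPST.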
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1302 | Shouldn't you be searching for any VCMP opcode? RDA would be a nicer way of finding the VPR def, but that should be unnecessary anyway; I'm pretty certain the VCMP should be the 'Divergent' instruction.
llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination.ll
1 | Probably best not to run at -O3, just in case upstream/downstream have different optimisation pipelines.
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1302 | Checking the Divergent instruction is much better, thanks. I also realised that I wasn't decrementing I when the ++I == E check failed, so this new logic is more robust.
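The iterator problem described in this reply is a common pitfall. Suppose a loop peeks ahead with `if (++I == E)`. That moves the loop iterator itself, so every path where the check fails must step it back. `std::next` avoids this: it returns an advanced copy and leaves the loop iterator alone. The sketch below illustrates the pattern in general terms. The `Block` list of opcode strings and `nextIsVPST` are hypothetical; this is not the patch's actual code.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Toy basic block holding only opcode names (hypothetical; not LLVM's API).
using Block = std::list<std::string>;

// Peek at the instruction after I without moving the caller's loop iterator.
// std::next returns an advanced copy, so there is nothing to undo when the
// end check fails (unlike writing `if (++I == E)` in the loop itself).
bool nextIsVPST(Block::const_iterator I, Block::const_iterator E) {
  assert(I != E && "I must point at an instruction");
  Block::const_iterator Next = std::next(I);
  return Next != E && *Next == "VPST";
}
```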
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1326 | The VPT block pass has some very similar code. Do we need to check that Operand 1 and Operand 2 have not been modified between the VCMP and the point where we are materializing the VPT?
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1326 | That is a good spot. I'll submit a check for this in a follow-up.
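The check raised at line 1326 can be sketched in C++ using a simplified, hypothetical model in which each instruction lists the registers it defines and uses. `Instr` and `regsUnmodifiedBetween` are illustrative names, not LLVM API. The function reports whether any of the VCMP's source registers is redefined strictly between the VCMP and the position where the VPT would be materialized. In LLVM itself, such a query might go through ReachingDefAnalysis (mentioned earlier in the review) or mirror the similar code in the VPT block pass. That is an assumption; this sketch does not reflect the follow-up's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical instruction model: registers written (Defs) and read (Uses).
struct Instr {
  std::string Opcode;
  std::vector<std::string> Defs;
  std::vector<std::string> Uses;
};

// Returns true if none of Regs is redefined by any instruction strictly
// between positions From (e.g. the VCMP) and To (e.g. where the VPT goes).
bool regsUnmodifiedBetween(const std::vector<Instr> &Block, std::size_t From,
                           std::size_t To,
                           const std::vector<std::string> &Regs) {
  assert(From <= To && To <= Block.size());
  for (std::size_t I = From + 1; I < To; ++I)
    for (const std::string &D : Block[I].Defs)
      for (const std::string &R : Regs)
        if (D == R)
          return false;  // an intervening write would change the compare
  return true;
}
```

If this returns false, folding the VCMP into a VPT at the later position would compare different values, so the combination should be skipped.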