This patch adds support for combining a VPST with a dangling VCMP from a previous VPT block.
Details
Diff Detail
Event Timeline
Is it possible to write some tests that are hopefully simple but still tail predicated, and contain different kinds of vpt blocks with various instructions in various orders? It probably doesn't matter if the instructions are super sensible so long as they show the different sets of blocks with vcmps followed by predicated/unpredicated instructions.
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1540–1541 | I think we need to make sure this _is_ the last instruction in the block. Otherwise we can't remove it. |
I can try!
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1540–1541 | Shouldn't it be removable as long as instructions after it in the block don't use the VCMP's output? |
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1540–1541 | The instructions after the VCMP in the vpt block will use the VCMP's predicate, as opposed to the one at the start of the block. VPST VCMP VORR .. VPST The VORR should be using the vpr from the VCMP (and so it can't be removed), and there's probably not a lot of point combining it into the VPST, as the predicate is already set. I couldn't spot anything that check for that already. |
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1540–1541 | Sure for that example the VCMP shouldn't be changed. But if the instructions after the VCMP don't use the vpr, then it can be removed right? That, rather than it being the last instruction, is the fundamental requirement I think you're getting at. This would be fine, for example: VPST VCMP ADD (scalar) SUB (scalar) VPST |
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1540–1541 | Yep, that sounds fine. I meant last instruction in the VPT block, not last instruction in the basic block. I guess put better - there cannot be any uses of vpr between the VCMP that i being removed and the VPST that is being converted into the VPT. |
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1488–1508 | It may be better if this takes VCMP as an argument - that would help keep the logic around here clearer where things are being used. | |
1540–1541 | I feel like this should be using : nullptr, even if just to keep it simpler. From what I can tell the older VCMP wouldn't ever be useful after the point that we have seen something that has altered the vpr predicate. | |
1576 | Checking that an operand is vpr may be better. | |
1594–1595 | I _think_ this case would be invalid input, and not something we would have to worry about (it could be an assert - ideally there would be code in the verifier that checked that VPT blocks were always valid, but no one has written that yet). | |
llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination-across-blocks.mir | ||
7 | If you remove the ".entry" from "bb.0.entry" below, you can probably remove these skeletons too. | |
171–183 | A lot of this can often be removed. | |
348–349 | I don't think this is valid input, with the VPST not creating a block with any instruction and the MVE_VORR not being in a the previous block. Same for some of the tests below. Maybe make sure the last VPST block has some instruction in it, and the middle VPST has the correct block mask |
Check for VPR use rather than predicate, pass VCMP as argument to ReplaceVCMPWithVPT, make nullptr the default VCMP value and remove impossible VPST vase.
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1488–1508 | Sure! | |
1540–1541 | Ah yes, I missed you specifying it being the last in the *vpt* block. | |
1540–1541 | That's a good point! | |
1576 | 👍 | |
1594–1595 | I added this when I was creating test cases but it makes sense that this wouldn't happen naturally. |
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1570 | I think this should always be true. Maybe turn it into an assert? | |
1572 | The if VCMP can probably be hoisted to the Insts.front()->getOpcode() == ARM::MVE_VPST if. | |
llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination-across-blocks.mir | ||
284 | MVE_VPST 8 I think, for this block | |
370 | Should be MVE_VPST 4 I think | |
372 | 0 -> 1 for "ARMVCC" operand |
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1570 | 👍 | |
1572 | 👍 | |
llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination-across-blocks.mir | ||
7 | That's much cleaner | |
171–183 | Very nice | |
284 | 👍 | |
348–349 | That's true! I've fixed the blocks and renamed the functions to make it more clear what they are testing. | |
370 | 👍 | |
372 | I made it 0 intentionally to simulate a non-predicated vpr use. |
llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination-across-blocks.mir | ||
---|---|---|
370 | Actually the second instruction isn't predicated so this should be 8. |
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1595 | Can you clang-format this? | |
1596 | Is it possible to have: VPST VORR vpr VCMP vpr .. VPST VORR vpr .. VPST VORR vpr And would this attempt to fold the VCMP into the last vptblock? Do we have a test for that case? | |
1597 | Do we need to do the same thing to remove kill flags as D90964? | |
llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination-across-blocks.mir | ||
188–191 | Some of these still look a little funny, like they would not really be valid inputs. Are they to test the various edge cases? |
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1595 | Ah yes, sorry | |
1597 | Hmm I'm not sure. From what I've seen of the pass' output. the kill flags are still valid when the instruction is removed. | |
llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination-across-blocks.mir | ||
188–191 | The VPST mask does need to be changed. What else looks funny? |
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1587 | This could be cleaned up by using std::any_of to iterate through the instructions and then also using the hasVPRUse helper too. |
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1596 | This becomes VORR VPT VORR vpr VPST VORR vpr And I have added a test case for this. |
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1587 | Oh nice, I didn't know these existed. |
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1569 | If Next is only use in an assert, be careful that it does not cause a warning in release mode. | |
1588 | Same if this is only used in an assert. I'm still not 100% sure that this will always hold true with VPNOT or VPSEL (or anything else in the future as things change. They are not supported in TP loops yet but theoretically could be and things like this could easily with how subtle they are). It would probably be better as an if, to be honest. It's only a minor loss of perf if it does happen, and that way it's safer. | |
1597 | It would be something like: VPST 4 VLDR $vpr VCMP q0, q1, $vpr VORR q2, killed q1, $noreg VPST 8 VORR ... $vpr The unpredicated VORR kills the register that the VCMP will still use. | |
llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination-across-blocks.mir | ||
197 | This can be MVE_VPST 8 for a single instruction block. |
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1597 | Oh I see. I thought that since the VCMP would use q1 on the next iteration it would still count as being alive. |
It may be better if this takes VCMP as an argument - that would help keep the logic around here clearer where things are being used.