There is a case where the new VPST created would precede the un-merged VCMP and so would fail a predicate mask assertion since the VCMP wasn't predicated. This is solved by converting the VCMP to a VPT instead of inserting the new VPST.
Details
Diff Detail
Unit Tests
Event Timeline
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1534 | We are sure here that Divergent will not be nullptr? So ++ on the iterator is valid? | |
1569 | Which VPST is this? The one for the beginning of this block, or the beginning of the next block? I think if we have a VCMP in the middle of a block we can convert it to a VPT always. Which looks like mostly what we have here now, but the folding should happen between the VPST of a block and a VCMP prior to it, if there is one. Is it worth making getDivergent loop until the last but one instruction in the block, so that it doesn't return divergent if the last instruction is producing a predicate. Then we just remove the predicate as needed. | |
llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination.ll | ||
44 | Can you remove hidden and the "local_unnamed_addr #0" |
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1534 | There is guaranteed to be a divergent since the block is not uniformly predicated. | |
1569 | This is the one at the start of the loop body block. I can't quite wrap my head around the changes your suggesting. Perhaps they should come as a separate patch later on? | |
llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination.ll | ||
44 | Ah yes, I missed that. |
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1571 | I'm still not sure this is the correct VPST to be looking at, in order to replace a VCMP with a VPT. I think either that this will be the last instruction in the block, in which case it needs to be folded into the _next_ instruction, or it will be in the middle of a block that is handled below. Oh, whilst I write this I see you have a patch for that, fantastic. Is this if block needed, or can it be removed without altering anything else? |
Don't merge a VCMP and a preceeding VPST.
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
1571 | Yes you're right! Thanks for pointing that out. Removing this if-statement can be done without changing anything else. My follow-up patch will then add support for merging across block boundaries. |
We are sure here that Divergent will not be nullptr? So ++ on the iterator is valid?