This is an archive of the discontinued LLVM Phabricator instance.

[ARM][LowOverheadLoops] Merge a VCMP and the new VPST into a VPT
ClosedPublic

Authored by samtebbs on Oct 30 2020, 7:51 AM.

Details

Summary

There is a case where the new VPST created would precede the un-merged VCMP and so would fail a predicate mask assertion since the VCMP wasn't predicated. This is solved by converting the VCMP to a VPT instead of inserting the new VPST.

Diff Detail

Event Timeline

samtebbs created this revision.Oct 30 2020, 7:51 AM
samtebbs requested review of this revision.Oct 30 2020, 7:51 AM
dmgreen added inline comments.Oct 30 2020, 9:28 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1534

We are sure here that Divergent will not be nullptr? So ++ on the iterator is valid?

1580

Which VPST is this? The one for the beginning of this block, or the beginning of the next block?

I think if we have a VCMP in the middle of a block we can convert it to a VPT always.
And if we have a VCMP at the end of a block, we may be able to fold it into the _next_ block if this check that the defs are the same holds.

Which looks like mostly what we have here now, but the folding should happen between the VPST of a block and a VCMP prior to it, if there is one.

Is it worth making getDivergent loop until the last but one instruction in the block, so that it doesn't return divergent if the last instruction is producing a predicate. Then we just remove the predicate as needed.

llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination.ll
44

Can you remove hidden and the "local_unnamed_addr #0"

samtebbs added inline comments.Nov 2 2020, 2:59 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1534

There is guaranteed to be a divergent since the block is not uniformly predicated.

1580

This is the one at the start of the loop body block. I can't quite wrap my head around the changes your suggesting. Perhaps they should come as a separate patch later on?

llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination.ll
44

Ah yes, I missed that.

samtebbs updated this revision to Diff 302249.Nov 2 2020, 5:26 AM

Clean up the test.

samtebbs marked an inline comment as done.Nov 2 2020, 5:26 AM
samtebbs updated this revision to Diff 303418.Nov 6 2020, 5:41 AM

Add a check for the VPR def at the VPST.

dmgreen added inline comments.Nov 6 2020, 6:40 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1571

I'm still not sure this is the correct VPST to be looking at, in order to replace a VCMP with a VPT. I think either that this will be the last instruction in the block, in which case it needs to be folded into the _next_ instruction, or it will be in the middle of a block that is handled below.

Oh, whilst I write this I see you have a patch for that, fantastic. Is this if block needed, or can it be removed without altering anything else?

samtebbs updated this revision to Diff 303471.Nov 6 2020, 8:51 AM
samtebbs marked an inline comment as done.
samtebbs edited the summary of this revision. (Show Details)

Don't merge a VCMP and a preceeding VPST.

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1571

Yes you're right! Thanks for pointing that out. Removing this if-statement can be done without changing anything else.

My follow-up patch will then add support for merging across block boundaries.

dmgreen accepted this revision.Nov 6 2020, 9:16 AM

Thanks. In that case, LGTM!

This revision is now accepted and ready to land.Nov 6 2020, 9:16 AM