This is an archive of the discontinued LLVM Phabricator instance.

[ARM][LowOverheadLoops] Merge VCMP and VPST across VPT blocks
ClosedPublic

Authored by samtebbs on Nov 6 2020, 6:12 AM.

Details

Summary

This patch adds support for combining a VPST with a dangling VCMP from a previous VPT block.

Diff Detail

Event Timeline

samtebbs created this revision.Nov 6 2020, 6:12 AM
samtebbs requested review of this revision.Nov 6 2020, 6:12 AM

Is it possible to write some tests that are hopefully simple but still tail predicated, and contain different kinds of vpt blocks with various instructions in various orders? It probably doesn't matter if the instructions are super sensible so long as they show the different sets of blocks with vcmps followed by predicated/unpredicated instructions.

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1537–1538

I think we need to make sure this _is_ the last instruction in the block. Otherwise we can't remove it.

Is it possible to write some tests that are hopefully simple but still tail predicated, and contain different kinds of vpt blocks with various instructions in various orders? It probably doesn't matter if the instructions are super sensible so long as they show the different sets of blocks with vcmps followed by predicated/unpredicated instructions.

I can try!

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1537–1538

Shouldn't it be removable as long as instructions after it in the block don't use the VCMP's output?

dmgreen added inline comments.Nov 9 2020, 9:52 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1537–1538

The instructions after the VCMP in the vpt block will use the VCMP's predicate, as opposed to the one at the start of the block.
So something like:

VPST
VCMP
VORR
..
VPST

The VORR should be using the vpr from the VCMP (and so it can't be removed), and there's probably not a lot of point combining it into the VPST, as the predicate is already set.

I couldn't spot anything that check for that already.

samtebbs added inline comments.Nov 10 2020, 6:17 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1537–1538

Sure for that example the VCMP shouldn't be changed. But if the instructions after the VCMP don't use the vpr, then it can be removed right? That, rather than it being the last instruction, is the fundamental requirement I think you're getting at. This would be fine, for example:

VPST
VCMP
ADD (scalar)
SUB (scalar)
VPST
dmgreen added inline comments.Nov 10 2020, 10:59 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1537–1538

Yep, that sounds fine. I meant last instruction in the VPT block, not last instruction in the basic block.

I guess put better - there cannot be any uses of vpr between the VCMP that i being removed and the VPST that is being converted into the VPT.

samtebbs updated this revision to Diff 304527.Nov 11 2020, 8:32 AM
samtebbs marked 2 inline comments as done.

Add more tests and check for instructions between the VCMP and VPST that use vpr.

dmgreen added inline comments.Nov 12 2020, 12:06 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1487–1507

It may be better if this takes VCMP as an argument - that would help keep the logic around here clearer where things are being used.

1537–1538

I feel like this should be using : nullptr, even if just to keep it simpler. From what I can tell the older VCMP wouldn't ever be useful after the point that we have seen something that has altered the vpr predicate.

1573

Checking that an operand is vpr may be better.

1591–1592

I _think_ this case would be invalid input, and not something we would have to worry about (it could be an assert - ideally there would be code in the verifier that checked that VPT blocks were always valid, but no one has written that yet).

llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination-across-blocks.mir
7

If you remove the ".entry" from "bb.0.entry" below, you can probably remove these skeletons too.

171–183

A lot of this can often be removed.

348–349

I don't think this is valid input, with the VPST not creating a block with any instruction and the MVE_VORR not being in a the previous block. Same for some of the tests below.

Maybe make sure the last VPST block has some instruction in it, and the middle VPST has the correct block mask

samtebbs updated this revision to Diff 304810.Nov 12 2020, 6:00 AM
samtebbs marked 5 inline comments as done.

Check for VPR use rather than predicate, pass VCMP as argument to ReplaceVCMPWithVPT, make nullptr the default VCMP value and remove impossible VPST vase.

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1487–1507

Sure!

1537–1538

Ah yes, I missed you specifying it being the last in the *vpt* block.

1537–1538

That's a good point!

1573

👍

1591–1592

I added this when I was creating test cases but it makes sense that this wouldn't happen naturally.

samtebbs updated this revision to Diff 304841.Nov 12 2020, 7:59 AM
samtebbs marked 3 inline comments as done.

Clean up tests

dmgreen added inline comments.Nov 12 2020, 10:04 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1567

I think this should always be true. Maybe turn it into an assert?

1569

The if VCMP can probably be hoisted to the Insts.front()->getOpcode() == ARM::MVE_VPST if.

llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination-across-blocks.mir
284

MVE_VPST 8 I think, for this block

370

Should be MVE_VPST 4 I think

372

0 -> 1 for "ARMVCC" operand

samtebbs added inline comments.Nov 13 2020, 2:52 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1567

👍

1569

👍

llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination-across-blocks.mir
7

That's much cleaner

171–183

Very nice

284

👍

348–349

That's true! I've fixed the blocks and renamed the functions to make it more clear what they are testing.

370

👍

372

I made it 0 intentionally to simulate a non-predicated vpr use.

samtebbs added inline comments.Nov 13 2020, 2:54 AM
llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination-across-blocks.mir
370

Actually the second instruction isn't predicated so this should be 8.

samtebbs updated this revision to Diff 305451.Nov 16 2020, 3:17 AM

Get VPR def when merging across blocks rather than keeping a pointer to the VCMP.

samtebbs marked 2 inline comments as done.Nov 16 2020, 3:17 AM
dmgreen added inline comments.Nov 16 2020, 3:50 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1592

Can you clang-format this?

1593

Is it possible to have:

VPST
VORR vpr
VCMP vpr
..
VPST
VORR vpr
..
VPST
VORR vpr

And would this attempt to fold the VCMP into the last vptblock? Do we have a test for that case?

1594

Do we need to do the same thing to remove kill flags as D90964?

llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination-across-blocks.mir
188–191

Some of these still look a little funny, like they would not really be valid inputs. Are they to test the various edge cases?

samtebbs added inline comments.Nov 17 2020, 2:25 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1592

Ah yes, sorry

1594

Hmm I'm not sure. From what I've seen of the pass' output. the kill flags are still valid when the instruction is removed.

llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination-across-blocks.mir
188–191

The VPST mask does need to be changed. What else looks funny?

samparker added inline comments.Nov 17 2020, 2:35 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1584

This could be cleaned up by using std::any_of to iterate through the instructions and then also using the hasVPRUse helper too.

samtebbs added inline comments.Nov 17 2020, 3:20 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1593

This becomes

VORR
VPT
VORR vpr
VPST
VORR vpr

And I have added a test case for this.

samtebbs updated this revision to Diff 305717.Nov 17 2020, 3:21 AM

Add another test case and format

samtebbs added inline comments.Nov 17 2020, 3:56 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1584

Oh nice, I didn't know these existed.

samtebbs updated this revision to Diff 305757.Nov 17 2020, 5:42 AM
samtebbs marked 3 inline comments as done.

Use std::any_of and hasVPRUse.

samparker added inline comments.Nov 17 2020, 8:52 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1523–1524

Can this TODO be removed now?

1564

Don't worry about putting it up for review, but before committing would you mind adding a comment about what condition(s) this logic is handling, like there is for the other conditional blocks?

samparker accepted this revision.Nov 18 2020, 2:47 AM

I'm happy if Dave has no other suggestions.

This revision is now accepted and ready to land.Nov 18 2020, 2:47 AM
dmgreen added inline comments.Nov 18 2020, 5:35 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1566

If Next is only use in an assert, be careful that it does not cause a warning in release mode.

1585

Same if this is only used in an assert.

I'm still not 100% sure that this will always hold true with VPNOT or VPSEL (or anything else in the future as things change. They are not supported in TP loops yet but theoretically could be and things like this could easily with how subtle they are).

It would probably be better as an if, to be honest. It's only a minor loss of perf if it does happen, and that way it's safer.

1594

It would be something like:

VPST 4
VLDR $vpr
VCMP q0, q1, $vpr
VORR q2, killed q1, $noreg
VPST 8
VORR ... $vpr

The unpredicated VORR kills the register that the VCMP will still use.

llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination-across-blocks.mir
198

This can be MVE_VPST 8 for a single instruction block.

samtebbs added inline comments.Nov 18 2020, 8:01 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1594

Oh I see. I thought that since the VCMP would use q1 on the next iteration it would still count as being alive.