This is an archive of the discontinued LLVM Phabricator instance.

[Target][ARM] Improvements to the VPT Block Insertion Pass
ClosedPublic

Authored by Pierre-vh on Mar 11 2020, 7:53 AM.

Details

Summary

This patch adds a feature to the MVE VPT block insertion patch: It can now remove VPNOT instructions in some circumstances in order to place the instructions predicated by the VPNOT in an else block instead; This results in more compact code.

For example, this pass used to generate this kind of assembly:

vldrw.u32	q1, [r5]
vpt.s32	ge, q1, r2                      ; Added by MVE-VPT block insertion pass
vcmpt.s32	le, q1, r3
vpnot
vpst	                                ; Added by MVE-VPT block insertion pass
vstrwt.32	q0, [r5], #16

Now, when the VPNOT's result (stored in VPR) is not needed, and when the above block has enough room, it'll generate this instead:

vldrw.u32	q1, [r5]
vpte.s32	ge, q1, r2                     ; Added by MVE-VPT block insertion pass - Notice the "te" instead of "t"
vcmpt.s32	le, q1, r3
vstrwe.32	q0, [r5], #16       ; "t" changed to "e", indicating that this instruction is now part of the "else"

This is much shorter and should be more efficient, and the pass will only remove the VPNOT when VPR is used+killed by one instruction or when VPR is written to in its block.
This should be enough to avoid losing the result of a VPNOT if it's needed, however I'm open to suggestions as I don't actually know if it's good enough.

Diff Detail

Event Timeline

Pierre-vh created this revision.Mar 11 2020, 7:53 AM
Pierre-vh edited the summary of this revision. (Show Details)Mar 11 2020, 8:01 AM
Pierre-vh edited the summary of this revision. (Show Details)Mar 11 2020, 8:20 AM
Pierre-vh updated this revision to Diff 249632.Mar 11 2020, 8:26 AM

Running clang-format on the file.

(My IDE was using its own settings instead of clang-format, this is why the style was incorrect)

Good to see this getting updated.

Does this handle predicates like TET/TEET/etc, where we go back to a Then predicate? Any instruction that defs the VPR should reset us back to a Then (so a VCMP will reset the predicate to Then for subsequent instructions in the block). Obviously a VPNOT can flip back to Then too.

Can you make sure we have tests for those situations? Best bet nowadays would probably be to use instrinsics to create some interesting testcases.

Pierre-vh added a comment.EditedMar 12 2020, 2:34 AM

Good to see this getting updated.

Does this handle predicates like TET/TEET/etc, where we go back to a Then predicate? Any instruction that defs the VPR should reset us back to a Then (so a VCMP will reset the predicate to Then for subsequent instructions in the block). Obviously a VPNOT can flip back to Then too.

Can you make sure we have tests for those situations? Best bet nowadays would probably be to use instrinsics to create some interesting testcases.

This was not a goal for this initial patch, I just focused on handling else blocks.
However, this is certainly something I can do if you want, but it should it be part of this patch or another patch?

Pierre-vh updated this revision to Diff 249877.Mar 12 2020, 3:13 AM

Updated the diff to fix a bug: The pass refused to generate vpsteee blocks because the condition in SkipPredicatedInstrs was wrong (It needed to do one more iteration when MaxSkips == 0, and then stop).

The pass can now correctly generate this kind of assembly:

	vpsteee	
	vcmpt.s32	le, q1, r12
	vcmpe.s32	lt, q1, r3
	vstrwe.32	q0, [r6], #16
	vstrwe.32	q0, [r4], #16
Pierre-vh planned changes to this revision.Mar 12 2020, 7:42 AM

This revision will need significant changes as we found a few issues with it.

Pierre-vh updated this revision to Diff 250230.Mar 13 2020, 9:14 AM

Here is the refactored version of this patch. I tried to make the structure a bit cleaner and easier to work on.

  • Made the function that expands the block masks more generic, it can now add one or more T/Es to any mask.
  • Split the code into multiple functions

Note that I still need to add tests for the new supported cases, I'll try to add them later today or on Monday.

Pierre-vh planned changes to this revision.Mar 16 2020, 3:47 AM

Once again it needs a bit of work before being ready for review. I need to add support for TETE masks and add a few tests for this improved pass.

Pierre-vh updated this revision to Diff 250534.Mar 16 2020, 5:37 AM
  • Added support for TETE masks to the pass
  • Added multiple MIR tests for every possible mask that can be generated by the pass
  • Added a few LLVM IR tests, unfortunately this does not cover every possible cases as spilling prevents other masks from being generated. I'll look into how this can be improved as a separate patch.

I think this should be ready for review.

Pierre-vh updated this revision to Diff 250704.Mar 17 2020, 2:15 AM
Pierre-vh marked an inline comment as done.Mar 17 2020, 2:26 AM
Pierre-vh added inline comments.
llvm/lib/Target/ARM/MVEVPTBlockPass.cpp
319

This shouldn't be here and has been fixed locally.
However, I won't send a new diff just for a single line, so I'll push it when I have more meaningful changes to accompany it.

Pierre-vh updated this revision to Diff 251024.Mar 18 2020, 2:58 AM

Another patch to fix a bug - the predicate was not switching back to T on the second VPNOT found.

Pierre-vh updated this revision to Diff 251963.Mar 23 2020, 1:58 AM
  • The BlockMask enum is now an enum-class, so I had to make some changes to the pass.
dmgreen added inline comments.Mar 23 2020, 4:07 AM
llvm/lib/Target/ARM/MVEVPTBlockPass.cpp
97

Is this only called with a Count of 1 now? If so, can it be simplified.

101

Add a message for the assert.

134

Maybe name this "AddPredicatedInstruction", as we conceptually "adding" them to the block, as opposed to skipping them.

161

Can this use killsRegister?

177

Add a message to the assert.

201

Was VPTThenInstCnt renamed to something else? BlockSize?

248–251

Maybe just use:

CurrentPredicate = CurrentPredicate == ARMVCC::Then ? ARMVCC::Else : ARMVCC::Then;
260–263

This doesn't look like it needed to change? Unless the line after is too long too?

270–271

Why use a uint64_t?

Pierre-vh marked 11 inline comments as done.Mar 23 2020, 4:44 AM

(I've marked some comments as Done because I changed those locally, I'll update the diff once the review is complete (2 comments left))

llvm/lib/Target/ARM/MVEVPTBlockPass.cpp
134

I named the function "Skip" because it advances the iterator (it skips the predicated instruction in front of the iterator), this function doesn't really "care" about the block as a whole, so for me calling it "AddPredicatedInstructions" doesn't make sense.

201

It's indeed "BlockSize" now, I'll change this comment.

260–263

You are right, it doesn't need to change. It was probably changed by mistake by me or clang-format.

270–271

It's the type of the argument of addImm below.
Of course I can also use unsigned if you prefer, or I can change this to ARM::PredBlockMask and convert to uint64_t when calling addImm.

dmgreen added inline comments.Mar 24 2020, 4:26 AM
llvm/lib/Target/ARM/MVEVPTBlockPass.cpp
134

Hmm. "Skip" implied to me that we would not end up doing anything with these instructions. Like we would skip instructions between VPT blocks that are not predicated. We just leave them alone.

How about "StepOver" instead?

270–271

unsigned is probably fine, it will fit in either case. If the printing is still simple, PredBlockMask would probably be best. Up to you.

Pierre-vh marked 6 inline comments as done.Mar 24 2020, 5:05 AM
Pierre-vh added inline comments.
llvm/lib/Target/ARM/MVEVPTBlockPass.cpp
134

StepOverPredicatedInstrs is good I think. I'll change it.

270–271

I'll do something like this then:

ARM::PredBlockMask BlockMask = CreateVPTBlock(MBIter, EndIter, DeadInstructions);
// ...
      MIBuilder.addImm((uint64_t)BlockMask);
// ...
      MIBuilder.addImm((uint64_t)BlockMask);
Pierre-vh updated this revision to Diff 252336.Mar 24 2020, 8:31 AM
Pierre-vh marked 2 inline comments as done.
dmgreen accepted this revision.Mar 26 2020, 8:49 AM

LGTM

llvm/lib/Target/ARM/MVEVPTBlockPass.cpp
230

Find the register -> Find the operand holding the predicate

This revision is now accepted and ready to land.Mar 26 2020, 8:49 AM
Pierre-vh updated this revision to Diff 254145.Apr 1 2020, 3:14 AM

The patch had already been approved, but needed to be rebased.
Only llvm/test/CodeGen/Thumb2/mve-pred-threshold.ll caused a merge conflict and needed to be re-updated with utils/update_llc_test_checks.py, so it has changed a bit.

Pierre-vh requested review of this revision.Apr 1 2020, 3:15 AM

Please re-review the llvm/test/CodeGen/Thumb2/mve-pred-threshold.ll. Everything else should be good as they didn't change, they were only rebased.

dmgreen accepted this revision.Apr 1 2020, 3:21 AM

Oh yeah. Because we are now sinking float splats. Still LGTM.

This revision is now accepted and ready to land.Apr 1 2020, 3:21 AM
This revision was automatically updated to reflect the committed changes.