This is an archive of the discontinued LLVM Phabricator instance.

[ARM][MVE] Cleanup tail-predicated loop
ClosedPublic

Authored by samparker on Sep 18 2019, 7:13 AM.

Details

Summary

Remove any predicate that we replace with a vctp intrinsic, and try to remove their operands too. Also, look into the exit block to see if there's any duplicates of the predicates that we've replaced.

Diff Detail

Event Timeline

samparker created this revision.Sep 18 2019, 7:13 AM
samparker updated this revision to Diff 220839.Sep 19 2019, 4:56 AM

Now cloning VCTP in the exit block if needed. The reason being that if we create a real TP loop, then VPR will not hold the predicate.

samparker updated this revision to Diff 220848.Sep 19 2019, 6:04 AM

Added 'hasSideEffects' to VCTP because machine cse did a good job of removing the duplicate vctp in the exit block. I think always executing the 'extra' instruction outweighs the risk of having to spill/reload the vpr or not being able to perform the proper tail predication.

samparker updated this revision to Diff 220989.Sep 20 2019, 2:48 AM

Added a couple more tests.

samparker updated this revision to Diff 220992.Sep 20 2019, 3:13 AM

More tests.

Just curious, if the input is already a tail-predicated loop (e.g. coming from intrinsics), would it be possible we incorrectly start removing/inserting things?

lib/Target/ARM/MVETailPredication.cpp
459

I wouldn't mind if this is done in a helper function as TryConvert is becoming a pretty big function.

462

Do we need the extra MaybeDead bookkeeping? Can we just use Predicates?

samparker marked an inline comment as done.Sep 20 2019, 3:44 AM

So the pass shouldn't attempt to convert a loop that contains any vector intrinsics, other than masked.load and masked.store. So, we will actually need to do extra work for this to operate with MVE intrinsics. Even once we've done that, the pass only tries to remove the old icmp predicates, which we pattern match. If the user defined predicates match those that the vectorizer outputs, then there's no reason why we won't perform this transform. I would say that I'd add a test, but we don't have intrinsic support yet...

lib/Target/ARM/MVETailPredication.cpp
462

I'll take a look.

So the pass shouldn't attempt to convert a loop that contains any vector intrinsics, other than masked.load and masked.store. So, we will actually need to do extra work for this to operate with MVE intrinsics. Even once we've done that, the pass only tries to remove the old icmp predicates, which we pattern match. If the user defined predicates match those that the vectorizer outputs, then there's no reason why we won't perform this transform. I would say that I'd add a test, but we don't have intrinsic support yet...

Excellent, thanks for the reminder/clarification.

samparker updated this revision to Diff 220994.Sep 20 2019, 4:05 AM

Created Cleanup function.

samparker updated this revision to Diff 221026.Sep 20 2019, 7:10 AM

Added a few more tests.

SjoerdMeijer accepted this revision.Sep 23 2019, 2:10 AM

Thanks, looks good to me.

Nitpick/bikeshedding the title:

Cleanup tail-predicated loop

This is accurate when you know what this is doing, but one could get the wrong impression, like this is a NFC patch. Perhaps something like:

"Finalise tail-predicated loop, remove duplicate predicates"

Or something along those lines.

lib/Target/ARM/MVETailPredication.cpp
430

nit: don't need the brackets in this case (I just noticed because I started looking to see which blocks it closes as it is a bit difficult to see sometimes in phabricator).

This revision is now accepted and ready to land.Sep 23 2019, 2:10 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptSep 23 2019, 2:48 AM