This is an archive of the discontinued LLVM Phabricator instance.

[ARM][MVE] canTailPredicateLoop
ClosedPublic

Authored by SjoerdMeijer on Nov 5 2019, 7:31 AM.

Details

Summary

This implements TTI hook canTailPredicateLoop for MVE.
This is a first version, with the following restrictions:

  • only operate on single block loops,
  • only accept integer types, I will follow this up soon to add support for floats.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Nov 5 2019, 7:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 5 2019, 7:31 AM
samparker added inline comments.Nov 5 2019, 7:55 AM
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
1004
  • isHardwareLoopProfitable will already catch these and tries quite hard to get it right.
  • isa<> not dyn_cast.
  • The zext and truncs could likely be folded into memory options, so you're gonna need to check the uses / users.
  • sext would be the same.
1016

If we're checking for casts within the loop, I don't see why this is necessary, because we shouldn't then also have to then check all the types used.

Thanks for taking a look! Will take this on board.

SjoerdMeijer retitled this revision from [ARM][MVE] WIP: canTailPredicateLoop to [ARM][MVE] canTailPredicateLoop.
SjoerdMeijer edited the summary of this revision. (Show Details)
SjoerdMeijer marked 2 inline comments as done.
samparker added inline comments.Nov 7 2019, 7:13 AM
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
1009

... check uses/users.

1018

SmallVectorImpl<Instruction*>

1026

if (Stride != 1) return false

1051

isa

1070

Makes sense to just check the stride here and then we can exit early.

  • strides: now allowing 1 and -1, and added a test case for -1.
  • checking the operand of zext/sext/trunc to see if it is a load/store, and added a test case when trunc is actually allowed.
samparker added inline comments.Nov 7 2019, 11:47 PM
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
1010

You need to check the user of the load, not it's operand. Also the logic is wrong here, we need to accept extending loads and truncating stores.

1050

What is the complication of supporting half and float now?

1064

You don't need FirstStride, the uninitialised Stride holds the same info.

Thanks for checking. Got myself confused about the loads/stores, but I think we indeed want to accept extending loads and narrowing stores, so I've added checks and a bunch of tests:

  • for SEXT/ZEXT, the operand needs to be a load,
  • for TRUNC, the user needs to be a store.

And have also allowed the float types.

added one more FP test case.

Have you checked the current code generation for masked fpext/fptrunc load/stores? I have a strong recollection that the codegen is absolutely awful.

In an ideal world / in the long run, it would be better for the option of whether to predicate to be dependant on the vectorisation factor. I believe we will have cases where we can pick between a slower tail predicated loop or a quicker non-tail predicated loop, and we should be able to pick the quicker one. This implies that it should really be incorporated into the cost mechanism inside the vectoriser.

(Also the current method seems a bit fuzzy, even by vectoriser standards. It seems to essentially be trying to pre-computing what vectorisation factor it thinks will be produced, choosing whether to predicate and hoping that the vectoriser does end up picking the same factor. Without doing something it doesn't expect).

But I think you are correct that this is a sensible way forward, to get something working. We can adjust it as needed, and perhaps it will work OK for almost all cases.

It does need to option to turn it off though. I would expect that should be off by default for the moment until the backend part is in and we have verified that this is working well enough.

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
1065

Can you explain the reasoning for adding -1 strides? I would expect them to become vrev's and vmov's, both of which will be difficult to prove in the backend (if not outright incorrect).

Are you expecting any reverse shuffles to be cancelled out?

  • fptrunc/fpextends: the loops are not that bad actually, but there's some quite terrible codegen around that, so let's indeed reject this for now.
  • stride -1: I think that should be okay, but I do see that complicates things quite a bit for the backend, so let's indeed reject that for now too.
  • this is now enabled/disabled by exisiting option -disable-mve-tail-predication.
dmgreen added inline comments.Nov 11 2019, 10:47 AM
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
1088

Nice one.

samparker accepted this revision.Nov 13 2019, 2:26 AM

LGTM. Something I'll like to see in a follow-up patch would be preventing predication in the presence of fcmps.

This revision is now accepted and ready to land.Nov 13 2019, 2:26 AM

Cheers, will do.

This revision was automatically updated to reflect the committed changes.