This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Allow vecreduce_add in tail predicated loops
ClosedPublic

Authored by dmgreen on Aug 6 2020, 11:12 AM.

Details

Summary

I was tempted to try to change the MVETailPredication pass to a simpler "if loop has an activelanemask, convert it to a vctp" pass. Here I've just left it as is, but allowed vector reductions through into tail predicated loops.
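To illustrate why a vecreduce_add is safe inside a tail predicated loop, here is a minimal Python sketch of the idea, not the LLVM code from this patch. The function name `tail_predicated_sum`, the 4-lane width, and the choice to model inactive lanes as reading zero are all assumptions made for the example.

```python
def tail_predicated_sum(values, lanes=4):
    """Sum `values` in vector-sized chunks, masking off the tail lanes.

    Hypothetical model: `lanes` stands in for the vector width, and the
    per-iteration `sum(vec)` stands in for a vecreduce_add.
    """
    total = 0
    base = 0
    remaining = len(values)
    while remaining > 0:
        # Predicate for this iteration: only the first `remaining` lanes are active.
        mask = [i < remaining for i in range(lanes)]
        # Model of a predicated load: inactive lanes contribute zero.
        vec = [values[base + i] if mask[i] else 0 for i in range(lanes)]
        # The reduction only ever sees real data or zeros, so no scalar
        # epilogue loop is needed for the leftover elements.
        total += sum(vec)
        base += lanes
        remaining -= lanes
    return total
```

With ten elements and four lanes, the last iteration runs with only two lanes active, and the result still matches a plain scalar sum.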

Event Timeline

dmgreen created this revision. Aug 6 2020, 11:12 AM
dmgreen requested review of this revision. Aug 6 2020, 11:12 AM
samparker accepted this revision. Aug 7 2020, 1:14 AM

CodeGen looks a little better! LGTM

This revision is now accepted and ready to land. Aug 7 2020, 1:14 AM

Just a remark on this:

I was tempted to try to change the MVETailPredication pass to a simpler "if loop has an activelanemask, convert it to a vctp" pass.

After the rewrite of this pass to use activelanemask, that's essentially all there's left here, so I don't see the benefit of that (unless I'm missing something). But anyway, as Sam said, nice codegen changes.

After the rewrite of this pass to use activelanemask, that's essentially all there's left here

Yep, that's the idea, but taking it further. The pass currently starts off by finding set_loop_iterations hardware loop intrinsics, then searches through for masked loads/stores (but doesn't check for normal loads/stores), and can then get held up by other intrinsics like the reductions fixed here. Not much of that is really beneficial, and the whole pass could simply be converting active lane masks to vctps. That is, I think that is often better even if you don't produce a tail predicated loop at the end of it.
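The "convert active lane masks to vctps" idea rests on the two predicates describing the same set of lanes. A rough Python sketch of that correspondence follows, as an illustration rather than the pass itself. The helper names are hypothetical, a 4-lane vector is assumed, and overflow of the induction variable is ignored.

```python
def active_lane_mask(base, trip_count, lanes=4):
    """Model of llvm.get.active.lane.mask: lane i is on iff base + i < trip_count."""
    return [base + i < trip_count for i in range(lanes)]


def vctp(remaining, lanes=4):
    """Model of an MVE VCTP: lane i is on iff i < remaining elements."""
    return [i < remaining for i in range(lanes)]


def masks_agree(trip_count, lanes=4):
    """Check every vector iteration: the lane mask built from the induction
    variable matches a VCTP fed with the number of elements still to process."""
    for base in range(0, trip_count, lanes):
        if active_lane_mask(base, trip_count, lanes) != vctp(trip_count - base, lanes):
            return False
    return True
```

In this model the only thing the conversion needs is the count of remaining elements, `trip_count - base`, which is why it does not have to depend on which loads, stores, or reductions appear in the loop body.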

Although I'm not sure how much it would help in the grand scheme of things. If we are in the state where we have an active lane mask but don't, for whatever reason, produce a tail predicated loop, the codegen is going to be pretty bad either way.

This revision was automatically updated to reflect the committed changes.