This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Switch to using get.active.lane.mask when tail folding
Closed · Public

Authored by reames on Jul 6 2022, 12:34 PM.

Details

Summary

The motivation here is to a) bring us closer into alignment with AArch64 under the assumption that codepath is better tested, and b) simplify pattern matching in an upcoming change.

The immediate impact is a significant IR reduction but a fairly minimal change in the generated assembly. Due to a difference in expansion behavior, we now get a saturating add where the old code used a non-saturating one, but that's about it. The difference comes down to how overflow is handled; overflow doesn't seem to be possible here anyway, so the assembly codegen is arguably a minor regression. I don't expect that to matter in practice.
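As a rough illustration of the overflow point above, the sketch below models a lane mask computed once with a wrapping add and once with a saturating add. The function names, the fixed four-lane width, and the 32-bit index type are assumptions made for this example; it is not the actual RISC-V expansion nor the IR the vectorizer previously emitted. The two forms agree whenever `base + lane` does not overflow.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical fixed vector width, chosen only for illustration.
constexpr std::size_t kLanes = 4;
using Mask = std::array<bool, kLanes>;

// Wrapping form: lane i is active iff (base + i) mod 2^32 < n.
Mask laneMaskWrapping(std::uint32_t base, std::uint32_t n) {
  Mask m{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    const std::uint32_t lane = static_cast<std::uint32_t>(i);
    const std::uint32_t sum = static_cast<std::uint32_t>(base + lane);
    m[i] = sum < n;
  }
  return m;
}

// Saturating form: base + i clamps at UINT32_MAX instead of wrapping,
// so a lane whose index would overflow is never reported as active.
Mask laneMaskSaturating(std::uint32_t base, std::uint32_t n) {
  constexpr std::uint32_t kMax = std::numeric_limits<std::uint32_t>::max();
  Mask m{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    const std::uint32_t lane = static_cast<std::uint32_t>(i);
    const std::uint32_t sum =
        (base > kMax - lane) ? kMax : static_cast<std::uint32_t>(base + lane);
    m[i] = sum < n;
  }
  return m;
}
```

With `base = 8` and `n = 10`, both functions mark only the first two lanes active. They diverge only for a `base` within `kLanes` of the maximum index value, which is the situation the summary argues cannot arise here.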

Event Timeline

reames created this revision. Jul 6 2022, 12:34 PM
Herald added a project: Restricted Project. Jul 6 2022, 12:34 PM
reames requested review of this revision. Jul 6 2022, 12:34 PM

I don't see a vector IV being killed off in any of the changed tests. Am I missing something?

reames edited the summary of this revision. Jul 7 2022, 8:41 AM

No, you're completely right, I got myself confused. The vectorizer internally represents this as a test on a newly introduced vector IV, but later handling during code generation (still in the vectorizer) converts that back into a use of the scalar IV plus a broadcast. So the code actually generated doesn't involve the vector IV at all.
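To make the distinction concrete, here is a simplified model of the two representations described above: one carries a vector IV across iterations, the other rebuilds the lane indices from the scalar IV each iteration (a broadcast plus a step vector). All names, the vectorization factor of 4, and the use of a plain trip-count compare are assumptions for illustration; this does not reproduce the vectorizer's real IR.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vectorization factor, chosen only for illustration.
constexpr std::size_t kVF = 4;
using IVMask = std::array<bool, kVF>;

// Form 1: keep a vector IV <i, i+1, ..., i+VF-1> live across iterations
// and compare it against the trip count.
std::vector<IVMask> masksFromVectorIV(std::uint64_t tripCount) {
  std::vector<IVMask> out;
  std::array<std::uint64_t, kVF> vecIV{};
  for (std::size_t j = 0; j < kVF; ++j) vecIV[j] = j;
  for (std::uint64_t iv = 0; iv < tripCount; iv += kVF) {
    IVMask m{};
    for (std::size_t j = 0; j < kVF; ++j) m[j] = vecIV[j] < tripCount;
    out.push_back(m);
    for (std::size_t j = 0; j < kVF; ++j) vecIV[j] += kVF;  // vector step
  }
  return out;
}

// Form 2: keep only the scalar IV; each iteration forms
// broadcast(iv) + <0, 1, ..., VF-1> and compares that instead.
std::vector<IVMask> masksFromScalarIV(std::uint64_t tripCount) {
  std::vector<IVMask> out;
  for (std::uint64_t iv = 0; iv < tripCount; iv += kVF) {
    IVMask m{};
    for (std::size_t j = 0; j < kVF; ++j) m[j] = (iv + j) < tripCount;
    out.push_back(m);
  }
  return out;
}
```

Both functions produce the same sequence of masks for a given trip count, which is why no vector IV needs to survive into the generated code.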

craig.topper accepted this revision. Jul 7 2022, 1:51 PM

Thanks for clarifying. LGTM

This revision is now accepted and ready to land. Jul 7 2022, 1:51 PM
This revision was landed with ongoing or failed builds. Jul 8 2022, 10:25 AM
This revision was automatically updated to reflect the committed changes.