This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorizer] Leverage uniformity across unrolled iterations
Abandoned · Public

Authored by reames on Nov 13 2020, 11:35 AM.

Details

Reviewers
anna
fhahn
greened
Summary

(Note, this extends D91398 and probably won't make any sense unless you've looked at that first.)

When scalarizing a uniform expression, we currently only consider uniformity within a single vector of VF lanes (that is, one unrolled part). For some expressions, we can exploit the fact that the expression is uniform across all lanes of all UF unrolled parts. This patch teaches VPReplicateRecipe how to achieve this.

After this patch (and the previous one), we can lower a load from a loop-invariant address as a single scalar load, instead of emitting UF*VF scalar loads and relying on CSE to clean them up later.
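To make the counting concrete, here is a minimal sketch of how many scalar copies of a replicated instruction one vector-loop iteration would contain under each notion of uniformity. The names `Uniformity` and `scalarCopies` are hypothetical and are not LLVM APIs. The middle case (one copy per unrolled part) is my reading of "uniformity within a single vector" above, not something taken from the patch itself.

```cpp
#include <cstddef>

// Hypothetical model, not LLVM code: how uniform is the scalarized value?
enum class Uniformity {
  None,        // differs per lane: one scalar copy per lane of every part
  PerPart,     // uniform within one unrolled part: one copy per part
  AcrossParts  // uniform across all lanes of all parts: a single copy
};

// Number of scalar copies emitted per vector-loop iteration for a given
// vectorization factor (VF) and unroll factor (UF).
constexpr std::size_t scalarCopies(Uniformity U, std::size_t VF,
                                   std::size_t UF) {
  switch (U) {
  case Uniformity::None:
    return UF * VF;
  case Uniformity::PerPart:
    return UF;
  case Uniformity::AcrossParts:
    return 1;
  }
  return UF * VF; // conservative fallback for an unexpected enum value
}
```

With VF = 4 and UF = 2, this model gives 8, 2, and 1 copies respectively. The last case corresponds to the single scalar load described above.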

I'd hoped to exercise this through code paths not involving uniform mem ops, but the cases I tried were mostly covered by existing scalarization logic. I believe this will sometimes trigger with existing code, but I have struggled to find a clean example, so I made the patch dependent on the uniform memory op work.

Event Timeline

reames created this revision. Nov 13 2020, 11:35 AM
Herald added a project: Restricted Project. Nov 13 2020, 11:35 AM
reames requested review of this revision. Nov 13 2020, 11:35 AM
lebedev.ri added inline comments.
llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll, line 122

Precommit this?

reames updated this revision to Diff 307228. Nov 23 2020, 5:52 PM

Rebase on landed test as requested.

reames abandoned this revision. Oct 15 2021, 12:00 PM

Abandoning an old review I'm not going to return to any time soon.