This is an archive of the discontinued LLVM Phabricator instance.

[LV] Use speculatability within entire loop to avoid strided load predication
ClosedPublic

Authored by anna on Mar 8 2023, 2:26 PM.

Details

Summary

Use existing functionality for identifying total access size by strided
loads. If we can speculate the load across all vector iterations, we can
avoid predication for these strided loads (or masked gathers in
architectures which support it).

Diff Detail

Event Timeline

anna created this revision.Mar 8 2023, 2:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2023, 2:26 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
anna requested review of this revision.Mar 8 2023, 2:26 PM
anna updated this revision to Diff 503790.Mar 9 2023, 8:30 AM

updated tests

anna edited the summary of this revision. (Show Details)Mar 9 2023, 8:33 AM
anna added reviewers: reames, fhahn.
anna added a comment.Mar 9 2023, 9:36 AM

AFAICT, the only thing which prevented us from figuring out dereferencability for strided loads (i.e. accesses with gaps) was identifying the correct AccessSize. So, that's basically the patch.

llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll
1023 ↗(On Diff #503790)

One thing I noticed is that we drop the inbounds on GEPs when we converted the masked loads to unmasked versions (perhaps because we cannot prove if the inbounds is correct without the predication?). We do not do the same "dropping of inbounds" when we removed predication for the strided case. Any idea why is that? It looks like we should be dropping on the strided case, but I don't know the LV code well enough to see where this is done and what is missing.

reames added inline comments.Mar 9 2023, 9:44 AM
llvm/lib/Analysis/Loads.cpp
294–304

Ignore is confusing here. "ignore" sounds like we might have a latent correctness issue here. What I think you mean is that we're being conservative on overlapping accesses.

Also, your TODO doesn't sound right to me. You'd want something along the lines of TC * Step + EltSize - Step.

anna added inline comments.Mar 9 2023, 10:02 AM
llvm/lib/Analysis/Loads.cpp
294–304

Good catch. TC * max(Step, EltSize) gets extra bytes without accounting for overlapping access.

anna updated this revision to Diff 505486.Mar 15 2023, 7:43 AM
anna edited the summary of this revision. (Show Details)

updated comment.

anna marked an inline comment as done.Mar 15 2023, 7:43 AM
reames accepted this revision.Mar 15 2023, 2:41 PM

LGTM w/minor comment

llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll
1158 ↗(On Diff #505486)

Remove TODO

This revision is now accepted and ready to land.Mar 15 2023, 2:41 PM
fhahn accepted this revision.Mar 20 2023, 12:36 AM

LGTM with the TODO in the test removed, thanks!

anna marked an inline comment as done.Mar 21 2023, 7:32 AM
anna added inline comments.
llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll
1023 ↗(On Diff #503790)

Just to loop back on this:
I did some digging into history of where this inbounds drop was introduced. It was here: https://reviews.llvm.org/D111846. Also, there is a specific comment stating we do not need to drop inbounds (and other poison generating flags) when the original instructions are gather/scatter. If backends convert the gather/scatter into use "base + offsets", those backends need fixing (just paraphrasing from the comment here: https://reviews.llvm.org/D111846#3098547).

So, I'll go ahead and land this change.

anna updated this revision to Diff 506975.Mar 21 2023, 7:41 AM

removed todo

This revision was landed with ongoing or failed builds.Mar 21 2023, 9:08 AM
This revision was automatically updated to reflect the committed changes.