Use the existing functionality for computing the total access size of
strided loads. If we can speculate the load across all vector iterations,
we can avoid predication for these strided loads (or masked gathers on
architectures that support them).
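
For readers unfamiliar with the pattern, here is a minimal C++ sketch of the kind of source loop the summary describes. The names `sumStrided`, `base`, `cond`, and `N` are hypothetical and are not taken from the patch or its tests. The buffer size and trip count are fixed, so every `base[2 * i]` is in bounds whether or not `cond[i]` holds. That is the property that would, in principle, let the conditional load be speculated instead of predicated.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical trip count; chosen only for illustration.
constexpr std::size_t N = 1024;

// Conditional load with a stride of two elements from a buffer of known size.
// All N strided addresses lie inside `base`, so loading unconditionally and
// selecting afterwards would read only dereferenceable memory.
int32_t sumStrided(const int32_t (&base)[2 * N], const bool (&cond)[N]) {
  int32_t sum = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (cond[i])
      sum += base[2 * i]; // strided load guarded by a condition
  return sum;
}
```

Whether the vectorizer actually drops the mask for a loop like this depends on what it can prove about the pointer and the trip count; the sketch only shows the shape of the access.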
Repository: rG LLVM Github Monorepo
Event Timeline
AFAICT, the only thing that prevented us from figuring out dereferenceability for strided loads (i.e. accesses with gaps) was identifying the correct AccessSize. So, that's basically the patch.
llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll | ||
---|---|---|
1023 | One thing I noticed is that we drop the inbounds on GEPs when we convert the masked loads to unmasked versions (perhaps because we cannot prove that the inbounds is correct without the predication?). We do not do the same "dropping of inbounds" when we remove predication for the strided case. Any idea why that is? It looks like we should be dropping it in the strided case too, but I don't know the LV code well enough to see where this is done and what is missing. |
llvm/lib/Analysis/Loads.cpp | ||
---|---|---|
294 | "Ignore" is confusing here; it sounds like we might have a latent correctness issue. What I think you mean is that we're being conservative on overlapping accesses. Also, your TODO doesn't sound right to me. You'd want something along the lines of TC * Step + EltSize - Step. |
llvm/lib/Analysis/Loads.cpp | ||
---|---|---|
294 | Good catch. TC * max(Step, EltSize) gets extra bytes without accounting for overlapping access. |
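
To make the two bounds in this thread concrete, here is a small C++ sketch. The helper names are hypothetical and this is not the code in Loads.cpp. It assumes TC accesses of EltSize bytes each, starting at byte offsets 0, Step, ..., (TC - 1) * Step, and it ignores integer overflow.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helpers for illustration only; not LLVM APIs.

// Conservative bound discussed above: TC * max(Step, EltSize).
constexpr uint64_t conservativeAccessSize(uint64_t TC, uint64_t Step,
                                          uint64_t EltSize) {
  return TC * std::max(Step, EltSize);
}

// Tight bound suggested in review: the last access starts at (TC - 1) * Step
// and covers EltSize bytes, i.e. TC * Step + EltSize - Step.
constexpr uint64_t tightAccessSize(uint64_t TC, uint64_t Step,
                                   uint64_t EltSize) {
  return TC == 0 ? 0 : TC * Step + EltSize - Step;
}

// Gaps between elements: Step = 8, EltSize = 4, TC = 4.
static_assert(tightAccessSize(4, 8, 4) == 28, "tight bound with gaps");
static_assert(conservativeAccessSize(4, 8, 4) == 32, "conservative with gaps");
```

For overlapping accesses (Step smaller than EltSize) the conservative formula over-approximates by more, for example 16 bytes against 10 for TC = 4, Step = 2, EltSize = 4. Requiring more dereferenceable bytes than necessary can only cause a missed optimization, not a miscompile.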
LGTM w/minor comment
llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll | ||
---|---|---|
1158 | Remove TODO |
llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll | ||
---|---|---|
1023 | Just to loop back on this: So, I'll go ahead and land this change. |