This is an archive of the discontinued LLVM Phabricator instance.

Relax load store vectorizer pointer strip checks
ClosedPublic

Authored by rampitec on Aug 1 2019, 12:15 PM.

Details

Summary

The previous change to fix crash in the vectorizer introduced
performance regressions. The condition to preserve pointer
address space during the search is too tight, we only need to
match the size.

Diff Detail

Repository
rL LLVM

Event Timeline

rampitec created this revision.Aug 1 2019, 12:15 PM
arsenm added inline comments.Aug 1 2019, 12:49 PM
test/Transforms/LoadStoreVectorizer/AMDGPU/vect-ptr-ptr-size-mismatch.ll
29–30 ↗(On Diff #212880)

It would probably be a more useful test to use an address space that will actually be vectorized (i.e. use global instead of flat)

rampitec marked an inline comment as done.Aug 1 2019, 12:55 PM
rampitec added inline comments.
test/Transforms/LoadStoreVectorizer/AMDGPU/vect-ptr-ptr-size-mismatch.ll
29–30 ↗(On Diff #212880)

It is neither flat or global, I have removed all references to AMD here, there is no triple and no calling convention, only data layout string.

tra added a comment.Aug 1 2019, 1:42 PM

It looks to me that the root cause is in

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
347–348 ↗(On Diff #212880)

If we were to use addrspacecast(1) instead of 5 in the example below, we'd proceed with the checks.
If we can deal with non-generic address spaces in principle, why can't we deal with address spaces that differ in pointer size? I'd assume that logic that determines consecutiveness should work the same.

I guess one way to handle mismatched address spaces would be to normalize the pointer to the common address space (generic?) and then run the checks for the consecutiveness.

test/Transforms/LoadStoreVectorizer/AMDGPU/vect-ptr-ptr-size-mismatch.ll
3 ↗(On Diff #212880)

It would be good to add a comment that p5:32:32 is the critical part here as that's what causes the original problem.

rampitec updated this revision to Diff 212900.Aug 1 2019, 1:56 PM
rampitec marked 2 inline comments as done.

Test update.

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
347–348 ↗(On Diff #212880)

In the meanwhile I am working on the followup patch to handle pointer size differences in graceful way. That requires more code and time and time though.

tra added inline comments.Aug 1 2019, 2:30 PM
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
347–348 ↗(On Diff #212880)

The reason SCEV is unhappy is that we're asking it to do something with different types. Bailing out early will avoid the problem, but it's still too conservative, IMO. We may still be able to do useful optimizations with such pointers.
Perhaps something like this might keep things working where possible until we have a better way: https://reviews.llvm.org/differential/diff/212903/

rampitec marked an inline comment as done.Aug 1 2019, 2:35 PM
rampitec added inline comments.
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
347–348 ↗(On Diff #212880)

Look, as i said I am working on the more relaxed patch to handle it. I am just trying to think about the corner cases, for example like this:

target datalayout = "e-p:64:64-p1:64:64-p5:32:32"

define amdgpu_kernel void @ext_ptr_overflow_unsigned(i32 addrspace(5)* %p) {
entry:
  %gep1 = getelementptr inbounds i32, i32 addrspace(5)* %p, i64 4294967297
  %gep2 = getelementptr inbounds i32, i32 addrspace(5)* %p, i64 4294967298
  %a.ascast = addrspacecast i32 addrspace(5)* %gep1 to i32*
  %b.ascast = addrspacecast i32 addrspace(5)* %gep2 to i32*
  %tmp1 = load i32, i32* %a.ascast, align 8
  %tmp2 = load i32, i32* %b.ascast, align 8
  unreachable
}
tra accepted this revision.Aug 1 2019, 3:03 PM
tra added inline comments.
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
347–348 ↗(On Diff #212880)

Fair enough. Let's land your patch as is to unblock things for now.

This revision is now accepted and ready to land.Aug 1 2019, 3:03 PM
rampitec marked an inline comment as done.Aug 1 2019, 3:03 PM
rampitec added inline comments.
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
347–348 ↗(On Diff #212880)

Looks like stripAndAccumulateInBoundsConstantOffsets() takes care of it, so offset always fits a smallest data type in chain.

This revision was automatically updated to reflect the committed changes.
rampitec marked 3 inline comments as done.
Herald added a project: Restricted Project. · View Herald TranscriptAug 1 2019, 3:19 PM
rampitec marked an inline comment as done.Aug 1 2019, 3:27 PM
rampitec added inline comments.
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
347–348 ↗(On Diff #212880)
arsenm added inline comments.Aug 2 2019, 8:36 AM
llvm/trunk/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
347

Why the store size?

rampitec marked an inline comment as done.Aug 2 2019, 9:49 AM
rampitec added inline comments.
llvm/trunk/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
347

This is memory access, so should be in memory size.