This is an archive of the discontinued LLVM Phabricator instance.

Skip bitcasts while looking for GEP in LoadStoreVectorizer
ClosedPublic

Authored by rampitec on Apr 14 2017, 4:15 PM.

Details

Reviewers
arsenm
vpykhtin

Diff Detail

Repository
rL LLVM

Event Timeline

rampitec created this revision.Apr 14 2017, 4:15 PM
arsenm edited edge metadata.Apr 14 2017, 4:25 PM

I don't think this should be needed? Is this representative of the case where you saw this as a problem? If I run this testcase through-separate-const-offset-from-gep first it handles it. This and other cases are expected to be cleaned up by that first

I don't think this should be needed? Is this representative of the case where you saw this as a problem? If I run this testcase through-separate-const-offset-from-gep first it handles it. This and other cases are expected to be cleaned up by that first

It is quite representative. Here is the real piece of code from app, right as it comes to the vectorizer:

%97 = zext i32 %96 to i64
%98 = getelementptr inbounds float, float addrspace(1)* %1, i64 %97
%99 = bitcast float addrspace(1)* %98 to i32 addrspace(1)*
%100 = load i32, i32 addrspace(1)* %99, align 4, !tbaa !10
%101 = add i32 %96, 1
%102 = zext i32 %101 to i64
%103 = getelementptr inbounds float, float addrspace(1)* %1, i64 %102
%104 = bitcast float addrspace(1)* %103 to i32 addrspace(1)*
%105 = load i32, i32 addrspace(1)* %104, align 4, !tbaa !10
%106 = add i32 %96, 2
%107 = zext i32 %106 to i64
%108 = getelementptr inbounds float, float addrspace(1)* %1, i64 %107
%109 = bitcast float addrspace(1)* %108 to i32 addrspace(1)*
%110 = load i32, i32 addrspace(1)* %109, align 4, !tbaa !10
%111 = add i32 %96, 3
%112 = zext i32 %111 to i64
%113 = getelementptr inbounds float, float addrspace(1)* %1, i64 %112
%114 = bitcast float addrspace(1)* %113 to i32 addrspace(1)*
%115 = load i32, i32 addrspace(1)* %114, align 4, !tbaa !10

I don't think this should be needed? Is this representative of the case where you saw this as a problem? If I run this testcase through-separate-const-offset-from-gep first it handles it. This and other cases are expected to be cleaned up by that first

It is quite representative. Here is the real piece of code from app, right as it comes to the vectorizer:

%97 = zext i32 %96 to i64
%98 = getelementptr inbounds float, float addrspace(1)* %1, i64 %97
%99 = bitcast float addrspace(1)* %98 to i32 addrspace(1)*
%100 = load i32, i32 addrspace(1)* %99, align 4, !tbaa !10
%101 = add i32 %96, 1
%102 = zext i32 %101 to i64
%103 = getelementptr inbounds float, float addrspace(1)* %1, i64 %102
%104 = bitcast float addrspace(1)* %103 to i32 addrspace(1)*
%105 = load i32, i32 addrspace(1)* %104, align 4, !tbaa !10
%106 = add i32 %96, 2
%107 = zext i32 %106 to i64
%108 = getelementptr inbounds float, float addrspace(1)* %1, i64 %107
%109 = bitcast float addrspace(1)* %108 to i32 addrspace(1)*
%110 = load i32, i32 addrspace(1)* %109, align 4, !tbaa !10
%111 = add i32 %96, 3
%112 = zext i32 %111 to i64
%113 = getelementptr inbounds float, float addrspace(1)* %1, i64 %112
%114 = bitcast float addrspace(1)* %113 to i32 addrspace(1)*
%115 = load i32, i32 addrspace(1)* %114, align 4, !tbaa !10

What does it look like immediately after SeparateConstOffsetFromGEP? Does one of the other passes break this somehow?

What does it look like immediately after SeparateConstOffsetFromGEP? Does one of the other passes break this somehow?

This is after SeparateConstOffsetFromGEP:

%408 = zext i32 %407 to i64
%409 = getelementptr inbounds float, float addrspace(1)* %1, i64 %408
%410 = bitcast float addrspace(1)* %409 to i32 addrspace(1)*
%411 = load i32, i32 addrspace(1)* %410, align 4, !tbaa !10
%412 = or i32 %407, 1
%413 = zext i32 %412 to i64
%414 = getelementptr inbounds float, float addrspace(1)* %1, i64 %413
%415 = bitcast float addrspace(1)* %414 to i32 addrspace(1)*
%416 = load i32, i32 addrspace(1)* %415, align 4, !tbaa !10
%417 = or i32 %407, 2
%418 = zext i32 %417 to i64
%419 = getelementptr inbounds float, float addrspace(1)* %1, i64 %418
%420 = bitcast float addrspace(1)* %419 to i32 addrspace(1)*
%421 = load i32, i32 addrspace(1)* %420, align 4, !tbaa !10
%422 = or i32 %407, 3
%423 = zext i32 %422 to i64
%424 = getelementptr inbounds float, float addrspace(1)* %1, i64 %423
%425 = bitcast float addrspace(1)* %424 to i32 addrspace(1)*
%426 = load i32, i32 addrspace(1)* %425, align 4, !tbaa !10

Bitcasts are there plus we have or's instead of adds.

rampitec updated this revision to Diff 95370.Apr 14 2017, 6:06 PM

Replaced code with standard stripPointerCasts().

Can you add a test with bitcasts between pointers with different element types? I thought the or problem was also supposed to be solved

Can you add a test with bitcasts between pointers with different element types? I thought the or problem was also supposed to be solved

Matt, what do you exactly mean by pointers with different element types? They are different in this test, float vs i32.

Can you add a test with bitcasts between pointers with different element types? I thought the or problem was also supposed to be solved

Matt, what do you exactly mean by pointers with different element types? They are different in this test, float vs i32.

I meant type sizes. float and i32 are both 4 bytes, I could see something going wrong if later code relied on this assumption if the source type were i8 for example

rampitec updated this revision to Diff 95633.Apr 18 2017, 2:27 PM

Added check for pointee size.

vpykhtin edited edge metadata.Apr 25 2017, 5:37 AM

LGTM.

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
295

May be turn it into a helper function?

vpykhtin accepted this revision.Apr 25 2017, 5:37 AM
This revision is now accepted and ready to land.Apr 25 2017, 5:37 AM
rampitec updated this revision to Diff 96587.Apr 25 2017, 10:04 AM
rampitec marked an inline comment as done.

Factored out getSourceGEP() function as per review.