
Details

Reviewers

volkan
arsenm
bogner

Commits

rGe7d26aceca07: Change the context instruction for computeKnownBits in LoadStoreVectorizer pass

Summary

This change enables cases for which the index value for the first load/store instruction
in a pair could be a function argument. This allows using llvm.assume to provide known
bits information in such cases.
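
As a rough illustration of why known bits matter here, the sketch below models the pass's overflow check with plain integers instead of `APInt`/`KnownBits`. The function name `canAddDiffWithoutOverflow` and its parameters are hypothetical and not part of LLVM; only the comparison mirrors the code in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model, not the LLVM API. `knownZero` has a 1 for every bit of the
// index value that is proven to be zero; `bitWidth` is the width of that value.
bool canAddDiffWithoutOverflow(std::uint64_t knownZero, std::uint64_t idxDiff,
                               unsigned bitWidth, bool isSigned) {
  assert(bitWidth >= 1 && bitWidth <= 64 && "model handles 1..64-bit values");
  std::uint64_t bitsAllowedToBeSet = knownZero;
  if (isSigned) {
    // The sign bit does not count as headroom for a signed index.
    bitsAllowedToBeSet &= ~(std::uint64_t{1} << (bitWidth - 1));
  }
  // Mirrors `if (BitsAllowedToBeSet.ult(IdxDiff)) return false;` in the pass.
  return bitsAllowedToBeSet >= idxDiff;
}
```

Under this model, an `llvm.assume` stating `(v0 & 3) == 0` on a function argument gives a known-zero mask of `0x3`, so an index difference of up to 3 is accepted while 4 is rejected. With no known bits at all, every nonzero difference is rejected.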

Diff Detail

Event Timeline

wvoquine created this revision. Apr 30 2021, 5:03 PM

Herald added a subscriber: hiraditya. Apr 30 2021, 5:03 PM

wvoquine requested review of this revision. Apr 30 2021, 5:03 PM

Herald added a project: Restricted Project. Apr 30 2021, 5:03 PM

Herald added subscribers: llvm-commits, wdng.

wvoquine added inline comments. Apr 30 2021, 5:46 PM

llvm/test/Transforms/LoadStoreVectorizer/X86/vectorize-i8-nested-add.ll
400	I've realized this should be "v2i8" in the test name.

Harbormaster completed remote builds in B102060: Diff 342084. Apr 30 2021, 6:44 PM

wvoquine updated this revision to Diff 342102. Apr 30 2021, 7:16 PM

Harbormaster completed remote builds in B102069: Diff 342102. Apr 30 2021, 8:22 PM

arsenm added inline comments. May 3 2021, 9:59 AM

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
521	How do we know A is the point the vectorized instruction will be inserted, and not B?
521	e.g. can you add some tests with an assume between the two instructions to vectorize?

wvoquine added a subscriber: rtereshin. May 3 2021, 10:31 AM

wvoquine added inline comments.

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
521	By the way, we have discussed some of these points with @rtereshin and @bogner.

	Neither OpA nor OpB is the point of vectorization. They both dominate the point of vectorization, but there's no information here about which one is closer to it.

	This change only enables more cases to pass. If OpB were nullptr, we would already have bailed out earlier in the function. See this part:

	    // At this point A could be a function parameter, i.e. not an instruction
	    Value *ValA = OpA->getOperand(0);
	    OpB = dyn_cast<Instruction>(OpB->getOperand(0));
	    if (!OpB || ValA->getType() != OpB->getType())
	      return false;

	I just allow it not to bail if OpA turns out to be nullptr. There's no assumption as to which one is at the point of vectorization or closer to it.

	Also, without llvm.assume I couldn't differentiate the two contexts in a test, so I added a test with llvm.assume: ld_v2i8_add_different_contexts.

	Sometimes OpA might provide a better context for computing known bits. In such cases we might fail to vectorize. If the llvm.assume calls are inconsistent at the two points, that ought to be UB. As a future improvement, we might need to use the actual vectorization point as the context.

wvoquine added inline comments. May 3 2021, 10:43 AM

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
521	Don't you think ld_v2i8_add_different_contexts would count as one, or do you prefer a test without control flow in it?
521	> There's no assumption as to which one goes at the point or closer to the point of vectorization.
	Having said that, when a case results from unrolling, OpB often naturally comes after OpA, which means its context is the more precise one (the contexts shouldn't be inconsistent).

wvoquine marked 2 inline comments as done. May 3 2021, 11:02 AM

arsenm added inline comments. May 4 2021, 10:34 AM

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
521	ld_v2i8_add_different_contexts does not have an assume between the two loads, so I don't mean that

wvoquine added inline comments. May 6 2021, 12:03 PM

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
521	I see some unexpected results when I place llvm.assume between loads, or just before the first load. I will have to debug a little bit, and will likely update the change. A safer version could be to still use OpA if it's not nullptr and OpB if it is.
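
The fallback described in this comment (keep OpA as the context when it is an instruction, otherwise use OpB) could be sketched as follows. `Instruction` is a hypothetical stand-in type and `pickContext` is a hypothetical helper, neither taken from the pass; the diff shown on this page passes OpB to `computeKnownBits` directly.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::Instruction, for illustration only.
struct Instruction {
  int id;
};

// Prefer OpA as the context instruction; fall back to OpB when OpA is not an
// instruction (for example, when the index is a function argument).
const Instruction *pickContext(const Instruction *opA, const Instruction *opB) {
  assert(opB && "the caller has already bailed out if OpB is not an instruction");
  return opA ? opA : opB;
}
```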

wvoquine added inline comments. May 6 2021, 7:28 PM

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
521	In fact, llvm.assume itself blocked the transformation because it is considered to write to memory. As in other passes, though, we can ignore the assume in this case.

Added tests where the assume intrinsic is placed between the vectorized instructions, or even after them at the end of the basic block.

Added the assume intrinsic to the list of ignored instructions when checking whether it is legal to vectorize over an instruction that may read or write memory.

Updated the test ld_v2i8_add_context to match its comment: the llvm.assume should be placed between the two loads.
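
The effect of ignoring the assume intrinsic during the legality scan can be modeled roughly as below. This is a self-contained sketch, not the pass's code: `Kind`, `Inst`, and `firstBlocker` are hypothetical names, and the alias checks the real pass performs on loads and stores are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for LLVM instructions.
enum class Kind { Load, Store, Assume, SideEffect, PseudoProbe, Call };

struct Inst {
  Kind kind;
  bool mayWrite;  // llvm.assume is conservatively treated as writing memory
  bool mayRead;
  bool mayThrow;
};

// Returns the index of the first instruction that stops the scan, or
// range.size() if nothing in the range blocks vectorization.
std::size_t firstBlocker(const std::vector<Inst> &range, bool isLoadChain) {
  for (std::size_t i = 0; i < range.size(); ++i) {
    const Inst &inst = range[i];
    if (inst.kind == Kind::Load || inst.kind == Kind::Store)
      continue;  // the real pass alias-checks these; not modeled here
    if (inst.kind == Kind::Assume || inst.kind == Kind::SideEffect ||
        inst.kind == Kind::PseudoProbe)
      continue;  // ignored intrinsics
    if (isLoadChain && (inst.mayWrite || inst.mayThrow))
      return i;
    if (!isLoadChain && (inst.mayRead || inst.mayWrite || inst.mayThrow))
      return i;
  }
  return range.size();
}

// Usage: an assume between two loads does not stop the scan.
void assumeBetweenLoadsExample() {
  const std::vector<Inst> range = {
      {Kind::Load, false, true, false},
      {Kind::Assume, true, false, false},
      {Kind::Load, false, true, false},
  };
  assert(firstBlocker(range, /*isLoadChain=*/true) == range.size());
}
```

Without the `Kind::Assume` case in the ignore list, the same range would stop at index 1, which matches the behavior described in the comment above.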

wvoquine added inline comments. May 6 2021, 9:02 PM

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
521	Now the tests `ld_v2i8_add_different_contexts1` and `ld_v2i8_add_context` have llvm.assume between the vectorized loads. Also, in `ld_v2i8_add_context1` the llvm.assume comes after both loads, with a store following it; this still works within a single basic block.

Harbormaster completed remote builds in B103124: Diff 343573. May 6 2021, 9:29 PM

wvoquine added inline comments. May 10 2021, 4:33 PM

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
521	I've added the tests along with the patch to let them show a difference (ignore llvm.assume itself when checking on memory conflicts between the two loads). Are there more concerns?

arsenm accepted this revision. May 11 2021, 5:19 PM

This revision is now accepted and ready to land. May 11 2021, 5:19 PM

Thanks for accepting!

Could someone merge the change?

lg. I'll merge this for you shortly

xbolva00 added a subscriber: xbolva00. May 12 2021, 3:26 PM

xbolva00 added inline comments.

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
678	isa<AssumeInst>

Closed by commit rGe7d26aceca07: Change the context instruction for computeKnownBits in LoadStoreVectorizer pass (authored by bogner). May 12 2021, 3:30 PM

This revision was automatically updated to reflect the committed changes.

bogner added a commit: rGe7d26aceca07: Change the context instruction for computeKnownBits in LoadStoreVectorizer pass.

wvoquine added inline comments. May 12 2021, 3:31 PM

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
678	Thanks! This could be addressed later on with the other intrinsics where that is applicable.

Diff 342084

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp

[...]
bool Vectorizer::lookThroughComplexAddresses(Value *PtrA, Value *PtrB,

unsigned BitWidth = ValA->getType()->getScalarSizeInBits();		unsigned BitWidth = ValA->getType()->getScalarSizeInBits();

// Third attempt:		// Third attempt:
// If all set bits of IdxDiff or any higher order bit other than the sign bit		// If all set bits of IdxDiff or any higher order bit other than the sign bit
// are known to be zero in ValA, we can add Diff to it while guaranteeing no		// are known to be zero in ValA, we can add Diff to it while guaranteeing no
// overflow of any sort.		// overflow of any sort.
if (!Safe) {		if (!Safe) {
OpA = dyn_cast<Instruction>(ValA);
if (!OpA)
return false;
KnownBits Known(BitWidth);		KnownBits Known(BitWidth);
computeKnownBits(OpA, Known, DL, 0, &AC, OpA, &DT);		computeKnownBits(ValA, Known, DL, 0, &AC, OpB, &DT);
APInt BitsAllowedToBeSet = Known.Zero.zext(IdxDiff.getBitWidth());		APInt BitsAllowedToBeSet = Known.Zero.zext(IdxDiff.getBitWidth());
if (Signed)		if (Signed)
BitsAllowedToBeSet.clearBit(BitWidth - 1);		BitsAllowedToBeSet.clearBit(BitWidth - 1);
if (BitsAllowedToBeSet.ult(IdxDiff))		if (BitsAllowedToBeSet.ult(IdxDiff))
return false;		return false;
}		}

const SCEV *OffsetSCEVA = SE.getSCEV(ValA);		const SCEV *OffsetSCEVA = SE.getSCEV(ValA);
[...]
for (Instruction &I : make_range(getBoundaryInstrs(Chain))) {
} else if (isa<IntrinsicInst>(&I) &&		} else if (isa<IntrinsicInst>(&I) &&
cast<IntrinsicInst>(&I)->getIntrinsicID() ==		cast<IntrinsicInst>(&I)->getIntrinsicID() ==
Intrinsic::sideeffect) {		Intrinsic::sideeffect) {
// Ignore llvm.sideeffect calls.		// Ignore llvm.sideeffect calls.
} else if (isa<IntrinsicInst>(&I) &&		} else if (isa<IntrinsicInst>(&I) &&
cast<IntrinsicInst>(&I)->getIntrinsicID() ==		cast<IntrinsicInst>(&I)->getIntrinsicID() ==
Intrinsic::pseudoprobe) {		Intrinsic::pseudoprobe) {
// Ignore llvm.pseudoprobe calls.		// Ignore llvm.pseudoprobe calls.
} else if (IsLoadChain && (I.mayWriteToMemory() || I.mayThrow())) {		} else if (IsLoadChain && (I.mayWriteToMemory() || I.mayThrow())) {
LLVM_DEBUG(dbgs() << "LSV: Found may-write/throw operation: " << I		LLVM_DEBUG(dbgs() << "LSV: Found may-write/throw operation: " << I
<< '\n');		<< '\n');
break;		break;
} else if (!IsLoadChain && (I.mayReadOrWriteMemory() || I.mayThrow())) {		} else if (!IsLoadChain && (I.mayReadOrWriteMemory() || I.mayThrow())) {
LLVM_DEBUG(dbgs() << "LSV: Found may-read/write/throw operation: " << I		LLVM_DEBUG(dbgs() << "LSV: Found may-read/write/throw operation: " << I
<< '\n');		<< '\n');
break;		break;
}		}
[...]

llvm/test/Transforms/LoadStoreVectorizer/X86/vectorize-i8-nested-add.ll

[...]
bb:
%tmp19 = insertelement <4 x i8> undef, i8 %tmp4, i32 0		%tmp19 = insertelement <4 x i8> undef, i8 %tmp4, i32 0
%tmp20 = insertelement <4 x i8> %tmp19, i8 %tmp8, i32 1		%tmp20 = insertelement <4 x i8> %tmp19, i8 %tmp8, i32 1
%tmp21 = insertelement <4 x i8> %tmp20, i8 %tmp13, i32 2		%tmp21 = insertelement <4 x i8> %tmp20, i8 %tmp13, i32 2
%tmp22 = insertelement <4 x i8> %tmp21, i8 %tmp18, i32 3		%tmp22 = insertelement <4 x i8> %tmp21, i8 %tmp18, i32 3
store <4 x i8> %tmp22, <4 x i8>* %dst		store <4 x i8> %tmp22, <4 x i8>* %dst
ret void		ret void
}		}

declare void @llvm.assume(i1)

define void @ld_v4i8_add_known_bits(i32 %ind0, i32 %ind1, i8* %src, <4 x i8>* %dst) {		define void @ld_v4i8_add_known_bits(i32 %ind0, i32 %ind1, i8* %src, <4 x i8>* %dst) {
; CHECK-LABEL: @ld_v4i8_add_known_bits(		; CHECK-LABEL: @ld_v4i8_add_known_bits(
; CHECK-NEXT: bb:		; CHECK-NEXT: bb:
; CHECK-NEXT: [[V0:%.*]] = mul i32 [[IND0:%.*]], 4		; CHECK-NEXT: [[V0:%.*]] = mul i32 [[IND0:%.*]], 4
; CHECK-NEXT: [[V1:%.*]] = mul i32 [[IND1:%.*]], 4		; CHECK-NEXT: [[V1:%.*]] = mul i32 [[IND1:%.*]], 4
; CHECK-NEXT: [[TMP:%.*]] = add i32 [[V0]], -1		; CHECK-NEXT: [[TMP:%.*]] = add i32 [[V0]], -1
; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[V1]], [[TMP]]		; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[V1]], [[TMP]]
; CHECK-NEXT: [[TMP2:%.*]] = sext i32 [[TMP1]] to i64		; CHECK-NEXT: [[TMP2:%.*]] = sext i32 [[TMP1]] to i64
[...]
bb:
%tmp19 = insertelement <4 x i8> undef, i8 %tmp4, i32 0		%tmp19 = insertelement <4 x i8> undef, i8 %tmp4, i32 0
%tmp20 = insertelement <4 x i8> %tmp19, i8 %tmp8, i32 1		%tmp20 = insertelement <4 x i8> %tmp19, i8 %tmp8, i32 1
%tmp21 = insertelement <4 x i8> %tmp20, i8 %tmp13, i32 2		%tmp21 = insertelement <4 x i8> %tmp20, i8 %tmp13, i32 2
%tmp22 = insertelement <4 x i8> %tmp21, i8 %tmp18, i32 3		%tmp22 = insertelement <4 x i8> %tmp21, i8 %tmp18, i32 3
store <4 x i8> %tmp22, <4 x i8>* %dst		store <4 x i8> %tmp22, <4 x i8>* %dst
ret void		ret void
}		}

		declare void @llvm.assume(i1)

		define void @ld_v4i8_add_assume_on_arg(i32 %v0, i32 %v1, i8* %src, <4 x i8>* %dst) {
		; CHECK-LABEL: @ld_v4i8_add_assume_on_arg(
		; CHECK-NEXT: bb:
		; CHECK-NEXT: [[AND_I:%.*]] = and i32 [[V0:%.*]], 3
		; CHECK-NEXT: [[CMP_I:%.*]] = icmp eq i32 [[AND_I]], 0
		; CHECK-NEXT: [[AND_I_1:%.*]] = and i32 [[V1:%.*]], 3
		; CHECK-NEXT: [[CMP_I_1:%.*]] = icmp eq i32 [[AND_I_1]], 0
		; CHECK-NEXT: call void @llvm.assume(i1 [[CMP_I]])
		; CHECK-NEXT: call void @llvm.assume(i1 [[CMP_I_1]])
		; CHECK-NEXT: [[TMP:%.*]] = add nsw i32 [[V0]], -1
		; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[V1]], [[TMP]]
		; CHECK-NEXT: [[TMP2:%.*]] = sext i32 [[TMP1]] to i64
		; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, i8* [[SRC:%.*]], i64 [[TMP2]]
		; CHECK-NEXT: [[TMP4:%.*]] = load i8, i8* [[TMP3]], align 1
		; CHECK-NEXT: [[TMP5:%.*]] = add i32 [[V1]], [[V0]]
		; CHECK-NEXT: [[TMP6:%.*]] = sext i32 [[TMP5]] to i64
		; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, i8* [[SRC]], i64 [[TMP6]]
		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[TMP7]] to <3 x i8>*
		; CHECK-NEXT: [[TMP1:%.*]] = load <3 x i8>, <3 x i8>* [[TMP0]], align 1
		; CHECK-NEXT: [[TMP81:%.*]] = extractelement <3 x i8> [[TMP1]], i32 0
		; CHECK-NEXT: [[TMP132:%.*]] = extractelement <3 x i8> [[TMP1]], i32 1
		; CHECK-NEXT: [[TMP183:%.*]] = extractelement <3 x i8> [[TMP1]], i32 2
		; CHECK-NEXT: [[TMP19:%.*]] = insertelement <4 x i8> undef, i8 [[TMP4]], i32 0
		; CHECK-NEXT: [[TMP20:%.*]] = insertelement <4 x i8> [[TMP19]], i8 [[TMP81]], i32 1
		; CHECK-NEXT: [[TMP21:%.*]] = insertelement <4 x i8> [[TMP20]], i8 [[TMP132]], i32 2
		; CHECK-NEXT: [[TMP22:%.*]] = insertelement <4 x i8> [[TMP21]], i8 [[TMP183]], i32 3
		; CHECK-NEXT: store <4 x i8> [[TMP22]], <4 x i8>* [[DST:%.*]]
		; CHECK-NEXT: ret void
		;
		bb:
		%and.i = and i32 %v0, 3
		%cmp.i = icmp eq i32 %and.i, 0
		%and.i.1 = and i32 %v1, 3
		%cmp.i.1 = icmp eq i32 %and.i.1, 0
		call void @llvm.assume(i1 %cmp.i)
		call void @llvm.assume(i1 %cmp.i.1)
		%tmp = add nsw i32 %v0, -1
		%tmp1 = add i32 %v1, %tmp
		%tmp2 = sext i32 %tmp1 to i64
		%tmp3 = getelementptr inbounds i8, i8* %src, i64 %tmp2
		%tmp4 = load i8, i8* %tmp3, align 1
		%tmp5 = add i32 %v1, %v0
		%tmp6 = sext i32 %tmp5 to i64
		%tmp7 = getelementptr inbounds i8, i8* %src, i64 %tmp6
		%tmp8 = load i8, i8* %tmp7, align 1
		%tmp9 = add nsw i32 %v0, 1
		%tmp10 = add i32 %v1, %tmp9
		%tmp11 = sext i32 %tmp10 to i64
		%tmp12 = getelementptr inbounds i8, i8* %src, i64 %tmp11
		%tmp13 = load i8, i8* %tmp12, align 1
		%tmp14 = add nsw i32 %v0, 2
		%tmp15 = add i32 %v1, %tmp14
		%tmp16 = sext i32 %tmp15 to i64
		%tmp17 = getelementptr inbounds i8, i8* %src, i64 %tmp16
		%tmp18 = load i8, i8* %tmp17, align 1
		%tmp19 = insertelement <4 x i8> undef, i8 %tmp4, i32 0
		%tmp20 = insertelement <4 x i8> %tmp19, i8 %tmp8, i32 1
		%tmp21 = insertelement <4 x i8> %tmp20, i8 %tmp13, i32 2
		%tmp22 = insertelement <4 x i8> %tmp21, i8 %tmp18, i32 3
		store <4 x i8> %tmp22, <4 x i8>* %dst
		ret void
		}

		define void @ld_v4i8_add_assume_on_arg1(i32 %v0, i32 %v1, i8* %src, <4 x i8>* %dst) {
		; CHECK-LABEL: @ld_v4i8_add_assume_on_arg1(
		; CHECK-NEXT: bb:
		; CHECK-NEXT: [[AND_I:%.*]] = and i32 [[V0:%.*]], 3
		; CHECK-NEXT: [[CMP_I:%.*]] = icmp eq i32 [[AND_I]], 0
		; CHECK-NEXT: [[AND_I_1:%.*]] = and i32 [[V1:%.*]], 3
		; CHECK-NEXT: [[CMP_I_1:%.*]] = icmp eq i32 [[AND_I_1]], 0
		; CHECK-NEXT: call void @llvm.assume(i1 [[CMP_I]])
		; CHECK-NEXT: call void @llvm.assume(i1 [[CMP_I_1]])
		; CHECK-NEXT: [[TMP5:%.*]] = add i32 [[V1]], [[V0]]
		; CHECK-NEXT: [[TMP6:%.*]] = sext i32 [[TMP5]] to i64
		; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, i8* [[SRC:%.*]], i64 [[TMP6]]
		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[TMP7]] to <4 x i8>*
		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* [[TMP0]], align 1
		; CHECK-NEXT: [[TMP81:%.*]] = extractelement <4 x i8> [[TMP1]], i32 0
		; CHECK-NEXT: [[TMP132:%.*]] = extractelement <4 x i8> [[TMP1]], i32 1
		; CHECK-NEXT: [[TMP183:%.*]] = extractelement <4 x i8> [[TMP1]], i32 2
		; CHECK-NEXT: [[TMP44:%.*]] = extractelement <4 x i8> [[TMP1]], i32 3
		; CHECK-NEXT: [[TMP19:%.*]] = insertelement <4 x i8> undef, i8 [[TMP44]], i32 0
		; CHECK-NEXT: [[TMP20:%.*]] = insertelement <4 x i8> [[TMP19]], i8 [[TMP81]], i32 1
		; CHECK-NEXT: [[TMP21:%.*]] = insertelement <4 x i8> [[TMP20]], i8 [[TMP132]], i32 2
		; CHECK-NEXT: [[TMP22:%.*]] = insertelement <4 x i8> [[TMP21]], i8 [[TMP183]], i32 3
		; CHECK-NEXT: store <4 x i8> [[TMP22]], <4 x i8>* [[DST:%.*]]
		; CHECK-NEXT: ret void
		;
		bb:
		%and.i = and i32 %v0, 3
		%cmp.i = icmp eq i32 %and.i, 0
		%and.i.1 = and i32 %v1, 3
		%cmp.i.1 = icmp eq i32 %and.i.1, 0
		call void @llvm.assume(i1 %cmp.i)
		call void @llvm.assume(i1 %cmp.i.1)
		%tmp = add nsw i32 %v0, 3
		%tmp1 = add i32 %v1, %tmp
		%tmp2 = sext i32 %tmp1 to i64
		%tmp3 = getelementptr inbounds i8, i8* %src, i64 %tmp2
		%tmp4 = load i8, i8* %tmp3, align 1
		%tmp5 = add i32 %v1, %v0
		%tmp6 = sext i32 %tmp5 to i64
		%tmp7 = getelementptr inbounds i8, i8* %src, i64 %tmp6
		%tmp8 = load i8, i8* %tmp7, align 1
		%tmp9 = add nsw i32 %v0, 1
		%tmp10 = add i32 %v1, %tmp9
		%tmp11 = sext i32 %tmp10 to i64
		%tmp12 = getelementptr inbounds i8, i8* %src, i64 %tmp11
		%tmp13 = load i8, i8* %tmp12, align 1
		%tmp14 = add nsw i32 %v0, 2
		%tmp15 = add i32 %v1, %tmp14
		%tmp16 = sext i32 %tmp15 to i64
		%tmp17 = getelementptr inbounds i8, i8* %src, i64 %tmp16
		%tmp18 = load i8, i8* %tmp17, align 1
		%tmp19 = insertelement <4 x i8> undef, i8 %tmp4, i32 0
		%tmp20 = insertelement <4 x i8> %tmp19, i8 %tmp8, i32 1
		%tmp21 = insertelement <4 x i8> %tmp20, i8 %tmp13, i32 2
		%tmp22 = insertelement <4 x i8> %tmp21, i8 %tmp18, i32 3
		store <4 x i8> %tmp22, <4 x i8>* %dst
		ret void
		}

		define void @ld_v4i8_add_diff_contexts(i32 %ind0, i32 %ind1, i8* %src, <2 x i8>* %dst) {
		; CHECK-LABEL: @ld_v4i8_add_diff_contexts(
		; CHECK-NEXT: bb:
		; CHECK-NEXT: [[V0:%.*]] = and i32 [[IND0:%.*]], 4
		; CHECK-NEXT: [[V1:%.*]] = mul i32 [[IND1:%.*]], 3
		; CHECK-NEXT: [[TMP5:%.*]] = add i32 [[V1]], [[V0]]
		; CHECK-NEXT: [[BIT_COND:%.*]] = icmp eq i32 [[V1]], 0
		; CHECK-NEXT: br i1 [[BIT_COND]], label [[BB_LOADS:%.*]], label [[BB_SKIP:%.*]]
		; CHECK: bb.loads:
		; CHECK-NEXT: call void @llvm.assume(i1 [[BIT_COND]])
		; CHECK-NEXT: [[TMP6:%.*]] = sext i32 [[TMP5]] to i64
		; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, i8* [[SRC:%.*]], i64 [[TMP6]]
		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[TMP7]] to <2 x i8>*
		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i8>, <2 x i8>* [[TMP0]], align 1
		; CHECK-NEXT: [[TMP81:%.*]] = extractelement <2 x i8> [[TMP1]], i32 0
		; CHECK-NEXT: [[TMP42:%.*]] = extractelement <2 x i8> [[TMP1]], i32 1
		; CHECK-NEXT: [[TMP19:%.*]] = insertelement <2 x i8> undef, i8 [[TMP42]], i32 0
		; CHECK-NEXT: [[TMP20:%.*]] = insertelement <2 x i8> [[TMP19]], i8 [[TMP81]], i32 1
		; CHECK-NEXT: store <2 x i8> [[TMP20]], <2 x i8>* [[DST:%.*]]
		; CHECK-NEXT: br label [[BB_SKIP]]
		; CHECK: bb.skip:
		; CHECK-NEXT: ret void
		;
		bb:
		%v0 = and i32 %ind0, 4
		%v1 = mul i32 %ind1, 3
		%tmp5 = add i32 %v1, %v0
		%bit_cond = icmp eq i32 %v1, 0
		br i1 %bit_cond, label %bb.loads, label %bb.skip

		bb.loads:
		call void @llvm.assume(i1 %bit_cond)
		%tmp = add nsw i32 %v0, 1
		%tmp1 = add i32 %v1, %tmp
		%tmp2 = sext i32 %tmp1 to i64
		%tmp3 = getelementptr inbounds i8, i8* %src, i64 %tmp2
		%tmp4 = load i8, i8* %tmp3, align 1
		%tmp6 = sext i32 %tmp5 to i64
		%tmp7 = getelementptr inbounds i8, i8* %src, i64 %tmp6
		%tmp8 = load i8, i8* %tmp7, align 1
		%tmp19 = insertelement <2 x i8> undef, i8 %tmp4, i32 0
		%tmp20 = insertelement <2 x i8> %tmp19, i8 %tmp8, i32 1
		store <2 x i8> %tmp20, <2 x i8>* %dst
		br label %bb.skip

		bb.skip:
		ret void
		}

; Make sure we don't vectorize the loads below because the source of		; Make sure we don't vectorize the loads below because the source of
; sext instructions doesn't have the nsw flag or known bits allowing		; sext instructions doesn't have the nsw flag or known bits allowing
; to apply the vectorization.		; to apply the vectorization.

define void @ld_v4i8_add_not_safe(i32 %v0, i32 %v1, i8* %src, <4 x i8>* %dst) {		define void @ld_v4i8_add_not_safe(i32 %v0, i32 %v1, i8* %src, <4 x i8>* %dst) {
; CHECK-LABEL: @ld_v4i8_add_not_safe(		; CHECK-LABEL: @ld_v4i8_add_not_safe(
; CHECK-NEXT: bb:		; CHECK-NEXT: bb:
; CHECK-NEXT: [[TMP:%.*]] = add nsw i32 [[V0:%.*]], -1		; CHECK-NEXT: [[TMP:%.*]] = add nsw i32 [[V0:%.*]], -1
[...]

This is an archive of the discontinued LLVM Phabricator instance.

Change the context instruction for computeKnownBits in LoadStoreVectorizer pass
Closed · Public
