Diff 344976

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp

...	bool Vectorizer::lookThroughComplexAddresses(Value *PtrA, Value *PtrB,

unsigned BitWidth = ValA->getType()->getScalarSizeInBits();		unsigned BitWidth = ValA->getType()->getScalarSizeInBits();

// Third attempt:		// Third attempt:
// If all set bits of IdxDiff or any higher order bit other than the sign bit		// If all set bits of IdxDiff or any higher order bit other than the sign bit
// are known to be zero in ValA, we can add Diff to it while guaranteeing no		// are known to be zero in ValA, we can add Diff to it while guaranteeing no
// overflow of any sort.		// overflow of any sort.
if (!Safe) {		if (!Safe) {
OpA = dyn_cast<Instruction>(ValA);
if (!OpA)
return false;
KnownBits Known(BitWidth);		KnownBits Known(BitWidth);
computeKnownBits(OpA, Known, DL, 0, &AC, OpA, &DT);		computeKnownBits(ValA, Known, DL, 0, &AC, OpB, &DT);
arsenm: How do we know A is the point the vectorized instruction will be inserted, and not B?
arsenm: e.g. can you add some tests with an assume between the two instructions to vectorize?
wvoquine: Don't you think ld_v2i8_add_different_contexts would count as one, or do you prefer a test without control flow in it?
arsenm: ld_v2i8_add_different_contexts does not have an assume between the two loads, so that is not what I mean.
wvoquine: I see some unexpected results when I place llvm.assume between the loads, or just before the first load. I will have to debug a little and will likely update the change. A safer version could be to still use OpA if it's not nullptr, and OpB if it is.
wvoquine: In fact, llvm.assume itself blocked the transformation because it was considered to write to memory. As in other passes, we can ignore the assume in this case.
wvoquine: Now the tests `ld_v2i8_add_different_contexts1` and `ld_v2i8_add_context` have llvm.assume between the vectorized loads. Also, in `ld_v2i8_add_context1` the llvm.assume comes after both loads and the store that follows them; it still works within a single basic block.
wvoquine: I've added the tests along with the patch so that they show a difference (llvm.assume itself is ignored when checking for memory conflicts between the two loads). Are there more concerns?
wvoquine: By the way, we have discussed some of these points with @rtereshin and @bogner. Neither OpA nor OpB is the point of vectorization. They both dominate the point of vectorization, but there is no information here about which one is closer to it. This change only enables more cases to pass: if OpB were nullptr, we would have bailed out much earlier in the function. See this part:
    // At this point A could be a function parameter, i.e. not an instruction
    Value *ValA = OpA->getOperand(0);
    OpB = dyn_cast<Instruction>(OpB->getOperand(0));
    if (!OpB || ValA->getType() != OpB->getType())
      return false;
I just allow it not to bail if OpA turns out to be nullptr. There's no assumption as to which one is at, or closer to, the point of vectorization. Also, without llvm.assume I couldn't differentiate the two contexts in a test, so I added a test with llvm.assume: ld_v2i8_add_different_contexts. Now, sometimes OpA might provide a better context for the calculation of known bits. In such cases we might fail to vectorize. If the llvm.assume calls are inconsistent at the two points, that ought to be UB. As a future improvement, we might need to use the actual vectorization point as the context.
wvoquine: > There's no assumption as to which one goes at the point or closer to the point of vectorization.
Having said that, if a case is the result of unrolling, OpB often naturally comes after OpA. That means its context is more precise (the contexts shouldn't be inconsistent).
APInt BitsAllowedToBeSet = Known.Zero.zext(IdxDiff.getBitWidth());		APInt BitsAllowedToBeSet = Known.Zero.zext(IdxDiff.getBitWidth());
if (Signed)		if (Signed)
BitsAllowedToBeSet.clearBit(BitWidth - 1);		BitsAllowedToBeSet.clearBit(BitWidth - 1);
if (BitsAllowedToBeSet.ult(IdxDiff))		if (BitsAllowedToBeSet.ult(IdxDiff))
return false;		return false;
}		}

const SCEV *OffsetSCEVA = SE.getSCEV(ValA);		const SCEV *OffsetSCEVA = SE.getSCEV(ValA);
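
The "third attempt" check above can be modelled outside LLVM. What follows is only a minimal sketch, not the pass's code: it assumes a plain 64-bit mask stands in for `Known.Zero` and that `IdxDiff` is non-negative and fits in 64 bits. The name `addIsKnownSafe` and its parameters are hypothetical.

```cpp
#include <cstdint>

// Hypothetical stand-alone model of the known-bits check in the hunk above.
// knownZero: mask of bits known to be zero in ValA (stands in for Known.Zero).
// idxDiff:   non-negative index difference that would be added to ValA.
// bitWidth:  width of ValA in bits (assumed to be in 1..64).
// isSigned:  whether the sign bit must be excluded, as in the `if (Signed)` branch.
constexpr bool addIsKnownSafe(std::uint64_t knownZero, std::uint64_t idxDiff,
                              unsigned bitWidth, bool isSigned) {
  std::uint64_t bitsAllowedToBeSet = knownZero;
  if (isSigned)
    bitsAllowedToBeSet &= ~(std::uint64_t{1} << (bitWidth - 1));
  // Mirrors `if (BitsAllowedToBeSet.ult(IdxDiff)) return false;`.
  return !(bitsAllowedToBeSet < idxDiff);
}
```

With the two low bits known to be zero (mask `0x3`, as for a value produced by `mul i32 %ind0, 4`), differences up to 3 pass the check in this model and a difference of 4 is rejected.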
...	for (Instruction &I : make_range(getBoundaryInstrs(Chain))) {
} else if (isa<IntrinsicInst>(&I) &&		} else if (isa<IntrinsicInst>(&I) &&
cast<IntrinsicInst>(&I)->getIntrinsicID() ==		cast<IntrinsicInst>(&I)->getIntrinsicID() ==
Intrinsic::sideeffect) {		Intrinsic::sideeffect) {
// Ignore llvm.sideeffect calls.		// Ignore llvm.sideeffect calls.
} else if (isa<IntrinsicInst>(&I) &&		} else if (isa<IntrinsicInst>(&I) &&
cast<IntrinsicInst>(&I)->getIntrinsicID() ==		cast<IntrinsicInst>(&I)->getIntrinsicID() ==
Intrinsic::pseudoprobe) {		Intrinsic::pseudoprobe) {
// Ignore llvm.pseudoprobe calls.		// Ignore llvm.pseudoprobe calls.
		} else if (isa<IntrinsicInst>(&I) &&
		xbolva00: isa<AssumeInst>
		wvoquine: Thanks! This could be addressed later on with the other intrinsics where that is applicable.
		cast<IntrinsicInst>(&I)->getIntrinsicID() == Intrinsic::assume) {
		// Ignore llvm.assume calls.
} else if (IsLoadChain && (I.mayWriteToMemory() || I.mayThrow())) {		} else if (IsLoadChain && (I.mayWriteToMemory() || I.mayThrow())) {
LLVM_DEBUG(dbgs() << "LSV: Found may-write/throw operation: " << I		LLVM_DEBUG(dbgs() << "LSV: Found may-write/throw operation: " << I
<< '\n');		<< '\n');
break;		break;
} else if (!IsLoadChain && (I.mayReadOrWriteMemory() || I.mayThrow())) {		} else if (!IsLoadChain && (I.mayReadOrWriteMemory() || I.mayThrow())) {
LLVM_DEBUG(dbgs() << "LSV: Found may-read/write/throw operation: " << I		LLVM_DEBUG(dbgs() << "LSV: Found may-read/write/throw operation: " << I
<< '\n');		<< '\n');
break;		break;
...
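
The effect of the new `Intrinsic::assume` branch on the scan can be illustrated with a toy model. This is a sketch under stated assumptions, not the pass itself: `Kind`, `Inst`, and `firstBlockingIndex` are hypothetical names, loads and stores are simply treated as chain members (no alias checks), `mayThrow()` is not modelled, and the memory flags on each toy instruction are chosen for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the instruction kinds the scan distinguishes.
enum class Kind { Load, Store, SideEffect, PseudoProbe, Assume, OtherCall };

struct Inst {
  Kind kind;
  bool mayWrite; // models I.mayWriteToMemory()
  bool mayRead;  // models I.mayReadFromMemory()
};

// Returns the index of the first instruction that stops the chain scan,
// or insts.size() if nothing blocks it. `ignoreAssume` toggles the
// behaviour added by the patch so both outcomes can be compared.
std::size_t firstBlockingIndex(const std::vector<Inst> &insts,
                               bool isLoadChain, bool ignoreAssume) {
  for (std::size_t i = 0; i < insts.size(); ++i) {
    const Inst &I = insts[i];
    if (I.kind == Kind::Load || I.kind == Kind::Store)
      continue; // toy model: chain members, checked elsewhere in the real pass
    if (I.kind == Kind::SideEffect || I.kind == Kind::PseudoProbe)
      continue; // ignored intrinsics
    if (ignoreAssume && I.kind == Kind::Assume)
      continue; // the branch added by this patch
    if (isLoadChain && I.mayWrite)
      return i; // may-write operation blocks a load chain
    if (!isLoadChain && (I.mayRead || I.mayWrite))
      return i; // may-read/write operation blocks a store chain
  }
  return insts.size();
}
```

For a toy sequence of load, assume (flagged as may-write), load, the scan stops at index 1 when `ignoreAssume` is false and runs to the end when it is true, which matches the behaviour described in the review thread.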

llvm/test/Transforms/LoadStoreVectorizer/X86/vectorize-i8-nested-add.ll

...	bb:
%tmp19 = insertelement <4 x i8> undef, i8 %tmp4, i32 0		%tmp19 = insertelement <4 x i8> undef, i8 %tmp4, i32 0
%tmp20 = insertelement <4 x i8> %tmp19, i8 %tmp8, i32 1		%tmp20 = insertelement <4 x i8> %tmp19, i8 %tmp8, i32 1
%tmp21 = insertelement <4 x i8> %tmp20, i8 %tmp13, i32 2		%tmp21 = insertelement <4 x i8> %tmp20, i8 %tmp13, i32 2
%tmp22 = insertelement <4 x i8> %tmp21, i8 %tmp18, i32 3		%tmp22 = insertelement <4 x i8> %tmp21, i8 %tmp18, i32 3
store <4 x i8> %tmp22, <4 x i8>* %dst		store <4 x i8> %tmp22, <4 x i8>* %dst
ret void		ret void
}		}

declare void @llvm.assume(i1)

define void @ld_v4i8_add_known_bits(i32 %ind0, i32 %ind1, i8* %src, <4 x i8>* %dst) {		define void @ld_v4i8_add_known_bits(i32 %ind0, i32 %ind1, i8* %src, <4 x i8>* %dst) {
; CHECK-LABEL: @ld_v4i8_add_known_bits(		; CHECK-LABEL: @ld_v4i8_add_known_bits(
; CHECK-NEXT: bb:		; CHECK-NEXT: bb:
; CHECK-NEXT: [[V0:%.*]] = mul i32 [[IND0:%.*]], 4		; CHECK-NEXT: [[V0:%.*]] = mul i32 [[IND0:%.*]], 4
; CHECK-NEXT: [[V1:%.*]] = mul i32 [[IND1:%.*]], 4		; CHECK-NEXT: [[V1:%.*]] = mul i32 [[IND1:%.*]], 4
; CHECK-NEXT: [[TMP:%.*]] = add i32 [[V0]], -1		; CHECK-NEXT: [[TMP:%.*]] = add i32 [[V0]], -1
; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[V1]], [[TMP]]		; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[V1]], [[TMP]]
; CHECK-NEXT: [[TMP2:%.*]] = sext i32 [[TMP1]] to i64		; CHECK-NEXT: [[TMP2:%.*]] = sext i32 [[TMP1]] to i64
...	bb:
%tmp19 = insertelement <4 x i8> undef, i8 %tmp4, i32 0		%tmp19 = insertelement <4 x i8> undef, i8 %tmp4, i32 0
%tmp20 = insertelement <4 x i8> %tmp19, i8 %tmp8, i32 1		%tmp20 = insertelement <4 x i8> %tmp19, i8 %tmp8, i32 1
%tmp21 = insertelement <4 x i8> %tmp20, i8 %tmp13, i32 2		%tmp21 = insertelement <4 x i8> %tmp20, i8 %tmp13, i32 2
%tmp22 = insertelement <4 x i8> %tmp21, i8 %tmp18, i32 3		%tmp22 = insertelement <4 x i8> %tmp21, i8 %tmp18, i32 3
store <4 x i8> %tmp22, <4 x i8>* %dst		store <4 x i8> %tmp22, <4 x i8>* %dst
ret void		ret void
}		}

		declare void @llvm.assume(i1)

		define void @ld_v4i8_add_assume_on_arg(i32 %v0, i32 %v1, i8* %src, <4 x i8>* %dst) {
		; CHECK-LABEL: @ld_v4i8_add_assume_on_arg(
		; CHECK-NEXT: bb:
		; CHECK-NEXT: [[AND_I:%.*]] = and i32 [[V0:%.*]], 3
		; CHECK-NEXT: [[CMP_I:%.*]] = icmp eq i32 [[AND_I]], 0
		; CHECK-NEXT: [[AND_I_1:%.*]] = and i32 [[V1:%.*]], 3
		; CHECK-NEXT: [[CMP_I_1:%.*]] = icmp eq i32 [[AND_I_1]], 0
		; CHECK-NEXT: call void @llvm.assume(i1 [[CMP_I]])
		; CHECK-NEXT: call void @llvm.assume(i1 [[CMP_I_1]])
		; CHECK-NEXT: [[TMP:%.*]] = add nsw i32 [[V0]], -1
		; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[V1]], [[TMP]]
		; CHECK-NEXT: [[TMP2:%.*]] = sext i32 [[TMP1]] to i64
		; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, i8* [[SRC:%.*]], i64 [[TMP2]]
		; CHECK-NEXT: [[TMP4:%.*]] = load i8, i8* [[TMP3]], align 1
		; CHECK-NEXT: [[TMP5:%.*]] = add i32 [[V1]], [[V0]]
		; CHECK-NEXT: [[TMP6:%.*]] = sext i32 [[TMP5]] to i64
		; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, i8* [[SRC]], i64 [[TMP6]]
		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[TMP7]] to <3 x i8>*
		; CHECK-NEXT: [[TMP1:%.*]] = load <3 x i8>, <3 x i8>* [[TMP0]], align 1
		; CHECK-NEXT: [[TMP81:%.*]] = extractelement <3 x i8> [[TMP1]], i32 0
		; CHECK-NEXT: [[TMP132:%.*]] = extractelement <3 x i8> [[TMP1]], i32 1
		; CHECK-NEXT: [[TMP183:%.*]] = extractelement <3 x i8> [[TMP1]], i32 2
		; CHECK-NEXT: [[TMP19:%.*]] = insertelement <4 x i8> undef, i8 [[TMP4]], i32 0
		; CHECK-NEXT: [[TMP20:%.*]] = insertelement <4 x i8> [[TMP19]], i8 [[TMP81]], i32 1
		; CHECK-NEXT: [[TMP21:%.*]] = insertelement <4 x i8> [[TMP20]], i8 [[TMP132]], i32 2
		; CHECK-NEXT: [[TMP22:%.*]] = insertelement <4 x i8> [[TMP21]], i8 [[TMP183]], i32 3
		; CHECK-NEXT: store <4 x i8> [[TMP22]], <4 x i8>* [[DST:%.*]]
		; CHECK-NEXT: ret void
		;
		bb:
		%and.i = and i32 %v0, 3
		%cmp.i = icmp eq i32 %and.i, 0
		%and.i.1 = and i32 %v1, 3
		%cmp.i.1 = icmp eq i32 %and.i.1, 0
		call void @llvm.assume(i1 %cmp.i)
		call void @llvm.assume(i1 %cmp.i.1)
		%tmp = add nsw i32 %v0, -1
		%tmp1 = add i32 %v1, %tmp
		%tmp2 = sext i32 %tmp1 to i64
		%tmp3 = getelementptr inbounds i8, i8* %src, i64 %tmp2
		%tmp4 = load i8, i8* %tmp3, align 1
		%tmp5 = add i32 %v1, %v0
		%tmp6 = sext i32 %tmp5 to i64
		%tmp7 = getelementptr inbounds i8, i8* %src, i64 %tmp6
		%tmp8 = load i8, i8* %tmp7, align 1
		%tmp9 = add nsw i32 %v0, 1
		%tmp10 = add i32 %v1, %tmp9
		%tmp11 = sext i32 %tmp10 to i64
		%tmp12 = getelementptr inbounds i8, i8* %src, i64 %tmp11
		%tmp13 = load i8, i8* %tmp12, align 1
		%tmp14 = add nsw i32 %v0, 2
		%tmp15 = add i32 %v1, %tmp14
		%tmp16 = sext i32 %tmp15 to i64
		%tmp17 = getelementptr inbounds i8, i8* %src, i64 %tmp16
		%tmp18 = load i8, i8* %tmp17, align 1
		%tmp19 = insertelement <4 x i8> undef, i8 %tmp4, i32 0
		%tmp20 = insertelement <4 x i8> %tmp19, i8 %tmp8, i32 1
		%tmp21 = insertelement <4 x i8> %tmp20, i8 %tmp13, i32 2
		%tmp22 = insertelement <4 x i8> %tmp21, i8 %tmp18, i32 3
		store <4 x i8> %tmp22, <4 x i8>* %dst
		ret void
		}

		define void @ld_v4i8_add_assume_on_arg1(i32 %v0, i32 %v1, i8* %src, <4 x i8>* %dst) {
		; CHECK-LABEL: @ld_v4i8_add_assume_on_arg1(
		; CHECK-NEXT: bb:
		; CHECK-NEXT: [[AND_I:%.*]] = and i32 [[V0:%.*]], 3
		; CHECK-NEXT: [[CMP_I:%.*]] = icmp eq i32 [[AND_I]], 0
		; CHECK-NEXT: [[AND_I_1:%.*]] = and i32 [[V1:%.*]], 3
		; CHECK-NEXT: [[CMP_I_1:%.*]] = icmp eq i32 [[AND_I_1]], 0
		; CHECK-NEXT: call void @llvm.assume(i1 [[CMP_I]])
		; CHECK-NEXT: call void @llvm.assume(i1 [[CMP_I_1]])
		; CHECK-NEXT: [[TMP5:%.*]] = add i32 [[V1]], [[V0]]
		; CHECK-NEXT: [[TMP6:%.*]] = sext i32 [[TMP5]] to i64
		; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, i8* [[SRC:%.*]], i64 [[TMP6]]
		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[TMP7]] to <4 x i8>*
		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* [[TMP0]], align 1
		; CHECK-NEXT: [[TMP81:%.*]] = extractelement <4 x i8> [[TMP1]], i32 0
		; CHECK-NEXT: [[TMP132:%.*]] = extractelement <4 x i8> [[TMP1]], i32 1
		; CHECK-NEXT: [[TMP183:%.*]] = extractelement <4 x i8> [[TMP1]], i32 2
		; CHECK-NEXT: [[TMP44:%.*]] = extractelement <4 x i8> [[TMP1]], i32 3
		; CHECK-NEXT: [[TMP19:%.*]] = insertelement <4 x i8> undef, i8 [[TMP44]], i32 0
		; CHECK-NEXT: [[TMP20:%.*]] = insertelement <4 x i8> [[TMP19]], i8 [[TMP81]], i32 1
		; CHECK-NEXT: [[TMP21:%.*]] = insertelement <4 x i8> [[TMP20]], i8 [[TMP132]], i32 2
		; CHECK-NEXT: [[TMP22:%.*]] = insertelement <4 x i8> [[TMP21]], i8 [[TMP183]], i32 3
		; CHECK-NEXT: store <4 x i8> [[TMP22]], <4 x i8>* [[DST:%.*]]
		; CHECK-NEXT: ret void
		;
		bb:
		%and.i = and i32 %v0, 3
		%cmp.i = icmp eq i32 %and.i, 0
		%and.i.1 = and i32 %v1, 3
		%cmp.i.1 = icmp eq i32 %and.i.1, 0
		call void @llvm.assume(i1 %cmp.i)
		call void @llvm.assume(i1 %cmp.i.1)
		%tmp = add nsw i32 %v0, 3
		%tmp1 = add i32 %v1, %tmp
		%tmp2 = sext i32 %tmp1 to i64
		%tmp3 = getelementptr inbounds i8, i8* %src, i64 %tmp2
		%tmp4 = load i8, i8* %tmp3, align 1
		%tmp5 = add i32 %v1, %v0
		%tmp6 = sext i32 %tmp5 to i64
		%tmp7 = getelementptr inbounds i8, i8* %src, i64 %tmp6
		%tmp8 = load i8, i8* %tmp7, align 1
		%tmp9 = add nsw i32 %v0, 1
		%tmp10 = add i32 %v1, %tmp9
		%tmp11 = sext i32 %tmp10 to i64
		%tmp12 = getelementptr inbounds i8, i8* %src, i64 %tmp11
		%tmp13 = load i8, i8* %tmp12, align 1
		%tmp14 = add nsw i32 %v0, 2
		%tmp15 = add i32 %v1, %tmp14
		%tmp16 = sext i32 %tmp15 to i64
		%tmp17 = getelementptr inbounds i8, i8* %src, i64 %tmp16
		%tmp18 = load i8, i8* %tmp17, align 1
		%tmp19 = insertelement <4 x i8> undef, i8 %tmp4, i32 0
		%tmp20 = insertelement <4 x i8> %tmp19, i8 %tmp8, i32 1
		%tmp21 = insertelement <4 x i8> %tmp20, i8 %tmp13, i32 2
		%tmp22 = insertelement <4 x i8> %tmp21, i8 %tmp18, i32 3
		store <4 x i8> %tmp22, <4 x i8>* %dst
		ret void
		}
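
The two tests above assume `(v & 3) == 0` for both arguments. As a rough arithmetic illustration of why that matters, the sketch below exhaustively checks an 8-bit analogue rather than `i32`, so that every value can be enumerated. It does not model the pass's address-matching logic, and the function name `addNeverOverflows8` is hypothetical.

```cpp
// Exhaustively checks, over all 8-bit values v with (v & mask) == 0,
// that v + d neither wraps as an unsigned 8-bit value nor overflows
// as a signed 8-bit value. d is assumed to be non-negative.
bool addNeverOverflows8(unsigned mask, unsigned d) {
  for (unsigned v = 0; v < 256; ++v) {
    if ((v & mask) != 0)
      continue; // v does not satisfy the assumed known-zero bits
    if (v + d > 255)
      return false; // unsigned wrap-around
    int sv = v < 128 ? static_cast<int>(v) : static_cast<int>(v) - 256;
    if (sv + static_cast<int>(d) > 127)
      return false; // signed overflow
  }
  return true;
}
```

In this model, adding 1, 2, or 3 to a value whose two low bits are zero never overflows, while adding 4 can (for example `v = 252`), and without any known-zero bits even adding 1 can.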

		; Address computations are partly separated by control flow and with llvm.assume placed
		wvoquine: I've realized this should be "v2i8" in the test name.
		; in the second basic block

		define void @ld_v2i8_add_different_contexts(i32 %ind0, i32 %ind1, i8* %src, <2 x i8>* %dst) {
		; CHECK-LABEL: @ld_v2i8_add_different_contexts(
		; CHECK-NEXT: bb:
		; CHECK-NEXT: [[V0:%.*]] = mul i32 [[IND0:%.*]], 4
		; CHECK-NEXT: [[V1:%.*]] = mul i32 [[IND1:%.*]], 3
		; CHECK-NEXT: [[TMP5:%.*]] = add i32 [[V1]], [[V0]]
		; CHECK-NEXT: [[BIT_COND:%.*]] = icmp eq i32 [[V1]], 0
		; CHECK-NEXT: br i1 [[BIT_COND]], label [[BB_LOADS:%.*]], label [[BB_SKIP:%.*]]
		; CHECK: bb.loads:
		; CHECK-NEXT: call void @llvm.assume(i1 [[BIT_COND]])
		; CHECK-NEXT: [[TMP6:%.*]] = sext i32 [[TMP5]] to i64
		; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, i8* [[SRC:%.*]], i64 [[TMP6]]
		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[TMP7]] to <2 x i8>*
		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i8>, <2 x i8>* [[TMP0]], align 1
		; CHECK-NEXT: [[TMP81:%.*]] = extractelement <2 x i8> [[TMP1]], i32 0
		; CHECK-NEXT: [[TMP42:%.*]] = extractelement <2 x i8> [[TMP1]], i32 1
		; CHECK-NEXT: [[TMP19:%.*]] = insertelement <2 x i8> undef, i8 [[TMP42]], i32 0
		; CHECK-NEXT: [[TMP20:%.*]] = insertelement <2 x i8> [[TMP19]], i8 [[TMP81]], i32 1
		; CHECK-NEXT: store <2 x i8> [[TMP20]], <2 x i8>* [[DST:%.*]]
		; CHECK-NEXT: br label [[BB_SKIP]]
		; CHECK: bb.skip:
		; CHECK-NEXT: ret void
		;
		bb:
		%v0 = mul i32 %ind0, 4
		%v1 = mul i32 %ind1, 3
		%tmp5 = add i32 %v1, %v0
		%bit_cond = icmp eq i32 %v1, 0
		br i1 %bit_cond, label %bb.loads, label %bb.skip

		bb.loads:
		call void @llvm.assume(i1 %bit_cond)
		%tmp = add nsw i32 %v0, 1
		%tmp1 = add i32 %v1, %tmp
		%tmp2 = sext i32 %tmp1 to i64
		%tmp3 = getelementptr inbounds i8, i8* %src, i64 %tmp2
		%tmp4 = load i8, i8* %tmp3, align 1
		%tmp6 = sext i32 %tmp5 to i64
		%tmp7 = getelementptr inbounds i8, i8* %src, i64 %tmp6
		%tmp8 = load i8, i8* %tmp7, align 1
		%tmp19 = insertelement <2 x i8> undef, i8 %tmp4, i32 0
		%tmp20 = insertelement <2 x i8> %tmp19, i8 %tmp8, i32 1
		store <2 x i8> %tmp20, <2 x i8>* %dst
		br label %bb.skip

		bb.skip:
		ret void
		}

		; Same as ld_v2i8_add_different_contexts but with llvm.assume placed between loads

		define void @ld_v2i8_add_different_contexts1(i32 %ind0, i32 %ind1, i8* %src, <2 x i8>* %dst) {
		; CHECK-LABEL: @ld_v2i8_add_different_contexts1(
		; CHECK-NEXT: bb:
		; CHECK-NEXT: [[V0:%.*]] = mul i32 [[IND0:%.*]], 4
		; CHECK-NEXT: [[V1:%.*]] = mul i32 [[IND1:%.*]], 3
		; CHECK-NEXT: [[TMP5:%.*]] = add i32 [[V1]], [[V0]]
		; CHECK-NEXT: [[BIT_COND:%.*]] = icmp eq i32 [[V1]], 0
		; CHECK-NEXT: br i1 [[BIT_COND]], label [[BB_LOADS:%.*]], label [[BB_SKIP:%.*]]
		; CHECK: bb.loads:
		; CHECK-NEXT: [[TMP6:%.*]] = sext i32 [[TMP5]] to i64
		; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, i8* [[SRC:%.*]], i64 [[TMP6]]
		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[TMP7]] to <2 x i8>*
		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i8>, <2 x i8>* [[TMP0]], align 1
		; CHECK-NEXT: [[TMP81:%.*]] = extractelement <2 x i8> [[TMP1]], i32 0
		; CHECK-NEXT: [[TMP42:%.*]] = extractelement <2 x i8> [[TMP1]], i32 1
		; CHECK-NEXT: call void @llvm.assume(i1 [[BIT_COND]])
		; CHECK-NEXT: [[TMP19:%.*]] = insertelement <2 x i8> undef, i8 [[TMP42]], i32 0
		; CHECK-NEXT: [[TMP20:%.*]] = insertelement <2 x i8> [[TMP19]], i8 [[TMP81]], i32 1
		; CHECK-NEXT: store <2 x i8> [[TMP20]], <2 x i8>* [[DST:%.*]]
		; CHECK-NEXT: br label [[BB_SKIP]]
		; CHECK: bb.skip:
		; CHECK-NEXT: ret void
		;
		bb:
		%v0 = mul i32 %ind0, 4
		%v1 = mul i32 %ind1, 3
		%tmp5 = add i32 %v1, %v0
		%bit_cond = icmp eq i32 %v1, 0
		br i1 %bit_cond, label %bb.loads, label %bb.skip

		bb.loads:
		%tmp6 = sext i32 %tmp5 to i64
		%tmp7 = getelementptr inbounds i8, i8* %src, i64 %tmp6
		%tmp8 = load i8, i8* %tmp7, align 1
		call void @llvm.assume(i1 %bit_cond)
		%tmp = add nsw i32 %v0, 1
		%tmp1 = add i32 %v1, %tmp
		%tmp2 = sext i32 %tmp1 to i64
		%tmp3 = getelementptr inbounds i8, i8* %src, i64 %tmp2
		%tmp4 = load i8, i8* %tmp3, align 1
		%tmp19 = insertelement <2 x i8> undef, i8 %tmp4, i32 0
		%tmp20 = insertelement <2 x i8> %tmp19, i8 %tmp8, i32 1
		store <2 x i8> %tmp20, <2 x i8>* %dst
		br label %bb.skip

		bb.skip:
		ret void
		}

		; llvm.assume is placed between loads in a single basic block

		define void @ld_v2i8_add_context(i32 %ind0, i32 %ind1, i8* %src, <2 x i8>* %dst) {
		; CHECK-LABEL: @ld_v2i8_add_context(
		; CHECK-NEXT: bb:
		; CHECK-NEXT: [[V0:%.*]] = mul i32 [[IND0:%.*]], 4
		; CHECK-NEXT: [[V1:%.*]] = mul i32 [[IND1:%.*]], 3
		; CHECK-NEXT: [[TMP5:%.*]] = add i32 [[V1]], [[V0]]
		; CHECK-NEXT: [[TMP6:%.*]] = sext i32 [[TMP5]] to i64
		; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, i8* [[SRC:%.*]], i64 [[TMP6]]
		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[TMP7]] to <2 x i8>*
		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i8>, <2 x i8>* [[TMP0]], align 1
		; CHECK-NEXT: [[TMP81:%.*]] = extractelement <2 x i8> [[TMP1]], i32 0
		; CHECK-NEXT: [[TMP42:%.*]] = extractelement <2 x i8> [[TMP1]], i32 1
		; CHECK-NEXT: [[BIT_COND:%.*]] = icmp eq i32 [[TMP5]], 0
		; CHECK-NEXT: call void @llvm.assume(i1 [[BIT_COND]])
		; CHECK-NEXT: [[TMP19:%.*]] = insertelement <2 x i8> undef, i8 [[TMP42]], i32 0
		; CHECK-NEXT: [[TMP20:%.*]] = insertelement <2 x i8> [[TMP19]], i8 [[TMP81]], i32 1
		; CHECK-NEXT: store <2 x i8> [[TMP20]], <2 x i8>* [[DST:%.*]]
		; CHECK-NEXT: ret void
		;
		bb:
		%v0 = mul i32 %ind0, 4
		%v1 = mul i32 %ind1, 3
		%tmp5 = add i32 %v1, %v0
		%tmp6 = sext i32 %tmp5 to i64
		%tmp7 = getelementptr inbounds i8, i8* %src, i64 %tmp6
		%tmp8 = load i8, i8* %tmp7, align 1
		%bit_cond = icmp eq i32 %tmp5, 0
		call void @llvm.assume(i1 %bit_cond)
		%tmp = add nsw i32 %v0, 1
		%tmp1 = add i32 %v1, %tmp
		%tmp2 = sext i32 %tmp1 to i64
		%tmp3 = getelementptr inbounds i8, i8* %src, i64 %tmp2
		%tmp4 = load i8, i8* %tmp3, align 1
		%tmp19 = insertelement <2 x i8> undef, i8 %tmp4, i32 0
		%tmp20 = insertelement <2 x i8> %tmp19, i8 %tmp8, i32 1
		store <2 x i8> %tmp20, <2 x i8>* %dst
		ret void
		}

		; Placing llvm.assume after all the loads and stores in the basic block still works

		define void @ld_v2i8_add_context1(i32 %ind0, i32 %ind1, i8* %src, <2 x i8>* %dst) {
		; CHECK-LABEL: @ld_v2i8_add_context1(
		; CHECK-NEXT: bb:
		; CHECK-NEXT: [[V0:%.*]] = mul i32 [[IND0:%.*]], 4
		; CHECK-NEXT: [[V1:%.*]] = mul i32 [[IND1:%.*]], 3
		; CHECK-NEXT: [[TMP5:%.*]] = add i32 [[V1]], [[V0]]
		; CHECK-NEXT: [[TMP6:%.*]] = sext i32 [[TMP5]] to i64
		; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, i8* [[SRC:%.*]], i64 [[TMP6]]
		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[TMP7]] to <2 x i8>*
		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i8>, <2 x i8>* [[TMP0]], align 1
		; CHECK-NEXT: [[TMP81:%.*]] = extractelement <2 x i8> [[TMP1]], i32 0
		; CHECK-NEXT: [[TMP42:%.*]] = extractelement <2 x i8> [[TMP1]], i32 1
		; CHECK-NEXT: [[TMP19:%.*]] = insertelement <2 x i8> undef, i8 [[TMP42]], i32 0
		; CHECK-NEXT: [[TMP20:%.*]] = insertelement <2 x i8> [[TMP19]], i8 [[TMP81]], i32 1
		; CHECK-NEXT: store <2 x i8> [[TMP20]], <2 x i8>* [[DST:%.*]]
		; CHECK-NEXT: [[BIT_COND:%.*]] = icmp eq i32 [[TMP5]], 0
		; CHECK-NEXT: call void @llvm.assume(i1 [[BIT_COND]])
		; CHECK-NEXT: ret void
		;
		bb:
		%v0 = mul i32 %ind0, 4
		%v1 = mul i32 %ind1, 3
		%tmp5 = add i32 %v1, %v0
		%tmp6 = sext i32 %tmp5 to i64
		%tmp7 = getelementptr inbounds i8, i8* %src, i64 %tmp6
		%tmp8 = load i8, i8* %tmp7, align 1
		%tmp = add nsw i32 %v0, 1
		%tmp1 = add i32 %v1, %tmp
		%tmp2 = sext i32 %tmp1 to i64
		%tmp3 = getelementptr inbounds i8, i8* %src, i64 %tmp2
		%tmp4 = load i8, i8* %tmp3, align 1
		%tmp19 = insertelement <2 x i8> undef, i8 %tmp4, i32 0
		%tmp20 = insertelement <2 x i8> %tmp19, i8 %tmp8, i32 1
		store <2 x i8> %tmp20, <2 x i8>* %dst
		%bit_cond = icmp eq i32 %tmp5, 0
		call void @llvm.assume(i1 %bit_cond)
		ret void
		}

; Make sure we don't vectorize the loads below because the source of		; Make sure we don't vectorize the loads below because the source of
; sext instructions doesn't have the nsw flag or known bits allowing		; sext instructions doesn't have the nsw flag or known bits allowing
; to apply the vectorization.		; to apply the vectorization.

define void @ld_v4i8_add_not_safe(i32 %v0, i32 %v1, i8* %src, <4 x i8>* %dst) {		define void @ld_v4i8_add_not_safe(i32 %v0, i32 %v1, i8* %src, <4 x i8>* %dst) {
; CHECK-LABEL: @ld_v4i8_add_not_safe(		; CHECK-LABEL: @ld_v4i8_add_not_safe(
; CHECK-NEXT: bb:		; CHECK-NEXT: bb:
; CHECK-NEXT: [[TMP:%.*]] = add nsw i32 [[V0:%.*]], -1		; CHECK-NEXT: [[TMP:%.*]] = add nsw i32 [[V0:%.*]], -1
...

This is an archive of the discontinued LLVM Phabricator instance.

Change the context instruction for computeKnownBits in LoadStoreVectorizer pass
Closed, Public
