This is an archive of the discontinued LLVM Phabricator instance.

Differential D103912

LoadStoreVectorizer: support different operand orders in the add sequence match
Closed, Public

Authored by wvoquine on Jun 8 2021, 10:13 AM.

Details

Reviewers

arsenm
volkan
bogner

Commits

rG119965865cc7: LoadStoreVectorizer: support different operand orders in the add sequence match

Summary

First, we refactor the code that matches no-wrap add sequences,
because we need to allow different operand orders for
the key add instructions involved in the match.

Then we use the refactored code to try four variants of matching operands.

Originally, the code relied on the two last add instructions
of the memory index calculations having the same LHS operand.
But which operand the two instructions share is not essential,
so we now allow it to be either the LHS or the RHS of each of
the two instructions.
This increases the chances of vectorization.
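
To illustrate the four variants, here is a minimal standalone sketch. It does not use LLVM's API: `ToyAdd`, `InnerAdds`, `matchesWithOrder`, and `anyOrderMatches` are hypothetical names, values are modeled as plain integer ids, and only the simplest pattern (`x + y` versus `x + (y + IdxDiff)`) is modeled, with the no-wrap flags assumed to have been checked already.

```cpp
#include <initializer_list>
#include <map>
#include <utility>

// Toy stand-in for an `add` instruction: two operand ids.
// (Hypothetical model; the real code works on llvm::Instruction.)
struct ToyAdd {
  int Ops[2];
};

// Toy stand-in for "this value is itself `base + constant`":
// maps a value id to (base id, constant).
using InnerAdds = std::map<int, std::pair<int, long>>;

// Checks one operand order: the operand at MatchingOpIdxA of A must be the
// same value as the operand at MatchingOpIdxB of B, and B's other operand
// must be (A's other operand + IdxDiff).
bool matchesWithOrder(const ToyAdd &A, unsigned MatchingOpIdxA,
                      const ToyAdd &B, unsigned MatchingOpIdxB,
                      const InnerAdds &Inner, long IdxDiff) {
  if (A.Ops[MatchingOpIdxA] != B.Ops[MatchingOpIdxB])
    return false;
  int OtherA = A.Ops[1 - MatchingOpIdxA];
  int OtherB = B.Ops[1 - MatchingOpIdxB];
  auto It = Inner.find(OtherB);
  return It != Inner.end() && It->second.first == OtherA &&
         It->second.second == IdxDiff;
}

// Tries all four (MatchingOpIdxA, MatchingOpIdxB) combinations and stops at
// the first one that matches.
bool anyOrderMatches(const ToyAdd &A, const ToyAdd &B, const InnerAdds &Inner,
                     long IdxDiff) {
  for (unsigned IdxA : {0u, 1u})
    for (unsigned IdxB : {0u, 1u})
      if (matchesWithOrder(A, IdxA, B, IdxB, Inner, IdxDiff))
        return true;
  return false;
}
```

With `A = add v0, v1` and `B = add (v0 + 1), v1`, the shared operand `v1` is the RHS of both instructions, so an LHS-only comparison misses the match while the (1, 1) combination finds it.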

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

wvoquine created this revision. Jun 8 2021, 10:13 AM

Herald added a subscriber: hiraditya. Jun 8 2021, 10:13 AM

wvoquine requested review of this revision. Jun 8 2021, 10:13 AM

Herald added a project: Restricted Project. Jun 8 2021, 10:13 AM

Herald added a subscriber: llvm-commits.

wvoquine added reviewers: arsenm, JustinBorb, volkan. Jun 8 2021, 10:14 AM

Herald added a subscriber: wdng. Jun 8 2021, 10:14 AM

wvoquine set the repository for this revision to rG LLVM Github Monorepo. Jun 8 2021, 10:15 AM

Harbormaster completed remote builds in B108241: Diff 350646. Jun 8 2021, 10:42 AM

Fixed the code style issue.
Implemented an operand-order picking loop instead of a sequence of ifs.

Harbormaster completed remote builds in B108268: Diff 350687. Jun 8 2021, 1:00 PM

The failing test seems to be unrelated to the change.

volkan added inline comments. Jun 10 2021, 1:35 PM

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
391	Nit: Add empty line?
393	You can rename `MatchinOperand[AB]` as `MatchinOpIdx[AB]` or `MatchinOperandIdx[AB]` to make it clear that this is an index.
415	You can move this into the if statement below, as there are no other users, and get the Value directly: Value *OtherOpA = AddOpA->getOperand(MatchinOpIdxA == 0 ? 1 : 0);
420	This might be LHS or RHS based on `MatchinOperand[AB]`, could you rename these variables?
430	No need to check other cases, you can return `true` directly and get rid of `Safe`.
526	Typo
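
Two of the suggestions above (picking the other operand from the matching index, and returning as soon as a pattern matches instead of tracking a `Safe` flag) can be sketched in isolation. This is a standalone illustration, not the patch itself: `otherOpIdx` and `anyPatternMatches` are hypothetical helpers that do not exist in LLVM.

```cpp
#include <functional>
#include <vector>

// An add has exactly two operands, so the non-matching operand is simply
// the other index of {0, 1}.
constexpr unsigned otherOpIdx(unsigned MatchingOpIdx) {
  return MatchingOpIdx == 0 ? 1 : 0;
}

// Early-return style: stop at the first pattern that succeeds instead of
// setting a flag and continuing to evaluate the remaining patterns.
bool anyPatternMatches(const std::vector<std::function<bool()>> &Patterns) {
  for (const auto &Pattern : Patterns)
    if (Pattern())
      return true;
  return false;
}
```

Returning early also means that later, possibly more expensive, checks are not evaluated once one pattern has already proven the sequence safe.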

Implemented the suggested naming and code style changes.

wvoquine marked 6 inline comments as done. Jun 10 2021, 2:48 PM

wvoquine added inline comments.

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
393	I like MatchingOpIdx[AB] - by the way, I missed `g` in `Matching` too, thanks!

LGTM, thanks Slava.

This revision is now accepted and ready to land. Jun 10 2021, 3:18 PM

wvoquine edited reviewers, added: bogner; removed: JustinBorb. Jun 10 2021, 3:18 PM

Harbormaster completed remote builds in B108695: Diff 351270. Jun 10 2021, 3:36 PM

The reported build failure is unrelated to the current change and is currently affecting other builds as well:

ld.lld: error: undefined symbol: Fortran::semantics::OmpStructureChecker::Enter(Fortran::parser::OmpClause::Full const&)

>>> referenced by semantics.cpp:83 (/var/lib/buildkite-agent/builds/llvm-project/flang/lib/Semantics/semantics.cpp:83)

This revision was landed with ongoing or failed builds. Jun 10 2021, 4:32 PM

Closed by commit rG119965865cc7: LoadStoreVectorizer: support different operand orders in the add sequence match (authored by wvoquine, committed by volkan).

This revision was automatically updated to reflect the committed changes.

volkan added a commit: rG119965865cc7: LoadStoreVectorizer: support different operand orders in the add sequence match.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp	146 lines
llvm/test/Transforms/LoadStoreVectorizer/X86/vectorize-i8-nested-add.ll	98 lines

Diff 351294

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp

[378 lines not shown]	if (C == Dist)
return true;		return true;

// Sometimes even this doesn't work, because SCEV can't always see through		// Sometimes even this doesn't work, because SCEV can't always see through
// patterns that look like (gep (ext (add (shl X, C1), C2))). Try checking		// patterns that look like (gep (ext (add (shl X, C1), C2))). Try checking
// things the hard way.		// things the hard way.
return lookThroughComplexAddresses(PtrA, PtrB, BaseDelta, Depth);		return lookThroughComplexAddresses(PtrA, PtrB, BaseDelta, Depth);
}		}

		static bool checkNoWrapFlags(Instruction *I, bool Signed) {
		BinaryOperator *BinOpI = cast<BinaryOperator>(I);
		return (Signed && BinOpI->hasNoSignedWrap()) ||
		(!Signed && BinOpI->hasNoUnsignedWrap());
		}

		static bool checkIfSafeAddSequence(const APInt &IdxDiff, Instruction *AddOpA,
		unsigned MatchingOpIdxA, Instruction *AddOpB,
		unsigned MatchingOpIdxB, bool Signed) {
		// If both OpA and OpB is an add with NSW/NUW and with
		// one of the operands being the same, we can guarantee that the
		// transformation is safe if we can prove that OpA won't overflow when
		// IdxDiff added to the other operand of OpA.
		// For example:
		// %tmp7 = add nsw i32 %tmp2, %v0
		// %tmp8 = sext i32 %tmp7 to i64
		// ...
		// %tmp11 = add nsw i32 %v0, 1
		// %tmp12 = add nsw i32 %tmp2, %tmp11
		// %tmp13 = sext i32 %tmp12 to i64
		//
		// Both %tmp7 and %tmp2 has the nsw flag and the first operand
		// is %tmp2. It's guaranteed that adding 1 to %tmp7 won't overflow
		// because %tmp11 adds 1 to %v0 and both %tmp11 and %tmp12 has the
		// nsw flag.
		assert(AddOpA->getOpcode() == Instruction::Add &&
		AddOpB->getOpcode() == Instruction::Add &&
		checkNoWrapFlags(AddOpA, Signed) && checkNoWrapFlags(AddOpB, Signed));
		if (AddOpA->getOperand(MatchingOpIdxA) ==
		AddOpB->getOperand(MatchingOpIdxB)) {
		Value *OtherOperandA = AddOpA->getOperand(MatchingOpIdxA == 1 ? 0 : 1);
		Value *OtherOperandB = AddOpB->getOperand(MatchingOpIdxB == 1 ? 0 : 1);
		Instruction *OtherInstrA = dyn_cast<Instruction>(OtherOperandA);
		Instruction *OtherInstrB = dyn_cast<Instruction>(OtherOperandB);
		// Match `x +nsw/nuw y` and `x +nsw/nuw (y +nsw/nuw IdxDiff)`.
		if (OtherInstrB && OtherInstrB->getOpcode() == Instruction::Add &&
		checkNoWrapFlags(OtherInstrB, Signed) &&
		isa<ConstantInt>(OtherInstrB->getOperand(1))) {
		int64_t CstVal =
		cast<ConstantInt>(OtherInstrB->getOperand(1))->getSExtValue();
		if (OtherInstrB->getOperand(0) == OtherOperandA &&
		IdxDiff.getSExtValue() == CstVal)
		return true;
		}
		// Match `x +nsw/nuw (y +nsw/nuw -Idx)` and `x +nsw/nuw (y +nsw/nuw x)`.
		if (OtherInstrA && OtherInstrA->getOpcode() == Instruction::Add &&
		checkNoWrapFlags(OtherInstrA, Signed) &&
		isa<ConstantInt>(OtherInstrA->getOperand(1))) {
		int64_t CstVal =
		cast<ConstantInt>(OtherInstrA->getOperand(1))->getSExtValue();
		if (OtherInstrA->getOperand(0) == OtherOperandB &&
		IdxDiff.getSExtValue() == -CstVal)
		return true;
		}
		// Match `x +nsw/nuw (y +nsw/nuw c)` and
		// `x +nsw/nuw (y +nsw/nuw (c + IdxDiff))`.
		if (OtherInstrA && OtherInstrB &&
		OtherInstrA->getOpcode() == Instruction::Add &&
		OtherInstrB->getOpcode() == Instruction::Add &&
		checkNoWrapFlags(OtherInstrA, Signed) &&
		checkNoWrapFlags(OtherInstrB, Signed) &&
		isa<ConstantInt>(OtherInstrA->getOperand(1)) &&
		isa<ConstantInt>(OtherInstrB->getOperand(1))) {
		int64_t CstValA =
		cast<ConstantInt>(OtherInstrA->getOperand(1))->getSExtValue();
		int64_t CstValB =
		cast<ConstantInt>(OtherInstrB->getOperand(1))->getSExtValue();
		if (OtherInstrA->getOperand(0) == OtherInstrB->getOperand(0) &&
		IdxDiff.getSExtValue() == (CstValB - CstValA))
		return true;
		}
		}
		return false;
		}

bool Vectorizer::lookThroughComplexAddresses(Value *PtrA, Value *PtrB,		bool Vectorizer::lookThroughComplexAddresses(Value *PtrA, Value *PtrB,
APInt PtrDelta,		APInt PtrDelta,
unsigned Depth) const {		unsigned Depth) const {
auto *GEPA = dyn_cast<GetElementPtrInst>(PtrA);		auto *GEPA = dyn_cast<GetElementPtrInst>(PtrA);
auto *GEPB = dyn_cast<GetElementPtrInst>(PtrB);		auto *GEPB = dyn_cast<GetElementPtrInst>(PtrB);
if (!GEPA || !GEPB)		if (!GEPA || !GEPB)
return lookThroughSelects(PtrA, PtrB, PtrDelta, Depth);		return lookThroughSelects(PtrA, PtrB, PtrDelta, Depth);

[38 lines not shown]
// At this point A could be a function parameter, i.e. not an instruction		// At this point A could be a function parameter, i.e. not an instruction
Value *ValA = OpA->getOperand(0);		Value *ValA = OpA->getOperand(0);
OpB = dyn_cast<Instruction>(OpB->getOperand(0));		OpB = dyn_cast<Instruction>(OpB->getOperand(0));
if (!OpB || ValA->getType() != OpB->getType())		if (!OpB || ValA->getType() != OpB->getType())
return false;		return false;

// Now we need to prove that adding IdxDiff to ValA won't overflow.		// Now we need to prove that adding IdxDiff to ValA won't overflow.
bool Safe = false;		bool Safe = false;
auto CheckFlags = [](Instruction *I, bool Signed) {
BinaryOperator *BinOpI = cast<BinaryOperator>(I);
return (Signed && BinOpI->hasNoSignedWrap()) ||
(!Signed && BinOpI->hasNoUnsignedWrap());
};

// First attempt: if OpB is an add with NSW/NUW, and OpB is IdxDiff added to		// First attempt: if OpB is an add with NSW/NUW, and OpB is IdxDiff added to
// ValA, we're okay.		// ValA, we're okay.
if (OpB->getOpcode() == Instruction::Add &&		if (OpB->getOpcode() == Instruction::Add &&
isa<ConstantInt>(OpB->getOperand(1)) &&		isa<ConstantInt>(OpB->getOperand(1)) &&
IdxDiff.sle(cast<ConstantInt>(OpB->getOperand(1))->getSExtValue()) &&		IdxDiff.sle(cast<ConstantInt>(OpB->getOperand(1))->getSExtValue()) &&
CheckFlags(OpB, Signed))		checkNoWrapFlags(OpB, Signed))
Safe = true;		Safe = true;

// Second attempt: If both OpA and OpB is an add with NSW/NUW and with		// Second attempt: check if we have eligible add NSW/NUW instruction
// the same LHS operand, we can guarantee that the transformation is safe		// sequences.
// if we can prove that OpA won't overflow when IdxDiff added to the RHS
// of OpA.
// For example:
// %tmp7 = add nsw i32 %tmp2, %v0
// %tmp8 = sext i32 %tmp7 to i64
// ...
// %tmp11 = add nsw i32 %v0, 1
// %tmp12 = add nsw i32 %tmp2, %tmp11
// %tmp13 = sext i32 %tmp12 to i64
//
// Both %tmp7 and %tmp2 has the nsw flag and the first operand
// is %tmp2. It's guaranteed that adding 1 to %tmp7 won't overflow
// because %tmp11 adds 1 to %v0 and both %tmp11 and %tmp12 has the
// nsw flag.
OpA = dyn_cast<Instruction>(ValA);		OpA = dyn_cast<Instruction>(ValA);
if (!Safe && OpA && OpA->getOpcode() == Instruction::Add &&		if (!Safe && OpA && OpA->getOpcode() == Instruction::Add &&
OpB->getOpcode() == Instruction::Add &&		OpB->getOpcode() == Instruction::Add && checkNoWrapFlags(OpA, Signed) &&
OpA->getOperand(0) == OpB->getOperand(0) && CheckFlags(OpA, Signed) &&		checkNoWrapFlags(OpB, Signed)) {
CheckFlags(OpB, Signed)) {		// In the checks below a matching operand in OpA and OpB is
Value *RHSA = OpA->getOperand(1);		// an operand which is the same in those two instructions.
Value *RHSB = OpB->getOperand(1);		// Below we account for possible orders of the operands of
Instruction *OpRHSA = dyn_cast<Instruction>(RHSA);		// these add instructions.
Instruction *OpRHSB = dyn_cast<Instruction>(RHSB);		for (unsigned MatchingOpIdxA : {0, 1})
// Match `x +nsw/nuw y` and `x +nsw/nuw (y +nsw/nuw IdxDiff)`.		for (unsigned MatchingOpIdxB : {0, 1})
if (OpRHSB && OpRHSB->getOpcode() == Instruction::Add &&		if (!Safe)
CheckFlags(OpRHSB, Signed) && isa<ConstantInt>(OpRHSB->getOperand(1))) {		Safe = checkIfSafeAddSequence(IdxDiff, OpA, MatchingOpIdxA, OpB,
int64_t CstVal = cast<ConstantInt>(OpRHSB->getOperand(1))->getSExtValue();		MatchingOpIdxB, Signed);
if (OpRHSB->getOperand(0) == RHSA && IdxDiff.getSExtValue() == CstVal)
Safe = true;
}
// Match `x +nsw/nuw (y +nsw/nuw -Idx)` and `x +nsw/nuw (y +nsw/nuw x)`.
if (OpRHSA && OpRHSA->getOpcode() == Instruction::Add &&
CheckFlags(OpRHSA, Signed) && isa<ConstantInt>(OpRHSA->getOperand(1))) {
int64_t CstVal = cast<ConstantInt>(OpRHSA->getOperand(1))->getSExtValue();
if (OpRHSA->getOperand(0) == RHSB && IdxDiff.getSExtValue() == -CstVal)
Safe = true;
}
// Match `x +nsw/nuw (y +nsw/nuw c)` and
// `x +nsw/nuw (y +nsw/nuw (c + IdxDiff))`.
if (OpRHSA && OpRHSB && OpRHSA->getOpcode() == Instruction::Add &&
OpRHSB->getOpcode() == Instruction::Add && CheckFlags(OpRHSA, Signed) &&
CheckFlags(OpRHSB, Signed) && isa<ConstantInt>(OpRHSA->getOperand(1)) &&
isa<ConstantInt>(OpRHSB->getOperand(1))) {
int64_t CstValA =
cast<ConstantInt>(OpRHSA->getOperand(1))->getSExtValue();
int64_t CstValB =
cast<ConstantInt>(OpRHSB->getOperand(1))->getSExtValue();
if (OpRHSA->getOperand(0) == OpRHSB->getOperand(0) &&
IdxDiff.getSExtValue() == (CstValB - CstValA))
Safe = true;
}
}		}

unsigned BitWidth = ValA->getType()->getScalarSizeInBits();		unsigned BitWidth = ValA->getType()->getScalarSizeInBits();

// Third attempt:		// Third attempt:
// If all set bits of IdxDiff or any higher order bit other than the sign bit		// If all set bits of IdxDiff or any higher order bit other than the sign bit
// are known to be zero in ValA, we can add Diff to it while guaranteeing no		// are known to be zero in ValA, we can add Diff to it while guaranteeing no
// overflow of any sort.		// overflow of any sort.
[808 lines not shown]

llvm/test/Transforms/LoadStoreVectorizer/X86/vectorize-i8-nested-add.ll

[98 lines not shown]
%tmp19 = insertelement <4 x i8> undef, i8 %tmp4, i32 0		%tmp19 = insertelement <4 x i8> undef, i8 %tmp4, i32 0
%tmp20 = insertelement <4 x i8> %tmp19, i8 %tmp8, i32 1		%tmp20 = insertelement <4 x i8> %tmp19, i8 %tmp8, i32 1
%tmp21 = insertelement <4 x i8> %tmp20, i8 %tmp13, i32 2		%tmp21 = insertelement <4 x i8> %tmp20, i8 %tmp13, i32 2
%tmp22 = insertelement <4 x i8> %tmp21, i8 %tmp18, i32 3		%tmp22 = insertelement <4 x i8> %tmp21, i8 %tmp18, i32 3
store <4 x i8> %tmp22, <4 x i8>* %dst		store <4 x i8> %tmp22, <4 x i8>* %dst
ret void		ret void
}		}

		; Apply different operand orders for the nested add sequences
		define void @ld_v4i8_add_nsw_operand_orders(i32 %v0, i32 %v1, i8* %src, <4 x i8>* %dst) {
		; CHECK-LABEL: @ld_v4i8_add_nsw_operand_orders(
		; CHECK-NEXT: bb:
		; CHECK-NEXT: [[TMP:%.*]] = add nsw i32 [[V0:%.*]], -1
		; CHECK-NEXT: [[TMP1:%.*]] = add nsw i32 [[V1:%.*]], [[TMP]]
		; CHECK-NEXT: [[TMP2:%.*]] = sext i32 [[TMP1]] to i64
		; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, i8* [[SRC:%.*]], i64 [[TMP2]]
		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[TMP3]] to <4 x i8>*
		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* [[TMP0]], align 1
		; CHECK-NEXT: [[TMP41:%.*]] = extractelement <4 x i8> [[TMP1]], i32 0
		; CHECK-NEXT: [[TMP82:%.*]] = extractelement <4 x i8> [[TMP1]], i32 1
		; CHECK-NEXT: [[TMP133:%.*]] = extractelement <4 x i8> [[TMP1]], i32 2
		; CHECK-NEXT: [[TMP184:%.*]] = extractelement <4 x i8> [[TMP1]], i32 3
		; CHECK-NEXT: [[TMP19:%.*]] = insertelement <4 x i8> undef, i8 [[TMP41]], i32 0
		; CHECK-NEXT: [[TMP20:%.*]] = insertelement <4 x i8> [[TMP19]], i8 [[TMP82]], i32 1
		; CHECK-NEXT: [[TMP21:%.*]] = insertelement <4 x i8> [[TMP20]], i8 [[TMP133]], i32 2
		; CHECK-NEXT: [[TMP22:%.*]] = insertelement <4 x i8> [[TMP21]], i8 [[TMP184]], i32 3
		; CHECK-NEXT: store <4 x i8> [[TMP22]], <4 x i8>* [[DST:%.*]]
		; CHECK-NEXT: ret void
		;
		bb:
		%tmp = add nsw i32 %v0, -1
		%tmp1 = add nsw i32 %v1, %tmp
		%tmp2 = sext i32 %tmp1 to i64
		%tmp3 = getelementptr inbounds i8, i8* %src, i64 %tmp2
		%tmp4 = load i8, i8* %tmp3, align 1
		%tmp5 = add nsw i32 %v0, %v1
		%tmp6 = sext i32 %tmp5 to i64
		%tmp7 = getelementptr inbounds i8, i8* %src, i64 %tmp6
		%tmp8 = load i8, i8* %tmp7, align 1
		%tmp9 = add nsw i32 %v0, 1
		%tmp10 = add nsw i32 %tmp9, %v1
		%tmp11 = sext i32 %tmp10 to i64
		%tmp12 = getelementptr inbounds i8, i8* %src, i64 %tmp11
		%tmp13 = load i8, i8* %tmp12, align 1
		%tmp14 = add nsw i32 %v0, 2
		%tmp15 = add nsw i32 %v1, %tmp14
		%tmp16 = sext i32 %tmp15 to i64
		%tmp17 = getelementptr inbounds i8, i8* %src, i64 %tmp16
		%tmp18 = load i8, i8* %tmp17, align 1
		%tmp19 = insertelement <4 x i8> undef, i8 %tmp4, i32 0
		%tmp20 = insertelement <4 x i8> %tmp19, i8 %tmp8, i32 1
		%tmp21 = insertelement <4 x i8> %tmp20, i8 %tmp13, i32 2
		%tmp22 = insertelement <4 x i8> %tmp21, i8 %tmp18, i32 3
		store <4 x i8> %tmp22, <4 x i8>* %dst
		ret void
		}

		; Apply different operand orders for the nested add sequences
		define void @ld_v4i8_add_nuw_operand_orders(i32 %v0, i32 %v1, i8* %src, <4 x i8>* %dst) {
		; CHECK-LABEL: @ld_v4i8_add_nuw_operand_orders(
		; CHECK-NEXT: bb:
		; CHECK-NEXT: [[TMP:%.*]] = add nuw i32 [[V0:%.*]], -1
		; CHECK-NEXT: [[TMP1:%.*]] = add nuw i32 [[V1:%.*]], [[TMP]]
		; CHECK-NEXT: [[TMP2:%.*]] = zext i32 [[TMP1]] to i64
		; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, i8* [[SRC:%.*]], i64 [[TMP2]]
		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[TMP3]] to <4 x i8>*
		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* [[TMP0]], align 1
		; CHECK-NEXT: [[TMP41:%.*]] = extractelement <4 x i8> [[TMP1]], i32 0
		; CHECK-NEXT: [[TMP82:%.*]] = extractelement <4 x i8> [[TMP1]], i32 1
		; CHECK-NEXT: [[TMP133:%.*]] = extractelement <4 x i8> [[TMP1]], i32 2
		; CHECK-NEXT: [[TMP184:%.*]] = extractelement <4 x i8> [[TMP1]], i32 3
		; CHECK-NEXT: [[TMP19:%.*]] = insertelement <4 x i8> undef, i8 [[TMP41]], i32 0
		; CHECK-NEXT: [[TMP20:%.*]] = insertelement <4 x i8> [[TMP19]], i8 [[TMP82]], i32 1
		; CHECK-NEXT: [[TMP21:%.*]] = insertelement <4 x i8> [[TMP20]], i8 [[TMP133]], i32 2
		; CHECK-NEXT: [[TMP22:%.*]] = insertelement <4 x i8> [[TMP21]], i8 [[TMP184]], i32 3
		; CHECK-NEXT: store <4 x i8> [[TMP22]], <4 x i8>* [[DST:%.*]]
		; CHECK-NEXT: ret void
		;
		bb:
		%tmp = add nuw i32 %v0, -1
		%tmp1 = add nuw i32 %v1, %tmp
		%tmp2 = zext i32 %tmp1 to i64
		%tmp3 = getelementptr inbounds i8, i8* %src, i64 %tmp2
		%tmp4 = load i8, i8* %tmp3, align 1
		%tmp5 = add nuw i32 %v0, %v1
		%tmp6 = zext i32 %tmp5 to i64
		%tmp7 = getelementptr inbounds i8, i8* %src, i64 %tmp6
		%tmp8 = load i8, i8* %tmp7, align 1
		%tmp9 = add nuw i32 %v0, 1
		%tmp10 = add nuw i32 %tmp9, %v1
		%tmp11 = zext i32 %tmp10 to i64
		%tmp12 = getelementptr inbounds i8, i8* %src, i64 %tmp11
		%tmp13 = load i8, i8* %tmp12, align 1
		%tmp14 = add nuw i32 %v0, 2
		%tmp15 = add nuw i32 %v1, %tmp14
		%tmp16 = zext i32 %tmp15 to i64
		%tmp17 = getelementptr inbounds i8, i8* %src, i64 %tmp16
		%tmp18 = load i8, i8* %tmp17, align 1
		%tmp19 = insertelement <4 x i8> undef, i8 %tmp4, i32 0
		%tmp20 = insertelement <4 x i8> %tmp19, i8 %tmp8, i32 1
		%tmp21 = insertelement <4 x i8> %tmp20, i8 %tmp13, i32 2
		%tmp22 = insertelement <4 x i8> %tmp21, i8 %tmp18, i32 3
		store <4 x i8> %tmp22, <4 x i8>* %dst
		ret void
		}

define void @ld_v4i8_add_known_bits(i32 %ind0, i32 %ind1, i8* %src, <4 x i8>* %dst) {		define void @ld_v4i8_add_known_bits(i32 %ind0, i32 %ind1, i8* %src, <4 x i8>* %dst) {
; CHECK-LABEL: @ld_v4i8_add_known_bits(		; CHECK-LABEL: @ld_v4i8_add_known_bits(
; CHECK-NEXT: bb:		; CHECK-NEXT: bb:
; CHECK-NEXT: [[V0:%.*]] = mul i32 [[IND0:%.*]], 4		; CHECK-NEXT: [[V0:%.*]] = mul i32 [[IND0:%.*]], 4
; CHECK-NEXT: [[V1:%.*]] = mul i32 [[IND1:%.*]], 4		; CHECK-NEXT: [[V1:%.*]] = mul i32 [[IND1:%.*]], 4
; CHECK-NEXT: [[TMP:%.*]] = add i32 [[V0]], -1		; CHECK-NEXT: [[TMP:%.*]] = add i32 [[V0]], -1
; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[V1]], [[TMP]]		; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[V1]], [[TMP]]
; CHECK-NEXT: [[TMP2:%.*]] = sext i32 [[TMP1]] to i64		; CHECK-NEXT: [[TMP2:%.*]] = sext i32 [[TMP1]] to i64
[530 lines not shown]