
Details

Reviewers

rengolin
HaoLiu
Ayal
hfinkel
sbaranga

Commits

rG622b95be7b0b: [LV] Reallow positive-stride interleaved load groups with gaps
rL267751: [LV] Reallow positive-stride interleaved load groups with gaps

Summary

We previously disallowed interleaved load groups that may cause us to speculatively access memory out-of-bounds (D17332). We did this by ensuring each load group had an access corresponding to the first and last member. Instead of bailing out for these interleaved groups, this patch enables us to peel off the last vector iteration, ensuring that we execute at least one iteration of the scalar remainder loop. This solution was proposed in the review of the previous patch.
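To make the out-of-bounds concern concrete, here is a minimal sketch (not code from the patch) that compares the highest element index read by the scalar loop with the highest index read by the wide loads. The helper names are hypothetical, and the sketch assumes a load group whose only member is at index 0, as in the even_load tests in this diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not LLVM API: highest element index read by the
// scalar loop when the group's only member sits at index 0.
uint64_t lastScalarElement(uint64_t TripCount, uint64_t Factor) {
  assert(TripCount > 0 && Factor > 0);
  return (TripCount - 1) * Factor;
}

// Hypothetical helper, not LLVM API: highest element index read by the wide
// loads, which cover all Factor positions of every group they handle.
uint64_t lastVectorElement(uint64_t VectorTripCount, uint64_t Factor) {
  assert(VectorTripCount > 0 && Factor > 0);
  return VectorTripCount * Factor - 1;
}
```

For even_load_static_tc (512 iterations, factor 2, Step 4), the scalar loop reads up to element 1022. A vector loop covering all 512 iterations would read element 1023, whereas a vector loop with the last step peeled (508 iterations) stops at element 1015.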

Diff Detail

Repository: rL LLVM

Event Timeline

mssimpso updated this revision to Diff 54876. Apr 25 2016, 10:27 AM

mssimpso retitled this revision to [LV] Reallow interleaved load groups with gaps.

mssimpso updated this object.

mssimpso added reviewers: sbaranga, rengolin, hfinkel, HaoLiu, Ayal.

mssimpso added subscribers: llvm-commits, mcrosier.

Herald added a subscriber: mzolotukhin. Apr 25 2016, 10:27 AM

Thanks for working on this! I have a single comment so far (see inline).

-Silviu

lib/Transforms/Vectorize/LoopVectorize.cpp
2896 ↗	(On Diff #54876)	This doesn't look right at first sight. Let's say we have 0 iterations in total (TC = 0), and Step is 4. If I'm not mistaken, this would underflow the VectorTripCount and give a result of MAX_UINT - 3? Also, if the result of the URem is 0 you don't need to peel the loop (although this wouldn't be that important and perhaps not even profitable to account for).
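The wraparound described in this comment can be reproduced in isolation. This is a minimal sketch, assuming the first revision effectively computed TC - Step when peeling. The function names are hypothetical and do not come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: peel one vector step from the trip count with no guard.
// Unsigned subtraction wraps modulo 2^64 when TC < Step.
uint64_t peelOneStepUnchecked(uint64_t TC, uint64_t Step) {
  return TC - Step;
}

// Hypothetical: the same subtraction, with the precondition made explicit.
uint64_t peelOneStepChecked(uint64_t TC, uint64_t Step) {
  assert(TC >= Step && "caller must guarantee at least one full step");
  return TC - Step;
}
```

With TC = 0 and Step = 4, the unchecked version returns UINT64_MAX - 3, which matches the value predicted above.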

mssimpso added inline comments. Apr 25 2016, 12:48 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
2896 ↗	(On Diff #54876)	Ah, I think you're right! Thanks for catching the underflow. For the other comment, I think you mean if the result of the URem is not 0, we don't need to peel. This is right, and it's something I thought about. I agree that it's probably not that important to account for, but it's definitely something we can consider. The trade-off would be something like 1 select instruction in the preheader vs. VF * UF scalar iterations. Thanks very much for the quick feedback! I'll upload a new version soon.

Addressed Silviu's comments.

Silviu, I think this revision addresses both of your comments about the peeling. I now only peel if TC % Step is zero. If the mod is zero, we know that TC >= Step, so the subtraction should not overflow. And we also know that there wouldn't already be scalar iterations due to Step not evenly dividing TC. Thanks for the feedback!

I might be missing something obvious, but why is TC >= Step if TC % Step is 0?

I think we also need to check that TC is not 0.

lib/Transforms/Vectorize/LoopVectorize.cpp
2896 ↗	(On Diff #54909)	You're right, that should have been not 0, thanks! The comment above needs to be updated.

I might be missing something obvious, but why is TC >= Step if TC % Step is 0?

Sorry, I didn't mean for that to read as an implication. The minimum iterations check is supposed to ensure that TC >= Step. We can add the check for TC is not zero, but I think instcombine will probably remove it. What do you think?

In D19487#412065, @mssimpso wrote:

I might be missing something obvious, but why is TC >= Step if TC % Step is 0?

Sorry, I didn't mean for that to read as an implication. The minimum iterations check is supposed to ensure that TC >= Step. We can add the check for TC is not zero, but I think instcombine will probably remove it. What do you think?

Do you mean the emitMinimumIterationCountCheck? Looks like the minimum check is using whatever getOrCreateVectorTripCount returns, so it probably won't work as expected. I think checking for the zero case here would be ok.

In D19487#412076, @sbaranga wrote:

In D19487#412065, @mssimpso wrote:

I might be missing something obvious, but why is TC >= Step if TC % Step is 0?

Sorry, I didn't mean for that to read as an implication. The minimum iterations check is supposed to ensure that TC >= Step. We can add the check for TC is not zero, but I think instcombine will probably remove it. What do you think?

Do you mean the emitMinimumIterationCountCheck? Looks like the minimum check is using whatever getOrCreateVectorTripCount returns, so it probably won't work as expected. I think checking for the zero case here would be ok.

Sorry, I misread that. It's using the actual trip count, so this should be ok.
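The trip count computation the thread converges on can be modelled with plain integers. This is only a sketch of the arithmetic, not the IRBuilder code in the diff. The function name is hypothetical, and the assertion stands in for the minimum iteration check, which is assumed to have passed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical integer model of the vector trip count. Step stands for
// VF * UF. If a scalar epilogue is required and Step divides TC evenly, a
// whole step is handed to the scalar loop. Otherwise the ordinary remainder
// already leaves scalar iterations.
uint64_t vectorTripCount(uint64_t TC, uint64_t Step,
                         bool RequiresScalarEpilogue) {
  assert(Step > 0 && TC >= Step && "minimum iteration check assumed passed");
  uint64_t R = TC % Step;
  if (RequiresScalarEpilogue && R == 0)
    R = Step;
  return TC - R; // cannot wrap because R <= Step <= TC
}
```

For TC = 512 and Step = 4 this gives 508 when an epilogue is required, which agrees with the constant checked in the even_load_static_tc test.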

Updated comments about the vector trip count calculation.

LGTM!

This revision is now accepted and ready to land. Apr 26 2016, 8:09 AM

Thanks very much for the review, Silviu!

sbaranga requested changes to this revision. Apr 26 2016, 10:04 AM

sbaranga edited edge metadata.

sbaranga added inline comments.

lib/Transforms/Vectorize/LoopVectorize.cpp
5142 ↗	(On Diff #55004)	Sorry to backtrack on this, but it appears that the group also needs a positive stride for this to work. Otherwise, you might get an out-of-bounds access in the first iteration of the vector loop.

This revision now requires changes to proceed. Apr 26 2016, 10:04 AM

Addressed Silviu's comments.

Thanks for catching the negative stride. I now check that a group is not reversed (stride is negative) before allowing it. I also added a reversed test case and check that we do not generate vector loads. For negative strides, we will need a scalar prologue iteration rather than an epilogue, but I'd rather tackle that in a follow-on patch. In the meantime, I've refactored the current patch to distinguish the epilogue and prologue cases. Thanks!

In D19487#412582, @mssimpso wrote:

Addressed Silviu's comments.

Thanks for catching the negative stride. I now check that a group is not reversed (stride is negative) before allowing it. I also added a reversed test case and check that we do not generate vector loads. For negative strides, we will need a scalar prologue iteration rather than an epilogue, but I'd rather tackle that in a follow-on patch. In the meantime, I've refactored the current patch to distinguish the epilogue and prologue cases. Thanks!

Thanks! LGTM now.

FWIW, for the reverse group we don't need a scalar prologue. It would be enough to "shift right" the interleaved access group such that we have a load at the last position in the group. I've added a comment in the test case.

test/Transforms/LoopVectorize/interleaved-accesses.ll
376 ↗	(On Diff #55058)	This would work if the wide load started at &A[i].x: that would solve the problem of the OOB access in the initial iteration, and the scalar iterations at the end would solve the OOB access in the final iteration of the vector loop.
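The two placements of the wide load for the reverse group can be compared with a small sketch. It is an illustration under stated assumptions, not the vectorizer's logic: elements are numbered so that A[i].x is 2*i and A[i].y is 2*i + 1, the array has 1024 pairs (elements 0 through 2047), and the unshifted group is assumed to start at the loaded member .y with the gap in the following slot. The type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Inclusive range of element indices touched by one wide load.
struct ElementRange {
  int64_t First;
  int64_t Last;
};

// Wide load serving iterations Hi, Hi - 1, ..., Hi - VF + 1 when the group
// starts at .y and its gap is the slot after .y (assumed layout).
ElementRange reverseGroupStartingAtY(int64_t Hi, int64_t VF) {
  assert(VF > 0 && Hi >= VF - 1);
  int64_t Lo = Hi - VF + 1;
  return {2 * Lo + 1, 2 * Hi + 2};
}

// The same iterations with the group shifted so that it starts at .x and
// the loaded member .y is in the last position.
ElementRange reverseGroupStartingAtX(int64_t Hi, int64_t VF) {
  assert(VF > 0 && Hi >= VF - 1);
  int64_t Lo = Hi - VF + 1;
  return {2 * Lo, 2 * Hi + 1};
}
```

With VF = 4, the first vector iteration (Hi = 1023) reaches element 2048 in the unshifted layout, one past the end of the array, and a scalar epilogue cannot help because this is the first iteration. In the shifted layout the same iteration ends at element 2047. The shifted layout does touch element 0 (A[0].x) in its final vector iteration (Hi = 3), which the scalar loop never reads, and that is the case the scalar iterations at the end would cover.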

This revision is now accepted and ready to land. Apr 27 2016, 3:19 AM

FWIW, for the reverse group we don't need a scalar prologue. It would be enough to "shift right" the interleaved access group such that we have a load at the last position in the group.

Good point. Yes, that should work, and it would allow us to avoid inserting a new loop. Thanks!

Closed by commit rL267751: [LV] Reallow positive-stride interleaved load groups with gaps (authored by mssimpso). Apr 27 2016, 11:27 AM

This revision was automatically updated to reflect the committed changes.

Ayal mentioned this in D67510: [LV] Support gaps, overlaps, and inexact sizes in speculation logic. Sep 14 2019, 9:53 AM

Ayal mentioned this in D103700: [LV] Fix bug when unrolling (only) a loop with non-latch exit. Jun 6 2021, 11:55 AM

Diff 55267

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

[... unchanged lines not shown ...]
/// on interleaved accesses is unsafe.		/// on interleaved accesses is unsafe.
///		///
/// The analysis collects interleave groups and records the relationships		/// The analysis collects interleave groups and records the relationships
/// between the member and the group in a map.		/// between the member and the group in a map.
class InterleavedAccessInfo {		class InterleavedAccessInfo {
public:		public:
InterleavedAccessInfo(PredicatedScalarEvolution &PSE, Loop *L,		InterleavedAccessInfo(PredicatedScalarEvolution &PSE, Loop *L,
DominatorTree *DT)		DominatorTree *DT)
: PSE(PSE), TheLoop(L), DT(DT) {}		: PSE(PSE), TheLoop(L), DT(DT), RequiresScalarEpilogue(false) {}

~InterleavedAccessInfo() {		~InterleavedAccessInfo() {
SmallSet<InterleaveGroup *, 4> DelSet;		SmallSet<InterleaveGroup *, 4> DelSet;
// Avoid releasing a pointer twice.		// Avoid releasing a pointer twice.
for (auto &I : InterleaveGroupMap)		for (auto &I : InterleaveGroupMap)
DelSet.insert(I.second);		DelSet.insert(I.second);
for (auto *Ptr : DelSet)		for (auto *Ptr : DelSet)
delete Ptr;		delete Ptr;
[... unchanged lines not shown ...]	public:
///		///
/// \returns nullptr if doesn't have such group.		/// \returns nullptr if doesn't have such group.
InterleaveGroup *getInterleaveGroup(Instruction *Instr) const {		InterleaveGroup *getInterleaveGroup(Instruction *Instr) const {
if (InterleaveGroupMap.count(Instr))		if (InterleaveGroupMap.count(Instr))
return InterleaveGroupMap.find(Instr)->second;		return InterleaveGroupMap.find(Instr)->second;
return nullptr;		return nullptr;
}		}

		/// \brief Returns true if an interleaved group that may access memory
		/// out-of-bounds requires a scalar epilogue iteration for correctness.
		bool requiresScalarEpilogue() const { return RequiresScalarEpilogue; }

private:		private:
/// A wrapper around ScalarEvolution, used to add runtime SCEV checks.		/// A wrapper around ScalarEvolution, used to add runtime SCEV checks.
/// Simplifies SCEV expressions in the context of existing SCEV assumptions.		/// Simplifies SCEV expressions in the context of existing SCEV assumptions.
/// The interleaved access analysis can also add new predicates (for example		/// The interleaved access analysis can also add new predicates (for example
/// by versioning strides of pointers).		/// by versioning strides of pointers).
PredicatedScalarEvolution &PSE;		PredicatedScalarEvolution &PSE;
Loop *TheLoop;		Loop *TheLoop;
DominatorTree *DT;		DominatorTree *DT;

		/// True if the loop may contain non-reversed interleaved groups with
		/// out-of-bounds accesses. We ensure we don't speculatively access memory
		/// out-of-bounds by executing at least one scalar epilogue iteration.
		bool RequiresScalarEpilogue;

/// Holds the relationships between the members and the interleave group.		/// Holds the relationships between the members and the interleave group.
DenseMap<Instruction *, InterleaveGroup *> InterleaveGroupMap;		DenseMap<Instruction *, InterleaveGroup *> InterleaveGroupMap;

/// \brief The descriptor for a strided memory access.		/// \brief The descriptor for a strided memory access.
struct StrideDescriptor {		struct StrideDescriptor {
StrideDescriptor(int Stride, const SCEV *Scev, unsigned Size,		StrideDescriptor(int Stride, const SCEV *Scev, unsigned Size,
unsigned Align)		unsigned Align)
: Stride(Stride), Scev(Scev), Size(Size), Align(Align) {}		: Stride(Stride), Scev(Scev), Size(Size), Align(Align) {}
[... unchanged lines not shown ...]	bool isAccessInterleaved(Instruction *Instr) {
return InterleaveInfo.isInterleaved(Instr);		return InterleaveInfo.isInterleaved(Instr);
}		}

/// \brief Get the interleaved access group that \p Instr belongs to.		/// \brief Get the interleaved access group that \p Instr belongs to.
const InterleaveGroup *getInterleavedAccessGroup(Instruction *Instr) {		const InterleaveGroup *getInterleavedAccessGroup(Instruction *Instr) {
return InterleaveInfo.getInterleaveGroup(Instr);		return InterleaveInfo.getInterleaveGroup(Instr);
}		}

		/// \brief Returns true if an interleaved group requires a scalar iteration
		/// to handle accesses with gaps.
		bool requiresScalarEpilogue() const {
		return InterleaveInfo.requiresScalarEpilogue();
		}

unsigned getMaxSafeDepDistBytes() { return LAI->getMaxSafeDepDistBytes(); }		unsigned getMaxSafeDepDistBytes() { return LAI->getMaxSafeDepDistBytes(); }

bool hasStride(Value *V) { return StrideSet.count(V); }		bool hasStride(Value *V) { return StrideSet.count(V); }
bool mustCheckStrides() { return !StrideSet.empty(); }		bool mustCheckStrides() { return !StrideSet.empty(); }
SmallPtrSet<Value *, 8>::iterator strides_begin() {		SmallPtrSet<Value *, 8>::iterator strides_begin() {
return StrideSet.begin();		return StrideSet.begin();
}		}
SmallPtrSet<Value *, 8>::iterator strides_end() { return StrideSet.end(); }		SmallPtrSet<Value *, 8>::iterator strides_end() { return StrideSet.end(); }
[... unchanged lines not shown ...]

Value *InnerLoopVectorizer::getOrCreateVectorTripCount(Loop *L) {		Value *InnerLoopVectorizer::getOrCreateVectorTripCount(Loop *L) {
if (VectorTripCount)		if (VectorTripCount)
return VectorTripCount;		return VectorTripCount;

Value *TC = getOrCreateTripCount(L);		Value *TC = getOrCreateTripCount(L);
IRBuilder<> Builder(L->getLoopPreheader()->getTerminator());		IRBuilder<> Builder(L->getLoopPreheader()->getTerminator());

// Now we need to generate the expression for N - (N % VF), which is		// Now we need to generate the expression for the part of the loop that the
// the part that the vectorized body will execute.		// vectorized body will execute. This is equal to N - (N % Step) if scalar
// The loop step is equal to the vectorization factor (num of SIMD elements)		// iterations are not required for correctness, or N - Step, otherwise. Step
// times the unroll factor (num of SIMD instructions).		// is equal to the vectorization factor (number of SIMD elements) times the
		// unroll factor (number of SIMD instructions).
Constant *Step = ConstantInt::get(TC->getType(), VF * UF);		Constant *Step = ConstantInt::get(TC->getType(), VF * UF);
Value *R = Builder.CreateURem(TC, Step, "n.mod.vf");		Value *R = Builder.CreateURem(TC, Step, "n.mod.vf");

		// If there is a non-reversed interleaved group that may speculatively access
		// memory out-of-bounds, we need to ensure that there will be at least one
		// iteration of the scalar epilogue loop. Thus, if the step evenly divides
		// the trip count, we set the remainder to be equal to the step. If the step
		// does not evenly divide the trip count, no adjustment is necessary since
		// there will already be scalar iterations. Note that the minimum iterations
		// check ensures that N >= Step.
		if (VF > 1 && Legal->requiresScalarEpilogue()) {
		auto *IsZero = Builder.CreateICmpEQ(R, ConstantInt::get(R->getType(), 0));
		R = Builder.CreateSelect(IsZero, Step, R);
		}

VectorTripCount = Builder.CreateSub(TC, R, "n.vec");		VectorTripCount = Builder.CreateSub(TC, R, "n.vec");

return VectorTripCount;		return VectorTripCount;
}		}

void InnerLoopVectorizer::emitMinimumIterationCountCheck(Loop *L,		void InnerLoopVectorizer::emitMinimumIterationCountCheck(Loop *L,
BasicBlock *Bypass) {		BasicBlock *Bypass) {
Value *Count = getOrCreateTripCount(L);		Value *Count = getOrCreateTripCount(L);
[... unchanged lines not shown ...]	for (auto I = StrideAccesses.rbegin(), E = StrideAccesses.rend(); I != E;
} // Iteration on instruction B		} // Iteration on instruction B
} // Iteration on instruction A		} // Iteration on instruction A

// Remove interleaved store groups with gaps.		// Remove interleaved store groups with gaps.
for (InterleaveGroup *Group : StoreGroups)		for (InterleaveGroup *Group : StoreGroups)
if (Group->getNumMembers() != Group->getFactor())		if (Group->getNumMembers() != Group->getFactor())
releaseGroup(Group);		releaseGroup(Group);

// Remove interleaved load groups that don't have the first and last member.		// If there is a non-reversed interleaved load group with gaps, we will need
// This guarantees that we won't do speculative out of bounds loads.		// to execute at least one scalar epilogue iteration. This will ensure that
		// we don't speculatively access memory out-of-bounds. Note that we only need
		// to look for a member at index factor - 1, since every group must have a
		// member at index zero.
for (InterleaveGroup *Group : LoadGroups)		for (InterleaveGroup *Group : LoadGroups)
if (!Group->getMember(0) || !Group->getMember(Group->getFactor() - 1))		if (!Group->getMember(Group->getFactor() - 1)) {
		if (Group->isReverse()) {
releaseGroup(Group);		releaseGroup(Group);
		} else {
		DEBUG(dbgs() << "LV: Interleaved group requires epilogue iteration.\n");
		RequiresScalarEpilogue = true;
		}
		}
}		}

LoopVectorizationCostModel::VectorizationFactor		LoopVectorizationCostModel::VectorizationFactor
LoopVectorizationCostModel::selectVectorizationFactor(bool OptForSize) {		LoopVectorizationCostModel::selectVectorizationFactor(bool OptForSize) {
// Width 1 means no vectorize		// Width 1 means no vectorize
VectorizationFactor Factor = { 1U, 0U };		VectorizationFactor Factor = { 1U, 0U };
if (OptForSize && Legal->getRuntimePointerChecking()->Need) {		if (OptForSize && Legal->getRuntimePointerChecking()->Need) {
emitAnalysis(VectorizationReport() <<		emitAnalysis(VectorizationReport() <<
[... unchanged lines not shown ...]

llvm/trunk/test/Transforms/LoopVectorize/interleaved-accesses.ll

[... unchanged lines not shown ...]	for.body: ; preds = %for.body, %entry
%y8 = getelementptr inbounds %struct.ST2, %struct.ST2* %B, i64 %indvars.iv, i32 1		%y8 = getelementptr inbounds %struct.ST2, %struct.ST2* %B, i64 %indvars.iv, i32 1
store i32 %sub, i32* %y8, align 4		store i32 %sub, i32* %y8, align 4
%indvars.iv.next = add nsw i64 %indvars.iv, -1		%indvars.iv.next = add nsw i64 %indvars.iv, -1
%cmp = icmp sgt i64 %indvars.iv, 0		%cmp = icmp sgt i64 %indvars.iv, 0
br i1 %cmp, label %for.body, label %for.cond.cleanup		br i1 %cmp, label %for.body, label %for.cond.cleanup
}		}

; Check vectorization on an interleaved load group of factor 2 with 1 gap		; Check vectorization on an interleaved load group of factor 2 with 1 gap
; (missing the load of odd elements).		; (missing the load of odd elements). Because the vectorized loop would
		; speculatively access memory out-of-bounds, we must execute at least one
		; iteration of the scalar loop.

; void even_load(int *A, int *B) {		; void even_load_static_tc(int *A, int *B) {
; for (unsigned i = 0; i < 1024; i+=2)		; for (unsigned i = 0; i < 1024; i+=2)
; B[i/2] = A[i] * 2;		; B[i/2] = A[i] * 2;
; }		; }

; CHECK-LABEL: @even_load(		; CHECK-LABEL: @even_load_static_tc(
; CHECK-NOT: %wide.vec = load <8 x i32>, <8 x i32>* %{{.*}}, align 4		; CHECK: vector.body:
; CHECK-NOT: %strided.vec = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>		; CHECK: %wide.vec = load <8 x i32>, <8 x i32>* %{{.*}}, align 4
		; CHECK: %strided.vec = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
		; CHECK: icmp eq i64 %index.next, 508
		; CHECK: middle.block:
		; CHECK: br i1 false, label %for.cond.cleanup, label %scalar.ph

define void @even_load(i32* noalias nocapture readonly %A, i32* noalias nocapture %B) {		define void @even_load_static_tc(i32* noalias nocapture readonly %A, i32* noalias nocapture %B) {
entry:		entry:
br label %for.body		br label %for.body

for.cond.cleanup: ; preds = %for.body		for.cond.cleanup: ; preds = %for.body
ret void		ret void

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv		%arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
%tmp = load i32, i32* %arrayidx, align 4		%tmp = load i32, i32* %arrayidx, align 4
%mul = shl nsw i32 %tmp, 1		%mul = shl nsw i32 %tmp, 1
%tmp1 = lshr exact i64 %indvars.iv, 1		%tmp1 = lshr exact i64 %indvars.iv, 1
%arrayidx2 = getelementptr inbounds i32, i32* %B, i64 %tmp1		%arrayidx2 = getelementptr inbounds i32, i32* %B, i64 %tmp1
store i32 %mul, i32* %arrayidx2, align 4		store i32 %mul, i32* %arrayidx2, align 4
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 2		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
%cmp = icmp ult i64 %indvars.iv.next, 1024		%cmp = icmp ult i64 %indvars.iv.next, 1024
br i1 %cmp, label %for.body, label %for.cond.cleanup		br i1 %cmp, label %for.body, label %for.cond.cleanup
}		}

		; Check vectorization on an interleaved load group of factor 2 with 1 gap
		; (missing the load of odd elements). Because the vectorized loop would
		; speculatively access memory out-of-bounds, we must execute at least one
		; iteration of the scalar loop.

		; void even_load_dynamic_tc(int *A, int *B, unsigned N) {
		; for (unsigned i = 0; i < N; i+=2)
		; B[i/2] = A[i] * 2;
		; }

		; CHECK-LABEL: @even_load_dynamic_tc(
		; CHECK: min.iters.checked:
		; CHECK: %n.mod.vf = and i64 %[[N:[a-zA-Z0-9]+]], 3
		; CHECK: %[[IsZero:[a-zA-Z0-9]+]] = icmp eq i64 %n.mod.vf, 0
		; CHECK: %[[R:[a-zA-Z0-9]+]] = select i1 %[[IsZero]], i64 4, i64 %n.mod.vf
		; CHECK: %n.vec = sub i64 %[[N]], %[[R]]
		; CHECK: vector.body:
		; CHECK: %wide.vec = load <8 x i32>, <8 x i32>* %{{.*}}, align 4
		; CHECK: %strided.vec = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
		; CHECK: icmp eq i64 %index.next, %n.vec
		; CHECK: middle.block:
		; CHECK: br i1 false, label %for.cond.cleanup, label %scalar.ph

		define void @even_load_dynamic_tc(i32* noalias nocapture readonly %A, i32* noalias nocapture %B, i64 %N) {
		entry:
		br label %for.body

		for.cond.cleanup: ; preds = %for.body
		ret void

		for.body: ; preds = %for.body, %entry
		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
		%arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
		%tmp = load i32, i32* %arrayidx, align 4
		%mul = shl nsw i32 %tmp, 1
		%tmp1 = lshr exact i64 %indvars.iv, 1
		%arrayidx2 = getelementptr inbounds i32, i32* %B, i64 %tmp1
		store i32 %mul, i32* %arrayidx2, align 4
		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
		%cmp = icmp ult i64 %indvars.iv.next, %N
		br i1 %cmp, label %for.body, label %for.cond.cleanup
		}

		; Check vectorization on a reverse interleaved load group of factor 2 with 1
		; gap and a reverse interleaved store group of factor 2. The interleaved load
		; group should be removed since it has a gap and is reverse.

		; struct pair {
		; int x;
		; int y;
		; };
		;
		; void load_gap_reverse(struct pair *P1, struct pair *P2, int X) {
		; for (int i = 1023; i >= 0; i--) {
		; int a = X + i;
		; int b = A[i].y - i;
		; B[i].x = a;
		; B[i].y = b;
		; }
		; }

		; CHECK-LABEL: @load_gap_reverse(
		; CHECK-NOT: %wide.vec = load <8 x i64>, <8 x i64>* %{{.*}}, align 8
		; CHECK-NOT: %strided.vec = shufflevector <8 x i64> %wide.vec, <8 x i64> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>

		%pair = type { i64, i64 }
		define void @load_gap_reverse(%pair* noalias nocapture readonly %P1, %pair* noalias nocapture readonly %P2, i64 %X) {
		entry:
		br label %for.body

		for.body:
		%i = phi i64 [ 1023, %entry ], [ %i.next, %for.body ]
		%0 = add nsw i64 %X, %i
		%1 = getelementptr inbounds %pair, %pair* %P1, i64 %i, i32 0
		%2 = getelementptr inbounds %pair, %pair* %P2, i64 %i, i32 1
		%3 = load i64, i64* %2, align 8
		%4 = sub nsw i64 %3, %i
		store i64 %0, i64* %1, align 8
		store i64 %4, i64* %2, align 8
		%i.next = add nsw i64 %i, -1
		%cond = icmp sgt i64 %i, 0
		br i1 %cond, label %for.body, label %for.exit

		for.exit:
		ret void
		}

; Check vectorization on interleaved access groups identified from mixed		; Check vectorization on interleaved access groups identified from mixed
; loads/stores.		; loads/stores.
; void mixed_load2_store2(int *A, int *B) {		; void mixed_load2_store2(int *A, int *B) {
; for (unsigned i = 0; i < 1024; i+=2) {		; for (unsigned i = 0; i < 1024; i+=2) {
; B[i] = A[i] * A[i+1];		; B[i] = A[i] * A[i+1];
; B[i+1] = A[i] + A[i+1];		; B[i+1] = A[i] + A[i+1];
; }		; }
; }		; }
[... unchanged lines not shown ...]

This is an archive of the discontinued LLVM Phabricator instance.

[LV] Reallow interleaved load groups with gaps
ClosedPublic
