This is an archive of the discontinued LLVM Phabricator instance.

[LV] Vectorize GEPs
ClosedPublic

Authored by mssimpso on Mar 7 2017, 11:21 AM.

Details

Reviewers

delena
mkuper

Commits

rG4e7b71bc86e2: [LV] Vectorize GEPs
rL298620: [LV] Vectorize GEPs

Summary

This patch adds support for vectorizing GEPs. Previously, we only generated vector GEPs on-demand when creating gather or scatter operations. All GEPs from the original loop were scalarized by default, and if a pointer was to be stored to memory, we would have to build up the pointer vector with insertelement instructions.

With this patch, we will vectorize all GEPs that haven't already been marked for scalarization.

The patch refines collectLoopScalars to identify the scalar GEPs more precisely, so the function now more closely resembles collectLoopUniforms. The patch also moves vector GEP creation out of vectorizeMemoryInstruction and into the main vectorization loop. As a result, the vector GEPs needed for gather and scatter operations will already have been generated before the memory accesses are vectorized.
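
To make the refined classification concrete, here is a minimal, self-contained C++ sketch of the rule as the diff comments state it: a pointer stays scalar only if every use is as the pointer operand of a memory access, and none of those accesses becomes a gather or scatter. This is a toy model, not LLVM code. `Widening`, `ToyUse`, `ToyPtr`, and `collectScalarPointers` are hypothetical names introduced only for illustration.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy stand-in for the cost model's widening decisions (hypothetical).
enum class Widening { Widen, Interleave, GatherScatter, Scalarize };

// One use of a pointer-producing instruction.
struct ToyUse {
  bool IsPointerOperandOfMemAccess; // true if used as a load/store address
  Widening Decision;                // only meaningful for memory accesses
};

// A pointer-producing instruction (e.g. a GEP) together with all its uses.
struct ToyPtr {
  std::string Name;
  std::vector<ToyUse> Uses;
};

// Returns the names of the pointers that would remain scalar under the rule
// sketched above. Pointers without any use are not classified as scalar here;
// that is a simplification of this toy model.
std::set<std::string> collectScalarPointers(const std::vector<ToyPtr> &Ptrs) {
  std::set<std::string> Scalars;
  for (const ToyPtr &P : Ptrs) {
    bool StaysScalar = !P.Uses.empty();
    for (const ToyUse &U : P.Uses) {
      // A non-address use (e.g. the pointer is itself stored) or a
      // gather/scatter access means the pointer may need a vector form.
      if (!U.IsPointerOperandOfMemAccess ||
          U.Decision == Widening::GatherScatter)
        StaysScalar = false;
    }
    if (StaysScalar)
      Scalars.insert(P.Name);
  }
  return Scalars;
}
```

In this model, a GEP that only feeds a consecutive load stays scalar, while a GEP that is also stored to memory as a value, or that feeds a scatter, does not.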

I think this patch makes sense on its own, but it's primarily motivated by a follow-on, which merges pointer induction variable widening with the rest of the induction widening code. The follow-on creates vector-of-pointer induction variables, and we need to be able to properly consume them.

Diff Detail

Repository: rL LLVM

Event Timeline

mssimpso created this revision.Mar 7 2017, 11:21 AM

Herald added a subscriber: mzolotukhin. · Mar 7 2017, 11:21 AM

mssimpso added a parent revision: D30587: [LV] Delete unneeded scalar GEP creation code.Mar 7 2017, 11:22 AM

delena added inline comments.Mar 7 2017, 12:23 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
4710 ↗	(On Diff #90896)	GEP is uniform when the memory instruction (User) is uniform, right? Why do you need to broadcast it?

mssimpso added inline comments.Mar 7 2017, 12:45 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
4710 ↗	(On Diff #90896)	This is "loop-invariant" in the LoopAccessInfo::isUniform sense not in the LoopVectorizationCostModel::isUniformAfterVectorization sense (we really should come up with some better names for these concepts). Sometimes we end up with GEPs contained in the loop body that have loop-invariant operands. I'm not sure why these GEPs aren't hoisted out of the loop before the vectorizer runs. In theory, we should be able to hoist these GEPs out of the loop ourselves, but we have assumptions elsewhere that if an instruction existed in the original loop body, it will map to something inside the vectorized loop body. So I just clone and broadcast the original GEP inside the loop here. The change to the first-order-recurrence.ll test case is reflective of this.

mssimpso added inline comments.Mar 7 2017, 1:30 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
4710 ↗	(On Diff #90896)	I just thought about a different way to implement this. Instead of the uniform check, we could check if the value returned by the IRBuilder has a vector type, and if not, do the broadcast. If all the operands are loop-invariant in the code below, the IRBuilder will return a scalar GEP. This will probably be fewer lines of code anyway. Let me give it a shot.

Re-implemented the widening, as previously mentioned.

Removed the Legal->isUniform() case. Now, I just let the IRBuilder return a scalar GEP if all the operands are loop-invariant. If the result is scalar, I do the broadcast.
The above change exposes a likely bug in fixFirstOrderRecurrences, so I've fixed that as well. The IRBuilder constant-folded away the GEP in first-order-recurrence.ll. The problematic code assumes that an instruction in the old loop will map to an instruction in the new loop. This is probably not a safe assumption in general.

I'll fix the fixFirstOrderRecurrences bug in a separate patch and rebase. I was able to reproduce the issue in a different test case.

Addressed fixFirstOrderRecurrence bug in a separate patch and rebased.

Ping. Michael/Elena, do you have any more feedback for this patch (and D30587)?

Sorry, lost track of this. I'll try to get to this today/tomorrow, unless Elena does first.

No problem! Thanks, Michael!

Sorry for the delay, Matt.

This makes sense to me.
It was reasonable to special-case GEPs when vectorizing them was mostly pointless, because the vast majority of uses would be scalar. But I think the infrastructure now is good enough to handle it in a generic way. Thanks!

Is there a test that actually stores a vector of pointers?

lib/Transforms/Vectorize/LoopVectorize.cpp
5449 ↗	(On Diff #91050)	nit: "If there's no pointer operand or the pointer operand is not an instruction"

delena added inline comments.Mar 20 2017, 1:03 AM

lib/Transforms/Vectorize/LoopVectorize.cpp

4710 ↗	(On Diff #90896)	Do you have any real case with loop invariant GEP? Do you mean the last test case from first-order-recurrence.ll:

define void @PR29559() {
entry:
  br label %scalar.body

scalar.body:
  %i = phi i64 [ 0, %entry ], [ %i.next, %scalar.body ]
  %tmp2 = phi float* [ undef, %entry ], [ %tmp3, %scalar.body ]
  %tmp3 = getelementptr inbounds [3 x float], [3 x float]* undef, i64 0, i64 0
  %i.next = add nuw nsw i64 %i, 1
  %cond = icmp eq i64 %i.next, undef
  br i1 %cond, label %for.end, label %scalar.body

for.end:
  ret void
}

The %tmp2 and %tmp3 should be scalar.
In my understanding, if all operands of the GEP are loop invariant, the Load/Store is uniform. I just do not understand in which cases we'll need to broadcast the GEP.

In D30710#705027, @mkuper wrote:

Is there a test that actually stores a vector of pointers?

I'll add one and update the patch. Thanks, Michael!

lib/Transforms/Vectorize/LoopVectorize.cpp
4710 ↗	(On Diff #90896)	I should add a test for the broadcast case to make this clear, shouldn't I? If we have a GEP like %tmp3 in the example you pasted that was stored to memory with a vector store, we would need a vector version of it, and it wouldn't be "uniform-after-vectorization". It's still "uniform" in the LAA sense because it has loop-invariant operands. Because of this, IRBuilder will give us a scalar GEP that we will then broadcast. I'll add the test and update.

delena added inline comments.Mar 20 2017, 5:09 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
4710 ↗	(On Diff #90896)	If %tmp3 should be stored as a vector, the user (store inst) will broadcast it. You can keep it scalar.

mssimpso added inline comments.Mar 20 2017, 8:42 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
4710 ↗	(On Diff #90896)	I see your point, in that getVectorValue() will do the broadcast upon the first use of a loop-invariant value, and initialize the mapping. But I think your comment applies to the vectorization of all instructions, right? Not just GEPs? For example, if for some reason we found an "add i32 0, 0" inside the body of the loop, we currently vectorize it like any instruction. But we could instead skip vectorizing it and allow getVectorValue() to do the broadcast on-demand. Would it make sense to have something like: if (Legal->isUniform(&I)) continue; Before the switch statement in the main loop of vectorizeBlockInLoop()?

mssimpso added inline comments.Mar 20 2017, 10:17 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
4710 ↗	(On Diff #90896)	To answer my previous question, no, we can't rely on getVectorValue() to perform the broadcast for "uniform" values. The function that performs the broadcast (getBroadcastInstrs) doesn't clone the original instruction from the old loop body into the new loop body. So we would end up with a broken module where the broadcast in the vector loop preheader uses a "uniform" value defined within the body of the scalar loop. So going back to Elena's comment about %tmp3 being broadcast by the store that uses it - this doesn't work. We have to either vectorize or scalarize the GEP. Otherwise, getBroadcastInstrs would literally broadcast %tmp3 from the body of the original loop into the vector loop preheader, violating dominance. Again, I'll add a test case to make this more concrete. Sorry for the noise.
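
The dominance problem described above can be checked on a small model. The sketch below computes dominators for a simplified control-flow graph and asks whether a definition's block dominates a use's block. The graph shape used in the usage note (entry, vector.ph, vector.body, middle.block, scalar.ph, scalar.body, exit) is an assumed simplification of the vectorizer's output, and `CFG`, `computeDominators`, and `defDominatesUse` are hypothetical helper names, not LLVM APIs.

```cpp
#include <cstddef>
#include <vector>

// A tiny CFG: Preds[B] lists the predecessors of block B. Block 0 is the
// entry block, and every block is assumed to be reachable from it.
using CFG = std::vector<std::vector<std::size_t>>;
using DomSets = std::vector<std::vector<bool>>;

// Computes Dom[B][A] == true iff block A dominates block B, using the
// classic iterative data-flow formulation.
DomSets computeDominators(const CFG &Preds) {
  const std::size_t N = Preds.size();
  DomSets Dom(N, std::vector<bool>(N, true));
  Dom[0].assign(N, false);
  Dom[0][0] = true; // the entry block is dominated only by itself
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t B = 1; B < N; ++B) {
      // Intersect the dominator sets of all predecessors, then add B itself.
      std::vector<bool> New(N, true);
      for (std::size_t P : Preds[B])
        for (std::size_t A = 0; A < N; ++A)
          New[A] = New[A] && Dom[P][A];
      New[B] = true;
      if (New != Dom[B]) {
        Dom[B] = New;
        Changed = true;
      }
    }
  }
  return Dom;
}

// A use is only valid if the defining block dominates the using block.
bool defDominatesUse(const DomSets &Dom, std::size_t DefBlock,
                     std::size_t UseBlock) {
  return Dom[UseBlock][DefBlock];
}
```

With blocks numbered 0 = entry, 1 = vector.ph, 2 = vector.body, 3 = middle.block, 4 = scalar.ph, 5 = scalar.body, 6 = exit, a value defined in scalar.body does not dominate vector.ph, so broadcasting it there would be invalid. A clone placed in vector.body can be broadcast in vector.body without that problem.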

Addressed feedback from Michael and Elena.

Updated comment.
Added two new test cases. In the first, we have a loop-varying GEP that is stored to memory, and in the second, we have a loop-invariant GEP that is stored to memory. In the first, we create a vector GEP, and in the second, we recreate the scalar GEP and then broadcast it.
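
For readers more comfortable with source code than IR, the two situations can be pictured with the following C++ loops. These are hypothetical source-level analogues, not the test cases added by the patch (those are written in LLVM IR). Whether a compiler actually leaves the invariant address computation inside the loop, or vectorizes either loop at all, depends on earlier passes and is not guaranteed.

```cpp
#include <cstddef>

// Loop-varying address: each iteration stores a different pointer, so a
// vectorized loop needs a vector of pointers as the stored value.
void storeVaryingPointers(int **Out, int *Base, std::size_t N) {
  for (std::size_t I = 0; I < N; ++I)
    Out[I] = &Base[I];
}

// Loop-invariant address: every iteration stores the same pointer. If the
// address computation stays in the loop, the vectorized loop can recreate
// the scalar pointer once and broadcast it to all lanes.
// Base must point to at least four elements.
void storeInvariantPointer(int **Out, int *Base, std::size_t N) {
  for (std::size_t I = 0; I < N; ++I)
    Out[I] = &Base[3];
}
```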

delena added inline comments.Mar 21 2017, 12:18 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
4710 ↗	(On Diff #90896)	I agree that your code works. But I'd organize the code in another way:

if (OrigLoop->hasLoopInvariantOperands(GEP) || VF == 1 || isUniformAfterVectorization(GEP, VF)) {
  NewGEP = GEP->clone();
  initScalar(GEP, NewGEP);
} else {
  // build vector GEP exactly as you do
  ...
  VectorLoopValueMap.initVector(&I, Entry);
}

Michael, what do you think?

mkuper added inline comments.Mar 21 2017, 1:20 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
4771 ↗	(On Diff #92355)	Assuming we stay with this version - do we really need the VF == 1 check here? Can we ever end up with NewGEP->getType()->isVectorTy() when VF == 1? I think this would require the original GEP to have a vector type, and I believe we filter this out in canVectorizeInstrs(), regardless of the VF (that is, before we even start thinking about VFs)
4710 ↗	(On Diff #90896)	I think the current version is slightly less brittle, but I understand why you'd want to be explicit. I'm fine with either option, but if we go with the one you're suggesting, we'd still need an assert that the resulting GEP isn't scalar. Also, this is different from the current version, right? The current version looks equivalent to checking that "OrigLoop->hasLoopInvariantOperands(GEP) || VF == 1".

delena added inline comments.Mar 21 2017, 1:50 AM

lib/Transforms/Vectorize/LoopVectorize.cpp

4710 ↗	(On Diff #90896)	isUniformAfterVectorization is redundant here, I agree.
vectorizeMemoryInstruction() creates another GEP for consecutive loads and stores.

we'd still need an assert that the resulting GEP isn't scalar.

Yes.

Something like this?

if (OrigLoop->hasLoopInvariantOperands(GEP)) {
  NewGEP = GEP->clone();
  initScalar(GEP, NewGEP);
} else if (VF == 1) {
  NewGEP = GEP->clone();
  VectorLoopValueMap.initVector(GEP, NewGEP);
} else {
  // build vector GEP exactly as you do
  ...
  assert(NewGEP->getType()->isVectorTy());
  Entry[Part] = NewGEP;

  VectorLoopValueMap.initVector(&I, Entry);
}

mssimpso added inline comments.Mar 21 2017, 4:03 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
4771 ↗	(On Diff #92355)	We need the check. The VF == 1 check is to ensure that we don't do the broadcast when unrolling. When unrolling, NewGEP will be scalar (what we want) and we don't want to try and broadcast to a vector with VF == 1.
4710 ↗	(On Diff #90896)	I agree that we should be explicit, but I'm not a huge fan of this approach. This is not how initScalar and initVector work (unless you've intentionally omitted some code). They are called with a (Value *, EntryTy), so before using them, we have to create all the VF x UF values. For your call to initScalar, we would be better off just calling scalarizeInstruction (which does this). But if we really should scalarize the GEP, we should have already done so with the check at the top of the loop. Note, though, that scalarizing the GEP will change what its vector version will look like. Instead of having the broadcast, we would build the vector with a bunch of inserts. But again, we already know we need a vector version of the GEP, so why scalarize? For the VF == 1 case, once you create the Entry and initialize all the elements, the code will look pretty much the same as the "else" case here. So why separate it?

mssimpso added inline comments.Mar 21 2017, 4:20 AM

lib/Transforms/Vectorize/LoopVectorize.cpp

4710 ↗	(On Diff #90896)	If we want to be more explicit here, I would suggest something like this (which is very similar to the first version of the patch), although I haven't tested this yet:

if (VF > 1 && OrigLoop->hasLoopInvariantOperands(GEP)) {
  // clone GEP and broadcast clone
  for (unsigned Part = 0; Part < UF; ++Part)
    Entry[Part] = /* broadcast clone */;
} else {
  for (unsigned Part = 0; Part < UF; ++Part) {
    // build GEP as we do now
    // add an assert that the GEP we build has a vector type
    Entry[Part] = /* vector gep */;
  }
}
VectorLoopValueMap.initVector(&I, Entry);

So the loop-invariant case is separate and more explicit, but we still vectorize the GEP instead of trying to scalarize it.

delena added inline comments.Mar 21 2017, 6:54 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
4710 ↗	(On Diff #90896)	"Note, though, that scalarizing the GEP will change what its vector version will look like. Instead of having the broadcast, we would build the vector with a bunch of inserts."

scalarizeInstruction() works fine and, I agree, we can use it instead of clone(). The "bunch of inserts" is inserted by getVectorValue(). It happens because inside getVectorValue() we do not check for loop-invariant operands. We only check that the instruction is in the list of uniforms:

if (Cost->isUniformAfterVectorization(I, VF)) {
  VectorValue = getBroadcastInstrs(getScalarValue(V, Part, 0));
}

"But again, we already know we need a vector version of the GEP, so why scalarize?"

How do you know this? You do not check the users. If you build a splat vector and the GEP is used in a scalar form, the scalar value will be extracted from the broadcast. Hopefully, another pass will resolve this redundancy.

Can you try to change getVectorValue() to:

if (Cost->isUniformAfterVectorization(I, VF) || OrigLoop->hasLoopInvariantOperands()) {
  VectorValue = getBroadcastInstrs(getScalarValue(V, Part, 0));
}

mssimpso added inline comments.Mar 21 2017, 8:19 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
4710 ↗	(On Diff #90896)	"The 'bunch of inserts' is inserted by getVectorValue(). It happens because inside getVectorValue() we do not check for loop-invariant operands."

That's right.

"How do you know this? You do not check the users. If you build a splat vector and the GEP is used in a scalar form, the scalar value will be extracted from the broadcast."

That's the whole point of collectLoopScalars(). It looks at the users of the GEPs and determines which ones should be scalar and which ones should be vector. Note that I updated collectLoopScalars to be more precise as part of this patch. So by the time we reach the code here, we "know" that we want a vector version of the GEP because of how it's used. We check the scalars and scalarize if necessary at the top of the main loop here. I see no reason to second-guess the scalarization decision. If something needs to change regarding scalarization, I think it should be done over in collectLoopScalars.

To be clear, we will prefer the vector version of the GEP if it has vector users. That is, if we have a GEP with both scalar and vector users, we will now vectorize the GEP and then extract the values for the scalar users. Previously, since we always scalarized the GEPs, if we had the same GEP with both scalar and vector users, we would build the vector on-demand with inserts for the vector users.

"Can you try to change getVectorValue()"

I would rather do that in a separate patch, either before or after this one.

Regarding what happens to a GEP with loop-invariant operands, we should note that this is no different than any other kind of instruction. If we have an "add" instruction inside the loop with loop-invariant operands, we will still vectorize it (the operands would be broadcast). The reason this has to be special-cased for the GEP is simply because for vector GEPs, the vector-typed arguments are optional. We only produce a vector of pointers if at least one argument of the GEP is vector-typed.

For the loop-invariant case, we can either pick an argument to broadcast or broadcast the result. But again, my view is that the scalarization decision should be left to collectLoopScalars. There's no other instruction where we check to see if it has loop-invariant operands and then decide to scalarize it. I'm not saying we can't do this, just that we should do the same thing for all instructions.
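
The special case discussed here can be summarized with a small model of the result-shape rule: loop-varying operands are widened to VF lanes, loop-invariant operands stay scalar, and the GEP result is a vector only if at least one operand is. The sketch below is a toy illustration under those assumptions, not the vectorizer's code. `ToyOperand`, `ToyWidenedGEP`, and `widenGEP` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// An operand of the original scalar GEP (hypothetical toy type).
struct ToyOperand {
  bool LoopInvariant;
};

// Shape of the value produced for one unroll part.
struct ToyWidenedGEP {
  unsigned Lanes;       // 1 means a scalar pointer, >1 a vector of pointers
  bool BroadcastResult; // true if a scalar clone had to be splatted
};

ToyWidenedGEP widenGEP(const std::vector<ToyOperand> &Ops, unsigned VF) {
  assert(VF >= 1 && !Ops.empty() && "expected a valid VF and a base pointer");

  bool AllInvariant = true;
  for (const ToyOperand &Op : Ops)
    AllInvariant = AllInvariant && Op.LoopInvariant;

  // Vectorizing with only loop-invariant operands: building the GEP from
  // scalar operands would give a scalar, so the result is broadcast instead.
  if (VF > 1 && AllInvariant)
    return {VF, true};

  // Otherwise only loop-varying operands are widened. The result has VF
  // lanes if any operand was widened, and stays scalar when VF == 1.
  unsigned Lanes = 1;
  for (const ToyOperand &Op : Ops)
    if (!Op.LoopInvariant)
      Lanes = VF;
  return {Lanes, false};
}
```

Note that in the unroll-only case (VF == 1) the model never broadcasts: each unroll part simply gets its own scalar pointer, which matches the reason given earlier in the thread for keeping the VF check.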

Incorporated feedback from Elena and Michael. Thanks!

I separated out the loop-invariant case to make the code more explicit. But I'm still vectorizing this case rather than scalarizing it as suggested by Elena. I don't think we need to revisit the scalarization decision here.
I added an assert for the loop-varying case to ensure we actually create a vector of pointers.
I rewrote the comments to hopefully add some more clarity.

Fixed a typo in the new comments.

delena accepted this revision.Mar 22 2017, 2:35 AM

delena added inline comments.

lib/Transforms/Vectorize/LoopVectorize.cpp
4749 ↗	(On Diff #92496)	This "if" should be dropped once a GEP whose operands are all loop-invariant is placed in Scalars and getVectorValue() knows to broadcast it. Please add a TODO to the comments.

This revision is now accepted and ready to land.Mar 22 2017, 2:35 AM

Thanks for the review and discussion, Elena! I'll add the TODO before committing.

Closed by commit rL298620: [LV] Vectorize GEPs (authored by mssimpso). · Mar 23 2017, 9:42 AM

This revision was automatically updated to reflect the committed changes.

It looks like this commit caused a regression in the test-suite on SystemZ, I now get an assertion failure:

clang-5.0: /home/uweigand/llvm/llvm-head/lib/Transforms/Vectorize/LoopVectorize.cpp:2740: llvm::Value* {anonymous}::InnerLoopVectorizer::getScalarValue(llvm::Value*, unsigned int, unsigned int): Assertion `Lane > 0 ? !Cost->isUniformAfterVectorization(cast<Instruction>(V), VF) : true && "Uniform values only have lane zero"' failed.

when building MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/analyze.c

I suspect this introduced a regression (llvm bisect says it is between r298618 and r298625). This can be reproduced with the reduced example:

int a, c, e, *b, d;
void fn1() {
  int **f = 0;
  b = f + a;
  for (; e; e++, d += c)
    f[e] = b + d;
}

https://bugs.llvm.org//show_bug.cgi?id=32414

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp	184 lines
llvm/trunk/test/Transforms/LoopVectorize/X86/consecutive-ptr-uniforms.ll	44 lines
llvm/trunk/test/Transforms/LoopVectorize/X86/scatter_crash.ll	106 lines
llvm/trunk/test/Transforms/LoopVectorize/vector-geps.ll	61 lines

Diff 92823

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

[271 lines not shown]
/// If the incoming type is void, we return void. If the VF is 1, we return		/// If the incoming type is void, we return void. If the VF is 1, we return
/// the scalar type.		/// the scalar type.
static Type *ToVectorTy(Type *Scalar, unsigned VF) {		static Type *ToVectorTy(Type *Scalar, unsigned VF) {
if (Scalar->isVoidTy() || VF == 1)		if (Scalar->isVoidTy() || VF == 1)
return Scalar;		return Scalar;
return VectorType::get(Scalar, VF);		return VectorType::get(Scalar, VF);
}		}

/// A helper function that returns GEP instruction and knows to skip a
/// 'bitcast'. The 'bitcast' may be skipped if the source and the destination
/// pointee types of the 'bitcast' have the same size.
/// For example:
/// bitcast double** %var to i64* - can be skipped
/// bitcast double** %var to i8* - can not
static GetElementPtrInst *getGEPInstruction(Value *Ptr) {

if (isa<GetElementPtrInst>(Ptr))
return cast<GetElementPtrInst>(Ptr);

if (isa<BitCastInst>(Ptr) &&
isa<GetElementPtrInst>(cast<BitCastInst>(Ptr)->getOperand(0))) {
Type *BitcastTy = Ptr->getType();
Type *GEPTy = cast<BitCastInst>(Ptr)->getSrcTy();
if (!isa<PointerType>(BitcastTy) || !isa<PointerType>(GEPTy))
return nullptr;
Type *Pointee1Ty = cast<PointerType>(BitcastTy)->getPointerElementType();
Type *Pointee2Ty = cast<PointerType>(GEPTy)->getPointerElementType();
const DataLayout &DL = cast<BitCastInst>(Ptr)->getModule()->getDataLayout();
if (DL.getTypeSizeInBits(Pointee1Ty) == DL.getTypeSizeInBits(Pointee2Ty))
return cast<GetElementPtrInst>(cast<BitCastInst>(Ptr)->getOperand(0));
}
return nullptr;
}

// FIXME: The following helper functions have multiple implementations		// FIXME: The following helper functions have multiple implementations
// in the project. They can be effectively organized in a common Load/Store		// in the project. They can be effectively organized in a common Load/Store
// utilities unit.		// utilities unit.

/// A helper function that returns the pointer operand of a load or store		/// A helper function that returns the pointer operand of a load or store
/// instruction.		/// instruction.
static Value *getPointerOperand(Value *I) {		static Value *getPointerOperand(Value *I) {
if (auto *LI = dyn_cast<LoadInst>(I))		if (auto *LI = dyn_cast<LoadInst>(I))
[2,678 lines not shown]	void InnerLoopVectorizer::vectorizeMemoryInstruction(Instruction *Instr) {
int ConsecutiveStride = Legal->isConsecutivePtr(Ptr);		int ConsecutiveStride = Legal->isConsecutivePtr(Ptr);
bool Reverse = ConsecutiveStride < 0;		bool Reverse = ConsecutiveStride < 0;
bool CreateGatherScatter =		bool CreateGatherScatter =
(Decision == LoopVectorizationCostModel::CM_GatherScatter);		(Decision == LoopVectorizationCostModel::CM_GatherScatter);

VectorParts VectorGep;		VectorParts VectorGep;

// Handle consecutive loads/stores.		// Handle consecutive loads/stores.
GetElementPtrInst *Gep = getGEPInstruction(Ptr);
if (ConsecutiveStride) {		if (ConsecutiveStride) {
Ptr = getScalarValue(Ptr, 0, 0);		Ptr = getScalarValue(Ptr, 0, 0);
} else {		} else {
// At this point we should vector version of GEP for Gather or Scatter		// At this point we should vector version of GEP for Gather or Scatter
assert(CreateGatherScatter && "The instruction should be scalarized");		assert(CreateGatherScatter && "The instruction should be scalarized");
if (Gep) {
// Vectorizing GEP, across UF parts. We want to get a vector value for base
// and each index that's defined inside the loop, even if it is
// loop-invariant but wasn't hoisted out. Otherwise we want to keep them
// scalar.
SmallVector<VectorParts, 4> OpsV;
for (Value *Op : Gep->operands()) {
Instruction *SrcInst = dyn_cast<Instruction>(Op);
if (SrcInst && OrigLoop->contains(SrcInst))
OpsV.push_back(getVectorValue(Op));
else
OpsV.push_back(VectorParts(UF, Op));
}
for (unsigned Part = 0; Part < UF; ++Part) {
SmallVector<Value *, 4> Ops;
Value *GEPBasePtr = OpsV[0][Part];
for (unsigned i = 1; i < Gep->getNumOperands(); i++)
Ops.push_back(OpsV[i][Part]);
Value *NewGep = Builder.CreateGEP(GEPBasePtr, Ops, "VectorGep");
cast<GetElementPtrInst>(NewGep)->setIsInBounds(Gep->isInBounds());
assert(NewGep->getType()->isVectorTy() && "Expected vector GEP");

NewGep =
Builder.CreateBitCast(NewGep, VectorType::get(Ptr->getType(), VF));
VectorGep.push_back(NewGep);
}
} else
VectorGep = getVectorValue(Ptr);		VectorGep = getVectorValue(Ptr);
}		}

VectorParts Mask = createBlockInMask(Instr->getParent());		VectorParts Mask = createBlockInMask(Instr->getParent());
// Handle Stores:		// Handle Stores:
if (SI) {		if (SI) {
assert(!Legal->isUniform(SI->getPointerOperand()) &&		assert(!Legal->isUniform(SI->getPointerOperand()) &&
"We do not allow storing to uniform addresses");		"We do not allow storing to uniform addresses");
setDebugLocFromInst(Builder, SI);		setDebugLocFromInst(Builder, SI);
[1,743 lines not shown]	case Instruction::Br:
// Nothing to do for PHIs and BR, since we already took care of the		// Nothing to do for PHIs and BR, since we already took care of the
// loop control flow instructions.		// loop control flow instructions.
continue;		continue;
case Instruction::PHI: {		case Instruction::PHI: {
// Vectorize PHINodes.		// Vectorize PHINodes.
widenPHIInstruction(&I, UF, VF);		widenPHIInstruction(&I, UF, VF);
continue;		continue;
} // End of PHI.		} // End of PHI.
		case Instruction::GetElementPtr: {
		// Construct a vector GEP by widening the operands of the scalar GEP as
		// necessary. We mark the vector GEP 'inbounds' if appropriate. A GEP
		// results in a vector of pointers when at least one operand of the GEP
		// is vector-typed. Thus, to keep the representation compact, we only use
		// vector-typed operands for loop-varying values.
		auto *GEP = cast<GetElementPtrInst>(&I);
		VectorParts Entry(UF);

		if (VF > 1 && OrigLoop->hasLoopInvariantOperands(GEP)) {
		// If we are vectorizing, but the GEP has only loop-invariant operands,
		// the GEP we build (by only using vector-typed operands for
		// loop-varying values) would be a scalar pointer. Thus, to ensure we
		// produce a vector of pointers, we need to either arbitrarily pick an
		// operand to broadcast, or broadcast a clone of the original GEP.
		// Here, we broadcast a clone of the original.
		//
		// TODO: If at some point we decide to scalarize instructions having
		// loop-invariant operands, this special case will no longer be
		// required. We would add the scalarization decision to
		// collectLoopScalars() and teach getVectorValue() to broadcast
		// the lane-zero scalar value.
		auto *Clone = Builder.Insert(GEP->clone());
		for (unsigned Part = 0; Part < UF; ++Part)
		Entry[Part] = Builder.CreateVectorSplat(VF, Clone);
		} else {
		// If the GEP has at least one loop-varying operand, we are sure to
		// produce a vector of pointers. But if we are only unrolling, we want
		// to produce a scalar GEP for each unroll part. Thus, the GEP we
		// produce with the code below will be scalar (if VF == 1) or vector
		// (otherwise). Note that for the unroll-only case, we still maintain
		// values in the vector mapping with initVector, as we do for other
		// instructions.
		for (unsigned Part = 0; Part < UF; ++Part) {

		// The pointer operand of the new GEP. If it's loop-invariant, we
		// won't broadcast it.
		auto *Ptr = OrigLoop->isLoopInvariant(GEP->getPointerOperand())
		? GEP->getPointerOperand()
		: getVectorValue(GEP->getPointerOperand())[Part];

		// Collect all the indices for the new GEP. If any index is
		// loop-invariant, we won't broadcast it.
		SmallVector<Value *, 4> Indices;
		for (auto &U : make_range(GEP->idx_begin(), GEP->idx_end())) {
		if (OrigLoop->isLoopInvariant(U.get()))
		Indices.push_back(U.get());
		else
		Indices.push_back(getVectorValue(U.get())[Part]);
		}

		// Create the new GEP. Note that this GEP may be a scalar if VF == 1,
		// but it should be a vector, otherwise.
		auto *NewGEP = GEP->isInBounds()
		? Builder.CreateInBoundsGEP(Ptr, Indices)
		: Builder.CreateGEP(Ptr, Indices);
		assert((VF == 1 || NewGEP->getType()->isVectorTy()) &&
		"NewGEP is not a pointer vector");
		Entry[Part] = NewGEP;
		}
		}

		VectorLoopValueMap.initVector(&I, Entry);
		addMetadata(Entry, GEP);
		break;
		}
case Instruction::UDiv:		case Instruction::UDiv:
case Instruction::SDiv:		case Instruction::SDiv:
case Instruction::SRem:		case Instruction::SRem:
case Instruction::URem:		case Instruction::URem:
// Scalarize with predication if this instruction may divide by zero and		// Scalarize with predication if this instruction may divide by zero and
// block execution is conditional, otherwise fallthrough.		// block execution is conditional, otherwise fallthrough.
if (Legal->isScalarWithPredication(&I)) {		if (Legal->isScalarWithPredication(&I)) {
scalarizeInstruction(&I, true);		scalarizeInstruction(&I, true);
[674 lines not shown]	void LoopVectorizationCostModel::collectLoopScalars(unsigned VF) {
// sense.		// sense.

assert(VF >= 2 && !Scalars.count(VF) &&		assert(VF >= 2 && !Scalars.count(VF) &&
"This function should not be visited twice for the same VF");		"This function should not be visited twice for the same VF");

// If an instruction is uniform after vectorization, it will remain scalar.		// If an instruction is uniform after vectorization, it will remain scalar.
Scalars[VF].insert(Uniforms[VF].begin(), Uniforms[VF].end());		Scalars[VF].insert(Uniforms[VF].begin(), Uniforms[VF].end());

// Collect the getelementptr instructions that will not be vectorized. A		// These sets are used to seed the analysis of loop scalars with memory
// getelementptr instruction is only vectorized if it is used for a legal		// access pointer operands that will remain scalar.
// gather or scatter operation.		SmallSetVector<Instruction *, 8> ScalarPtrs;
for (auto *BB : TheLoop->blocks())		SmallPtrSet<Instruction *, 8> PossibleNonScalarPtrs;

		// Returns true if the given instruction will not be a gather or scatter
		// operation with vectorization factor VF.
		auto isScalarDecision = [&](Instruction *I, unsigned VF) {
		InstWidening WideningDecision = getWideningDecision(I, VF);
		assert(WideningDecision != CM_Unknown &&
		"Widening decision should be ready at this moment");
		return WideningDecision != CM_GatherScatter;
		};

		// Collect the initial values that we know will not be vectorized. A value
		// will remain scalar if it is only used as the pointer operand of memory
		// accesses that are not gather or scatter operations.
		for (auto *BB : TheLoop->blocks()) {
for (auto &I : *BB) {		for (auto &I : *BB) {
if (auto *GEP = dyn_cast<GetElementPtrInst>(&I)) {
Scalars[VF].insert(GEP);		// If there's no pointer operand or the pointer operand is not an
continue;		// instruction, there's nothing to do.
}		auto *Ptr = dyn_cast_or_null<Instruction>(getPointerOperand(&I));
auto *Ptr = getPointerOperand(&I);
if (!Ptr)		if (!Ptr)
continue;		continue;
auto *GEP = getGEPInstruction(Ptr);
if (GEP && getWideningDecision(&I, VF) == CM_GatherScatter)		// If the pointer has already been identified as scalar (e.g., if it was
Scalars[VF].erase(GEP);		// also identified as uniform), there's nothing to do.
		if (Scalars[VF].count(Ptr))
		continue;

		// True if all users of Ptr are memory accesses that have Ptr as their
		// pointer operand.
		auto UsersAreMemAccesses = all_of(Ptr->users(), [&](User *U) -> bool {
		return getPointerOperand(U) == Ptr;
		});

		// If the pointer is used by an instruction other than a memory access,
		// it may not remain scalar. If the memory access is a gather or scatter
		// operation, the pointer will not remain scalar.
		if (!UsersAreMemAccesses || !isScalarDecision(&I, VF))
		PossibleNonScalarPtrs.insert(Ptr);
		else
		ScalarPtrs.insert(Ptr);
		}
		}

		// Add to the set of scalars all the pointers we know will not be vectorized.
		for (auto *I : ScalarPtrs)
		if (!PossibleNonScalarPtrs.count(I)) {
		DEBUG(dbgs() << "LV: Found scalar instruction: " << *I << "\n");
		Scalars[VF].insert(I);
}		}

// An induction variable will remain scalar if all users of the induction		// An induction variable will remain scalar if all users of the induction
// variable and induction variable update remain scalar.		// variable and induction variable update remain scalar.
auto *Latch = TheLoop->getLoopLatch();		auto *Latch = TheLoop->getLoopLatch();
for (auto &Induction : *Legal->getInductionVars()) {		for (auto &Induction : *Legal->getInductionVars()) {
auto *Ind = Induction.first;		auto *Ind = Induction.first;
auto *IndUpdate = cast<Instruction>(Ind->getIncomingValueForBlock(Latch));		auto *IndUpdate = cast<Instruction>(Ind->getIncomingValueForBlock(Latch));
[14 lines not shown]	auto ScalarIndUpdate = all_of(IndUpdate->users(), [&](User *U) -> bool {
return I == Ind || !TheLoop->contains(I) || Scalars[VF].count(I);		return I == Ind || !TheLoop->contains(I) || Scalars[VF].count(I);
});		});
if (!ScalarIndUpdate)		if (!ScalarIndUpdate)
continue;		continue;

// The induction variable and its update instruction will remain scalar.		// The induction variable and its update instruction will remain scalar.
Scalars[VF].insert(Ind);		Scalars[VF].insert(Ind);
Scalars[VF].insert(IndUpdate);		Scalars[VF].insert(IndUpdate);
		DEBUG(dbgs() << "LV: Found scalar instruction: " << *Ind << "\n");
		DEBUG(dbgs() << "LV: Found scalar instruction: " << *IndUpdate << "\n");
}		}
}		}

bool LoopVectorizationLegality::isScalarWithPredication(Instruction *I) {		bool LoopVectorizationLegality::isScalarWithPredication(Instruction *I) {
if (!blockNeedsPredication(I->getParent()))		if (!blockNeedsPredication(I->getParent()))
return false;		return false;
switch(I->getOpcode()) {		switch(I->getOpcode()) {
default:		default:

llvm/trunk/test/Transforms/LoopVectorize/X86/consecutive-ptr-uniforms.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt < %s -loop-vectorize -instcombine -S -debug-only=loop-vectorize -disable-output -print-after=instcombine 2>&1 | FileCheck %s			; RUN: opt < %s -loop-vectorize -instcombine -S -debug-only=loop-vectorize -disable-output -print-after=instcombine 2>&1 | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; CHECK-LABEL: PR31671			; CHECK-LABEL: PR31671
	;			;
	; Check a pointer in which one of its uses is consecutive-like and another of			; Check a pointer in which one of its uses is consecutive-like and another of
	; its uses is non-consecutive-like. In the test case below, %tmp3 is the			; its uses is non-consecutive-like. In the test case below, %tmp3 is the
	; pointer operand of an interleaved load, making it consecutive-like. However,			; pointer operand of an interleaved load, making it consecutive-like. However,
	; it is also the pointer operand of a non-interleaved store that will become a			; it is also the pointer operand of a non-interleaved store that will become a
	; scatter operation. %tmp3 (and the induction variable) should not be marked			; scatter operation. %tmp3 (and the induction variable) should not be marked
	; uniform-after-vectorization.			; uniform-after-vectorization.
	;			;
	; CHECK: LV: Found uniform instruction: %tmp0 = getelementptr inbounds %data, %data* %d, i64 0, i32 3, i64 %i			; CHECK: LV: Found uniform instruction: %tmp0 = getelementptr inbounds %data, %data* %d, i64 0, i32 3, i64 %i
	; CHECK-NOT: LV: Found uniform instruction: %tmp3 = getelementptr inbounds %data, %data* %d, i64 0, i32 0, i64 %i			; CHECK-NOT: LV: Found uniform instruction: %tmp3 = getelementptr inbounds %data, %data* %d, i64 0, i32 0, i64 %i
	; CHECK-NOT: LV: Found uniform instruction: %i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]			; CHECK-NOT: LV: Found uniform instruction: %i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
	; CHECK-NOT: LV: Found uniform instruction: %i.next = add nuw nsw i64 %i, 5			; CHECK-NOT: LV: Found uniform instruction: %i.next = add nuw nsw i64 %i, 5
				; CHECK: vector.ph:
				; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x float> undef, float %x, i32 0
				; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x float> [[BROADCAST_SPLATINSERT]], <16 x float> undef, <16 x i32> zeroinitializer
				; CHECK-NEXT: br label %vector.body
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK: %index = phi i64			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
	; CHECK: %vec.ind = phi <16 x i64>			; CHECK-NEXT: [[VEC_IND:%.*]] = phi <16 x i64> [ <i64 0, i64 5, i64 10, i64 15, i64 20, i64 25, i64 30, i64 35, i64 40, i64 45, i64 50, i64 55, i64 60, i64 65, i64 70, i64 75>, %vector.ph ], [ [[VEC_IND_NEXT:%.*]], %vector.body ]
	; CHECK: %[[T0:.+]] = mul i64 %index, 5			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = mul i64 [[INDEX]], 5
	; CHECK: %[[T1:.+]] = getelementptr inbounds %data, %data* %d, i64 0, i32 3, i64 %[[T0]]			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds %data, %data* %d, i64 0, i32 3, i64 [[OFFSET_IDX]]
	; CHECK: %[[T2:.+]] = bitcast float* %[[T1]] to <80 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[TMP0]] to <80 x float>*
	; CHECK: load <80 x float>, <80 x float>* %[[T2]], align 4			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <80 x float>, <80 x float>* [[TMP1]], align 4
	; CHECK: %[[T3:.+]] = getelementptr inbounds %data, %data* %d, i64 0, i32 0, i64 %[[T0]]			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <80 x float> [[WIDE_VEC]], <80 x float> undef, <16 x i32> <i32 0, i32 5, i32 10, i32 15, i32 20, i32 25, i32 30, i32 35, i32 40, i32 45, i32 50, i32 55, i32 60, i32 65, i32 70, i32 75>
	; CHECK: %[[T4:.+]] = bitcast float* %[[T3]] to <80 x float>*			; CHECK-NEXT: [[TMP2:%.*]] = fmul <16 x float> [[BROADCAST_SPLAT]], [[STRIDED_VEC]]
	; CHECK: load <80 x float>, <80 x float>* %[[T4]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds %data, %data* %d, i64 0, i32 0, <16 x i64> [[VEC_IND]]
	; CHECK: %VectorGep = getelementptr inbounds %data, %data* %d, i64 0, i32 0, <16 x i64> %vec.ind			; CHECK-NEXT: [[BC:%.*]] = bitcast <16 x float*> [[TMP3]] to <16 x <80 x float>*>
	; CHECK: call void @llvm.masked.scatter.v16f32({{.*}}, <16 x float*> %VectorGep, {{.*}})			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <16 x <80 x float>*> [[BC]], i32 0
				; CHECK-NEXT: [[WIDE_VEC1:%.*]] = load <80 x float>, <80 x float>* [[TMP4]], align 4
				; CHECK-NEXT: [[STRIDED_VEC2:%.*]] = shufflevector <80 x float> [[WIDE_VEC1]], <80 x float> undef, <16 x i32> <i32 0, i32 5, i32 10, i32 15, i32 20, i32 25, i32 30, i32 35, i32 40, i32 45, i32 50, i32 55, i32 60, i32 65, i32 70, i32 75>
				; CHECK-NEXT: [[TMP5:%.*]] = fadd <16 x float> [[STRIDED_VEC2]], [[TMP2]]
				; CHECK-NEXT: call void @llvm.masked.scatter.v16f32(<16 x float> [[TMP5]], <16 x float*> [[TMP3]], i32 4, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <16 x i64> [[VEC_IND]], <i64 80, i64 80, i64 80, i64 80, i64 80, i64 80, i64 80, i64 80, i64 80, i64 80, i64 80, i64 80, i64 80, i64 80, i64 80, i64 80>
	; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body			; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body

	%data = type { [32000 x float], [3 x i32], [4 x i8], [32000 x float] }			%data = type { [32000 x float], [3 x i32], [4 x i8], [32000 x float] }

	define void @PR31671(float %x, %data* %d) #0 {			define void @PR31671(float %x, %data* %d) #0 {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:

llvm/trunk/test/Transforms/LoopVectorize/X86/scatter_crash.ll

	@c = external global i32, align 4			@c = external global i32, align 4
	@a = external global i32, align 4			@a = external global i32, align 4
	@b = external global i64, align 8			@b = external global i64, align 8

	; Function Attrs: norecurse nounwind ssp uwtable			; Function Attrs: norecurse nounwind ssp uwtable
	define void @_Z3fn1v() #0 {			define void @_Z3fn1v() #0 {
	; CHECK-LABEL: @_Z3fn1v(			; CHECK-LABEL: @_Z3fn1v(
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX:%.*]].next, %vector.body ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <16 x i64> [			; CHECK-NEXT: [[VEC_IND:%.*]] = phi <16 x i64> [ <i64 8, i64 10, i64 12, i64 14, i64 16, i64 18, i64 20, i64 22, i64 24, i64 26, i64 28, i64 30, i64 32, i64 34, i64 36, i64 38>, %vector.ph ], [ [[VEC_IND_NEXT:%.*]], %vector.body ]
	; CHECK-NEXT: [[VEC_IND3:%.*]] = phi <16 x i64> [			; CHECK-NEXT: [[VEC_IND3:%.*]] = phi <16 x i64> [ <i64 0, i64 2, i64 4, i64 6, i64 8, i64 10, i64 12, i64 14, i64 16, i64 18, i64 20, i64 22, i64 24, i64 26, i64 28, i64 30>, %vector.ph ], [ [[VEC_IND_NEXT4:%.*]], %vector.body ]
	; CHECK-NEXT: [[SHL:%.*]] = shl i64 %index, 1
	; CHECK-NEXT: %offset.idx = add i64 [[SHL]], 8
	; CHECK-NEXT: [[IND00:%.*]] = add i64 %offset.idx, 0
	; CHECK-NEXT: [[IND02:%.*]] = add i64 %offset.idx, 2
	; CHECK-NEXT: [[IND04:%.*]] = add i64 %offset.idx, 4
	; CHECK-NEXT: [[IND06:%.*]] = add i64 %offset.idx, 6
	; CHECK-NEXT: [[IND08:%.*]] = add i64 %offset.idx, 8
	; CHECK-NEXT: [[IND10:%.*]] = add i64 %offset.idx, 10
	; CHECK-NEXT: [[IND12:%.*]] = add i64 %offset.idx, 12
	; CHECK-NEXT: [[IND14:%.*]] = add i64 %offset.idx, 14
	; CHECK-NEXT: [[IND16:%.*]] = add i64 %offset.idx, 16
	; CHECK-NEXT: [[IND18:%.*]] = add i64 %offset.idx, 18
	; CHECK-NEXT: [[IND20:%.*]] = add i64 %offset.idx, 20
	; CHECK-NEXT: [[IND22:%.*]] = add i64 %offset.idx, 22
	; CHECK-NEXT: [[IND24:%.*]] = add i64 %offset.idx, 24
	; CHECK-NEXT: [[IND26:%.*]] = add i64 %offset.idx, 26
	; CHECK-NEXT: [[IND28:%.*]] = add i64 %offset.idx, 28
	; CHECK-NEXT: [[IND30:%.*]] = add i64 %offset.idx, 30
	; CHECK-NEXT: [[TMP10:%.*]] = sub nsw <16 x i64> <i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8>, [[VEC_IND]]			; CHECK-NEXT: [[TMP10:%.*]] = sub nsw <16 x i64> <i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8>, [[VEC_IND]]
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND00]]			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, <16 x i64> [[VEC_IND]]
	; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND02]]			; CHECK-NEXT: [[TMP12:%.*]] = add nsw <16 x i64> [[TMP10]], [[VEC_IND3]]
	; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND04]]			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds [10 x i32], <16 x [10 x i32]*> [[TMP11]], <16 x i64> [[TMP12]], i64 0
	; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND06]]			; CHECK-NEXT: call void @llvm.masked.scatter.v16i32(<16 x i32> <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>, <16 x i32*> [[TMP13]], i32 16, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)
	; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND08]]			; CHECK-NEXT: [[TMP14:%.*]] = or <16 x i64> [[VEC_IND3]], <i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1>
	; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND10]]			; CHECK-NEXT: [[TMP15:%.*]] = add nsw <16 x i64> [[TMP10]], [[TMP14]]
	; CHECK-NEXT: [[TMP30:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND12]]			; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds [10 x i32], <16 x [10 x i32]*> [[TMP11]], <16 x i64> [[TMP15]], i64 0
	; CHECK-NEXT: [[TMP33:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND14]]			; CHECK-NEXT: call void @llvm.masked.scatter.v16i32(<16 x i32> <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>, <16 x i32*> [[TMP16]], i32 8, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)
	; CHECK-NEXT: [[TMP36:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND16]]			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP39:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND18]]			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <16 x i64> [[VEC_IND]], <i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32>
	; CHECK-NEXT: [[TMP42:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND20]]			; CHECK-NEXT: [[VEC_IND_NEXT4]] = add <16 x i64> [[VEC_IND3]], <i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32>
	; CHECK-NEXT: [[TMP45:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND22]]			; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body
	; CHECK-NEXT: [[TMP48:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND24]]			;
	; CHECK-NEXT: [[TMP51:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND26]]
	; CHECK-NEXT: [[TMP54:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND28]]
	; CHECK-NEXT: [[TMP57:%.*]] = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @d, i64 0, i64 [[IND30]]
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <16 x [10 x i32]*> undef, [10 x i32]* [[TMP12]], i32 0
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <16 x [10 x i32]*> [[TMP13]], [10 x i32]* [[TMP15]], i32 1
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <16 x [10 x i32]*> [[TMP16]], [10 x i32]* [[TMP18]], i32 2
	; CHECK-NEXT: [[TMP22:%.*]] = insertelement <16 x [10 x i32]*> [[TMP19]], [10 x i32]* [[TMP21]], i32 3
	; CHECK-NEXT: [[TMP25:%.*]] = insertelement <16 x [10 x i32]*> [[TMP22]], [10 x i32]* [[TMP24]], i32 4
	; CHECK-NEXT: [[TMP28:%.*]] = insertelement <16 x [10 x i32]*> [[TMP25]], [10 x i32]* [[TMP27]], i32 5
	; CHECK-NEXT: [[TMP31:%.*]] = insertelement <16 x [10 x i32]*> [[TMP28]], [10 x i32]* [[TMP30]], i32 6
	; CHECK-NEXT: [[TMP34:%.*]] = insertelement <16 x [10 x i32]*> [[TMP31]], [10 x i32]* [[TMP33]], i32 7
	; CHECK-NEXT: [[TMP37:%.*]] = insertelement <16 x [10 x i32]*> [[TMP34]], [10 x i32]* [[TMP36]], i32 8
	; CHECK-NEXT: [[TMP40:%.*]] = insertelement <16 x [10 x i32]*> [[TMP37]], [10 x i32]* [[TMP39]], i32 9
	; CHECK-NEXT: [[TMP43:%.*]] = insertelement <16 x [10 x i32]*> [[TMP40]], [10 x i32]* [[TMP42]], i32 10
	; CHECK-NEXT: [[TMP46:%.*]] = insertelement <16 x [10 x i32]*> [[TMP43]], [10 x i32]* [[TMP45]], i32 11
	; CHECK-NEXT: [[TMP49:%.*]] = insertelement <16 x [10 x i32]*> [[TMP46]], [10 x i32]* [[TMP48]], i32 12
	; CHECK-NEXT: [[TMP52:%.*]] = insertelement <16 x [10 x i32]*> [[TMP49]], [10 x i32]* [[TMP51]], i32 13
	; CHECK-NEXT: [[TMP55:%.*]] = insertelement <16 x [10 x i32]*> [[TMP52]], [10 x i32]* [[TMP54]], i32 14
	; CHECK-NEXT: [[TMP58:%.*]] = insertelement <16 x [10 x i32]*> [[TMP55]], [10 x i32]* [[TMP57]], i32 15
	; CHECK-NEXT: [[TMP59:%.*]] = add nsw <16 x i64> [[TMP10]], [[VEC_IND3]]
	; CHECK-NEXT: [[TMP61:%.*]] = extractelement <16 x i64> [[TMP59]], i32 0
	; CHECK-NEXT: [[TMP62:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP12]], i64 [[TMP61]], i64 0
	; CHECK-NEXT: [[TMP65:%.*]] = extractelement <16 x i64> [[TMP59]], i32 1
	; CHECK-NEXT: [[TMP66:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP15]], i64 [[TMP65]], i64 0
	; CHECK-NEXT: [[TMP69:%.*]] = extractelement <16 x i64> [[TMP59]], i32 2
	; CHECK-NEXT: [[TMP70:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP18]], i64 [[TMP69]], i64 0
	; CHECK-NEXT: [[TMP73:%.*]] = extractelement <16 x i64> [[TMP59]], i32 3
	; CHECK-NEXT: [[TMP74:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP21]], i64 [[TMP73]], i64 0
	; CHECK-NEXT: [[TMP77:%.*]] = extractelement <16 x i64> [[TMP59]], i32 4
	; CHECK-NEXT: [[TMP78:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP24]], i64 [[TMP77]], i64 0
	; CHECK-NEXT: [[TMP81:%.*]] = extractelement <16 x i64> [[TMP59]], i32 5
	; CHECK-NEXT: [[TMP82:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP27]], i64 [[TMP81]], i64 0
	; CHECK-NEXT: [[TMP85:%.*]] = extractelement <16 x i64> [[TMP59]], i32 6
	; CHECK-NEXT: [[TMP86:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP30]], i64 [[TMP85]], i64 0
	; CHECK-NEXT: [[TMP89:%.*]] = extractelement <16 x i64> [[TMP59]], i32 7
	; CHECK-NEXT: [[TMP90:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP33]], i64 [[TMP89]], i64 0
	; CHECK-NEXT: [[TMP93:%.*]] = extractelement <16 x i64> [[TMP59]], i32 8
	; CHECK-NEXT: [[TMP94:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP36]], i64 [[TMP93]], i64 0
	; CHECK-NEXT: [[TMP97:%.*]] = extractelement <16 x i64> [[TMP59]], i32 9
	; CHECK-NEXT: [[TMP98:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP39]], i64 [[TMP97]], i64 0
	; CHECK-NEXT: [[TMP101:%.*]] = extractelement <16 x i64> [[TMP59]], i32 10
	; CHECK-NEXT: [[TMP102:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP42]], i64 [[TMP101]], i64 0
	; CHECK-NEXT: [[TMP105:%.*]] = extractelement <16 x i64> [[TMP59]], i32 11
	; CHECK-NEXT: [[TMP106:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP45]], i64 [[TMP105]], i64 0
	; CHECK-NEXT: [[TMP109:%.*]] = extractelement <16 x i64> [[TMP59]], i32 12
	; CHECK-NEXT: [[TMP110:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP48]], i64 [[TMP109]], i64 0
	; CHECK-NEXT: [[TMP113:%.*]] = extractelement <16 x i64> [[TMP59]], i32 13
	; CHECK-NEXT: [[TMP114:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP51]], i64 [[TMP113]], i64 0
	; CHECK-NEXT: [[TMP117:%.*]] = extractelement <16 x i64> [[TMP59]], i32 14
	; CHECK-NEXT: [[TMP118:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP54]], i64 [[TMP117]], i64 0
	; CHECK-NEXT: [[TMP121:%.*]] = extractelement <16 x i64> [[TMP59]], i32 15
	; CHECK-NEXT: [[TMP122:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[TMP57]], i64 [[TMP121]], i64 0
	; CHECK-NEXT: [[VECTORGEP:%.*]] = getelementptr inbounds [10 x i32], <16 x [10 x i32]*> [[TMP58]], <16 x i64> [[TMP59]], i64 0
	; CHECK-NEXT: call void @llvm.masked.scatter.v16i32(<16 x i32> <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>, <16 x i32*> [[VECTORGEP]], i32 16, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)
	; CHECK: [[STEP_ADD:%.*]] = add <16 x i64> [[VEC_IND]], <i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32>
	; CHECK: [[STEP_ADD4:%.*]] = add <16 x i64> [[VEC_IND3]], <i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32>
	entry:			entry:
	%0 = load i32, i32* @c, align 4			%0 = load i32, i32* @c, align 4
	%cmp34 = icmp sgt i32 %0, 8			%cmp34 = icmp sgt i32 %0, 8
	br i1 %cmp34, label %for.body.lr.ph, label %for.cond.cleanup			br i1 %cmp34, label %for.body.lr.ph, label %for.cond.cleanup

	for.body.lr.ph: ; preds = %entry			for.body.lr.ph: ; preds = %entry
	%1 = load i32, i32* @a, align 4			%1 = load i32, i32* @a, align 4
	%tobool = icmp eq i32 %1, 0			%tobool = icmp eq i32 %1, 0

llvm/trunk/test/Transforms/LoopVectorize/vector-geps.ll

				; RUN: opt < %s -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -instcombine -S | FileCheck %s

				target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"

				; CHECK-LABEL: @vector_gep_stored(
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, %vector.ph ], [ [[VEC_IND_NEXT:%.*]], %vector.body ]
				; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* %b, <4 x i64> [[VEC_IND]]
				; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32*, i32** %a, i64 [[INDEX]]
				; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32** [[TMP2]] to <4 x i32*>*
				; CHECK-NEXT: store <4 x i32*> [[TMP1]], <4 x i32*>* [[TMP3]], align 8
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body
				;
				define void @vector_gep_stored(i32** %a, i32 *%b, i64 %n) {
				entry:
				br label %for.body

				for.body:
				%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
				%tmp0 = getelementptr inbounds i32, i32* %b, i64 %i
				%tmp1 = getelementptr inbounds i32*, i32** %a, i64 %i
				store i32* %tmp0, i32** %tmp1, align 8
				%i.next = add nuw nsw i64 %i, 1
				%cond = icmp slt i64 %i.next, %n
				br i1 %cond, label %for.body, label %for.end

				for.end:
				ret void
				}

				; CHECK-LABEL: @uniform_vector_gep_stored(
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
				; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* %b, i64 1
				; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i32*> undef, i32* [[TMP1]], i32 0
				; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i32*> [[DOTSPLATINSERT]], <4 x i32*> undef, <4 x i32> zeroinitializer
				; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32*, i32** %a, i64 [[INDEX]]
				; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32** [[TMP2]] to <4 x i32*>*
				; CHECK-NEXT: store <4 x i32*> [[DOTSPLAT]], <4 x i32*>* [[TMP3]], align 8
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
				; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body
				;
				define void @uniform_vector_gep_stored(i32** %a, i32 *%b, i64 %n) {
				entry:
				br label %for.body

				for.body:
				%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
				%tmp0 = getelementptr inbounds i32, i32* %b, i64 1
				%tmp1 = getelementptr inbounds i32*, i32** %a, i64 %i
				store i32* %tmp0, i32** %tmp1, align 8
				%i.next = add nuw nsw i64 %i, 1
				%cond = icmp slt i64 %i.next, %n
				br i1 %cond, label %for.body, label %for.end

				for.end:
				ret void
				}
