This is an archive of the discontinued LLVM Phabricator instance.

Differential D154714

LAA: handle GEPs with >2 operands in findForkedSCEVs()
Abandoned · Public

Authored by artagnon on Jul 7 2023, 6:27 AM.

Details

Reviewers

fhahn
reames
bjope
nikic

Summary

Consider vectorizing the following case:

struct layer_data {
  unsigned char arr[2048];
} *ld;

void Next_Packet() {
  int l = 0;
  while (l<2048)
  {
    ld->arr[l++] = 3;
    ld->arr[l++] = 4;
  }
}

The GEPs for the assignments have three operands: the ptr ld, a
zero-index into ld, and another indexing depending on the loop induction
variable. However, findForkedSCEVs() suffers from the limitation that it
can only handle GEPs with two operands. Lift this limitation, and
successfully vectorize the above example, albeit with runtime
memory-checks.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
60,030 ms	x64 debian > MLIR.Examples/standalone::test.toy

Event Timeline

artagnon created this revision. Jul 7 2023, 6:27 AM

Herald added a project: Restricted Project. Jul 7 2023, 6:27 AM

Herald added subscribers: StephenFan, javed.absar, hiraditya.

artagnon requested review of this revision. Jul 7 2023, 6:27 AM

Herald added a project: Restricted Project. Jul 7 2023, 6:27 AM

Herald added subscribers: llvm-commits, wangpc.

To eliminate the runtime check, should we investigate turning on scev-aa for just LAA? Is it possible to do that?

nikic added inline comments.

Not familiar with this code, but this doesn't look correct to me.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
866	This seems to assume that the same scale can be applied to all GEP indices, which is not correct. I think the only reason this happens to work for your example is that one of the indices is zero. If you want to handle this generically, what you should probably do is either a) use collectOffsets() to convert the GEP into scale multiply representation or b) get the SCEV for the GEP and analyze SCEVUnknowns for forking.
954	If I'm reading this code right, this is claiming that `load ptr, ptr %p` is the same as `%p`? That doesn't make sense. `load` should always be a root instruction, as it was before.
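To make the first inline comment (line 866) concrete, here is a small arithmetic sketch. It is not LLVM code: `gepByteOffset` and `sharedScaleOffset` are hypothetical helpers, and the strides assume the `[2048 x i8]` source element type from the summary, where the first index steps over whole arrays (2048 bytes) and the second over single `i8` elements (1 byte). The shared-scale helper is only a simplified model of the assumption being criticized, not a transcription of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Strides for a GEP whose source element type is [2048 x i8].
constexpr std::int64_t kArrayStride = 2048; // bytes per step of the first index
constexpr std::int64_t kElemStride = 1;     // bytes per step of the second index

// Byte offset with a separate scale per index.
constexpr std::int64_t gepByteOffset(std::int64_t idx0, std::int64_t idx1) {
  return idx0 * kArrayStride + idx1 * kElemStride;
}

// Simplified (hypothetical) model of "one scale for every index".
constexpr std::int64_t sharedScaleOffset(std::int64_t idx0, std::int64_t idx1,
                                         std::int64_t scale) {
  return (idx0 + idx1) * scale;
}

inline void demoScales() {
  // With a zero first index, as in the summary's example, both agree.
  assert(gepByteOffset(0, 5) == sharedScaleOffset(0, 5, kElemStride));
  // With a non-zero first index they diverge.
  assert(gepByteOffset(1, 5) != sharedScaleOffset(1, 5, kElemStride));
}
```

The agreement in the zero-index case is consistent with the reviewer's remark that the example only happens to work because one of the indices is zero.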

fhahn added inline comments. Jul 7 2023, 7:47 AM

llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
987	I'm not sure if/how this fits into the forked pointer handling. The forked pointer handling is to support cases where a GEP can have multiple base pointers (e.g. due to a select or phi). But in this case there are 2 distinct GEPs each with their separate base pointer.
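For contrast, a source-level sketch of what a forked pointer in the sense described above might look like: a single indexed access whose base is chosen per iteration by a select. The function and parameter names are illustrative assumptions, loosely modeled on the existing select-based test in `LoopVectorize/forked-pointers.ll`; they do not come from the patch.

```cpp
#include <cassert>
#include <cstddef>

// One indexed load whose base pointer forks on a per-iteration condition.
inline void copyFromForkedBase(float *dest, const float *base1,
                               const float *base2, const int *preds,
                               std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    // Corresponds to a select between two base pointers.
    const float *base = (preds[i] != 0) ? base1 : base2;
    // Corresponds to a single GEP with two possible bases.
    dest[i] = base[i];
  }
}

inline void demoForkedBase() {
  const float base1[3] = {1.0f, 2.0f, 3.0f};
  const float base2[3] = {10.0f, 20.0f, 30.0f};
  const int preds[3] = {1, 0, 1};
  float dest[3] = {0.0f, 0.0f, 0.0f};
  copyFromForkedBase(dest, base1, base2, preds, 3);
  assert(dest[0] == 1.0f && dest[1] == 20.0f && dest[2] == 3.0f);
}
```

The loop in the summary has a different shape: two separate accesses, each with its own base, and no select or phi choosing between bases.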

Harbormaster completed remote builds in B243763: Diff 538121. Jul 7 2023, 7:47 AM

artagnon added inline comments. Jul 7 2023, 7:58 AM

llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
987	I saw the comment on top of `findForkedPointers()`, and was initially confused too, but I think `findForkedPointers()` can be extended to support the use-case at hand: yes, there are two distinct GEPs, with separate base pointers, but really, both of them are loading the same ptr with different offsets: I look inside the load to find the SCEV associated to `@ld`, and the case at hand does seem to be vectorized successfully. Is the vectorization of the test-case I've provided incorrect? If I weren't to modify `findForkedPointers()`, I'm left with SCEV AddExprs, which I have no idea how to handle independently of each other, and vectorization bails out saying that the expressions are not AddRecs.

nikic added inline comments. Jul 7 2023, 8:07 AM

llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
987	For this code to be analyzable the two loads from `@ld` need to be CSE/GVNed first. Once that has happened, SCEV/LAA will be able to understand the relation between the GEPs. You cannot simply assume that two loads from the same pointer will return the same value, as there may be clobbers in between.

artagnon added inline comments. Jul 7 2023, 8:11 AM

llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
987	Good point. I'm curious about why CSE/GVN didn't eliminate one of the loads in this example.

nikic added inline comments. Jul 7 2023, 8:14 AM

llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
987	That's because there is a store to `%arrayidx` in between, and it's possible for `%arrayidx` and `@ld` to be the same (if `@ld` is stored inside `@ld`). In some cases TBAA metadata can help avoid this, but as `char` is involved here, it's not applicable to your case.
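The clobbering scenario can be reproduced in a few lines. The sketch below is a hedged model, not the test case: one buffer plays both the role of `@ld` (its first bytes hold a pointer) and of the array that pointer refers to, and `reloadSeesDifferentPointer` is a hypothetical name. Pointer values are moved as `std::uintptr_t` via `memcpy` to keep the demonstration well-defined, and the store flips the byte rather than writing 3 so that the value is guaranteed to change.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns true when two loads of the same slot, separated by a byte store,
// observe different pointer values.
inline bool reloadSeesDifferentPointer() {
  // The buffer's first bytes hold a pointer to the buffer itself, which
  // models "@ld is stored inside @ld".
  unsigned char memory[2048] = {};
  const std::uintptr_t self = reinterpret_cast<std::uintptr_t>(memory);
  std::memcpy(memory, &self, sizeof self);

  std::uintptr_t first;
  std::memcpy(&first, memory, sizeof first); // models: %0 = load ptr, ptr @ld

  // Models the store through %arrayidx at index 0. The loop under review
  // stores 3; flipping the bits guarantees the byte changes everywhere.
  unsigned char *arrayidx = reinterpret_cast<unsigned char *>(first);
  arrayidx[0] = static_cast<unsigned char>(~arrayidx[0]);

  std::uintptr_t second;
  std::memcpy(&second, memory, sizeof second); // models: %2 = load ptr, ptr @ld

  return first != second;
}

inline void demoClobber() { assert(reloadSeesDifferentPointer()); }
```

Because the store goes through a character type, it is allowed to alias the pointer's own storage, which matches the remark that TBAA does not help when `char` is involved.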

Thanks for the reviews. The patch is definitely incorrect: abandoning.

Revision Contents

Path	Size
llvm/lib/Analysis/LoopAccessAnalysis.cpp	29 lines
llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll	60 lines
llvm/test/Transforms/LoopVectorize/forked-pointers.ll	138 lines

Diff 538121

llvm/lib/Analysis/LoopAccessAnalysis.cpp

[...]	static void findForkedSCEVs(
};		};

Instruction *I = cast<Instruction>(Ptr);		Instruction *I = cast<Instruction>(Ptr);
unsigned Opcode = I->getOpcode();		unsigned Opcode = I->getOpcode();
switch (Opcode) {		switch (Opcode) {
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
GetElementPtrInst *GEP = cast<GetElementPtrInst>(I);		GetElementPtrInst *GEP = cast<GetElementPtrInst>(I);
Type *SourceTy = GEP->getSourceElementType();		Type *SourceTy = GEP->getSourceElementType();
// We only handle base + single offset GEPs here for now.
// Not dealing with preexisting gathers yet, so no vectors.		if (SourceTy->isArrayTy())
if (I->getNumOperands() != 2 || SourceTy->isVectorTy()) {		SourceTy = SourceTy->getArrayElementType();
		else if (SourceTy->isStructTy() || SourceTy->isVectorTy()) {
		// Not dealing with preexisting gathers yet, so no vectors. No structs either.
ScevList.emplace_back(Scev, !isGuaranteedNotToBeUndefOrPoison(GEP));		ScevList.emplace_back(Scev, !isGuaranteedNotToBeUndefOrPoison(GEP));
break;		break;
}		}
SmallVector<PointerIntPair<const SCEV *, 1, bool>, 2> BaseScevs;		SmallVector<PointerIntPair<const SCEV *, 1, bool>, 2> BaseScevs;
SmallVector<PointerIntPair<const SCEV *, 1, bool>, 2> OffsetScevs;		SmallVector<PointerIntPair<const SCEV *, 1, bool>, 2> OffsetScevs;
findForkedSCEVs(SE, L, I->getOperand(0), BaseScevs, Depth);		findForkedSCEVs(SE, L, I->getOperand(0), BaseScevs, Depth);
findForkedSCEVs(SE, L, I->getOperand(1), OffsetScevs, Depth);
		for (unsigned i = 1; i < I->getNumOperands(); ++i)
		findForkedSCEVs(SE, L, I->getOperand(i), OffsetScevs, Depth);
		[inline comment] nikic: This seems to assume that the same scale can be applied to all GEP indices, which is not correct. I think the only reason this happens to work for your example is that one of the indices is zero. If you want to handle this generically, what you should probably do is either a) use collectOffsets() to convert the GEP into scale multiply representation or b) get the SCEV for the GEP and analyze SCEVUnknowns for forking.

// See if we need to freeze our fork...		// See if we need to freeze our fork...
bool NeedsFreeze = any_of(BaseScevs, UndefPoisonCheck) ||		bool NeedsFreeze = any_of(BaseScevs, UndefPoisonCheck) ||
any_of(OffsetScevs, UndefPoisonCheck);		any_of(OffsetScevs, UndefPoisonCheck);

// Check that we only have a single fork, on either the base or the offset.		// Check that we only have a single fork, on either the base or the offset.
// Copy the SCEV across for the one without a fork in order to generate		// Copy the SCEV across for the one without a fork in order to generate
// the full SCEV for both sides of the GEP.		// the full SCEV for both sides of the GEP.
if (OffsetScevs.size() == 2 && BaseScevs.size() == 1)		if (OffsetScevs.size() == 2 && BaseScevs.size() == 1)
BaseScevs.push_back(BaseScevs[0]);		BaseScevs.push_back(BaseScevs[0]);
else if (BaseScevs.size() == 2 && OffsetScevs.size() == 1)		else if (BaseScevs.size() == 2 && OffsetScevs.size() == 1)
OffsetScevs.push_back(OffsetScevs[0]);		OffsetScevs.push_back(OffsetScevs[0]);
else {		else {
ScevList.emplace_back(Scev, NeedsFreeze);		ScevList.emplace_back(Scev, NeedsFreeze);
break;		break;
}		}

// Find the pointer type we need to extend to.		// Find the pointer type we need to extend to.
Type *IntPtrTy = SE->getEffectiveSCEVType(		Type *IntPtrTy = SE->getEffectiveSCEVType(
SE->getSCEV(GEP->getPointerOperand())->getType());		SE->getSCEV(GEP->getPointerOperand())->getType());

// Find the size of the type being pointed to. We only have a single		// Find the size of the type being pointed to.
// index term (guarded above) so we don't need to index into arrays or
// structures, just get the size of the scalar value.
const SCEV *Size = SE->getSizeOfExpr(IntPtrTy, SourceTy);		const SCEV *Size = SE->getSizeOfExpr(IntPtrTy, SourceTy);

// Scale up the offsets by the size of the type, then add to the bases.		// Scale up the offsets by the size of the type, then add to the bases.
const SCEV *Scaled1 = SE->getMulExpr(		const SCEV *Scaled1 = SE->getMulExpr(
Size, SE->getTruncateOrSignExtend(get<0>(OffsetScevs[0]), IntPtrTy));		Size, SE->getTruncateOrSignExtend(get<0>(OffsetScevs[0]), IntPtrTy));
const SCEV *Scaled2 = SE->getMulExpr(		const SCEV *Scaled2 = SE->getMulExpr(
Size, SE->getTruncateOrSignExtend(get<0>(OffsetScevs[1]), IntPtrTy));		Size, SE->getTruncateOrSignExtend(get<0>(OffsetScevs[1]), IntPtrTy));
ScevList.emplace_back(SE->getAddExpr(get<0>(BaseScevs[0]), Scaled1),		ScevList.emplace_back(SE->getAddExpr(get<0>(BaseScevs[0]), Scaled1),
[...]	case Instruction::Sub: {
ScevList.emplace_back(		ScevList.emplace_back(
GetBinOpExpr(Opcode, get<0>(LScevs[0]), get<0>(RScevs[0])),		GetBinOpExpr(Opcode, get<0>(LScevs[0]), get<0>(RScevs[0])),
NeedsFreeze);		NeedsFreeze);
ScevList.emplace_back(		ScevList.emplace_back(
GetBinOpExpr(Opcode, get<0>(LScevs[1]), get<0>(RScevs[1])),		GetBinOpExpr(Opcode, get<0>(LScevs[1]), get<0>(RScevs[1])),
NeedsFreeze);		NeedsFreeze);
break;		break;
}		}
		case Instruction::Load: {
		SmallVector<PointerIntPair<const SCEV *, 1, bool>, 2> PtrOperands;
		findForkedSCEVs(SE, L, getLoadStorePointerOperand(I), PtrOperands, Depth);
		if (PtrOperands.size() == 1)
		ScevList.push_back(PtrOperands[0]);
		else
		ScevList.emplace_back(Scev, !isGuaranteedNotToBeUndefOrPoison(Ptr));
		break;
		[inline comment] nikic: If I'm reading this code right, this is claiming that `load ptr, ptr %p` is the same as `%p`? That doesn't make sense. `load` should always be a root instruction, as it was before.
		}
default:		default:
// Just return the current SCEV if we haven't handled the instruction yet.		// Just return the current SCEV if we haven't handled the instruction yet.
LLVM_DEBUG(dbgs() << "ForkedPtr unhandled instruction: " << *I << "\n");		LLVM_DEBUG(dbgs() << "ForkedPtr unhandled instruction: " << *I << "\n");
ScevList.emplace_back(Scev, !isGuaranteedNotToBeUndefOrPoison(Ptr));		ScevList.emplace_back(Scev, !isGuaranteedNotToBeUndefOrPoison(Ptr));
break;		break;
}		}
}		}

[...]	if (isDependencyCheckNeeded()) {
DepId = LeaderId;		DepId = LeaderId;
} else		} else
// Each access has its own dependence set.		// Each access has its own dependence set.
DepId = RunningDepId++;		DepId = RunningDepId++;

bool IsWrite = Access.getInt();		bool IsWrite = Access.getInt();
RtCheck.insert(TheLoop, Ptr, PtrExpr, AccessTy, IsWrite, DepId, ASId, PSE,		RtCheck.insert(TheLoop, Ptr, PtrExpr, AccessTy, IsWrite, DepId, ASId, PSE,
NeedsFreeze);		NeedsFreeze);
LLVM_DEBUG(dbgs() << "LAA: Found a runtime check ptr:" << *Ptr << '\n');		LLVM_DEBUG(dbgs() << "LAA: Found a runtime check ptr: " << *Ptr << '\n');
}		}

return true;		return true;
}		}

bool AccessAnalysis::canCheckPtrAtRT(RuntimePointerChecking &RtCheck,		bool AccessAnalysis::canCheckPtrAtRT(RuntimePointerChecking &RtCheck,
ScalarEvolution *SE, Loop *TheLoop,		ScalarEvolution *SE, Loop *TheLoop,
const DenseMap<Value *, const SCEV *> &StridesMap,		const DenseMap<Value *, const SCEV *> &StridesMap,
[...]	if (NumWritePtrChecks == 0 ||
continue;		continue;
}		}

for (auto &Access : AccessInfos) {		for (auto &Access : AccessInfos) {
for (const auto &AccessTy : Accesses[Access]) {		for (const auto &AccessTy : Accesses[Access]) {
if (!createCheckForAccess(RtCheck, Access, AccessTy, StridesMap,		if (!createCheckForAccess(RtCheck, Access, AccessTy, StridesMap,
DepSetId, TheLoop, RunningDepId, ASId,		DepSetId, TheLoop, RunningDepId, ASId,
ShouldCheckWrap, false)) {		ShouldCheckWrap, false)) {
LLVM_DEBUG(dbgs() << "LAA: Can't find bounds for ptr:"		LLVM_DEBUG(dbgs() << "LAA: Can't find bounds for ptr: "
<< *Access.getPointer() << '\n');		<< *Access.getPointer() << '\n');
Retries.push_back({Access, AccessTy});		Retries.push_back({Access, AccessTy});
CanDoAliasSetRT = false;		CanDoAliasSetRT = false;
}		}
}		}
}		}

// Note that this function computes CanDoRT and MayNeedRTCheck		// Note that this function computes CanDoRT and MayNeedRTCheck
[...]

llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll

[...]	for.body:
%dummy.load = load double, ptr %fptr, align 8		%dummy.load = load double, ptr %fptr, align 8
%iv.next = add nuw nsw i64 %iv, 1		%iv.next = add nuw nsw i64 %iv, 1
%exitcond = icmp eq i64 %iv.next, %N		%exitcond = icmp eq i64 %iv.next, %N
br i1 %exitcond, label %exit, label %for.body		br i1 %exitcond, label %exit, label %for.body

exit:		exit:
ret void		ret void
}		}

		@ld = global ptr null

		; CHECK-LABEL: Loop access info in function 'forked_ptrs_gep_load':
		; CHECK-NEXT: while.body:
		; CHECK-NEXT: Memory dependences are safe with run-time checks
		; CHECK-NEXT: Dependences:
		; CHECK-NEXT: Run-time memory checks:
		; CHECK-NEXT: Check 0:
		; CHECK-NEXT: Comparing group ([[GROUP_A:.+]]):
		; CHECK-NEXT: %arrayidx = getelementptr inbounds [2048 x i8], ptr %0, i64 0, i64 %indvars.iv
		; CHECK-NEXT: %arrayidx = getelementptr inbounds [2048 x i8], ptr %0, i64 0, i64 %indvars.iv
		; CHECK-NEXT: Against group ([[GROUP_B:.+]]):
		; CHECK-NEXT: %arrayidx4 = getelementptr inbounds [2048 x i8], ptr %2, i64 0, i64 %1
		; CHECK-NEXT: %arrayidx4 = getelementptr inbounds [2048 x i8], ptr %2, i64 0, i64 %1
		; CHECK-NEXT: Check 1:
		; CHECK-NEXT: Comparing group ([[GROUP_A]]):
		; CHECK-NEXT: %arrayidx = getelementptr inbounds [2048 x i8], ptr %0, i64 0, i64 %indvars.iv
		; CHECK-NEXT: %arrayidx = getelementptr inbounds [2048 x i8], ptr %0, i64 0, i64 %indvars.iv
		; CHECK-NEXT: Against group ([[GROUP_C:.+]]):
		; CHECK-NEXT: @ld = global ptr null
		; CHECK-NEXT: Check 2:
		; CHECK-NEXT: Comparing group ([[GROUP_B]]):
		; CHECK-NEXT: %arrayidx4 = getelementptr inbounds [2048 x i8], ptr %2, i64 0, i64 %1
		; CHECK-NEXT: %arrayidx4 = getelementptr inbounds [2048 x i8], ptr %2, i64 0, i64 %1
		; CHECK-NEXT: Against group ([[GROUP_C]]):
		; CHECK-NEXT: @ld = global ptr null
		; CHECK-NEXT: Grouped accesses:
		; CHECK-NEXT: Group [[GROUP_A]]:
		; CHECK-NEXT: (Low: @ld High: (2047 + @ld))
		; CHECK-NEXT: Member: @ld
		; CHECK-NEXT: Member: {@ld,+,2}<nw><%while.body>
		; CHECK-NEXT: Group [[GROUP_B]]:
		; CHECK-NEXT: (Low: @ld High: (2048 + @ld))
		; CHECK-NEXT: Member: @ld
		; CHECK-NEXT: Member: {(1 + @ld)<nuw><nsw>,+,2}<nw><%while.body>
		; CHECK-NEXT: Group [[GROUP_C]]:
		; CHECK-NEXT: (Low: @ld High: (8 + @ld)<nuw>)
		; CHECK-NEXT: Member: @ld
		; CHECK-EMPTY:
		define void @forked_ptrs_gep_load() {
		entry:
		br label %while.body

		while.body:
		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %while.body ]
		%0 = load ptr, ptr @ld
		%1 = or i64 %indvars.iv, 1
		%arrayidx = getelementptr inbounds [2048 x i8], ptr %0, i64 0, i64 %indvars.iv
		store i8 3, ptr %arrayidx
		%2 = load ptr, ptr @ld
		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
		%arrayidx4 = getelementptr inbounds [2048 x i8], ptr %2, i64 0, i64 %1
		store i8 4, ptr %arrayidx4
		%cmp = icmp ult i64 %indvars.iv, 2046
		br i1 %cmp, label %while.body, label %while.end

		while.end:
		ret void
		}
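The `Low`/`High` values in the CHECK lines for `forked_ptrs_gep_load` can be reproduced with a little arithmetic. This is a sketch under stated assumptions, and the patch that produced those lines was abandoned as incorrect, so it explains only the numbers, not their validity. Offsets are taken relative to `@ld`, the trip count is 1024 (the induction variable runs 0, 2, ..., 2046), the upper bound is assumed to be the last accessed address plus the access size, and the overlap test is the conventional half-open interval check. `Range`, `accessBounds`, and `mayOverlap` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Half-open byte range [low, high), as offsets relative to @ld.
struct Range {
  std::int64_t low;
  std::int64_t high;
};

// Bounds of an access {start,+,step} over tripCount iterations.
// Assumes step >= 0 and tripCount >= 1.
inline Range accessBounds(std::int64_t start, std::int64_t step,
                          std::int64_t tripCount, std::int64_t accessSize) {
  const std::int64_t last = start + step * (tripCount - 1);
  return Range{start, last + accessSize};
}

// Two half-open ranges overlap when each starts before the other ends.
inline bool mayOverlap(Range a, Range b) {
  return a.low < b.high && b.low < a.high;
}

inline void demoBounds() {
  const Range groupA = accessBounds(0, 2, 1024, 1); // {@ld,+,2}, i8 store
  const Range groupB = accessBounds(1, 2, 1024, 1); // {(1 + @ld),+,2}, i8 store
  const Range groupC = accessBounds(0, 0, 1024, 8); // invariant 8-byte ptr load
  assert(groupA.high == 2047); // matches High: (2047 + @ld)
  assert(groupB.high == 2048); // matches High: (2048 + @ld)
  assert(groupC.high == 8);    // matches High: (8 + @ld)
  assert(mayOverlap(groupA, groupC));
}
```

Under this model every pair of groups overlaps, since all three ranges start at or next to `@ld`.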

llvm/test/Transforms/LoopVectorize/forked-pointers.ll

[...]
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP3:%.*]] = or i64 [[INDEX]], 1		; CHECK-NEXT: [[TMP3:%.*]] = or i64 [[INDEX]], 1
; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[INDEX]], 2		; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[INDEX]], 2
; CHECK-NEXT: [[TMP5:%.*]] = or i64 [[INDEX]], 3		; CHECK-NEXT: [[TMP5:%.*]] = or i64 [[INDEX]], 3
; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, ptr [[PREDS]], i64 [[INDEX]]		; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, ptr [[PREDS]], i64 [[INDEX]]
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP6]], align 4		; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP6]], align 4
; CHECK-NEXT: [[TMP8:%.*]] = icmp eq <4 x i32> [[WIDE_LOAD]], zeroinitializer		; CHECK-NEXT: [[TMP7:%.*]] = icmp eq <4 x i32> [[WIDE_LOAD]], zeroinitializer
; CHECK-NEXT: [[TMP9:%.*]] = select <4 x i1> [[TMP8]], <4 x ptr> [[BROADCAST_SPLAT]], <4 x ptr> [[BROADCAST_SPLAT10]]		; CHECK-NEXT: [[TMP8:%.*]] = select <4 x i1> [[TMP7]], <4 x ptr> [[BROADCAST_SPLAT]], <4 x ptr> [[BROADCAST_SPLAT10]]
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x ptr> [[TMP9]], i64 0		; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x ptr> [[TMP8]], i64 0
; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds float, ptr [[TMP10]], i64 [[INDEX]]		; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, ptr [[TMP9]], i64 [[INDEX]]
; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x ptr> [[TMP9]], i64 1		; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x ptr> [[TMP8]], i64 1
; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, ptr [[TMP12]], i64 [[TMP3]]		; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, ptr [[TMP11]], i64 [[TMP3]]
; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x ptr> [[TMP9]], i64 2		; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x ptr> [[TMP8]], i64 2
; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds float, ptr [[TMP14]], i64 [[TMP4]]		; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, ptr [[TMP13]], i64 [[TMP4]]
; CHECK-NEXT: [[TMP16:%.*]] = extractelement <4 x ptr> [[TMP9]], i64 3		; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x ptr> [[TMP8]], i64 3
; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds float, ptr [[TMP16]], i64 [[TMP5]]		; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, ptr [[TMP15]], i64 [[TMP5]]
; CHECK-NEXT: [[TMP18:%.*]] = load float, ptr [[TMP11]], align 4		; CHECK-NEXT: [[TMP17:%.*]] = load float, ptr [[TMP10]], align 4
; CHECK-NEXT: [[TMP19:%.*]] = load float, ptr [[TMP13]], align 4		; CHECK-NEXT: [[TMP18:%.*]] = load float, ptr [[TMP12]], align 4
; CHECK-NEXT: [[TMP20:%.*]] = load float, ptr [[TMP15]], align 4		; CHECK-NEXT: [[TMP19:%.*]] = load float, ptr [[TMP14]], align 4
; CHECK-NEXT: [[TMP21:%.*]] = load float, ptr [[TMP17]], align 4		; CHECK-NEXT: [[TMP20:%.*]] = load float, ptr [[TMP16]], align 4
; CHECK-NEXT: [[TMP22:%.*]] = insertelement <4 x float> poison, float [[TMP18]], i64 0		; CHECK-NEXT: [[TMP21:%.*]] = insertelement <4 x float> poison, float [[TMP17]], i64 0
; CHECK-NEXT: [[TMP23:%.*]] = insertelement <4 x float> [[TMP22]], float [[TMP19]], i64 1		; CHECK-NEXT: [[TMP22:%.*]] = insertelement <4 x float> [[TMP21]], float [[TMP18]], i64 1
; CHECK-NEXT: [[TMP24:%.*]] = insertelement <4 x float> [[TMP23]], float [[TMP20]], i64 2		; CHECK-NEXT: [[TMP23:%.*]] = insertelement <4 x float> [[TMP22]], float [[TMP19]], i64 2
; CHECK-NEXT: [[TMP25:%.*]] = insertelement <4 x float> [[TMP24]], float [[TMP21]], i64 3		; CHECK-NEXT: [[TMP24:%.*]] = insertelement <4 x float> [[TMP23]], float [[TMP20]], i64 3
; CHECK-NEXT: [[TMP26:%.*]] = getelementptr inbounds float, ptr [[DEST_FR]], i64 [[INDEX]]		; CHECK-NEXT: [[TMP25:%.*]] = getelementptr inbounds float, ptr [[DEST_FR]], i64 [[INDEX]]
; CHECK-NEXT: store <4 x float> [[TMP25]], ptr [[TMP26]], align 4		; CHECK-NEXT: store <4 x float> [[TMP24]], ptr [[TMP25]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; CHECK-NEXT: [[TMP28:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100		; CHECK-NEXT: [[TMP26:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100
; CHECK-NEXT: br i1 [[TMP28]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]		; CHECK-NEXT: br i1 [[TMP26]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]		; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:		; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 100, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 100, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
; CHECK-NEXT: br label [[FOR_BODY:%.*]]		; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.cond.cleanup:		; CHECK: for.cond.cleanup:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]		; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[PREDS]], i64 [[INDVARS_IV]]		; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[PREDS]], i64 [[INDVARS_IV]]
; CHECK-NEXT: [[TMP29:%.*]] = load i32, ptr [[ARRAYIDX]], align 4		; CHECK-NEXT: [[TMP27:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
; CHECK-NEXT: [[CMP1_NOT:%.*]] = icmp eq i32 [[TMP29]], 0		; CHECK-NEXT: [[CMP1_NOT:%.*]] = icmp eq i32 [[TMP27]], 0
; CHECK-NEXT: [[SPEC_SELECT:%.*]] = select i1 [[CMP1_NOT]], ptr [[BASE2_FR]], ptr [[BASE1_FR]]		; CHECK-NEXT: [[SPEC_SELECT:%.*]] = select i1 [[CMP1_NOT]], ptr [[BASE2_FR]], ptr [[BASE1_FR]]
; CHECK-NEXT: [[DOTSINK_IN:%.*]] = getelementptr inbounds float, ptr [[SPEC_SELECT]], i64 [[INDVARS_IV]]		; CHECK-NEXT: [[DOTSINK_IN:%.*]] = getelementptr inbounds float, ptr [[SPEC_SELECT]], i64 [[INDVARS_IV]]
; CHECK-NEXT: [[DOTSINK:%.*]] = load float, ptr [[DOTSINK_IN]], align 4		; CHECK-NEXT: [[DOTSINK:%.*]] = load float, ptr [[DOTSINK_IN]], align 4
; CHECK-NEXT: [[TMP30:%.*]] = getelementptr inbounds float, ptr [[DEST_FR]], i64 [[INDVARS_IV]]		; CHECK-NEXT: [[TMP28:%.*]] = getelementptr inbounds float, ptr [[DEST_FR]], i64 [[INDVARS_IV]]
; CHECK-NEXT: store float [[DOTSINK]], ptr [[TMP30]], align 4		; CHECK-NEXT: store float [[DOTSINK]], ptr [[TMP28]], align 4
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1		; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 100		; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 100
; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]		; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
;		;
entry:		entry:
br label %for.body		br label %for.body

for.cond.cleanup:		for.cond.cleanup:
ret void		ret void

for.body:		for.body:
[...]	for.body: ; preds = %entry, %for.body
%idxprom213 = zext i32 %offset.0 to i64		%idxprom213 = zext i32 %offset.0 to i64
%arrayidx3 = getelementptr inbounds float, ptr %Base, i64 %idxprom213		%arrayidx3 = getelementptr inbounds float, ptr %Base, i64 %idxprom213
%2 = load float, ptr %arrayidx3, align 4		%2 = load float, ptr %arrayidx3, align 4
%arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv		%arrayidx5 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
store float %2, ptr %arrayidx5, align 4		store float %2, ptr %arrayidx5, align 4
%exitcond.not = icmp eq i64 %indvars.iv.next, 100		%exitcond.not = icmp eq i64 %indvars.iv.next, 100
br i1 %exitcond.not, label %for.cond.cleanup, label %for.body		br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
}		}

		@ld = global ptr null

		define void @forked_ptrs_gep_3_operands() {
		; CHECK-LABEL: @forked_ptrs_gep_3_operands(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
		; CHECK: vector.memcheck:
		; CHECK-NEXT: br i1 or (i1 or (i1 and (i1 icmp ugt (ptr getelementptr (ptr, ptr @ld, i64 256), ptr @ld), i1 icmp ugt (ptr getelementptr (i8, ptr @ld, i64 2047), ptr @ld)), i1 icmp ugt (ptr getelementptr (i8, ptr @ld, i64 2047), ptr @ld)), i1 icmp ugt (ptr getelementptr (ptr, ptr @ld, i64 256), ptr @ld)), label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
		; CHECK: vector.ph:
		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK: vector.body:
		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 2, i64 4, i64 6>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1
		; CHECK-NEXT: [[TMP0:%.*]] = or i64 [[OFFSET_IDX]], 2
		; CHECK-NEXT: [[TMP1:%.*]] = or i64 [[OFFSET_IDX]], 4
		; CHECK-NEXT: [[TMP2:%.*]] = or i64 [[OFFSET_IDX]], 6
		; CHECK-NEXT: [[TMP3:%.*]] = load ptr, ptr @ld, align 8, !alias.scope !4
		; CHECK-NEXT: [[TMP4:%.*]] = or <4 x i64> [[VEC_IND]], <i64 1, i64 1, i64 1, i64 1>
		; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds [2048 x i8], ptr [[TMP3]], i64 0, i64 [[OFFSET_IDX]]
		; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [2048 x i8], ptr [[TMP3]], i64 0, i64 [[TMP0]]
		; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [2048 x i8], ptr [[TMP3]], i64 0, i64 [[TMP1]]
		; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [2048 x i8], ptr [[TMP3]], i64 0, i64 [[TMP2]]
		; CHECK-NEXT: store i8 3, ptr [[TMP5]], align 1, !alias.scope !7, !noalias !9
		; CHECK-NEXT: store i8 3, ptr [[TMP6]], align 1, !alias.scope !7, !noalias !9
		; CHECK-NEXT: store i8 3, ptr [[TMP7]], align 1, !alias.scope !7, !noalias !9
		; CHECK-NEXT: store i8 3, ptr [[TMP8]], align 1, !alias.scope !7, !noalias !9
		; CHECK-NEXT: [[TMP9:%.*]] = load ptr, ptr @ld, align 8, !alias.scope !4
		; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i64> [[TMP4]], i64 0
		; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds [2048 x i8], ptr [[TMP9]], i64 0, i64 [[TMP10]]
		; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x i64> [[TMP4]], i64 1
		; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds [2048 x i8], ptr [[TMP9]], i64 0, i64 [[TMP12]]
		; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x i64> [[TMP4]], i64 2
		; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds [2048 x i8], ptr [[TMP9]], i64 0, i64 [[TMP14]]
		; CHECK-NEXT: [[TMP16:%.*]] = extractelement <4 x i64> [[TMP4]], i64 3
		; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds [2048 x i8], ptr [[TMP9]], i64 0, i64 [[TMP16]]
		; CHECK-NEXT: store i8 4, ptr [[TMP11]], align 1, !alias.scope !11, !noalias !4
		; CHECK-NEXT: store i8 4, ptr [[TMP13]], align 1, !alias.scope !11, !noalias !4
		; CHECK-NEXT: store i8 4, ptr [[TMP15]], align 1, !alias.scope !11, !noalias !4
		; CHECK-NEXT: store i8 4, ptr [[TMP17]], align 1, !alias.scope !11, !noalias !4
		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 8, i64 8, i64 8, i64 8>
		; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
		; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
		; CHECK: middle.block:
		; CHECK-NEXT: br i1 true, label [[WHILE_END:%.*]], label [[SCALAR_PH]]
		; CHECK: scalar.ph:
		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 2048, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
		; CHECK-NEXT: br label [[WHILE_BODY:%.*]]
		; CHECK: while.body:
		; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[WHILE_BODY]] ]
		; CHECK-NEXT: [[TMP19:%.*]] = load ptr, ptr @ld, align 8
		; CHECK-NEXT: [[TMP20:%.*]] = or i64 [[INDVARS_IV]], 1
		; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [2048 x i8], ptr [[TMP19]], i64 0, i64 [[INDVARS_IV]]
		; CHECK-NEXT: store i8 3, ptr [[ARRAYIDX]], align 1
		; CHECK-NEXT: [[TMP21:%.*]] = load ptr, ptr @ld, align 8
		; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 2
		; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds [2048 x i8], ptr [[TMP21]], i64 0, i64 [[TMP20]]
		; CHECK-NEXT: store i8 4, ptr [[ARRAYIDX4]], align 1
		; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INDVARS_IV]], 2046
		; CHECK-NEXT: br i1 [[CMP]], label [[WHILE_BODY]], label [[WHILE_END]], !llvm.loop [[LOOP13:![0-9]+]]
		; CHECK: while.end:
		; CHECK-NEXT: ret void
		;
		entry:
		br label %while.body

		while.body:
		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %while.body ]
		%0 = load ptr, ptr @ld
		%1 = or i64 %indvars.iv, 1
		%arrayidx = getelementptr inbounds [2048 x i8], ptr %0, i64 0, i64 %indvars.iv
		store i8 3, ptr %arrayidx
		%2 = load ptr, ptr @ld
		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
		%arrayidx4 = getelementptr inbounds [2048 x i8], ptr %2, i64 0, i64 %1
		store i8 4, ptr %arrayidx4
		%cmp = icmp ult i64 %indvars.iv, 2046
		br i1 %cmp, label %while.body, label %while.end

		while.end:
		ret void
		}