This is an archive of the discontinued LLVM Phabricator instance.

Differential D129734

[InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the front
Abandoned · Public

Authored by huangjd on Jul 14 2022, 12:07 AM.

Details

Reviewers

davidxl
Carrot
nikic
spatel
reames
aeubanks

Summary

Alternative implementation to D125845. This approach is cleaner because it does not interfere with LICM: swapping the constant-indexed GEP to the front actually allows LICM to move it out of the loop. It may also benefit codegen, since evaluation of variable indices is pushed to the back, which can reduce pipeline stalls on some architectures.
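To make the LICM argument concrete, here is a minimal C++ sketch (not part of the patch) that writes the two GEP orders as plain pointer arithmetic. The function names `sumVariableFirst` and `sumConstantFirst` and the constant offset 4 are illustrative assumptions. The point is only that both orders reach the same address, and that with the constant step first the `p + 4` computation no longer depends on the loop index.

```cpp
#include <cstddef>
#include <cstdint>

// Variable-indexed step first, constant step second (the order before the
// swap). Both address computations depend on i, so neither is loop-invariant
// as written.
std::int64_t sumVariableFirst(const std::int64_t *p, std::size_t n) {
  std::int64_t sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    const std::int64_t *p1 = p + i;  // like: gep i64, ptr %p, i64 %i
    const std::int64_t *p2 = p1 + 4; // like: gep i64, ptr %p1, i64 4
    sum += *p2;
  }
  return sum;
}

// Constant step first (the order after the swap). The constant step depends
// only on p, so it is loop-invariant and is written outside the loop here.
std::int64_t sumConstantFirst(const std::int64_t *p, std::size_t n) {
  const std::int64_t *base = p + 4; // like: gep i64, ptr %p, i64 4 (hoisted)
  std::int64_t sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    const std::int64_t *p2 = base + i; // like: gep i64, ptr %base, i64 %i
    sum += *p2;
  }
  return sum;
}
```

Both functions read the same elements; the sketch says nothing about whether the swapped form is actually faster on a given target.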

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,050 ms	x64 debian > MLIR.Examples/standalone::test.toy
	60,070 ms	x64 debian > ThreadSanitizer-x86_64.ThreadSanitizer-x86_64::restore_stack.cpp

Event Timeline

huangjd created this revision. Jul 14 2022, 12:07 AM

Herald added a project: Restricted Project. Jul 14 2022, 12:07 AM

Herald added subscribers: arphaman, hiraditya.

huangjd requested review of this revision. Jul 14 2022, 12:07 AM

Herald added a project: Restricted Project. Jul 14 2022, 12:07 AM

Herald added a subscriber: llvm-commits.

Note: this is a NEW patch and the implementation is different from D125845.

huangjd added a parent revision: D125934: [InstCombine] Changing constant-indexed GEP of GEP to i8* for merging. Jul 14 2022, 12:08 AM

huangjd added inline comments. Jul 14 2022, 12:15 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1981	Apparently this statement actually causes a bug, which inhibits merging of constant-index GEP in vector-reverse-mask4.ll and interleaved-accesses.ll
llvm/test/Transforms/InstCombine/gep-canonicalize-constant-indices.ll
27	test is no longer relevant since we are swapping constant-indexed GEP to the front
52–54	Simplified unnecessarily complicated test
55	C

Remove now irrelevant test

I think this will do worse in practice, because it breaks GVN. Imagine you have accesses like ary[i].x, ary[i].y, ary[i].z represented with two GEPs. If constants are canonicalized to the end, then the ary[i] GEP is the same all three times and can be CSEd/GVNd. If the constants are canonicalized to the start, then these all become distinct GEPs on different bases.
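The concern above can be illustrated with a small, self-contained C++ model (an editorial sketch, not LLVM code). It represents each two-GEP address computation as a pair of strings and counts how many distinct intermediate pointers there are. The type `TwoStepAddress`, the helper names, and the field offsets 0, 4 and 8 are assumptions made for the illustration.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One two-step address computation: a first GEP producing an intermediate
// pointer, then a second GEP applied on top of it.
struct TwoStepAddress {
  std::string first;  // intermediate pointer expression
  std::string second; // step applied to the intermediate pointer
};

// Constants last: every field access starts from the same "ary + i" step.
std::vector<TwoStepAddress> constantsLast(const std::vector<int> &fieldOffsets) {
  std::vector<TwoStepAddress> out;
  for (int off : fieldOffsets)
    out.push_back({"ary + i*sizeof(Elem)", "+ " + std::to_string(off)});
  return out;
}

// Constants first: every field access starts from a different "ary + off" base.
std::vector<TwoStepAddress> constantsFirst(const std::vector<int> &fieldOffsets) {
  std::vector<TwoStepAddress> out;
  for (int off : fieldOffsets)
    out.push_back({"ary + " + std::to_string(off), "+ i*sizeof(Elem)"});
  return out;
}

// Number of distinct intermediate pointers. Identical ones are the values a
// CSE/GVN-style pass could share between accesses.
std::size_t distinctFirstSteps(const std::vector<TwoStepAddress> &chains) {
  std::set<std::string> seen;
  for (const TwoStepAddress &c : chains)
    seen.insert(c.first);
  return seen.size();
}
```

For three fields, `distinctFirstSteps(constantsLast({0, 4, 8}))` is 1 while `distinctFirstSteps(constantsFirst({0, 4, 8}))` is 3, which is the loss of sharing described in the comment. The model ignores that a zero-offset GEP would normally fold away.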

Harbormaster completed remote builds in B175313: Diff 444537. Jul 14 2022, 1:57 AM

In D129734#3650863, @nikic wrote:

I think this will do worse in practice, because it breaks GVN. Imagine you have accesses like ary[i].x, ary[i].y, ary[i].z represented with two GEPs. If constants are canonicalized to the end, then the ary[i] GEP is the same all three times and can be CSEd/GVNd. If the constants are canonicalized to the start, then these all become distinct GEPs on different bases.

Expressions like ary[i].x will always be merged in visitGEPOfGEP into getelementptr type, ptr ary, i64 i, i32 0, even without my patch. In fact, almost all valid C++ array/member access expressions will be coalesced into one multi-index GEP, unless you cast the pointer to a type that it actually isn't and then do more pointer arithmetic, which is probably intentional UB anyway.

If that breaks GVN, then I think the GEP transform pass should be pushed back so that CSE happens first. On the other hand, CSE should actually not apply to the accesses in your case, because on most architectures a load instruction can take a constant offset, so it wouldn't be necessary to compute the intermediate address of ary[i].

Can you provide an example showing how this is better than the alternative patch (e.g. helping LICM)?

Can you also provide an example showing that nikic's concern is addressed?

Consider the following practical example:

#include <cmath>   // sqrtf
#include <cstddef> // size_t

struct Vec {
    float x, y, z;
};

float f1(Vec vecs[], size_t n) {
    float sum = 0;
    for (size_t i = 0; i < n; i++) {
        float g = 0;
        g += vecs[i].x * vecs[i].x;
        g += vecs[i].y * vecs[i].y;
        g += vecs[i].z * vecs[i].z;
        sum += sqrtf(g);
    }
    return sum;
}

With optimizations enabled and without my patch, LLVM generates the following:

define dso_local noundef float @_Z2f1P3Vecm(ptr nocapture noundef readonly %0, i64 noundef %1) local_unnamed_addr #0 !dbg !361 {
  %3 = icmp eq i64 %1, 0, !dbg !385
  br i1 %3, label %4, label %6, !dbg !386

4: ; preds = %6, %2
  %5 = phi float [ 0.000000e+00, %2 ], [ %19, %6 ], !dbg !383
  ret float %5, !dbg !387

6: ; preds = %2, %6
  %7 = phi float [ %19, %6 ], [ 0.000000e+00, %2 ]
  %8 = phi i64 [ %20, %6 ], [ 0, %2 ]
  %9 = getelementptr inbounds %struct.Vec, ptr %0, i64 %8, !dbg !389
  %10 = load float, ptr %9, align 4, !dbg !390, !tbaa !391
  %11 = tail call float @llvm.fmuladd.f32(float %10, float %10, float 0.000000e+00), !dbg !396
  %12 = getelementptr inbounds %struct.Vec, ptr %0, i64 %8, i32 1, !dbg !397
  %13 = load float, ptr %12, align 4, !dbg !397, !tbaa !398
  %14 = tail call float @llvm.fmuladd.f32(float %13, float %13, float %11), !dbg !399
  %15 = getelementptr inbounds %struct.Vec, ptr %0, i64 %8, i32 2, !dbg !400
  %16 = load float, ptr %15, align 4, !dbg !400, !tbaa !401
  %17 = tail call float @llvm.fmuladd.f32(float %16, float %16, float %14), !dbg !402
  %18 = tail call float @llvm.sqrt.f32(float %17), !dbg !403
  %19 = fadd float %7, %18, !dbg !404
  %20 = add nuw i64 %8, 1, !dbg !405
  %21 = icmp eq i64 %20, %1, !dbg !385
  br i1 %21, label %4, label %6, !dbg !386, !llvm.loop !406
}

GEPs of GEPs are merged in a very early pass, before common subexpression elimination, and I actually couldn't write any C++ code that would make LLVM generate a GEP of GEP where the second one has a constant index.
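As a reading aid for the merged GEPs in the IR above, the following C++ sketch (an editorial illustration, not from the patch) computes the byte offset that a two-index GEP such as `getelementptr %struct.Vec, ptr %0, i64 %8, i32 1` denotes. The helper name `mergedGepOffset` is an assumption, and the sketch relies on the host compiler laying out `Vec` the same way as the target in the example.

```cpp
#include <cstddef>

struct Vec {
  float x, y, z;
};

// Byte offset from the array base denoted by a merged two-index GEP:
// i whole elements, plus the offset of the selected field inside one element.
constexpr std::size_t mergedGepOffset(std::size_t i, std::size_t fieldOffset) {
  return i * sizeof(Vec) + fieldOffset;
}
```

For example, `mergedGepOffset(2, offsetof(Vec, y))` is the distance in bytes between `&vecs[0]` and `&vecs[2].y`. All three accesses in the loop body share the `i * sizeof(Vec)` term, which is the address computation a CSE pass could in principle reuse.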

Herald added a subscriber: pcwang-thead. Aug 31 2022, 6:22 PM

The example provided shows that there is a missed opportunity for CSE (of the address computation of vecs[i]). The decision on GEP ordering should probably be based on a more general reassociation analysis aimed at enabling more CSE/LICM.

Follow-up, @davidxl:
The original issue is that a chain of 3 or more GEPs whose first and last are constant-indexed cannot be simplified. This patch can handle that case, while D125845 can't.
Consider the following code:

01   b = gep a, const_index
02   use(b)
03   c = gep b, var_index
04   d = gep c, const_index
05   ret d

With this patch, line 4 is swapped with line 3 and is then constant-index folded with line 1. Line 1 is unchanged, and the old line 4 is replaced with a gep on a using the new combined constant indices, which breaks the dependency on b and results in better codegen.

With D125845, line 1 would instead be swapped with line 3, but since b has more than one use, the swap is inhibited and nothing can be optimized.
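The rewrite described above can be checked with a small arithmetic model. This is an editorial sketch in C++, not the patch's implementation: pointers are modelled as byte offsets from `a`, wrapping and `inbounds` are ignored, and the names `ChainResult`, `originalChain` and `swappedAndFolded` are hypothetical.

```cpp
#include <cstdint>

// Result of evaluating the five-line chain, with pointers modelled as byte
// offsets relative to a.
struct ChainResult {
  std::int64_t b;   // value passed to use(b)
  std::int64_t d;   // value returned
  bool dDependsOnB; // whether computing d goes through b
};

// The chain as written: d is computed through b and c.
ChainResult originalChain(std::int64_t c1, std::int64_t v, std::int64_t c2) {
  const std::int64_t b = c1;     // 01: b = gep a, const_index
  const std::int64_t c = b + v;  // 03: c = gep b, var_index
  const std::int64_t d = c + c2; // 04: d = gep c, const_index
  return {b, d, true};
}

// After swapping 04 ahead of 03 and folding it with 01: line 01 stays for
// use(b), while d is computed from a through a single combined constant step.
ChainResult swappedAndFolded(std::int64_t c1, std::int64_t v, std::int64_t c2) {
  const std::int64_t b = c1;           // 01 unchanged, still feeds use(b)
  const std::int64_t folded = c1 + c2; // gep a, (const_index + const_index)
  const std::int64_t d = folded + v;   // old 03, now based on the folded gep
  return {b, d, false};
}
```

Both functions produce the same `b` and `d` for any inputs that do not overflow; only the dependency of `d` on `b` differs, which is the property the comment argues for.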

In D129734#3853403, @huangjd wrote:
Follow-up, @davidxl:
The original issue is that a chain of 3 or more GEPs whose first and last are constant-indexed cannot be simplified. This patch can handle that case, while D125845 can't.
Consider the following code:
01   b = gep a, const_index
02   use(b)
03   c = gep b, var_index
04   d = gep c, const_index
05   ret d
With this patch, line 4 is swapped with line 3 and is then constant-index folded with line 1. Line 1 is unchanged, and the old line 4 is replaced with a gep on a using the new combined constant indices, which breaks the dependency on b and results in better codegen.

With D125845, line 1 would instead be swapped with line 3, but since b has more than one use, the swap is inhibited and nothing can be optimized.

In theory, 01 should be propagated into 03 and then 04, after which 03 is deleted.

To compare the two patches, I think it is worth collecting some benchmark numbers.

huangjd abandoned this revision. Oct 20 2022, 11:04 AM

Revision Contents

Path	Size
llvm/
lib/Transforms/InstCombine/
InstCombineInternal.h	1 line
InstructionCombining.cpp	39 lines
test/Transforms/InstCombine/
gep-canonicalize-constant-indices.ll	148 lines
gep-combine-loop-invariant.ll	12 lines
gep-merge-constant-indices.ll	8 lines
opaque-ptr.ll	12 lines
LoopVectorize/AArch64/
vector-reverse-mask4.ll	6 lines
interleaved-accesses.ll	4 lines

Diff 444537

llvm/lib/Transforms/InstCombine/InstCombineInternal.h

Show First 20 Lines • Show All 143 Lines • ▼ Show 20 Lines	public:
Instruction *visitCallInst(CallInst &CI);		Instruction *visitCallInst(CallInst &CI);
Instruction *visitInvokeInst(InvokeInst &II);		Instruction *visitInvokeInst(InvokeInst &II);
Instruction *visitCallBrInst(CallBrInst &CBI);		Instruction *visitCallBrInst(CallBrInst &CBI);

Instruction *SliceUpIllegalIntegerPHI(PHINode &PN);		Instruction *SliceUpIllegalIntegerPHI(PHINode &PN);
Instruction *visitPHINode(PHINode &PN);		Instruction *visitPHINode(PHINode &PN);
Instruction *visitGetElementPtrInst(GetElementPtrInst &GEP);		Instruction *visitGetElementPtrInst(GetElementPtrInst &GEP);
Instruction *visitGEPOfGEP(GetElementPtrInst &GEP, GEPOperator *Src);		Instruction *visitGEPOfGEP(GetElementPtrInst &GEP, GEPOperator *Src);
		Instruction *swapGEPOfGEP(GetElementPtrInst &GEP, GEPOperator *Src);
Instruction *visitGEPOfBitcast(BitCastInst *BCI, GetElementPtrInst &GEP);		Instruction *visitGEPOfBitcast(BitCastInst *BCI, GetElementPtrInst &GEP);
Instruction *visitAllocaInst(AllocaInst &AI);		Instruction *visitAllocaInst(AllocaInst &AI);
Instruction *visitAllocSite(Instruction &FI);		Instruction *visitAllocSite(Instruction &FI);
Instruction *visitFree(CallInst &FI);		Instruction *visitFree(CallInst &FI);
Instruction *visitLoadInst(LoadInst &LI);		Instruction *visitLoadInst(LoadInst &LI);
Instruction *visitStoreInst(StoreInst &SI);		Instruction *visitStoreInst(StoreInst &SI);
Instruction *visitAtomicRMWInst(AtomicRMWInst &SI);		Instruction *visitAtomicRMWInst(AtomicRMWInst &SI);
Instruction *visitUnconditionalBranchInst(BranchInst &BI);		Instruction *visitUnconditionalBranchInst(BranchInst &BI);
▲ Show 20 Lines • Show All 659 Lines • Show Last 20 Lines

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

Show First 20 Lines • Show All 1,972 Lines • ▼ Show 20 Lines	if (LI) {
GEP.getSourceElementType(), NewSrc, {SO1});		GEP.getSourceElementType(), NewSrc, {SO1});
NewGEP->setIsInBounds(IsInBounds);		NewGEP->setIsInBounds(IsInBounds);
return NewGEP;		return NewGEP;
}		}
}		}
}		}
}		}

// Note that if our source is a gep chain itself then we wait for that
huangjd (inline comment): Apparently this statement actually causes a bug, which inhibits merging of constant-index GEP in vector-reverse-mask4.ll and interleaved-accesses.ll
// chain to be resolved before we perform this transformation. This
// avoids us creating a TON of code in some cases.
if (auto *SrcGEP = dyn_cast<GEPOperator>(Src->getOperand(0)))
if (SrcGEP->getNumOperands() == 2 && shouldMergeGEPs(Src, SrcGEP))
return nullptr; // Wait until our source is folded to completion.

// For constant GEPs, use a more general offset-based folding approach.		// For constant GEPs, use a more general offset-based folding approach.
// Only do this for opaque pointers, as the result element type may change.		// Only do this for opaque pointers, as the result element type may change.
Type *PtrTy = Src->getType()->getScalarType();		Type *PtrTy = Src->getType()->getScalarType();
if (PtrTy->isOpaquePointerTy() && GEP.hasAllConstantIndices() &&		if (PtrTy->isOpaquePointerTy() && GEP.hasAllConstantIndices() &&
(Src->hasOneUse() || Src->hasAllConstantIndices())) {		(Src->hasOneUse() || Src->hasAllConstantIndices())) {
// Split Src into a variable part and a constant suffix.		// Split Src into a variable part and a constant suffix.
gep_type_iterator GTI = gep_type_begin(*Src);		gep_type_iterator GTI = gep_type_begin(*Src);
Type *BaseType = GTI.getIndexedType();		Type *BaseType = GTI.getIndexedType();
▲ Show 20 Lines • Show All 124 Lines • ▼ Show 20 Lines	return isMergedGEPInBounds(Src, cast<GEPOperator>(&GEP))
GEP.getName())		GEP.getName())
: GetElementPtrInst::Create(Src->getSourceElementType(),		: GetElementPtrInst::Create(Src->getSourceElementType(),
Src->getOperand(0), Indices,		Src->getOperand(0), Indices,
GEP.getName());		GEP.getName());

return nullptr;		return nullptr;
}		}

		Instruction *InstCombinerImpl::swapGEPOfGEP(GetElementPtrInst &GEP,
		GEPOperator *Src) {
		// If GEP of GEP cannot be combined into one instruction, and the second GEP
		// is constant-indexed, we perform canonicalize swapping to move it before the
		// non-constant-indexed GEP. This potentially allows the application of some
		// optimizations in visitGEPofGEP.
		// Only swap if it doesn't violate use-def rule, and pointer types are
		// compatible (opaque ptr or GEP and Src must be same type, meaning they must
		// both have 1 index).
		if (Src->hasOneUse() &&
		((Src->getPointerOperandType()->isOpaquePointerTy() &&
		GEP.getPointerOperandType()->isOpaquePointerTy()) ||
		(Src->getNumIndices() == 1 && GEP.getNumIndices() == 1)) &&
		!Src->hasAllConstantIndices() && GEP.hasAllConstantIndices()) {
		// Cannot guarantee inbounds after swapping because the non-const GEP can
		// have arbitrary sign.
		Value *NewSrc =
		Builder.CreateGEP(GEP.getSourceElementType(), Src->getOperand(0),
		SmallVector<Value *>(GEP.indices()), Src->getName());
		GetElementPtrInst *NewGEP = GetElementPtrInst::Create(
		Src->getSourceElementType(), NewSrc,
		SmallVector<Value *>(Src->indices()), GEP.getName());
		return NewGEP;
		}
		return nullptr;
		}

// Note that we may have also stripped an address space cast in between.		// Note that we may have also stripped an address space cast in between.
Instruction *InstCombinerImpl::visitGEPOfBitcast(BitCastInst *BCI,		Instruction *InstCombinerImpl::visitGEPOfBitcast(BitCastInst *BCI,
GetElementPtrInst &GEP) {		GetElementPtrInst &GEP) {
// With opaque pointers, there is no pointer element type we can use to		// With opaque pointers, there is no pointer element type we can use to
// adjust the GEP type.		// adjust the GEP type.
PointerType *SrcType = cast<PointerType>(BCI->getSrcTy());		PointerType *SrcType = cast<PointerType>(BCI->getSrcTy());
if (SrcType->isOpaque())		if (SrcType->isOpaque())
return nullptr;		return nullptr;
▲ Show 20 Lines • Show All 268 Lines • ▼ Show 20 Lines	if (auto *PN = dyn_cast<PHINode>(PtrOp)) {
}		}

GEP.getParent()->getInstList().insert(		GEP.getParent()->getInstList().insert(
GEP.getParent()->getFirstInsertionPt(), NewGEP);		GEP.getParent()->getFirstInsertionPt(), NewGEP);
replaceOperand(GEP, 0, NewGEP);		replaceOperand(GEP, 0, NewGEP);
PtrOp = NewGEP;		PtrOp = NewGEP;
}		}

if (auto *Src = dyn_cast<GEPOperator>(PtrOp))		if (auto *Src = dyn_cast<GEPOperator>(PtrOp)) {
if (Instruction *I = visitGEPOfGEP(GEP, Src))		if (Instruction *I = visitGEPOfGEP(GEP, Src))
return I;		return I;
		else if (Instruction *I = swapGEPOfGEP(GEP, Src))
		return I;
		}

// Skip if GEP source element type is scalable. The type alloc size is unknown		// Skip if GEP source element type is scalable. The type alloc size is unknown
// at compile-time.		// at compile-time.
if (GEP.getNumIndices() == 1 && !IsGEPSrcEleScalable) {		if (GEP.getNumIndices() == 1 && !IsGEPSrcEleScalable) {
unsigned AS = GEP.getPointerAddressSpace();		unsigned AS = GEP.getPointerAddressSpace();
if (GEP.getOperand(1)->getType()->getScalarSizeInBits() ==		if (GEP.getOperand(1)->getType()->getScalarSizeInBits() ==
DL.getIndexSizeInBits(AS)) {		DL.getIndexSizeInBits(AS)) {
uint64_t TyAllocSize = DL.getTypeAllocSize(GEPEltType).getFixedSize();		uint64_t TyAllocSize = DL.getTypeAllocSize(GEPEltType).getFixedSize();
▲ Show 20 Lines • Show All 2,254 Lines • Show Last 20 Lines
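The comment in swapGEPOfGEP about not being able to keep inbounds after the swap can be illustrated with a simplified index model. This is an editorial C++ sketch, not LLVM's actual inbounds semantics: it only checks that each intermediate element index stays within the object or one past its end, and the helper name `bothStepsInBounds` and the numbers used are assumptions.

```cpp
#include <cstdint>

// Returns true if both intermediate results of a two-step index computation
// stay inside [0, size], i.e. within the object or one past its end.
// `start` is the element index that the base pointer refers to.
bool bothStepsInBounds(std::int64_t size, std::int64_t start,
                       std::int64_t firstStep, std::int64_t secondStep) {
  const std::int64_t afterFirst = start + firstStep;
  const std::int64_t afterSecond = afterFirst + secondStep;
  const auto inRange = [size](std::int64_t idx) {
    return idx >= 0 && idx <= size;
  };
  return inRange(afterFirst) && inRange(afterSecond);
}
```

With a 4-element object, a base pointer at element 2, a variable index of -2 and a constant index of +3, the original order passes (`bothStepsInBounds(4, 2, -2, 3)`, intermediate indices 0 and 3), while the swapped order does not (`bothStepsInBounds(4, 2, 3, -2)`, first intermediate index 5). This matches the updated tests, where the swapped GEPs are emitted without `inbounds`.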

llvm/test/Transforms/InstCombine/gep-canonicalize-constant-indices.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=instcombine -opaque-pointers -S | FileCheck %s			; RUN: opt < %s -instcombine -licm -opaque-pointers -S | FileCheck %s

	; Constant-indexed GEP instructions in a chain of GEP instructions should be			; Constant-indexed GEP instructions in a chain of GEP instructions should be
	; swapped to the end whenever such transformation is valid. This allows them to			; swapped to the front whenever such transformation is valid. This allows them
	; be merged.			; to be merged.

	declare void @use(i1)

				declare void @use(ptr)

	; The constant-indexed GEP instruction should be swapped to the end, even			; The constant-indexed GEP instruction should be swapped to the end, even
	; without merging.			; without merging.
	; result = (((i32*) p + a) + b) + 1			; result = (((i32*) p + a) + b) + 1
	define ptr @basic(ptr %p, i64 %a, i64 %b) {			define ptr @basic(ptr %p, i64 %a, i64 %b) {
	; CHECK-LABEL: @basic(			; CHECK-LABEL: @basic(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 1
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[A:%.*]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[A:%.*]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i64 [[B:%.*]]			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i64 [[B:%.*]]
	; CHECK-NEXT: ret ptr [[TMP3]]			; CHECK-NEXT: ret ptr [[TMP3]]
	;			;
	%1 = getelementptr inbounds i32, ptr %p, i64 1			%1 = getelementptr inbounds i32, ptr %p, i64 1
	%2 = getelementptr inbounds i32, ptr %1, i64 %a			%2 = getelementptr inbounds i32, ptr %1, i64 %a
	%3 = getelementptr inbounds i32, ptr %2, i64 %b			%3 = getelementptr inbounds i32, ptr %2, i64 %b
	ret ptr %3			ret ptr %3
	}			}

	; GEP with the last index being a constant should also be swapped.			; Negative test. GEP should not be swapped if indices are not constant.
	huangjd (inline comment): test is no longer relevant since we are swapping constant-indexed GEP to the front
	define ptr @partialConstant1(ptr %p, i64 %a, i64 %b) {			define ptr @partialConstant(ptr %p, i64 %a, i64 %b) {
	; CHECK-LABEL: @partialConstant1(			; CHECK-LABEL: @partialConstant(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 [[B:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [4 x i32], ptr [[P:%.*]], i64 1, i64 [[A:%.*]]
	; CHECK-NEXT: ret ptr [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[B:%.*]]
	;			; CHECK-NEXT: ret ptr [[TMP2]]
	%1 = getelementptr inbounds [4 x i32], ptr %p, i64 %a, i64 1
	%2 = getelementptr inbounds i32, ptr %p, i64 %b
	ret ptr %2
	}

	; Negative test. GEP should not be swapped if the last index is not a constant.
	define ptr @partialConstant2(ptr %p, i64 %a, i64 %b) {
	; CHECK-LABEL: @partialConstant2(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 [[B:%.*]]
	; CHECK-NEXT: ret ptr [[TMP1]]
	;			;
	%1 = getelementptr inbounds [4 x i32], ptr %p, i64 1, i64 %a			%1 = getelementptr inbounds [4 x i32], ptr %p, i64 1, i64 %a
	%2 = getelementptr inbounds i32, ptr %p, i64 %b			%2 = getelementptr inbounds i32, ptr %1, i64 %b
	ret ptr %2			ret ptr %2
	}			}

	; Constant-indexed GEP are merged after swawpping.			; Constant-indexed GEP are merged after swapping.
	; result = ((i32*) p + a) + 3			; result = ((i32*) p + a) + 3
	define ptr @merge(ptr %p, i64 %a) {			define ptr @merge(ptr %p, i64 %a) {
	; CHECK-LABEL: @merge(			; CHECK-LABEL: @merge(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i32, ptr [[P:%.*]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[A:%.*]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i32, ptr [[TMP1]], i64 [[A:%.*]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i64 2			; CHECK-NEXT: ret ptr [[TMP2]]
	; CHECK-NEXT: ret ptr [[TMP3]]
	;			;
	%1 = getelementptr inbounds i32, ptr %p, i64 1			%1 = getelementptr inbounds i32, ptr %p, i64 1
	%2 = getelementptr inbounds i32, ptr %1, i64 %a			%2 = getelementptr inbounds i32, ptr %1, i64 %a
	%3 = getelementptr inbounds i32, ptr %2, i64 2			%3 = getelementptr inbounds i32, ptr %2, i64 2
	ret ptr %3			ret ptr %3
	}			}

	; Multiple constant-indexed GEP. Note that the first two cannot be merged at			; Multiple constant-indexed GEP. All constant-index GEP will eventually be
	; first, but after the second and third are merged, the result can be merged			; swapped to the front and merged.
	; with the first one on the next pass.			; result = (i16*) (((i8*) p + 25) + a) + b
				huangjd (inline comment): Simplified unnecessarily complicated test
	; result = (<3 x i32>*) ((i16*) ((i8*) ptr + a) + (a * b)) + 9
	define ptr @nested(ptr %p, i64 %a, i64 %b) {			define ptr @nested(ptr %p, i64 %a, i64 %b) {
				huangjd (inline comment): C
	; CHECK-LABEL: @nested(			; CHECK-LABEL: @nested(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds <3 x i32>, ptr [[P:%.*]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i8, ptr [[P:%.*]], i64 25
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, ptr [[TMP1]], i64 [[A:%.*]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i8, ptr [[TMP1]], i64 [[A:%.*]]
	; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[A]], [[B:%.*]]			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i16, ptr [[TMP2]], i64 [[B:%.*]]
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds <5 x i32>, ptr [[TMP2]], i64 4			; CHECK-NEXT: ret ptr [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i16, ptr [[TMP4]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds <4 x i32>, ptr [[TMP5]], i64 1
	; CHECK-NEXT: ret ptr [[TMP6]]
	;			;
	%1 = getelementptr inbounds <3 x i32>, ptr %p, i64 1			%1 = getelementptr inbounds i64, ptr %p, i64 1
	%2 = getelementptr inbounds i8, ptr %1, i64 %a			%2 = getelementptr inbounds i8, ptr %1, i64 %a
	%3 = mul i64 %a, %b			%3 = getelementptr inbounds i32, ptr %2, i64 4
	%4 = getelementptr inbounds <5 x i32>, ptr %2, i64 4			%4 = getelementptr inbounds i16, ptr %3, i64 %b
	%5 = getelementptr inbounds i16, ptr %4, i64 %3			%5 = getelementptr inbounds i8, ptr %4, i64 1
	%6 = getelementptr inbounds <4 x i32>, ptr %5, i64 1			ret ptr %5
	ret ptr %6
	}			}

	; It is valid to swap if the source operand of the first GEP has multiple uses.			; It is valid to swap if the source operand of the first GEP has multiple uses.
	define ptr @multipleUses1(ptr %p) {			define ptr @multipleUses1(ptr %p, i64 %a) {
	; CHECK-LABEL: @multipleUses1(			; CHECK-LABEL: @multipleUses1(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i32, ptr [[P:%.*]], i64 1
	; CHECK-NEXT: [[TMP2:%.*]] = ptrtoint ptr [[P]] to i64			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i32, ptr [[TMP1]], i64 [[A:%.*]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[TMP2]]			; CHECK-NEXT: call void @use(ptr [[P]])
	; CHECK-NEXT: ret ptr [[TMP3]]			; CHECK-NEXT: ret ptr [[TMP2]]
	;			;
	%1 = getelementptr inbounds i32, ptr %p, i64 1			%1 = getelementptr inbounds i32, ptr %p, i64 %a
	%2 = ptrtoint ptr %p to i64			%2 = getelementptr inbounds i32, ptr %1, i64 1
	%3 = getelementptr inbounds i32, ptr %1, i64 %2			call void @use(ptr %p)
	ret ptr %3			ret ptr %2
	}			}

	; It is valid to swap if the second GEP has multiple uses.			; Negative test. It is not valid to swap if the first GEP has multiple uses.
	define ptr @multipleUses2(ptr %p, i64 %a) {			define ptr @multipleUses2(ptr %p, i64 %a) {
	; CHECK-LABEL: @multipleUses2(			; CHECK-LABEL: @multipleUses2(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 [[A:%.*]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[A:%.*]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 1
	; CHECK-NEXT: call void @use(ptr nonnull [[TMP2]])			; CHECK-NEXT: call void @use(ptr [[TMP1]])
	; CHECK-NEXT: ret ptr [[TMP2]]			; CHECK-NEXT: ret ptr [[TMP2]]
	;			;
	%1 = getelementptr inbounds i32, ptr %p, i64 1			%1 = getelementptr inbounds i32, ptr %p, i64 %a
	%2 = getelementptr inbounds i32, ptr %1, i64 %a			%2 = getelementptr inbounds i32, ptr %1, i64 1
	call void @use(ptr %2)			call void @use(ptr %1)
	ret ptr %2			ret ptr %2
	}			}

	; Negative test. It is not valid to swap if the first GEP has multiple uses.			; Test interaction with LICM.
	define ptr @multipleUses3(ptr %p) {			define i64 @licm(ptr %p) {
	; CHECK-LABEL: @multipleUses3(			; CHECK-LABEL: @licm(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 1			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP2:%.*]] = ptrtoint ptr [[TMP1]] to i64			; CHECK-NEXT: [[P11:%.*]] = getelementptr i64, ptr [[P:%.*]], i64 4
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[TMP2]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK-NEXT: ret ptr [[TMP3]]			; CHECK: for.body:
	;			; CHECK-NEXT: [[I:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INEXT:%.*]], [[FOR_BODY]] ]
	%1 = getelementptr inbounds i32, ptr %p, i64 1			; CHECK-NEXT: [[SUM:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[ADD:%.*]], [[FOR_BODY]] ]
	%2 = ptrtoint ptr %1 to i64			; CHECK-NEXT: [[P2:%.*]] = getelementptr i64, ptr [[P11]], i64 [[I]]
	%3 = getelementptr inbounds i32, ptr %1, i64 %2			; CHECK-NEXT: [[LOAD:%.*]] = load i64, ptr [[P2]], align 4
	ret ptr %3			; CHECK-NEXT: [[ADD]] = add nsw i64 [[SUM]], [[LOAD]]
				; CHECK-NEXT: [[INEXT]] = add nuw nsw i64 [[I]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[I]], 1000000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
				; CHECK: for.end:
				; CHECK-NEXT: [[ADD_LCSSA:%.*]] = phi i64 [ [[ADD]], [[FOR_BODY]] ]
				; CHECK-NEXT: ret i64 [[ADD_LCSSA]]
				;
				entry:
				br label %for.body

				for.body:
				%i = phi i64 [ 0, %entry ], [ %inext, %for.body ]
				%sum = phi i64 [ 0, %entry ], [ %add, %for.body ]
				%p1 = getelementptr i64, ptr %p, i64 %i
				%p2 = getelementptr i64, ptr %p1, i64 4
				%load = load i64, ptr %p2
				%add = add nsw i64 %sum, %load
				%inext = add nuw nsw i64 %i, 1
				%exitcond = icmp eq i64 %i, 1000000
				br i1 %exitcond, label %for.end, label %for.body

				for.end:
				ret i64 %add
	}			}

llvm/test/Transforms/InstCombine/gep-combine-loop-invariant.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S -enable-new-pm=0 | FileCheck %s			; RUN: opt < %s -instcombine -S -enable-new-pm=0 | FileCheck %s
	; RUN: opt < %s -passes='require<loops>,instcombine' -S | FileCheck %s			; RUN: opt < %s -passes='require<loops>,instcombine' -S | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	define i32 @foo(i8* nocapture readnone %match, i32 %cur_match, i32 %best_len, i32 %scan_end, i32* nocapture readonly %prev, i32 %limit, i32 %chain_length, i8* nocapture readonly %win, i32 %wmask) {			define i32 @foo(i8* nocapture readnone %match, i32 %cur_match, i32 %best_len, i32 %scan_end, i32* nocapture readonly %prev, i32 %limit, i32 %chain_length, i8* nocapture readonly %win, i32 %wmask) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[IDX_EXT2:%.*]] = zext i32 [[CUR_MATCH:%.*]] to i64			; CHECK-NEXT: [[IDX_EXT2:%.*]] = zext i32 [[CUR_MATCH:%.*]] to i64
	; CHECK-NEXT: [[ADD_PTR4:%.*]] = getelementptr inbounds i8, i8* [[WIN:%.*]], i64 [[IDX_EXT2]]
	; CHECK-NEXT: [[IDX_EXT1:%.*]] = zext i32 [[BEST_LEN:%.*]] to i64			; CHECK-NEXT: [[IDX_EXT1:%.*]] = zext i32 [[BEST_LEN:%.*]] to i64
	; CHECK-NEXT: [[ADD_PTR25:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR4]], i64 [[IDX_EXT1]]			; CHECK-NEXT: [[ADD_PTR42:%.*]] = getelementptr i8, i8* [[WIN:%.*]], i64 -1
	; CHECK-NEXT: [[ADD_PTR36:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR25]], i64 -1			; CHECK-NEXT: [[ADD_PTR251:%.*]] = getelementptr i8, i8* [[ADD_PTR42]], i64 [[IDX_EXT2]]
				; CHECK-NEXT: [[ADD_PTR36:%.*]] = getelementptr i8, i8* [[ADD_PTR251]], i64 [[IDX_EXT1]]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[ADD_PTR36]] to i32*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[ADD_PTR36]] to i32*
	; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[TMP0]], align 4
	; CHECK-NEXT: [[CMP7:%.*]] = icmp eq i32 [[TMP1]], [[SCAN_END:%.*]]			; CHECK-NEXT: [[CMP7:%.*]] = icmp eq i32 [[TMP1]], [[SCAN_END:%.*]]
	; CHECK-NEXT: br i1 [[CMP7]], label [[DO_END:%.*]], label [[IF_THEN_LR_PH:%.*]]			; CHECK-NEXT: br i1 [[CMP7]], label [[DO_END:%.*]], label [[IF_THEN_LR_PH:%.*]]
	; CHECK: if.then.lr.ph:			; CHECK: if.then.lr.ph:
	; CHECK-NEXT: br label [[IF_THEN:%.*]]			; CHECK-NEXT: br label [[IF_THEN:%.*]]
	; CHECK: do.body:			; CHECK: do.body:
	; CHECK-NEXT: [[IDX_EXT:%.*]] = zext i32 [[TMP4:%.*]] to i64			; CHECK-NEXT: [[IDX_EXT:%.*]] = zext i32 [[TMP4:%.*]] to i64
	; CHECK-NEXT: [[ADD_PTR1:%.*]] = getelementptr inbounds i8, i8* [[WIN]], i64 [[IDX_EXT1]]			; CHECK-NEXT: [[ADD_PTR46:%.*]] = getelementptr i8, i8* [[WIN]], i64 -1
	; CHECK-NEXT: [[ADD_PTR22:%.*]] = getelementptr i8, i8* [[ADD_PTR1]], i64 -1			; CHECK-NEXT: [[ADD_PTR25:%.*]] = getelementptr i8, i8* [[ADD_PTR46]], i64 [[IDX_EXT1]]
	; CHECK-NEXT: [[ADD_PTR3:%.*]] = getelementptr i8, i8* [[ADD_PTR22]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[ADD_PTR3:%.*]] = getelementptr i8, i8* [[ADD_PTR25]], i64 [[IDX_EXT]]
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[ADD_PTR3]] to i32*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[ADD_PTR3]] to i32*
	; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP2]], align 4
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[TMP3]], [[SCAN_END]]			; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[TMP3]], [[SCAN_END]]
	; CHECK-NEXT: br i1 [[CMP]], label [[DO_END]], label [[IF_THEN]]			; CHECK-NEXT: br i1 [[CMP]], label [[DO_END]], label [[IF_THEN]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: [[CUR_MATCH_ADDR_09:%.*]] = phi i32 [ [[CUR_MATCH]], [[IF_THEN_LR_PH]] ], [ [[TMP4]], [[DO_BODY:%.*]] ]			; CHECK-NEXT: [[CUR_MATCH_ADDR_09:%.*]] = phi i32 [ [[CUR_MATCH]], [[IF_THEN_LR_PH]] ], [ [[TMP4]], [[DO_BODY:%.*]] ]
	; CHECK-NEXT: [[CHAIN_LENGTH_ADDR_08:%.*]] = phi i32 [ [[CHAIN_LENGTH:%.*]], [[IF_THEN_LR_PH]] ], [ [[DEC:%.*]], [[DO_BODY]] ]			; CHECK-NEXT: [[CHAIN_LENGTH_ADDR_08:%.*]] = phi i32 [ [[CHAIN_LENGTH:%.*]], [[IF_THEN_LR_PH]] ], [ [[DEC:%.*]], [[DO_BODY]] ]
	; CHECK-NEXT: [[AND:%.*]] = and i32 [[CUR_MATCH_ADDR_09]], [[WMASK:%.*]]			; CHECK-NEXT: [[AND:%.*]] = and i32 [[CUR_MATCH_ADDR_09]], [[WMASK:%.*]]
	▲ Show 20 Lines • Show All 308 Lines • Show Last 20 Lines

llvm/test/Transforms/InstCombine/gep-merge-constant-indices.ll

Show First 20 Lines • Show All 180 Lines • ▼ Show 20 Lines	;
%2 = getelementptr inbounds i32, ptr %1, i64 1		%2 = getelementptr inbounds i32, ptr %1, i64 1
ret ptr %2		ret ptr %2
}		}

; Negative test. Similar to above, but the new address does not alias the		; Negative test. Similar to above, but the new address does not alias the
; address of another member.		; address of another member.
define ptr @partialConstantMemberAliasing2(ptr %p, i64 %a) {		define ptr @partialConstantMemberAliasing2(ptr %p, i64 %a) {
; CHECK-LABEL: @partialConstantMemberAliasing2(		; CHECK-LABEL: @partialConstantMemberAliasing2(
; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [[STRUCT_C:%.*]], ptr [[P:%.*]], i64 [[A:%.*]], i32 1		; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i8, ptr [[P:%.*]], i64 1
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, ptr [[TMP1]], i64 1		; CHECK-NEXT: [[TMP2:%.*]] = getelementptr [[STRUCT_C:%.*]], ptr [[TMP1]], i64 [[A:%.*]], i32 1
; CHECK-NEXT: ret ptr [[TMP2]]		; CHECK-NEXT: ret ptr [[TMP2]]
;		;
%1 = getelementptr inbounds %struct.C, ptr %p, i64 %a, i32 1		%1 = getelementptr inbounds %struct.C, ptr %p, i64 %a, i32 1
%2 = getelementptr inbounds i8, ptr %1, i64 1		%2 = getelementptr inbounds i8, ptr %1, i64 1
ret ptr %2		ret ptr %2
}		}

; Negative test. Similar to above, but the new address falls outside the address		; Negative test. Similar to above, but the new address falls outside the address
; range of the object currently pointed by the non-constant GEP.		; range of the object currently pointed by the non-constant GEP.
define ptr @partialConstantMemberAliasing3(ptr %p, i64 %a) {		define ptr @partialConstantMemberAliasing3(ptr %p, i64 %a) {
; CHECK-LABEL: @partialConstantMemberAliasing3(		; CHECK-LABEL: @partialConstantMemberAliasing3(
; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [[STRUCT_C:%.*]], ptr [[P:%.*]], i64 [[A:%.*]], i32 2		; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i32, ptr [[P:%.*]], i64 1
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 1		; CHECK-NEXT: [[TMP2:%.*]] = getelementptr [[STRUCT_C:%.*]], ptr [[TMP1]], i64 [[A:%.*]], i32 2
; CHECK-NEXT: ret ptr [[TMP2]]		; CHECK-NEXT: ret ptr [[TMP2]]
;		;
%1 = getelementptr inbounds %struct.C, ptr %p, i64 %a, i32 2		%1 = getelementptr inbounds %struct.C, ptr %p, i64 %a, i32 2
%2 = getelementptr inbounds i32, ptr %1, i64 1		%2 = getelementptr inbounds i32, ptr %1, i64 1
ret ptr %2		ret ptr %2
}		}

llvm/test/Transforms/InstCombine/opaque-ptr.ll

	;			;
	%a2 = getelementptr { i32, i32 }, ptr %a, i64 %idx			%a2 = getelementptr { i32, i32 }, ptr %a, i64 %idx
	%a3 = getelementptr i8, ptr %a2, i64 4			%a3 = getelementptr i8, ptr %a2, i64 4
	ret ptr %a3			ret ptr %a3
	}			}

	define ptr @geps_combinable_different_elem_type7(ptr %a, i64 %idx) {			define ptr @geps_combinable_different_elem_type7(ptr %a, i64 %idx) {
	; CHECK-LABEL: @geps_combinable_different_elem_type7(			; CHECK-LABEL: @geps_combinable_different_elem_type7(
	; CHECK-NEXT: [[A2:%.*]] = getelementptr { i32, i32 }, ptr [[A:%.*]], i64 [[IDX:%.*]], i32 1			; CHECK-NEXT: [[A21:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 4
	; CHECK-NEXT: [[A3:%.*]] = getelementptr i8, ptr [[A2]], i64 4			; CHECK-NEXT: [[A3:%.*]] = getelementptr { i32, i32 }, ptr [[A21]], i64 [[IDX:%.*]], i32 1
	; CHECK-NEXT: ret ptr [[A3]]			; CHECK-NEXT: ret ptr [[A3]]
	;			;
	%a2 = getelementptr { i32, i32 }, ptr %a, i64 %idx, i32 1			%a2 = getelementptr { i32, i32 }, ptr %a, i64 %idx, i32 1
	%a3 = getelementptr i8, ptr %a2, i64 4			%a3 = getelementptr i8, ptr %a2, i64 4
	ret ptr %a3			ret ptr %a3
	}			}
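
The swap in geps_combinable_different_elem_type7 above can be checked with plain byte-offset arithmetic. The following is a minimal sketch, not code from the patch: `Pair`, `offsetVariableFirst`, `offsetConstantFirst`, and `checkSwap` are hypothetical names, and `Pair` is assumed to have the same layout as the IR type `{ i32, i32 }` (8 bytes, second field at offset 4). It models only the computed address and says nothing about `inbounds` or poison semantics, which is why the swapped GEPs in the expected output no longer carry `inbounds`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the IR struct type { i32, i32 }.
struct Pair {
  int32_t first;
  int32_t second;
};

// Original order: variable-indexed struct GEP, then constant i8 GEP.
//   %a2 = getelementptr { i32, i32 }, ptr %a, i64 %idx, i32 1
//   %a3 = getelementptr i8, ptr %a2, i64 4
inline int64_t offsetVariableFirst(int64_t idx) {
  int64_t off = idx * static_cast<int64_t>(sizeof(Pair)) +
                static_cast<int64_t>(offsetof(Pair, second));
  return off + 4;
}

// Swapped order: constant i8 GEP first, then the variable-indexed GEP.
//   %a21 = getelementptr i8, ptr %a, i64 4
//   %a3  = getelementptr { i32, i32 }, ptr %a21, i64 %idx, i32 1
inline int64_t offsetConstantFirst(int64_t idx) {
  int64_t off = 4;
  return off + idx * static_cast<int64_t>(sizeof(Pair)) +
         static_cast<int64_t>(offsetof(Pair, second));
}

// Both orders must yield the same byte offset from the base pointer.
inline void checkSwap(int64_t idx) {
  assert(offsetVariableFirst(idx) == offsetConstantFirst(idx));
}
```

Because integer addition commutes, the two orders always agree on the final address; the transform only changes which intermediate pointer is materialized first.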

	define ptr @geps_combinable_different_elem_type8(ptr %a, i64 %idx) {			define ptr @geps_combinable_different_elem_type8(ptr %a, i64 %idx) {
	; CHECK-LABEL: @geps_combinable_different_elem_type8(			; CHECK-LABEL: @geps_combinable_different_elem_type8(
	; CHECK-NEXT: [[A2:%.*]] = getelementptr inbounds { { i32, i32 } }, ptr [[A:%.*]], i64 [[IDX:%.*]], i32 0, i32 1			; CHECK-NEXT: [[A21:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 4
	; CHECK-NEXT: [[A3:%.*]] = getelementptr inbounds i8, ptr [[A2]], i64 4			; CHECK-NEXT: [[A3:%.*]] = getelementptr { { i32, i32 } }, ptr [[A21]], i64 [[IDX:%.*]], i32 0, i32 1
	; CHECK-NEXT: ret ptr [[A3]]			; CHECK-NEXT: ret ptr [[A3]]
	;			;
	%a2 = getelementptr inbounds { { i32, i32 } }, ptr %a, i64 %idx, i32 0, i32 1			%a2 = getelementptr inbounds { { i32, i32 } }, ptr %a, i64 %idx, i32 0, i32 1
	%a3 = getelementptr inbounds i8, ptr %a2, i32 4			%a3 = getelementptr inbounds i8, ptr %a2, i32 4
	ret ptr %a3			ret ptr %a3
	}			}

	define ptr @geps_combinable_different_elem_type9(ptr %a, i64 %idx) {			define ptr @geps_combinable_different_elem_type9(ptr %a, i64 %idx) {
	...
	; CHECK-LABEL: @gep_of_phi_of_gep(			; CHECK-LABEL: @gep_of_phi_of_gep(
	; CHECK-NEXT: br i1 [[C:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]			; CHECK-NEXT: br i1 [[C:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
	; CHECK: if:			; CHECK: if:
	; CHECK-NEXT: br label [[JOIN:%.*]]			; CHECK-NEXT: br label [[JOIN:%.*]]
	; CHECK: else:			; CHECK: else:
	; CHECK-NEXT: br label [[JOIN]]			; CHECK-NEXT: br label [[JOIN]]
	; CHECK: join:			; CHECK: join:
	; CHECK-NEXT: [[TMP1:%.*]] = phi i64 [ 1, [[IF]] ], [ 2, [[ELSE]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi i64 [ 1, [[IF]] ], [ 2, [[ELSE]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i32, ptr [[P:%.*]], i64 [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i32, ptr [[P:%.*]], i64 1
	; CHECK-NEXT: [[GEP:%.*]] = getelementptr i32, ptr [[TMP2]], i64 1			; CHECK-NEXT: [[GEP:%.*]] = getelementptr i32, ptr [[TMP2]], i64 [[TMP1]]
	; CHECK-NEXT: ret ptr [[GEP]]			; CHECK-NEXT: ret ptr [[GEP]]
	;			;
	br i1 %c, label %if, label %else			br i1 %c, label %if, label %else

	if:			if:
	%gep1 = getelementptr i32, ptr %p, i64 1			%gep1 = getelementptr i32, ptr %p, i64 1
	br label %join			br label %join


llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll

	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = xor i64 [[INDEX]], -1			; CHECK-NEXT: [[TMP0:%.*]] = xor i64 [[INDEX]], -1
	; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[TMP0]], [[N]]			; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[TMP0]], [[N]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds double, double* [[COND]], i64 [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds double, double* [[COND]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds double, double* [[TMP2]], i64 -3			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds double, double* [[TMP2]], i64 -3
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[TMP3]] to <4 x double>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[TMP3]] to <4 x double>*
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, <4 x double>* [[TMP4]], align 8, !alias.scope !0			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, <4 x double>* [[TMP4]], align 8, !alias.scope !0
	; CHECK-NEXT: [[REVERSE:%.*]] = shufflevector <4 x double> [[WIDE_LOAD]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>			; CHECK-NEXT: [[REVERSE:%.*]] = shufflevector <4 x double> [[WIDE_LOAD]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds double, double* [[TMP2]], i64 -4			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[TMP2]], i64 -7
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[TMP5]], i64 -3
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[TMP6]] to <4 x double>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[TMP6]] to <4 x double>*
	; CHECK-NEXT: [[WIDE_LOAD6:%.*]] = load <4 x double>, <4 x double>* [[TMP7]], align 8, !alias.scope !0			; CHECK-NEXT: [[WIDE_LOAD6:%.*]] = load <4 x double>, <4 x double>* [[TMP7]], align 8, !alias.scope !0
	; CHECK-NEXT: [[REVERSE7:%.*]] = shufflevector <4 x double> [[WIDE_LOAD6]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>			; CHECK-NEXT: [[REVERSE7:%.*]] = shufflevector <4 x double> [[WIDE_LOAD6]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
	; CHECK-NEXT: [[TMP8:%.*]] = fcmp une <4 x double> [[REVERSE]], zeroinitializer			; CHECK-NEXT: [[TMP8:%.*]] = fcmp une <4 x double> [[REVERSE]], zeroinitializer
	; CHECK-NEXT: [[TMP9:%.*]] = fcmp une <4 x double> [[REVERSE7]], zeroinitializer			; CHECK-NEXT: [[TMP9:%.*]] = fcmp une <4 x double> [[REVERSE7]], zeroinitializer
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr double, double* [[A]], i64 [[TMP1]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr double, double* [[A]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr double, double* [[TMP10]], i64 -3			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr double, double* [[TMP10]], i64 -3
	; CHECK-NEXT: [[REVERSE8:%.*]] = shufflevector <4 x i1> [[TMP8]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>			; CHECK-NEXT: [[REVERSE8:%.*]] = shufflevector <4 x i1> [[TMP8]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast double* [[TMP11]] to <4 x double>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast double* [[TMP11]] to <4 x double>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP12]], i32 8, <4 x i1> [[REVERSE8]], <4 x double> poison), !alias.scope !3, !noalias !0			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP12]], i32 8, <4 x i1> [[REVERSE8]], <4 x double> poison), !alias.scope !3, !noalias !0
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr double, double* [[TMP10]], i64 -4			; CHECK-NEXT: [[TMP14:%.*]] = getelementptr double, double* [[TMP10]], i64 -7
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr double, double* [[TMP13]], i64 -3
	; CHECK-NEXT: [[REVERSE10:%.*]] = shufflevector <4 x i1> [[TMP9]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>			; CHECK-NEXT: [[REVERSE10:%.*]] = shufflevector <4 x i1> [[TMP9]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
	; CHECK-NEXT: [[TMP15:%.*]] = bitcast double* [[TMP14]] to <4 x double>*			; CHECK-NEXT: [[TMP15:%.*]] = bitcast double* [[TMP14]] to <4 x double>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD11:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP15]], i32 8, <4 x i1> [[REVERSE10]], <4 x double> poison), !alias.scope !3, !noalias !0			; CHECK-NEXT: [[WIDE_MASKED_LOAD11:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP15]], i32 8, <4 x i1> [[REVERSE10]], <4 x double> poison), !alias.scope !3, !noalias !0
	; CHECK-NEXT: [[TMP16:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD]], <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>			; CHECK-NEXT: [[TMP16:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD]], <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>
	; CHECK-NEXT: [[TMP17:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD11]], <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>			; CHECK-NEXT: [[TMP17:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD11]], <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>
	; CHECK-NEXT: [[TMP18:%.*]] = bitcast double* [[TMP11]] to <4 x double>*			; CHECK-NEXT: [[TMP18:%.*]] = bitcast double* [[TMP11]] to <4 x double>*
	; CHECK-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP16]], <4 x double>* [[TMP18]], i32 8, <4 x i1> [[REVERSE8]]), !alias.scope !3, !noalias !0			; CHECK-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP16]], <4 x double>* [[TMP18]], i32 8, <4 x i1> [[REVERSE8]]), !alias.scope !3, !noalias !0
	; CHECK-NEXT: [[TMP19:%.*]] = bitcast double* [[TMP14]] to <4 x double>*			; CHECK-NEXT: [[TMP19:%.*]] = bitcast double* [[TMP14]] to <4 x double>*
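
In the vector-reverse-mask4.ll changes above, two chained constant-indexed GEPs off the same base ([[TMP2]] and [[TMP10]]) are replaced by a single GEP with index -7. A small sketch of the arithmetic behind that merge follows. The names `kDoubleSize`, `chainedByteOffset`, `mergedByteOffset`, and `checkMerge` are hypothetical, and an 8-byte `double` is assumed; this is an illustration of the offset equality only, not the InstCombine implementation.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: double occupies 8 bytes, as on the AArch64 target of this test.
constexpr int64_t kDoubleSize = 8;

// Left-hand side of the diff: two constant GEPs in sequence.
//   [[TMP5]] = getelementptr inbounds double, double* [[TMP2]], i64 -4
//   [[TMP6]] = getelementptr inbounds double, double* [[TMP5]], i64 -3
inline int64_t chainedByteOffset() {
  return (-4) * kDoubleSize + (-3) * kDoubleSize;
}

// Right-hand side of the diff: one GEP with the summed index.
//   [[TMP6]] = getelementptr inbounds double, double* [[TMP2]], i64 -7
inline int64_t mergedByteOffset() {
  return (-7) * kDoubleSize;
}

// The merged GEP addresses the same byte as the chained pair.
inline void checkMerge() {
  assert(chainedByteOffset() == mergedByteOffset());
}
```

The same reasoning accounts for the interleaved-accesses.ll change, where a +2 index followed by a -2 index cancels and the bitcast can use [[NEXT_GEP]] directly.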

llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll

	; CHECK-NEXT: [[TMP0:%.*]] = mul i64 [[INDEX]], 3			; CHECK-NEXT: [[TMP0:%.*]] = mul i64 [[INDEX]], 3
	; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i32, i32* [[A:%.*]], i64 [[TMP0]]			; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i32, i32* [[A:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[NEXT_GEP]] to <12 x i32>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[NEXT_GEP]] to <12 x i32>*
	; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <12 x i32>, <12 x i32>* [[TMP1]], align 4			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <12 x i32>, <12 x i32>* [[TMP1]], align 4
	; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <12 x i32> [[WIDE_VEC]], <12 x i32> poison, <4 x i32> <i32 0, i32 3, i32 6, i32 9>			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <12 x i32> [[WIDE_VEC]], <12 x i32> poison, <4 x i32> <i32 0, i32 3, i32 6, i32 9>
	; CHECK-NEXT: [[STRIDED_VEC2:%.*]] = shufflevector <12 x i32> [[WIDE_VEC]], <12 x i32> poison, <4 x i32> <i32 1, i32 4, i32 7, i32 10>			; CHECK-NEXT: [[STRIDED_VEC2:%.*]] = shufflevector <12 x i32> [[WIDE_VEC]], <12 x i32> poison, <4 x i32> <i32 1, i32 4, i32 7, i32 10>
	; CHECK-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <12 x i32> [[WIDE_VEC]], <12 x i32> poison, <4 x i32> <i32 2, i32 5, i32 8, i32 11>			; CHECK-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <12 x i32> [[WIDE_VEC]], <12 x i32> poison, <4 x i32> <i32 2, i32 5, i32 8, i32 11>
	; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i32> [[STRIDED_VEC]], [[VEC_IND]]			; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i32> [[STRIDED_VEC]], [[VEC_IND]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[NEXT_GEP]], i64 2
	; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[STRIDED_VEC2]], [[VEC_IND]]			; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[STRIDED_VEC2]], [[VEC_IND]]
	; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[STRIDED_VEC3]], [[VEC_IND]]			; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[STRIDED_VEC3]], [[VEC_IND]]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP3]], i64 -2			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[NEXT_GEP]] to <12 x i32>*
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <12 x i32>*
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP9]], <12 x i32> <i32 0, i32 4, i32 8, i32 1, i32 5, i32 9, i32 2, i32 6, i32 10, i32 3, i32 7, i32 11>			; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP9]], <12 x i32> <i32 0, i32 4, i32 8, i32 1, i32 5, i32 9, i32 2, i32 6, i32 10, i32 3, i32 7, i32 11>
	; CHECK-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[TMP7]], align 4			; CHECK-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[TMP7]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>
	; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024			; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
	; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]