This is an archive of the discontinued LLVM Phabricator instance.

Differential D125845

[InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back
Closed, Public

Authored by huangjd on May 17 2022, 6:53 PM.

Details

Reviewers

davidxl
Carrot
nikic
spatel
reames
aeubanks

Commits

rG6c767cef5a1d: [InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the…

Summary

Canonicalize GEP of GEP by swapping a GEP whose trailing indices are constant to the back (and a GEP with all-constant indices behind that), which allows more constant-index GEP merging to happen. The swap is skipped if it would violate use-def relations or pessimize LICM.

For a constant-indexed GEP of GEP, if the two cannot be merged directly, they will be cast to i8* and merged.
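The swap condition described in the summary can be sketched outside of LLVM as a small predicate. This is only an illustrative model: `Indices`, `allConstant`, `lastIsConstant` and `shouldSwapToBack` are hypothetical names, not LLVM APIs, and the use-def and LICM exceptions are left out.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for a GEP's index list (not the LLVM API):
// a constant index holds its value, a variable index is std::nullopt.
using Indices = std::vector<std::optional<long>>;

// True if every index in the list is a constant.
bool allConstant(const Indices &Idx) {
  for (const std::optional<long> &I : Idx)
    if (!I.has_value())
      return false;
  return true;
}

// True if the last index is a constant (a "constant suffix").
bool lastIsConstant(const Indices &Idx) {
  assert(!Idx.empty() && "a GEP has at least one index");
  return Idx.back().has_value();
}

// Decide whether `gep (gep p, Src...), Outer...` should be rewritten so that
// the Src indices end up in the outer (back) GEP: either Src ends in a
// constant and Outer does not, or Src is all-constant and Outer is not.
bool shouldSwapToBack(const Indices &Src, const Indices &Outer) {
  return (lastIsConstant(Src) && !lastIsConstant(Outer)) ||
         (allConstant(Src) && !allConstant(Outer));
}
```

Under this model, `shouldSwapToBack(Indices{0L, 4L}, Indices{std::nullopt})` holds, while two all-constant or two all-variable GEPs are left in place.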

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

huangjd created this revision. May 17 2022, 6:53 PM

Herald added a project: Restricted Project. May 17 2022, 6:53 PM

Herald added a subscriber: hiraditya.

huangjd requested review of this revision. May 17 2022, 6:53 PM

Herald added a project: Restricted Project. May 17 2022, 6:53 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B165007: Diff 430218. May 17 2022, 6:53 PM

huangjd added a parent revision: D125438: [InstCombine] NEW Baseline tests for InstCombine optimization to merge GEP instructions with constant indices. May 17 2022, 6:54 PM

nikic added a reviewer: aeubanks. May 18 2022, 5:56 AM

nikic added inline comments.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1968	This is an interesting case. An alternative would be to skip this code if GEP has all constant indices, i.e. to say that `gep (gep p, x), C` is always the canonical form, even if it were possible to move `(gep p, C)` out of the loop. This makes some sense in that `gep p, C` is typically free (folded into addressing mode). I'm not sure whether that would really be better though.
2001	Doing this is not necessary: InstCombine is not supposed to gracefully deal with invalid IR. I've removed the offending test in https://github.com/llvm/llvm-project/commit/128da94d38242c28e6bf23ad025e0cb2d6ce9e4f.
2010	You probably don't need the explicit isOpaquePointerTy() check here, that should be covered by the type equality check.
2016	I would recommend to initially only keep the `Src->hasAllConstantIndices() && !GEP.hasAllConstantIndices()` case. The profitability of this transform for the case where both GEPs are non-constant is not really clear.
2022	You can pass InBounds directly to CreateGEP, as the last argument.
2071	This should be a separate patch, it's an independent transform.

Removed check for malformed IR (PR13621)
Removed unnecessary type check for canonicalization swap

Move converting GEP of constant index to i8 into next patch

Harbormaster completed remote builds in B165216: Diff 430512. May 18 2022, 3:02 PM

huangjd marked 4 inline comments as done. May 18 2022, 3:06 PM

huangjd added inline comments.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1968	LICM is always beneficial as it guarantees the reduction of runtime instruction count, but canonicalization can't guarantee an optimization

huangjd added inline comments. May 18 2022, 3:06 PM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2016	If Src is not all constant index and GEP is all constant index, merging them can reduce one GEP. I have observed that in the backend, inst selection does not always generate the most simplified output for multiple GEP instructions, so it's better to reduce the IR count

huangjd retitled this revision from [InstCombine] Canonicalize GEP of GEP and merging GEP with constant indices to [InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back. May 18 2022, 4:05 PM

Herald added a subscriber: arphaman. May 18 2022, 4:05 PM

huangjd added a child revision: D125934: [InstCombine] Changing constant-indexed GEP of GEP to i8* for merging. May 18 2022, 4:06 PM

I think there's some git weirdness going on, could you apply this on top of main?

huangjd mentioned this in D126030: [InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back. May 19 2022, 4:10 PM

In D125845#3523984, @aeubanks wrote:

I think there's some git weirdness going on, could you apply this on top of main?

Fixed at D126030

huangjd abandoned this revision. May 19 2022, 4:11 PM

huangjd removed a child revision: D125934: [InstCombine] Changing constant-indexed GEP of GEP to i8* for merging.

huangjd reclaimed this revision. May 26 2022, 4:38 PM

Restored patch D125845

Harbormaster completed remote builds in B166559: Diff 432417. May 26 2022, 5:29 PM

huangjd marked an inline comment as done. May 31 2022, 10:53 AM

Any more comments?

huangjd added a child revision: D125934: [InstCombine] Changing constant-indexed GEP of GEP to i8* for merging. Jun 7 2022, 12:00 PM

nikic added inline comments. Jun 8 2022, 4:03 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1968	The main thing I'd be concerned about here is that the loop invariance canonicalization is only performed if cached LoopInfo is available. This means that depending on whether a particular InstCombine run has LoopInfo available, we will switch back and forth between two different orders.
2006	Unfortunately, this is not sufficient to preserve inbounds. If the indices have different signs, then swapping them may make the GEP non-inbounds even if both original GEPs were inbounds. `gep inbounds (gep inbounds p, -1), X` -> `gep inbounds (gep inbounds p, X), -1` is not generally valid.
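The sign problem above can be made concrete with a toy model. This is a sketch under simplifying assumptions, not LLVM's full inbounds semantics: a pointer is treated as a byte offset into a single allocated object, and `chainStaysInBounds` is a hypothetical helper that checks every intermediate result stays within the object (one past the end allowed).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model (assumption): a pointer is a byte offset into one allocated
// object of `Size` bytes. An inbounds step is acceptable only if both the
// pointer it starts from and its result lie in [0, Size].
bool chainStaysInBounds(int64_t Size, int64_t Start,
                        const std::vector<int64_t> &Steps) {
  assert(Size >= 0 && "object size cannot be negative");
  int64_t Cur = Start;
  if (Cur < 0 || Cur > Size)
    return false;
  for (int64_t Step : Steps) {
    Cur += Step;
    if (Cur < 0 || Cur > Size)
      return false; // an inbounds GEP would yield poison at this step
  }
  return true;
}
```

With a 2-byte object and a pointer at offset 1, the steps `{-1, 2}` stay in bounds, but the swapped order `{2, -1}` passes through offset 3 and does not. When both steps are non-negative, either order gives the same verdict in this model.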

huangjd added inline comments. Jun 8 2022, 10:50 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1968	Since these two transforms contradict each other, one of them has to take precedence when both are available. If LoopInfo is available during any pass and it swaps out the inner GEP, then a later InstCombine pass shouldn't be able to swap it back. Perhaps I should add a check that canonicalization only takes place if both GEPs are in the same BB.
2006	I have seen the same inbounds check used in another place (line 2071), so that may be problematic as well? I think we could add another check in isMergedGEPInBounds to require both GEP offsets are known to be same sign?

huangjd marked an inline comment as not done. Jun 8 2022, 2:24 PM

huangjd added inline comments. Jun 8 2022, 3:39 PM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2006	Is the impact of changing inbounds GEP to non-inbounds significant to other optimizations?

GEP are no longer inbounds after swapping

Harbormaster completed remote builds in B170103: Diff 437324. Jun 15 2022, 3:53 PM

This looks fine to me, but maybe @spatel can also take a look.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1950	-> `ShouldCanonicalizeSwap`
1966	Omit braces for single-statement if.

I don't know enough to visualize the GEP corner-cases or the interaction with LICM, but I commented on general improvements to the patch.

Have you run any benchmarks to confirm there are improvements (no regressions)?

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1946–1950	This code structure and the comments are confusing. I think it would be better to rearrange this like:

  bool LoopInvariantGEP = false;
  bool LoopInvariantSrc = false;
  if (LI && ...) {
    LoopInvariantGEP = L->isLoopInvariant(GEP.getOperand(1));
    LoopInvariantSrc = L->isLoopInvariant(Src->getOperand(1));
  }
  if (LoopInvariantGEP && !LoopInvariantSrc) {
    // do the existing transform
  }
  if (!(!LoopInvariantGEP && LoopInvariantSrc) && ...) {
    // do the new transform
  }

I think that matches the logic in this patch, but don't we need a set of tests with loops to make sure that does what you expect? I don't see any tests with loops.
llvm/test/Transforms/InstCombine/gep-canonicalize-constant-indices.ll
32	I don't know what this test and the next one are supposed to show. %1 is dead code, so it gets eliminated before anything else might have happened.
48–49	typo: swapping
97–98	This test doesn't add much. AFAIK every transform in instcombine does RAUW for the final (root) value.

huangjd added inline comments. Jun 21 2022, 2:20 PM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1946–1950	I think this doesn't simplify the logic, because LoopInvariant* is only meaningful within the scope of LoopInfo analysis. Also the existing transform (LICM) requires the Loop object from LI, so moving it outside of the first condition statement makes the code uglier.
1946–1950	It seems that LI is null when I test it. Is there an opt flag I need to enable?

huangjd added inline comments. Jun 21 2022, 2:56 PM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1946–1950	I confirmed this case is already covered by gep-combine-loop-invariant.ll. Taking out the ShouldCanonicalizeSwap check causes an infinite loop because the two transformations try to undo each other. In this case, is a separate test case still needed?
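The ping-pong between the two rewrites can be illustrated with a small standalone model. It is a sketch only: `Index`, `GEPPair`, `licmReassociate`, `canonicalizeSwap` and `rewritesUntilFixpoint` are hypothetical names, each GEP is reduced to two flags for its single index, and the guard mirrors the `ShouldCanonicalizeSwap` condition quoted in the patch.

```cpp
#include <cassert>
#include <utility>

// Hypothetical model of one GEP index: is it a constant, and is it
// loop-invariant? (A constant is always loop-invariant.)
struct Index {
  bool IsConstant;
  bool IsLoopInvariant;
};

// gep (gep p, Inner), Outer
struct GEPPair {
  Index Inner;
  Index Outer;
};

// LICM-style reassociation: move the loop-invariant index to the front.
bool licmReassociate(GEPPair &P) {
  if (P.Outer.IsLoopInvariant && !P.Inner.IsLoopInvariant) {
    std::swap(P.Inner, P.Outer);
    return true;
  }
  return false;
}

// Canonicalization: move the constant index to the back, unless the guard
// sees that the LICM rule would immediately undo the swap.
bool canonicalizeSwap(GEPPair &P, bool UseGuard) {
  if (UseGuard && P.Inner.IsLoopInvariant && !P.Outer.IsLoopInvariant)
    return false;
  if (P.Inner.IsConstant && !P.Outer.IsConstant) {
    std::swap(P.Inner, P.Outer);
    return true;
  }
  return false;
}

// Returns how many rewrites fire before a fixpoint, or -1 if the budget is
// exhausted (the two rules keep undoing each other).
int rewritesUntilFixpoint(GEPPair P, bool UseGuard, int MaxRewrites = 100) {
  assert(MaxRewrites > 0 && "need a positive rewrite budget");
  int Count = 0;
  while (Count < MaxRewrites) {
    if (!licmReassociate(P) && !canonicalizeSwap(P, UseGuard))
      return Count;
    ++Count;
  }
  return -1;
}
```

Starting from a constant inner index and a loop-variant outer index, the unguarded model never settles, while the guarded one stops without rewriting anything.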

Improved comment, fixed typo in test cases

Harbormaster completed remote builds in B171199: Diff 438846. Jun 21 2022, 2:59 PM

It looks like this is now the diff to the previous version, not the full patch.

huangjd updated this revision to Diff 439477. Jun 23 2022, 11:28 AM

This comment was removed by huangjd.

Harbormaster completed remote builds in B171658: Diff 439477. Jun 23 2022, 11:29 AM

Update diff to include previous commit

Harbormaster completed remote builds in B171661: Diff 439480. Jun 23 2022, 12:13 PM

Any more comments?

My comments about the tests were not addressed - there's a test of extra uses with no value, but no tests with loops?

llvm/test/Transforms/InstCombine/gep-canonicalize-constant-indices.ll
32	I'm more confused now. The test had 1 GEP, but now it has 2 - how is that better?

huangjd removed a child revision: D125934: [InstCombine] Changing constant-indexed GEP of GEP to i8* for merging. Jun 30 2022, 10:34 AM

huangjd mentioned this in D125934: [InstCombine] Changing constant-indexed GEP of GEP to i8* for merging. Jun 30 2022, 2:25 PM

huangjd added a parent revision: D125934: [InstCombine] Changing constant-indexed GEP of GEP to i8* for merging. Jun 30 2022, 2:29 PM

huangjd removed a parent revision: D125438: [InstCombine] NEW Baseline tests for InstCombine optimization to merge GEP instructions with constant indices.

Added test for interaction with LICM

Harbormaster completed remote builds in B174810: Diff 443857. Jul 12 2022, 3:01 AM

LGTM

This revision is now accepted and ready to land. Jul 12 2022, 6:20 AM

huangjd mentioned this in D129734: [InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the front. Jul 14 2022, 12:07 AM

merge with main

Herald added a subscriber: pcwang-thead. Oct 20 2022, 10:38 AM

Harbormaster completed remote builds in B193276: Diff 469284. Oct 20 2022, 10:38 AM

This revision was landed with ongoing or failed builds. Oct 20 2022, 10:41 AM

Closed by commit rG6c767cef5a1d: [InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the… (authored by huangjd).

This revision was automatically updated to reflect the committed changes.

huangjd added a commit: rG6c767cef5a1d: [InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the….

Have you run any benchmarks to confirm there are improvements (no regressions)?

Did you ever measure the performance? I have reports of this making the leela benchmark in SPEC2017 worse under LTO.

syzaara added a subscriber: syzaara. Oct 25 2022, 11:04 AM

We also observe some regressions on internal benchmarks from this patch. We haven't yet analyzed why, but I will add details when we have something.

pzheng added a subscriber: pzheng. Nov 18 2022, 1:50 PM

uabelho added a subscriber: uabelho. Nov 22 2022, 5:24 AM

We also see regressions with this patch in internal benchmarks and at least in one case it seems like the lack of "inbounds" on the new GEPs is involved.
If I change so the GEPs are created like

-      Value *NewSrc = Builder.CreateGEP(
+      Value *NewSrc = Builder.CreateInBoundsGEP(
           GEP.getSourceElementType(), Src->getOperand(0),
           SmallVector<Value *>(GEP.indices()), Src->getName());
-      GetElementPtrInst *NewGEP = GetElementPtrInst::Create(
+      GetElementPtrInst *NewGEP = GetElementPtrInst::CreateInBounds(

at least some regressions disappear.
I do see the comment saying this can't be done though...

// Cannot guarantee inbounds after swapping because the non-const GEP can
// have arbitrary sign.

Could someone reproduce the test case (and benchmarking results) showing regression?

bjope added a subscriber: bjope. Nov 28 2022, 10:17 AM

bjope added inline comments.

llvm/test/Transforms/InstCombine/gep-canonicalize-constant-indices.ll
32	I'm more confused now. The test had 1 GEP, but now it has 2 - how is that better? This question is still not answered afaict. We now get two non-inbounds GEP:s instead of one inbounds GEP. Given that there are no other GEP:s etc. that can be simplified (as in this test case), that doesn't look like an improvement? And I guess losing the inbounds property can be bad in general when doing these canonicalizations. Do you have any examples where you see a performance improvement when rewriting inbounds GEP:s into non-inbounds GEP:s? Given the number of complaints about regressions, it would be nice to hear what kinds of benchmarks were done, and whether you tried to limit it to cases where the instruction count didn't increase, or where it would turn X inbounds GEP:s into X or more non-inbounds GEP:s.

Was there any positive performance data for this patch in the first place (the description doesn't mention any)? If not, and people are reporting regressions, we should revert.

FWIW, the previous patch in this area also got strong post-commit negative feedback: D106450
I think any canonicalization that leads to loss of inbounds should not be performed.

davidxl added inline comments. Nov 28 2022, 11:04 AM

llvm/test/Transforms/InstCombine/gep-canonicalize-constant-indices.ll
32	The changes in this test seem to be irrelevant. The original test was probably buggy (with dead code), and the change here just fixed the test (not sure if the main patch will affect the output though).

Got a benchmark downstream where we no longer get SLP vectorization after the canonicalization. But the test is rather large and complicated, so I do not have anything that shows the full pipeline including SLP (at least not yet).

But before instcombine parts of the IR kind of looks like this:

%struct.S = type { i32, i32, i16, i16, [4 x i16] }

define i32 @foo(ptr %p1, ptr %p2) {
entry:
  br label %for.cond

for.cond:
  %i.0 = phi i16 [ 0, %entry ], [ %inc, %for.body ]
  %cmp = icmp ult i16 %i.0, 4
  br i1 %cmp, label %for.body, label %for.end

for.body:
  %q1 = getelementptr inbounds %struct.S, ptr %p2, i32 0, i32 4
  %q2 = getelementptr inbounds [4 x i16], ptr %q1, i32 0, i16 %i.0
  %t1 = load i16, ptr %q2, align 1
  store i16 %t1, ptr %p1
  %inc = add i16 %i.0, 1
  br label %for.cond

for.end:
  ret i32 7
}

If just running opt -S -passes=instcombine on that we used to get

define i32 @foo(ptr %p1, ptr %p2) {
entry:
  br label %for.cond

for.cond:                                         ; preds = %for.body, %entry
  %i.0 = phi i16 [ 0, %entry ], [ %inc, %for.body ]
  %cmp = icmp ult i16 %i.0, 4
  br i1 %cmp, label %for.body, label %for.end

for.body:                                         ; preds = %for.cond
  %0 = sext i16 %i.0 to i64
  %q2 = getelementptr inbounds %struct.S, ptr %p2, i64 0, i32 4, i64 %0
  %t1 = load i16, ptr %q2, align 1
  store i16 %t1, ptr %p1, align 2
  %inc = add i16 %i.0, 1
  br label %for.cond

for.end:                                          ; preds = %for.cond
  ret i32 7
}

but after this patch we get two non-inbound GEP:s instead of one single GEP:

define i32 @foo(ptr %p1, ptr %p2) {
entry:
  br label %for.cond

for.cond:                                         ; preds = %for.body, %entry
  %i.0 = phi i16 [ 0, %entry ], [ %inc, %for.body ]
  %cmp = icmp ult i16 %i.0, 4
  br i1 %cmp, label %for.body, label %for.end

for.body:                                         ; preds = %for.cond
  %0 = sext i16 %i.0 to i64
  %q11 = getelementptr [4 x i16], ptr %p2, i64 0, i64 %0
  %q2 = getelementptr %struct.S, ptr %q11, i64 0, i32 4
  %t1 = load i16, ptr %q2, align 1
  store i16 %t1, ptr %p1, align 2
  %inc = add i16 %i.0, 1
  br label %for.cond

for.end:                                          ; preds = %for.cond
  ret i32 7
}

So I suspect that the swapping (and replacing inbound GEP:s with non-inbound GEP:s) has prevented merging of the GEP:s?

Maybe the merging of the GEP:s isn't important here. But I think the result looks a bit weird anyhow. In the input IR the first GEP only has loop-invariant operands (so %q1 is loop-invariant in that sense, even if LoopInfo::isInvariant would say that it is modified inside the loop), but after the "swapping" the first GEP depends on %0 (or rather %i.0), so now it isn't loop-invariant.
Isn't the idea that it should try to put the loop-invariant part first, to hopefully let LICM hoist that outside the loop? Although when I read the code comments it seems like we canonicalize in the opposite order from what would suit LICM, so there is some kind of heuristic to avoid canonicalizing in certain situations. So I'm not quite sure if this pattern really is supposed to be "swapped" or not. Or maybe even merged. Both merging and not swapping would allow keeping the inbounds. Not swapping could allow LICM to hoist one of the GEP:s out of the loop. But the current solution both prevents hoisting and removes the inbounds attributes. So I do not see how the swapping is beneficial in this case (unless we are canonicalizing for some specific target with an addressing mode that perhaps would allow folding of the constant offset in the load).
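The point about loop invariance can be sketched with plain offset arithmetic. This is only an illustration: the C++ struct `S` is assumed to mirror `%struct.S` from the test above, and `TwoSteps`, `originalOrder`, `swappedOrder` and `total` are hypothetical helpers that track the byte offset contributed by each of the two GEPs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed to mirror %struct.S = type { i32, i32, i16, i16, [4 x i16] }.
struct S {
  int32_t A;
  int32_t B;
  int16_t C;
  int16_t D;
  int16_t E[4];
};

// Byte offsets added by the first (inner) and second (outer) GEP.
struct TwoSteps {
  std::ptrdiff_t First;
  std::ptrdiff_t Second;
};

// Input IR order: constant field offset first, variable element second.
TwoSteps originalOrder(int I) {
  assert(I >= 0 && I < 4 && "index into the [4 x i16] field");
  return {static_cast<std::ptrdiff_t>(offsetof(S, E)),
          static_cast<std::ptrdiff_t>(I * sizeof(int16_t))};
}

// Order after the swap: variable element first, constant offset second.
TwoSteps swappedOrder(int I) {
  TwoSteps O = originalOrder(I);
  return {O.Second, O.First};
}

// Final byte offset from the base pointer.
std::ptrdiff_t total(TwoSteps T) { return T.First + T.Second; }
```

Both orders reach the same final offset, but only in the original order is the first step independent of the loop counter, which is the property a hoisting pass would need.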

Thanks for sharing the test case @bjope! I agree that the generated IR with this change is strictly worse for that case; I think it would probably be good to revert the patch until this can be properly addressed. llvm/test/Transforms/PhaseOrdering/single-iteration-loop-sroa.ll also looks worse, as we lose the inbounds in the loop and the only benefit is removing an instruction outside the loop.

Note that there's also a dedicated pass to separate constant GEP offsets, which can be run as part of the backend pipeline: llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp. It was enabled on AArch64 for a while but had to be disabled because it was also causing performance regressions.

huangjd mentioned this in D138950: [InstCombine] Revert D125845. Nov 29 2022, 1:50 PM

D138950 to revert this patch. Looks like reverting it will not interfere with D137212

huangjd added a reverting change: rGbe4b1dd35bd1: [InstCombine] Revert D125845. Nov 29 2022, 2:04 PM

I would need some clarification on the inbounds keyword. When an inbounds GEP of GEP is being transformed, what kinds of transformations and conditions keep the new GEP inbounds? For example, GEP inbounds (GEP inbounds P a) b is equivalent to GEP inbounds (GEP inbounds P b) a if and only if a and b have the same sign. Are there other algebraically valid transformations? This actually does affect D137212, since it is swapping constant-indexed GEPs. Maybe I misunderstood what inbounds implies? I noticed that an arbitrary pointer arithmetic expression in C generates an inbounds GEP even when the pointer is clearly not pointing to any allocated object.

Revision Contents

Path	Size
llvm/
  lib/Transforms/InstCombine/
    InstructionCombining.cpp	45 lines
  test/Transforms/
    InstCombine/
      gep-canonicalize-constant-indices.ll	112 lines
      gep-merge-constant-indices.ll	45 lines
      shift.ll	4 lines
    LoopVectorize/
      AArch64/
        vector-reverse-mask4.ll	6 lines
      interleaved-accesses.ll	4 lines
    PhaseOrdering/
      single-iteration-loop-sroa.ll	4 lines

Diff 469287

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

Show First 20 Lines • Show All 1,937 Lines • ▼ Show 20 Lines
Instruction *InstCombinerImpl::visitGEPOfGEP(GetElementPtrInst &GEP,		Instruction *InstCombinerImpl::visitGEPOfGEP(GetElementPtrInst &GEP,
GEPOperator *Src) {		GEPOperator *Src) {
// Combine Indices - If the source pointer to this getelementptr instruction		// Combine Indices - If the source pointer to this getelementptr instruction
// is a getelementptr instruction with matching element type, combine the		// is a getelementptr instruction with matching element type, combine the
// indices of the two getelementptr instructions into a single instruction.		// indices of the two getelementptr instructions into a single instruction.
if (!shouldMergeGEPs(cast<GEPOperator>(&GEP), Src))		if (!shouldMergeGEPs(cast<GEPOperator>(&GEP), Src))
return nullptr;		return nullptr;

		// LICM moves a GEP with constant indices to the front, while canonicalization
		// swaps it to the back of a non-constant GEP. If both transformations can be
		// applied, LICM takes priority because it generally provides greater
		// optimization by reducing instruction count in the loop body, but performing
		// canonicalization swapping first negates the LICM opportunity while it does
		nikicUnsubmitted Not Done Reply Inline Actions -> `ShouldCanonicalizeSwap` nikic: -> `ShouldCanonicalizeSwap`
		spatelUnsubmitted Not Done Reply Inline Actions This code structure and the comments are confusing. I think it would be better to rearrange this like: bool LoopInvariantGEP = false; bool LoopInvariantSrc = false; if (LI && ...) { LoopInvariantGEP = L->isLoopInvariant(GEP.getOperand(1)); LoopInvariantSrc = L->isLoopInvariant(Src->getOperand(1)); } if (LoopInvariantGEP && !LoopInvariantSrc) { // do the existing transform } if (!(!LoopInvariantGEP && LoopInvariantSrc) && ...) { // do the new transform } I think that matches the logic in this patch, but don't we need a set of tests with loops to make sure that does what you expect? I don't see any tests with loops. spatel: This code structure and the comments are confusing. I think it would be better to rearrange…
		huangjdAuthorUnsubmitted Done Reply Inline Actions I think this doesn't simplify the logic, because LoopInvariant* is only meaningful within the scope of LoopInfo analysis. Also the existing transform (LICM) requires the Loop object from LI, so moving it outside of the first condition statement makes the code uglier. huangjd: I think this doesn't simplify the logic, because LoopInvariant* is only meaningful within the…
		huangjdAuthorUnsubmitted Done Reply Inline Actions It seems that LI is null when I test it. Is there an opt flag I need to enable? huangjd: It seems that LI is null when I test it. Is there an opt flag I need to enable?
		huangjdAuthorUnsubmitted Done Reply Inline Actions I confirmed this case is already covered by gep-combine-loop-invariant.ll. Taking out the ShouldCanonicalizeSwap check will cause infinite loop because two transformations trying to undo each other. In this case is a separate test case still needed? huangjd: I confirmed this case is already covered by gep-combine-loop-invariant.ll. Taking out the…
		// not necessarily reduce instruction count.
		bool ShouldCanonicalizeSwap = true;

if (Src->getResultElementType() == GEP.getSourceElementType() &&		if (Src->getResultElementType() == GEP.getSourceElementType() &&
Src->getNumOperands() == 2 && GEP.getNumOperands() == 2 &&		Src->getNumOperands() == 2 && GEP.getNumOperands() == 2 &&
Src->hasOneUse()) {		Src->hasOneUse()) {
Value *GO1 = GEP.getOperand(1);		Value *GO1 = GEP.getOperand(1);
Value *SO1 = Src->getOperand(1);		Value *SO1 = Src->getOperand(1);

if (LI) {		if (LI) {
// Try to reassociate loop invariant GEP chains to enable LICM.		// Try to reassociate loop invariant GEP chains to enable LICM.
if (Loop *L = LI->getLoopFor(GEP.getParent())) {		if (Loop *L = LI->getLoopFor(GEP.getParent())) {
		// If SO1 is invariant and GO1 is variant, they should not be swapped by
		// canonicalization even if it can be applied, otherwise it triggers
		// LICM swapping in the next iteration, causing an infinite loop.
		if (!L->isLoopInvariant(GO1) && L->isLoopInvariant(SO1))
		nikicUnsubmitted Not Done Reply Inline Actions Omit braces for single-statement if. nikic: Omit braces for single-statement if.
		ShouldCanonicalizeSwap = false;

		nikicUnsubmitted Not Done Reply Inline Actions This is an interesting case. An alternative would be to skip this code if GEP has all constant indices, i.e. to say that `gep (gep p, x), C` is always the canonical form, even if it were possible to move `(gep p, C)` out of the loop. This makes some sense in that `gep p, C` is typically free (folded into addressing mode). I'm not sure whether that would really be better though. nikic: This is an interesting case. An alternative would be to skip this code if GEP has all constant…
		huangjdAuthorUnsubmitted Done Reply Inline Actions LICM is always beneficial as it guarantees the reduction of runtime instruction count, but canonicalization can't guarantee an optimization huangjd: LICM is always beneficial as it guarantees the reduction of runtime instruction count, but…
		nikicUnsubmitted Not Done Reply Inline Actions The main thing I'd be concerned about here is that the loop invariance canonicalization is only performed if cached LoopInfo is available. This means that depending on whether a particular InstCombine run has LoopInfo available, we will switch back and forth between two different orders. nikic: The main thing I'd be concerned about here is that the loop invariance canonicalization is only…
		huangjdAuthorUnsubmitted Done Reply Inline Actions Since these two transforms contradict each other, one of them has to take precedence when both are available. If loopInfo is available during any pass and it swaps out the inner GEP, then further Instcombine pass shouldn't be able to swap it back. Perhaps I should add a check that canonicalization only takes place if both GEP are on the same BB huangjd: Since these two transforms contradict each other, one of them has to take precedence when both…
// Reassociate the two GEPs if SO1 is variant in the loop and GO1 is		// Reassociate the two GEPs if SO1 is variant in the loop and GO1 is
// invariant: this breaks the dependence between GEPs and allows LICM		// invariant: this breaks the dependence between GEPs and allows LICM
// to hoist the invariant part out of the loop.		// to hoist the invariant part out of the loop.
if (L->isLoopInvariant(GO1) && !L->isLoopInvariant(SO1)) {		if (L->isLoopInvariant(GO1) && !L->isLoopInvariant(SO1)) {
// The swapped GEPs are inbounds if both original GEPs are inbounds		// The swapped GEPs are inbounds if both original GEPs are inbounds
// and the sign of the offsets is the same. For simplicity, only		// and the sign of the offsets is the same. For simplicity, only
// handle both offsets being non-negative.		// handle both offsets being non-negative.
bool IsInBounds = Src->isInBounds() && GEP.isInBounds() &&		bool IsInBounds = Src->isInBounds() && GEP.isInBounds() &&
isKnownNonNegative(SO1, DL, 0, &AC, &GEP, &DT) &&		isKnownNonNegative(SO1, DL, 0, &AC, &GEP, &DT) &&
isKnownNonNegative(GO1, DL, 0, &AC, &GEP, &DT);		isKnownNonNegative(GO1, DL, 0, &AC, &GEP, &DT);
// Put NewSrc at same location as %src.		// Put NewSrc at same location as %src.
Builder.SetInsertPoint(cast<Instruction>(Src));		Builder.SetInsertPoint(cast<Instruction>(Src));
Value *NewSrc = Builder.CreateGEP(GEP.getSourceElementType(),		Value *NewSrc = Builder.CreateGEP(GEP.getSourceElementType(),
Src->getPointerOperand(), GO1,		Src->getPointerOperand(), GO1,
Src->getName(), IsInBounds);		Src->getName(), IsInBounds);
GetElementPtrInst *NewGEP = GetElementPtrInst::Create(		GetElementPtrInst *NewGEP = GetElementPtrInst::Create(
GEP.getSourceElementType(), NewSrc, {SO1});		GEP.getSourceElementType(), NewSrc, {SO1});
NewGEP->setIsInBounds(IsInBounds);		NewGEP->setIsInBounds(IsInBounds);
return NewGEP;		return NewGEP;
}		}
}		}
}		}
}		}

// Note that if our source is a gep chain itself then we wait for that		// Canonicalize swapping. Swap GEP with constant index suffix to the back if
// chain to be resolved before we perform this transformation. This		// it doesn't violate def-use relations or contradict with loop invariant
// avoids us creating a TON of code in some cases.		// swap above. This allows more potential applications of constant-indexed GEP
if (auto *SrcGEP = dyn_cast<GEPOperator>(Src->getOperand(0)))		// optimizations below.
if (SrcGEP->getNumOperands() == 2 && shouldMergeGEPs(Src, SrcGEP))		if (ShouldCanonicalizeSwap && Src->hasOneUse() &&
return nullptr; // Wait until our source is folded to completion.		Src->getPointerOperandType() == GEP.getPointerOperandType() &&
		Src->getType()->isVectorTy() == GEP.getType()->isVectorTy() &&
		!isa<GlobalValue>(Src->getPointerOperand())) {
		// When swapping, GEP with all constant indices are more prioritized than
		nikicUnsubmitted Done Reply Inline Actions Doing this is not necessary: InstCombine is not supposed to gracefully deal with invalid IR. I've removed the offending test in https://github.com/llvm/llvm-project/commit/128da94d38242c28e6bf23ad025e0cb2d6ce9e4f. nikic: Doing this is not necessary: InstCombine is not supposed to gracefully deal with invalid IR.
		// GEP with only the last few indices (but not all) being constant because
		// it may be merged with GEP with all constant indices.
		if ((isa<ConstantInt>(*(Src->indices().end() - 1)) &&
		!isa<ConstantInt>(*(GEP.indices().end() - 1))) \|\|
		(Src->hasAllConstantIndices() && !GEP.hasAllConstantIndices())) {
		nikicUnsubmitted Not Done Reply Inline Actions Unfortunately, this is not sufficient to preserve inbounds. If the indices have different sign, then swapping them may make the GEP non-inbounds even if both original GEPs were inbounds. `gep inbounds (gep inbounds p, -1), X ->` gep inbounds (gep inbounds p, X), -1` is not generally valid. nikic: Unfortunately, this is not sufficient to preserve inbounds. If the indices have different sign…
		huangjdAuthorUnsubmitted Not Done Reply Inline Actions I have seen the same inbounds check used in another place (line 2071), so that may be problematic as well? I think we could add another check in isMergedGEPInBounds to require both GEP offsets are known to be same sign? huangjd: I have seen the same inbounds check used in another place (line 2071), so that may be…
		huangjdAuthorUnsubmitted Done Reply Inline Actions Is the impact of changing inbounds GEP to non-inbounds significant to other optimizations? huangjd: Is the impact of changing inbounds GEP to non-inbounds significant to other optimizations?
		// Cannot guarantee inbounds after swapping because the non-const GEP can
		// have arbitrary sign.
		Value *NewSrc = Builder.CreateGEP(
		GEP.getSourceElementType(), Src->getOperand(0),
		nikicUnsubmitted Done Reply Inline Actions You probably don't need the explicit isOpaquePointerTy() check here, that should be covered by the type equality check. nikic: You probably don't need the explicit isOpaquePointerTy() check here, that should be covered by…
		SmallVector<Value *>(GEP.indices()), Src->getName());
		GetElementPtrInst *NewGEP = GetElementPtrInst::Create(
		Src->getSourceElementType(), NewSrc,
		SmallVector<Value *>(Src->indices()), GEP.getName());
		return NewGEP;
		}
		nikic (Done): I would recommend initially keeping only the `Src->hasAllConstantIndices() && !GEP.hasAllConstantIndices()` case. The profitability of this transform for the case where both GEPs are non-constant is not really clear.
		huangjd (Author, Done): If Src does not have all-constant indices and GEP does, merging them removes one GEP. I have observed that in the backend, instruction selection does not always generate the most simplified output for multiple GEP instructions, so it's better to reduce the IR instruction count.
		}

// For constant GEPs, use a more general offset-based folding approach.		// For constant GEPs, use a more general offset-based folding approach.
// Only do this for opaque pointers, as the result element type may change.		// Only do this for opaque pointers, as the result element type may change.
Type *PtrTy = Src->getType()->getScalarType();		Type *PtrTy = Src->getType()->getScalarType();
if (PtrTy->isOpaquePointerTy() && GEP.hasAllConstantIndices() &&		if (PtrTy->isOpaquePointerTy() && GEP.hasAllConstantIndices() &&
		nikic (Done): You can pass InBounds directly to CreateGEP, as the last argument.
(Src->hasOneUse() || Src->hasAllConstantIndices())) {		(Src->hasOneUse() || Src->hasAllConstantIndices())) {
// Split Src into a variable part and a constant suffix.		// Split Src into a variable part and a constant suffix.
gep_type_iterator GTI = gep_type_begin(*Src);		gep_type_iterator GTI = gep_type_begin(*Src);
Type *BaseType = GTI.getIndexedType();		Type *BaseType = GTI.getIndexedType();
bool IsFirstType = true;		bool IsFirstType = true;
unsigned NumVarIndices = 0;		unsigned NumVarIndices = 0;
for (auto Pair : enumerate(Src->indices())) {		for (auto Pair : enumerate(Src->indices())) {
if (!isa<ConstantInt>(Pair.value())) {		if (!isa<ConstantInt>(Pair.value())) {
[32 lines not shown]	if (!Offset.isZero() || (!IsFirstType && !ConstIndices[0].isZero())) {
// convert them to a GEP of i8.		// convert them to a GEP of i8.
if (Src->hasAllConstantIndices())		if (Src->hasAllConstantIndices())
return isMergedGEPInBounds(Src, cast<GEPOperator>(&GEP))		return isMergedGEPInBounds(Src, cast<GEPOperator>(&GEP))
? GetElementPtrInst::CreateInBounds(		? GetElementPtrInst::CreateInBounds(
Builder.getInt8Ty(), Src->getOperand(0),		Builder.getInt8Ty(), Src->getOperand(0),
Builder.getInt(OffsetOld), GEP.getName())		Builder.getInt(OffsetOld), GEP.getName())
: GetElementPtrInst::Create(		: GetElementPtrInst::Create(
Builder.getInt8Ty(), Src->getOperand(0),		Builder.getInt8Ty(), Src->getOperand(0),
Builder.getInt(OffsetOld), GEP.getName());		Builder.getInt(OffsetOld), GEP.getName());
		nikic (Done): This should be a separate patch, it's an independent transform.
return nullptr;		return nullptr;
}		}

bool IsInBounds = isMergedGEPInBounds(Src, cast<GEPOperator>(&GEP));		bool IsInBounds = isMergedGEPInBounds(Src, cast<GEPOperator>(&GEP));
SmallVector<Value *> Indices;		SmallVector<Value *> Indices;
append_range(Indices, drop_end(Src->indices(),		append_range(Indices, drop_end(Src->indices(),
Src->getNumIndices() - NumVarIndices));		Src->getNumIndices() - NumVarIndices));
for (const APInt &Idx : drop_begin(ConstIndices, !IsFirstType)) {		for (const APInt &Idx : drop_begin(ConstIndices, !IsFirstType)) {
[2,643 lines not shown]
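
A small illustration of nikic's inline remark that swapping two inbounds GEPs is not generally valid when the indices differ in sign. This is a toy C++ model, not LLVM code: the function name `chainStaysInBounds` and the offset/size representation are assumptions made for this sketch. It treats `inbounds` as requiring the starting offset and every intermediate result to lie within [0, size] of the underlying object (one past the end is allowed).

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Toy model (hypothetical, not an LLVM API): offsets are counted in elements
// from the start of an object that holds `size` elements. A chain of
// "inbounds" steps is valid only if the starting offset and every
// intermediate offset stay within [0, size].
bool chainStaysInBounds(int64_t start, int64_t size,
                        std::initializer_list<int64_t> steps) {
  int64_t offset = start;
  if (offset < 0 || offset > size)
    return false;
  for (int64_t step : steps) {
    offset += step;
    if (offset < 0 || offset > size)
      return false; // this intermediate GEP could not be inbounds
  }
  return true;
}
```

With a pointer at element 1 of a 4-element object, the steps (-1, then +4) stay within [0, 4], so `chainStaysInBounds(1, 4, {-1, 4})` holds. The swapped order (+4, then -1) passes through offset 5, so `chainStaysInBounds(1, 4, {4, -1})` does not, which is why the swapped GEPs in the patch are created without `inbounds`.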

llvm/test/Transforms/InstCombine/gep-canonicalize-constant-indices.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=instcombine -opaque-pointers -S | FileCheck %s			; RUN: opt < %s -passes='require<loops>,instcombine' -opaque-pointers -S | FileCheck %s

	; Constant-indexed GEP instructions in a chain of GEP instructions should be			; Constant-indexed GEP instructions in a chain of GEP instructions should be
	; swapped to the end whenever such transformation is valid. This allows them to			; swapped to the end whenever such transformation is valid. This allows them to
	; be merged.			; be merged.

	declare void @use(i1)


	; The constant-indexed GEP instruction should be swapped to the end, even			; The constant-indexed GEP instruction should be swapped to the end, even
	; without merging.			; without merging.
	; result = (((ptr) p + a) + b) + 1			; result = (((i32*) p + a) + b) + 1
	define ptr @basic(ptr %p, i64 %a, i64 %b) {			define ptr @basic(ptr %p, i64 %a, i64 %b) {
	; CHECK-LABEL: @basic(			; CHECK-LABEL: @basic(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i32, ptr [[P:%.*]], i64 [[A:%.*]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[A:%.*]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i32, ptr [[TMP1]], i64 [[B:%.*]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i64 [[B:%.*]]			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i32, ptr [[TMP2]], i64 1
	; CHECK-NEXT: ret ptr [[TMP3]]			; CHECK-NEXT: ret ptr [[TMP3]]
	;			;
	%1 = getelementptr inbounds i32, ptr %p, i64 1			%1 = getelementptr inbounds i32, ptr %p, i64 1
	%2 = getelementptr inbounds i32, ptr %1, i64 %a			%2 = getelementptr inbounds i32, ptr %1, i64 %a
	%3 = getelementptr inbounds i32, ptr %2, i64 %b			%3 = getelementptr inbounds i32, ptr %2, i64 %b
	ret ptr %3			ret ptr %3
	}			}

	; GEP with the last index being a constant should also be swapped.			; GEP with the last index being a constant should also be swapped.
	define ptr @partialConstant1(ptr %p, i64 %a, i64 %b) {			define ptr @partialConstant1(ptr %p, i64 %a, i64 %b) {
	; CHECK-LABEL: @partialConstant1(			; CHECK-LABEL: @partialConstant1(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 [[B:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i32, ptr [[P:%.*]], i64 [[B:%.*]]
	; CHECK-NEXT: ret ptr [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr [4 x i32], ptr [[TMP1]], i64 [[A:%.*]], i64 1
				; CHECK-NEXT: ret ptr [[TMP2]]
	;			;
	%1 = getelementptr inbounds [4 x i32], ptr %p, i64 %a, i64 1			%1 = getelementptr inbounds [4 x i32], ptr %p, i64 %a, i64 1
				spatel (Not Done): I don't know what this test and the next one are supposed to show. %1 is dead code, so it gets eliminated before anything else might have happened.
				spatel (Not Done): I'm more confused now. The test had 1 GEP, but now it has 2 - how is that better?
				bjope (Not Done): > I'm more confused now. The test had 1 GEP, but now it has 2 - how is that better?
				This question is still not answered afaict. We now get two non-inbounds GEPs instead of one inbounds GEP. Given that there are no other GEPs etc. that can be simplified (as in this test case), that doesn't look like an improvement? And I guess losing the inbounds property can be bad in general when doing these canonicalizations. Do you have any examples where you see a performance improvement when rewriting inbounds GEPs into non-inbounds GEPs? Given the number of complaints about observed regressions, it would be nice to hear what kinds of benchmarks were done and whether you tried to limit it to cases where the instruction count didn't increase, or where it would turn X inbounds GEPs into X or more non-inbounds GEPs.
				davidxl (Not Done): The changes in this test seem to be irrelevant. The original test was probably buggy (with dead code), and the change here just fixed the test. (Not sure whether the main patch will affect the output, though.)
	%2 = getelementptr inbounds i32, ptr %p, i64 %b			%2 = getelementptr inbounds i32, ptr %1, i64 %b
	ret ptr %2			ret ptr %2
	}			}

	; Negative test. GEP should not be swapped if the last index is not a constant.			; Negative test. GEP should not be swapped if the last index is not a constant.
	define ptr @partialConstant2(ptr %p, i64 %a, i64 %b) {			define ptr @partialConstant2(ptr %p, i64 %a, i64 %b) {
	; CHECK-LABEL: @partialConstant2(			; CHECK-LABEL: @partialConstant2(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 [[B:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [4 x i32], ptr [[P:%.*]], i64 1, i64 [[A:%.*]]
	; CHECK-NEXT: ret ptr [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[B:%.*]]
				; CHECK-NEXT: ret ptr [[TMP2]]
	;			;
	%1 = getelementptr inbounds [4 x i32], ptr %p, i64 1, i64 %a			%1 = getelementptr inbounds [4 x i32], ptr %p, i64 1, i64 %a
	%2 = getelementptr inbounds i32, ptr %p, i64 %b			%2 = getelementptr inbounds i32, ptr %1, i64 %b
	ret ptr %2			ret ptr %2
	}			}

	; Constant-indexed GEP are merged after swawpping.			; Constant-indexed GEP are merged after swapping.
				spatel (Not Done): typo: swapping
	; result = ((ptr) p + a) + 3			; result = ((i32*) p + a) + 3
	define ptr @merge(ptr %p, i64 %a) {			define ptr @merge(ptr %p, i64 %a) {
	; CHECK-LABEL: @merge(			; CHECK-LABEL: @merge(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i32, ptr [[P:%.*]], i64 [[A:%.*]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[A:%.*]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i32, ptr [[TMP1]], i64 3
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i64 2			; CHECK-NEXT: ret ptr [[TMP2]]
	; CHECK-NEXT: ret ptr [[TMP3]]
	;			;
	%1 = getelementptr inbounds i32, ptr %p, i64 1			%1 = getelementptr inbounds i32, ptr %p, i64 1
	%2 = getelementptr inbounds i32, ptr %1, i64 %a			%2 = getelementptr inbounds i32, ptr %1, i64 %a
	%3 = getelementptr inbounds i32, ptr %2, i64 2			%3 = getelementptr inbounds i32, ptr %2, i64 2
	ret ptr %3			ret ptr %3
	}			}

	; Multiple constant-indexed GEP. Note that the first two cannot be merged at			; Multiple constant-indexed GEP. Note that the first two cannot be merged at
	; first, but after the second and third are merged, the result can be merged			; first, but after the second and third are merged, the result can be merged
	; with the first one on the next pass.			; with the first one on the next pass.
	; result = (ptr) ((ptr) ((ptr) ptr + a) + (a * b)) + 9			; result = (<3 x i32>*) ((i16*) ((i8*) ptr + a) + (a * b)) + 9
	define ptr @nested(ptr %p, i64 %a, i64 %b) {			define ptr @nested(ptr %p, i64 %a, i64 %b) {
	; CHECK-LABEL: @nested(			; CHECK-LABEL: @nested(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds <3 x i32>, ptr [[P:%.*]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i8, ptr [[P:%.*]], i64 [[A:%.*]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, ptr [[TMP1]], i64 [[A:%.*]]			; CHECK-NEXT: [[TMP2:%.*]] = mul i64 [[A]], [[B:%.*]]
	; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[A]], [[B:%.*]]			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i16, ptr [[TMP1]], i64 [[TMP2]]
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds <5 x i32>, ptr [[TMP2]], i64 4			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr <3 x i32>, ptr [[TMP3]], i64 10
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i16, ptr [[TMP4]], i64 [[TMP3]]			; CHECK-NEXT: ret ptr [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds <4 x i32>, ptr [[TMP5]], i64 1
	; CHECK-NEXT: ret ptr [[TMP6]]
	;			;
	%1 = getelementptr inbounds <3 x i32>, ptr %p, i64 1			%1 = getelementptr inbounds <3 x i32>, ptr %p, i64 1
	%2 = getelementptr inbounds i8, ptr %1, i64 %a			%2 = getelementptr inbounds i8, ptr %1, i64 %a
	%3 = mul i64 %a, %b			%3 = mul i64 %a, %b
	%4 = getelementptr inbounds <5 x i32>, ptr %2, i64 4			%4 = getelementptr inbounds <5 x i32>, ptr %2, i64 4
	%5 = getelementptr inbounds i16, ptr %4, i64 %3			%5 = getelementptr inbounds i16, ptr %4, i64 %3
	%6 = getelementptr inbounds <4 x i32>, ptr %5, i64 1			%6 = getelementptr inbounds <4 x i32>, ptr %5, i64 1
	ret ptr %6			ret ptr %6
	}			}

	; It is valid to swap if the source operand of the first GEP has multiple uses.			; It is valid to swap if the source operand of the first GEP has multiple uses.
	define ptr @multipleUses1(ptr %p) {			define ptr @multipleUses1(ptr %p) {
	; CHECK-LABEL: @multipleUses1(			; CHECK-LABEL: @multipleUses1(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = ptrtoint ptr [[P:%.*]] to i64
	; CHECK-NEXT: [[TMP2:%.*]] = ptrtoint ptr [[P]] to i64			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i32, ptr [[P]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[TMP2]]			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i32, ptr [[TMP2]], i64 1
	; CHECK-NEXT: ret ptr [[TMP3]]			; CHECK-NEXT: ret ptr [[TMP3]]
	;			;
	%1 = getelementptr inbounds i32, ptr %p, i64 1			%1 = getelementptr inbounds i32, ptr %p, i64 1
	%2 = ptrtoint ptr %p to i64			%2 = ptrtoint ptr %p to i64
	%3 = getelementptr inbounds i32, ptr %1, i64 %2			%3 = getelementptr inbounds i32, ptr %1, i64 %2
	ret ptr %3			ret ptr %3
	}			}

	; It is valid to swap if the second GEP has multiple uses.
	define ptr @multipleUses2(ptr %p, i64 %a) {
	; CHECK-LABEL: @multipleUses2(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 1
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[A:%.*]]
	; CHECK-NEXT: call void @use(ptr nonnull [[TMP2]])
	; CHECK-NEXT: ret ptr [[TMP2]]
	;
	%1 = getelementptr inbounds i32, ptr %p, i64 1
	%2 = getelementptr inbounds i32, ptr %1, i64 %a
	call void @use(ptr %2)
	ret ptr %2
	}

	; Negative test. It is not valid to swap if the first GEP has multiple uses.			; Negative test. It is not valid to swap if the first GEP has multiple uses.
				spatel (Not Done): This test doesn't add much. AFAIK every transform in instcombine does RAUW for the final (root) value.
	define ptr @multipleUses3(ptr %p) {			define ptr @multipleUses2(ptr %p) {
	; CHECK-LABEL: @multipleUses3(			; CHECK-LABEL: @multipleUses2(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 1
	; CHECK-NEXT: [[TMP2:%.*]] = ptrtoint ptr [[TMP1]] to i64			; CHECK-NEXT: [[TMP2:%.*]] = ptrtoint ptr [[TMP1]] to i64
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[TMP2]]			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[TMP2]]
	; CHECK-NEXT: ret ptr [[TMP3]]			; CHECK-NEXT: ret ptr [[TMP3]]
	;			;
	%1 = getelementptr inbounds i32, ptr %p, i64 1			%1 = getelementptr inbounds i32, ptr %p, i64 1
	%2 = ptrtoint ptr %1 to i64			%2 = ptrtoint ptr %1 to i64
	%3 = getelementptr inbounds i32, ptr %1, i64 %2			%3 = getelementptr inbounds i32, ptr %1, i64 %2
	ret ptr %3			ret ptr %3
	}			}

				; Negative test. LICM should take priority over canonicalization, so the first
				; GEP should not be swapped, even if it contains a constant index.
				define i64 @licm(ptr %p) {
				; CHECK-LABEL: @licm(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[SUM:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[ADD:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[P1:%.*]] = getelementptr i64, ptr [[P:%.*]], i64 4
				; CHECK-NEXT: [[P2:%.*]] = getelementptr i64, ptr [[P1]], i64 [[I]]
				; CHECK-NEXT: [[LOAD:%.*]] = load i64, ptr [[P2]], align 4
				; CHECK-NEXT: [[ADD]] = add nsw i64 [[SUM]], [[LOAD]]
				; CHECK-NEXT: [[INEXT]] = add nuw nsw i64 [[I]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[I]], 1000000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
				; CHECK: for.end:
				; CHECK-NEXT: ret i64 [[ADD]]
				;
				entry:
				br label %for.body

				for.body:
				%i = phi i64 [ 0, %entry ], [ %inext, %for.body ]
				%sum = phi i64 [ 0, %entry ], [ %add, %for.body ]
				%p1 = getelementptr i64, ptr %p, i64 4
				%p2 = getelementptr i64, ptr %p1, i64 %i
				%load = load i64, ptr %p2
				%add = add nsw i64 %sum, %load
				%inext = add nuw nsw i64 %i, 1
				%exitcond = icmp eq i64 %i, 1000000
				br i1 %exitcond, label %for.end, label %for.body

				for.end:
				ret i64 %add
				}
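
The idea behind the tests above (move constant-indexed GEPs to the back so they can be merged, as in `@merge`) can be sketched in a toy C++ model. This is not the InstCombine implementation: the `Step` struct and `canonicalize` function are hypothetical names, every step is assumed to index the same element type, and the sketch ignores the use-def, LICM and inbounds constraints discussed in the review.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in (hypothetical, not an LLVM class) for one single-index GEP
// over a fixed element type.
struct Step {
  bool isConstant;      // true if the index is a compile-time constant
  int64_t constant;     // meaningful only when isConstant is true
  std::string variable; // meaningful only when isConstant is false
};

// Keep variable-index steps in their original order at the front and fold
// all constant-index steps into a single trailing step.
std::vector<Step> canonicalize(const std::vector<Step> &chain) {
  std::vector<Step> result;
  int64_t folded = 0;
  for (const Step &step : chain) {
    if (step.isConstant)
      folded += step.constant;
    else
      result.push_back(step);
  }
  if (folded != 0) // a zero offset needs no GEP at all
    result.push_back(Step{true, folded, ""});
  return result;
}
```

For the chain in `@merge`, `((p + 1) + a) + 2`, this yields the variable step `a` followed by one constant step of 3, matching the expected `(p + a) + 3`.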

llvm/test/Transforms/InstCombine/gep-merge-constant-indices.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=instcombine -opaque-pointers -S | FileCheck %s			; RUN: opt < %s -passes=instcombine -opaque-pointers -S | FileCheck %s

	; Test merging GEP of GEP with constant indices.			; Test merging GEP of GEP with constant indices.

	target datalayout = "i24:8:8"			target datalayout = "i24:8:8"

	%struct.A = type { [123 x i8], i32 }			%struct.A = type { [123 x i8], i32 }
	%struct.B = type { i8, [3 x i16], %struct.A, float }			%struct.B = type { i8, [3 x i16], %struct.A, float }
	%struct.C = type { i8, i32, i32 }			%struct.C = type { i8, i32, i32 }

	; result = (ptr) p + 3			; result = (i32*) p + 3
	define ptr @mergeBasic(ptr %p) {			define ptr @mergeBasic(ptr %p) {
	; CHECK-LABEL: @mergeBasic(			; CHECK-LABEL: @mergeBasic(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 3			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 3
	; CHECK-NEXT: ret ptr [[TMP1]]			; CHECK-NEXT: ret ptr [[TMP1]]
	;			;
	%1 = getelementptr inbounds i32, ptr %p, i64 1			%1 = getelementptr inbounds i32, ptr %p, i64 1
	%2 = getelementptr inbounds i32, ptr %1, i64 2			%2 = getelementptr inbounds i32, ptr %1, i64 2
	ret ptr %2			ret ptr %2
	}			}

	; Converted to ptr and merged.			; Converted to i8* and merged.
	; result = (ptr) p + 10			; result = (i8*) p + 10
	define ptr @mergeDifferentTypes(ptr %p) {			define ptr @mergeDifferentTypes(ptr %p) {
	; CHECK-LABEL: @mergeDifferentTypes(			; CHECK-LABEL: @mergeDifferentTypes(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[P:%.*]], i64 10			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[P:%.*]], i64 10
	; CHECK-NEXT: ret ptr [[TMP1]]			; CHECK-NEXT: ret ptr [[TMP1]]
	;			;
	%1 = getelementptr inbounds i8, ptr %p, i64 2			%1 = getelementptr inbounds i8, ptr %p, i64 2
	%2 = getelementptr inbounds i64, ptr %1, i64 1			%2 = getelementptr inbounds i64, ptr %1, i64 1
	ret ptr %2			ret ptr %2
	}			}

	; Converted to ptr and merged.			; Converted to i8* and merged.
	; result = (ptr) p + 10			; result = (i8*) p + 10
	define ptr @mergeReverse(ptr %p) {			define ptr @mergeReverse(ptr %p) {
	; CHECK-LABEL: @mergeReverse(			; CHECK-LABEL: @mergeReverse(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[P:%.*]], i64 10			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[P:%.*]], i64 10
	; CHECK-NEXT: ret ptr [[TMP1]]			; CHECK-NEXT: ret ptr [[TMP1]]
	;			;
	%1 = getelementptr inbounds i64, ptr %p, i64 1			%1 = getelementptr inbounds i64, ptr %p, i64 1
	%2 = getelementptr inbounds i8, ptr %1, i64 2			%2 = getelementptr inbounds i8, ptr %1, i64 2
	ret ptr %2			ret ptr %2
	}			}

	; Offsets of first and last GEP cancel out.			; Offsets of first and last GEP cancel out.
	; result = p			; result = p
	define ptr @zeroSum(ptr %p) {			define ptr @zeroSum(ptr %p) {
	; CHECK-LABEL: @zeroSum(			; CHECK-LABEL: @zeroSum(
	; CHECK-NEXT: ret ptr [[P:%.*]]			; CHECK-NEXT: ret ptr [[P:%.*]]
	;			;
	%1 = getelementptr inbounds i32, ptr %p, i64 1			%1 = getelementptr inbounds i32, ptr %p, i64 1
	%2 = getelementptr inbounds i8, ptr %1, i64 -4			%2 = getelementptr inbounds i8, ptr %1, i64 -4
	ret ptr %2			ret ptr %2
	}			}

	; result = (ptr) ((ptr) p + 1) + 17			; result = (i8*) (([20 x i8]*) p + 1) + 17
	define ptr @array1(ptr %p) {			define ptr @array1(ptr %p) {
	; CHECK-LABEL: @array1(			; CHECK-LABEL: @array1(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [20 x i8], ptr [[P:%.*]], i64 1, i64 17			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [20 x i8], ptr [[P:%.*]], i64 1, i64 17
	; CHECK-NEXT: ret ptr [[TMP1]]			; CHECK-NEXT: ret ptr [[TMP1]]
	;			;
	%1 = getelementptr inbounds [20 x i8], ptr %p, i64 1, i64 1			%1 = getelementptr inbounds [20 x i8], ptr %p, i64 1, i64 1
	%2 = getelementptr inbounds i64, ptr %1, i64 2			%2 = getelementptr inbounds i64, ptr %1, i64 2
	ret ptr %2			ret ptr %2
	}			}

	; Converted to ptr and merged.			; Converted to i8* and merged.
	; result = (ptr) p + 20			; result = (i8*) p + 20
	define ptr @array2(ptr %p) {			define ptr @array2(ptr %p) {
	; CHECK-LABEL: @array2(			; CHECK-LABEL: @array2(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[P:%.*]], i64 20			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[P:%.*]], i64 20
	; CHECK-NEXT: ret ptr [[TMP1]]			; CHECK-NEXT: ret ptr [[TMP1]]
	;			;
	%1 = getelementptr inbounds i64, ptr %p, i64 2			%1 = getelementptr inbounds i64, ptr %p, i64 2
	%2 = getelementptr inbounds [3 x i8], ptr %1, i64 1, i64 1			%2 = getelementptr inbounds [3 x i8], ptr %1, i64 1, i64 1
	ret ptr %2			ret ptr %2
	}			}

	; Converted to ptr and merged.			; Converted to i8* and merged.
	; result = (ptr) p + 36			; result = (i8*) p + 36
	define ptr @struct1(ptr %p) {			define ptr @struct1(ptr %p) {
	; CHECK-LABEL: @struct1(			; CHECK-LABEL: @struct1(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[P:%.*]], i64 36			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[P:%.*]], i64 36
	; CHECK-NEXT: ret ptr [[TMP1]]			; CHECK-NEXT: ret ptr [[TMP1]]
	;			;
	%1 = getelementptr inbounds i64, ptr %p, i64 3			%1 = getelementptr inbounds i64, ptr %p, i64 3
	%2 = getelementptr inbounds %struct.C, ptr %1, i64 1			%2 = getelementptr inbounds %struct.C, ptr %1, i64 1
	ret ptr %2			ret ptr %2
	}			}

	; result = &((struct.A*) p - 1).member1			; result = &((struct.A*) p - 1).member1
	define ptr @struct2(ptr %p) {			define ptr @struct2(ptr %p) {
	; CHECK-LABEL: @struct2(			; CHECK-LABEL: @struct2(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr [[STRUCT_A:%.*]], ptr [[P:%.*]], i64 -1, i32 1			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr [[STRUCT_A:%.*]], ptr [[P:%.*]], i64 -1, i32 1
	; CHECK-NEXT: ret ptr [[TMP1]]			; CHECK-NEXT: ret ptr [[TMP1]]
	;			;
	%1 = getelementptr inbounds %struct.A, ptr %p, i64 0, i32 1			%1 = getelementptr inbounds %struct.A, ptr %p, i64 0, i32 1
	%2 = getelementptr inbounds i8, ptr %1, i64 -128			%2 = getelementptr inbounds i8, ptr %1, i64 -128
	ret ptr %2			ret ptr %2
	}			}

	; result = (ptr) &((struct.B*) p)[0].member2.member0 + 7			; result = (i8*) &((struct.B*) p)[0].member2.member0 + 7
	define ptr @structStruct(ptr %p) {			define ptr @structStruct(ptr %p) {
	; CHECK-LABEL: @structStruct(			; CHECK-LABEL: @structStruct(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [[STRUCT_B:%.*]], ptr [[P:%.*]], i64 0, i32 2, i32 0, i64 7			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [[STRUCT_B:%.*]], ptr [[P:%.*]], i64 0, i32 2, i32 0, i64 7
	; CHECK-NEXT: ret ptr [[TMP1]]			; CHECK-NEXT: ret ptr [[TMP1]]
	;			;
	%1 = getelementptr inbounds %struct.B, ptr %p, i64 0, i32 2, i32 0, i64 3			%1 = getelementptr inbounds %struct.B, ptr %p, i64 0, i32 2, i32 0, i64 3
	%2 = getelementptr inbounds %struct.A, ptr %1, i64 0, i32 0, i64 4			%2 = getelementptr inbounds %struct.A, ptr %1, i64 0, i32 0, i64 4
	ret ptr %2			ret ptr %2
	}			}

	; First GEP offset is not divisible by last GEP's source element size, but first			; First GEP offset is not divisible by last GEP's source element size, but first
	; GEP points to an array such that the last GEP offset is divisible by the			; GEP points to an array such that the last GEP offset is divisible by the
	; array's element size, so the first GEP can be rewritten with an extra index.			; array's element size, so the first GEP can be rewritten with an extra index.
	; result = (ptr) &((struct.B*) p)[i].member1 + 2			; result = (i16*) &((struct.B*) p)[i].member1 + 2
	define ptr @appendIndex(ptr %p, i64 %i) {			define ptr @appendIndex(ptr %p, i64 %i) {
	; CHECK-LABEL: @appendIndex(			; CHECK-LABEL: @appendIndex(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [[STRUCT_B:%.*]], ptr [[P:%.*]], i64 [[I:%.*]], i32 1, i64 2			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [[STRUCT_B:%.*]], ptr [[P:%.*]], i64 [[I:%.*]], i32 1, i64 2
	; CHECK-NEXT: ret ptr [[TMP1]]			; CHECK-NEXT: ret ptr [[TMP1]]
	;			;
	%1 = getelementptr inbounds %struct.B, ptr %p, i64 %i, i32 1			%1 = getelementptr inbounds %struct.B, ptr %p, i64 %i, i32 1
	%2 = getelementptr inbounds i32, ptr %1, i64 1			%2 = getelementptr inbounds i32, ptr %1, i64 1
	ret ptr %2			ret ptr %2
	}			}

	; Offset of either GEP is not divisible by the other's size, converted to ptr			; After canonicalizing, the second GEP is moved to the front, and then merged
				; with the first one with rewritten indices.
				; result = (i8*) &((struct.A*) &((struct.B*) p)[i].member2).member0 + 2
				define ptr @appendIndexReverse(ptr %p, i64 %i) {
				; CHECK-LABEL: @appendIndexReverse(
				; CHECK-NEXT: [[TMP1:%.*]] = getelementptr [[STRUCT_B:%.*]], ptr [[P:%.*]], i64 [[I:%.*]], i32 2, i32 0, i64 2
				; CHECK-NEXT: ret ptr [[TMP1]]
				;
				%1 = getelementptr inbounds i64, ptr %p, i64 1
				%2 = getelementptr inbounds %struct.B, ptr %1, i64 %i, i32 1
				ret ptr %2
				}

				; Offset of either GEP is not divisible by the other's size, converted to i8*
	; and merged.			; and merged.
	; Here i24 is 8-bit aligned.			; Here i24 is 8-bit aligned.
	; result = (ptr) p + 7			; result = (i8*) p + 7
	define ptr @notDivisible(ptr %p) {			define ptr @notDivisible(ptr %p) {
	; CHECK-LABEL: @notDivisible(			; CHECK-LABEL: @notDivisible(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[P:%.*]], i64 7			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[P:%.*]], i64 7
	; CHECK-NEXT: ret ptr [[TMP1]]			; CHECK-NEXT: ret ptr [[TMP1]]
	;			;
	%1 = getelementptr inbounds i24, ptr %p, i64 1			%1 = getelementptr inbounds i24, ptr %p, i64 1
	%2 = getelementptr inbounds i32, ptr %1, i64 1			%2 = getelementptr inbounds i32, ptr %1, i64 1
	ret ptr %2			ret ptr %2
	}			}

	; Negative test. Two GEP should not be merged if not both offsets are constant			; Negative test. Two GEP should not be merged if not both offsets are constant
	; or divisible by the other's size.			; or divisible by the other's size.
	define ptr @partialConstant2(ptr %p, i64 %a) {			define ptr @partialConstant2(ptr %p, i64 %a) {
	; CHECK-LABEL: @partialConstant2(			; CHECK-LABEL: @partialConstant2(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr [4 x i64], ptr [[P:%.*]], i64 [[A:%.*]], i64 2
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds [4 x i64], ptr [[TMP1]], i64 [[A:%.*]], i64 2			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i32, ptr [[TMP1]], i64 1
	; CHECK-NEXT: ret ptr [[TMP2]]			; CHECK-NEXT: ret ptr [[TMP2]]
	;			;
	%1 = getelementptr inbounds i32, ptr %p, i64 1			%1 = getelementptr inbounds i32, ptr %p, i64 1
	%2 = getelementptr inbounds [4 x i64], ptr %1, i64 %a, i64 2			%2 = getelementptr inbounds [4 x i64], ptr %1, i64 %a, i64 2
	ret ptr %2			ret ptr %2
	}			}

	; Negative test. Two GEP should not be merged if there is another use of the			; Negative test. Two GEP should not be merged if there is another use of the
	[53 lines not shown]
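
The byte offsets quoted in the `result = ...` comments of the constant-merging tests can be checked with simple arithmetic. The sketch below is a hypothetical helper, not LLVM code. The element sizes are assumptions read off the tests: 1 byte for i8, 3 bytes for i24 under the `i24:8:8` datalayout, 4 bytes for i32, 8 bytes for i64, and 12 bytes for `%struct.C`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One constant index applied to an element of a known size.
// Hypothetical type for illustration; sizes are supplied by the caller.
struct ConstStep {
  int64_t elemSizeBytes; // allocation size of the indexed element type
  int64_t index;         // constant index value
};

// Total byte offset of a chain of constant-indexed steps. When two constant
// GEPs cannot be merged in a shared element type, an i8 GEP with this
// offset describes the same address.
int64_t mergedByteOffset(const std::vector<ConstStep> &steps) {
  int64_t total = 0;
  for (const ConstStep &step : steps)
    total += step.elemSizeBytes * step.index;
  return total;
}
```

Under those size assumptions, `@mergeDifferentTypes` is 2*1 + 1*8 = 10, `@notDivisible` is 1*3 + 1*4 = 7, `@struct1` is 3*8 + 1*12 = 36, and `@zeroSum` is 1*4 + (-4)*1 = 0, which agrees with the offsets in the CHECK lines.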

llvm/test/Transforms/InstCombine/shift.ll

[1,717 lines not shown]	;
ret void		ret void
}		}

; OSS Fuzz #26135		; OSS Fuzz #26135
; https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=26135		; https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=26135
define void @ashr_out_of_range_1(ptr %A) {		define void @ashr_out_of_range_1(ptr %A) {
; CHECK-LABEL: @ashr_out_of_range_1(		; CHECK-LABEL: @ashr_out_of_range_1(
; CHECK-NEXT: [[L:%.*]] = load i177, ptr [[A:%.*]], align 4		; CHECK-NEXT: [[L:%.*]] = load i177, ptr [[A:%.*]], align 4
; CHECK-NEXT: [[G11:%.*]] = getelementptr i177, ptr [[A]], i64 -1
; CHECK-NEXT: [[B24_LOBIT:%.*]] = ashr i177 [[L]], 175		; CHECK-NEXT: [[B24_LOBIT:%.*]] = ashr i177 [[L]], 175
; CHECK-NEXT: [[TMP1:%.*]] = trunc i177 [[B24_LOBIT]] to i64		; CHECK-NEXT: [[TMP1:%.*]] = trunc i177 [[B24_LOBIT]] to i64
; CHECK-NEXT: [[G62:%.*]] = getelementptr i177, ptr [[G11]], i64 [[TMP1]]		; CHECK-NEXT: [[G111:%.*]] = getelementptr i177, ptr [[A]], i64 [[TMP1]]
		; CHECK-NEXT: [[G62:%.*]] = getelementptr i177, ptr [[G111]], i64 -1
; CHECK-NEXT: store i177 0, ptr [[G62]], align 4		; CHECK-NEXT: store i177 0, ptr [[G62]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%L = load i177, ptr %A, align 4		%L = load i177, ptr %A, align 4
%B5 = udiv i177 %L, -1		%B5 = udiv i177 %L, -1
%B4 = add i177 %B5, -1		%B4 = add i177 %B5, -1
%B = and i177 %B4, %L		%B = and i177 %B4, %L
%B2 = add i177 %B, -1		%B2 = add i177 %B, -1
[251 lines not shown]

llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll

	[32 lines not shown]
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = xor i64 [[INDEX]], -1			; CHECK-NEXT: [[TMP0:%.*]] = xor i64 [[INDEX]], -1
	; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[TMP0]], [[N]]			; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[TMP0]], [[N]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds double, double* [[COND:%.*]], i64 [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds double, double* [[COND:%.*]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds double, double* [[TMP2]], i64 -3			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds double, double* [[TMP2]], i64 -3
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[TMP3]] to <4 x double>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[TMP3]] to <4 x double>*
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, <4 x double>* [[TMP4]], align 8			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, <4 x double>* [[TMP4]], align 8
	; CHECK-NEXT: [[REVERSE:%.*]] = shufflevector <4 x double> [[WIDE_LOAD]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>			; CHECK-NEXT: [[REVERSE:%.*]] = shufflevector <4 x double> [[WIDE_LOAD]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds double, double* [[TMP2]], i64 -4			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[TMP2]], i64 -7
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[TMP5]], i64 -3
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[TMP6]] to <4 x double>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[TMP6]] to <4 x double>*
	; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <4 x double>, <4 x double>* [[TMP7]], align 8			; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <4 x double>, <4 x double>* [[TMP7]], align 8
	; CHECK-NEXT: [[REVERSE2:%.*]] = shufflevector <4 x double> [[WIDE_LOAD1]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>			; CHECK-NEXT: [[REVERSE2:%.*]] = shufflevector <4 x double> [[WIDE_LOAD1]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
	; CHECK-NEXT: [[TMP8:%.*]] = fcmp une <4 x double> [[REVERSE]], zeroinitializer			; CHECK-NEXT: [[TMP8:%.*]] = fcmp une <4 x double> [[REVERSE]], zeroinitializer
	; CHECK-NEXT: [[TMP9:%.*]] = fcmp une <4 x double> [[REVERSE2]], zeroinitializer			; CHECK-NEXT: [[TMP9:%.*]] = fcmp une <4 x double> [[REVERSE2]], zeroinitializer
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr double, double* [[A:%.*]], i64 [[TMP1]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr double, double* [[A:%.*]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr double, double* [[TMP10]], i64 -3			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr double, double* [[TMP10]], i64 -3
	; CHECK-NEXT: [[REVERSE3:%.*]] = shufflevector <4 x i1> [[TMP8]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>			; CHECK-NEXT: [[REVERSE3:%.*]] = shufflevector <4 x i1> [[TMP8]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast double* [[TMP11]] to <4 x double>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast double* [[TMP11]] to <4 x double>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP12]], i32 8, <4 x i1> [[REVERSE3]], <4 x double> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP12]], i32 8, <4 x i1> [[REVERSE3]], <4 x double> poison)
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr double, double* [[TMP10]], i64 -4			; CHECK-NEXT: [[TMP14:%.*]] = getelementptr double, double* [[TMP10]], i64 -7
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr double, double* [[TMP13]], i64 -3
	; CHECK-NEXT: [[REVERSE5:%.*]] = shufflevector <4 x i1> [[TMP9]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>			; CHECK-NEXT: [[REVERSE5:%.*]] = shufflevector <4 x i1> [[TMP9]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
	; CHECK-NEXT: [[TMP15:%.*]] = bitcast double* [[TMP14]] to <4 x double>*			; CHECK-NEXT: [[TMP15:%.*]] = bitcast double* [[TMP14]] to <4 x double>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD6:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP15]], i32 8, <4 x i1> [[REVERSE5]], <4 x double> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD6:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP15]], i32 8, <4 x i1> [[REVERSE5]], <4 x double> poison)
	; CHECK-NEXT: [[TMP16:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD]], <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>			; CHECK-NEXT: [[TMP16:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD]], <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>
	; CHECK-NEXT: [[TMP17:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD6]], <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>			; CHECK-NEXT: [[TMP17:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD6]], <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>
	; CHECK-NEXT: [[TMP18:%.*]] = bitcast double* [[TMP11]] to <4 x double>*			; CHECK-NEXT: [[TMP18:%.*]] = bitcast double* [[TMP11]] to <4 x double>*
	; CHECK-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP16]], <4 x double>* [[TMP18]], i32 8, <4 x i1> [[REVERSE3]])			; CHECK-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP16]], <4 x double>* [[TMP18]], i32 8, <4 x i1> [[REVERSE3]])
	; CHECK-NEXT: [[TMP19:%.*]] = bitcast double* [[TMP14]] to <4 x double>*			; CHECK-NEXT: [[TMP19:%.*]] = bitcast double* [[TMP14]] to <4 x double>*

llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll

	; CHECK-NEXT: [[TMP0:%.*]] = mul i64 [[INDEX]], 3			; CHECK-NEXT: [[TMP0:%.*]] = mul i64 [[INDEX]], 3
	; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i32, i32* [[A:%.*]], i64 [[TMP0]]			; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i32, i32* [[A:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[NEXT_GEP]] to <12 x i32>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[NEXT_GEP]] to <12 x i32>*
	; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <12 x i32>, <12 x i32>* [[TMP1]], align 4			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <12 x i32>, <12 x i32>* [[TMP1]], align 4
	; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <12 x i32> [[WIDE_VEC]], <12 x i32> poison, <4 x i32> <i32 0, i32 3, i32 6, i32 9>			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <12 x i32> [[WIDE_VEC]], <12 x i32> poison, <4 x i32> <i32 0, i32 3, i32 6, i32 9>
	; CHECK-NEXT: [[STRIDED_VEC2:%.*]] = shufflevector <12 x i32> [[WIDE_VEC]], <12 x i32> poison, <4 x i32> <i32 1, i32 4, i32 7, i32 10>			; CHECK-NEXT: [[STRIDED_VEC2:%.*]] = shufflevector <12 x i32> [[WIDE_VEC]], <12 x i32> poison, <4 x i32> <i32 1, i32 4, i32 7, i32 10>
	; CHECK-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <12 x i32> [[WIDE_VEC]], <12 x i32> poison, <4 x i32> <i32 2, i32 5, i32 8, i32 11>			; CHECK-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <12 x i32> [[WIDE_VEC]], <12 x i32> poison, <4 x i32> <i32 2, i32 5, i32 8, i32 11>
	; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i32> [[STRIDED_VEC]], [[VEC_IND]]			; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i32> [[STRIDED_VEC]], [[VEC_IND]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[NEXT_GEP]], i64 2
	; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[STRIDED_VEC2]], [[VEC_IND]]			; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[STRIDED_VEC2]], [[VEC_IND]]
	; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[STRIDED_VEC3]], [[VEC_IND]]			; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[STRIDED_VEC3]], [[VEC_IND]]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP3]], i64 -2			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[NEXT_GEP]] to <12 x i32>*
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <12 x i32>*
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP9]], <12 x i32> <i32 0, i32 4, i32 8, i32 1, i32 5, i32 9, i32 2, i32 6, i32 10, i32 3, i32 7, i32 11>			; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP9]], <12 x i32> <i32 0, i32 4, i32 8, i32 1, i32 5, i32 9, i32 2, i32 6, i32 10, i32 3, i32 7, i32 11>
	; CHECK-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[TMP7]], align 4			; CHECK-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[TMP7]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>
	; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024			; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
	; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]

llvm/test/Transforms/PhaseOrdering/single-iteration-loop-sroa.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -O2 < %s | FileCheck %s			; RUN: opt -S -O2 < %s | FileCheck %s

	; Test a single-iteration loop that should get SROAd once we realize that fact.			; Test a single-iteration loop that should get SROAd once we realize that fact.
	; It should compile down to a bswap.			; It should compile down to a bswap.

	; The helper function exists to avoid IPSCCP breaking the loop too early.			; The helper function exists to avoid IPSCCP breaking the loop too early.

	define i16 @helper(i16 %0, i64 %x) {			define i16 @helper(i16 %0, i64 %x) {
	; CHECK-LABEL: @helper(			; CHECK-LABEL: @helper(
	; CHECK-NEXT: start:			; CHECK-NEXT: start:
	; CHECK-NEXT: [[DATA:%.*]] = alloca [2 x i8], align 2			; CHECK-NEXT: [[DATA:%.*]] = alloca [2 x i8], align 2
	; CHECK-NEXT: store i16 [[TMP0:%.*]], ptr [[DATA]], align 2			; CHECK-NEXT: store i16 [[TMP0:%.*]], ptr [[DATA]], align 2
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[DATA]], i64 1
	; CHECK-NEXT: br label [[BB6_I_I:%.*]]			; CHECK-NEXT: br label [[BB6_I_I:%.*]]
	; CHECK: bb6.i.i:			; CHECK: bb6.i.i:
	; CHECK-NEXT: [[ITER_SROA_0_07_I_I:%.*]] = phi i64 [ [[TMP2:%.*]], [[BB6_I_I]] ], [ 0, [[START:%.*]] ]			; CHECK-NEXT: [[ITER_SROA_0_07_I_I:%.*]] = phi i64 [ [[TMP2:%.*]], [[BB6_I_I]] ], [ 0, [[START:%.*]] ]
	; CHECK-NEXT: [[_40_I_I:%.*]] = sub nsw i64 0, [[ITER_SROA_0_07_I_I]]			; CHECK-NEXT: [[_40_I_I:%.*]] = sub nsw i64 0, [[ITER_SROA_0_07_I_I]]
	; CHECK-NEXT: [[TMP2]] = add nuw nsw i64 [[ITER_SROA_0_07_I_I]], 1			; CHECK-NEXT: [[TMP2]] = add nuw nsw i64 [[ITER_SROA_0_07_I_I]], 1
	; CHECK-NEXT: [[_34_I_I:%.*]] = getelementptr inbounds [0 x i8], ptr [[DATA]], i64 0, i64 [[ITER_SROA_0_07_I_I]]			; CHECK-NEXT: [[_34_I_I:%.*]] = getelementptr inbounds [0 x i8], ptr [[DATA]], i64 0, i64 [[ITER_SROA_0_07_I_I]]
	; CHECK-NEXT: [[_39_I_I:%.*]] = getelementptr inbounds [0 x i8], ptr [[TMP1]], i64 0, i64 [[_40_I_I]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr [0 x i8], ptr [[DATA]], i64 0, i64 [[_40_I_I]]
				; CHECK-NEXT: [[_39_I_I:%.*]] = getelementptr i8, ptr [[TMP1:%.*]], i64 1
	; CHECK-NEXT: [[TMP_0_COPYLOAD_I_I_I_I:%.*]] = load i8, ptr [[_34_I_I]], align 1			; CHECK-NEXT: [[TMP_0_COPYLOAD_I_I_I_I:%.*]] = load i8, ptr [[_34_I_I]], align 1
	; CHECK-NEXT: [[TMP2_0_COPYLOAD_I_I_I_I:%.*]] = load i8, ptr [[_39_I_I]], align 1			; CHECK-NEXT: [[TMP2_0_COPYLOAD_I_I_I_I:%.*]] = load i8, ptr [[_39_I_I]], align 1
	; CHECK-NEXT: store i8 [[TMP2_0_COPYLOAD_I_I_I_I]], ptr [[_34_I_I]], align 1			; CHECK-NEXT: store i8 [[TMP2_0_COPYLOAD_I_I_I_I]], ptr [[_34_I_I]], align 1
	; CHECK-NEXT: store i8 [[TMP_0_COPYLOAD_I_I_I_I]], ptr [[_39_I_I]], align 1			; CHECK-NEXT: store i8 [[TMP_0_COPYLOAD_I_I_I_I]], ptr [[_39_I_I]], align 1
	; CHECK-NEXT: [[EXITCOND_NOT_I_I:%.*]] = icmp eq i64 [[TMP2]], [[X:%.*]]			; CHECK-NEXT: [[EXITCOND_NOT_I_I:%.*]] = icmp eq i64 [[TMP2]], [[X:%.*]]
	; CHECK-NEXT: br i1 [[EXITCOND_NOT_I_I]], label [[EXIT:%.*]], label [[BB6_I_I]]			; CHECK-NEXT: br i1 [[EXITCOND_NOT_I_I]], label [[EXIT:%.*]], label [[BB6_I_I]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: [[DOTSROA_0_0_COPYLOAD:%.*]] = load i16, ptr [[DATA]], align 2			; CHECK-NEXT: [[DOTSROA_0_0_COPYLOAD:%.*]] = load i16, ptr [[DATA]], align 2

Diff 469287
