This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back
ClosedPublic

Authored by huangjd on May 17 2022, 6:53 PM.

Details

Summary

Canonicalize GEP of GEP by swapping GEP with some suffix constant indices to the back (and GEP with all constant indices to the back of that), this allows more constant index GEP merging to happen. Exceptions are: If swapping violates use-def relations, or anti-optimizes LICM

For constant indexed GEP of GEP, if they cannot be merged directly, they will be casted to i8* and merged.

Diff Detail

Event Timeline

huangjd created this revision.May 17 2022, 6:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 17 2022, 6:53 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
huangjd requested review of this revision.May 17 2022, 6:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 17 2022, 6:53 PM
nikic added inline comments.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1968

This is an interesting case. An alternative would be to skip this code if GEP has all constant indices, i.e. to say that gep (gep p, x), C is always the canonical form, even if it were possible to move (gep p, C) out of the loop. This makes some sense in that gep p, C is typically free (folded into addressing mode). I'm not sure whether that would really be better though.

2001

Doing this is not necessary: InstCombine is not supposed to gracefully deal with invalid IR. I've removed the offending test in https://github.com/llvm/llvm-project/commit/128da94d38242c28e6bf23ad025e0cb2d6ce9e4f.

2010

You probably don't need the explicit isOpaquePointerTy() check here, that should be covered by the type equality check.

2016

I would recommend to initially only keep the Src->hasAllConstantIndices() && !GEP.hasAllConstantIndices() case. The profitability of this transform for the case where both GEPs are non-constant is not really clear.

2022

You can pass InBounds directly to CreateGEP, as the last argument.

2071

This should be a separate patch, it's an independent transform.

huangjd updated this revision to Diff 430512.May 18 2022, 3:01 PM

Removed check for malformed IR (PR13621)
Removed unnecessary type check for canonicalization swap

Move converting GEP of constant index to i8 into next patch

huangjd marked 4 inline comments as done.May 18 2022, 3:06 PM
huangjd added inline comments.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1968

LICM is always beneficial as it guarantees the reduction of runtime instruction count, but canonicalization can't guarantee an optimization

huangjd added inline comments.May 18 2022, 3:06 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2016

If Src is not all constant index and GEP is all constant index, merging them can reduce one GEP. I have observed that in the backend, inst selection does not always generate the most simplified output for multiple GEP instructions, so it's better to reduce the IR count

huangjd retitled this revision from [InstCombine] Canonicalize GEP of GEP and merging GEP with constant indices to [InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back.May 18 2022, 4:05 PM

I think there's some git weirdness going on, could you apply this on top of main?

huangjd marked an inline comment as done.May 19 2022, 4:11 PM

I think there's some git weirdness going on, could you apply this on top of main?

Fixed at D126030

huangjd reclaimed this revision.May 26 2022, 4:38 PM
huangjd updated this revision to Diff 432417.May 26 2022, 4:46 PM

Restored patch D125845

huangjd marked an inline comment as done.May 31 2022, 10:53 AM

Any more comments?

nikic added inline comments.Jun 8 2022, 4:03 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1968

The main thing I'd be concerned about here is that the loop invariance canonicalization is only performed if cached LoopInfo is available. This means that depending on whether a particular InstCombine run has LoopInfo available, we will switch back and forth between two different orders.

2006

Unfortunately, this is not sufficient to preserve inbounds. If the indices have different sign, then swapping them may make the GEP non-inbounds even if both original GEPs were inbounds.

gep inbounds (gep inbounds p, -1), X -> gep inbounds (gep inbounds p, X), -1` is not generally valid.

huangjd added inline comments.Jun 8 2022, 10:50 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1968

Since these two transforms contradict each other, one of them has to take precedence when both are available. If loopInfo is available during any pass and it swaps out the inner GEP, then further Instcombine pass shouldn't be able to swap it back. Perhaps I should add a check that canonicalization only takes place if both GEP are on the same BB

2006

I have seen the same inbounds check used in another place (line 2071), so that may be problematic as well? I think we could add another check in isMergedGEPInBounds to require both GEP offsets are known to be same sign?

huangjd marked an inline comment as not done.Jun 8 2022, 2:24 PM
huangjd added inline comments.Jun 8 2022, 3:39 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2006

Is the impact of changing inbounds GEP to non-inbounds significant to other optimizations?

huangjd updated this revision to Diff 437324.Jun 15 2022, 1:31 PM

GEP are no longer inbounds after swapping

nikic added a comment.Jun 16 2022, 1:37 AM

This looks fine to me, but maybe @spatel can also take a look.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1950

-> ShouldCanonicalizeSwap

1966

Omit braces for single-statement if.

I don't know enough to visualize the GEP corner-cases or the interaction with LICM, but I commented on general improvements to the patch.

Have you run any benchmarks to confirm there are improvements (no regressions)?

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1946–1950

This code structure and the comments are confusing. I think it would be better to rearrange this like:

bool LoopInvariantGEP = false;
bool LoopInvariantSrc = false;
if (LI && ...) {
  LoopInvariantGEP = L->isLoopInvariant(GEP.getOperand(1));
  LoopInvariantSrc = L->isLoopInvariant(Src->getOperand(1));
}
if (LoopInvariantGEP && !LoopInvariantSrc) {
  // do the existing transform
}
if (!(!LoopInvariantGEP && LoopInvariantSrc) && ...) {
  // do the new transform
}

I think that matches the logic in this patch, but don't we need a set of tests with loops to make sure that does what you expect? I don't see any tests with loops.

llvm/test/Transforms/InstCombine/gep-canonicalize-constant-indices.ll
32

I don't know what this test and the next one are supposed to show. %1 is dead code, so it gets eliminated before anything else might have happened.

48–49

typo: swapping

97–98

This test doesn't add much. AFAIK every transform in instcombine does RAUW for the final (root) value.

huangjd added inline comments.Jun 21 2022, 2:20 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1946–1950

I think this doesn't simplify the logic, because LoopInvariant* is only meaningful within the scope of LoopInfo analysis. Also the existing transform (LICM) requires the Loop object from LI, so moving it outside of the first condition statement makes the code uglier.

1946–1950

It seems that LI is null when I test it. Is there an opt flag I need to enable?

huangjd added inline comments.Jun 21 2022, 2:56 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1946–1950

I confirmed this case is already covered by gep-combine-loop-invariant.ll. Taking out the ShouldCanonicalizeSwap check will cause infinite loop because two transformations trying to undo each other. In this case is a separate test case still needed?

huangjd updated this revision to Diff 438846.Jun 21 2022, 2:59 PM

Improved comment, fixed typo in test cases

nikic added a comment.Jun 23 2022, 8:43 AM

It looks like this is now the diff to the previous version, not the full patch.

huangjd updated this revision to Diff 439477.Jun 23 2022, 11:28 AM
This comment was removed by huangjd.
huangjd updated this revision to Diff 439480.Jun 23 2022, 11:31 AM

Update diff to include previous commit

Any more comments?

My comments about the tests were not addressed - there's a test of extra uses with no value, but no tests with loops?

llvm/test/Transforms/InstCombine/gep-canonicalize-constant-indices.ll
32

I'm more confused now. The test had 1 GEP, but now it has 2 - how is that better?

huangjd updated this revision to Diff 443857.Jul 12 2022, 1:22 AM

Added test for interaction with LICM

nikic accepted this revision.Jul 12 2022, 6:20 AM

LGTM

This revision is now accepted and ready to land.Jul 12 2022, 6:20 AM
huangjd updated this revision to Diff 469284.Oct 20 2022, 10:38 AM

merge with main

This revision was landed with ongoing or failed builds.Oct 20 2022, 10:41 AM
This revision was automatically updated to reflect the committed changes.

Have you run any benchmarks to confirm there are improvements (no regressions)?

Did you ever measure the performance? I have reports of this making the leela benchmark in SPEC2017 worse under LTO.

We also observe some regressions on internal benchmarks from this patch. We haven't yet analyzed why, but I will add details when we have something.

pzheng added a subscriber: pzheng.Nov 18 2022, 1:50 PM

We also see regressions with this patch in internal benchmarks and at least in one case it seems like the lack of "inbounds" on the new GEPs is involved.
If I change so the GEPs are created like

-      Value *NewSrc = Builder.CreateGEP(
+      Value *NewSrc = Builder.CreateInBoundsGEP(
           GEP.getSourceElementType(), Src->getOperand(0),
           SmallVector<Value *>(GEP.indices()), Src->getName());
-      GetElementPtrInst *NewGEP = GetElementPtrInst::Create(
+      GetElementPtrInst *NewGEP = GetElementPtrInst::CreateInBounds(

at least some regression disappear.
I do see the comment saying this can't be done though...

// Cannot guarantee inbounds after swapping because the non-const GEP can
// have arbitrary sign.

Could someone reproduce the test case (and benchmarking results) showing regression?

bjope added a subscriber: bjope.Nov 28 2022, 10:17 AM
bjope added inline comments.
llvm/test/Transforms/InstCombine/gep-canonicalize-constant-indices.ll
32

I'm more confused now. The test had 1 GEP, but now it has 2 - how is that better?

This question is still not answered afaict. We now get two non-inbound GEP:s instead of one inbounds GEP. Given that there are no other GEP:s etc that can be simplified (as in this test case) that doesn't look like an improvement?

And I guess loosing the inbounds property can be bad in general when doing this canonicalizations. Do you have any examples when you see a performance improvement when rewriting inbound GEP:s into non-inbound GEP:s? Given the amount of complaints about seen regressions it would be nice to hear what kinds of benchmarks that were done and if you tried to limit it to cases when instruction count didn't increase, or when it would turn X inbound GEP:s into X or more non-inbound GEP:s.

was there any positive performance data for this patch in the first place (the description doesn't mention any)? if not and people are reporting regressions, we should revert

FWIW, the previous patch in this area also got strong post-commit negative feedback: D106450
I think any canonicalization that leads to loss of inbounds should not be performed.

davidxl added inline comments.Nov 28 2022, 11:04 AM
llvm/test/Transforms/InstCombine/gep-canonicalize-constant-indices.ll
32

The changes in this test seems to be irrelevant. The original test was probably buggy (with dead code), and the change here just fixed the test. (not sure if the main patch will affect the output though).

bjope added a comment.Nov 28 2022, 5:18 PM

Got a benchmark downstream were we no longer get SLP vectorization after the canonicalizatoin. But the test is rather large and complicated so I do not have anything that show the full pipeline including SLP (at least not yet).

But before instcombine parts of the IR kind of looks like this:

%struct.S = type { i32, i32, i16, i16, [4 x i16] }

define i32 @foo(ptr %p1, ptr %p2) {
entry:
  br label %for.cond

for.cond:
  %i.0 = phi i16 [ 0, %entry ], [ %inc, %for.body ]
  %cmp = icmp ult i16 %i.0, 4
  br i1 %cmp, label %for.body, label %for.end

for.body:
  %q1 = getelementptr inbounds %struct.S, ptr %p2, i32 0, i32 4
  %q2 = getelementptr inbounds [4 x i16], ptr %q1, i32 0, i16 %i.0
  %t1 = load i16, ptr %q2, align 1
  store i16 %t1, ptr %p1
  %inc = add i16 %i.0, 1
  br label %for.cond

for.end:
  ret i32 7
}

If just running opt -S -passes=instcombine on that we used to get

define i32 @foo(ptr %p1, ptr %p2) {
entry:
  br label %for.cond

for.cond:                                         ; preds = %for.body, %entry
  %i.0 = phi i16 [ 0, %entry ], [ %inc, %for.body ]
  %cmp = icmp ult i16 %i.0, 4
  br i1 %cmp, label %for.body, label %for.end

for.body:                                         ; preds = %for.cond
  %0 = sext i16 %i.0 to i64
  %q2 = getelementptr inbounds %struct.S, ptr %p2, i64 0, i32 4, i64 %0
  %t1 = load i16, ptr %q2, align 1
  store i16 %t1, ptr %p1, align 2
  %inc = add i16 %i.0, 1
  br label %for.cond

for.end:                                          ; preds = %for.cond
  ret i32 7
}

but after this patch we get two non-inbound GEP:s instead of one single GEP:

define i32 @foo(ptr %p1, ptr %p2) {
entry:
  br label %for.cond

for.cond:                                         ; preds = %for.body, %entry
  %i.0 = phi i16 [ 0, %entry ], [ %inc, %for.body ]
  %cmp = icmp ult i16 %i.0, 4
  br i1 %cmp, label %for.body, label %for.end

for.body:                                         ; preds = %for.cond
  %0 = sext i16 %i.0 to i64
  %q11 = getelementptr [4 x i16], ptr %p2, i64 0, i64 %0
  %q2 = getelementptr %struct.S, ptr %q11, i64 0, i32 4
  %t1 = load i16, ptr %q2, align 1
  store i16 %t1, ptr %p1, align 2
  %inc = add i16 %i.0, 1
  br label %for.cond

for.end:                                          ; preds = %for.cond
  ret i32 7
}

So I suspect that the swapping (and replacing inbound GEP:s with non-inbound GEP:s) has prevented merging of the GEP:s?

Maybe the merging of the GEP:s isn't important here. But I think the result looks a bit weird anyhow. In the input IR the first GEP only got loop-invariant operands (so %q1 is loop-invariant in that sense even if LoopInfo::isInvariant would say that it is modified inside the loop) , but after the "swapping" the first GEP depend on %0 (or rather %i.0), so now it isn't loop-invariant.
Isn't the idea that it should try to put the loop invariant part first, to hopefully let LICM hoist that outside the loop. Although when I read the code comments it seems like we canonicalize in opposite order from what would suite LICM, so there is some kind of heuristic to avoid canonicalize in certain situations. So I'm not quite sure if this pattern really is supposed to be "swapped" or not. Or maybe even merged. Both merging and not swapping would allow keeping the inbounds. Not swapping could allow LICM to hoist one of the GEP:s out from the loop. But the current solution both prevent hoisting and it removes the inbound attributes. So I do not see how the swapping is beneficial in this case (unless we are canonicalizing for some specific target with an addressing mode that perhaps would allow folding of the constant offset in the load).

fhahn added a subscriber: fhahn.Nov 29 2022, 3:26 AM

Thanks for sharing the test case @bjope! I agree that the generated IR with this change is strictly worse for that case; I think it would probably be good to revert the patch until this can be properly addressed. llvm/test/Transforms/PhaseOrdering/single-iteration-loop-sroa.ll also looks worse, as we loose the inbounds in the loop and the only benefit is removing an instruction outside the loop.

fhahn added a comment.Nov 29 2022, 3:29 AM

Note that there's also a dedicated pass to separate constant GEP offsets, which can be run as part of the backend pipeline: llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp. It was enabled on AArch64 for a while but had to be disabled because it was also causing performance regressions .

D138950 to revert this patch. Looks like reverting it will not interfere with D137212

huangjd added a comment.EditedDec 1 2022, 12:04 PM

I would need some clarification on inbounds keyword. When an inbounds GEP of GEP is being transformed, what kind of transformation and conditions keep the new GEP inbounds? For example GEP inbounds (GEP inbounds P a) b is equivalent to GEP inbounds (GEP inbounds P b) a if and only if a and b have the same sign. Are there other algebraically valid transformations? This actually does affect D137212 since it is swapping constant indexed GEP. Maybe I misunderstood what inbounds implies? I noticed arbitrary pointer arithmetic expression in C generates inbounds GEP even the pointer is clearly not pointing to any allocated object