This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Enhance select icmp and folding
ClosedPublic

Authored by peixin on Apr 15 2023, 2:17 AM.

Details

Summary

This folds (a << k) ? 2^k * a : 0 to 2^k * a.

https://alive2.llvm.org/ce/z/_dDRjo

Fix #62155.

Depends on D150069.

Diff Detail

Event Timeline

peixin created this revision.Apr 15 2023, 2:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 15 2023, 2:17 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
peixin requested review of this revision.Apr 15 2023, 2:17 AM
RKSimon added inline comments.Apr 15 2023, 2:59 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
255

Would using m_And(m_Specific(X), m_APInt(C)) avoid the need for this?

259

!C->isMask() ?

peixin updated this revision to Diff 513908.Apr 15 2023, 7:54 AM

Address the comments.

peixin added inline comments.Apr 15 2023, 7:55 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
255

Thanks. Fixed.

259

Good point. Fixed.

nikic added a comment.Apr 16 2023, 1:38 AM

As this does not create a new instruction, it should be part of InstSimplify (probably in simplifySelectWithICmpCond).

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
239

Why exclude vectors? Looks like they should work out of the box?

peixin updated this revision to Diff 514004.Apr 16 2023, 7:04 AM
peixin retitled this revision from [InstCombine] Enhance select icmp and folding to [InstSimplify] Enhance select icmp and simplification.
peixin edited the summary of this revision. (Show Details)

Address the comments.

  1. Move to instsimplify.
  2. Add support for vector type.
goldstein.w.n added inline comments.
llvm/lib/Analysis/InstructionSimplify.cpp
4411 ↗(On Diff #514004)

Can we also handle select (icmp ne (and X, C), 0, (shl X, K), 0)?

llvm/test/Transforms/InstSimplify/select.ll
1346 ↗(On Diff #514004)

Can you split the next tests to a prior patch so we can see the diff this patch generates?

peixin updated this revision to Diff 514216.Apr 17 2023, 6:58 AM

Addressed the comments.

gentle ping

nikic added a comment.Apr 24 2023, 2:24 AM

It looks like this should be part of simplifySelectBitTest(), which covers other "select of icmp and" patterns.

It looks like this should be part of simplifySelectBitTest(), which covers other "select of icmp and" patterns.

It seems not, the CondVal in simplifySelectWithICmpCond is ICmp of value and zero. For cases in simplifySelectBitTest, the CondVal is ICmp of value and value.

nikic added a comment.Apr 26 2023, 3:36 AM

It looks like this should be part of simplifySelectBitTest(), which covers other "select of icmp and" patterns.

It seems not, the CondVal in simplifySelectWithICmpCond is ICmp of value and zero. For cases in simplifySelectBitTest, the CondVal is ICmp of value and value.

Not sure I follow. Isn't it matching the (X & C) == 0 pattern here? https://github.com/llvm/llvm-project/blob/f6ec56ac5be26ad862e22d0eebb2645d1ad78218/llvm/lib/Analysis/InstructionSimplify.cpp#L4494-L4500

peixin planned changes to this revision.Apr 26 2023, 8:18 AM

It looks like this should be part of simplifySelectBitTest(), which covers other "select of icmp and" patterns.

It seems not, the CondVal in simplifySelectWithICmpCond is ICmp of value and zero. For cases in simplifySelectBitTest, the CondVal is ICmp of value and value.

Not sure I follow. Isn't it matching the (X & C) == 0 pattern here? https://github.com/llvm/llvm-project/blob/f6ec56ac5be26ad862e22d0eebb2645d1ad78218/llvm/lib/Analysis/InstructionSimplify.cpp#L4494-L4500

OK. I get your idea. You are right. Will fix this.

nikic added a comment.May 3 2023, 7:20 AM

Just took a closer look at this. This is one of those transforms where we need to be very careful about derefinement, because we're replacing a constant 0 with an instruction that (under the given condition) may refine to zero but is not necessarily equivalent to zero.

In particular, if the shl has nowrap flags, this transform is incorrect: https://alive2.llvm.org/ce/z/EjBRVW

This means we need to drop nowrap flags as part of the transform. I'm afraid this means that my original recommendation to move this to InstSimplify was wrong and this should be in InstCombine after all, as we can only drop nowrap flags there.

peixin added a comment.May 3 2023, 6:03 PM

Just took a closer look at this. This is one of those transforms where we need to be very careful about derefinement, because we're replacing a constant 0 with an instruction that (under the given condition) may refine to zero but is not necessarily equivalent to zero.

In particular, if the shl has nowrap flags, this transform is incorrect: https://alive2.llvm.org/ce/z/EjBRVW

This means we need to drop nowrap flags as part of the transform. I'm afraid this means that my original recommendation to move this to InstSimplify was wrong and this should be in InstCombine after all, as we can only drop nowrap flags there.

@nikic Thanks for the notice. I was on vacation and just came back today. I will continue this work.

peixin updated this revision to Diff 520190.May 7 2023, 8:05 AM
peixin retitled this revision from [InstSimplify] Enhance select icmp and simplification to [InstCombine] Enhance select icmp and folding.
peixin edited the summary of this revision. (Show Details)
peixin set the repository for this revision to rG LLVM Github Monorepo.

Address the comments from @nikic .

peixin planned changes to this revision.May 7 2023, 6:03 PM

It seems I uploaded the wrong patch.

peixin updated this revision to Diff 520342.May 8 2023, 6:18 AM

Upload the right patch.

ping for review @nikic

goldstein.w.n added inline comments.May 18 2023, 9:50 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
613

Two things.

  1. I don't think you need the nsw/nuw on the shl
  2. This seems like a special case of the generalized transform:

(select (icmp eq (and X, C), C1), (shl C1, K), (shl X, K)) -> (shl X, K).

See:
https://alive2.llvm.org/ce/z/zjrVzM

613

What I mean by point 2, is can you just implement the generalized transform?

RKSimon added inline comments.May 18 2023, 9:53 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
631

If K is a shift value, why not use getZExtValue() and avoid the extra int64_t cast?

nikic added inline comments.May 18 2023, 10:00 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
613

The nsw/nuw here are relevant in that they must be dropped by the transform. They are not a precondition.

622

You'll want to add something like

if (Pred == ICmpInst::ICMP_NE) {
  Pred = ICmpInst::ICMP_EQ;
  std::swap(TVal, FVal);
}

here (and a test case using ne), to make sure you cover the case with swapped operands. (Unless there is some canonicalization here that makes this irrelevant?)

peixin added inline comments.May 18 2023, 11:39 PM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
613

Is it generated from some source code? I mean, we usually optimize the IR which can be generated from the source code, right?

622

The test cases of ne has been covered. Check test2 in D150069. It seems the InstSimplify has done the canonicalization.

631

countLeadingZeros returns unsigned. We will still need one extra uint64_t cast.

peixin added a comment.Jul 4 2023, 1:52 AM

gentle ping

goldstein.w.n added inline comments.Jul 4 2023, 12:05 PM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
622

Is it maybe worth extend instsimplify to handle the eq case then?

631

its implicitly casted. At the very least use static_cast instead of C-style.

1786

Would generate prefer you move the replaceInstUsesWith to return of foldSelectIcmpAndZeroShl Think it keeps impl details a bit better contained.

nikic added inline comments.Jul 4 2023, 12:14 PM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
622

You can probably get a case where the ne predicate stays if you add an additional use of the icmp (e.g. pass it to a call). But given that it does get canonicalized for the common case, I won't insist...

631

I agree that getZExtValue() makes more sense here, it's semantically an unsigned value.

1786

I don't really get why that would be better. This is a general InstCombine convention: Either your helper returns Instruction and that gets directly returned, or it returns Value and you call replaceInstUsesWith. You'll see this in a lot of the surrounding code. Moving replaceInstUsesWith into the helper also requires passing InstCombiner to it.

goldstein.w.n added inline comments.Jul 4 2023, 12:28 PM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
1786

not a hill I'll die on, just think if(Instruction * R = foo()) return R; is the simplest convention to have.

peixin updated this revision to Diff 537675.Jul 6 2023, 5:35 AM

Thanks @goldstein.w.n and @nikic for the comments. Addressed them.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
622

Got it. Thanks.

631

OK. Fixed.

1786

The code in line 1774-1803 have both of the formats. So it should be OK to use either of them.

peixin added a comment.Jul 7 2023, 9:29 AM

Sorry for the continued ping. I hope to finish this patch by the end of next week due to work changes.

nikic added a comment.Jul 8 2023, 1:02 PM

Implementation looks fine, but this is missing some test coverage. Can you please also add an alive2 proof to the patch description?

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
628

K -> C2 is the more typical convention

llvm/test/Transforms/InstCombine/select-icmp-and-zero-shl.ll
5–6

@test_eq

29–30

@test_ne

53–54

@test_nuw_dropped

65–66

@test_nsw_dropped

168

Missing tests:

  • Multi-use test. Add additional uses of the and, icmp and shift by passing them to a call. We still want to perform the fold in that case.
  • Negative test where the leading bits don't match.
  • Negative test for non-equality predicate.
  • Negative test where the select constant is not 0.
  • Negative test where the icmp constant is not 0.
peixin updated this revision to Diff 538411.Jul 9 2023, 2:20 AM
peixin edited the summary of this revision. (Show Details)

Thanks @nikic for the comments. Addressed them:

  1. Add negative and multi-use tests in D150069.
  2. Change K -> C2.
  3. Add alive2 proof in patch description.
  4. Fix the implementation by generating shl instruction instead of unset flags. Unsetting nuw/nsw flags was wrong when shl instruction has other use.
peixin updated this revision to Diff 538574.Jul 10 2023, 3:59 AM

Try to fix builtbot fail on debian x64.

nikic added a comment.Jul 10 2023, 2:35 PM

Fix the implementation by generating shl instruction instead of unset flags. Unsetting nuw/nsw flags was wrong when shl instruction has other use.

Generating a new shl isn't wrong, but I think your previous approach was better. The reason is that the shl with and without nowrap flags are going to be GVN/CSEd anyway, and the nowrap flags will be dropped in the process. So it makes more sense to me to directly go to the final form. Losing some nowrap flags is just the cost of this transform.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
625–639

This avoids duplicating the checks for ne and eq.

peixin updated this revision to Diff 539063.Jul 11 2023, 6:34 AM
peixin edited the summary of this revision. (Show Details)

Fix the implementation by generating shl instruction instead of unset flags. Unsetting nuw/nsw flags was wrong when shl instruction has other use.

Generating a new shl isn't wrong, but I think your previous approach was better. The reason is that the shl with and without nowrap flags are going to be GVN/CSEd anyway, and the nowrap flags will be dropped in the process. So it makes more sense to me to directly go to the final form. Losing some nowrap flags is just the cost of this transform.

Good to know it. Thanks for it.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
625–639

Good point. Fixed.

nikic accepted this revision.Jul 11 2023, 7:02 AM

LGTM

I think the alive2 link should be more along these lines: https://alive2.llvm.org/ce/z/dEW3Ak We should check the generic transform, not specific constants.

This revision is now accepted and ready to land.Jul 11 2023, 7:02 AM
peixin edited the summary of this revision. (Show Details)Jul 11 2023, 7:06 PM
peixin updated this revision to Diff 539480.Jul 12 2023, 3:47 AM

Fix clang-format issue.

This revision was automatically updated to reflect the committed changes.