This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold zero check followed by decrement to usub.sat
ClosedPublic

Authored by clubby789 on Dec 30 2022, 10:44 PM.

Details

Summary

Fold (a == 0) : 0 ? a - 1 into usub.sat(a, 1)

Diff Detail

Event Timeline

clubby789 created this revision.Dec 30 2022, 10:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 30 2022, 10:44 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
clubby789 requested review of this revision.Dec 30 2022, 10:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 30 2022, 10:44 PM

Proof: https://alive2.llvm.org/ce/z/a2jKgY

Is it possible/sensible to integrate this into canonicalizeSaturatedSubtract()? It seems like this is essentially the same, just for the ult 1 -> eq 0 special case. But maybe that is more awkward than a separate fold.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
899

Not necessary, icmp constants are canonicalized to the right.

905

This is incorrect: If we swap select operands, we also need to invert the predicate. What you want to do is check whether the predicate is ne and then invert and swap. I believe this can only happen in multi-use scenarios, e.g.: https://llvm.godbolt.org/z/5Pxo3f8WE

911

It probably makes more sense to use m_SpecificInt here. While this CreateNeg call will not actually create an instruction, it looks like it could.

912

Limited to just the constant case, I don't think we need to handle sub at all, it will always be canonicalized to add.

llvm/test/Transforms/InstCombine/saturating-add-sub.ll
533

Especially if this is implemented as a separate fold, we'd want some negative test cases here, e.g. incorrect constants (vary zero/one), incorrect operands (not %a both time), incorrect icmp predicate.

As you are using m_Zero() matchers, which allow undef/poison in vectors, we'd want to test that as well.

clubby789 updated this revision to Diff 485771.Dec 31 2022, 7:16 AM

Inline decrement check into canonicalizeSaturatedSubtract and add some negative test cases

clubby789 marked 4 inline comments as done.Dec 31 2022, 7:18 AM
clubby789 added inline comments.
llvm/test/Transforms/InstCombine/saturating-add-sub.ll
533

Sorry, I'm not quite sure how to test the vector case you mentioned.

clubby789 updated this revision to Diff 485821.Jan 1 2023, 2:02 PM

Fix for vectors and update test

clubby789 updated this revision to Diff 485825.Jan 1 2023, 3:48 PM
clubby789 updated this revision to Diff 485837.Jan 1 2023, 6:54 PM
nikic added inline comments.Jan 2 2023, 7:48 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
815

This is not going to trigger, because everything here is inside Pred == ICmpInst::ICMP_EQ. You'd want to check for isEquality() first (which accepts eq and ne), then have a check for ne where you invert the predicate and swap true/false, and then do the check for zero on the true value.

I'd suggest adding this test case to make sure the ne pattern is matched:

define i8 @test(i8 %a) {
  %i = icmp ne i8 %a, 0
  call void @use.i1(i1 %i)
  %i1 = add i8 %a, -1
  %i2 = select i1 %i, i8 %i1, i8 0
  ret i8 %i2
}

declare void @use.i1(i1)
825

You can match -1 using m_AllOnes().

clubby789 updated this revision to Diff 485878.Jan 2 2023, 11:45 AM

Fix icmp ne case

nikic added inline comments.Jan 4 2023, 1:58 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
816

Should be getInversePredicate(), though it doesn't actually matter here (as you are not checking the Pred afterwards anymore).

825

m_APInt(Added) can be replaced with m_AllOnes() here. Below you'd create the constant with value 1 then.

839–848

You could take this piece of code (starting at if (match(TrueVal) and move it to the top of the function, because this bit of logic is the same for both cases.

llvm/test/Transforms/InstCombine/saturating-add-sub.ll
615

2 would be better, to prevent the select from folding away.

clubby789 updated this revision to Diff 486285.Jan 4 2023, 7:29 AM
nikic added inline comments.Jan 6 2023, 2:43 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
845–865

I believe this should be sufficient now. The rest is handled by the shared bit of code.

clubby789 updated this revision to Diff 486920.Jan 6 2023, 10:06 AM
nikic accepted this revision.Jan 7 2023, 10:02 AM

LGTM

If you don't have commit access, can you please share the "Name <email>" to use for the commit?

This revision is now accepted and ready to land.Jan 7 2023, 10:02 AM

If you don't have commit access, can you please share the "Name <email>" to use for the commit?

Thanks for all of the help improving this patch 🙂 I'd like to use "Jamie Hill-Daniel <jamie@hill-daniel.co.uk>"

This revision was automatically updated to reflect the committed changes.
TimNN added a subscriber: TimNN.EditedJan 10 2023, 6:50 AM

Just as a heads-up, this regresses one of Rust's codegen tests: fn check_foo2 in [issue-45222.rs](https://github.com/rust-lang/rust/blob/master/src/test/codegen/issue-45222.rs) is no longer being constant-folded, AFAICT because IndVarSimplifyPass no longer works on the %a = tail call i64 @llvm.usub.sat.i64(i64 %b, i64 1) generated by this patch (instead of %a = add i64 %b, -1 without this patch).

The full module IR is at https://gist.github.com/TimNN/e9a762026083ff265bdccbbcb34be956 (sorry, not minimized). Running it through opt -O3 without this patch constant-folds the loop, with this patch the loop remains.

I don't know if this is significant enough to warrant reverting the patch (I'm guessing not). Let me know if you have suggestions on how to proceed. (Is this a problem with the transform or does IndVarSimplifyPass need to become smarter?)

edit: updated the gist with a reduced test case.

nikic added a comment.Jan 11 2023, 1:56 AM

Before this patch, CVP converts

define i32 @test() {
entry: 
  br label %loop 
   
loop:
  %iv = phi i32 [ 1000, %entry ], [ %iv.next, %loop ]
  %count = phi i32 [ 0, %entry ], [ %count.next, %loop ] 
  %cmp = icmp eq i32 %iv, 0
  %iv.dec = add i32 %iv, -1 
  %iv.next = select i1 %cmp, i32 0, i32 %iv.dec 
  %count.next = add i32 %count, 1 
  br i1 %cmp, label %exit, label %loop 
   
exit:
  ret i32 %count 
}

into

define i32 @test() {
entry:
  br label %loop

loop:                                             ; preds = %loop, %entry
  %iv = phi i32 [ 1000, %entry ], [ %iv.dec, %loop ]
  %count = phi i32 [ 0, %entry ], [ %count.next, %loop ]
  %cmp = icmp eq i32 %iv, 0
  %iv.dec = add i32 %iv, -1
  %iv.next = select i1 %cmp, i32 0, i32 %iv.dec
  %count.next = add i32 %count, 1
  br i1 %cmp, label %exit, label %loop

exit:                                             ; preds = %loop
  ret i32 %count
}

The difference is that %iv.next in the phi node is replaced with %iv.dec, because the select and branch have the same condition.

After this patch, we instead have:

define i32 @test() {
entry:
  br label %loop

loop:                                             ; preds = %loop, %entry
  %iv = phi i32 [ 1000, %entry ], [ %iv.next, %loop ]
  %count = phi i32 [ 0, %entry ], [ %count.next, %loop ]
  %cmp = icmp eq i32 %iv, 0
  %iv.next = call i32 @llvm.usub.sat.i32(i32 %iv, i32 1)
  %count.next = add i32 %count, 1
  br i1 %cmp, label %exit, label %loop

exit:                                             ; preds = %loop
  ret i32 %count
}

The same optimization is still possible in principle, in that we could convert the usub.sat into a sub in this case.

alex-t added a subscriber: alex-t.Mar 29 2023, 6:29 AM

This change caused a regression in AMDGPU backend.
In case optimization is done before inline, it covers opportunities for other inst-combines that may have higher precedence.
As a result, we get suboptimal code.

The example is in the https://reviews.llvm.org/D147146 which I have created to propose a possible fix.