Fold (a == 0) : 0 ? a - 1 into usub.sat(a, 1)
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Proof: https://alive2.llvm.org/ce/z/a2jKgY
Is it possible/sensible to integrate this into canonicalizeSaturatedSubtract()? It seems like this is essentially the same, just for the ult 1 -> eq 0 special case. But maybe that is more awkward than a separate fold.
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp | ||
---|---|---|
881 | Not necessary, icmp constants are canonicalized to the right. | |
887 | This is incorrect: If we swap select operands, we also need to invert the predicate. What you want to do is check whether the predicate is ne and then invert and swap. I believe this can only happen in multi-use scenarios, e.g.: https://llvm.godbolt.org/z/5Pxo3f8WE | |
893 | It probably makes more sense to use m_SpecificInt here. While this CreateNeg call will not actually create an instruction, it looks like it could. | |
894 | Limited to just the constant case, I don't think we need to handle sub at all, it will always be canonicalized to add. | |
llvm/test/Transforms/InstCombine/saturating-add-sub.ll | ||
533 | Especially if this is implemented as a separate fold, we'd want some negative test cases here, e.g. incorrect constants (vary zero/one), incorrect operands (not %a both time), incorrect icmp predicate. As you are using m_Zero() matchers, which allow undef/poison in vectors, we'd want to test that as well. |
Inline decrement check into canonicalizeSaturatedSubtract and add some negative test cases
llvm/test/Transforms/InstCombine/saturating-add-sub.ll | ||
---|---|---|
533 | Sorry, I'm not quite sure how to test the vector case you mentioned. |
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp | ||
---|---|---|
815 | This is not going to trigger, because everything here is inside Pred == ICmpInst::ICMP_EQ. You'd want to check for isEquality() first (which accepts eq and ne), then have a check for ne where you invert the predicate and swap true/false, and then do the check for zero on the true value. I'd suggest adding this test case to make sure the ne pattern is matched: define i8 @test(i8 %a) { %i = icmp ne i8 %a, 0 call void @use.i1(i1 %i) %i1 = add i8 %a, -1 %i2 = select i1 %i, i8 %i1, i8 0 ret i8 %i2 } declare void @use.i1(i1) | |
825 | You can match -1 using m_AllOnes(). |
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp | ||
---|---|---|
809–830 | You could take this piece of code (starting at if (match(TrueVal) and move it to the top of the function, because this bit of logic is the same for both cases. | |
816 | Should be getInversePredicate(), though it doesn't actually matter here (as you are not checking the Pred afterwards anymore). | |
825 | m_APInt(Added) can be replaced with m_AllOnes() here. Below you'd create the constant with value 1 then. | |
llvm/test/Transforms/InstCombine/saturating-add-sub.ll | ||
615 | 2 would be better, to prevent the select from folding away. |
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp | ||
---|---|---|
817–837 | I believe this should be sufficient now. The rest is handled by the shared bit of code. |
LGTM
If you don't have commit access, can you please share the "Name <email>" to use for the commit?
Thanks for all of the help improving this patch 🙂 I'd like to use "Jamie Hill-Daniel <jamie@hill-daniel.co.uk>"
Just as a heads-up, this regresses one of Rust's codegen tests: fn check_foo2 in [issue-45222.rs](https://github.com/rust-lang/rust/blob/master/src/test/codegen/issue-45222.rs) is no longer being constant-folded, AFAICT because IndVarSimplifyPass no longer works on the %a = tail call i64 @llvm.usub.sat.i64(i64 %b, i64 1) generated by this patch (instead of %a = add i64 %b, -1 without this patch).
The full module IR is at https://gist.github.com/TimNN/e9a762026083ff265bdccbbcb34be956 (sorry, not minimized). Running it through opt -O3 without this patch constant-folds the loop, with this patch the loop remains.
I don't know if this is significant enough to warrant reverting the patch (I'm guessing not). Let me know if you have suggestions on how to proceed. (Is this a problem with the transform or does IndVarSimplifyPass need to become smarter?)
edit: updated the gist with a reduced test case.
Before this patch, CVP converts
define i32 @test() { entry: br label %loop loop: %iv = phi i32 [ 1000, %entry ], [ %iv.next, %loop ] %count = phi i32 [ 0, %entry ], [ %count.next, %loop ] %cmp = icmp eq i32 %iv, 0 %iv.dec = add i32 %iv, -1 %iv.next = select i1 %cmp, i32 0, i32 %iv.dec %count.next = add i32 %count, 1 br i1 %cmp, label %exit, label %loop exit: ret i32 %count }
into
define i32 @test() { entry: br label %loop loop: ; preds = %loop, %entry %iv = phi i32 [ 1000, %entry ], [ %iv.dec, %loop ] %count = phi i32 [ 0, %entry ], [ %count.next, %loop ] %cmp = icmp eq i32 %iv, 0 %iv.dec = add i32 %iv, -1 %iv.next = select i1 %cmp, i32 0, i32 %iv.dec %count.next = add i32 %count, 1 br i1 %cmp, label %exit, label %loop exit: ; preds = %loop ret i32 %count }
The difference is that %iv.next in the phi node is replaced with %iv.dec, because the select and branch have the same condition.
After this patch, we instead have:
define i32 @test() { entry: br label %loop loop: ; preds = %loop, %entry %iv = phi i32 [ 1000, %entry ], [ %iv.next, %loop ] %count = phi i32 [ 0, %entry ], [ %count.next, %loop ] %cmp = icmp eq i32 %iv, 0 %iv.next = call i32 @llvm.usub.sat.i32(i32 %iv, i32 1) %count.next = add i32 %count, 1 br i1 %cmp, label %exit, label %loop exit: ; preds = %loop ret i32 %count }
The same optimization is still possible in principle, in that we could convert the usub.sat into a sub in this case.
This change caused a regression in AMDGPU backend.
In case optimization is done before inline, it covers opportunities for other inst-combines that may have higher precedence.
As a result, we get suboptimal code.
The example is in the https://reviews.llvm.org/D147146 which I have created to propose a possible fix.
This is not going to trigger, because everything here is inside Pred == ICmpInst::ICMP_EQ. You'd want to check for isEquality() first (which accepts eq and ne), then have a check for ne where you invert the predicate and swap true/false, and then do the check for zero on the true value.
I'd suggest adding this test case to make sure the ne pattern is matched: