This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] fold `sub + and` pattern with specific const value
ClosedPublic

Authored by bcl5980 on Oct 24 2022, 12:59 AM.

Details

Summary

C1 - ((C3 - X) & C2) --> (X & C2) + (C1 - (C2 & C3))
when:

(C3 - ((C2 & C3) - 1)) is pow2 &&
((C2 + C3) & ((C2 & C3) - 1)) == ((C2 & C3) - 1) && 
C2 is negative pow2 || (C3 - X) is nuw

https://alive2.llvm.org/ce/z/HXQJV-

Fix: #58523

Diff Detail

Event Timeline

bcl5980 created this revision.Oct 24 2022, 12:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 24 2022, 12:59 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
bcl5980 requested review of this revision.Oct 24 2022, 12:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 24 2022, 12:59 AM

Will precommit test after review.

bcl5980 planned changes to this revision.Oct 24 2022, 1:16 AM

the alive2 proof is not match the code. Need to fix.

bcl5980 updated this revision to Diff 470099.Oct 24 2022, 3:10 AM
bcl5980 edited the summary of this revision. (Show Details)

Only detect APInt because the condition is too complicated.

bcl5980 edited the summary of this revision. (Show Details)Oct 24 2022, 3:10 AM

vector test?

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
2054

use C2AddC3 .isSubsetOf(CMinus1)?

llvm/test/Transforms/InstCombine/sub.ll
2122

Add a comment describing the general fold

bcl5980 updated this revision to Diff 470107.Oct 24 2022, 3:36 AM

Address comments by @RKSimon

bcl5980 marked 2 inline comments as done.Oct 24 2022, 3:37 AM

I don't know if it would have any effect on this patch, but we seem to be missing a constant-shrinking (DemandedBits) opportunity for sub nuw:
https://alive2.llvm.org/ce/z/7Lb_Dy

We can clear high-bits of the mask constant based on the highest bit set in the subtract constant.

I don't know if it would have any effect on this patch, but we seem to be missing a constant-shrinking (DemandedBits) opportunity for sub nuw:
https://alive2.llvm.org/ce/z/7Lb_Dy

We can clear high-bits of the mask constant based on the highest bit set in the subtract constant.

Yeah, I also find the shrinking. And that part has no effect on this patch. Condition 1/3 don't care any high-bits for C2.

I don't know if it would have any effect on this patch, but we seem to be missing a constant-shrinking (DemandedBits) opportunity for sub nuw:
https://alive2.llvm.org/ce/z/7Lb_Dy

We can clear high-bits of the mask constant based on the highest bit set in the subtract constant.

Yeah, I also find the shrinking. And that part has no effect on this patch. Condition 1/3 don't care any high-bits for C2.

This patch still seems too specific.
Can we generalize this as a mask-of-subtract canonicalization instead - https://alive2.llvm.org/ce/z/qz_KmH ?
If we do that, an existing fold for subtract should reduce the motivating example.

I don't know if it would have any effect on this patch, but we seem to be missing a constant-shrinking (DemandedBits) opportunity for sub nuw:
https://alive2.llvm.org/ce/z/7Lb_Dy

We can clear high-bits of the mask constant based on the highest bit set in the subtract constant.

Yeah, I also find the shrinking. And that part has no effect on this patch. Condition 1/3 don't care any high-bits for C2.

This patch still seems too specific.
Can we generalize this as a mask-of-subtract canonicalization instead - https://alive2.llvm.org/ce/z/qz_KmH ?
If we do that, an existing fold for subtract should reduce the motivating example.

Thanks for the idea. It is a more clean and simplier way.
I only have a little concern about the subtract's combination after the canonicalization. For example:
https://alive2.llvm.org/ce/z/i5Qk9-
It will break exist and combination. Generally I think subtract's combination should be less than and.

I don't know if it would have any effect on this patch, but we seem to be missing a constant-shrinking (DemandedBits) opportunity for sub nuw:
https://alive2.llvm.org/ce/z/7Lb_Dy

We can clear high-bits of the mask constant based on the highest bit set in the subtract constant.

Yeah, I also find the shrinking. And that part has no effect on this patch. Condition 1/3 don't care any high-bits for C2.

This patch still seems too specific.
Can we generalize this as a mask-of-subtract canonicalization instead - https://alive2.llvm.org/ce/z/qz_KmH ?
If we do that, an existing fold for subtract should reduce the motivating example.

Thanks for the idea. It is a more clean and simplier way.
I only have a little concern about the subtract's combination after the canonicalization. For example:
https://alive2.llvm.org/ce/z/i5Qk9-
It will break exist and combination. Generally I think subtract's combination should be less than and.

Good point. It seems we are missing some family of canonicalizations with subtract-from-constant and bitwise logic. There may be some common pre-condition with a low-bit mask and any logic op?
https://alive2.llvm.org/ce/z/qrsyWe

I don't know if it would have any effect on this patch, but we seem to be missing a constant-shrinking (DemandedBits) opportunity for sub nuw:
https://alive2.llvm.org/ce/z/7Lb_Dy

We can clear high-bits of the mask constant based on the highest bit set in the subtract constant.

Yeah, I also find the shrinking. And that part has no effect on this patch. Condition 1/3 don't care any high-bits for C2.

This patch still seems too specific.
Can we generalize this as a mask-of-subtract canonicalization instead - https://alive2.llvm.org/ce/z/qz_KmH ?
If we do that, an existing fold for subtract should reduce the motivating example.

Thanks for the idea. It is a more clean and simplier way.
I only have a little concern about the subtract's combination after the canonicalization. For example:
https://alive2.llvm.org/ce/z/i5Qk9-
It will break exist and combination. Generally I think subtract's combination should be less than and.

Good point. It seems we are missing some family of canonicalizations with subtract-from-constant and bitwise logic. There may be some common pre-condition with a low-bit mask and any logic op?
https://alive2.llvm.org/ce/z/qrsyWe

The pattern is still complicate:
https://alive2.llvm.org/ce/z/Mx5tNC

And to avoid potential regression I prefer this pattern:
https://alive2.llvm.org/ce/z/TdYIkn

I don't know if it would have any effect on this patch, but we seem to be missing a constant-shrinking (DemandedBits) opportunity for sub nuw:
https://alive2.llvm.org/ce/z/7Lb_Dy

We can clear high-bits of the mask constant based on the highest bit set in the subtract constant.

Yeah, I also find the shrinking. And that part has no effect on this patch. Condition 1/3 don't care any high-bits for C2.

This patch still seems too specific.
Can we generalize this as a mask-of-subtract canonicalization instead - https://alive2.llvm.org/ce/z/qz_KmH ?
If we do that, an existing fold for subtract should reduce the motivating example.

Thanks for the idea. It is a more clean and simplier way.
I only have a little concern about the subtract's combination after the canonicalization. For example:
https://alive2.llvm.org/ce/z/i5Qk9-
It will break exist and combination. Generally I think subtract's combination should be less than and.

Good point. It seems we are missing some family of canonicalizations with subtract-from-constant and bitwise logic. There may be some common pre-condition with a low-bit mask and any logic op?
https://alive2.llvm.org/ce/z/qrsyWe

The pattern is still complicate:
https://alive2.llvm.org/ce/z/Mx5tNC

I think this can be reduced to low-bit mask constraints:
https://alive2.llvm.org/ce/z/dJqQHN
I also noticed a deficiency in demanded bits for sub. I don't know if that would cause trouble for this patch, but we probably want to do it to be consistent with add:
D136788

I don't know if it would have any effect on this patch, but we seem to be missing a constant-shrinking (DemandedBits) opportunity for sub nuw:
https://alive2.llvm.org/ce/z/7Lb_Dy

We can clear high-bits of the mask constant based on the highest bit set in the subtract constant.

Yeah, I also find the shrinking. And that part has no effect on this patch. Condition 1/3 don't care any high-bits for C2.

This patch still seems too specific.
Can we generalize this as a mask-of-subtract canonicalization instead - https://alive2.llvm.org/ce/z/qz_KmH ?
If we do that, an existing fold for subtract should reduce the motivating example.

Thanks for the idea. It is a more clean and simplier way.
I only have a little concern about the subtract's combination after the canonicalization. For example:
https://alive2.llvm.org/ce/z/i5Qk9-
It will break exist and combination. Generally I think subtract's combination should be less than and.

Good point. It seems we are missing some family of canonicalizations with subtract-from-constant and bitwise logic. There may be some common pre-condition with a low-bit mask and any logic op?
https://alive2.llvm.org/ce/z/qrsyWe

The pattern is still complicate:
https://alive2.llvm.org/ce/z/Mx5tNC

I think this can be reduced to low-bit mask constraints:
https://alive2.llvm.org/ce/z/dJqQHN
I also noticed a deficiency in demanded bits for sub. I don't know if that would cause trouble for this patch, but we probably want to do it to be consistent with add:
D136788

Yeah, only consider sub without nuw can be this pattern. But if we consider nuw, we still need to use the condition I send before I think.

bcl5980 updated this revision to Diff 471899.Oct 30 2022, 10:16 PM
bcl5980 retitled this revision from [InstCombine] fold sub pattern to and to [InstCombine] fold `sub + and` pattern with specific const value.
bcl5980 edited the summary of this revision. (Show Details)

Please pre-commit the baseline tests, so we will see the diffs here.

I'm still not sure if we want to add some more general transforms in addition to or instead of this fold.

Should we have the subtract version of this patch for consistency?
D130080

In case it was not clear, that was a transform I suggested earlier. Depending where it is added, I think the patch would look something like this:

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index c3dabc2d4a07..e7ae1a9be39c 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -2035,6 +2035,18 @@ Instruction *InstCombinerImpl::visitAnd(BinaryOperator &I) {
                                 ConstantInt::getNullValue(Ty));
     }
 
+    // If all bits affected by a sub are included in a high-bit-mask, do the
+    // mask op before the adjusted sub. Example:
+    // (0x0f - X) & 0xf8 --> 0x08 - (X & 0xf8)
+    const APInt *SubC;
+    if (C->isNegatedPowerOf2() &&
+        match(Op0, m_OneUse(m_Sub(m_APInt(SubC), m_Value(X)))) &&
+        (~*C).isSubsetOf(*SubC)) {
+      Value *NewAnd = Builder.CreateAnd(X, *C);
+      Constant *NewSubC = ConstantInt::get(Ty, *C & *SubC);
+      return BinaryOperator::CreateSub(NewSubC, NewAnd);
+    }
+
     Constant *C1, *C2;
     const APInt *C3 = C;
     Value *X;

I didn't see any immediate failures in regression tests with that patch applied.

bcl5980 added a comment.EditedOct 31 2022, 1:58 PM

Please pre-commit the baseline tests, so we will see the diffs here.

I'm still not sure if we want to add some more general transforms in addition to or instead of this fold.

Should we have the subtract version of this patch for consistency?
D130080

In case it was not clear, that was a transform I suggested earlier. Depending where it is added, I think the patch would look something like this:

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index c3dabc2d4a07..e7ae1a9be39c 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -2035,6 +2035,18 @@ Instruction *InstCombinerImpl::visitAnd(BinaryOperator &I) {
                                 ConstantInt::getNullValue(Ty));
     }
 
+    // If all bits affected by a sub are included in a high-bit-mask, do the
+    // mask op before the adjusted sub. Example:
+    // (0x0f - X) & 0xf8 --> 0x08 - (X & 0xf8)
+    const APInt *SubC;
+    if (C->isNegatedPowerOf2() &&
+        match(Op0, m_OneUse(m_Sub(m_APInt(SubC), m_Value(X)))) &&
+        (~*C).isSubsetOf(*SubC)) {
+      Value *NewAnd = Builder.CreateAnd(X, *C);
+      Constant *NewSubC = ConstantInt::get(Ty, *C & *SubC);
+      return BinaryOperator::CreateSub(NewSubC, NewAnd);
+    }
+
     Constant *C1, *C2;
     const APInt *C3 = C;
     Value *X;

I didn't see any immediate failures in regression tests with that patch applied.

I guess D130080's motivation is AMDGPU's load/store instruction have very strong address pattern that they prefer add close to load/store.
But for the sub, as far as I know they can't get any benifit from it. Actually, I think most of the backend can't support sub in load/store.
I prefer to keep sub+and except we make sure the sub can be optimized.

I update the code you paste in https://reviews.llvm.org/D136582?id=472123. This code doesn't consider the nuw flag, so it misses some cases. And if we add constant shrink for this case later, it works even worse I think.

bcl5980 updated this revision to Diff 472123.Oct 31 2022, 2:06 PM

Update based on @spatel 's suggestion.
I haven't update the summary for now because we can't make sure this patch is the final solution or not.

bcl5980 updated this revision to Diff 472128.Oct 31 2022, 2:10 PM

origin version rebased.

I still think this is a very specific/narrow transform, but if this is important to optimize, then I won't hold it up. The code seems correct.
See inline comments for a few nits.

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
2045

Should this be:

// (C3 - (C2 & C3) - 1) is pow2
llvm/test/Transforms/InstCombine/sub.ll
2231–2232

Remove TODO comment

2247–2248

Remove TODO comment

I still think this is a very specific/narrow transform, but if this is important to optimize, then I won't hold it up. The code seems correct.
See inline comments for a few nits.

I agree that the code is very limited, but I also worry about the solution canonicalize sub + and to and + sub.
If we find more cases get benefit from general canonicalization in the future, we can change to the canonicalization.

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
2045

If we want to match the alive2 proof, it should be:

(C3 - ((C2 & C3) - 1)) is pow2

and it is the same to

(C3 - (C2 & C3) + 1) is pow2
bcl5980 updated this revision to Diff 473117.Nov 3 2022, 8:57 PM
bcl5980 edited the summary of this revision. (Show Details)
spatel accepted this revision.Nov 4 2022, 6:52 AM

LGTM

This revision is now accepted and ready to land.Nov 4 2022, 6:52 AM