C1 - ((C3 - X) & C2) --> (X & C2) + (C1 - (C2 & C3))
when:
(C3 - ((C2 & C3) - 1)) is pow2 &&
((C2 + C3) & ((C2 & C3) - 1)) == ((C2 & C3) - 1) &&
(C2 is negative pow2 || (C3 - X) is nuw)
https://alive2.llvm.org/ce/z/HXQJV-
Fix: #58523
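As a quick sanity check of the fold (the alive2 link above is the general proof), here is a minimal, illustrative brute-force harness; the constants C1 = 0x2A, C2 = 0xF8, C3 = 0x0F are chosen for this sketch to satisfy the three conditions (C2 is a negative pow2 and C2 & C3 = 0x08):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

int main() {
  // C2 = 0xF8 is -8 in i8, a negative power of 2; C2 & C3 = 0x08,
  // so conditions 1 and 2 above hold for this constant assignment.
  const uint8_t C1 = 0x2A, C2 = 0xF8, C3 = 0x0F;
  for (unsigned I = 0; I != 256; ++I) {
    uint8_t X = static_cast<uint8_t>(I);
    uint8_t Before = static_cast<uint8_t>(C1 - ((C3 - X) & C2));
    uint8_t After = static_cast<uint8_t>((X & C2) + (C1 - (C2 & C3)));
    assert(Before == After && "fold mismatch");
  }
  puts("C1 - ((C3 - X) & C2) == (X & C2) + (C1 - (C2 & C3)) for all i8 X");
  return 0;
}
```

Unsigned 8-bit wraparound matches i8 semantics, so the equality holding for all 256 values mirrors the alive2 result for this one constant assignment.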
Differential D136582
[InstCombine] fold `sub + and` pattern with specific const value
Authored by bcl5980 on Oct 24 2022, 12:59 AM.
Comment Actions

I don't know if it would have any effect on this patch, but we seem to be missing a constant-shrinking (DemandedBits) opportunity for sub nuw: we can clear high bits of the mask constant based on the highest bit set in the subtract constant.

Comment Actions

Yeah, I also found the shrinking opportunity. That part has no effect on this patch, though: conditions 1/3 don't care about any high bits of C2.

Comment Actions

This patch still seems too specific.

Comment Actions

Thanks for the idea. It is a cleaner and simpler way.

Comment Actions

Good point. It seems we are missing some family of canonicalizations with subtract-from-constant and bitwise logic. There may be some common pre-condition with a low-bit mask and any logic op?

Comment Actions

The pattern is still complicated: And to avoid a potential regression, I prefer this pattern:

Comment Actions

I think this can be reduced to low-bit mask constraints:

Comment Actions

Yeah, if we only consider sub without nuw, it can be this pattern. But if we consider nuw, I think we still need to use the condition I sent before.

Comment Actions

Please pre-commit the baseline tests, so we will see the diffs here.

I'm still not sure if we want to add some more general transforms in addition to, or instead of, this fold. Should we have the subtract version of this patch for consistency?

In case it was not clear, that was a transform I suggested earlier. Depending on where it is added, I think the patch would look something like this:

```diff
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index c3dabc2d4a07..e7ae1a9be39c 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -2035,6 +2035,18 @@ Instruction *InstCombinerImpl::visitAnd(BinaryOperator &I) {
                                 ConstantInt::getNullValue(Ty));
     }
 
+    // If all bits affected by a sub are included in a high-bit-mask, do the
+    // mask op before the adjusted sub. Example:
+    // (0x0f - X) & 0xf8 --> 0x08 - (X & 0xf8)
+    const APInt *SubC;
+    if (C->isNegatedPowerOf2() &&
+        match(Op0, m_OneUse(m_Sub(m_APInt(SubC), m_Value(X)))) &&
+        (~*C).isSubsetOf(*SubC)) {
+      Value *NewAnd = Builder.CreateAnd(X, *C);
+      Constant *NewSubC = ConstantInt::get(Ty, *C & *SubC);
+      return BinaryOperator::CreateSub(NewSubC, NewAnd);
+    }
+
     Constant *C1, *C2;
     const APInt *C3 = C;
     Value *X;
```

I didn't see any immediate failures in regression tests with that patch applied.

Comment Actions

I guess D130080's motivation is that AMDGPU's load/store instructions have a very strong address pattern: they prefer the add close to the load/store.

I updated the code you pasted in https://reviews.llvm.org/D136582?id=472123. That code doesn't consider the nuw flag, so it misses some cases. And if we add constant shrinking for this case later, I think it will work even worse.

Comment Actions

Update based on @spatel's suggestion.

Comment Actions

I still think this is a very specific/narrow transform, but if this is important to optimize, then I won't hold it up. The code seems correct.
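To make the suggested high-bit-mask precondition concrete, the example from the diff comment, (0x0f - X) & 0xf8 --> 0x08 - (X & 0xf8), can also be brute-forced at i8 width. This is an illustrative harness only, not part of the patch; the names Mask and SubC merely mirror the snippet above:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

int main() {
  // Mask = 0xF8 is a negated power of 2 (-8 in i8); ~Mask = 0x07 is a
  // subset of SubC = 0x0F, so every bit the sub can affect is masked off.
  const uint8_t Mask = 0xF8, SubC = 0x0F;
  for (unsigned I = 0; I != 256; ++I) {
    uint8_t X = static_cast<uint8_t>(I);
    uint8_t Before = static_cast<uint8_t>((SubC - X) & Mask);
    // (Mask & SubC) is the adjusted constant 0x08 from the comment.
    uint8_t After = static_cast<uint8_t>((Mask & SubC) - (X & Mask));
    assert(Before == After && "fold mismatch");
  }
  puts("(0x0f - X) & 0xf8 == 0x08 - (X & 0xf8) for all i8 X");
  return 0;
}
```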
Comment Actions

I agree that the code is very limited, but I also worry about a solution that canonicalizes sub + and to and + sub.
Should this be: