Follow up D123453, add one-use limitation for
(X * C2) << C1 --> X * (C2 << C1)
to make consistent with
lshr (mul nuw x, MulC), ShAmtC -> mul nuw x, (MulC >> ShAmtC)
Differential D124183
[InstCombine] Add one use limitation for (X * C2) << C1 --> X * (C2 << C1) bcl5980 on Apr 21 2022, 9:24 AM. Authored by
Details Follow up D123453, add one-use limitation for (X * C2) << C1 --> X * (C2 << C1) to make consistent with lshr (mul nuw x, MulC), ShAmtC -> mul nuw x, (MulC >> ShAmtC)
Diff Detail
Unit Tests
Event TimelineComment Actions This is not consistent with our general folding rules. Does this regress e.g. the following case: https://godbolt.org/z/aWsvzW751 ? Comment Actions Can you help to explain the general folding rules? I'm a beginner so it will be grateful you can tell me the detail rules.
I think sequential mul-shr is also very easy to fold to two independent mul in backend but not the other way. If the microarchitecture has enough mul ports and don't care too much about power they can do this in backend.
No performance report. But the general IR change is also hard to get a large set perf data I think.
Thanks for the case you mentioned. Maybe I can add some more conditions to avoid the regression but now I think a rule for one-use is important for me. Comment Actions For instcombine, basically, the rule is: do not increase IR instruction count. Comment Actions I'm sorry but I mean the rule for one-use usage. Actually we have a lot of one-use in instcombine. And most of them also follow the rule do not increase IR instruction count. Comment Actions We may be going in circles here. Comment Actions One other reason I insist one-use is we can give programmer choice. If we remove one-use, there is no choice for programmer. Comment Actions @spatel Do you agree with @lebedev.ri to revert both one-use? If yes I will revert both of them. Comment Actions Apologies, i do not really understand what you mean by a choice here. Note that i was not asking for the revert, and there is really no need to revert the D123453, if needed it can be adjusted in a new commit. Comment Actions Ideally, the compiler would not give the programmer a "choice" of multiply asm output on these examples. We want to be able to reduce logically equivalent code to the same IR (canonicalization) and then optimize it to the ideal target assembly. Leaving holes in the canonicalization makes it harder to optimize maximally and consistently. That can also be frustrating for users when they think they have written the same program, but it compiles to different output. As noted, the general rule for instcombine is "don't increase the instruction count". In addition, if we can reduce the number of uses of variables or reduce the dependency chain, then that is usually good. That is the reason we would leave out the one-use check in the test example for this patch - "C" can be calculated directly from "A" rather than using the intermediate "B" value. The rules can be confusing because we have several exceptions:
I don't think we need to revert D123453. As discussed there, we could have made that patch consistent with this transform (no one-use check). I assumed from the codegen examples that there could be a performance regression without one-use, but there could also be improvements. We can remove the one-use as a follow-up to that patch and see if anyone complains. Comment Actions Programmer can determine the final result is mul+shift or two independent mul. unsigned a = i * 100; --> mul unsigned b = a * 4; --> shift unsigned a = i * 100; --> mul1 unsigned b = i * 400; --> mul2 Let programmer to decide which pattern is better. I'm pretty sure on GPU(CUDA/HIP) or power sensitive CPU plaform we prefer mul+shift.
Yeah, I mean do we need a new commit to remove one-use on D123453. Comment Actions Thanks for the detail explaination. Is that means our general rule is don't add one-use first, but if there are some perf regressions happened then we add it ? Comment Actions The general rule is to make the canonicalization with as few restrictions as possible, so avoid one-use checks if possible. But if we know or suspect that there can be regressions, we try to fix those by creating other (inverse) transforms in the backend for example.
Comment Actions @spatel, thanks for the answer. Based on the discussion I review all one-use in instcombine and I find some more similar cases without TODO comments: Instruction *InstCombinerImpl::visitAdd usub.sat(A, B) + B => umax(A, B) m_OneUse(m_Intrinsic<Intrinsic::usub_sat>(m_Value(A), m_Value(B))) Instruction *InstCombinerImpl::visitSub X - usub.sat(X, Y) => umin(X, Y) m_OneUse(m_Intrinsic<Intrinsic::usub_sat>(m_Specific(Op0), m_Value(Y))) Instruction *InstCombinerImpl::visitFAdd fadd (rdx 0.0, X), Y --> rdx Y, X m_OneUse(m_Intrinsic<Intrinsic::vector_reduce_fadd>(m_AnyZeroFP(), m_Value(X))) Instruction *InstCombinerImpl::visitFNeg -(X - Y) --> (Y - X) m_OneUse(m_FSub(m_Value(X), m_Value(Y))) static Instruction *foldComplexAndOrPatterns (~(A | B) & C) | ~(A | C) --> ~((B & C) | A) (~(A & B) | C) & ~(A & C) --> ~((B | C) & A) m_OneUse(m_c_BinOp(Opcode, m_Specific(A), m_Specific(C))) (~(A | B) & C) | ~(B | C) --> ~((A & C) | B) (~(A & B) | C) & ~(B & C) --> ~((A | C) & B) m_OneUse(m_c_BinOp(Opcode, m_Specific(B), m_Specific(C))) (~A & B & C) | ~(A | B) --> (C | ~B) & ~A (~A | B | C) & ~(A & B) --> (C & ~B) | ~A m_OneUse(m_c_BinOp(Opcode, m_Specific(A), m_Specific(B))) (~A & B & C) | ~(A | C) --> (B | ~C) & ~A (~A | B | C) & ~(A & C) --> (B & ~C) | ~A m_OneUse(m_c_BinOp(Opcode, m_Specific(A), m_Specific(C)) static Instruction *foldBitCastBitwiseLogic bitcast(logic(bitcast(X), Y)) --> logic'(X, bitcast(Y)) m_OneUse(m_BitCast(m_Value(X))) bitcast(logic(Y, bitcast(X))) --> logic'(bitcast(Y), X) m_OneUse(m_BitCast(m_Value(X))) bitcast(select(Cond, bitcast(X), Y)) --> select'(Cond, X, bitcast(Y)) m_OneUse(m_BitCast(m_Value(X)))) bitcast(select(Cond, Y, bitcast(X))) --> select'(Cond, bitcast(Y), X) m_OneUse(m_BitCast(m_Value(X)))) Instruction *InstCombinerImpl::foldICmpEquality Transform "icmp eq (trunc (lshr(X, cst1)), cst" to "icmp (and X, mask), cst" m_OneUse(m_LShr(m_Value(A), m_ConstantInt(ShAmt))) static Value *foldMulSelectToNegate mul (select Cond, 1, -1), OtherOp --> select Cond, OtherOp, -OtherOp mul OtherOp, (select Cond, 1, -1) --> select Cond, OtherOp, -OtherOp m_OneUse(m_Select(m_Value(Cond), m_One(), m_AllOnes())) mul (select Cond, -1, 1), OtherOp --> select Cond, -OtherOp, OtherOp mul OtherOp, (select Cond, -1, 1) --> select Cond, -OtherOp, OtherOp m_OneUse(m_Select(m_Value(Cond), m_AllOnes(), m_One())) fmul (select Cond, 1.0, -1.0), OtherOp --> select Cond, OtherOp, -OtherOp fmul OtherOp, (select Cond, 1.0, -1.0) --> select Cond, OtherOp, -OtherOp m_OneUse(m_Select(m_Value(Cond), m_SpecificFP(1.0), m_SpecificFP(-1.0))) fmul (select Cond, -1.0, 1.0), OtherOp --> select Cond, -OtherOp, OtherOp fmul OtherOp, (select Cond, -1.0, 1.0) --> select Cond, -OtherOp, OtherOp m_OneUse(m_Select(m_Value(Cond), m_SpecificFP(-1.0), m_SpecificFP(1.0))) Instruction *InstCombinerImpl::visitFMul (C1 / X) * C --> (C * C1) / X; list here, but based on the discussion with div, I don't think we should remove one-use m_OneUse(m_FDiv(m_Constant(C1), m_Value(X))) Instruction *InstCombinerImpl::visitShl (C2 << X) << C1 --> (C2 << C1) << X m_OneUse(m_Shl(m_Constant(C2), m_Value(X)))) Instruction *InstCombinerImpl::foldICmpBinOpEqualityWithConstant Replace ((add A, B) != C) with (A != C-B) if B & C are constants. BO->hasOneUse() For the xor case, we can xor two constants together, eliminating the explicit xor. Replace ((xor A, B) != 0) with (A != B) BO->hasOneUse() Instruction *InstCombinerImpl::foldICmpEquality A^c1 == C^c2 --> A == C^(c1^c2) Op1->hasOneUse() Transform (zext A) == (B & (1<<X)-1) --> A == (trunc B) Op0->hasOneUse() and (B & (1<<X)-1) == (zext A) --> A == (trunc B) Op1->hasOneUse() @spatel @lebedev.ri Comment Actions Some of these may be intentionally limited to one-use while others were just coded conservatively to avoid controversy. As I said earlier, I don't think it is possible to be 100% consistent (without a lot of work in the backend). So I think it would be better to spend your time on problems that will give us a known improvement instead of a theoretical improvement. There are many open issues for optimization, or you can look for new opportunities by analyzing real applications. |
@lebedev.ri @spatel
Based on the rules, can we remove this one-use also ?
(C2 << X) << C1 dependent on C2 << X
(C2 << C1) << X is independent with C2 << X
One similar case udiv+lshr, do we need add one-use if we add the transform?
https://alive2.llvm.org/ce/z/wk6n3q