This improves the combine I included in D66504 to handle constants in the upper operands of the concat. If we can constant fold them away we can pull the concat after the bin op. This helps with chains of madd reductions on X86 from loop unrolling. The loop madd reduction pattern creates pmaddwd with half the width of the add that follows it using zeroes to fill the upper bits. If we have two of these added together we can pull the zeroes through the accumulating add and then shrink it.
Details
Details
Diff Detail
Diff Detail
Event Timeline
llvm/test/CodeGen/X86/madd.ll | ||
---|---|---|
2761 | We were just barely able to prove the bits were all 0 and therefore disjoint, but weren't able to remove the ADD/OR completely because the disjoint bits code gets to go one level deeper in computeKnownBits then SimplifyDemandedBits. This is because the disjoint check starts at a depth of 0 for the operands, but the SimplifyDemandedBits starts at a depth of 0 for the OR node itself. |
We were just barely able to prove the bits were all 0 and therefore disjoint, but weren't able to remove the ADD/OR completely because the disjoint bits code gets to go one level deeper in computeKnownBits then SimplifyDemandedBits. This is because the disjoint check starts at a depth of 0 for the operands, but the SimplifyDemandedBits starts at a depth of 0 for the OR node itself.