This is PR37104.
PR6773 will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, andl+andn / andps+andnps / bic/bsl would be generated (see @out).
Now, they would no longer be generated (see @in).
So we need to make sure that they are still generated.
If the mask is constant, I currently always unfold it.
Otherwise, I use the hasAndNot() TLI hook.
For now, only handle scalars.
https://rise4fun.com/Alive/bO6
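For readers without access to the Alive link, the equivalence the unfold relies on can be sanity-checked by brute force. This is only a sketch: it assumes the canonical form is the masked merge ((x ^ y) & m) ^ y and the unfolded form is (x & m) | (y & ~m). The function names are hypothetical and not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumed canonical (xor-based) masked merge, as in @in.
constexpr uint8_t maskedMergeXor(uint8_t x, uint8_t y, uint8_t m) {
  return static_cast<uint8_t>(((x ^ y) & m) ^ y);
}

// Assumed unfolded form that maps onto and + andn, as in @out.
constexpr uint8_t maskedMergeAndNot(uint8_t x, uint8_t y, uint8_t m) {
  return static_cast<uint8_t>((x & m) | (y & static_cast<uint8_t>(~m)));
}

// Exhaustively compare both forms over every 8-bit (x, y, m) triple.
bool formsAgree() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      for (unsigned m = 0; m < 256; ++m)
        if (maskedMergeXor(static_cast<uint8_t>(x), static_cast<uint8_t>(y),
                           static_cast<uint8_t>(m)) !=
            maskedMergeAndNot(static_cast<uint8_t>(x), static_cast<uint8_t>(y),
                              static_cast<uint8_t>(m)))
          return false;
  return true;
}
```

Because both forms are purely bitwise, agreement at 8 bits implies agreement at any scalar width.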
I *really* don't like the code I wrote in DAGCombiner::unfoldMaskedMerge().
It is super fragile. Is there something like IR Pattern Matchers for this?
After stepping through more of your tests, I see why this is ugly.
We don't have to capture the intermediate values if the hasOneUse() checks are in the lambda(s) though. What do you think of this version:
// There are 3 commutable operators in the pattern, so we have to deal with
// 8 possible variants of the basic pattern.
SDValue X, Y, M;
auto matchAndXor = [&X, &Y, &M](SDValue And, unsigned XorIdx, SDValue Other) {
  if (And.getOpcode() != ISD::AND || !And.hasOneUse())
    return false;
  if (And.getOperand(XorIdx).getOpcode() != ISD::XOR ||
      !And.getOperand(XorIdx).hasOneUse())
    return false;
  SDValue Xor0 = And.getOperand(XorIdx).getOperand(0);
  SDValue Xor1 = And.getOperand(XorIdx).getOperand(1);
  if (Other == Xor0)
    std::swap(Xor0, Xor1);
  if (Other != Xor1)
    return false;
  X = Xor0;
  Y = Xor1;
  // The mask is the AND operand that is not the XOR.
  M = And.getOperand(XorIdx ? 0 : 1);
  return true;
};
if (!matchAndXor(A, 0, B) && !matchAndXor(A, 1, B) &&
    !matchAndXor(B, 0, A) && !matchAndXor(B, 1, A))
  return SDValue();
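To see that four lambda calls cover all 8 commuted variants, the matching logic can be tried outside of LLVM on a toy expression tree. This is a rough sketch under stated assumptions: `Node`, `Op`, `Match`, `matchMaskedMerge` and `countMatchedVariants` are hypothetical stand-ins (not SelectionDAG types), the hasOneUse() checks are omitted because the toy tree has no use lists, and the mask is taken to be the AND operand that is not the XOR.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for an SDValue: a tiny expression node.
enum class Op { Leaf, And, Xor };

struct Node {
  Op op;
  const Node *lhs;
  const Node *rhs;
  explicit Node(Op o, const Node *l = nullptr, const Node *r = nullptr)
      : op(o), lhs(l), rhs(r) {}
  const Node *operand(unsigned i) const { return i == 0 ? lhs : rhs; }
};

struct Match {
  const Node *X;
  const Node *Y;
  const Node *M;
};

// Try to match xor(A, B) as ((X ^ Y) & M) ^ Y in any commuted form.
bool matchMaskedMerge(const Node *A, const Node *B, Match &Out) {
  auto matchAndXor = [&Out](const Node *And, unsigned XorIdx,
                            const Node *Other) {
    if (And->op != Op::And)
      return false;
    const Node *Xor = And->operand(XorIdx);
    if (Xor->op != Op::Xor)
      return false;
    const Node *Xor0 = Xor->lhs;
    const Node *Xor1 = Xor->rhs;
    if (Other == Xor0)
      std::swap(Xor0, Xor1);
    if (Other != Xor1)
      return false;
    // Assumption: the mask is the AND operand that is not the XOR.
    Out = Match{Xor0, Xor1, And->operand(XorIdx ? 0 : 1)};
    return true;
  };
  return matchAndXor(A, 0, B) || matchAndXor(A, 1, B) ||
         matchAndXor(B, 0, A) || matchAndXor(B, 1, A);
}

// Build every commuted variant of ((x ^ y) & m) ^ y and count how many
// are matched with the expected X, Y and M bindings.
int countMatchedVariants() {
  Node x(Op::Leaf), y(Op::Leaf), m(Op::Leaf);
  int matched = 0;
  for (int bits = 0; bits < 8; ++bits) {
    Node inner = (bits & 1) ? Node(Op::Xor, &y, &x) : Node(Op::Xor, &x, &y);
    Node mid = (bits & 2) ? Node(Op::And, &m, &inner)
                          : Node(Op::And, &inner, &m);
    const Node *A = (bits & 4) ? &y : &mid;
    const Node *B = (bits & 4) ? &mid : &y;
    Match r{nullptr, nullptr, nullptr};
    if (matchMaskedMerge(A, B, r) && r.X == &x && r.Y == &y && r.M == &m)
      ++matched;
  }
  return matched;
}
```

Each bit of `bits` flips the operand order of one of the three commutable operators, so the loop enumerates exactly the 8 variants mentioned in the comment.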