We have a vector compare reduction problem seen in PR39665 comment 2:
https://bugs.llvm.org/show_bug.cgi?id=39665#c2
Or slightly reduced here:
define i1 @cmp2(<2 x double> %a0) {
  %a = fcmp ogt <2 x double> %a0, <double 1.0, double 1.0>
  %b = extractelement <2 x i1> %a, i32 0
  %c = extractelement <2 x i1> %a, i32 1
  %d = and i1 %b, %c
  ret i1 %d
}
SLP does not attempt to turn this into a vector reduction because there is an (artificial?) lower limit on that transform. I don't think we should have that limit: if the target's cost model says a reduction is cheaper (and it probably would be on x86), then we should do the transform.
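For illustration, here is roughly the shape I would expect from a 2-way 'and' reduction of the example. This is a hand-written sketch, not actual SLP output: the function name and the value names (%rdx.shuf, %bin.rdx) are made up, and the exact shuffle mask the vectorizer would pick is an assumption.

```llvm
; Hypothetical reduced form of @cmp2 (hand-written, not compiler output).
define i1 @cmp2_rdx(<2 x double> %a0) {
  %a = fcmp ogt <2 x double> %a0, <double 1.0, double 1.0>
  ; Move lane 1 into lane 0 so both compare results line up in lane 0.
  %rdx.shuf = shufflevector <2 x i1> %a, <2 x i1> undef, <2 x i32> <i32 1, i32 undef>
  ; Lane 0 now holds (%a[0] & %a[1]); lane 1 is don't-care.
  %bin.rdx = and <2 x i1> %a, %rdx.shuf
  %d = extractelement <2 x i1> %bin.rdx, i32 0
  ret i1 %d
}
```

This replaces two extracts and a scalar 'and' with one shuffle, one vector 'and', and one extract, so whether it is a win is exactly the kind of question the target's cost model should answer.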
Trying to compensate in the backend (D59669) for not doing the transform in IR is not going to work: we would need to duplicate large chunks of IR optimizations there. And it is clear that we can't do this as a target-independent canonicalization in instcombine, because it involves creating shuffles and vector ops.
bool Try2WayRdx = false ?