Added patterns to implement select i1 %p, <vty> %a, <vty> %b
We already use whilelo to lower SPLAT_VECTOR; I think that ends up being one fewer vector instruction. What's the tradeoff between that and dup+cmpne?
There's no specific reason to choose dup+cmpne, other than that in these patterns the GPR32 can hold any non-zero value, whereas whilelo would need a sign-extended i1 value.
I'm happy to change it to use whilelo, but I think it then needs to be implemented in ISelLowering code where we can do analysis on the value of the predicate using computeKnownBits, rather than using patterns. Is that correct?
It's possible to write a TableGen pattern that explicitly generates an SBFM. In general that's what we would generate anyway. But you'd want to do custom lowering to give DAGCombine a chance to combine it away, yes.
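To make the tradeoff above concrete, here is a minimal Python sketch of the two ways of building the predicate. It is a behavioural model only, not the patch's code: the helper names (`dup_cmpne`, `whilelo_from_zero`, `sign_extend_bit0`) are hypothetical, lanes are plain Python booleans, dup is assumed to use elements wide enough to hold the whole scalar, and whilelo is modelled only for a start value of zero.

```python
MASK64 = (1 << 64) - 1


def dup_cmpne(scalar, lanes):
    """Model of dup + cmpne #0: broadcast the scalar, then test each lane != 0.

    Assumption: the element size is wide enough to hold the whole scalar.
    """
    broadcast = [scalar] * lanes
    return [lane != 0 for lane in broadcast]


def sign_extend_bit0(scalar):
    """Model of an SBFM-style sign extension of bit 0 to 64 bits (0 or all ones)."""
    return MASK64 if (scalar & 1) else 0


def whilelo_from_zero(limit, lanes):
    """Model of whilelo with a zero start: lane i is active while i < limit (unsigned)."""
    limit &= MASK64
    return [i < limit for i in range(lanes)]


# Any non-zero scalar gives an all-true predicate with dup + cmpne.
print(dup_cmpne(5, 4))
# A raw 1 fed to whilelo only activates the first lane ...
print(whilelo_from_zero(1, 4))
# ... so the i1 has to be sign-extended to all ones first.
print(whilelo_from_zero(sign_extend_bit0(1), 4))
```

In this model the sign extension is what turns a "true" i1 into a limit larger than any lane index, which is why the whilelo form needs the extra scalar instruction unless it can be combined away.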
- select now lowers to splat_vector for the predicate (which codegens to whilelo) plus vselect.
- Updated tests and rebased patch.
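As a rough illustration of the updated lowering, the following Python sketch models a scalar-condition select as a predicate splat followed by a lane-wise vselect. The function names are hypothetical and vectors are ordinary lists; it shows the intended semantics only, not the generated code.

```python
def splat_predicate(cond, lanes):
    """Model of splat_vector on an i1: every predicate lane gets the same bit."""
    return [bool(cond)] * lanes


def vselect(pred, a, b):
    """Model of vselect: take the lane from a where the predicate is set, else from b."""
    return [x if m else y for m, x, y in zip(pred, a, b)]


def select_scalar_cond(cond, a, b):
    """Model of select i1 %p, <vty> %a, <vty> %b as splat_vector + vselect."""
    return vselect(splat_predicate(cond, len(a)), a, b)


print(select_scalar_cond(True, [1, 2, 3, 4], [9, 8, 7, 6]))
print(select_scalar_cond(False, [1, 2, 3, 4], [9, 8, 7, 6]))
```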