This goes back to a discussion about IR canonicalization. We'd like to preserve and convert more IR to 'select' than we currently do because that's likely the best choice in IR:
...but that's often not true for codegen, so we need to account for this pattern coming into the backend and transform it to better DAG ops.
Steps in this patch:
- Add an EVT param to the existing convertSelectOfConstantsToMath() TLI hook to more finely enable this transform. Other targets will probably want that anyway to distinguish scalars from vectors, but we need it here because these folds cause an infinite loop with AVX512 vectors.
- Convert vselect to math or logic. Credit to @RKSimon for suggesting the xor/and/xor hack instead of add/sub for the general case (also see https://graphics.stanford.edu/~seander/bithacks.html#MaskedMerge). Bitwise ops should always be more amenable to further folding, so we don't need to add even more special cases for -1/0 constants here.
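For reference, the xor/and/xor masked merge can be sketched in plain C++ as below. This is only an illustration of the identity, not the DAG combine itself: the helper names maskedMerge and vselectConstants are hypothetical, and the sketch assumes each condition lane has already been extended to an all-ones or all-zeros mask. With constant operands, the t ^ f term folds to a single constant, which leaves one 'and' and one 'xor' per select.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an LLVM API): select between t and f with bitwise ops.
// Assumes mask is all-ones (pick t) or all-zeros (pick f).
constexpr uint32_t maskedMerge(uint32_t mask, uint32_t t, uint32_t f) {
  return ((t ^ f) & mask) ^ f;
}

// Compile-time check against the plain select semantics for one pair of values.
static_assert(maskedMerge(0xFFFFFFFFu, 42u, 7u) == 42u, "true lane picks t");
static_assert(maskedMerge(0x00000000u, 42u, 7u) == 7u, "false lane picks f");

// Hypothetical 4-lane version with constant operands: t ^ f is computed once
// (a compiler would fold it to a constant), leaving an 'and' and an 'xor' per lane.
inline std::array<uint32_t, 4> vselectConstants(const std::array<uint32_t, 4> &mask,
                                                uint32_t t, uint32_t f) {
  const uint32_t diff = t ^ f;
  std::array<uint32_t, 4> result{};
  for (std::size_t i = 0; i < result.size(); ++i)
    result[i] = (mask[i] & diff) ^ f;
  return result;
}
```

When the mask is all-ones, the expression reduces to (t ^ f) ^ f == t; when it is zero, it reduces to 0 ^ f == f, so no special handling of -1/0 constants is needed in the sketch.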
Try to verify the logic in Alive:
For x86, blendv* is always a multi-uop / multi-cycle instruction according to Agner's docs, so it always makes sense to replace that with simpler instructions. I'm not sure if the same is true for PPC.
I.e., if this:
-; CHECK-NEXT: xxsel 34, 51, 50, 34
+; CHECK-NEXT: xxland 0, 34, 50
+; CHECK-NEXT: xxlxor 34, 0, 51
...is not a good optimization in general, we could make the TLI hook distinguish between constants as a further refinement. Another possibility is to convert that back into a select as a machine-instruction-level fold predicated on uarch details?