Posting this patch seeking guidance...
This patch adds lowering code for fixed-width select. There's an issue with the select's condition operand though. Fixed-width vectors of i1 (e.g. v8i1) aren't legal, so they are promoted to a larger legal integer vector (e.g. v8i64). We run into problems when we lower these fixed-width masks to their legal scalable counterparts (e.g. nxv2i1).
Here's a concrete example:
In AArch64TargetLowering::useSVEForFixedLengthVectorVT(...), we have this comment:
// Fixed length predicates should be promoted to i8.
// NOTE: This is consistent with how NEON (and thus 64/128bit vectors) work.
That's problematic for a select like this:
select <8 x i1> %mask, <8 x i64> %op1, <8 x i64> %op2
At VL=512, the v8i1 mask will be promoted to v8i64. In order to lower this to a scalable mask, we'd need to insert the v8i64 subvector into an nxv2i64, and then truncate that ZPR by performing a CMPNE against 0 to get the final nxv2i1 mask. Between the zero extend that promotes the vXi1 mask and the truncate that gets us back to an nxvXi1, there are a lot of extra instructions.
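To make the round trip concrete, here is a rough scalar model in C++ of what happens to the mask. It is only a sketch: the helper names (zeroExtendMask, compareNotZero, selectLanes) are made up for illustration and are not LLVM APIs, and plain arrays stand in for the v8i64 and predicate registers. It shows the mask being widened to 64-bit lanes and then narrowed back to one bit per lane by a compare against zero, which is the redundant work described above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model; none of these names exist in LLVM.
constexpr std::size_t NumLanes = 8;

using MaskI1 = std::array<bool, NumLanes>;            // stands in for v8i1
using VecI64 = std::array<std::uint64_t, NumLanes>;   // stands in for v8i64
using DataI64 = std::array<std::int64_t, NumLanes>;   // select operands

// Step 1: type legalization promotes the i1 lanes to i64 by zero extension.
VecI64 zeroExtendMask(const MaskI1 &Mask) {
  VecI64 Promoted{};
  for (std::size_t I = 0; I < NumLanes; ++I)
    Promoted[I] = Mask[I] ? 1u : 0u;
  return Promoted;
}

// Step 2: recover a one-bit-per-lane predicate by comparing each promoted
// lane against zero (the role CMPNE plays in the description above).
MaskI1 compareNotZero(const VecI64 &Promoted) {
  MaskI1 Pred{};
  for (std::size_t I = 0; I < NumLanes; ++I)
    Pred[I] = Promoted[I] != 0;
  return Pred;
}

// Step 3: the select itself, lane by lane, using the recovered predicate.
DataI64 selectLanes(const MaskI1 &Pred, const DataI64 &Op1,
                    const DataI64 &Op2) {
  DataI64 Result{};
  for (std::size_t I = 0; I < NumLanes; ++I)
    Result[I] = Pred[I] ? Op1[I] : Op2[I];
  return Result;
}
```

In this model compareNotZero(zeroExtendMask(M)) always gives back M, so steps 1 and 2 together are pure overhead from the select's point of view.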
What's the best way to proceed with this? Are these extra instructions something we live with? Or should we play with making the vXi1 mask types legal when we're lowering fixed width vectors? I don't have a good feel for how the latter will fit with the existing NEON mask though.
P.S. You'll notice the included tests do not have CHECK lines yet. I didn't want to do that work until there was a clear direction forward. I will post a small example and the generated assembly for the reviewers' convenience shortly.