Hi @bsmith, this looks like a sensible optimisation! I suppose we can also do something similar when the input is an `and` too? i.e.:
```llvm
%mask = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 0x7FFF)
%and = tail call <vscale x 8 x i16> @llvm.aarch64.sve.and.nxv8i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a, <vscale x 8 x i16> %mask)
%splat = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 2)
%shr = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %and, <vscale x 8 x i16> %splat)
```
Do you mean SRSHL instead of SQSHL in all the comments?
nit: Maybe it's worth splitting the tests out into Positive and Negative tests, i.e. top half positive, bottom half negative? I think that makes it a bit easier to see what's going on.
I think it's probably not worth handling a case like this; the abs case is one we have explicitly seen in real-world code. We could in theory go further and add even more cases (shr as well, for example), but given there is no obvious generic way to do this, we'd have to add lots of special cases for each intrinsic we might care about, all just in case someone happened to write (or end up with) suboptimal ACLE code. srshl is not compiler-generated; it will only come from ACLE intrinsics. I think it's best to only extend this if and when we need it.
- Fix typos in comments, SQSHL -> SRSHL.
- Rearrange tests to have positive cases first.
I don't suppose m_NonNegative() works, or can be made to work, here?
Ideally I'd hope for something similar in the higher-up code paths so that perhaps we can catch more cases for free, but I think realistically this is the only place where it might work today out of the box.
I think you can use `IRBuilder<> Builder(II);` here.
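For reference, a minimal sketch of that suggestion (assuming `II` is the `IntrinsicInst *` being combined; the names here are illustrative, not from the patch):

```cpp
// The single-argument IRBuilder constructor sets the insertion point to
// just before II and picks up II's debug location, replacing the two-step
//   IRBuilder<> Builder(II->getContext());
//   Builder.SetInsertPoint(II);
// pattern with:
IRBuilder<> Builder(II);
```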