A lot of NEON intrinsics work lane-wise, meaning that elements not demanded in the output are not demanded in the input either. This teaches that to AArch64TTIImpl::simplifyDemandedVectorEltsIntrinsic for some simple single-input truncate intrinsics, which can help remove unnecessary instructions in the final result.
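As a rough illustration of the lane-wise property described above, here is a standalone toy model in plain C++. It does not use any LLVM API: truncLanes, demandedInputLanes, simplifyInput, sameOnDemanded and the LaneMask type are hypothetical names made up for this sketch, and the filler value merely stands in for whatever a real simplification would put in a lane nobody reads. It only shows why, for a single-input lane-wise truncate, the demanded mask can be passed straight through to the operand.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical toy types: a 4-lane wide vector, its truncated form, and a
// bitmask where bit i set means "lane i of the result is demanded".
constexpr std::size_t kLanes = 4;
using Wide = std::array<std::uint32_t, kLanes>;
using Narrow = std::array<std::uint16_t, kLanes>;
using LaneMask = std::uint8_t;

// Toy stand-in for a lane-wise, single-input truncate: result lane i depends
// only on input lane i.
Narrow truncLanes(const Wide &in) {
  Narrow out{};
  for (std::size_t i = 0; i < kLanes; ++i)
    out[i] = static_cast<std::uint16_t>(in[i]);
  return out;
}

// For a lane-wise operation, the lanes demanded of the input are exactly the
// lanes demanded of the output.
LaneMask demandedInputLanes(LaneMask demandedOut) {
  assert(demandedOut < (1u << kLanes) && "mask has bits beyond the vector width");
  return demandedOut;
}

// Overwrite every input lane that is not demanded with an arbitrary filler,
// modelling the freedom to simplify those lanes.
Wide simplifyInput(const Wide &in, LaneMask demandedIn, std::uint32_t filler) {
  Wide out = in;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (((demandedIn >> i) & 1u) == 0)
      out[i] = filler;
  return out;
}

// Compare two results on the demanded lanes only.
bool sameOnDemanded(const Narrow &a, const Narrow &b, LaneMask demanded) {
  for (std::size_t i = 0; i < kLanes; ++i)
    if (((demanded >> i) & 1u) != 0 && a[i] != b[i])
      return false;
  return true;
}
```

If only lanes 0 and 1 of the result are used, replacing input lanes 2 and 3 with the filler leaves the demanded result lanes unchanged, which is the property that lets the operand be simplified.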
Event Timeline
The change looks sensible to me, although I was a little confused by the commit message. It looks like what your patch is doing is just adding an AArch64 version of simplifyDemandedVectorEltsIntrinsic that allows us to potentially simplify the input operand to the intrinsic based on the demanded elements. So, for example, if we know that we are only going to use the first N elements of the intrinsic result, we can use that information to simplify the intrinsic operand too.
Yep. I had written "single element" where I meant "single source". I can see that being confusing, but you have the right idea.
We could do it for binops too, but I've not looked at those here; this only handles truncates with a single input.