Co-authored-by: Paul Walker <email@example.com>
Hi @dmgreen! This is SVE-specific, and SVEIntrinsicOpts.cpp is where such transformations are typically placed (at least for now). I did a quick grep and it seems SVE intrinsics don't currently have much of a presence in generic passes like instcombine/constant folding, perhaps because some SVE optimisations are more complex than others and I guess it makes sense to keep them all in the same place.
X86 also has llvm/lib/Target/X86/X86InstCombineIntrinsic.cpp which houses instcombine-like optimisations for X86. 🙂
Hmm. "Because we've been doing it wrong" isn't a great reason to _keep_ doing it wrong :-)
This is very similar to https://reviews.llvm.org/rGbecaa6803ab532d15506829f0551a5fa49c39d7e, but special cased for zeroinit vectors. I do see this code at the start of that function, but it seems like it could still be made to work, and presumably in the long run some of the other constant folding there might be useful (like masked loads with zero masks).
// Do not iterate on scalable vector. The number of elements is unknown at // compile-time. if (isa<ScalableVectorType>(VTy)) return nullptr;
More generally, any backend can use instCombineIntrinsic to add intrinsic folds to instcombine. I can appreciate that not everything that SVECombines is doing will fit well there, but the ones that do should be put there if possible. The steady state combining it does will be worthwhile, as well as it being run multiple times through the pipeline. Plus, you know, not re-inventing the wheel.
@dmgreen -- I hear you, makes sense to me. There might be some functionality already in SVEIntrinsicOpts.cpp that we can pull out, but I guess that can wait for later. In any case I'll move the functionality here to the other passes to reflect your suggestions.
Appreciate your comments! 😄
Nice one, thanks. This Looks good to me.
Doesn't need a break after the return (unless it's quieting a warning, but I think most compilers treat this sensibly).
If this folder is new it may need (or at least be best) to have a lit.local.cfg to only run it when AArch64 is a registered target.