This patch allows loop vectorization with function calls in cases where masks are not required but the only available vectorized function variants are masked.
Some of the code was originally written by @paulwalker-arm.
Authored by huntergr on Aug 23 2022, 3:45 AM.
I recommend you split this patch into the following patches:
The first and second steps are independent, and could be done in either order.
Thanks for the review. I'm tempted to add a masking equivalent to -force-target-supports-scalable-vectors=true in order to have target-independent tests, but I can add that in another patch.
@reames -- is this roughly what you expected for the case of allowing a masked variant to be used when no mask is required? I've added a cost for generating the mask (per-call for now, instead of potentially sharing it) so that we can compare costs for different VFs with and without a masked variant, but I think we would always prefer the non-masked variant for the same VF if a mask is not required.
In any case, I'm now working on the third patch.
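For readers following the cost discussion above, here is a minimal, self-contained sketch of the idea: a masked variant called from a loop that needs no mask pays an extra per-call cost for the all-true mask, and the unmasked variant wins a tie at the same VF. The names `VariantInfo`, `MaskCost`, `effectiveCost` and `pickVariant` are hypothetical and are not taken from the patch or from LLVM. The cost values are made up for illustration.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a vectorized function variant; not an LLVM type.
struct VariantInfo {
  unsigned VF;       // vectorization factor the variant was built for
  bool IsMasked;     // true if the variant takes a mask argument
  uint64_t CallCost; // assumed cost of one call to the variant
};

// Assumed per-call cost of materializing an all-true mask (illustrative value).
constexpr uint64_t MaskCost = 1;

// Cost of calling V when the loop itself needs no mask: a masked variant
// additionally pays for the all-true mask it must be passed.
uint64_t effectiveCost(const VariantInfo &V) {
  return V.CallCost + (V.IsMasked ? MaskCost : 0);
}

// Pick the cheapest variant for the requested VF. On equal cost, the
// unmasked variant is preferred.
std::optional<VariantInfo> pickVariant(const std::vector<VariantInfo> &Variants,
                                       unsigned VF) {
  std::optional<VariantInfo> Best;
  for (const VariantInfo &V : Variants) {
    if (V.VF != VF)
      continue;
    if (!Best) {
      Best = V;
      continue;
    }
    uint64_t Cost = effectiveCost(V);
    uint64_t BestCost = effectiveCost(*Best);
    if (Cost < BestCost ||
        (Cost == BestCost && Best->IsMasked && !V.IsMasked))
      Best = V;
  }
  return Best;
}
```

With only a masked variant available for a given VF, `pickVariant` still returns it, so the call can be vectorized at that VF with the mask cost included in the comparison against other VFs.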
Updated to store the pointer to the vector function in the recipe rather than looking it up again during recipe execution. Forced generation of a plan per VF when there are variants available for those VFs. Added some new tests for masked vs. unmasked variants.
I'm not fond of adding the optional parameters to LoopVectorizationCostModel::getVectorCallCost -- it feels like function lookup needs to be split out of it, but I'd like to get some feedback from others before doing so.
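As a rough illustration of the update described above (resolve the vector function once, store the pointer in the recipe, and keep lookup separate from costing), a sketch could look like the following. `VectorFunction`, `VariantTable` and `CallRecipe` are hypothetical names for this example only. They do not correspond to the actual VPlan recipe classes or to the `getVectorCallCost` signature.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a vector function declaration.
struct VectorFunction {
  std::string Name;
};

// Hypothetical table keyed by (scalar function name, VF), standing in for
// whatever mechanism maps a scalar call to its vector variants.
class VariantTable {
  std::map<std::pair<std::string, unsigned>, VectorFunction> Table;

public:
  mutable unsigned Lookups = 0; // counts lookups, for illustration only

  void add(const std::string &Scalar, unsigned VF, const std::string &Vector) {
    Table[{Scalar, VF}] = VectorFunction{Vector};
  }

  // Returns nullptr when no variant exists for this scalar function and VF.
  const VectorFunction *find(const std::string &Scalar, unsigned VF) const {
    ++Lookups;
    auto It = Table.find({Scalar, VF});
    return It == Table.end() ? nullptr : &It->second;
  }
};

// The recipe receives the already-resolved variant at construction time and
// never consults the table again while executing.
class CallRecipe {
  const VectorFunction *Variant;

public:
  explicit CallRecipe(const VectorFunction *V) : Variant(V) {}

  // Name of the function that would be called; falls back to a placeholder
  // when no vector variant was found.
  std::string execute() const { return Variant ? Variant->Name : "<scalarized>"; }
};
```

Resolving the variant before the recipe is built means a cost query and code generation cannot disagree about which function was chosen, which is one argument for splitting lookup out of the cost function.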