Added function to the ExpandVectorPredication pass to handle VP loads and stores.
This needs "generic" tests, e.g. like those in test/CodeGen/Generic/expand-vp.ll
I wonder if the addition of this method should be in a follow-up patch? That is, first land the "basic" support using masked.* intrinsics, and consider this an enhancement/optimization for certain targets like PowerPC?
We should pick the expansion strategy based on the target's preferences. Either extend VPLegalizationStrategy to allow for strategy selection or add a new function to TTI that returns the expansion method (enum).
I agree. Better to add the default expansion strategy first, then extend TTI to allow for strategy selection and the cascading scheme in a followup.
This scheme is meant for any target where masked load/stores are not supported in hardware and all the active lanes are packed on the left (i.e. EVL only). I have a hard time imagining a target that wouldn't benefit from this scheme. I think that's why Hussain added this as part of the default expansion.
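To make the "EVL only" case concrete: lanes below the explicit vector length are active and every lane at or above it is inactive. Below is a minimal standalone sketch that models this with std::vector<bool> instead of LLVM IR. The name foldEVLIntoMask is hypothetical and is not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model (not LLVM API): fold an explicit vector length (EVL)
// into a lane mask. Lane I stays active only if it was active in Mask
// and I < EVL; all lanes at or beyond EVL become inactive.
std::vector<bool> foldEVLIntoMask(const std::vector<bool> &Mask,
                                  std::size_t EVL) {
  assert(EVL <= Mask.size() && "EVL must not exceed the vector length");
  std::vector<bool> Result(Mask.size(), false);
  for (std::size_t I = 0; I < EVL; ++I)
    Result[I] = Mask[I];
  return Result;
}
```

With an all-true input mask, the result has its active lanes packed on the left, which is the shape the comment above refers to.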
If there are two expansion schemes, targets should be able to choose between them through TTI. AVX.* supports masked load/store well - PPC may benefit more from the piecewise expansion scheme.
The only other expansion scheme is a fail-safe that applies to legalization of masked load/stores in general (not just the EVL cases), so it's less optimal regardless of the architecture. I neither see the need nor strongly object to adding a TTI query for this, but if we add one I think the default should be the cascading scheme.
I'm still in favor of splitting up the patch into the default expansion (which can be the cascading loads) and a second one for masked.load expansion.
I had overlooked this before. You are checking whether masked.load is supported, so my argument for selecting the expansion scheme with TTI is moot.
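For illustration only, the selection discussed in this thread could be pictured roughly as follows. This is a standalone sketch: the enum, the function, and the IsMaskedOpLegal parameter are all hypothetical names and do not exist in TargetTransformInfo or in the patch.

```cpp
#include <cassert>

// Hypothetical enum naming the two expansion schemes from the discussion.
enum class MemExpansionKind { MaskedIntrinsic, Piecewise };

// Hypothetical selection: prefer masked.load/masked.store when the target
// reports them as legal, otherwise fall back to the piecewise scheme.
MemExpansionKind pickExpansion(bool IsMaskedOpLegal) {
  if (IsMaskedOpLegal)
    return MemExpansionKind::MaskedIntrinsic;
  return MemExpansionKind::Piecewise;
}
```

Keying the choice on legality of the masked operation is what makes a separate TTI strategy query unnecessary in this sketch.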
VPIntrinsic is a subclass of Instruction, so we shouldn't need an explicit cast.
Don't use auto here.
Why not llvm_unreachable?
Doesn't IRBuilder's constructor also set the debug location?
Don't use auto.
Don't use auto
Drop else after return
Drop else after return
Doesn't IRBuilder's constructor set the debug location?
Drop else after return and drop the curly braces for the if body
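A small standalone illustration of the style point, assuming a hypothetical helper (lanesNeedingMask is not a function from the patch). The "before" shape is shown in the comment.

```cpp
#include <cassert>

// Hypothetical helper, used only to illustrate the style point.
// Before:
//   if (IsUnmasked) {
//     return 0;
//   } else {
//     return NumLanes;
//   }
// After: no else after return, and no braces around the single-statement body.
int lanesNeedingMask(bool IsUnmasked, int NumLanes) {
  if (IsUnmasked)
    return 0;
  return NumLanes;
}
```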
Shouldn't be needed
break should be inside the curly braces
Drop parentheses around the statement
Using StringRef for Prefix should eliminate the need for Twine constructor here
- Remove VP gather/scatter references (to be added in a follow-up patch)
- Following the discussion in the comments, remove expandPredicationInUnfoldedLoadStore() function (to be added in a follow-up patch)
- Apply suggestions in comments
- Add tests
I changed the behaviour of the default case to reflect what was done before this patch, but I am not sure which one is the right approach: what do you think is best?
unpredicated seems misleading here, since we're using masked.load and masked.store: those are predicated in a sense.
nit, but I don't think you need these braces around the switch statements. NewStore/NewLoad are defined in their own scope.
I'd prefer something like /*IsVolatile*/ false
We should be testing the IsUnmasked path here too.
- Address comments
N.B.: the IsUnmasked == true path in expandPredicationInMemoryIntrinsic() is not reachable, as the tests also show. How should this be handled in the code? Do we still handle this case, or do we add something like assert(!isAllTrueMask(MaskParam)) instead of defining the IsUnmasked boolean?
Is that because you're using scalable vectors and the isAllTrueMask is expecting a ConstantVector? To my mind we should either:
- use a better true-mask check (surely there's one we can reuse instead of implementing our own)
- add tests for fixed vectors
- use constantexpr all-ones masks in the scalable-vector tests
The last is a bit of a hack, but the other two sound reasonable. Adding tests for fixed vectors sounds like a good idea for this patch anyway, and it'd give us coverage of this code path. The first one should be done, but done in a separate patch. Just add a FIXME in the scalable-vector tests for now?
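The difference between the fixed-vector and scalable-vector cases can be sketched with a simplified model. This is not LLVM code: MaskModel and isAllTrueMaskModel are hypothetical stand-ins, where a fixed mask is a list of constant lanes and a scalable mask can only be recognized when it is a splat of a constant.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for an IR mask operand.
struct MaskModel {
  enum Kind { FixedConstant, ConstantSplat, Unknown } K;
  std::vector<bool> Lanes; // meaningful only when K == FixedConstant
  bool SplatValue;         // meaningful only when K == ConstantSplat
};

// Returns true only when the mask is provably all-true in this model.
bool isAllTrueMaskModel(const MaskModel &M) {
  switch (M.K) {
  case MaskModel::FixedConstant:
    for (bool Lane : M.Lanes)
      if (!Lane)
        return false;
    return true;
  case MaskModel::ConstantSplat:
    return M.SplatValue;
  case MaskModel::Unknown:
    return false;
  }
  return false;
}
```

A check that only understands the FixedConstant case would never report an all-true mask for a scalable vector, which matches the unreachable path described above.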
- Improve isAllTrueMask() (for scalable vectors only)
- Add and update tests
I did not manage to find any ready-to-use function to replace isAllTrueMask(), so I added this PatternMatch approach found in other places of the codebase. Maybe we can unify its behaviour in a later patch?
I don't know if it matters whether this is isOneValue or isAllOnesValue? In practice for i1 masks it's the same, but somehow the latter sounds more appropriate to me.
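A quick standalone check of why the two predicates coincide for i1 but not for wider types. The helpers below are a hypothetical model of an N-bit constant stored in a uint64_t, not LLVM's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: an N-bit integer constant held in the low bits of V.
std::uint64_t lowBitsMask(unsigned Bits) {
  assert(Bits >= 1 && Bits <= 64 && "unsupported bit width");
  return Bits == 64 ? ~std::uint64_t(0) : ((std::uint64_t(1) << Bits) - 1);
}

// True when the N-bit value equals 1.
bool isOne(std::uint64_t V, unsigned Bits) {
  return (V & lowBitsMask(Bits)) == 1;
}

// True when every one of the N bits is set.
bool isAllOnes(std::uint64_t V, unsigned Bits) {
  return (V & lowBitsMask(Bits)) == lowBitsMask(Bits);
}
```

For a 1-bit value, 1 is also the all-ones pattern, so both predicates agree. For an 8-bit value, 1 and 0xFF are distinct.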
Shouldn't we see regular load here?
Same here: regular store?