Added function to the ExpandVectorPredication pass to handle VP loads and stores.
Details
Diff Detail
Unit Tests
Time | Test | |
---|---|---|
60,030 ms | x64 debian > MLIR.Examples/standalone::test.toy |
Event Timeline
More fixes to the length-only scalarization, and added tests using the Power8 target for this scalarization.
This needs "generic" testing e.g. like those in test/CodeGen/Generic/expand-vp.ll
llvm/lib/CodeGen/ExpandVectorPredication.cpp | ||
---|---|---|
546 | I wonder if the addition of this method should be in a follow-up patch? That is, first get the "basic" support in using masked.* intrinsics in and consider this an enhancement/optimization for certain targets like PowerPC? |
llvm/lib/CodeGen/ExpandVectorPredication.cpp | ||
---|---|---|
533 | We should pick the expansion strategy based on the target's preferences. Either extend VPLegalizationStrategy to allow for strategy selection or add a new function to TTI that returns the expansion method (enum). | |
546 | I agree. Better to add the default expansion strategy first, then extend TTI to allow for strategy selection and the cascading scheme in a followup. |
llvm/lib/CodeGen/ExpandVectorPredication.cpp | ||
---|---|---|
546 | This scheme is meant for any target where masked load/stores are not supported in hardware and all the active lanes are packed on the left (ie EVL only). I have a hard time imagining a target that wouldn't benefit from this scheme. I think that's why Hussain added this as part of the default expansion. |
llvm/lib/CodeGen/ExpandVectorPredication.cpp | ||
---|---|---|
546 | If there are two expansion schemes, targets should be able to choose between them through TTI. AVX.* supports masked load/store well - PPC may benefit more from the piecewise expansion scheme. |
llvm/lib/CodeGen/ExpandVectorPredication.cpp | ||
---|---|---|
546 | The only other expansion scheme is a fail-safe that applies to legalization of masked load/stores in general (not just the EVL cases), so it's less optimal regardless of the architecture. I neither see the need nor strongly object to adding a TTI query for this, but if we add one I think the default should be the cascading scheme. |
llvm/lib/CodeGen/ExpandVectorPredication.cpp | ||
---|---|---|
546 | (and again we are talking about targets where masked/load stores are not supported in hardware, so I guess AVX doesn't apply). |
I'm still in favor of splitting up the patch into the default expansion (which can be the cascading loads) and a second one for masked.load expansion.
llvm/lib/CodeGen/ExpandVectorPredication.cpp | ||
---|---|---|
490 | I had overlooked this before. You are checking whether masked.load is supported, so my argument for selecting the expansion scheme with TTI is moot. |
llvm/lib/CodeGen/ExpandVectorPredication.cpp | ||
---|---|---|
497 | TTI.isLegalMaskedStore? |
llvm/lib/CodeGen/ExpandVectorPredication.cpp | ||
---|---|---|
533 | load/store unfolding does not work for scalable vectors this way. We should default to the memory intrinsic path for scalable types. |
llvm/lib/CodeGen/ExpandVectorPredication.cpp | ||
---|---|---|
411 | VPIntrinsic is a subclass of Instruction, we shouldn't need an explicit cast. | |
413 | Don't use auto here. | |
423 | Why not llvm_unreachable? | |
481 | Doesn't IRBuilder's constructor also set the debug location? | |
484 | Don't use auto. | |
488 | Don't use auto | |
493 | Drop else after return | |
500 | Drop else after return | |
507 | Does't IRBuilder's constructor set the debug location? | |
530 | llvm_unreachable | |
535 | Drop else after return and drop the curly braces for the if body | |
551 | Shouldn't be needed | |
560 | Use StringRef | |
564 | llvm_unreachable | |
574 | Capitalize | |
591 | break should be inside the curly braces | |
597 | Same here | |
610 | Drop parentheses around the statement | |
616 | Using StringRef for Prefix should eliminate the need for Twine constructor here |
hello @hussainjk @bmahjour, are you still working on this? Otherwise, I may take it over on your behalf.
Changelog:
- Remove VP gather/scatter references (to be added in a follow-up patch)
- Following the discussion in the comments, remove expandPredicationInUnfoldedLoadStore() function (to be added in a follow-up patch)
- Apply suggestions in comments
- Add tests
llvm/lib/CodeGen/ExpandVectorPredication.cpp | ||
---|---|---|
530 | I changed the behaviour of the defalut case to reflect what was done before this patch, but I am not sure which one is the right approach: what do you think is best? |
llvm/lib/CodeGen/ExpandVectorPredication.cpp | ||
---|---|---|
181 | unpredicated seems misleading here, since we're using masked.load and masked.store: those are predicated in a sense. | |
421 | nit, but I don't think you need these braces around the switch statements. NewStore/NewLoad are defined in their own scope. | |
423 | I'd prefer something like /*IsVolatile*/ false | |
llvm/test/CodeGen/Generic/expand-vp-load-store.ll | ||
16 | We should be testing the IsUnmasked path here too. |
Changelog:
- Address comments
N.B.: the IsUnmasked == true path in expandPredicationInMemoryIntrinsic() is not reachable, as also shown by the tests. How should this be handled in the code? Do we still handle this case or we add something like assert(!isAllTrueMask(MaskParam)) instead of defining the IsUnmasked boloean?
Is that because you're using scalable vectors and the isAllTrueMask is expecting a ConstantVector? To my mind we should either:
- use a better true-mask check (surely there's one we can reuse instead of implementing our own)
- add tests for fixed vectors
- use constantexpr all-ones masks in the scalable-vector tests
The last is a bit of a hack, but the other two sound reasonable. Adding tests for fixed vectors sounds like a good idea for this patch anyway, and it'd give us coverage of this code path. The first one should be done, but done in a separate patch. Just add a FIXME in the scalable-vector tests for now?
Changelog:
- Improve isAllTrueMask() (for scalable vectors only)
- Add and update tests
I did not manage to find any ready to use function in place of isAllTrueMask(), so I added this PatternMatch approach found in other places of the codebase. Maybe we can unify its behaviour in a later patch?
Changelog:
- Use getSplatValue() instead of PatternMatch in isAllTrueMask()
- Update tests
llvm/lib/CodeGen/ExpandVectorPredication.cpp | ||
---|---|---|
89 | I don't know if it matters whether this is isOneValue or isAllOnesValue? In practice for i1 masks it's the same, but somehow the latter sounds more appropriate to me. | |
llvm/test/CodeGen/Generic/expand-vp-load-store.ll | ||
35 | Shouldn't we see regular load here? | |
79 | Same here: regular store? |
Following the discussion in the comments, remove expandPredicationInUnfoldedLoadStore() function (to be added in a follow-up patch)
Is there a follow up patch for this yet?
hello @bmahjour, there is no follow-up patch yet and right now it is not in my short-term plan; I may be able to keep working on it later on. There wouldn't be any problem though if someone is interested on taking this over.
I don't know if it matters whether this is isOneValue or isAllOnesValue? In practice for i1 masks it's the same, but somehow the latter sounds more appropriate to me.