This extends the custom lowering for truncating stores on
fixed length vectors in SVE to support masked truncating stores.
It also adds a DAG combine for truncates followed by masked
stores.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll | ||
---|---|---|
1462 ↗ | (On Diff #366581) | These changes aren't correct, I have pinged @dmgreen and @SjoerdMeijer for review to see if they know what's happening here. My suspicion is that there's a target specific combine in Thumb2 for masked truncating stores that my general combine is blocking; I guess the easy fix here is to add an exemption for Thumb2 in the TLI hook but I'm not sure if that's the right thing to do? |
llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll | ||
---|---|---|
1462 ↗ | (On Diff #366581) | Sorry for the delay! Yeah, this extra codegen is not ideal (I think we agree this is a perf regression, not a correctness, but that aside). If we could avoid this that would be best. As I understood, it would involve adding this target hook to the ARM backend: virtual bool canCombineTruncStore(EVT ValVT, EVT MemVT, bool LegalOperations) const override { ... } Would you mind adding this to ARM backend? I have no preference if you include that here (I think that would make sense), or do that separately and create a dependent patch. But I think it would good if we can see the evidence this regression disappears before committing this. |
Sorry I didn't see this. I get a lot of phabricator spam and this wasn't very obvious from the title. It should probably have [DAG], [llvm] doesn't say much when the whole project is llvm ;)
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
10070 | I'm surprised to see FP_ROUND here. I guess it gets handled in the same way as a normal non-masked trunc store? | |
llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll | ||
1462 ↗ | (On Diff #366581) | We don't want canCombineTruncStore. I think you need to add demanded bits for the truncating masked store. As in https://github.com/llvm/llvm-project/blob/b50a60c234433545fc1c9b39f193373f560ea869/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L18115 |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
10070 | I just took this from the existing non-masked truncating store code. I think it gets handled by canCombineTruncStore checking if the truncating store has been marked legal or custom, which it won't have been if the architecture doesn't support floating point truncating stores. |
llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll | ||
---|---|---|
1462 ↗ | (On Diff #366581) | Thanks for the pointer, adding this seems to have fixed the regression |
Thanks for fixing the MVE issues. Most of the changes here don't look SVE related, being generic DAG combines. It would probably be best to split them into a separate review to keep logically separable additions in different reviews in case there are issues with them.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
10118 | Is this tested anywhere? |
I'm nervous about splitting the DAG combine into a separate patch because it's currently only triggered by SVE code generation; adding it in a separate patch leaves it unused and untested in that patch which I'd rather try and avoid.
I've reverted 734708e04f84b72f1ae7c8b35c002b8bf97dc064 to fix the 2 stage vls bot while this is in review. Please reland both when this is approved.
Sorry for the delay. I missed this as something I should be looking at.
I've had a look around, and I can't see anything obvious that goes wrong with this. canCombineTruncStore is a bit overloaded between stores and masked stores - it relies on the support being symetrical.
I'm nervous about splitting the DAG combine into a separate patch because it's currently only triggered by SVE code generation; adding it in a separate patch leaves it unused and untested in that patch which I'd rather try and avoid.
OK Fair enough. MVE at least takes a separate route to turning masked stores to truncating stores, but has less legal types to deal with. We can treat the MVE test that improved between revisions as the tests.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
10076 | This formatting looks off. | |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
18906 | Why do we need to extend the mask? So that convertFixedMaskToScalableVector shrinks to a i1 vector again? |
No problem, thanks for the review.
I've had a look around, and I can't see anything obvious that goes wrong with this. canCombineTruncStore is a bit overloaded between stores and masked stores - it relies on the support being symetrical.
I see what you mean about this one but I don't see an obvious way around it other than replicating all the getTruncStoreAction etc logic for masked stores separately. This might be the right thing to do in the end though? I'm open to other suggestions as doing that might be a bit of a pain!
llvm/test/CodeGen/AArch64/sve-fixed-length-masked-stores.ll | ||
---|---|---|
8–16 | Any chance that this can be cleaned up and you use the update_tests script? The check coverage for these tests looks very inconsistent - some tests only check VBITS_GE_1024/2048 and the VBITS_GE_512 is reused by 1024/2048 without a 512-bit fallback. |
I've opened a separate patch to introduce the functionality for masked truncating store actions, which I will rebase this patch on when that is accepted. You can find that patch here https://reviews.llvm.org/D112536
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
18906 | @paulwalker-arm do the aarch64 portions of this patch look OK to you? If so I think the rest of this patch is fine. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
18906 | I think the reason we need to extend here is that fixed-length i1 vectors are not legal types, so the following function needs it to be a non-i1 vector to work correctly. |
Sorry for the late review @DavidTruby. Although I believe the patch as is likely works fine I've spotted an inconsistency that if possible would good to fix.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
10097 | I have a comment below where I'm suggesting passing the original mask unmodified is likely an error. | |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
18905–18906 | On reflection I think this might be a bug introduced by the change to DAGCombiner::visitMSTORE. I say "might" because the requirements for MSTORE's mask is not well documented. Typically a mask's type is linked to the main datatype of the operation. In this instance the MSTORE's main datatype is the value being stored and thus I believe the mask's VT should always be the result of getSetCCResultType(Value->getValueType()). This is something you can see the legaliser honouring, where it uses the helper function DAGTypeLegalizer::PromoteTargetBoolean. I think visitMSTORE should be using similar logic when folding in a truncate. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
10097 | You'll need to follow the same idiom as DAGTypeLegalizer::PromoteTargetBoolean because although ISD::SIGN_EXTEND is correct for AArch64, other target's might need something else. |
I'm surprised to see FP_ROUND here. I guess it gets handled in the same way as a normal non-masked trunc store?