This is an archive of the discontinued LLVM Phabricator instance.

[DAG][sve] Lowering for VLS masked truncating stores
ClosedPublic

Authored by DavidTruby on Aug 16 2021, 3:39 AM.

Details

Summary

This extends the custom lowering for truncating stores on
fixed length vectors in SVE to support masked truncating stores.
It also adds a DAG combine for truncates followed by masked
stores.

Diff Detail

Event Timeline

DavidTruby created this revision.Aug 16 2021, 3:39 AM
DavidTruby requested review of this revision.Aug 16 2021, 3:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 16 2021, 3:39 AM
DavidTruby added inline comments.
llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
1462 ↗(On Diff #366581)

These changes aren't correct, I have pinged @dmgreen and @SjoerdMeijer for review to see if they know what's happening here. My suspicion is that there's a target specific combine in Thumb2 for masked truncating stores that my general combine is blocking; I guess the easy fix here is to add an exemption for Thumb2 in the TLI hook but I'm not sure if that's the right thing to do?

Matt added a subscriber: Matt.Aug 17 2021, 9:24 AM
SjoerdMeijer added inline comments.Sep 7 2021, 6:40 AM
llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
1462 ↗(On Diff #366581)

Sorry for the delay! Yeah, this extra codegen is not ideal (I think we agree this is a perf regression, not a correctness, but that aside).

If we could avoid this that would be best. As I understood, it would involve adding this target hook to the ARM backend:

virtual bool canCombineTruncStore(EVT ValVT, EVT MemVT,
                                  bool LegalOperations) const override {
  ...
}

Would you mind adding this to ARM backend? I have no preference if you include that here (I think that would make sense), or do that separately and create a dependent patch. But I think it would good if we can see the evidence this regression disappears before committing this.

Sorry I didn't see this. I get a lot of phabricator spam and this wasn't very obvious from the title. It should probably have [DAG], [llvm] doesn't say much when the whole project is llvm ;)

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9822

I'm surprised to see FP_ROUND here. I guess it gets handled in the same way as a normal non-masked trunc store?

llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
1462 ↗(On Diff #366581)

We don't want canCombineTruncStore. I think you need to add demanded bits for the truncating masked store. As in https://github.com/llvm/llvm-project/blob/b50a60c234433545fc1c9b39f193373f560ea869/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L18115

DavidTruby added inline comments.Sep 7 2021, 7:22 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9822

I just took this from the existing non-masked truncating store code. I think it gets handled by canCombineTruncStore checking if the truncating store has been marked legal or custom, which it won't have been if the architecture doesn't support floating point truncating stores.

DavidTruby updated this revision to Diff 371092.Sep 7 2021, 8:41 AM

Fix Thumb2 regression

DavidTruby added inline comments.Sep 7 2021, 8:42 AM
llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
1462 ↗(On Diff #366581)

Thanks for the pointer, adding this seems to have fixed the regression

Thanks for fixing the MVE issues. Most of the changes here don't look SVE related, being generic DAG combines. It would probably be best to split them into a separate review to keep logically separable additions in different reviews in case there are issues with them.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9868

Is this tested anywhere?

Remove unused and untested code

Thanks for fixing the MVE issues. Most of the changes here don't look SVE related, being generic DAG combines. It would probably be best to split them into a separate review to keep logically separable additions in different reviews in case there are issues with them.

I'm nervous about splitting the DAG combine into a separate patch because it's currently only triggered by SVE code generation; adding it in a separate patch leaves it unused and untested in that patch which I'd rather try and avoid.

I've reverted 734708e04f84b72f1ae7c8b35c002b8bf97dc064 to fix the 2 stage vls bot while this is in review. Please reland both when this is approved.

DavidTruby retitled this revision from [llvm][sve] Lowering for VLS masked truncating stores to [DAG][sve] Lowering for VLS masked truncating stores.Oct 7 2021, 3:17 AM

Sorry for the delay. I missed this as something I should be looking at.

I've had a look around, and I can't see anything obvious that goes wrong with this. canCombineTruncStore is a bit overloaded between stores and masked stores - it relies on the support being symetrical.

I'm nervous about splitting the DAG combine into a separate patch because it's currently only triggered by SVE code generation; adding it in a separate patch leaves it unused and untested in that patch which I'd rather try and avoid.

OK Fair enough. MVE at least takes a separate route to turning masked stores to truncating stores, but has less legal types to deal with. We can treat the MVE test that improved between revisions as the tests.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9828

This formatting looks off.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18161

Why do we need to extend the mask? So that convertFixedMaskToScalableVector shrinks to a i1 vector again?

Sorry for the delay. I missed this as something I should be looking at.

No problem, thanks for the review.

I've had a look around, and I can't see anything obvious that goes wrong with this. canCombineTruncStore is a bit overloaded between stores and masked stores - it relies on the support being symetrical.

I see what you mean about this one but I don't see an obvious way around it other than replicating all the getTruncStoreAction etc logic for masked stores separately. This might be the right thing to do in the end though? I'm open to other suggestions as doing that might be a bit of a pain!

RKSimon added inline comments.Oct 12 2021, 9:33 AM
llvm/test/CodeGen/AArch64/sve-fixed-length-masked-stores.ll
16

Any chance that this can be cleaned up and you use the update_tests script?

The check coverage for these tests looks very inconsistent - some tests only check VBITS_GE_1024/2048 and the VBITS_GE_512 is reused by 1024/2048 without a 512-bit fallback.

DavidTruby planned changes to this revision.Oct 26 2021, 7:12 AM

I've opened a separate patch to introduce the functionality for masked truncating store actions, which I will rebase this patch on when that is accepted. You can find that patch here https://reviews.llvm.org/D112536

dmgreen added inline comments.Oct 28 2021, 12:10 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18161

@paulwalker-arm do the aarch64 portions of this patch look OK to you? If so I think the rest of this patch is fine.

paulwalker-arm added inline comments.Nov 26 2021, 9:50 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9843

This will fail once D114580 lands because that patch breaks the symmetry between truncating stores and masked truncating stores. If we can drop the ISD::FP_ROUND part of the combine then great otherwise we either need to add support for floating point to this patch or wait for D112536.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18163

For consistency this should be ISD::SIGN_EXTEND given on AArch64 fixed length masks are either zero or all ones.

DavidTruby updated this revision to Diff 393151.Dec 9 2021, 6:59 AM

Don't allow FP_ROUND

DavidTruby updated this revision to Diff 393152.Dec 9 2021, 7:01 AM

Change zero_extend to sign_extend.

DavidTruby marked 2 inline comments as done.Dec 9 2021, 7:02 AM
DavidTruby added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9843

I think for now we should land this without the FP_ROUND condition to get the code improvement in for LLVM 14 and then fix it up correctly in D112536 (which unfortunately needs a fair bit more work)

RKSimon resigned from this revision.Dec 9 2021, 12:29 PM
DavidTruby marked 3 inline comments as done.Dec 13 2021, 4:13 AM
DavidTruby added inline comments.Dec 13 2021, 4:30 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18161

I think the reason we need to extend here is that fixed-length i1 vectors are not legal types, so the following function needs it to be a non-i1 vector to work correctly.

DavidTruby marked 2 inline comments as done.Dec 13 2021, 5:32 AM
peterwaller-arm accepted this revision.Dec 13 2021, 7:01 AM
This revision is now accepted and ready to land.Dec 13 2021, 7:01 AM

Sorry for the late review @DavidTruby. Although I believe the patch as is likely works fine I've spotted an inconsistency that if possible would good to fix.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9849

I have a comment below where I'm suggesting passing the original mask unmodified is likely an error.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18160–18167

On reflection I think this might be a bug introduced by the change to DAGCombiner::visitMSTORE. I say "might" because the requirements for MSTORE's mask is not well documented.

Typically a mask's type is linked to the main datatype of the operation. In this instance the MSTORE's main datatype is the value being stored and thus I believe the mask's VT should always be the result of getSetCCResultType(Value->getValueType()).

This is something you can see the legaliser honouring, where it uses the helper function DAGTypeLegalizer::PromoteTargetBoolean. I think visitMSTORE should be using similar logic when folding in a truncate.

Fix mask type in DAGCombiner::visitMSTORE

paulwalker-arm added inline comments.Dec 15 2021, 4:42 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9849

You'll need to follow the same idiom as DAGTypeLegalizer::PromoteTargetBoolean because although ISD::SIGN_EXTEND is correct for AArch64, other target's might need something else.

Call promoteTargetBoolean instead of using SIGN_EXTEND for mask.

paulwalker-arm accepted this revision.Dec 17 2021, 5:39 AM
This revision was landed with ongoing or failed builds.Dec 17 2021, 7:05 AM
This revision was automatically updated to reflect the committed changes.