Page MenuHomePhabricator

[DAG][sve] Lowering for VLS masked truncating stores
Changes PlannedPublic

Authored by DavidTruby on Aug 16 2021, 3:39 AM.

Details

Summary

This extends the custom lowering for truncating stores on
fixed length vectors in SVE to support masked truncating stores.
It also adds a DAG combine for truncates followed by masked
stores.

Diff Detail

Unit TestsFailed

TimeTest
90 msx64 debian > MLIR.Target/LLVMIR::llvmir-intrinsics.mlir
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/mlir-translate -mlir-to-llvmir /var/lib/buildkite-agent/builds/llvm-project/mlir/test/Target/LLVMIR/llvmir-intrinsics.mlir | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/mlir/test/Target/LLVMIR/llvmir-intrinsics.mlir

Event Timeline

DavidTruby created this revision.Aug 16 2021, 3:39 AM
DavidTruby requested review of this revision.Aug 16 2021, 3:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 16 2021, 3:39 AM
DavidTruby added inline comments.
llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
1462 ↗(On Diff #366581)

These changes aren't correct, I have pinged @dmgreen and @SjoerdMeijer for review to see if they know what's happening here. My suspicion is that there's a target specific combine in Thumb2 for masked truncating stores that my general combine is blocking; I guess the easy fix here is to add an exemption for Thumb2 in the TLI hook but I'm not sure if that's the right thing to do?

Matt added a subscriber: Matt.Aug 17 2021, 9:24 AM
SjoerdMeijer added inline comments.Sep 7 2021, 6:40 AM
llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
1462 ↗(On Diff #366581)

Sorry for the delay! Yeah, this extra codegen is not ideal (I think we agree this is a perf regression, not a correctness, but that aside).

If we could avoid this that would be best. As I understood, it would involve adding this target hook to the ARM backend:

virtual bool canCombineTruncStore(EVT ValVT, EVT MemVT,
                                  bool LegalOperations) const override {
  ...
}

Would you mind adding this to ARM backend? I have no preference if you include that here (I think that would make sense), or do that separately and create a dependent patch. But I think it would good if we can see the evidence this regression disappears before committing this.

Sorry I didn't see this. I get a lot of phabricator spam and this wasn't very obvious from the title. It should probably have [DAG], [llvm] doesn't say much when the whole project is llvm ;)

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9822

I'm surprised to see FP_ROUND here. I guess it gets handled in the same way as a normal non-masked trunc store?

llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
1462 ↗(On Diff #366581)

We don't want canCombineTruncStore. I think you need to add demanded bits for the truncating masked store. As in https://github.com/llvm/llvm-project/blob/b50a60c234433545fc1c9b39f193373f560ea869/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L18115

DavidTruby added inline comments.Sep 7 2021, 7:22 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9822

I just took this from the existing non-masked truncating store code. I think it gets handled by canCombineTruncStore checking if the truncating store has been marked legal or custom, which it won't have been if the architecture doesn't support floating point truncating stores.

DavidTruby updated this revision to Diff 371092.Sep 7 2021, 8:41 AM

Fix Thumb2 regression

DavidTruby added inline comments.Sep 7 2021, 8:42 AM
llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
1462 ↗(On Diff #366581)

Thanks for the pointer, adding this seems to have fixed the regression

Thanks for fixing the MVE issues. Most of the changes here don't look SVE related, being generic DAG combines. It would probably be best to split them into a separate review to keep logically separable additions in different reviews in case there are issues with them.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9868

Is this tested anywhere?

Remove unused and untested code

Thanks for fixing the MVE issues. Most of the changes here don't look SVE related, being generic DAG combines. It would probably be best to split them into a separate review to keep logically separable additions in different reviews in case there are issues with them.

I'm nervous about splitting the DAG combine into a separate patch because it's currently only triggered by SVE code generation; adding it in a separate patch leaves it unused and untested in that patch which I'd rather try and avoid.

I've reverted 734708e04f84b72f1ae7c8b35c002b8bf97dc064 to fix the 2 stage vls bot while this is in review. Please reland both when this is approved.

DavidTruby retitled this revision from [llvm][sve] Lowering for VLS masked truncating stores to [DAG][sve] Lowering for VLS masked truncating stores.Oct 7 2021, 3:17 AM

Sorry for the delay. I missed this as something I should be looking at.

I've had a look around, and I can't see anything obvious that goes wrong with this. canCombineTruncStore is a bit overloaded between stores and masked stores - it relies on the support being symetrical.

I'm nervous about splitting the DAG combine into a separate patch because it's currently only triggered by SVE code generation; adding it in a separate patch leaves it unused and untested in that patch which I'd rather try and avoid.

OK Fair enough. MVE at least takes a separate route to turning masked stores to truncating stores, but has less legal types to deal with. We can treat the MVE test that improved between revisions as the tests.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9828

This formatting looks off.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18161

Why do we need to extend the mask? So that convertFixedMaskToScalableVector shrinks to a i1 vector again?

Sorry for the delay. I missed this as something I should be looking at.

No problem, thanks for the review.

I've had a look around, and I can't see anything obvious that goes wrong with this. canCombineTruncStore is a bit overloaded between stores and masked stores - it relies on the support being symetrical.

I see what you mean about this one but I don't see an obvious way around it other than replicating all the getTruncStoreAction etc logic for masked stores separately. This might be the right thing to do in the end though? I'm open to other suggestions as doing that might be a bit of a pain!

RKSimon added inline comments.Oct 12 2021, 9:33 AM
llvm/test/CodeGen/AArch64/sve-fixed-length-masked-stores.ll
16

Any chance that this can be cleaned up and you use the update_tests script?

The check coverage for these tests looks very inconsistent - some tests only check VBITS_GE_1024/2048 and the VBITS_GE_512 is reused by 1024/2048 without a 512-bit fallback.

DavidTruby planned changes to this revision.Oct 26 2021, 7:12 AM

I've opened a separate patch to introduce the functionality for masked truncating store actions, which I will rebase this patch on when that is accepted. You can find that patch here https://reviews.llvm.org/D112536

dmgreen added inline comments.Oct 28 2021, 12:10 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18161

@paulwalker-arm do the aarch64 portions of this patch look OK to you? If so I think the rest of this patch is fine.

paulwalker-arm added inline comments.Fri, Nov 26, 9:50 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9843

This will fail once D114580 lands because that patch breaks the symmetry between truncating stores and masked truncating stores. If we can drop the ISD::FP_ROUND part of the combine then great otherwise we either need to add support for floating point to this patch or wait for D112536.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18163

For consistency this should be ISD::SIGN_EXTEND given on AArch64 fixed length masks are either zero or all ones.