This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Instcombine SVE LD1/ST1 to stock LLVM IR
ClosedPublic

Authored by MattDevereau on Nov 9 2021, 8:06 AM.

Details

Summary

InstCombine AArch64 LD1/ST1 to llvm.masked.load/llvm.masked.store and LD1/ST1 to load/store when a ptrue all predicate pattern operand is present.

This allows existing IR optimizations such as dead-load removal to occur.

Diff Detail

Event Timeline

MattDevereau created this revision.Nov 9 2021, 8:06 AM
MattDevereau requested review of this revision.Nov 9 2021, 8:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 9 2021, 8:06 AM
paulwalker-arm added inline comments.Nov 9 2021, 8:39 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
718

Can we use a masked load for this scenario also? As in masked_load doesn't care how the mask was constructed.

My feeling is instCombineSVEMaskedLD1 and instCombineSVELD1 are called in the wrong order. It seems easier to always call instCombineSVEMaskedLD1 and then call instCombineSVELD1 for the case when

match(II.getOperand(0),
             m_Intrinsic<Intrinsic::aarch64_sve_ptrue>(m_ConstantInt<AArch64SVEPredPattern::all>()))

is true?

I've had a brief look, some typographical comments.

For some reason the patch has failed to apply and the tests have failed to run in CI, perhaps you could try a rebase and rerun arc diff to see if that fixes it?

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
701

Conventionally, I think this would be called VecTy rather than VecOp.

702

I've seen more uses of Ptr than "Pointer".

724

The name VScale brings up an association with "The runtime value of VScale", and this is not a pointer to it, so I think a name more in keeping with the codebase would be something like VecPtr.

733

Inconsistent use of auto (Value* was used above, which I think the coding guide would prefer).

734

You've named getOperand(0) and getOperand(2) here, I think for consistency/completeness you may as well name the final operand, even though it's only used once.

MattDevereau added inline comments.Nov 9 2021, 9:32 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
718

Using the match you suggested and always calling instCombineSVEMaskedLD1 would end up instCombining and erasing aarch64.sve.convert.from.svbool, predicates I think? Is this ok?

paulwalker-arm added inline comments.Nov 9 2021, 9:46 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
707–709

I also just noticed you're not specifying the Passthru parameter for CreateMaskedLoad. This is very important because the ld1 intrinsics are defined to zero the inactive lanes and so you need to pass in an explicit zero vector here. You'll see similar in instCombineLD1GatherIndex.

718

I don't understand your reasoning here.

aarch64.sve.convert.from.svbool is just an intrinsic that produces a predicate. If that is the predicate operand to the ld1 then it should end up being the mask operand of the masked_load.

For the case where the predicate operand is specifically ptrue 31 we can drop the predicate operand and call Builder.CreateLoad with all other cases passing all the operands to Builder.CreateMaskedLoad. Perhaps that means we don't even need instCombineSVELD1.

MattDevereau marked 7 inline comments as done.

Add CreateMaskedLoad passthru parameter
Renamed most variable names to fit with LLVM code convention more
Reworked some function/match logic

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
718

Nevermind, I realised I worded my question incorrectly and then even when worded correctly it made no sense either. I've changed the match.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
731

VecPtrTy -- the Ty conventionally goes at the end.

783

s/VecTyPtr/VecPtrTy/

paulwalker-arm added inline comments.Nov 10 2021, 4:39 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
728–738

Does this function and instCombineSVEST1 offer any real value? I cannot see them having other uses and the implementation is largely already available in the caller. I think it would be cleaner to just have the

{
  LoadInst *Load = Builder.CreateLoad(II.getType(), VecPtr);
  return IC.replaceInstUsesWith(II, Load);
}

part inside instCombineSVEMaskedLD1.

769

Is this necessary?

788

Is this necessary?

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
769

Adding (void) was my guidance, which on reflection I think was in error, thanks for pointing it out.

788

Adding (void) was my guidance, which on reflection I think was in error, thanks for pointing it out.

DavidTruby added inline comments.Nov 10 2021, 6:19 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
732

Nit: I think the default template arguments should just be picked without the <>

750
765
769

If you wanted to make sure you didn't get unused return value warnings you can use LLVM_ATTRIBUTE_UNUSED to be a bit more clear than the void cast.

Regenerated broken tests
Combined instCombineSVEMaskedLD1 and instCombineSVELD1
Combined instCombineSVEMaskedST1 and instCombineSVEST1

Herald added a project: Restricted Project. · View Herald TranscriptNov 12 2021, 5:22 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
MattDevereau marked 9 inline comments as done.Nov 12 2021, 5:26 AM
MattDevereau added inline comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
732

I'm getting compiler errors when omitting <>

750

I'm getting compiler errors when omitting <>

765

I'm getting compiler errors when omitting <>

Run clang-format

clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st1w.c
235 ↗(On Diff #387173)

Unfortunately it looks like these were not autogenerated, they used to use CHECK-DAG. This results in more changes than we expect. My thinking is that these tests are intended to be autogenerated, so we should make a clean patch which autogenerates the tests and then reapply your work on top of that.

It looks like some auto-generation lines were added in 91eda9c30f33da6ec6da70b59a5f5da6c6397039, but the tests using CHECK-DAG were not actually auto-generated. It seems we could either cleanly auto-generate them, or we should copy the existing CHECK-DAG patterns. It seems that what is currently committed is a little wrong, if only by having a 'these tests are auto-generated' header even though they are not auto-generated.

My current thinking is that the route forward is to update the tests without update test checks, which affects only the st1{b,h,w} files, and possibly to remove the false header implying these files were autogenerated.

paulwalker-arm accepted this revision.Nov 15 2021, 4:22 AM

There's an issue with the a Value* being named VecPtrTy but otherwise this looks good to me.

I'll leave it up to you to decide whether it's worth breaking out the usage of update_cc_test_checks.py into a separate patch. Normally this is a good thing but given the patch is ready to go and I imagine you wouldn't submit the "use update_cc_test_checks.py" patch for review I'm not sure it's worth the effort.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
736

The name here is wrong as this is not a type. My guess is you meant VecPtr?

758

As above, I suspect this name is not what you intended.

This revision is now accepted and ready to land.Nov 15 2021, 4:22 AM
MattDevereau updated this revision to Diff 387261.EditedNov 15 2021, 7:42 AM

Renamed VecPtrTy to VecPtr in both instCombineSVEST1 and instCombineSVELD1
Moved irrelevant changes from regenerating tests to commit 485c193aa12addea13a0db12f4c6bc6252244319

MattDevereau marked 2 inline comments as done.Nov 16 2021, 3:03 AM

This caused buildbot failures which failed on the bfloat tests. Pushed commit 83727f27719d3f319f746b473ce09be7e1d99b32 to fix the issue