This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][InstCombine] Fuse ADD+MUL and SUB+MUL AArch64 instrinsics
ClosedPublic

Authored by MattDevereau on Dec 16 2022, 2:51 AM.

Details

Summary

Fold (ADD p c (MUL p a b)) into (MAD p a b c)
Fold (FADD p c (FMUL p a b)) into (FMAD p a b c)
Fold (FSUB p c (FMUL p a b)) into (FNMSB p a b c)

Fold (ADD p (MUL p a b) c) into (MLA p c a b)
Fold (FADD p (FMUL p a b) c) into (FMLA p c a b)
Fold (SUB p (MUL p a b) C) into (MLS p c a b)
Fold (FSUB p (FMUL p a b) c) into (FMLS p c a b)

Diff Detail

Event Timeline

MattDevereau created this revision.Dec 16 2022, 2:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 16 2022, 2:51 AM
MattDevereau requested review of this revision.Dec 16 2022, 2:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 16 2022, 2:51 AM
sdesmalen added inline comments.Dec 16 2022, 7:04 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1096

This function looks very similar to instCombineSVEVectorFMLA, I think we could combine them. There are some subtle differences though:

  • Fast-math flags can only be copied if the type is floating-point.
  • MAD is different from MLA in that the result is merged into the multiplicant, not the addend. You could pass a parameter WriteAddend that is used like this:
Value *A = II.getOperand(1);
Value *Mul = II.getOperand(2);
if (!WriteAddend)
  std::swap(A, Mul);
if (!match(Mul, m_Intrinsic<Intrinsic::aarch64_sve_mul>(...))
  • If you pass in the opcode to test for (in this case Intrinsic::aarch64_sve_mul) and the result Intrinsic::aarch64_sve_mad, then you could generalise this.
1527

It would be nice if we could do the same thing for subtracts (fmls/fmsb/mls/msb)

llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-muladd.ll
110

The result is merged into %c, not the multiplicant (%1), so this should be resulting in sve.mla instead.

MattDevereau retitled this revision from [AArch64][InstCombine] Combine ADD and MUL AArch64 instrinsics to MAD to [AArch64][InstCombine] Fuse ADD+MUL and SUB+MUL AArch64 instrinsics.
MattDevereau edited the summary of this revision. (Show Details)
MattDevereau marked 2 inline comments as done.Dec 19 2022, 7:08 AM
MattDevereau added inline comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1096

I've reworked instCombineSVEVectorFMLA into a function that can handle both the Addend/Not-Addend case which also handles floating-point vectors.

1527

I've added support for these now.

sdesmalen added inline comments.Dec 19 2022, 8:01 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1048

nit: Can you give T and M slightly more descriptive names? :)

llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-muladdsub.ll
464

You can't make the same assumptions for MSB as you can for MAD, because the meaning of this is %a - (%b * %c), which is not equal to (%b * %c) - %a.

Matt added a subscriber: Matt.Dec 19 2022, 8:51 AM
MattDevereau marked 2 inline comments as done.
MattDevereau edited the summary of this revision. (Show Details)

Changed FMSB to FNMSB

MattDevereau marked an inline comment as done.Dec 20 2022, 5:48 AM
MattDevereau added inline comments.
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-muladdsub.ll
464

I've removed MSB and changed FMSB to FNMSB

sdesmalen added inline comments.Dec 20 2022, 6:29 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1051

nit: Perhaps MergeIntoAddendOp is a better name?

1053–1060

Could you rename these variables such that:

A -> MulOp0
B -> MulOp1
C -> AddendOp

That makes the code below (where you swap the order of these in {P, C, A, B} for example, a bit easier to follow)

1225

nit: Can you add a comment saying that there is no integer version of nmsb for the equivalent integer case of the above?

(is there a negative test available for that case?)

llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-muladdsub.ll
397

I don't think we need these tests for all the element types, probably doing this for 1 FP type and 1 Integer type is sufficient.

516

I don't necessarily think you need to repeat these negative tests for all the fmla/fmad/fmls/.. combinations (same for the tests above (predicate not matching, not the right fast-math flags, etc). Maybe you can have all positive tests firsts, and then all the negative tests for just the fmla, since they should directly translate to the other mul-add intrinsics.

MattDevereau marked an inline comment as done.Dec 20 2022, 6:43 AM
MattDevereau added inline comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1225

I'm personally of the opinion that comments describing the absence of things are bloat but I'm happy to put it in.

Regarding the test, I'm not sure what it would be beyond a sub(mul) with no transformation. Without the context of other tests surrounding it, I'm not sure it would make much sense.

llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-muladdsub.ll
516

Sure, these were just low overhead copy/pasted tests and the repetition is unnecessary.

sdesmalen added inline comments.Dec 20 2022, 6:53 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1225

Regarding the test, I'm not sure what it would be beyond a sub(mul) with no transformation. Without the context of other tests surrounding it, I'm not sure it would make much sense.

The 'no transformation' would be the point. i.e. there is a positive test for the floating-point case that translates to fnmsb, and we'd have a negative test for the integer form because there is no equivalent integer instruction (it just covers that we don't make any wrong assumptions)

I'm personally of the opinion that comments describing the absence of things are bloat but I'm happy to put it in.

Fair enough, I'm happy for you to omit adding a comment here, if we have a test that covers it with a comment that explains the need for the test. That should guard it from adding an invalid transform in the future.

MattDevereau marked 4 inline comments as done.
MattDevereau added inline comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1225

I've added a test for this with a comment describing why it's there.

sdesmalen accepted this revision.Dec 20 2022, 7:41 AM

Thanks for the changes @MattDevereau. LGTM with nit on the test addressed.

llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-muladdsub.ll
18–29

Can you remove things like dso_local and local_unnamed_addr from these tests?

This revision is now accepted and ready to land.Dec 20 2022, 7:41 AM