This is an archive of the discontinued LLVM Phabricator instance.

Differential D133300

[InstCombine] Matrix multiplication negation optimisation
ClosedPublic

Authored by zjaffal on Sep 5 2022, 6:16 AM.

Details

Reviewers

spatel
fhahn
lebedev.ri
RKSimon
anemet
thegameg
scanon

Commits

rG68cc35d52cff: [InstCombine] Matrix multiplication negation optimisation

Summary

If one of the operands of a matrix multiplication is negated, we can optimise the expression by moving the negation to whichever of the two operands or the result has the fewest elements.
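As a rough illustration of the idea in the summary, the placement decision can be sketched in plain C++ without any LLVM types. The enum `NegPlacement` and the function `chooseNegationPlacement` are hypothetical names used only for this sketch; the two conditions mirror the `ElementCount` comparisons in the committed diff, but this is a simplified model and not the patch itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of where the negation ends up (not an LLVM type).
enum class NegPlacement { Keep, MoveToOtherOperand, MoveToResult };

// negatedCount: number of elements in the operand that is currently negated.
// otherCount:   number of elements in the other operand.
// resultCount:  number of elements in the result of the multiplication.
NegPlacement chooseNegationPlacement(std::uint64_t negatedCount,
                                     std::uint64_t otherCount,
                                     std::uint64_t resultCount) {
  // (-A) * B -> A * (-B): the other operand is smaller than both the
  // negated operand and the result.
  if (negatedCount > otherCount && otherCount < resultCount)
    return NegPlacement::MoveToOtherOperand;
  // (-A) * B -> -(A * B): the result is smaller than the negated operand.
  if (negatedCount > resultCount)
    return NegPlacement::MoveToResult;
  // Otherwise the negation already sits on a value that is small enough.
  return NegPlacement::Keep;
}
```

Under this model, a negated 27-element operand multiplied by a 3-element operand into a 9-element result gives `MoveToOtherOperand`, while a negated 3-element operand with a 27-element partner gives `Keep`.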

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

zjaffal created this revision.Sep 5 2022, 6:16 AM

Herald added a project: Restricted Project. · View Herald TranscriptSep 5 2022, 6:16 AM

Herald added a subscriber: hiraditya. · View Herald Transcript

zjaffal requested review of this revision.Sep 5 2022, 6:16 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

Harbormaster completed remote builds in B185082: Diff 457965.Sep 5 2022, 6:17 AM

fhahn added inline comments.Sep 6 2022, 5:33 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
73	Why is this needed?
3289	here it should be sufficient to use a more compact mathematical notation: `(-A) * B = -(A * B)`
3320	can use `cast` here if you are not checking if `FNegType` is null. Same for similar uses of `dyn_cast` here.

fhahn added a reviewer: scanon.Sep 6 2022, 5:33 AM

fhahn added inline comments.Sep 6 2022, 5:58 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
3331	I think this is not correct, you are updating all uses of `FNegOp`, but we only have to update the use in the matrix multiply. Can you add a test case where the `FNeg` also has other users in some different instructions? They should remain unchanged. Also, it would probably make sense to limit this to `fneg` instructions with a single use. If there are other uses outside the multiply, we still need to negate the input and we only add an extra `fneg`.

spatel added inline comments.Sep 6 2022, 6:26 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1889	Add an FP transform near these other FP intrinsics.
3331	Also, we don't typically replace operands - just create a new call. That's what instcombine's worklist iteration is expecting. Look at existing FP transforms above for examples (and where I think this transform should be placed too).

Are we looking to also support -(A * B) -> -A * B with the negation on the cheapest operand? (might need to check what the required fast math flags are)

Move code next to FP Intrinsics

Harbormaster completed remote builds in B185423: Diff 458464.Sep 7 2022, 8:30 AM

In D133300#3771911, @thegameg wrote:

Are we looking to also support -(A * B) -> -A * B with the negation on the cheapest operand? (might need to check what the required fast math flags are)

We should. In particular, there are three places for the negation to go: -A * B = A * -B = -(A * B) and any one of the three might be smaller.
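The identity quoted above can be checked numerically with a small column-major matrix multiply. This is only a sketch: `Matrix`, `matmul`, `negated` and `threePlacementsAgree` are hypothetical helpers written for this note, the column-major layout is an assumption about how the flattened vectors are interpreted, and the check uses small integer-valued inputs so it says nothing about which fast-math flags the real transform needs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flattened matrix stored column by column (assumed layout for this sketch).
using Matrix = std::vector<double>;

// C = A * B, where A is rows x inner and B is inner x cols.
Matrix matmul(const Matrix &a, const Matrix &b, std::size_t rows,
              std::size_t inner, std::size_t cols) {
  Matrix c(rows * cols, 0.0);
  for (std::size_t j = 0; j < cols; ++j)
    for (std::size_t k = 0; k < inner; ++k)
      for (std::size_t i = 0; i < rows; ++i)
        c[j * rows + i] += a[k * rows + i] * b[j * inner + k];
  return c;
}

// Element-wise negation, the analogue of a vector fneg.
Matrix negated(Matrix m) {
  for (double &x : m)
    x = -x;
  return m;
}

// Returns true if (-A) * B, A * (-B) and -(A * B) give the same result.
bool threePlacementsAgree(const Matrix &a, const Matrix &b, std::size_t rows,
                          std::size_t inner, std::size_t cols) {
  Matrix negA = matmul(negated(a), b, rows, inner, cols);
  Matrix negB = matmul(a, negated(b), rows, inner, cols);
  Matrix negResult = negated(matmul(a, b, rows, inner, cols));
  return negA == negB && negB == negResult;
}
```

For a 2x3 matrix times a 3x1 vector, the three placements negate 6, 3 and 2 elements respectively, which is why any one of them can be the cheapest depending on the shapes.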

zjaffal marked 6 inline comments as done.Sep 8 2022, 3:32 AM

What happens if both args are negated? We need a test for that. Maybe that pattern should be handled before this, so we don't have to deal with the complication in this patch?

Tests should also include fast-math-flags in at least some cases, so we can see if those are propagated as expected.

In D133300#3774699, @radford wrote:

In D133300#3771911, @thegameg wrote:

Are we looking to also support -(A * B) -> -A * B with the negation on the cheapest operand? (might need to check what the required fast math flags are)

We should. In particular, there are three places for the negation to go: -A * B = A * -B = -(A * B) and any one of the three might be smaller.

Yes I am not covering this case. I can add a test case for it and then introduce another patch for it

In D133300#3776842, @spatel wrote:

What happens if both args are negated? We need a test for that. Maybe that pattern should be handled before this, so we don't have to deal with the complication in this patch?

Tests should also include fast-math-flags in at least some cases, so we can see if those are propagated as expected.

I will add a test case for that now

spatel added a parent revision: D133287: [InstCombine] Test for matrix multiplication negation optimisation..Sep 9 2022, 4:42 AM

Support where two operands are negated

Harbormaster completed remote builds in B185810: Diff 459016.Sep 9 2022, 5:16 AM

spatel added inline comments.Sep 9 2022, 5:27 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1850–1858	I don't think this is correct if an fneg has multiple uses (similar to the bug noted earlier, and I repeat my suggestion to create new instructions rather than modifying existing ones). Please split this change and tests to its own review ahead of the original transforms in this patch.

zjaffal added inline comments.Sep 9 2022, 8:16 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1850–1858	We don't need this check anyway. If both operands are negated, there are two cases:
1. Both of the negations will be moved to the result, and then another pass will remove the negations.
2. A negation gets moved to an operand, and then we have two negations on one operand, which will be optimised as well.

remove unnecessary check for two negations

Harbormaster completed remote builds in B185853: Diff 459076.Sep 9 2022, 8:17 AM

fhahn added inline comments.Sep 9 2022, 9:20 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1833	optimise -> optimize (US spelling is commonly used in LLVM)
1835	nit: `We -> we`, `optimise -> optimize` (US spelling is commonly used in LLVM)
1837	`smalest->smallest`
1846	no need to `dyn_cast` here, the result is guaranteed to be a vector type, `cast` can be used.
1850	no need to capture a value you are not using; `m_Value()` should just work. Also, you only use the variable `MatchOp0` in a single place, so it would be easier to read if you just use `if (match(..))`.
1850–1858	@zjaffal do we have a test case where both operands are negated and at least one of the fnegs has multiple uses?
1884	nit: Period at end of sentence.
1886	There should be no need to cast to `Instruction` here?
1912	please remove the stray line change.

spatel added inline comments.Sep 9 2022, 9:44 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1850–1858	Correct me if I'm wrong:
1. When both ops are negated, we don't care what the relative sizes of the values are, we always want to use the non-negated source ops.
2. When both ops are negated, we don't care if those ops have other uses, we always want to use the non-negated source ops.

Either way, we need tests to exercise those patterns.

define <4 x double> @matrix_multiply_v2f64_v2f64(<2 x double> %a, <2 x double> %b) {
  %a.neg = fneg <2 x double> %a
  %b.neg = fneg <2 x double> %b
  %res = call <4 x double> @llvm.matrix.multiply.v4f64.v2f64.v2f64(<2 x double> %a.neg, <2 x double> %b.neg, i32 2, i32 1, i32 2)
  ret <4 x double> %res
}

zjaffal added inline comments.Sep 9 2022, 11:25 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1850–1858	> Correct me if I'm wrong:
> 1. When both ops are negated, we don't care what the relative sizes of the values are, we always want to use the non-negated source ops.

I am not sure I understand the question fully, but in the case of the two operands the first operand gets handled first and we don't check if it is the larger of both.

> 2. When both ops are negated, we don't care if those ops have other uses, we always want to use the non-negated source ops.

Currently, if fmul has other uses we will still use the negated ops in the multiplication.

> Either way, we need tests to exercise those patterns.
>
> define <4 x double> @matrix_multiply_v2f64_v2f64(<2 x double> %a, <2 x double> %b) {
>   %a.neg = fneg <2 x double> %a
>   %b.neg = fneg <2 x double> %b
>   %res = call <4 x double> @llvm.matrix.multiply.v4f64.v2f64.v2f64(<2 x double> %a.neg, <2 x double> %b.neg, i32 2, i32 1, i32 2)
>   ret <4 x double> %res
> }

Yes, I think I need to expand on the test cases where we have two operands.

Move the test to a separate patch.
In the cases where both operands are negated we may need to introduce a separate patch to handle that case

Harbormaster completed remote builds in B185920: Diff 459167.Sep 9 2022, 1:34 PM

In D133300#3781177, @zjaffal wrote:

In the cases where both operands are negated we may need to introduce a separate patch to handle that case

Yes, and it should be written first. Without it, we have inconsistent behavior. This patch will reduce the 2nd example below, but it misses the 1st even though they are very similar patterns:

define <9 x double> @test_with_two_operands_negated2_commute(<3 x double> %a, <27 x double> %b){
  %a.neg = fneg <3 x double> %a
  %b.neg = fneg <27 x double> %b
  %res = call <9 x double> @llvm.matrix.multiply.v9f64.v3f64.v27f64(<3 x double> %a.neg, <27 x double> %b.neg, i32 1, i32 3, i32 9)
  ret <9 x double> %res
}

define <9 x double> @test_with_two_operands_negated2(<27 x double> %a, <3 x double> %b){
  %a.neg = fneg <27 x double> %a
  %b.neg = fneg <3 x double> %b
  %res = tail call <9 x double> @llvm.matrix.multiply.v9f64.v27f64.v3f64(<27 x double> %a.neg, <3 x double> %b.neg, i32 9, i32 3, i32 1)
  ret <9 x double> %res
}

Currently if fmul has other uses we will still use the negated ops in the multiplication

I don't think that's correct - there are no use restrictions on this transform:
https://github.com/llvm/llvm-project/blob/24e1736d84fd0fb45097245706a523c3398beb69/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp#L453

Add the optimization for two negations into a separate patch
Remove the check for single use for negation operation
Address style comments and spelling mistakes

Harbormaster completed remote builds in B186145: Diff 459448.Sep 12 2022, 7:01 AM

replace dyn_cast with cast

Harbormaster completed remote builds in B186147: Diff 459450.Sep 12 2022, 7:05 AM

zjaffal edited parent revisions, added: D133695: [InstCombine] Optimize multiplication where both operands are negated; removed: D133287: [InstCombine] Test for matrix multiplication negation optimisation..Sep 12 2022, 7:05 AM

update transforms

Harbormaster completed remote builds in B186328: Diff 459689.Sep 13 2022, 3:35 AM

use patterns

Harbormaster completed remote builds in B186381: Diff 459761.Sep 13 2022, 8:27 AM

Update variable names to patch comments

Harbormaster completed remote builds in B186565: Diff 460024.Sep 14 2022, 4:20 AM

fhahn added inline comments.Sep 15 2022, 1:51 PM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1833	nit: `We -> we`, `it's -> its`. On second thought, I think it would be better to just move the `Case 1,2,3` comments to the code that actually deals with those cases and remove the sentence here.
1834	`We -> we`.
1862	IIUC this is the case where we move the negation from one to the other operand. Could you move the comment for `Case 2` above here?
1867	If I read the code correctly, this may not be the second operand but could also be the first one if the second one is negated?
1868	I thought an earlier version created a new call here, rather than updating the existing one. Did we agree that `replaceOperand` here is the right thing to do?
1872	IIUC this is the case where we move the negation from an argument to the result of the multiply. Could you move the comment from `Case 3` here?
1877	there should be no need to cast to `Instruction` here, you should be able to just use `Value`.

zjaffal added inline comments.Sep 16 2022, 1:38 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1867	I think it might be better to name them (1) NegatedOperand and (2) OtherOperand or NonNegatedOperand.
1868	We agreed on using `replaceOperand` since it is used in other areas of the same file.

zjaffal added inline comments.Sep 16 2022, 1:47 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1877	You need it when we call `FNegInst->setOperand()`

Fix comments and variable names

Harbormaster completed remote builds in B187097: Diff 460692.Sep 16 2022, 3:02 AM

LGTM, thanks!

This revision is now accepted and ready to land.Sep 16 2022, 3:07 AM

spatel added inline comments.Sep 16 2022, 7:34 AM

llvm/test/Transforms/InstCombine/matrix-multiplication-negation.ll
259	We added an fneg to this sequence without removing the existing one. How is this better?

zjaffal added inline comments.Sep 16 2022, 7:38 AM

llvm/test/Transforms/InstCombine/matrix-multiplication-negation.ll
259	This is a result of removing the check if the negation has a single use.

spatel added inline comments.Sep 16 2022, 7:50 AM

llvm/test/Transforms/InstCombine/matrix-multiplication-negation.ll
259	Right - it made sense when both operands are negated in the other patch, but not when only 1 is negated. The one-use check should be present for this optimization.
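The reasoning behind the one-use check can be sketched with a simple count of negated elements. The helpers `negatedElementsAfterMove` and `moveIsProfitable` are hypothetical and exist only for this note; the sketch assumes that the cost of an `fneg` is proportional to its element count and that an `fneg` with other users has to stay in place after the move.

```cpp
#include <cassert>
#include <cstdint>

// Number of elements that are still negated after the negation has been
// moved onto a value with targetCount elements. If the original fneg has
// other users it cannot be deleted, so its elements are still counted.
std::uint64_t negatedElementsAfterMove(std::uint64_t negatedCount,
                                       std::uint64_t targetCount,
                                       bool fnegHasOtherUses) {
  return targetCount + (fnegHasOtherUses ? negatedCount : 0);
}

// The move only pays off if fewer elements are negated afterwards.
bool moveIsProfitable(std::uint64_t negatedCount, std::uint64_t targetCount,
                      bool fnegHasOtherUses) {
  return negatedElementsAfterMove(negatedCount, targetCount,
                                  fnegHasOtherUses) < negatedCount;
}
```

With a 27-element negated operand and a 3-element target, the move is profitable when the `fneg` has a single use (3 negated elements instead of 27) but not when it has other users (30 instead of 27), which matches the extra `fneg` seen in the test output discussed here.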

zjaffal added inline comments.Sep 16 2022, 7:56 AM

llvm/test/Transforms/InstCombine/matrix-multiplication-negation.ll
259	Perfect, I will add it now

Add condition to only optimize if the negated operand has one use

Harbormaster completed remote builds in B187154: Diff 460765.Sep 16 2022, 8:01 AM

spatel added inline comments.Sep 16 2022, 8:43 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1887	This sequence of create/replace/reset operand seems shaky. This transform is different than the earlier ones because we are creating a new instruction after the existing matmul call. It would be better to use the standard practice: Builder.CreateIntrinsic() followed by UnaryOperator::CreateFNeg(). This is also dropping all FMF on the fneg. Usually, we'd propagate the FMF from the matmul to the new fneg. We should adjust the tests to show this behavior more explicitly (ie, put some FMF on the fnegs in the tests).

Refactor the code for moving negation to the result. Now we preserve the fast-math flags on the created multiplication and negation instructions by passing them from the original multiplication instruction. FNeg is created using Builder instead of UnaryOperator because the latter caused build failures.
The changes in the fast-math flag behaviour can be seen in the test_negation_move_to_result_with_fastflags test.

I added another 3 test cases using a subset of the fast flags and the flags on the negation instruction instead of the multiplication.

test_negation_move_to_result_with_nnan_flag
test_negation_move_to_result_with_nsz_flag
test_negation_move_to_result_with_fastflag_on_negation

Harbormaster completed remote builds in B187745: Diff 461557.Sep 20 2022, 6:50 AM

LGTM

Thanks for the latest update & @spatel for all your comments!

This revision was landed with ongoing or failed builds.Sep 20 2022, 11:51 AM

Closed by commit rG68cc35d52cff: [InstCombine] Matrix multiplication negation optimisation (authored by zjaffal, committed by fhahn).

This revision was automatically updated to reflect the committed changes.

fhahn added a commit: rG68cc35d52cff: [InstCombine] Matrix multiplication negation optimisation.

Revision Contents

Path	Size
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp	46 lines
llvm/test/Transforms/InstCombine/matrix-multiplication-negation.ll	85 lines

Diff 461646

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

[64 unchanged lines not shown]
#include "llvm/Support/MathExtras.h"		#include "llvm/Support/MathExtras.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/InstCombine/InstCombiner.h"		#include "llvm/Transforms/InstCombine/InstCombiner.h"
#include "llvm/Transforms/Utils/AssumeBundleBuilder.h"		#include "llvm/Transforms/Utils/AssumeBundleBuilder.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Transforms/Utils/SimplifyLibCalls.h"		#include "llvm/Transforms/Utils/SimplifyLibCalls.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <utility>		#include <utility>
#include <vector>		#include <vector>

#define DEBUG_TYPE "instcombine"		#define DEBUG_TYPE "instcombine"
#include "llvm/Transforms/Utils/InstructionWorklist.h"		#include "llvm/Transforms/Utils/InstructionWorklist.h"

using namespace llvm;		using namespace llvm;
using namespace PatternMatch;		using namespace PatternMatch;
[1,743 unchanged lines not shown; context: if ((match(Arg0, m_OneUse(m_FNeg(m_Value(X)))) && Arg1 == X) ||]
if (IID == Intrinsic::minimum || IID == Intrinsic::minnum)		if (IID == Intrinsic::minimum || IID == Intrinsic::minnum)
R = Builder.CreateFNegFMF(R, II);		R = Builder.CreateFNegFMF(R, II);
return replaceInstUsesWith(*II, R);		return replaceInstUsesWith(*II, R);
}		}

break;		break;
}		}
case Intrinsic::matrix_multiply: {		case Intrinsic::matrix_multiply: {
		// Optimize negation in matrix multiplication.

// -A * -B -> A * B		// -A * -B -> A * B
Value *A, *B;		Value *A, *B;
if (match(II->getArgOperand(0), m_FNeg(m_Value(A))) &&		if (match(II->getArgOperand(0), m_FNeg(m_Value(A))) &&
match(II->getArgOperand(1), m_FNeg(m_Value(B)))) {		match(II->getArgOperand(1), m_FNeg(m_Value(B)))) {
replaceOperand(*II, 0, A);		replaceOperand(*II, 0, A);
replaceOperand(*II, 1, B);		replaceOperand(*II, 1, B);
return II;		return II;
}		}

		Value *Op0 = II->getOperand(0);
		Value *Op1 = II->getOperand(1);
		Value *OpNotNeg, *NegatedOp;
		unsigned NegatedOpArg, OtherOpArg;
		if (match(Op0, m_FNeg(m_Value(OpNotNeg)))) {
		NegatedOp = Op0;
		NegatedOpArg = 0;
		OtherOpArg = 1;
		} else if (match(Op1, m_FNeg(m_Value(OpNotNeg)))) {
		NegatedOp = Op1;
		NegatedOpArg = 1;
		OtherOpArg = 0;
		} else
		// Multiplication doesn't have a negated operand.
		break;

		// Only optimize if the negated operand has only one use.
		if (!NegatedOp->hasOneUse())
		break;

		Value *OtherOp = II->getOperand(OtherOpArg);
		VectorType *RetTy = cast<VectorType>(II->getType());
		VectorType *NegatedOpTy = cast<VectorType>(NegatedOp->getType());
		VectorType *OtherOpTy = cast<VectorType>(OtherOp->getType());
		ElementCount NegatedCount = NegatedOpTy->getElementCount();
		ElementCount OtherCount = OtherOpTy->getElementCount();
		ElementCount RetCount = RetTy->getElementCount();
		// (-A) * B -> A * (-B), if it is cheaper to negate B and vice versa.
		if (ElementCount::isKnownGT(NegatedCount, OtherCount) &&
		ElementCount::isKnownLT(OtherCount, RetCount)) {
		Value *InverseOtherOp = Builder.CreateFNeg(OtherOp);
		replaceOperand(*II, NegatedOpArg, OpNotNeg);
		replaceOperand(*II, OtherOpArg, InverseOtherOp);
		return II;
		}
		// (-A) * B -> -(A * B), if it is cheaper to negate the result
		if (ElementCount::isKnownGT(NegatedCount, RetCount)) {
		SmallVector<Value *, 5> NewArgs(II->args());
		NewArgs[NegatedOpArg] = OpNotNeg;
		Instruction *NewMul =
		Builder.CreateIntrinsic(II->getType(), IID, NewArgs, II);
		return replaceInstUsesWith(*II, Builder.CreateFNegFMF(NewMul, II));
		}
break;		break;
}		}
case Intrinsic::fmuladd: {		case Intrinsic::fmuladd: {
// Canonicalize fast fmuladd to the separate fmul + fadd.		// Canonicalize fast fmuladd to the separate fmul + fadd.
if (II->isFast()) {		if (II->isFast()) {
BuilderTy::FastMathFlagGuard Guard(Builder);		BuilderTy::FastMathFlagGuard Guard(Builder);
Builder.setFastMathFlags(II->getFastMathFlags());		Builder.setFastMathFlags(II->getFastMathFlags());
Value *Mul = Builder.CreateFMul(II->getArgOperand(0),		Value *Mul = Builder.CreateFMul(II->getArgOperand(0),
II->getArgOperand(1));		II->getArgOperand(1));
Value *Add = Builder.CreateFAdd(Mul, II->getArgOperand(2));		Value *Add = Builder.CreateFAdd(Mul, II->getArgOperand(2));
Add->takeName(II);		Add->takeName(II);
return replaceInstUsesWith(*II, Add);		return replaceInstUsesWith(*II, Add);
}		}

// Try to simplify the underlying FMul.		// Try to simplify the underlying FMul.
if (Value *V = simplifyFMulInst(II->getArgOperand(0), II->getArgOperand(1),		if (Value *V = simplifyFMulInst(II->getArgOperand(0), II->getArgOperand(1),
II->getFastMathFlags(),		II->getFastMathFlags(),
SQ.getWithInstruction(II))) {		SQ.getWithInstruction(II))) {
auto *FAdd = BinaryOperator::CreateFAdd(V, II->getArgOperand(2));		auto *FAdd = BinaryOperator::CreateFAdd(V, II->getArgOperand(2));
FAdd->copyFastMathFlags(II);		FAdd->copyFastMathFlags(II);
return FAdd;		return FAdd;
}		}

[[fallthrough]];		[[fallthrough]];
}		}
case Intrinsic::fma: {		case Intrinsic::fma: {
// fma fneg(x), fneg(y), z -> fma x, y, z		// fma fneg(x), fneg(y), z -> fma x, y, z
Value *Src0 = II->getArgOperand(0);		Value *Src0 = II->getArgOperand(0);
Value *Src1 = II->getArgOperand(1);		Value *Src1 = II->getArgOperand(1);
Value *X, *Y;		Value *X, *Y;
if (match(Src0, m_FNeg(m_Value(X))) && match(Src1, m_FNeg(m_Value(Y)))) {		if (match(Src0, m_FNeg(m_Value(X))) && match(Src1, m_FNeg(m_Value(Y)))) {
replaceOperand(*II, 0, X);		replaceOperand(*II, 0, X);
replaceOperand(*II, 1, Y);		replaceOperand(*II, 1, Y);
return II;		return II;
[1,360 unchanged lines not shown; context: case Intrinsic::experimental_gc_statepoint: {]
return CallBase::Create(&Call, NewBundle);		return CallBase::Create(&Call, NewBundle);
}		}
default: { break; }		default: { break; }
}		}

return Changed ? &Call : nullptr;		return Changed ? &Call : nullptr;
}		}

/// If the callee is a constexpr cast of a function, attempt to move the cast to		/// If the callee is a constexpr cast of a function, attempt to move the cast to
/// the arguments of the call/callbr/invoke.		/// the arguments of the call/callbr/invoke.
bool InstCombinerImpl::transformConstExprCastCall(CallBase &Call) {		bool InstCombinerImpl::transformConstExprCastCall(CallBase &Call) {
auto *Callee =		auto *Callee =
dyn_cast<Function>(Call.getCalledOperand()->stripPointerCasts());		dyn_cast<Function>(Call.getCalledOperand()->stripPointerCasts());
if (!Callee)		if (!Callee)
return false;		return false;

// If this is a call to a thunk function, don't remove the cast. Thunks are		// If this is a call to a thunk function, don't remove the cast. Thunks are
[14 unchanged lines not shown; context: bool InstCombinerImpl::transformConstExprCastCall(CallBase &Call) {]

// Okay, this is a cast from a function to a different type. Unless doing so		// Okay, this is a cast from a function to a different type. Unless doing so
// would cause a type conversion of one of our arguments, change this call to		// would cause a type conversion of one of our arguments, change this call to
// be a direct call with arguments casted to the appropriate types.		// be a direct call with arguments casted to the appropriate types.
FunctionType *FT = Callee->getFunctionType();		FunctionType *FT = Callee->getFunctionType();
Type *OldRetTy = Caller->getType();		Type *OldRetTy = Caller->getType();
Type *NewRetTy = FT->getReturnType();		Type *NewRetTy = FT->getReturnType();

// Check to see if we are changing the return type...		// Check to see if we are changing the return type...
if (OldRetTy != NewRetTy) {		if (OldRetTy != NewRetTy) {

if (NewRetTy->isStructTy())		if (NewRetTy->isStructTy())
return false; // TODO: Handle multiple return values.		return false; // TODO: Handle multiple return values.

if (!CastInst::isBitOrNoopPointerCastable(NewRetTy, OldRetTy, DL)) {		if (!CastInst::isBitOrNoopPointerCastable(NewRetTy, OldRetTy, DL)) {
if (Callee->isDeclaration())		if (Callee->isDeclaration())
return false; // Cannot transform this return value.		return false; // Cannot transform this return value.

if (!Caller->use_empty() &&		if (!Caller->use_empty() &&
// void -> non-void is handled specially		// void -> non-void is handled specially
fhahn: I think this is not correct, you are updating all uses of `FNegOp`, but we only have to update the use in the matrix multiply. Can you add a test case where the `FNeg` also has other users in some different instructions? They should remain unchanged. Also, it would probably make sense to limit this to `fneg` instructions with a single use. If there are other uses outside the multiply, we still need to negate the input and we only add an extra `fneg`.
spatel: Also, we don't typically replace operands - just create a new call. That's what instcombine's worklist iteration is expecting. Look at existing FP transforms above for examples (and where I think this transform should be placed too).
!NewRetTy->isVoidTy())		!NewRetTy->isVoidTy())
return false; // Cannot transform this return value.		return false; // Cannot transform this return value.
}		}

if (!CallerPAL.isEmpty() && !Caller->use_empty()) {		if (!CallerPAL.isEmpty() && !Caller->use_empty()) {
AttrBuilder RAttrs(FT->getContext(), CallerPAL.getRetAttrs());		AttrBuilder RAttrs(FT->getContext(), CallerPAL.getRetAttrs());
if (RAttrs.overlaps(AttributeFuncs::typeIncompatible(NewRetTy)))		if (RAttrs.overlaps(AttributeFuncs::typeIncompatible(NewRetTy)))
return false; // Attribute not compatible with transformed value.		return false; // Attribute not compatible with transformed value.
[388 lines not shown]

llvm/test/Transforms/InstCombine/matrix-multiplication-negation.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -passes=instcombine -S | FileCheck %s		; RUN: opt < %s -passes=instcombine -S | FileCheck %s

; The result has the fewest vector elements between the result and the two operands so the negation can be moved there		; The result has the fewest vector elements between the result and the two operands so the negation can be moved there
define <2 x double> @test_negation_move_to_result(<6 x double> %a, <3 x double> %b) {		define <2 x double> @test_negation_move_to_result(<6 x double> %a, <3 x double> %b) {
; CHECK-LABEL: @test_negation_move_to_result(		; CHECK-LABEL: @test_negation_move_to_result(
; CHECK-NEXT: [[A_NEG:%.*]] = fneg <6 x double> [[A:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call <2 x double> @llvm.matrix.multiply.v2f64.v6f64.v3f64(<6 x double> [[A:%.*]], <3 x double> [[B:%.*]], i32 2, i32 3, i32 1)
; CHECK-NEXT: [[RES:%.*]] = tail call <2 x double> @llvm.matrix.multiply.v2f64.v6f64.v3f64(<6 x double> [[A_NEG]], <3 x double> [[B:%.*]], i32 2, i32 3, i32 1)		; CHECK-NEXT: [[TMP2:%.*]] = fneg <2 x double> [[TMP1]]
; CHECK-NEXT: ret <2 x double> [[RES]]		; CHECK-NEXT: ret <2 x double> [[TMP2]]
;		;
%a.neg = fneg <6 x double> %a		%a.neg = fneg <6 x double> %a
%res = tail call <2 x double> @llvm.matrix.multiply.v2f64.v6f64.v3f64(<6 x double> %a.neg, <3 x double> %b, i32 2, i32 3, i32 1)		%res = tail call <2 x double> @llvm.matrix.multiply.v2f64.v6f64.v3f64(<6 x double> %a.neg, <3 x double> %b, i32 2, i32 3, i32 1)
ret <2 x double> %res		ret <2 x double> %res
}		}
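
The test comments say the negation moves to whichever of the two operands or the result has the fewest vector elements. A minimal sketch of that selection in plain C++ follows. The names `NegPlacement` and `chooseNegPlacement` are hypothetical, the tie-breaking rule is an assumption, and nothing here is the code from the patch or an LLVM API; it only mirrors the shapes used in the tests on this page.

```cpp
#include <cstdint>

// Hypothetical model of where the single fneg could end up.
enum class NegPlacement { KeepOnOperand, OtherOperand, Result };

// Negated: element count of the operand that currently carries the fneg.
// Other:   element count of the other multiply operand.
// Result:  element count of the multiply result.
// Assumption: on a tie the negation is not moved to the other operand; a
// result that is strictly smaller than the negated operand wins instead.
constexpr NegPlacement chooseNegPlacement(std::uint64_t Negated,
                                          std::uint64_t Other,
                                          std::uint64_t Result) {
  return (Other < Negated && Other < Result) ? NegPlacement::OtherOperand
         : (Result < Negated)                ? NegPlacement::Result
                                             : NegPlacement::KeepOnOperand;
}

// Shapes taken from the tests in this diff.
// <6 x double> negated, <3 x double> other, <2 x double> result.
static_assert(chooseNegPlacement(6, 3, 2) == NegPlacement::Result, "");
// <27 x double> negated, <3 x double> other, <9 x double> result.
static_assert(chooseNegPlacement(27, 3, 9) == NegPlacement::OtherOperand, "");
```

The sketch compares element counts only; it ignores fast-math flags and the number of users of the `fneg`, both of which the review discusses separately.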

; The result has the fewest vector elements between the result and the two operands so the negation can be moved there		; The result has the fewest vector elements between the result and the two operands so the negation can be moved there
; Fast flag should be preserved		; Fast flag should be preserved
define <2 x double> @test_negation_move_to_result_with_fastflags(<6 x double> %a, <3 x double> %b) {		define <2 x double> @test_negation_move_to_result_with_fastflags(<6 x double> %a, <3 x double> %b) {
; CHECK-LABEL: @test_negation_move_to_result_with_fastflags(		; CHECK-LABEL: @test_negation_move_to_result_with_fastflags(
; CHECK-NEXT: [[A_NEG:%.*]] = fneg <6 x double> [[A:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call fast <2 x double> @llvm.matrix.multiply.v2f64.v6f64.v3f64(<6 x double> [[A:%.*]], <3 x double> [[B:%.*]], i32 2, i32 3, i32 1)
; CHECK-NEXT: [[RES:%.*]] = tail call fast <2 x double> @llvm.matrix.multiply.v2f64.v6f64.v3f64(<6 x double> [[A_NEG]], <3 x double> [[B:%.*]], i32 2, i32 3, i32 1)		; CHECK-NEXT: [[TMP2:%.*]] = fneg fast <2 x double> [[TMP1]]
; CHECK-NEXT: ret <2 x double> [[RES]]		; CHECK-NEXT: ret <2 x double> [[TMP2]]
;		;
%a.neg = fneg <6 x double> %a		%a.neg = fneg <6 x double> %a
%res = tail call fast <2 x double> @llvm.matrix.multiply.v2f64.v6f64.v3f64(<6 x double> %a.neg, <3 x double> %b, i32 2, i32 3, i32 1)		%res = tail call fast <2 x double> @llvm.matrix.multiply.v2f64.v6f64.v3f64(<6 x double> %a.neg, <3 x double> %b, i32 2, i32 3, i32 1)
ret <2 x double> %res		ret <2 x double> %res
}		}

		define <2 x double> @test_negation_move_to_result_with_nnan_flag(<6 x double> %a, <3 x double> %b) {
		; CHECK-LABEL: @test_negation_move_to_result_with_nnan_flag(
		; CHECK-NEXT: [[TMP1:%.*]] = call nnan <2 x double> @llvm.matrix.multiply.v2f64.v6f64.v3f64(<6 x double> [[A:%.*]], <3 x double> [[B:%.*]], i32 2, i32 3, i32 1)
		; CHECK-NEXT: [[TMP2:%.*]] = fneg nnan <2 x double> [[TMP1]]
		; CHECK-NEXT: ret <2 x double> [[TMP2]]
		;
		%a.neg = fneg <6 x double> %a
		%res = tail call nnan <2 x double> @llvm.matrix.multiply.v2f64.v6f64.v3f64(<6 x double> %a.neg, <3 x double> %b, i32 2, i32 3, i32 1)
		ret <2 x double> %res
		}

		define <2 x double> @test_negation_move_to_result_with_nsz_flag(<6 x double> %a, <3 x double> %b) {
		; CHECK-LABEL: @test_negation_move_to_result_with_nsz_flag(
		; CHECK-NEXT: [[TMP1:%.*]] = call nsz <2 x double> @llvm.matrix.multiply.v2f64.v6f64.v3f64(<6 x double> [[A:%.*]], <3 x double> [[B:%.*]], i32 2, i32 3, i32 1)
		; CHECK-NEXT: [[TMP2:%.*]] = fneg nsz <2 x double> [[TMP1]]
		; CHECK-NEXT: ret <2 x double> [[TMP2]]
		;
		%a.neg = fneg <6 x double> %a
		%res = tail call nsz <2 x double> @llvm.matrix.multiply.v2f64.v6f64.v3f64(<6 x double> %a.neg, <3 x double> %b, i32 2, i32 3, i32 1)
		ret <2 x double> %res
		}

		define <2 x double> @test_negation_move_to_result_with_fastflag_on_negation(<6 x double> %a, <3 x double> %b) {
		; CHECK-LABEL: @test_negation_move_to_result_with_fastflag_on_negation(
		; CHECK-NEXT: [[TMP1:%.*]] = call <2 x double> @llvm.matrix.multiply.v2f64.v6f64.v3f64(<6 x double> [[A:%.*]], <3 x double> [[B:%.*]], i32 2, i32 3, i32 1)
		; CHECK-NEXT: [[TMP2:%.*]] = fneg <2 x double> [[TMP1]]
		; CHECK-NEXT: ret <2 x double> [[TMP2]]
		;
		%a.neg = fneg fast<6 x double> %a
		%res = tail call <2 x double> @llvm.matrix.multiply.v2f64.v6f64.v3f64(<6 x double> %a.neg, <3 x double> %b, i32 2, i32 3, i32 1)
		ret <2 x double> %res
		}

; %b has the fewest vector elements between the result and the two operands so the negation can be moved there		; %b has the fewest vector elements between the result and the two operands so the negation can be moved there
define <9 x double> @test_move_negation_to_second_operand(<27 x double> %a, <3 x double> %b) {		define <9 x double> @test_move_negation_to_second_operand(<27 x double> %a, <3 x double> %b) {
; CHECK-LABEL: @test_move_negation_to_second_operand(		; CHECK-LABEL: @test_move_negation_to_second_operand(
; CHECK-NEXT: [[A_NEG:%.*]] = fneg <27 x double> [[A:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = fneg <3 x double> [[B:%.*]]
; CHECK-NEXT: [[RES:%.*]] = tail call <9 x double> @llvm.matrix.multiply.v9f64.v27f64.v3f64(<27 x double> [[A_NEG]], <3 x double> [[B:%.*]], i32 9, i32 3, i32 1)		; CHECK-NEXT: [[RES:%.*]] = tail call <9 x double> @llvm.matrix.multiply.v9f64.v27f64.v3f64(<27 x double> [[A:%.*]], <3 x double> [[TMP1]], i32 9, i32 3, i32 1)
; CHECK-NEXT: ret <9 x double> [[RES]]		; CHECK-NEXT: ret <9 x double> [[RES]]
;		;
%a.neg = fneg <27 x double> %a		%a.neg = fneg <27 x double> %a
%res = tail call <9 x double> @llvm.matrix.multiply.v9f64.v27f64.v3f64(<27 x double> %a.neg, <3 x double> %b, i32 9, i32 3, i32 1)		%res = tail call <9 x double> @llvm.matrix.multiply.v9f64.v27f64.v3f64(<27 x double> %a.neg, <3 x double> %b, i32 9, i32 3, i32 1)
ret <9 x double> %res		ret <9 x double> %res
}		}

; %b has the fewest vector elements between the result and the two operands so the negation can be moved there		; %b has the fewest vector elements between the result and the two operands so the negation can be moved there
; Fast flag should be preserved		; Fast flag should be preserved
define <9 x double> @test_move_negation_to_second_operand_with_fast_flags(<27 x double> %a, <3 x double> %b) {		define <9 x double> @test_move_negation_to_second_operand_with_fast_flags(<27 x double> %a, <3 x double> %b) {
; CHECK-LABEL: @test_move_negation_to_second_operand_with_fast_flags(		; CHECK-LABEL: @test_move_negation_to_second_operand_with_fast_flags(
; CHECK-NEXT: [[A_NEG:%.*]] = fneg <27 x double> [[A:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = fneg <3 x double> [[B:%.*]]
; CHECK-NEXT: [[RES:%.*]] = tail call fast <9 x double> @llvm.matrix.multiply.v9f64.v27f64.v3f64(<27 x double> [[A_NEG]], <3 x double> [[B:%.*]], i32 9, i32 3, i32 1)		; CHECK-NEXT: [[RES:%.*]] = tail call fast <9 x double> @llvm.matrix.multiply.v9f64.v27f64.v3f64(<27 x double> [[A:%.*]], <3 x double> [[TMP1]], i32 9, i32 3, i32 1)
; CHECK-NEXT: ret <9 x double> [[RES]]		; CHECK-NEXT: ret <9 x double> [[RES]]
;		;
%a.neg = fneg <27 x double> %a		%a.neg = fneg <27 x double> %a
%res = tail call fast <9 x double> @llvm.matrix.multiply.v9f64.v27f64.v3f64(<27 x double> %a.neg, <3 x double> %b, i32 9, i32 3, i32 1)		%res = tail call fast <9 x double> @llvm.matrix.multiply.v9f64.v27f64.v3f64(<27 x double> %a.neg, <3 x double> %b, i32 9, i32 3, i32 1)
ret <9 x double> %res		ret <9 x double> %res
}		}

; The result has the fewest vector elements between the result and the two operands so the negation can be moved there		; The result has the fewest vector elements between the result and the two operands so the negation can be moved there
define <2 x double> @test_negation_move_to_result_from_second_operand(<3 x double> %a, <6 x double> %b){		define <2 x double> @test_negation_move_to_result_from_second_operand(<3 x double> %a, <6 x double> %b){
; CHECK-LABEL: @test_negation_move_to_result_from_second_operand(		; CHECK-LABEL: @test_negation_move_to_result_from_second_operand(
; CHECK-NEXT: [[B_NEG:%.*]] = fneg <6 x double> [[B:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call <2 x double> @llvm.matrix.multiply.v2f64.v3f64.v6f64(<3 x double> [[A:%.*]], <6 x double> [[B:%.*]], i32 1, i32 3, i32 2)
; CHECK-NEXT: [[RES:%.*]] = tail call <2 x double> @llvm.matrix.multiply.v2f64.v3f64.v6f64(<3 x double> [[A:%.*]], <6 x double> [[B_NEG]], i32 1, i32 3, i32 2)		; CHECK-NEXT: [[TMP2:%.*]] = fneg <2 x double> [[TMP1]]
; CHECK-NEXT: ret <2 x double> [[RES]]		; CHECK-NEXT: ret <2 x double> [[TMP2]]
;		;
%b.neg = fneg <6 x double> %b		%b.neg = fneg <6 x double> %b
%res = tail call <2 x double> @llvm.matrix.multiply.v2f64.v3f64.v6f64(<3 x double> %a, <6 x double> %b.neg, i32 1, i32 3, i32 2)		%res = tail call <2 x double> @llvm.matrix.multiply.v2f64.v3f64.v6f64(<3 x double> %a, <6 x double> %b.neg, i32 1, i32 3, i32 2)
ret <2 x double> %res		ret <2 x double> %res
}		}

; %a has the fewest vector elements between the result and the two operands so the negation can be moved there		; %a has the fewest vector elements between the result and the two operands so the negation can be moved there
define <9 x double> @test_move_negation_to_first_operand(<3 x double> %a, <27 x double> %b) {		define <9 x double> @test_move_negation_to_first_operand(<3 x double> %a, <27 x double> %b) {
; CHECK-LABEL: @test_move_negation_to_first_operand(		; CHECK-LABEL: @test_move_negation_to_first_operand(
; CHECK-NEXT: [[B_NEG:%.*]] = fneg <27 x double> [[B:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = fneg <3 x double> [[A:%.*]]
; CHECK-NEXT: [[RES:%.*]] = tail call <9 x double> @llvm.matrix.multiply.v9f64.v3f64.v27f64(<3 x double> [[A:%.*]], <27 x double> [[B_NEG]], i32 1, i32 3, i32 9)		; CHECK-NEXT: [[RES:%.*]] = tail call <9 x double> @llvm.matrix.multiply.v9f64.v3f64.v27f64(<3 x double> [[TMP1]], <27 x double> [[B:%.*]], i32 1, i32 3, i32 9)
; CHECK-NEXT: ret <9 x double> [[RES]]		; CHECK-NEXT: ret <9 x double> [[RES]]
;		;
%b.neg = fneg <27 x double> %b		%b.neg = fneg <27 x double> %b
%res = tail call <9 x double> @llvm.matrix.multiply.v9f64.v3f64.v27f64(<3 x double> %a, <27 x double> %b.neg, i32 1, i32 3, i32 9)		%res = tail call <9 x double> @llvm.matrix.multiply.v9f64.v3f64.v27f64(<3 x double> %a, <27 x double> %b.neg, i32 1, i32 3, i32 9)
ret <9 x double> %res		ret <9 x double> %res
}		}

; %a has the fewest vector elements between the result and the two operands so the negation is not moved		; %a has the fewest vector elements between the result and the two operands so the negation is not moved
[139 lines not shown]
%res.3 = fadd <12 x double> %res.2, %res		%res.3 = fadd <12 x double> %res.2, %res
ret <12 x double> %res.3		ret <12 x double> %res.3
}		}

define <12 x double> @fneg_with_multiple_uses_2(<15 x double> %a, <20 x double> %b, ptr %a_loc){		define <12 x double> @fneg_with_multiple_uses_2(<15 x double> %a, <20 x double> %b, ptr %a_loc){
; CHECK-LABEL: @fneg_with_multiple_uses_2(		; CHECK-LABEL: @fneg_with_multiple_uses_2(
; CHECK-NEXT: [[A_NEG:%.*]] = fneg <15 x double> [[A:%.*]]		; CHECK-NEXT: [[A_NEG:%.*]] = fneg <15 x double> [[A:%.*]]
; CHECK-NEXT: [[RES:%.*]] = tail call <12 x double> @llvm.matrix.multiply.v12f64.v15f64.v20f64(<15 x double> [[A_NEG]], <20 x double> [[B:%.*]], i32 3, i32 5, i32 4)		; CHECK-NEXT: [[RES:%.*]] = tail call <12 x double> @llvm.matrix.multiply.v12f64.v15f64.v20f64(<15 x double> [[A_NEG]], <20 x double> [[B:%.*]], i32 3, i32 5, i32 4)
; CHECK-NEXT: store <15 x double> [[A_NEG]], ptr [[A_LOC:%.*]], align 128		; CHECK-NEXT: store <15 x double> [[A_NEG]], ptr [[A_LOC:%.*]], align 128
spatel: We added an fneg to this sequence without removing the existing one. How is this better?
zjaffal: This is a result of removing the check if the negation has a single use.
spatel: Right - it made sense when both operands are negated in the other patch, but not when only 1 is negated. The one-use check should be present for this optimization.
zjaffal: Perfect, I will add it now
; CHECK-NEXT: ret <12 x double> [[RES]]		; CHECK-NEXT: ret <12 x double> [[RES]]
;		;
%a.neg = fneg <15 x double> %a		%a.neg = fneg <15 x double> %a
%res = tail call <12 x double> @llvm.matrix.multiply.v12f64.v15f64.v20f64(<15 x double> %a.neg, <20 x double> %b, i32 3, i32 5, i32 4)		%res = tail call <12 x double> @llvm.matrix.multiply.v12f64.v15f64.v20f64(<15 x double> %a.neg, <20 x double> %b, i32 3, i32 5, i32 4)
store <15 x double> %a.neg, ptr %a_loc		store <15 x double> %a.neg, ptr %a_loc
ret <12 x double> %res		ret <12 x double> %res
}		}
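
The review thread on this test argues that the rewrite only pays off when the `fneg` has a single use: with other users, the original `fneg` has to stay and the rewrite merely adds a second one. The following plain C++ sketch puts that argument into numbers by counting negated elements before and after the rewrite. All names (`negatedElementsAfter`, `rewriteIsProfitable`) are hypothetical and the element-count cost model is an assumption for illustration, not the logic in the patch.

```cpp
#include <cstdint>

// Elements negated after the rewrite: the new fneg on the target (other
// operand or result) plus, if the original fneg has users besides the
// multiply, the original fneg, which then cannot be removed.
constexpr std::uint64_t negatedElementsAfter(std::uint64_t NegatedCount,
                                             std::uint64_t TargetCount,
                                             unsigned FNegUses) {
  return TargetCount + (FNegUses > 1 ? NegatedCount : 0);
}

// The rewrite is worthwhile under this model only if fewer elements are
// negated afterwards than the original fneg negated on its own.
constexpr bool rewriteIsProfitable(std::uint64_t NegatedCount,
                                   std::uint64_t TargetCount,
                                   unsigned FNegUses) {
  return negatedElementsAfter(NegatedCount, TargetCount, FNegUses) <
         NegatedCount;
}

// Shapes from @fneg_with_multiple_uses_2: <15 x double> negated operand,
// <12 x double> result, and the fneg is also stored (two uses).
static_assert(!rewriteIsProfitable(15, 12, 2), "");
// The same shapes with a single use would be profitable.
static_assert(rewriteIsProfitable(15, 12, 1), "");
```

Under this model a multi-use `fneg` can never be profitable to move, since the original negation is always still paid for, which matches the request to keep the one-use check.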
; negation should be moved to the second operand given it has the smallest element count		; negation should be moved to the second operand given it has the smallest element count
define <72 x double> @chain_of_matrix_mutliplies(<27 x double> %a, <3 x double> %b, <8 x double> %c) {		define <72 x double> @chain_of_matrix_mutliplies(<27 x double> %a, <3 x double> %b, <8 x double> %c) {
; CHECK-LABEL: @chain_of_matrix_mutliplies(		; CHECK-LABEL: @chain_of_matrix_mutliplies(
; CHECK-NEXT: [[A_NEG:%.*]] = fneg <27 x double> [[A:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = fneg <3 x double> [[B:%.*]]
; CHECK-NEXT: [[RES:%.*]] = tail call <9 x double> @llvm.matrix.multiply.v9f64.v27f64.v3f64(<27 x double> [[A_NEG]], <3 x double> [[B:%.*]], i32 9, i32 3, i32 1)		; CHECK-NEXT: [[RES:%.*]] = tail call <9 x double> @llvm.matrix.multiply.v9f64.v27f64.v3f64(<27 x double> [[A:%.*]], <3 x double> [[TMP1]], i32 9, i32 3, i32 1)
; CHECK-NEXT: [[RES_2:%.*]] = tail call <72 x double> @llvm.matrix.multiply.v72f64.v9f64.v8f64(<9 x double> [[RES]], <8 x double> [[C:%.*]], i32 9, i32 1, i32 8)		; CHECK-NEXT: [[RES_2:%.*]] = tail call <72 x double> @llvm.matrix.multiply.v72f64.v9f64.v8f64(<9 x double> [[RES]], <8 x double> [[C:%.*]], i32 9, i32 1, i32 8)
; CHECK-NEXT: ret <72 x double> [[RES_2]]		; CHECK-NEXT: ret <72 x double> [[RES_2]]
;		;
%a.neg = fneg <27 x double> %a		%a.neg = fneg <27 x double> %a
%res = tail call <9 x double> @llvm.matrix.multiply.v9f64.v27f64.v3f64(<27 x double> %a.neg, <3 x double> %b, i32 9, i32 3, i32 1)		%res = tail call <9 x double> @llvm.matrix.multiply.v9f64.v27f64.v3f64(<27 x double> %a.neg, <3 x double> %b, i32 9, i32 3, i32 1)
%res.2 = tail call <72 x double> @llvm.matrix.multiply.v72f64.v9f64.v8f64(<9 x double> %res, <8 x double> %c, i32 9, i32 1, i32 8)		%res.2 = tail call <72 x double> @llvm.matrix.multiply.v72f64.v9f64.v8f64(<9 x double> %res, <8 x double> %c, i32 9, i32 1, i32 8)
ret <72 x double> %res.2		ret <72 x double> %res.2
}		}

; first negation should be moved to %a		; first negation should be moved to %a
; second negation should be moved to the result of the second multiplication		; second negation should be moved to the result of the second multiplication
define <6 x double> @chain_of_matrix_mutliplies_with_two_negations(<3 x double> %a, <5 x double> %b, <10 x double> %c) {		define <6 x double> @chain_of_matrix_mutliplies_with_two_negations(<3 x double> %a, <5 x double> %b, <10 x double> %c) {
; CHECK-LABEL: @chain_of_matrix_mutliplies_with_two_negations(		; CHECK-LABEL: @chain_of_matrix_mutliplies_with_two_negations(
; CHECK-NEXT: [[B_NEG:%.*]] = fneg <5 x double> [[B:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = fneg <3 x double> [[A:%.*]]
; CHECK-NEXT: [[RES:%.*]] = tail call <15 x double> @llvm.matrix.multiply.v15f64.v3f64.v5f64(<3 x double> [[A:%.*]], <5 x double> [[B_NEG]], i32 3, i32 1, i32 5)		; CHECK-NEXT: [[RES:%.*]] = tail call <15 x double> @llvm.matrix.multiply.v15f64.v3f64.v5f64(<3 x double> [[TMP1]], <5 x double> [[B:%.*]], i32 3, i32 1, i32 5)
; CHECK-NEXT: [[RES_NEG:%.*]] = fneg <15 x double> [[RES]]		; CHECK-NEXT: [[TMP2:%.*]] = call <6 x double> @llvm.matrix.multiply.v6f64.v15f64.v10f64(<15 x double> [[RES]], <10 x double> [[C:%.*]], i32 3, i32 5, i32 2)
; CHECK-NEXT: [[RES_2:%.*]] = tail call <6 x double> @llvm.matrix.multiply.v6f64.v15f64.v10f64(<15 x double> [[RES_NEG]], <10 x double> [[C:%.*]], i32 3, i32 5, i32 2)		; CHECK-NEXT: [[TMP3:%.*]] = fneg <6 x double> [[TMP2]]
; CHECK-NEXT: ret <6 x double> [[RES_2]]		; CHECK-NEXT: ret <6 x double> [[TMP3]]
;		;
%b.neg = fneg <5 x double> %b		%b.neg = fneg <5 x double> %b
%res = tail call <15 x double> @llvm.matrix.multiply.v15f64.v3f64.v5f64(<3 x double> %a, <5 x double> %b.neg, i32 3, i32 1, i32 5)		%res = tail call <15 x double> @llvm.matrix.multiply.v15f64.v3f64.v5f64(<3 x double> %a, <5 x double> %b.neg, i32 3, i32 1, i32 5)
%res.neg = fneg <15 x double> %res		%res.neg = fneg <15 x double> %res
%res.2 = tail call <6 x double> @llvm.matrix.multiply.v6f64.v15f64.v10f64(<15 x double> %res.neg, <10 x double> %c, i32 3, i32 5, i32 2)		%res.2 = tail call <6 x double> @llvm.matrix.multiply.v6f64.v15f64.v10f64(<15 x double> %res.neg, <10 x double> %c, i32 3, i32 5, i32 2)
ret <6 x double> %res.2		ret <6 x double> %res.2
}		}

; negation should be propagated to the result of the second matrix multiplication		; negation should be propagated to the result of the second matrix multiplication
define <6 x double> @chain_of_matrix_mutliplies_propagation(<15 x double> %a, <20 x double> %b, <8 x double> %c){		define <6 x double> @chain_of_matrix_mutliplies_propagation(<15 x double> %a, <20 x double> %b, <8 x double> %c){
; CHECK-LABEL: @chain_of_matrix_mutliplies_propagation(		; CHECK-LABEL: @chain_of_matrix_mutliplies_propagation(
; CHECK-NEXT: [[A_NEG:%.*]] = fneg <15 x double> [[A:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call <12 x double> @llvm.matrix.multiply.v12f64.v15f64.v20f64(<15 x double> [[A:%.*]], <20 x double> [[B:%.*]], i32 3, i32 5, i32 4)
; CHECK-NEXT: [[RES:%.*]] = tail call <12 x double> @llvm.matrix.multiply.v12f64.v15f64.v20f64(<15 x double> [[A_NEG]], <20 x double> [[B:%.*]], i32 3, i32 5, i32 4)		; CHECK-NEXT: [[TMP2:%.*]] = call <6 x double> @llvm.matrix.multiply.v6f64.v12f64.v8f64(<12 x double> [[TMP1]], <8 x double> [[C:%.*]], i32 3, i32 4, i32 2)
; CHECK-NEXT: [[RES_2:%.*]] = tail call <6 x double> @llvm.matrix.multiply.v6f64.v12f64.v8f64(<12 x double> [[RES]], <8 x double> [[C:%.*]], i32 3, i32 4, i32 2)		; CHECK-NEXT: [[TMP3:%.*]] = fneg <6 x double> [[TMP2]]
; CHECK-NEXT: ret <6 x double> [[RES_2]]		; CHECK-NEXT: ret <6 x double> [[TMP3]]
;		;
%a.neg = fneg <15 x double> %a		%a.neg = fneg <15 x double> %a
%res = tail call <12 x double> @llvm.matrix.multiply.v12f64.v15f64.v20f64(<15 x double> %a.neg, <20 x double> %b, i32 3, i32 5, i32 4)		%res = tail call <12 x double> @llvm.matrix.multiply.v12f64.v15f64.v20f64(<15 x double> %a.neg, <20 x double> %b, i32 3, i32 5, i32 4)
%res.2 = tail call <6 x double> @llvm.matrix.multiply.v6f64.v12f64.v8f64(<12 x double> %res, <8 x double> %c, i32 3, i32 4, i32 2)		%res.2 = tail call <6 x double> @llvm.matrix.multiply.v6f64.v12f64.v8f64(<12 x double> %res, <8 x double> %c, i32 3, i32 4, i32 2)
ret <6 x double> %res.2		ret <6 x double> %res.2
}		}

declare <2 x double> @llvm.matrix.multiply.v2f64.v6f64.v3f64(<6 x double>, <3 x double>, i32 immarg, i32 immarg, i32 immarg) #1		declare <2 x double> @llvm.matrix.multiply.v2f64.v6f64.v3f64(<6 x double>, <3 x double>, i32 immarg, i32 immarg, i32 immarg) #1
[11 lines not shown]