This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Matrix multiplication negation optimisation
ClosedPublic

Authored by zjaffal on Sep 5 2022, 6:16 AM.

Details

Summary

If one of the operands in a matrix multiplication is negated we can optimise the equation by moving the negation to the smallest element of the operands or the result.

Diff Detail

Unit TestsFailed

Event Timeline

zjaffal created this revision.Sep 5 2022, 6:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 5 2022, 6:16 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
zjaffal requested review of this revision.Sep 5 2022, 6:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 5 2022, 6:16 AM
fhahn added inline comments.Sep 6 2022, 5:33 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
73

Why is this needed?

3295

here it should be sufficient to use a more compact mathematical notation: (-A) * B = -(A * B)

3326

can use cast here if you are not checking if FNegType is null. Same for similar uses uses of dyn_cast here

fhahn added a reviewer: scanon.Sep 6 2022, 5:33 AM
fhahn added inline comments.Sep 6 2022, 5:58 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
3337

I think this is not correct, you are updating all uses of FNegOp, but we only have to update the use in the matrix multiply. Can you add a test case where the FNeg also has other users in some different instructions? They should remain unchanged.

Also, it would probably make sense to limit this to fneg instructions with a single use. If there are other uses outside the multiply, we still need to negate the input and we only add an extra fneg.

spatel added inline comments.Sep 6 2022, 6:26 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1893

Add an FP transform near these other FP intrinsics.

3337

Also, we don't typically replace operands - just create a new call. That's what instcombine's worklist iteration is expecting.
Look at existing FP transforms above for examples (and where I think this transform should be placed too).

Are we looking to also support -(A * B) -> -A * B with the negation on the cheapest operand? (might need to check what the required fast math flags are)

zjaffal updated this revision to Diff 458464.Sep 7 2022, 8:29 AM

Move code next to FP Intrinsics

radford added a subscriber: radford.Sep 7 2022, 8:38 AM

Are we looking to also support -(A * B) -> -A * B with the negation on the cheapest operand? (might need to check what the required fast math flags are)

We should. In particular, there three places for the negation to go: -A * B = A * -B = -(A * B) and any one of the three might be smaller.

zjaffal marked 6 inline comments as done.Sep 8 2022, 3:32 AM
spatel added a comment.Sep 8 2022, 5:23 AM

What happens if both args are negated? We need a test for that. Maybe that pattern should be handled before this, so we don't have to deal with the complication in this patch?

Tests should also include fast-math-flags in at least some cases, so we can see if those are propagated as expected.

Are we looking to also support -(A * B) -> -A * B with the negation on the cheapest operand? (might need to check what the required fast math flags are)

We should. In particular, there three places for the negation to go: -A * B = A * -B = -(A * B) and any one of the three might be smaller.

Yes I am not covering this case. I can add a test case for it and then introduce another patch for it

What happens if both args are negated? We need a test for that. Maybe that pattern should be handled before this, so we don't have to deal with the complication in this patch?

Tests should also include fast-math-flags in at least some cases, so we can see if those are propagated as expected.

I will add a test case for that now

zjaffal updated this revision to Diff 459016.Sep 9 2022, 5:15 AM

Support where two operands are negated

spatel added inline comments.Sep 9 2022, 5:27 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1850–1858

I don't think this is correct if an fneg has multiple uses (similar to the bug noted earlier, and I repeat my suggestion to create new instructions rather than modifying existing ones).

Please split this change and tests to its own review ahead of the original transforms in this patch.

zjaffal added inline comments.Sep 9 2022, 8:16 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1850–1858

We don't need this check anyways.
if both operands are negative then there is two cases:

  1. Both of the negations will be moved to the result and then another pass will remove the negations
  2. negation gets moved to an operand and then we have two negations on one operand which will be optimised as well.
zjaffal updated this revision to Diff 459076.Sep 9 2022, 8:17 AM

remove unecessary check for two negations

fhahn added inline comments.Sep 9 2022, 9:20 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1833

optimise -> optimize (US spelling is commonly used in LLVM)

1835

nit: We -> we, optimise -> optimize (US spelling is commonly used in LLVM)

1837

smalest->smallest

1846

no need to dyn_cast here, the result is guaranteed to be a vector type, cast can be used.

1850

no need to capture a value you are not using , m_Value() should just work

Also, you only use the variable MatchOp0 in a single place, so it would be easier to read if you just use if (match(..))

1850–1858

@zjaffal do we have a test case where both operands are negated and at least one of the fnegs has multiple uses?

1884

nit: Period at end of sentence.

1886

There should be no need to cast to Instruction here?

1916

please remove the stray line change.

spatel added inline comments.Sep 9 2022, 9:44 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1850–1858

Correct me if I'm wrong:

  1. When both ops are negated, we don't care what the relative sizes of the values are, we always want to use the non-negated source ops.
  2. When both ops are negated, we don't care if those ops have other uses, we always want to use the non-negated source ops.

Either way, we need tests to exercise those patterns.

define <4 x double> @matrix_multiply_v2f64_v2f64(<2 x double> %a, <2 x double> %b) {
  %a.neg = fneg <2 x double> %a
  %b.neg = fneg <2 x double> %b
  %res = call <4 x double> @llvm.matrix.multiply.v4f64.v2f64.v2f64(<2 x double> %a.neg, <2 x double> %b.neg, i32 2, i32 1, i32 2)
  ret <4 x double> %res
}
zjaffal added inline comments.Sep 9 2022, 11:25 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1850–1858

Correct me if I'm wrong:

  1. When both ops are negated, we don't care what the relative sizes of the values are, we always want to use the non-negated source ops.

I am not sure I understand the question fully but in the case of the two operands the first operand gets handled first and we don't check if it is the larger of both

  1. When both ops are negated, we don't care if those ops have other uses, we always want to use the non-negated source ops.

Currently if fmul has other uses we will still use the negated ops in the multiplication

Either way, we need tests to exercise those patterns.

define <4 x double> @matrix_multiply_v2f64_v2f64(<2 x double> %a, <2 x double> %b) {
  %a.neg = fneg <2 x double> %a
  %b.neg = fneg <2 x double> %b
  %res = call <4 x double> @llvm.matrix.multiply.v4f64.v2f64.v2f64(<2 x double> %a.neg, <2 x double> %b.neg, i32 2, i32 1, i32 2)
  ret <4 x double> %res
}

Yes I think I need to expand on the test cases where we have two operands.

zjaffal updated this revision to Diff 459167.Sep 9 2022, 12:39 PM
  • Move the test to a seperate patch.
  • In the cases where both operands are negated we may need to introduce a seperate patch to handle that case
  • In the cases where both operands are negated we may need to introduce a seperate patch to handle that case

Yes, and it should be written first. Without it, we have inconsistent behavior. This patch will reduce the 2nd example below, but it misses the 1st even though they are very similar patterns:

define <9 x double> @test_with_two_operands_negated2_commute(<3 x double> %a, <27 x double> %b){
  %a.neg = fneg <3 x double> %a
  %b.neg = fneg <27 x double> %b
  %res = call <9 x double> @llvm.matrix.multiply.v9f64.v3f64.v27f64(<3 x double> %a.neg, <27 x double> %b.neg, i32 1, i32 3, i32 9)
  ret <9 x double> %res
}

define <9 x double> @test_with_two_operands_negated2(<27 x double> %a, <3 x double> %b){
  %a.neg = fneg <27 x double> %a
  %b.neg = fneg <3 x double> %b
  %res = tail call <9 x double> @llvm.matrix.multiply.v9f64.v27f64.v3f64(<27 x double> %a.neg, <3 x double> %b.neg, i32 9, i32 3, i32 1)
  ret <9 x double> %res
}

Currently if fmul has other uses we will still use the negated ops in the multiplication

I don't think that's correct - there are no use restrictions on this transform:
https://github.com/llvm/llvm-project/blob/24e1736d84fd0fb45097245706a523c3398beb69/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp#L453

zjaffal updated this revision to Diff 459448.Sep 12 2022, 7:01 AM
zjaffal marked 11 inline comments as done.
  1. Add the optimization for two negations into a separate patch
  2. Remove the check for single use for negation operation
  3. Address style comments and spelling mistakes
zjaffal updated this revision to Diff 459450.Sep 12 2022, 7:05 AM

replace dyn_cast with cast

zjaffal updated this revision to Diff 459689.Sep 13 2022, 2:21 AM

update transforms

zjaffal updated this revision to Diff 459761.Sep 13 2022, 8:26 AM

use patterns

zjaffal updated this revision to Diff 460024.Sep 14 2022, 3:38 AM

Update variable names to patch comments

fhahn added inline comments.Sep 15 2022, 1:51 PM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1832

nit: We -> we, it's -> its.

On second thought, I think it would be better to just move the Case 1,2,3 comments to the code that actually deals with those cases and remove the sentence here.

1833

We -> we.

1851

IIUC this is the case where we move the negation from one to the other operand. Could you move the comment for Case 2 above here?

1856

If I read the code correctly, this may not be the second operand but could also the first one if the second one is negated?

1857

I thought an earlier version created a new call here, rather than updating the exist one. Did we agree that replaceOperand here is the right thing to do?

1861

IIUC this is the case where we move the negation from an argument to the result of the multiply. Could you move the comment from Case 3 here?

1866

there should be no need to cast to Instruction here, you should be able to just use Value.

zjaffal added inline comments.Sep 16 2022, 1:38 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1856

I think it might be better to name them

  1. NegatedOperand
  2. OtherOperand or NonNegatedOperand
1857

We agreed in using replaceOperand since it is used in other areas of the same file.

zjaffal added inline comments.Sep 16 2022, 1:47 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1866

You need it when we call FNegInst->setOperand()

zjaffal updated this revision to Diff 460692.Sep 16 2022, 3:02 AM
zjaffal marked 6 inline comments as done.

Fix comments and variable names

fhahn accepted this revision.Sep 16 2022, 3:07 AM

LGTM, thanks!

This revision is now accepted and ready to land.Sep 16 2022, 3:07 AM
spatel added inline comments.Sep 16 2022, 7:34 AM
llvm/test/Transforms/InstCombine/matrix-multiplication-negation.ll
185

We added an fneg to this sequence without removing the existing one. How is this better?

zjaffal added inline comments.Sep 16 2022, 7:38 AM
llvm/test/Transforms/InstCombine/matrix-multiplication-negation.ll
185

This is a result of removing the check if the negation has a single use.

spatel added inline comments.Sep 16 2022, 7:50 AM
llvm/test/Transforms/InstCombine/matrix-multiplication-negation.ll
185

Right - it made sense when both operands are negated in the other patch, but not when only 1 is negated. The one-use check should be present for this optimization.

zjaffal added inline comments.Sep 16 2022, 7:56 AM
llvm/test/Transforms/InstCombine/matrix-multiplication-negation.ll
185

Perfect, I will add it now

zjaffal updated this revision to Diff 460765.Sep 16 2022, 8:00 AM

Add condition to only optimize if the negated operand has one use

spatel added inline comments.Sep 16 2022, 8:43 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1876

This sequence of create/replace/reset operand seems shaky. This transform is different than the earlier ones because we are creating a new instruction after the existing matmul call.

It would be better to use the standard practice: Builder.CreateIntrinsic() followed by UnaryOperator::CreateFNeg().

This is also dropping all FMF on the fneg. Usually, we'd propagate the FMF from the matmul to the new fneg. We should adjust the tests to show this behavior more explicitly (ie, put some FMF on the fnegs in the tests).

zjaffal updated this revision to Diff 461557.Sep 20 2022, 6:50 AM

Refactor the code for moving negation to the result. Now we preserve the fastmath flags on the created multiplication and negation instructions by passing them from the original multiplication instruction. FNeg is created using Builder instead of UnitaryOperand because
the latter caused build failures.
The changes in the fastflag behaviour can be seen on test_negation_move_to_result_with_fastflags test.

I added another 3 test cases using a subset of the fast flags and the flags on the negation instruction instead of the multiplication.

  1. test_negation_move_to_result_with_nnan_flag
  2. test_negation_move_to_result_with_nsz_flag
  3. test_negation_move_to_result_with_fastflag_on_negation
spatel accepted this revision.Sep 20 2022, 8:48 AM

LGTM

Thanks for the latest update & @spatel for all your comments!

This revision was landed with ongoing or failed builds.Sep 20 2022, 11:51 AM
This revision was automatically updated to reflect the committed changes.