If one of the operands in a matrix multiplication is negated we can optimise the equation by moving the negation to the smallest element of the operands or the result.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
3331 | I think this is not correct, you are updating all uses of FNegOp, but we only have to update the use in the matrix multiply. Can you add a test case where the FNeg also has other users in some different instructions? They should remain unchanged. Also, it would probably make sense to limit this to fneg instructions with a single use. If there are other uses outside the multiply, we still need to negate the input and we only add an extra fneg. |
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
1889 | Add an FP transform near these other FP intrinsics. | |
3331 | Also, we don't typically replace operands - just create a new call. That's what instcombine's worklist iteration is expecting. |
Are we looking to also support -(A * B) -> -A * B with the negation on the cheapest operand? (might need to check what the required fast math flags are)
We should. In particular, there three places for the negation to go: -A * B = A * -B = -(A * B) and any one of the three might be smaller.
What happens if both args are negated? We need a test for that. Maybe that pattern should be handled before this, so we don't have to deal with the complication in this patch?
Tests should also include fast-math-flags in at least some cases, so we can see if those are propagated as expected.
Yes I am not covering this case. I can add a test case for it and then introduce another patch for it
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
1850–1858 | I don't think this is correct if an fneg has multiple uses (similar to the bug noted earlier, and I repeat my suggestion to create new instructions rather than modifying existing ones). Please split this change and tests to its own review ahead of the original transforms in this patch. |
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
1850–1858 | We don't need this check anyways.
|
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
1833 | optimise -> optimize (US spelling is commonly used in LLVM) | |
1835 | nit: We -> we, optimise -> optimize (US spelling is commonly used in LLVM) | |
1837 | smalest->smallest | |
1846 | no need to dyn_cast here, the result is guaranteed to be a vector type, cast can be used. | |
1850 | no need to capture a value you are not using , m_Value() should just work Also, you only use the variable MatchOp0 in a single place, so it would be easier to read if you just use if (match(..)) | |
1850–1858 | @zjaffal do we have a test case where both operands are negated and at least one of the fnegs has multiple uses? | |
1884 | nit: Period at end of sentence. | |
1886 | There should be no need to cast to Instruction here? | |
1912 | please remove the stray line change. |
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
1850–1858 | Correct me if I'm wrong:
Either way, we need tests to exercise those patterns. define <4 x double> @matrix_multiply_v2f64_v2f64(<2 x double> %a, <2 x double> %b) { %a.neg = fneg <2 x double> %a %b.neg = fneg <2 x double> %b %res = call <4 x double> @llvm.matrix.multiply.v4f64.v2f64.v2f64(<2 x double> %a.neg, <2 x double> %b.neg, i32 2, i32 1, i32 2) ret <4 x double> %res } |
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
1850–1858 |
I am not sure I understand the question fully but in the case of the two operands the first operand gets handled first and we don't check if it is the larger of both
Currently if fmul has other uses we will still use the negated ops in the multiplication
Yes I think I need to expand on the test cases where we have two operands. |
- Move the test to a seperate patch.
- In the cases where both operands are negated we may need to introduce a seperate patch to handle that case
Yes, and it should be written first. Without it, we have inconsistent behavior. This patch will reduce the 2nd example below, but it misses the 1st even though they are very similar patterns:
define <9 x double> @test_with_two_operands_negated2_commute(<3 x double> %a, <27 x double> %b){ %a.neg = fneg <3 x double> %a %b.neg = fneg <27 x double> %b %res = call <9 x double> @llvm.matrix.multiply.v9f64.v3f64.v27f64(<3 x double> %a.neg, <27 x double> %b.neg, i32 1, i32 3, i32 9) ret <9 x double> %res } define <9 x double> @test_with_two_operands_negated2(<27 x double> %a, <3 x double> %b){ %a.neg = fneg <27 x double> %a %b.neg = fneg <3 x double> %b %res = tail call <9 x double> @llvm.matrix.multiply.v9f64.v27f64.v3f64(<27 x double> %a.neg, <3 x double> %b.neg, i32 9, i32 3, i32 1) ret <9 x double> %res }
Currently if fmul has other uses we will still use the negated ops in the multiplication
I don't think that's correct - there are no use restrictions on this transform:
https://github.com/llvm/llvm-project/blob/24e1736d84fd0fb45097245706a523c3398beb69/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp#L453
- Add the optimization for two negations into a separate patch
- Remove the check for single use for negation operation
- Address style comments and spelling mistakes
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
1833 | nit: We -> we, it's -> its. On second thought, I think it would be better to just move the Case 1,2,3 comments to the code that actually deals with those cases and remove the sentence here. | |
1834 | We -> we. | |
1862 | IIUC this is the case where we move the negation from one to the other operand. Could you move the comment for Case 2 above here? | |
1867 | If I read the code correctly, this may not be the second operand but could also the first one if the second one is negated? | |
1868 | I thought an earlier version created a new call here, rather than updating the exist one. Did we agree that replaceOperand here is the right thing to do? | |
1872 | IIUC this is the case where we move the negation from an argument to the result of the multiply. Could you move the comment from Case 3 here? | |
1877 | there should be no need to cast to Instruction here, you should be able to just use Value. |
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
1877 | You need it when we call FNegInst->setOperand() |
llvm/test/Transforms/InstCombine/matrix-multiplication-negation.ll | ||
---|---|---|
259 | We added an fneg to this sequence without removing the existing one. How is this better? |
llvm/test/Transforms/InstCombine/matrix-multiplication-negation.ll | ||
---|---|---|
259 | This is a result of removing the check if the negation has a single use. |
llvm/test/Transforms/InstCombine/matrix-multiplication-negation.ll | ||
---|---|---|
259 | Right - it made sense when both operands are negated in the other patch, but not when only 1 is negated. The one-use check should be present for this optimization. |
llvm/test/Transforms/InstCombine/matrix-multiplication-negation.ll | ||
---|---|---|
259 | Perfect, I will add it now |
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
1887 | This sequence of create/replace/reset operand seems shaky. This transform is different than the earlier ones because we are creating a new instruction after the existing matmul call. It would be better to use the standard practice: Builder.CreateIntrinsic() followed by UnaryOperator::CreateFNeg(). This is also dropping all FMF on the fneg. Usually, we'd propagate the FMF from the matmul to the new fneg. We should adjust the tests to show this behavior more explicitly (ie, put some FMF on the fnegs in the tests). |
Refactor the code for moving negation to the result. Now we preserve the fastmath flags on the created multiplication and negation instructions by passing them from the original multiplication instruction. FNeg is created using Builder instead of UnitaryOperand because
the latter caused build failures.
The changes in the fastflag behaviour can be seen on test_negation_move_to_result_with_fastflags test.
I added another 3 test cases using a subset of the fast flags and the flags on the negation instruction instead of the multiplication.
- test_negation_move_to_result_with_nnan_flag
- test_negation_move_to_result_with_nsz_flag
- test_negation_move_to_result_with_fastflag_on_negation
Why is this needed?