This patch changes a FADD / FMUL => FMA ISel pattern implemented
in D80801 so that it peeks through more than one FMA.
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll
Line 178: Please precommit the new tests and rebase, so the patch shows the codegen diff.
See the comment on D80801: "The fold implemented here is actually a specialization - we should be able to peek through >1 fma to find this pattern. That's another patch if we want to try that enhancement though." That's what you are implementing here.
This changes a FADD / FMUL => FMA ISel pattern implemented
in D80801 so that it peeks through more than one FMA. This also
changes the order of the operands, which can help with eliminating
a final COPY.
I did not update the lit tests for other targets yet (CodeGen/AArch64/fadd-combines.ll, CodeGen/PowerPC/fma-assoc.ll, CodeGen/PowerPC/machine-combiner.ll, CodeGen/X86/fma_patterns.ll) because I wanted to get some opinions on the change in operand order first.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Line 14208: Why use UpdateNodeOperands here instead of just constructing the new node normally?
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Line 14181: Remove "on"?
Lines 14182–14184: Looks like clang-format has mangled this comment!
Lines 14186–14211: Can you explain this? It is not at all obvious. Are you sure it's not just random? Perhaps it helps your case, but could it harm other cases?
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Lines 14186–14211: When keeping the order fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E), the compiler assigns the result of the inner FMA to a new virtual register, which is then used by the outer FMA. Changing the operand order changes the instruction order in the DAG, which essentially frees up the desired output register for the register allocator. I cannot rule out that this harms other cases, but from a fast-math point of view it should not cause any issues. Looking at the (currently failing) test cases, I don't see any actual problems, but please correct me if I'm wrong.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Line 14208: This is updating the innermost FMA node, which already exists; I am just replacing its last operand. Example steps for fadd (fma A, B, (fmul C, D)), E:
1. NewFMA = fma_o (C, D, fma_i (A, B, fmul (C, D)))
2. fma_i = fma (A, B, E) => NewFMA = fma (C, D, fma (A, B, E))
More complex case, fadd (fma_i0 A, B, (fma (C, D, (fmul (E, F))))), G:
1. TmpFMA = fma_i = fma (C, D, (fmul (E, F)))
2. NewFMA = fma (E, F, fma_i0) = fma (E, F, fma (A, B, fma (C, D, (fmul (E, F))))) (construct the outermost FMA)
3. TmpFMA.UpdateNodeOperands: fma_i = fma (C, D, (fmul (E, F))) => fma (C, D, G), so NewFMA = fma (E, F, fma (A, B, fma (C, D, G))) (replace the innermost FMA's operand with the addition operand of the initial FADD)
I can create a new node for the innermost FMA, but isn't UpdateNodeOperands meant for exactly such cases?
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Lines 14186–14211: https://reviews.llvm.org/differential/diff/460688/ adds a couple of variations of PPC test cases. All I have done is swap the operands of the "fadd %F, %G" instruction. If you apply your patch on top of this, you will see that the codegen gets worse for these cases, in exactly the same way that it got better for the existing tests. So I do not agree that changing the order of the FMAs is a good thing in general.
Because of this, I would prefer to keep the existing order of the FMAs. That also makes the combine simpler, because you only have to mutate the fmul into an fma and remove the fadd; you don't have to change the existing FMAs at all.
llvm/test/CodeGen/PowerPC/machine-combiner.ll
Line 1 (On Diff #460670): Please do this as a separate patch and then rebase this patch on it.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Lines 14186–14211: Agreed. Thanks for constructing the tests.
llvm/test/CodeGen/PowerPC/machine-combiner.ll
Line 1 (On Diff #460670): These will be removed again in the next diff.
Once more, change the algorithm to keep the operands in order.
Use node morphing instead of updating the node operands.
I think the patch looks good now, but I am a little confused by the test changes: one of them now uses fmac instead of mac, and the other uses mad instead of fma.
Also, could you add another version of fmac_sequence_innermost_fmul to show that it still works if you swap the operands of the outermost fadd?
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Line 14197: You don't need to test TmpFMA here. Optionally, this loop could be rewritten as:
if (E) {
  do {
    ...
  } while (isFusedOp(TmpFMA));
}
This is not your fault and need not block the patch. The choice of fma vs fmad depends on whether the combiner is running pre- or post-legalization, and it is a bit random because the pre-legalization combiner does not always re-run combines on new nodes that it creates, so some stuff is (wrongly) left for the post-legalizer combiner to clean up.
@deadalnix @RKSimon I debugged this to a case where running the pre-legalization combiner twice in a row would do extra combines that a single pass missed, and D127115 did not help. Is this to be expected? Is it worth debugging more to work out exactly why some combines were missed the first time?
Note that there's a gray area for fast-math-flags with transforms like this: we generally just check FMF on the final value in the sequence to determine if the fold is allowed.
I haven't seen many examples of mixed FMF in practice, but front-ends are becoming more flexible about that via pragma or other decorations, so I added a couple of AArch64 tests to demonstrate current behavior:
ddd27a3d3934
If this patch is reverted for some reason, those tests will need to be updated.
This seems to be a more limited case of your new reassociation, so can we remove this code now?