This is an archive of the discontinued LLVM Phabricator instance.


Differential D132837

[ISel] Enable generating more fma instructions.
ClosedPublic

Authored by tsymalla on Aug 29 2022, 2:02 AM.


Details

Reviewers

foad
nhaehnle
spatel
cameron.mcinally
ohsallen
RKSimon
lebedev.ri

Commits

rGc98a46fee6f4: [ISel] Enable generating more fma instructions.

Summary

This patch changes a FADD / FMUL => FMA ISel pattern implemented
in D80801 so that it peeks through more than one FMA.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,040 ms	x64 debian > libFuzzer.libFuzzer::fuzzer-leak.test

Event Timeline

tsymalla created this revision. Aug 29 2022, 2:02 AM

Herald added a project: Restricted Project. Aug 29 2022, 2:02 AM

Herald added subscribers: kosarev, ecnelises, kerbowa and 2 others.

tsymalla requested review of this revision. Aug 29 2022, 2:02 AM

Herald added a project: Restricted Project. Aug 29 2022, 2:02 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B183884: Diff 456289. Aug 29 2022, 3:00 AM

foad added inline comments. Aug 30 2022, 4:12 AM

llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll
178	Please precommit the new tests and rebase, so the patch shows the codegen diff.

foad added reviewers: spatel, cameron.mcinally, ohsallen, RKSimon, lebedev.ri. Aug 30 2022, 4:22 AM

foad added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14155–14156	This seems to be a more limited case of your new reassociation, so can we remove this code now?
14220	I don't understand the logic here. What do copies and extracts have to do with reassociating fmuls and fadds?

See the comment on D80801: "The fold implemented here is actually a specialization - we should be able to peek through >1 fma to find this pattern. That's another patch if we want to try that enhancement though." That's what you are implementing here.

tsymalla mentioned this in rGd26dd37149b4: [NFC][AMDGPU] Pre-commit tests for D132837. Aug 30 2022, 4:55 AM

Rebase.

tsymalla marked an inline comment as done. Aug 30 2022, 5:31 AM

Harbormaster completed remote builds in B184142: Diff 456632. Aug 30 2022, 6:54 AM

tsymalla mentioned this in rG72730c3f0e20: [NFC][AMDGPU] Pre-commit test for D132837. Sep 9 2022, 5:09 AM

This changes a FADD / FMUL => FMA ISel pattern implemented
in D80801 so that it peeks through more than one FMA. This also
changes the order of the operands, which can help with eliminating
a final COPY.

tsymalla edited the summary of this revision. Sep 9 2022, 5:59 AM

Harbormaster completed remote builds in B185820: Diff 459025. Sep 9 2022, 6:54 AM

Did not update the lit tests for other targets yet (CodeGen/AArch64/fadd-combines.ll, CodeGen/PowerPC/fma-assoc.ll, CodeGen/PowerPC/machine-combiner.ll, CodeGen/X86/fma_patterns.ll) because I wanted to get some opinion on the change in the operand order first.

arsenm added a subscriber: arsenm. Sep 9 2022, 7:08 AM

arsenm added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14186	Why use UpdateNodeOperands here instead of just constructing the new node normally?

foad added inline comments. Sep 9 2022, 8:00 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14158	Remove "on"?
14159–14161	Looks like clang-format has mangled this comment!
14164–14212	Can you explain this? It is not at all obvious. Are you sure it's not just random? Perhaps it helps your case, but could harm other cases?

tsymalla marked 2 inline comments as done. Sep 14 2022, 8:19 AM

tsymalla added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14164–14212	When keeping the order fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E), the compiler assigns the result of the inner FMA to a new virtual register, which the outer FMA then uses. The outer FMA writes its output to the same virtual register, which in the simple example (fmac_sequence_simple) causes a COPY to the first available register at the end.

The register allocator then tries to recolor the registers. The output register (here %6) is marked as recolorable and assigned to the first available register. Because the outer FMA uses %0 and %1 (a and b), it can only be assigned to %2, which here is v2. During virtual register rewriting, the compiler tries to eliminate identity copies. Even though the final copy is regarded as a candidate for deletion, it cannot be removed because the COPY is used by the final SI_RETURN. This generates a superfluous V_MOV at the end.

Changing the operand order swaps the multiplicand operands of the innermost and the outermost FMA: the innermost uses %0 and %1 and the outermost uses %2 and %3. The compiler can then assign %6 (the output of the outermost FMA) to $vgpr0, because that register is not used by the outermost FMA (only by the innermost one). The final COPY becomes an identity copy and can be eliminated, which removes the final V_MOV. In effect, changing the instruction order in the DAG frees up the desired output register for the register allocator.

I do not expect this to cause harm in other cases, and from a fast-math point of view it should not cause any issues. Looking at the (currently failing) test cases, I don't see any actual problems, but please correct me if I'm wrong.

tsymalla added inline comments. Sep 16 2022, 12:33 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

14186

This updates the innermost FMA node, which already exists; I am only replacing its last operand.
The newly created node is the FMA node in NewFMA.

Example steps:

fadd (fma A, B, (fmul C, D)), E

1. NewFMA = fma_o (C, D, fma_i (A, B, fmul (C, D)))
2. fma_i = fma (A, B, E)
=> NewFMA = fma (C, D, fma (A, B, E))

More complex case:

fadd (fma_i0 A, B, (fma (C, D, (fmul (E, F))))), G:
1. TmpFMA = fma_i = fma (C, D, (fmul (E, F)))
2. NewFMA = fma (E, F, fma_i0) = fma (E, F, fma (A, B, fma (C, D, (fmul (E, F))))) (construct outermost FMA)
3. TmpFMA.UpdateNodeOperands: fma_i = fma (C, D, (fmul (E, F))) => fma (C, D, G), so NewFMA = fma (E, F, fma (A, B, fma (C, D, (fmul (E, F))))) becomes fma (E, F, fma (A, B, fma (C, D, G))) (replace the last operand of the innermost FMA with the addend G of the initial FADD)

I can create a new node for the innermost FMA, but isn't UpdateNodeOperands meant for such cases?

Addressed comments, added a few comments and updated X86, AArch64, PowerPC
tests.

Herald added subscribers: pengfei, nemanjai. Sep 16 2022, 1:35 AM

foad added inline comments. Sep 16 2022, 2:55 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14164–14212	https://reviews.llvm.org/differential/diff/460688/ adds a couple of variations of PPC test cases. All I have done is swapped the operands of the "fadd %F, %G" instruction. If you apply your patch on top of this you will see that the codegen gets worse for these cases, in exactly the same way that it got better for the existing tests. So I do not agree that changing the order of the fmas is a good thing in general. Because of this, I would prefer to keep the existing order of the fmas. That makes the combine simpler because you only have to mutate the fmul to an fma, and remove the fadd. You don't have to change the existing fmas at all.
llvm/test/CodeGen/PowerPC/machine-combiner.ll
1	Please do this as a separate patch and then rebase this patch on it.

Harbormaster completed remote builds in B187085: Diff 460670. Sep 16 2022, 3:18 AM

tsymalla added inline comments. Sep 16 2022, 8:48 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14164–14212	Agreed. Thanks for constructing the tests. However, in this example, the number of instructions (fma_innermost test) will not decrease due to the addition of the v_mov instead of constructing the return value in-place. I don't think this will negatively affect real-life examples (but I'll check), but for the sake of code quality, we could try getting rid of these moves probably at some other place.
llvm/test/CodeGen/PowerPC/machine-combiner.ll
1	These will be removed again in the next diff.

Once more, change the algorithm to keep the operands in order.
Use node morphing instead of updating the node operands.
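
The approach described in this update (leave the existing FMAs in order, turn the innermost fmul into an fma that absorbs the other fadd operand, and drop the fadd) can be sketched on a toy expression tree. This is only an illustration under stated assumptions: `Node`, `absorbIntoChain`, `foldFAddIntoFMAChain` and the `mk*` helpers are hypothetical names rather than LLVM's SelectionDAG API, and the one-use and fast-math-flag checks of the real combine are omitted.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy expression node; this is NOT LLVM's SDNode.
enum class Op { Leaf, FMul, FAdd, FMA };

struct Node {
  Op op;
  std::string name;              // only meaningful for leaves
  std::shared_ptr<Node> a, b, c; // operands; c is the addend of an FMA
};
using NodePtr = std::shared_ptr<Node>;

NodePtr make(Op op, std::string name, NodePtr a, NodePtr b, NodePtr c) {
  return std::make_shared<Node>(
      Node{op, std::move(name), std::move(a), std::move(b), std::move(c)});
}
NodePtr mkLeaf(std::string n) { return make(Op::Leaf, std::move(n), nullptr, nullptr, nullptr); }
NodePtr mkFMul(NodePtr x, NodePtr y) { return make(Op::FMul, "", std::move(x), std::move(y), nullptr); }
NodePtr mkFAdd(NodePtr x, NodePtr y) { return make(Op::FAdd, "", std::move(x), std::move(y), nullptr); }
NodePtr mkFMA(NodePtr x, NodePtr y, NodePtr z) {
  return make(Op::FMA, "", std::move(x), std::move(y), std::move(z));
}

// Render a tree as text so the result of the fold can be inspected.
std::string show(const NodePtr &n) {
  switch (n->op) {
  case Op::Leaf: return n->name;
  case Op::FMul: return "fmul(" + show(n->a) + ", " + show(n->b) + ")";
  case Op::FAdd: return "fadd(" + show(n->a) + ", " + show(n->b) + ")";
  case Op::FMA:  return "fma(" + show(n->a) + ", " + show(n->b) + ", " + show(n->c) + ")";
  }
  return "";
}

// Follow the addend operands of a chain of FMAs. If the chain ends in an
// fmul(C, D), morph that node in place into fma(C, D, e) and report success.
bool absorbIntoChain(const NodePtr &chain, const NodePtr &e) {
  NodePtr cur = chain;
  while (cur->op == Op::FMA) {
    NodePtr addend = cur->c;
    if (addend->op == Op::FMul) {
      addend->op = Op::FMA; // "node morphing": same node, new opcode
      addend->c = e;        // the fadd's other operand becomes the addend
      return true;
    }
    cur = addend;
  }
  return false;
}

// Try both operand orders of the fadd; return the chain head on success,
// or the unchanged fadd if neither side is an FMA chain ending in an fmul.
NodePtr foldFAddIntoFMAChain(const NodePtr &add) {
  assert(add->op == Op::FAdd);
  if (absorbIntoChain(add->a, add->b)) return add->a;
  if (absorbIntoChain(add->b, add->a)) return add->b;
  return add;
}
```

For `fadd(fma(A, B, fmul(C, D)), E)` this sketch yields `fma(A, B, fma(C, D, E))`; the commuted form and deeper chains go through the same walk, and none of the existing FMAs is rebuilt.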

Harbormaster completed remote builds in B187313: Diff 460981. Sep 17 2022, 4:09 AM

I think the patch looks good now, but I am a little confused by the test changes: one of them now uses fmac instead of mac, and the other uses mad instead of fma.

Also could you add another version of fmac_sequence_innermost_fmul to show that it still works if you swap the operands of the outermost fadd?

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14175	You don't need to test TmpFMA here. Optionally, this loop could be rewritten as: if (E) { do { ... } while (isFusedOp(TmpFMA)); }
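
The loop shape suggested here can be compared with the original on a small model. The sketch below is an assumption-laden illustration, not the DAGCombiner code: the chain is modelled as a `std::vector<Kind>` in which `chain[i + 1]` is the addend of `chain[i]`, the last element is never an FMA, and `haveE` stands in for `E` being set (which, in the patch, only happens together with finding a fused head node). All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: chain[i + 1] is the addend (operand 2) of chain[i].
// A well-formed chain never ends in an FMA.
enum class Kind { FMA, FMul, Other };

// Loop as first written: all conditions are re-tested on every iteration.
// Returns the index of the FMA whose addend is an fmul, or -1 if none.
int findWithWhile(const std::vector<Kind> &chain, bool haveE) {
  std::size_t i = 0;
  while (haveE && i < chain.size() && chain[i] == Kind::FMA) {
    if (chain[i + 1] == Kind::FMul)
      return static_cast<int>(i);
    ++i;
  }
  return -1;
}

// Suggested shape: test E once, then rely on the head already being fused.
int findWithDoWhile(const std::vector<Kind> &chain, bool haveE) {
  if (haveE) {
    // Assumption of this model: E is only set when the head is an FMA.
    assert(!chain.empty() && chain.front() == Kind::FMA);
    std::size_t i = 0;
    do {
      if (chain[i + 1] == Kind::FMul)
        return static_cast<int>(i);
      ++i;
    } while (chain[i] == Kind::FMA);
  }
  return -1;
}
```

Under those assumptions both functions visit the same nodes; the do-while form simply avoids re-checking conditions that cannot change inside the loop.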

Are you ready to include the changes on other targets?

In D132837#3802595, @foad wrote:

I am a little confused by the test changes: one of them now uses fmac instead of mac, and the other uses mad instead of fma.

This is not your fault and need not block the patch. The choice of fma vs fmad depends on whether the combiner is running pre- or post-legalization, and it is a bit random because the pre-legalization combiner does not always re-run combines on new nodes that it creates, so some stuff is (wrongly) left for the post-legalizer combiner to clean up.

tsymalla mentioned this in rG053df6ccafbb: [NFC][AMDGPU] Pre-commit test for D132837. Sep 21 2022, 1:17 AM

tsymalla edited the summary of this revision. Sep 21 2022, 2:00 AM

Remove TmpFMA check, add additional test.

Harbormaster completed remote builds in B187915: Diff 461816. Sep 21 2022, 2:03 AM

LGTM, thanks!

This revision is now accepted and ready to land. Sep 21 2022, 2:07 AM

This revision was landed with ongoing or failed builds. Sep 21 2022, 3:03 AM

Closed by commit rGc98a46fee6f4: [ISel] Enable generating more fma instructions. (authored by tsymalla).

This revision was automatically updated to reflect the committed changes.

tsymalla added a commit: rGc98a46fee6f4: [ISel] Enable generating more fma instructions..

In D132837#3805144, @foad wrote:

LGTM, thanks!

Thanks for reviewing!

In D132837#3802978, @foad wrote:

In D132837#3802595, @foad wrote:

I am a little confused by the test changes: one of them now uses fmac instead of mac, and the other uses mad instead of fma.

This is not your fault and need not block the patch. The choice of fma vs fmad depends on whether the combiner is running pre- or post-legalization, and it is a bit random because the pre-legalization combiner does not always re-run combines on new nodes that it creates, so some stuff is (wrongly) left for the post-legalizer combiner to clean up.

@deadalnix @RKSimon I debugged this to a case where running the pre-legalization combiner twice in a row would do extra combines that a single pass missed, and D127115 did not help. Is this to be expected? Is it worth debugging more to work out exactly why some combines were missed the first time?

spatel mentioned this in rGddd27a3d3934: [AArch64] add tests for fadd -> fma combines; NFC. Sep 21 2022, 6:01 AM

Note that there's a gray area for fast-math-flags with transforms like this: we generally just check FMF on the final value in the sequence to determine if the fold is allowed.
I haven't seen many examples of mixed FMF in practice, but front-ends are becoming more flexible about that via pragma or other decorations, so I added a couple of AArch64 tests to demonstrate current behavior:
ddd27a3d3934

If this patch is reverted for some reason, those tests will need to be updated.
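
As background for why the fold needs reassociation at all, the following minimal sketch uses plain C++ with `std::fma` standing in for the DAG nodes. The function names are hypothetical; `beforeFold` models the sequence in which only the first fadd has been contracted, and `afterFold` models the result of the combine.

```cpp
#include <cassert>
#include <cmath>

// ((A*B) + (C*D)) + E with only the first fadd contracted into an FMA:
// C*D is rounded, then A*B + (C*D) is rounded, then the sum with E is rounded.
double beforeFold(double A, double B, double C, double D, double E) {
  return std::fma(A, B, C * D) + E;
}

// After the combine: fma A, B, (fma C, D, E).
// C*D + E is rounded once, then A*B + (that value) is rounded once.
double afterFold(double A, double B, double C, double D, double E) {
  return std::fma(A, B, std::fma(C, D, E));
}
```

With IEEE-754 doubles and round-to-nearest-even, A = 2^53 and B = C = D = E = 1 give 2^53 for `beforeFold` (each added 1 is lost to rounding) and 2^53 + 2 for `afterFold` (the two 1s are summed first), so the two forms are not interchangeable without permission to reassociate.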

foad mentioned this in D134810: [ISel] Fix DAG divergence after new FMA combine. Sep 28 2022, 7:35 AM

foad mentioned this in rG2c12a04bba76: [ISel] Fix DAG divergence after new FMA combine. Sep 28 2022, 11:44 AM

tsymalla mentioned this in D134856: [AMDGPU] Add use check in v_fma combine. Sep 29 2022, 12:59 AM

tsymalla mentioned this in rGa41dde2c625e: [AMDGPU] Add use check in v_fma combine. Sep 29 2022, 3:25 AM

slydiman mentioned this in D133235: [DAGCombiner] More opportunities to fuse fmul and fadd to fma aggressively. Oct 2 2022, 3:42 PM

foad mentioned this in D135150: [ISel] Fix crash in new FMA DAG combine. Oct 4 2022, 5:10 AM

foad mentioned this in rGaf947d9fcbbd: [ISel] Fix crash in new FMA DAG combine. Oct 4 2022, 7:19 AM

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	76 lines
llvm/test/CodeGen/AArch64/fadd-combines.ll	10 lines
llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll	30 lines
llvm/test/CodeGen/AMDGPU/fadd-fma-fmul-combine.ll	4 lines
llvm/test/CodeGen/PowerPC/fma-assoc.ll	18 lines
llvm/test/CodeGen/PowerPC/machine-combiner.ll	339 lines
llvm/test/CodeGen/X86/fma_patterns.ll	30 lines

Diff 460670

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


Show First 20 Lines • Show All 14,146 Lines • ▼ Show 20 Lines	SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) {
}		}

// fold (fadd x, (fmul y, z)) -> (fma y, z, x)		// fold (fadd x, (fmul y, z)) -> (fma y, z, x)
// Note: Commutes FADD operands.		// Note: Commutes FADD operands.
if (isContractableFMUL(N1) && (Aggressive || N1->hasOneUse())) {		if (isContractableFMUL(N1) && (Aggressive || N1->hasOneUse())) {
return DAG.getNode(PreferredFusedOpcode, SL, VT, N1.getOperand(0),		return DAG.getNode(PreferredFusedOpcode, SL, VT, N1.getOperand(0),
N1.getOperand(1), N0);		N1.getOperand(1), N0);
}		}

// fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)		// fadd (fma A, B, (fmul C, D)), E --> fma C, D, (fma A, B, E)
// fadd E, (fma A, B, (fmul C, D)) --> fma A, B, (fma C, D, E)		// fadd E, (fma A, B, (fmul C, D)) --> fma C, D, (fma A, B, E)
		// This also works with nested fma instructions:
		// fadd (fma A, B, (fma (C, D, (fmul (E, F))))), G -->
		// fma E, F, (fma C, D, fma (A, B, G))
		// fadd (G, (fma A, B, (fma (C, D, (fmul (E, F)))))) -->
		// fma E, F, (fma C, D, fma (A, B, G)).
// This requires reassociation because it changes the order of operations.		// This requires reassociation because it changes the order of operations.

		// Moving the outermost FMA operands to the innermost FMA in the chain can
		// help with eliminating a final copy to an output register. For instance,
		// look at the DAG transformation
		// fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E).
		//
		// If a function wants to return the result of the outermost FMA, a final
		// COPY to the first available register will be inserted, which is then
		// returned.
		// In this case, this is the virtual register being assigned to A (%0). So,
		// this register is in use by the RETURN. During RA, a recolorization attempt
		// takes place. As A (%0) and B (%1) are in use by the outermost FMA, the
		// output register (%6) for the FMA can only be assigned to the first unused
		// register, which is the one for C (%2). During virtual register rewriting,
		// it will attempt to eliminate identity copies. However, as the register for
		// A (%0) is in use by the RETURN, it cannot eliminate the COPY. This means,
		// this will result in a superfluous move instruction. By swapping the
		// operands of the FMA instructions, the output register is freed up (because
		// %6 can be assigned to %0 as the outermost FMA uses only %2 and %3),
		// essentially letting the virtual register rewriter eliminate the final copy.

		if (CanReassociate) {
SDValue FMA, E;		SDValue FMA, E;
if (CanReassociate && isFusedOp(N0) &&		if (isFusedOp(N0) && N0.hasOneUse()) {
N0.getOperand(2).getOpcode() == ISD::FMUL && N0.hasOneUse() &&
N0.getOperand(2).hasOneUse()) {
FMA = N0;		FMA = N0;
E = N1;		E = N1;
} else if (CanReassociate && isFusedOp(N1) &&		} else if (isFusedOp(N1) && N1.hasOneUse()) {
N1.getOperand(2).getOpcode() == ISD::FMUL && N1.hasOneUse() &&
N1.getOperand(2).hasOneUse()) {
FMA = N1;		FMA = N1;
E = N0;		E = N0;
}		}
if (FMA && E) {
SDValue A = FMA.getOperand(0);		SDValue TmpFMA = FMA;
SDValue B = FMA.getOperand(1);		while (E && TmpFMA && isFusedOp(TmpFMA)) {
SDValue C = FMA.getOperand(2).getOperand(0);		SDValue FMul = TmpFMA->getOperand(2);
SDValue D = FMA.getOperand(2).getOperand(1);		if (FMul.getOpcode() == ISD::FMUL && FMul.hasOneUse()) {
SDValue CDE = DAG.getNode(PreferredFusedOpcode, SL, VT, C, D, E);		SDValue A = TmpFMA->getOperand(0);
return DAG.getNode(PreferredFusedOpcode, SL, VT, A, B, CDE);		SDValue B = TmpFMA->getOperand(1);
		SDValue C = FMul.getOperand(0);
		SDValue D = FMul.getOperand(1);

		SDValue NewFMA = DAG.getNode(PreferredFusedOpcode, SL, VT, C, D, FMA);
		DAG.UpdateNodeOperands(TmpFMA.getNode(), A, B, E);

		return NewFMA;
		}

		TmpFMA = TmpFMA->getOperand(2);
		}
}		}

// Look through FP_EXTEND nodes to do more combining.		// Look through FP_EXTEND nodes to do more combining.

// fold (fadd (fpext (fmul x, y)), z) -> (fma (fpext x), (fpext y), z)		// fold (fadd (fpext (fmul x, y)), z) -> (fma (fpext x), (fpext y), z)
if (N0.getOpcode() == ISD::FP_EXTEND) {		if (N0.getOpcode() == ISD::FP_EXTEND) {
SDValue N00 = N0.getOperand(0);		SDValue N00 = N0.getOperand(0);
if (isContractableFMUL(N00) &&		if (isContractableFMUL(N00) &&
TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,		TLI.isFPExtFoldable(DAG, PreferredFusedOpcode, VT,
N00.getValueType())) {		N00.getValueType())) {
return DAG.getNode(PreferredFusedOpcode, SL, VT,		return DAG.getNode(PreferredFusedOpcode, SL, VT,
DAG.getNode(ISD::FP_EXTEND, SL, VT, N00.getOperand(0)),		DAG.getNode(ISD::FP_EXTEND, SL, VT, N00.getOperand(0)),
DAG.getNode(ISD::FP_EXTEND, SL, VT, N00.getOperand(1)),		DAG.getNode(ISD::FP_EXTEND, SL, VT, N00.getOperand(1)),
N1);		N1);
}		}
}		}

▲ Show 20 Lines • Show All 10,967 Lines • Show Last 20 Lines

llvm/test/CodeGen/AArch64/fadd-combines.ll

Show First 20 Lines • Show All 191 Lines • ▼ Show 20 Lines	; CHECK-NEXT: ret
ret <2 x double> %sub		ret <2 x double> %sub
}		}

; ((ab) + (cd)) + n1 --> (ab) + ((cd) + n1)		; ((ab) + (cd)) + n1 --> (ab) + ((cd) + n1)

define double @fadd_fma_fmul_1(double %a, double %b, double %c, double %d, double %n1) nounwind {		define double @fadd_fma_fmul_1(double %a, double %b, double %c, double %d, double %n1) nounwind {
; CHECK-LABEL: fadd_fma_fmul_1:		; CHECK-LABEL: fadd_fma_fmul_1:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: fmadd d2, d2, d3, d4		; CHECK-NEXT: fmadd d0, d0, d1, d4
; CHECK-NEXT: fmadd d0, d0, d1, d2		; CHECK-NEXT: fmadd d0, d2, d3, d0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%m1 = fmul fast double %a, %b		%m1 = fmul fast double %a, %b
%m2 = fmul fast double %c, %d		%m2 = fmul fast double %c, %d
%a1 = fadd fast double %m1, %m2		%a1 = fadd fast double %m1, %m2
%a2 = fadd fast double %a1, %n1		%a2 = fadd fast double %a1, %n1
ret double %a2		ret double %a2
}		}

; Minimum FMF - the 1st fadd is contracted because that combines		; Minimum FMF - the 1st fadd is contracted because that combines
; fmul+fadd as specified by the order of operations; the 2nd fadd		; fmul+fadd as specified by the order of operations; the 2nd fadd
; requires reassociation to fuse with c*d.		; requires reassociation to fuse with c*d.

define float @fadd_fma_fmul_fmf(float %a, float %b, float %c, float %d, float %n0) nounwind {		define float @fadd_fma_fmul_fmf(float %a, float %b, float %c, float %d, float %n0) nounwind {
; CHECK-LABEL: fadd_fma_fmul_fmf:		; CHECK-LABEL: fadd_fma_fmul_fmf:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: fmadd s2, s2, s3, s4		; CHECK-NEXT: fmadd s0, s0, s1, s4
; CHECK-NEXT: fmadd s0, s0, s1, s2		; CHECK-NEXT: fmadd s0, s2, s3, s0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%m1 = fmul contract float %a, %b		%m1 = fmul contract float %a, %b
%m2 = fmul contract float %c, %d		%m2 = fmul contract float %c, %d
%a1 = fadd contract float %m1, %m2		%a1 = fadd contract float %m1, %m2
%a2 = fadd contract reassoc float %n0, %a1		%a2 = fadd contract reassoc float %n0, %a1
ret float %a2		ret float %a2
}		}

Show All 15 Lines

; The final fadd can be folded with either 1 of the leading fmuls.		; The final fadd can be folded with either 1 of the leading fmuls.

define <2 x double> @fadd_fma_fmul_3(<2 x double> %x1, <2 x double> %x2, <2 x double> %x3, <2 x double> %x4, <2 x double> %x5, <2 x double> %x6, <2 x double> %x7, <2 x double> %x8) nounwind {		define <2 x double> @fadd_fma_fmul_3(<2 x double> %x1, <2 x double> %x2, <2 x double> %x3, <2 x double> %x4, <2 x double> %x5, <2 x double> %x6, <2 x double> %x7, <2 x double> %x8) nounwind {
; CHECK-LABEL: fadd_fma_fmul_3:		; CHECK-LABEL: fadd_fma_fmul_3:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: fmul v2.2d, v2.2d, v3.2d		; CHECK-NEXT: fmul v2.2d, v2.2d, v3.2d
; CHECK-NEXT: fmla v2.2d, v1.2d, v0.2d		; CHECK-NEXT: fmla v2.2d, v1.2d, v0.2d
; CHECK-NEXT: fmla v2.2d, v7.2d, v6.2d
; CHECK-NEXT: fmla v2.2d, v5.2d, v4.2d		; CHECK-NEXT: fmla v2.2d, v5.2d, v4.2d
		; CHECK-NEXT: fmla v2.2d, v7.2d, v6.2d
; CHECK-NEXT: mov v0.16b, v2.16b		; CHECK-NEXT: mov v0.16b, v2.16b
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%m1 = fmul fast <2 x double> %x1, %x2		%m1 = fmul fast <2 x double> %x1, %x2
%m2 = fmul fast <2 x double> %x3, %x4		%m2 = fmul fast <2 x double> %x3, %x4
%m3 = fmul fast <2 x double> %x5, %x6		%m3 = fmul fast <2 x double> %x5, %x6
%m4 = fmul fast <2 x double> %x7, %x8		%m4 = fmul fast <2 x double> %x7, %x8
%a1 = fadd fast <2 x double> %m1, %m2		%a1 = fadd fast <2 x double> %m1, %m2
%a2 = fadd fast <2 x double> %m3, %m4		%a2 = fadd fast <2 x double> %m3, %m4
▲ Show 20 Lines • Show All 60 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll

Show All 28 Lines
; GCN-NEXT: s_buffer_load_dwordx4 s[0:3], s[0:3], 0x40		; GCN-NEXT: s_buffer_load_dwordx4 s[0:3], s[0:3], 0x40
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: s_clause 0x1		; GCN-NEXT: s_clause 0x1
; GCN-NEXT: s_buffer_load_dwordx4 s[4:7], s[0:3], 0x50		; GCN-NEXT: s_buffer_load_dwordx4 s[4:7], s[0:3], 0x50
; GCN-NEXT: s_nop 0		; GCN-NEXT: s_nop 0
; GCN-NEXT: s_buffer_load_dword s0, s[0:3], 0x2c		; GCN-NEXT: s_buffer_load_dword s0, s[0:3], 0x2c
; GCN-NEXT: v_sub_f32_e64 v5, s24, s28		; GCN-NEXT: v_sub_f32_e64 v5, s24, s28
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: s_clause 0x4		; GCN-NEXT: s_clause 0x3
; GCN-NEXT: s_buffer_load_dwordx4 s[8:11], s[0:3], 0x60		; GCN-NEXT: s_buffer_load_dwordx4 s[8:11], s[0:3], 0x60
; GCN-NEXT: s_buffer_load_dwordx4 s[12:15], s[0:3], 0x20		; GCN-NEXT: s_buffer_load_dwordx4 s[12:15], s[0:3], 0x20
; GCN-NEXT: s_buffer_load_dwordx4 s[16:19], s[0:3], 0x0		; GCN-NEXT: s_buffer_load_dwordx4 s[16:19], s[0:3], 0x0
; GCN-NEXT: s_buffer_load_dwordx4 s[20:23], s[0:3], 0x70		; GCN-NEXT: s_buffer_load_dwordx4 s[20:23], s[0:3], 0x70
		; GCN-NEXT: v_max_f32_e64 v6, s0, s0 clamp
; GCN-NEXT: s_buffer_load_dwordx4 s[24:27], s[0:3], 0x10		; GCN-NEXT: s_buffer_load_dwordx4 s[24:27], s[0:3], 0x10
; GCN-NEXT: v_fma_f32 v1, v1, v5, s28		; GCN-NEXT: v_fma_f32 v1, v1, v5, s28
; GCN-NEXT: v_max_f32_e64 v6, s0, s0 clamp
; GCN-NEXT: v_add_f32_e64 v5, s29, -1.0		; GCN-NEXT: v_add_f32_e64 v5, s29, -1.0
; GCN-NEXT: v_sub_f32_e32 v8, s0, v1
; GCN-NEXT: v_fma_f32 v7, -s2, v6, s6		; GCN-NEXT: v_fma_f32 v7, -s2, v6, s6
		; GCN-NEXT: v_sub_f32_e32 v8, s0, v1
; GCN-NEXT: v_fma_f32 v5, v6, v5, 1.0		; GCN-NEXT: v_fma_f32 v5, v6, v5, 1.0
; GCN-NEXT: v_mad_f32 v10, s2, v6, v2
; GCN-NEXT: s_mov_b32 s0, 0x3c23d70a		; GCN-NEXT: s_mov_b32 s0, 0x3c23d70a
		; GCN-NEXT: v_fma_f32 v7, v7, v6, v2
; GCN-NEXT: v_fmac_f32_e32 v1, v6, v8		; GCN-NEXT: v_fmac_f32_e32 v1, v6, v8
; GCN-NEXT: v_mac_f32_e32 v10, v7, v6		; GCN-NEXT: v_mac_f32_e32 v7, s2, v6
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: v_mul_f32_e32 v9, s10, v0		; GCN-NEXT: v_mul_f32_e32 v9, s10, v0
; GCN-NEXT: v_fma_f32 v0, -v0, s10, s14		; GCN-NEXT: v_fma_f32 v0, -v0, s10, s14
; GCN-NEXT: v_mul_f32_e32 v8, s18, v2		; GCN-NEXT: v_mul_f32_e32 v8, s18, v2
; GCN-NEXT: v_mul_f32_e32 v3, s22, v3		; GCN-NEXT: v_mul_f32_e32 v3, s22, v3
; GCN-NEXT: v_fmac_f32_e32 v9, v0, v6		; GCN-NEXT: v_fmac_f32_e32 v9, v0, v6
; GCN-NEXT: v_sub_f32_e32 v0, v1, v5		; GCN-NEXT: v_sub_f32_e32 v0, v1, v5
; GCN-NEXT: v_mul_f32_e32 v1, v8, v6		; GCN-NEXT: v_mul_f32_e32 v1, v8, v6
; GCN-NEXT: v_mul_f32_e32 v7, v6, v3		; GCN-NEXT: v_mul_f32_e32 v8, v6, v3
; GCN-NEXT: v_fma_f32 v3, -v6, v3, v9		; GCN-NEXT: v_fma_f32 v3, -v6, v3, v9
; GCN-NEXT: v_fmac_f32_e32 v5, v0, v6		; GCN-NEXT: v_fmac_f32_e32 v5, v0, v6
; GCN-NEXT: v_fma_f32 v0, v2, s26, -v1		; GCN-NEXT: v_fma_f32 v0, v2, s26, -v1
; GCN-NEXT: v_fmac_f32_e32 v7, v3, v6		; GCN-NEXT: v_fmac_f32_e32 v8, v3, v6
; GCN-NEXT: v_fmac_f32_e32 v1, v0, v6		; GCN-NEXT: v_fmac_f32_e32 v1, v0, v6
; GCN-NEXT: v_mul_f32_e32 v0, v2, v6		; GCN-NEXT: v_mul_f32_e32 v0, v2, v6
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_add_f32_e32 v4, v4, v10		; GCN-NEXT: v_add_f32_e32 v4, v4, v7
; GCN-NEXT: v_mul_f32_e32 v3, v4, v6		; GCN-NEXT: v_mul_f32_e32 v3, v4, v6
; GCN-NEXT: v_fmaak_f32 v4, s0, v5, 0x3ca3d70a		; GCN-NEXT: v_fmaak_f32 v4, s0, v5, 0x3ca3d70a
; GCN-NEXT: v_mul_f32_e32 v1, v3, v1		; GCN-NEXT: v_mul_f32_e32 v1, v3, v1
; GCN-NEXT: v_mul_f32_e32 v2, v7, v4		; GCN-NEXT: v_mul_f32_e32 v2, v8, v4
; GCN-NEXT: v_fmac_f32_e32 v1, v2, v0		; GCN-NEXT: v_fmac_f32_e32 v1, v2, v0
; GCN-NEXT: v_max_f32_e32 v0, 0, v1		; GCN-NEXT: v_max_f32_e32 v0, 0, v1
; GCN-NEXT: ; return to shader part epilog		; GCN-NEXT: ; return to shader part epilog
.entry:		.entry:
%0 = call <3 x float> @llvm.amdgcn.image.sample.2d.v3f32.f32(i32 7, float undef, float undef, <8 x i32> undef, <4 x i32> undef, i1 false, i32 0, i32 0)		%0 = call <3 x float> @llvm.amdgcn.image.sample.2d.v3f32.f32(i32 7, float undef, float undef, <8 x i32> undef, <4 x i32> undef, i1 false, i32 0, i32 0)
%.i2243 = extractelement <3 x float> %0, i32 2		%.i2243 = extractelement <3 x float> %0, i32 2
%1 = call <3 x i32> @llvm.amdgcn.s.buffer.load.v3i32(<4 x i32> undef, i32 0, i32 0)		%1 = call <3 x i32> @llvm.amdgcn.s.buffer.load.v3i32(<4 x i32> undef, i32 0, i32 0)
%2 = shufflevector <3 x i32> %1, <3 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 undef>		%2 = shufflevector <3 x i32> %1, <3 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 undef>
[... 88 lines hidden ...]
%.i2539 = fmul reassoc nnan nsz arcp contract afn float %.i2536, %4		%.i2539 = fmul reassoc nnan nsz arcp contract afn float %.i2536, %4
%.i2542 = fadd reassoc nnan nsz arcp contract afn float %.i2488, %.i2539		%.i2542 = fadd reassoc nnan nsz arcp contract afn float %.i2488, %.i2539
%.i2545 = fmul reassoc nnan nsz arcp contract afn float %.i2525, %.i2542		%.i2545 = fmul reassoc nnan nsz arcp contract afn float %.i2525, %.i2542
%.i2548 = fadd reassoc nnan nsz arcp contract afn float %.i2469, %.i2545		%.i2548 = fadd reassoc nnan nsz arcp contract afn float %.i2469, %.i2545
%.i2551 = call reassoc nnan nsz arcp contract afn float @llvm.maxnum.f32(float %.i2548, float 0.000000e+00)		%.i2551 = call reassoc nnan nsz arcp contract afn float @llvm.maxnum.f32(float %.i2548, float 0.000000e+00)
ret float %.i2551		ret float %.i2551
}		}

define float @fmac_sequence_simple(float %a, float %b, float %c, float %d, float %e) #0 {		define float @fmac_sequence_simple(float %a, float %b, float %c, float %d, float %e) #0 {
		foad: Please precommit the new tests and rebase, so the patch shows the codegen diff.
; GCN-LABEL: fmac_sequence_simple:		; GCN-LABEL: fmac_sequence_simple:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_waitcnt_vscnt null, 0x0		; GCN-NEXT: s_waitcnt_vscnt null, 0x0
; GCN-NEXT: v_fma_f32 v2, v2, v3, v4		; GCN-NEXT: v_fma_f32 v0, v0, v1, v4
; GCN-NEXT: v_fmac_f32_e32 v2, v0, v1		; GCN-NEXT: v_fmac_f32_e32 v0, v2, v3
; GCN-NEXT: v_mov_b32_e32 v0, v2
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
%t0 = fmul fast float %a, %b		%t0 = fmul fast float %a, %b
%t1 = fmul fast float %c, %d		%t1 = fmul fast float %c, %d
%t2 = fadd fast float %t0, %t1		%t2 = fadd fast float %t0, %t1
%t5 = fadd fast float %t2, %e		%t5 = fadd fast float %t2, %e
ret float %t5		ret float %t5
}		}

define float @fmac_sequence_innermost_fmul(float %a, float %b, float %c, float %d, float %e, float %f, float %g) #0 {		define float @fmac_sequence_innermost_fmul(float %a, float %b, float %c, float %d, float %e, float %f, float %g) #0 {
; GCN-LABEL: fmac_sequence_innermost_fmul:		; GCN-LABEL: fmac_sequence_innermost_fmul:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_waitcnt_vscnt null, 0x0		; GCN-NEXT: s_waitcnt_vscnt null, 0x0
; GCN-NEXT: v_mul_f32_e32 v2, v2, v3		; GCN-NEXT: v_fma_f32 v0, v0, v1, v6
; GCN-NEXT: v_fmac_f32_e32 v2, v0, v1		; GCN-NEXT: v_fmac_f32_e32 v0, v4, v5
; GCN-NEXT: v_fmac_f32_e32 v2, v4, v5		; GCN-NEXT: v_mac_f32_e32 v0, v2, v3
; GCN-NEXT: v_add_f32_e32 v0, v2, v6
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
%t0 = fmul fast float %a, %b		%t0 = fmul fast float %a, %b
%t1 = fmul fast float %c, %d		%t1 = fmul fast float %c, %d
%t2 = fadd fast float %t0, %t1		%t2 = fadd fast float %t0, %t1
%t3 = fmul fast float %e, %f		%t3 = fmul fast float %e, %f
%t4 = fadd fast float %t2, %t3		%t4 = fadd fast float %t2, %t3
%t5 = fadd fast float %t4, %g		%t5 = fadd fast float %t4, %g
ret float %t5		ret float %t5
[... 30 lines hidden ...]
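The check-line change in `fmac_sequence_simple` above swaps which product is fused first: the old output computes `c*d + e` and then accumulates `a*b`, while the new output computes `a*b + e` and then accumulates `c*d`. The following is a minimal C++ sketch of those two orders, not code from the patch. The function names `fma_chain_before` and `fma_chain_after` are hypothetical and only mirror the operand order seen in the check lines. Both forms evaluate `a*b + c*d + e`; the reordering is only permitted because the IR carries fast-math flags, and for arbitrary inputs the two orders may round differently.

```cpp
#include <cmath>

// Hypothetical helper mirroring the old check lines:
//   v_fma_f32  v2, v2, v3, v4   ; acc = c*d + e
//   v_fmac_f32 v2, v0, v1       ; acc += a*b
float fma_chain_before(float a, float b, float c, float d, float e) {
  float acc = std::fma(c, d, e);
  return std::fma(a, b, acc);
}

// Hypothetical helper mirroring the new check lines:
//   v_fma_f32  v0, v0, v1, v4   ; acc = a*b + e
//   v_fmac_f32 v0, v2, v3       ; acc += c*d
float fma_chain_after(float a, float b, float c, float d, float e) {
  float acc = std::fma(a, b, e);
  return std::fma(c, d, acc);
}
```

In the new order the accumulator already lives in `v0`, the return register, which is consistent with the `v_mov_b32_e32 v0, v2` disappearing from the right-hand side of the diff.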

llvm/test/CodeGen/AMDGPU/fadd-fma-fmul-combine.ll

	[... 18 lines hidden ...]
	; GCN-FLUSH-NEXT: buffer_load_dword v1, off, s[0:3], 0 glc			; GCN-FLUSH-NEXT: buffer_load_dword v1, off, s[0:3], 0 glc
	; GCN-FLUSH-NEXT: s_waitcnt vmcnt(0)			; GCN-FLUSH-NEXT: s_waitcnt vmcnt(0)
	; GCN-FLUSH-NEXT: buffer_load_dword v2, off, s[0:3], 0 glc			; GCN-FLUSH-NEXT: buffer_load_dword v2, off, s[0:3], 0 glc
	; GCN-FLUSH-NEXT: s_waitcnt vmcnt(0)			; GCN-FLUSH-NEXT: s_waitcnt vmcnt(0)
	; GCN-FLUSH-NEXT: buffer_load_dword v3, off, s[0:3], 0 glc			; GCN-FLUSH-NEXT: buffer_load_dword v3, off, s[0:3], 0 glc
	; GCN-FLUSH-NEXT: s_waitcnt vmcnt(0)			; GCN-FLUSH-NEXT: s_waitcnt vmcnt(0)
	; GCN-FLUSH-NEXT: buffer_load_dword v4, off, s[0:3], 0 glc			; GCN-FLUSH-NEXT: buffer_load_dword v4, off, s[0:3], 0 glc
	; GCN-FLUSH-NEXT: s_waitcnt vmcnt(0)			; GCN-FLUSH-NEXT: s_waitcnt vmcnt(0)
	; GCN-FLUSH-NEXT: v_mac_f32_e32 v2, v3, v4
	; GCN-FLUSH-NEXT: v_mac_f32_e32 v2, v0, v1			; GCN-FLUSH-NEXT: v_mac_f32_e32 v2, v0, v1
				; GCN-FLUSH-NEXT: v_mac_f32_e32 v2, v3, v4
	; GCN-FLUSH-NEXT: buffer_store_dword v2, off, s[0:3], 0			; GCN-FLUSH-NEXT: buffer_store_dword v2, off, s[0:3], 0
	; GCN-FLUSH-NEXT: s_waitcnt vmcnt(0)			; GCN-FLUSH-NEXT: s_waitcnt vmcnt(0)
	; GCN-FLUSH-NEXT: s_endpgm			; GCN-FLUSH-NEXT: s_endpgm
	;			;
	; GCN-FASTFMA-LABEL: fast_add_fmuladd_fmul:			; GCN-FASTFMA-LABEL: fast_add_fmuladd_fmul:
	; GCN-FASTFMA: ; %bb.0:			; GCN-FASTFMA: ; %bb.0:
	; GCN-FASTFMA-NEXT: s_mov_b32 s3, 0xf000			; GCN-FASTFMA-NEXT: s_mov_b32 s3, 0xf000
	; GCN-FASTFMA-NEXT: s_mov_b32 s2, -1			; GCN-FASTFMA-NEXT: s_mov_b32 s2, -1
	; GCN-FASTFMA-NEXT: buffer_load_dword v0, off, s[0:3], 0 glc			; GCN-FASTFMA-NEXT: buffer_load_dword v0, off, s[0:3], 0 glc
	; GCN-FASTFMA-NEXT: s_waitcnt vmcnt(0)			; GCN-FASTFMA-NEXT: s_waitcnt vmcnt(0)
	; GCN-FASTFMA-NEXT: buffer_load_dword v1, off, s[0:3], 0 glc			; GCN-FASTFMA-NEXT: buffer_load_dword v1, off, s[0:3], 0 glc
	; GCN-FASTFMA-NEXT: s_waitcnt vmcnt(0)			; GCN-FASTFMA-NEXT: s_waitcnt vmcnt(0)
	; GCN-FASTFMA-NEXT: buffer_load_dword v2, off, s[0:3], 0 glc			; GCN-FASTFMA-NEXT: buffer_load_dword v2, off, s[0:3], 0 glc
	; GCN-FASTFMA-NEXT: s_waitcnt vmcnt(0)			; GCN-FASTFMA-NEXT: s_waitcnt vmcnt(0)
	; GCN-FASTFMA-NEXT: buffer_load_dword v3, off, s[0:3], 0 glc			; GCN-FASTFMA-NEXT: buffer_load_dword v3, off, s[0:3], 0 glc
	; GCN-FASTFMA-NEXT: s_waitcnt vmcnt(0)			; GCN-FASTFMA-NEXT: s_waitcnt vmcnt(0)
	; GCN-FASTFMA-NEXT: buffer_load_dword v4, off, s[0:3], 0 glc			; GCN-FASTFMA-NEXT: buffer_load_dword v4, off, s[0:3], 0 glc
	; GCN-FASTFMA-NEXT: s_waitcnt vmcnt(0)			; GCN-FASTFMA-NEXT: s_waitcnt vmcnt(0)
	; GCN-FASTFMA-NEXT: v_fma_f32 v2, v3, v4, v2
	; GCN-FASTFMA-NEXT: v_fma_f32 v0, v0, v1, v2			; GCN-FASTFMA-NEXT: v_fma_f32 v0, v0, v1, v2
				; GCN-FASTFMA-NEXT: v_fma_f32 v0, v3, v4, v0
	; GCN-FASTFMA-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GCN-FASTFMA-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GCN-FASTFMA-NEXT: s_waitcnt vmcnt(0)			; GCN-FASTFMA-NEXT: s_waitcnt vmcnt(0)
	; GCN-FASTFMA-NEXT: s_endpgm			; GCN-FASTFMA-NEXT: s_endpgm
	;			;
	; GCN-SLOWFMA-LABEL: fast_add_fmuladd_fmul:			; GCN-SLOWFMA-LABEL: fast_add_fmuladd_fmul:
	; GCN-SLOWFMA: ; %bb.0:			; GCN-SLOWFMA: ; %bb.0:
	; GCN-SLOWFMA-NEXT: s_mov_b32 s3, 0xf000			; GCN-SLOWFMA-NEXT: s_mov_b32 s3, 0xf000
	; GCN-SLOWFMA-NEXT: s_mov_b32 s2, -1			; GCN-SLOWFMA-NEXT: s_mov_b32 s2, -1
	[... 871 lines hidden ...]

llvm/test/CodeGen/PowerPC/fma-assoc.ll

[... 451 lines hidden ...]
%I = fpext float %H to double ; <double> [#uses=1]		%I = fpext float %H to double ; <double> [#uses=1]
%J = fsub double %E, %I ; <double> [#uses=1]		%J = fsub double %E, %I ; <double> [#uses=1]
ret double %J		ret double %J
}		}

define double @test_reassoc_FMADD_ASSOC1(double %A, double %B, double %C,		define double @test_reassoc_FMADD_ASSOC1(double %A, double %B, double %C,
; CHECK-LABEL: test_reassoc_FMADD_ASSOC1:		; CHECK-LABEL: test_reassoc_FMADD_ASSOC1:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: fmadd 0, 3, 4, 5		; CHECK-NEXT: fmadd 0, 1, 2, 5
; CHECK-NEXT: fmadd 1, 1, 2, 0		; CHECK-NEXT: fmadd 1, 3, 4, 0
; CHECK-NEXT: blr		; CHECK-NEXT: blr
;		;
; CHECK-VSX-LABEL: test_reassoc_FMADD_ASSOC1:		; CHECK-VSX-LABEL: test_reassoc_FMADD_ASSOC1:
; CHECK-VSX: # %bb.0:		; CHECK-VSX: # %bb.0:
; CHECK-VSX-NEXT: xsmaddmdp 3, 4, 5		; CHECK-VSX-NEXT: xsmaddmdp 1, 2, 5
; CHECK-VSX-NEXT: xsmaddadp 3, 1, 2		; CHECK-VSX-NEXT: xsmaddadp 1, 3, 4
; CHECK-VSX-NEXT: fmr 1, 3
; CHECK-VSX-NEXT: blr		; CHECK-VSX-NEXT: blr
;		;
; CHECK-SPE-LABEL: test_reassoc_FMADD_ASSOC1:		; CHECK-SPE-LABEL: test_reassoc_FMADD_ASSOC1:
; CHECK-SPE: # %bb.0:		; CHECK-SPE: # %bb.0:
; CHECK-SPE-NEXT: evmergelo 9, 9, 10		; CHECK-SPE-NEXT: evmergelo 9, 9, 10
; CHECK-SPE-NEXT: evmergelo 7, 7, 8		; CHECK-SPE-NEXT: evmergelo 7, 7, 8
; CHECK-SPE-NEXT: evmergelo 5, 5, 6		; CHECK-SPE-NEXT: evmergelo 5, 5, 6
; CHECK-SPE-NEXT: evmergelo 3, 3, 4		; CHECK-SPE-NEXT: evmergelo 3, 3, 4
[... 12 lines hidden ...]
%H = fadd reassoc double %F, %G ; <double> [#uses=1]		%H = fadd reassoc double %F, %G ; <double> [#uses=1]
%I = fadd reassoc double %H, %E ; <double> [#uses=1]		%I = fadd reassoc double %H, %E ; <double> [#uses=1]
ret double %I		ret double %I
}		}

define double @test_reassoc_FMADD_ASSOC2(double %A, double %B, double %C,		define double @test_reassoc_FMADD_ASSOC2(double %A, double %B, double %C,
; CHECK-LABEL: test_reassoc_FMADD_ASSOC2:		; CHECK-LABEL: test_reassoc_FMADD_ASSOC2:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: fmadd 0, 3, 4, 5		; CHECK-NEXT: fmadd 0, 1, 2, 5
; CHECK-NEXT: fmadd 1, 1, 2, 0		; CHECK-NEXT: fmadd 1, 3, 4, 0
; CHECK-NEXT: blr		; CHECK-NEXT: blr
;		;
; CHECK-VSX-LABEL: test_reassoc_FMADD_ASSOC2:		; CHECK-VSX-LABEL: test_reassoc_FMADD_ASSOC2:
; CHECK-VSX: # %bb.0:		; CHECK-VSX: # %bb.0:
; CHECK-VSX-NEXT: xsmaddmdp 3, 4, 5		; CHECK-VSX-NEXT: xsmaddmdp 1, 2, 5
; CHECK-VSX-NEXT: xsmaddadp 3, 1, 2		; CHECK-VSX-NEXT: xsmaddadp 1, 3, 4
; CHECK-VSX-NEXT: fmr 1, 3
; CHECK-VSX-NEXT: blr		; CHECK-VSX-NEXT: blr
;		;
; CHECK-SPE-LABEL: test_reassoc_FMADD_ASSOC2:		; CHECK-SPE-LABEL: test_reassoc_FMADD_ASSOC2:
; CHECK-SPE: # %bb.0:		; CHECK-SPE: # %bb.0:
; CHECK-SPE-NEXT: evmergelo 9, 9, 10		; CHECK-SPE-NEXT: evmergelo 9, 9, 10
; CHECK-SPE-NEXT: evmergelo 7, 7, 8		; CHECK-SPE-NEXT: evmergelo 7, 7, 8
; CHECK-SPE-NEXT: evmergelo 5, 5, 6		; CHECK-SPE-NEXT: evmergelo 5, 5, 6
; CHECK-SPE-NEXT: evmergelo 3, 3, 4		; CHECK-SPE-NEXT: evmergelo 3, 3, 4
[... 534 lines hidden ...]

llvm/test/CodeGen/PowerPC/machine-combiner.ll

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				foad: Please do this as a separate patch and then rebase this patch on it.
				tsymalla: These will be removed again in the next diff.
	; RUN: llc -verify-machineinstrs -O3 -mcpu=pwr7 < %s \| FileCheck %s -check-prefix=CHECK -check-prefix=CHECK-PWR			; RUN: llc -verify-machineinstrs -O3 -mcpu=pwr7 < %s \| FileCheck %s -check-prefix=CHECK -check-prefix=CHECK-PWR
	; RUN: llc -verify-machineinstrs -O3 -mcpu=pwr9 < %s \| FileCheck %s -check-prefix=FIXPOINT			; RUN: llc -verify-machineinstrs -O3 -mcpu=pwr9 < %s \| FileCheck %s -check-prefix=FIXPOINT
	target datalayout = "E-m:e-i64:64-n32:64"			target datalayout = "E-m:e-i64:64-n32:64"
	target triple = "powerpc64-unknown-linux-gnu"			target triple = "powerpc64-unknown-linux-gnu"

	; Verify that the first two adds are independent regardless of how the inputs are			; Verify that the first two adds are independent regardless of how the inputs are
	; commuted. The destination registers are used as source registers for the third add.			; commuted. The destination registers are used as source registers for the third add.

	define float @reassociate_adds1(float %x0, float %x1, float %x2, float %x3) {			define float @reassociate_adds1(float %x0, float %x1, float %x2, float %x3) {
	; CHECK-LABEL: reassociate_adds1:			; CHECK-LABEL: reassociate_adds1:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK: fadds [[REG0:[0-9]+]], 1, 2			; CHECK-NEXT: fadds 0, 1, 2
	; CHECK: fadds [[REG1:[0-9]+]], 3, 4			; CHECK-NEXT: fadds 1, 3, 4
	; CHECK: fadds 1, [[REG0]], [[REG1]]			; CHECK-NEXT: fadds 1, 0, 1
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				;
				; FIXPOINT-LABEL: reassociate_adds1:
				; FIXPOINT: # %bb.0:
				; FIXPOINT-NEXT: xsaddsp 0, 1, 2
				; FIXPOINT-NEXT: xsaddsp 1, 3, 4
				; FIXPOINT-NEXT: xsaddsp 1, 0, 1
				; FIXPOINT-NEXT: blr

	%t0 = fadd reassoc nsz float %x0, %x1			%t0 = fadd reassoc nsz float %x0, %x1
	%t1 = fadd reassoc nsz float %t0, %x2			%t1 = fadd reassoc nsz float %t0, %x2
	%t2 = fadd reassoc nsz float %t1, %x3			%t2 = fadd reassoc nsz float %t1, %x3
	ret float %t2			ret float %t2
	}			}

	define float @reassociate_adds2(float %x0, float %x1, float %x2, float %x3) {			define float @reassociate_adds2(float %x0, float %x1, float %x2, float %x3) {
	; CHECK-LABEL: reassociate_adds2:			; CHECK-LABEL: reassociate_adds2:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK: fadds [[REG0:[0-9]+]], 1, 2			; CHECK-NEXT: fadds 0, 1, 2
	; CHECK: fadds [[REG1:[0-9]+]], 3, 4			; CHECK-NEXT: fadds 1, 3, 4
	; CHECK: fadds 1, [[REG0]], [[REG1]]			; CHECK-NEXT: fadds 1, 0, 1
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				;
				; FIXPOINT-LABEL: reassociate_adds2:
				; FIXPOINT: # %bb.0:
				; FIXPOINT-NEXT: xsaddsp 0, 1, 2
				; FIXPOINT-NEXT: xsaddsp 1, 3, 4
				; FIXPOINT-NEXT: xsaddsp 1, 0, 1
				; FIXPOINT-NEXT: blr

	%t0 = fadd reassoc nsz float %x0, %x1			%t0 = fadd reassoc nsz float %x0, %x1
	%t1 = fadd reassoc nsz float %x2, %t0			%t1 = fadd reassoc nsz float %x2, %t0
	%t2 = fadd reassoc nsz float %t1, %x3			%t2 = fadd reassoc nsz float %t1, %x3
	ret float %t2			ret float %t2
	}			}

	define float @reassociate_adds3(float %x0, float %x1, float %x2, float %x3) {			define float @reassociate_adds3(float %x0, float %x1, float %x2, float %x3) {
	; CHECK-LABEL: reassociate_adds3:			; CHECK-LABEL: reassociate_adds3:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK: fadds [[REG0:[0-9]+]], 1, 2			; CHECK-NEXT: fadds 0, 1, 2
	; CHECK: fadds [[REG1:[0-9]+]], 3, 4			; CHECK-NEXT: fadds 1, 3, 4
	; CHECK: fadds 1, [[REG0]], [[REG1]]			; CHECK-NEXT: fadds 1, 0, 1
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				;
				; FIXPOINT-LABEL: reassociate_adds3:
				; FIXPOINT: # %bb.0:
				; FIXPOINT-NEXT: xsaddsp 0, 1, 2
				; FIXPOINT-NEXT: xsaddsp 1, 3, 4
				; FIXPOINT-NEXT: xsaddsp 1, 0, 1
				; FIXPOINT-NEXT: blr

	%t0 = fadd reassoc nsz float %x0, %x1			%t0 = fadd reassoc nsz float %x0, %x1
	%t1 = fadd reassoc nsz float %t0, %x2			%t1 = fadd reassoc nsz float %t0, %x2
	%t2 = fadd reassoc nsz float %x3, %t1			%t2 = fadd reassoc nsz float %x3, %t1
	ret float %t2			ret float %t2
	}			}

	define float @reassociate_adds4(float %x0, float %x1, float %x2, float %x3) {			define float @reassociate_adds4(float %x0, float %x1, float %x2, float %x3) {
	; CHECK-LABEL: reassociate_adds4:			; CHECK-LABEL: reassociate_adds4:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK: fadds [[REG0:[0-9]+]], 1, 2			; CHECK-NEXT: fadds 0, 1, 2
	; CHECK: fadds [[REG1:[0-9]+]], 3, 4			; CHECK-NEXT: fadds 1, 3, 4
	; CHECK: fadds 1, [[REG0]], [[REG1]]			; CHECK-NEXT: fadds 1, 0, 1
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				;
				; FIXPOINT-LABEL: reassociate_adds4:
				; FIXPOINT: # %bb.0:
				; FIXPOINT-NEXT: xsaddsp 0, 1, 2
				; FIXPOINT-NEXT: xsaddsp 1, 3, 4
				; FIXPOINT-NEXT: xsaddsp 1, 0, 1
				; FIXPOINT-NEXT: blr

	%t0 = fadd reassoc nsz float %x0, %x1			%t0 = fadd reassoc nsz float %x0, %x1
	%t1 = fadd reassoc nsz float %x2, %t0			%t1 = fadd reassoc nsz float %x2, %t0
	%t2 = fadd reassoc nsz float %x3, %t1			%t2 = fadd reassoc nsz float %x3, %t1
	ret float %t2			ret float %t2
	}			}

	; Verify that we reassociate some of these ops. The optimal balanced tree of adds is not			; Verify that we reassociate some of these ops. The optimal balanced tree of adds is not
	; produced because that would cost more compile time.			; produced because that would cost more compile time.

	define float @reassociate_adds5(float %x0, float %x1, float %x2, float %x3, float %x4, float %x5, float %x6, float %x7) {			define float @reassociate_adds5(float %x0, float %x1, float %x2, float %x3, float %x4, float %x5, float %x6, float %x7) {
	; CHECK-LABEL: reassociate_adds5:			; CHECK-LABEL: reassociate_adds5:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-DAG: fadds [[REG12:[0-9]+]], 5, 6			; CHECK-NEXT: fadds 0, 1, 2
	; CHECK-DAG: fadds [[REG0:[0-9]+]], 1, 2			; CHECK-NEXT: fadds 1, 3, 4
	; CHECK-DAG: fadds [[REG11:[0-9]+]], 3, 4			; CHECK-NEXT: fadds 2, 5, 6
	; CHECK-DAG: fadds [[REG13:[0-9]+]], [[REG12]], 7			; CHECK-NEXT: fadds 0, 0, 1
	; CHECK-DAG: fadds [[REG1:[0-9]+]], [[REG0]], [[REG11]]			; CHECK-NEXT: fadds 1, 2, 7
	; CHECK-DAG: fadds [[REG2:[0-9]+]], [[REG1]], [[REG13]]			; CHECK-NEXT: fadds 0, 0, 1
	; CHECK: fadds 1, [[REG2]], 8			; CHECK-NEXT: fadds 1, 0, 8
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				;
				; FIXPOINT-LABEL: reassociate_adds5:
				; FIXPOINT: # %bb.0:
				; FIXPOINT-NEXT: xsaddsp 0, 1, 2
				; FIXPOINT-NEXT: xsaddsp 1, 3, 4
				; FIXPOINT-NEXT: xsaddsp 0, 0, 1
				; FIXPOINT-NEXT: xsaddsp 1, 5, 6
				; FIXPOINT-NEXT: xsaddsp 1, 1, 7
				; FIXPOINT-NEXT: xsaddsp 0, 0, 1
				; FIXPOINT-NEXT: xsaddsp 1, 0, 8
				; FIXPOINT-NEXT: blr

	%t0 = fadd reassoc nsz float %x0, %x1			%t0 = fadd reassoc nsz float %x0, %x1
	%t1 = fadd reassoc nsz float %t0, %x2			%t1 = fadd reassoc nsz float %t0, %x2
	%t2 = fadd reassoc nsz float %t1, %x3			%t2 = fadd reassoc nsz float %t1, %x3
	%t3 = fadd reassoc nsz float %t2, %x4			%t3 = fadd reassoc nsz float %t2, %x4
	%t4 = fadd reassoc nsz float %t3, %x5			%t4 = fadd reassoc nsz float %t3, %x5
	%t5 = fadd reassoc nsz float %t4, %x6			%t5 = fadd reassoc nsz float %t4, %x6
	%t6 = fadd reassoc nsz float %t5, %x7			%t6 = fadd reassoc nsz float %t5, %x7
	ret float %t6			ret float %t6
	}			}

	; Verify that we reassociate vector instructions too.			; Verify that we reassociate vector instructions too.

	define <4 x float> @vector_reassociate_adds1(<4 x float> %x0, <4 x float> %x1, <4 x float> %x2, <4 x float> %x3) {			define <4 x float> @vector_reassociate_adds1(<4 x float> %x0, <4 x float> %x1, <4 x float> %x2, <4 x float> %x3) {
	; CHECK-LABEL: vector_reassociate_adds1:			; CHECK-LABEL: vector_reassociate_adds1:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-PWR: xvaddsp [[REG0:[0-9]+]], 34, 35			; CHECK-NEXT: xvaddsp 0, 34, 35
	; CHECK-PWR: xvaddsp [[REG1:[0-9]+]], 36, 37			; CHECK-NEXT: xvaddsp 1, 36, 37
	; CHECK-PWR: xvaddsp 34, [[REG0]], [[REG1]]			; CHECK-NEXT: xvaddsp 34, 0, 1
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				;
				; FIXPOINT-LABEL: vector_reassociate_adds1:
				; FIXPOINT: # %bb.0:
				; FIXPOINT-NEXT: xvaddsp 0, 34, 35
				; FIXPOINT-NEXT: xvaddsp 1, 36, 37
				; FIXPOINT-NEXT: xvaddsp 34, 0, 1
				; FIXPOINT-NEXT: blr

	%t0 = fadd reassoc nsz <4 x float> %x0, %x1			%t0 = fadd reassoc nsz <4 x float> %x0, %x1
	%t1 = fadd reassoc nsz <4 x float> %t0, %x2			%t1 = fadd reassoc nsz <4 x float> %t0, %x2
	%t2 = fadd reassoc nsz <4 x float> %t1, %x3			%t2 = fadd reassoc nsz <4 x float> %t1, %x3
	ret <4 x float> %t2			ret <4 x float> %t2
	}			}

	define <4 x float> @vector_reassociate_adds2(<4 x float> %x0, <4 x float> %x1, <4 x float> %x2, <4 x float> %x3) {			define <4 x float> @vector_reassociate_adds2(<4 x float> %x0, <4 x float> %x1, <4 x float> %x2, <4 x float> %x3) {
	; CHECK-LABEL: vector_reassociate_adds2:			; CHECK-LABEL: vector_reassociate_adds2:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-PWR: xvaddsp [[REG0:[0-9]+]], 34, 35			; CHECK-NEXT: xvaddsp 0, 34, 35
	; CHECK-PWR: xvaddsp [[REG1:[0-9]+]], 36, 37			; CHECK-NEXT: xvaddsp 1, 36, 37
	; CHECK-PWR: xvaddsp 34, [[REG0]], [[REG1]]			; CHECK-NEXT: xvaddsp 34, 0, 1
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				;
				; FIXPOINT-LABEL: vector_reassociate_adds2:
				; FIXPOINT: # %bb.0:
				; FIXPOINT-NEXT: xvaddsp 0, 34, 35
				; FIXPOINT-NEXT: xvaddsp 1, 36, 37
				; FIXPOINT-NEXT: xvaddsp 34, 0, 1
				; FIXPOINT-NEXT: blr

	%t0 = fadd reassoc nsz <4 x float> %x0, %x1			%t0 = fadd reassoc nsz <4 x float> %x0, %x1
	%t1 = fadd reassoc nsz <4 x float> %x2, %t0			%t1 = fadd reassoc nsz <4 x float> %x2, %t0
	%t2 = fadd reassoc nsz <4 x float> %t1, %x3			%t2 = fadd reassoc nsz <4 x float> %t1, %x3
	ret <4 x float> %t2			ret <4 x float> %t2
	}			}

	define <4 x float> @vector_reassociate_adds3(<4 x float> %x0, <4 x float> %x1, <4 x float> %x2, <4 x float> %x3) {			define <4 x float> @vector_reassociate_adds3(<4 x float> %x0, <4 x float> %x1, <4 x float> %x2, <4 x float> %x3) {
	; CHECK-LABEL: vector_reassociate_adds3:			; CHECK-LABEL: vector_reassociate_adds3:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-PWR: xvaddsp [[REG0:[0-9]+]], 34, 35			; CHECK-NEXT: xvaddsp 0, 34, 35
	; CHECK-PWR: xvaddsp [[REG1:[0-9]+]], 36, 37			; CHECK-NEXT: xvaddsp 1, 36, 37
	; CHECK-PWR: xvaddsp 34, [[REG0]], [[REG1]]			; CHECK-NEXT: xvaddsp 34, 0, 1
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				;
				; FIXPOINT-LABEL: vector_reassociate_adds3:
				; FIXPOINT: # %bb.0:
				; FIXPOINT-NEXT: xvaddsp 0, 34, 35
				; FIXPOINT-NEXT: xvaddsp 1, 36, 37
				; FIXPOINT-NEXT: xvaddsp 34, 0, 1
				; FIXPOINT-NEXT: blr

	%t0 = fadd reassoc nsz <4 x float> %x0, %x1			%t0 = fadd reassoc nsz <4 x float> %x0, %x1
	%t1 = fadd reassoc nsz <4 x float> %t0, %x2			%t1 = fadd reassoc nsz <4 x float> %t0, %x2
	%t2 = fadd reassoc nsz <4 x float> %x3, %t1			%t2 = fadd reassoc nsz <4 x float> %x3, %t1
	ret <4 x float> %t2			ret <4 x float> %t2
	}			}

	define <4 x float> @vector_reassociate_adds4(<4 x float> %x0, <4 x float> %x1, <4 x float> %x2, <4 x float> %x3) {			define <4 x float> @vector_reassociate_adds4(<4 x float> %x0, <4 x float> %x1, <4 x float> %x2, <4 x float> %x3) {
	; CHECK-LABEL: vector_reassociate_adds4:			; CHECK-LABEL: vector_reassociate_adds4:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-PWR: xvaddsp [[REG0:[0-9]+]], 34, 35			; CHECK-NEXT: xvaddsp 0, 34, 35
	; CHECK-PWR: xvaddsp [[REG1:[0-9]+]], 36, 37			; CHECK-NEXT: xvaddsp 1, 36, 37
	; CHECK-PWR: xvaddsp 34, [[REG0]], [[REG1]]			; CHECK-NEXT: xvaddsp 34, 0, 1
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				;
				; FIXPOINT-LABEL: vector_reassociate_adds4:
				; FIXPOINT: # %bb.0:
				; FIXPOINT-NEXT: xvaddsp 0, 34, 35
				; FIXPOINT-NEXT: xvaddsp 1, 36, 37
				; FIXPOINT-NEXT: xvaddsp 34, 0, 1
				; FIXPOINT-NEXT: blr

	%t0 = fadd reassoc nsz <4 x float> %x0, %x1			%t0 = fadd reassoc nsz <4 x float> %x0, %x1
	%t1 = fadd reassoc nsz <4 x float> %x2, %t0			%t1 = fadd reassoc nsz <4 x float> %x2, %t0
	%t2 = fadd reassoc nsz <4 x float> %x3, %t1			%t2 = fadd reassoc nsz <4 x float> %x3, %t1
	ret <4 x float> %t2			ret <4 x float> %t2
	}			}

	define float @reassociate_adds6(float %x0, float %x1, float %x2, float %x3) {			define float @reassociate_adds6(float %x0, float %x1, float %x2, float %x3) {
				; CHECK-LABEL: reassociate_adds6:
				; CHECK: # %bb.0:
				; CHECK-NEXT: fdivs 0, 1, 2
				; CHECK-NEXT: fadds 0, 3, 0
				; CHECK-NEXT: fadds 1, 4, 0
				; CHECK-NEXT: blr
				;
				; FIXPOINT-LABEL: reassociate_adds6:
				; FIXPOINT: # %bb.0:
				; FIXPOINT-NEXT: xsdivsp 0, 1, 2
				; FIXPOINT-NEXT: xsaddsp 0, 3, 0
				; FIXPOINT-NEXT: xsaddsp 1, 4, 0
				; FIXPOINT-NEXT: blr
	%t0 = fdiv float %x0, %x1			%t0 = fdiv float %x0, %x1
	%t1 = fadd float %x2, %t0			%t1 = fadd float %x2, %t0
	%t2 = fadd float %x3, %t1			%t2 = fadd float %x3, %t1
	ret float %t2			ret float %t2
	}			}

	define float @reassociate_muls1(float %x0, float %x1, float %x2, float %x3) {			define float @reassociate_muls1(float %x0, float %x1, float %x2, float %x3) {
				; CHECK-LABEL: reassociate_muls1:
				; CHECK: # %bb.0:
				; CHECK-NEXT: fdivs 0, 1, 2
				; CHECK-NEXT: fmuls 0, 3, 0
				; CHECK-NEXT: fmuls 1, 4, 0
				; CHECK-NEXT: blr
				;
				; FIXPOINT-LABEL: reassociate_muls1:
				; FIXPOINT: # %bb.0:
				; FIXPOINT-NEXT: xsdivsp 0, 1, 2
				; FIXPOINT-NEXT: xsmulsp 0, 3, 0
				; FIXPOINT-NEXT: xsmulsp 1, 4, 0
				; FIXPOINT-NEXT: blr
	%t0 = fdiv float %x0, %x1			%t0 = fdiv float %x0, %x1
	%t1 = fmul float %x2, %t0			%t1 = fmul float %x2, %t0
	%t2 = fmul float %x3, %t1			%t2 = fmul float %x3, %t1
	ret float %t2			ret float %t2
	}			}

	define double @reassociate_adds_double(double %x0, double %x1, double %x2, double %x3) {			define double @reassociate_adds_double(double %x0, double %x1, double %x2, double %x3) {
				; CHECK-LABEL: reassociate_adds_double:
				; CHECK: # %bb.0:
				; CHECK-NEXT: xsdivdp 0, 1, 2
				; CHECK-NEXT: xsadddp 0, 3, 0
				; CHECK-NEXT: xsadddp 1, 4, 0
				; CHECK-NEXT: blr
				;
				; FIXPOINT-LABEL: reassociate_adds_double:
				; FIXPOINT: # %bb.0:
				; FIXPOINT-NEXT: xsdivdp 0, 1, 2
				; FIXPOINT-NEXT: xsadddp 0, 3, 0
				; FIXPOINT-NEXT: xsadddp 1, 4, 0
				; FIXPOINT-NEXT: blr
	%t0 = fdiv double %x0, %x1			%t0 = fdiv double %x0, %x1
	%t1 = fadd double %x2, %t0			%t1 = fadd double %x2, %t0
	%t2 = fadd double %x3, %t1			%t2 = fadd double %x3, %t1
	ret double %t2			ret double %t2
	}			}

	define double @reassociate_muls_double(double %x0, double %x1, double %x2, double %x3) {			define double @reassociate_muls_double(double %x0, double %x1, double %x2, double %x3) {
				; CHECK-LABEL: reassociate_muls_double:
				; CHECK: # %bb.0:
				; CHECK-NEXT: xsdivdp 0, 1, 2
				; CHECK-NEXT: xsmuldp 0, 3, 0
				; CHECK-NEXT: xsmuldp 1, 4, 0
				; CHECK-NEXT: blr
				;
				; FIXPOINT-LABEL: reassociate_muls_double:
				; FIXPOINT: # %bb.0:
				; FIXPOINT-NEXT: xsdivdp 0, 1, 2
				; FIXPOINT-NEXT: xsmuldp 0, 3, 0
				; FIXPOINT-NEXT: xsmuldp 1, 4, 0
				; FIXPOINT-NEXT: blr
	%t0 = fdiv double %x0, %x1			%t0 = fdiv double %x0, %x1
	%t1 = fmul double %x2, %t0			%t1 = fmul double %x2, %t0
	%t2 = fmul double %x3, %t1			%t2 = fmul double %x3, %t1
	ret double %t2			ret double %t2
	}			}

	define i32 @reassociate_mullw(i32 %x0, i32 %x1, i32 %x2, i32 %x3) {			define i32 @reassociate_mullw(i32 %x0, i32 %x1, i32 %x2, i32 %x3) {
				; CHECK-LABEL: reassociate_mullw:
				; CHECK: # %bb.0:
				; CHECK-NEXT: mullw 3, 3, 4
				; CHECK-NEXT: mullw 4, 5, 6
				; CHECK-NEXT: mullw 3, 3, 4
				; CHECK-NEXT: blr
				;
	; FIXPOINT-LABEL: reassociate_mullw:			; FIXPOINT-LABEL: reassociate_mullw:
	; FIXPOINT: # %bb.0:			; FIXPOINT: # %bb.0:
	; FIXPOINT: mullw [[REG0:[0-9]+]], 3, 4			; FIXPOINT-NEXT: mullw 3, 3, 4
	; FIXPOINT: mullw [[REG1:[0-9]+]], 5, 6			; FIXPOINT-NEXT: mullw 4, 5, 6
	; FIXPOINT: mullw 3, [[REG0]], [[REG1]]			; FIXPOINT-NEXT: mullw 3, 3, 4
	; FIXPOINT-NEXT: blr			; FIXPOINT-NEXT: blr

	%t0 = mul i32 %x0, %x1			%t0 = mul i32 %x0, %x1
	%t1 = mul i32 %t0, %x2			%t1 = mul i32 %t0, %x2
	%t2 = mul i32 %t1, %x3			%t2 = mul i32 %t1, %x3
	ret i32 %t2			ret i32 %t2
	}			}

	define i64 @reassociate_mulld(i64 %x0, i64 %x1, i64 %x2, i64 %x3) {			define i64 @reassociate_mulld(i64 %x0, i64 %x1, i64 %x2, i64 %x3) {
				; CHECK-LABEL: reassociate_mulld:
				; CHECK: # %bb.0:
				; CHECK-NEXT: mulld 3, 3, 4
				; CHECK-NEXT: mulld 4, 5, 6
				; CHECK-NEXT: mulld 3, 3, 4
				; CHECK-NEXT: blr
				;
	; FIXPOINT-LABEL: reassociate_mulld:			; FIXPOINT-LABEL: reassociate_mulld:
	; FIXPOINT: # %bb.0:			; FIXPOINT: # %bb.0:
	; FIXPOINT: mulld [[REG0:[0-9]+]], 3, 4			; FIXPOINT-NEXT: mulld 3, 3, 4
	; FIXPOINT: mulld [[REG1:[0-9]+]], 5, 6			; FIXPOINT-NEXT: mulld 4, 5, 6
	; FIXPOINT: mulld 3, [[REG0]], [[REG1]]			; FIXPOINT-NEXT: mulld 3, 3, 4
	; FIXPOINT-NEXT: blr			; FIXPOINT-NEXT: blr

	%t0 = mul i64 %x0, %x1			%t0 = mul i64 %x0, %x1
	%t1 = mul i64 %t0, %x2			%t1 = mul i64 %t0, %x2
	%t2 = mul i64 %t1, %x3			%t2 = mul i64 %t1, %x3
	ret i64 %t2			ret i64 %t2
	}			}

	define double @reassociate_mamaa_double(double %0, double %1, double %2, double %3, double %4, double %5) {			define double @reassociate_mamaa_double(double %0, double %1, double %2, double %3, double %4, double %5) {
	; CHECK-LABEL: reassociate_mamaa_double:			; CHECK-LABEL: reassociate_mamaa_double:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-PWR-DAG: xsmaddadp 1, 6, 5			; CHECK-NEXT: xsmaddadp 1, 6, 5
	; CHECK-PWR-DAG: xsmaddadp 2, 4, 3			; CHECK-NEXT: xsmaddadp 2, 4, 3
	; CHECK-PWR: xsadddp 1, 2, 1			; CHECK-NEXT: xsadddp 1, 2, 1
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				;
				; FIXPOINT-LABEL: reassociate_mamaa_double:
				; FIXPOINT: # %bb.0:
				; FIXPOINT-NEXT: xsmaddadp 1, 6, 5
				; FIXPOINT-NEXT: xsmaddadp 2, 4, 3
				; FIXPOINT-NEXT: xsadddp 1, 2, 1
				; FIXPOINT-NEXT: blr
	%7 = fmul contract reassoc nsz double %3, %2			%7 = fmul contract reassoc nsz double %3, %2
	%8 = fmul contract reassoc nsz double %5, %4			%8 = fmul contract reassoc nsz double %5, %4
	%9 = fadd contract reassoc nsz double %1, %0			%9 = fadd contract reassoc nsz double %1, %0
	%10 = fadd contract reassoc nsz double %9, %7			%10 = fadd contract reassoc nsz double %9, %7
	%11 = fadd contract reassoc nsz double %10, %8			%11 = fadd contract reassoc nsz double %10, %8
	ret double %11			ret double %11
	}			}

	define float @reassociate_mamaa_float(float %0, float %1, float %2, float %3, float %4, float %5) {			define float @reassociate_mamaa_float(float %0, float %1, float %2, float %3, float %4, float %5) {
	; CHECK-LABEL: reassociate_mamaa_float:			; CHECK-LABEL: reassociate_mamaa_float:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-DAG: fmadds [[REG0:[0-9]+]], 4, 3, 2			; CHECK-NEXT: fmadds 0, 6, 5, 1
	; CHECK-DAG: fmadds [[REG1:[0-9]+]], 6, 5, 1			; CHECK-NEXT: fmadds 1, 4, 3, 2
	; CHECK: fadds 1, [[REG0]], [[REG1]]			; CHECK-NEXT: fadds 1, 1, 0
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				;
				; FIXPOINT-LABEL: reassociate_mamaa_float:
				; FIXPOINT: # %bb.0:
				; FIXPOINT-NEXT: xsmaddasp 1, 6, 5
				; FIXPOINT-NEXT: xsmaddasp 2, 4, 3
				; FIXPOINT-NEXT: xsaddsp 1, 2, 1
				; FIXPOINT-NEXT: blr
	%7 = fmul contract reassoc nsz float %3, %2			%7 = fmul contract reassoc nsz float %3, %2
	%8 = fmul contract reassoc nsz float %5, %4			%8 = fmul contract reassoc nsz float %5, %4
	%9 = fadd contract reassoc nsz float %1, %0			%9 = fadd contract reassoc nsz float %1, %0
	%10 = fadd contract reassoc nsz float %9, %7			%10 = fadd contract reassoc nsz float %9, %7
	%11 = fadd contract reassoc nsz float %10, %8			%11 = fadd contract reassoc nsz float %10, %8
	ret float %11			ret float %11
	}			}

	define <4 x float> @reassociate_mamaa_vec(<4 x float> %0, <4 x float> %1, <4 x float> %2, <4 x float> %3, <4 x float> %4, <4 x float> %5) {			define <4 x float> @reassociate_mamaa_vec(<4 x float> %0, <4 x float> %1, <4 x float> %2, <4 x float> %3, <4 x float> %4, <4 x float> %5) {
	; CHECK-LABEL: reassociate_mamaa_vec:			; CHECK-LABEL: reassociate_mamaa_vec:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-PWR-DAG: xvmaddasp [[REG0:[0-9]+]], 39, 38			; CHECK-NEXT: xvmaddasp 34, 39, 38
	; CHECK-PWR-DAG: xvmaddasp [[REG1:[0-9]+]], 37, 36			; CHECK-NEXT: xvmaddasp 35, 37, 36
	; CHECK-PWR: xvaddsp 34, [[REG1]], [[REG0]]			; CHECK-NEXT: xvaddsp 34, 35, 34
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				;
				; FIXPOINT-LABEL: reassociate_mamaa_vec:
				; FIXPOINT: # %bb.0:
				; FIXPOINT-NEXT: xvmaddasp 34, 39, 38
				; FIXPOINT-NEXT: xvmaddasp 35, 37, 36
				; FIXPOINT-NEXT: xvaddsp 34, 35, 34
				; FIXPOINT-NEXT: blr
	%7 = fmul contract reassoc nsz <4 x float> %3, %2			%7 = fmul contract reassoc nsz <4 x float> %3, %2
	%8 = fmul contract reassoc nsz <4 x float> %5, %4			%8 = fmul contract reassoc nsz <4 x float> %5, %4
	%9 = fadd contract reassoc nsz <4 x float> %1, %0			%9 = fadd contract reassoc nsz <4 x float> %1, %0
	%10 = fadd contract reassoc nsz <4 x float> %9, %7			%10 = fadd contract reassoc nsz <4 x float> %9, %7
	%11 = fadd contract reassoc nsz <4 x float> %10, %8			%11 = fadd contract reassoc nsz <4 x float> %10, %8
	ret <4 x float> %11			ret <4 x float> %11
	}			}

	define double @reassociate_mamama_double(double %0, double %1, double %2, double %3, double %4, double %5, double %6, double %7, double %8) {			define double @reassociate_mamama_double(double %0, double %1, double %2, double %3, double %4, double %5, double %6, double %7, double %8) {
	; CHECK-LABEL: reassociate_mamama_double:			; CHECK-LABEL: reassociate_mamama_double:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-PWR: xsmaddadp 7, 2, 1			; CHECK-NEXT: xsmaddadp 7, 4, 3
	; CHECK-PWR-DAG: xsmuldp [[REG0:[0-9]+]], 4, 3			; CHECK-NEXT: xsmuldp 0, 2, 1
	; CHECK-PWR-DAG: xsmaddadp 7, 6, 5			; CHECK-NEXT: xsmaddadp 7, 6, 5
	; CHECK-PWR-DAG: xsmaddadp [[REG0]], 9, 8			; CHECK-NEXT: xsmaddadp 0, 9, 8
	; CHECK-PWR: xsadddp 1, 7, [[REG0]]			; CHECK-NEXT: xsadddp 1, 7, 0
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				;
				; FIXPOINT-LABEL: reassociate_mamama_double:
				; FIXPOINT: # %bb.0:
				; FIXPOINT-NEXT: xsmaddadp 7, 4, 3
				; FIXPOINT-NEXT: xsmuldp 0, 2, 1
				; FIXPOINT-NEXT: xsmaddadp 7, 6, 5
				; FIXPOINT-NEXT: xsmaddadp 0, 9, 8
				; FIXPOINT-NEXT: xsadddp 1, 7, 0
				; FIXPOINT-NEXT: blr
	%10 = fmul contract reassoc nsz double %1, %0			%10 = fmul contract reassoc nsz double %1, %0
	%11 = fmul contract reassoc nsz double %3, %2			%11 = fmul contract reassoc nsz double %3, %2
	%12 = fmul contract reassoc nsz double %5, %4			%12 = fmul contract reassoc nsz double %5, %4
	%13 = fmul contract reassoc nsz double %8, %7			%13 = fmul contract reassoc nsz double %8, %7
	%14 = fadd contract reassoc nsz double %11, %10			%14 = fadd contract reassoc nsz double %11, %10
	%15 = fadd contract reassoc nsz double %14, %6			%15 = fadd contract reassoc nsz double %14, %6
	%16 = fadd contract reassoc nsz double %15, %12			%16 = fadd contract reassoc nsz double %15, %12
	%17 = fadd contract reassoc nsz double %16, %13			%17 = fadd contract reassoc nsz double %16, %13
	ret double %17			ret double %17
	}			}

	define dso_local float @reassociate_mamama_8(float %0, float %1, float %2, float %3, float %4, float %5, float %6, float %7, float %8,			define dso_local float @reassociate_mamama_8(float %0, float %1, float %2, float %3, float %4, float %5, float %6, float %7, float %8,
	float %9, float %10, float %11, float %12, float %13, float %14, float %15, float %16) {
	; CHECK-LABEL: reassociate_mamama_8:			; CHECK-LABEL: reassociate_mamama_8:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-DAG: fmadds [[REG0:[0-9]+]], 3, 2, 1			; CHECK-NEXT: fmadds 0, 3, 2, 1
	; CHECK-DAG: fmuls [[REG1:[0-9]+]], 5, 4			; CHECK-NEXT: fmuls 1, 5, 4
	; CHECK-DAG: fmadds [[REG2:[0-9]+]], 7, 6, [[REG0]]			; CHECK-NEXT: lfs 2, 172(1)
	; CHECK-DAG: fmadds [[REG3:[0-9]+]], 9, 8, [[REG1]]			; CHECK-NEXT: lfs 3, 180(1)
	;			; CHECK-NEXT: lfs 4, 156(1)
	; CHECK-DAG: fmadds [[REG4:[0-9]+]], 13, 12, [[REG3]]			; CHECK-NEXT: lfs 5, 164(1)
	; CHECK-DAG: fmadds [[REG5:[0-9]+]], 11, 10, [[REG2]]			; CHECK-NEXT: fmadds 0, 7, 6, 0
	;			; CHECK-NEXT: fmadds 1, 9, 8, 1
	; CHECK-DAG: fmadds [[REG6:[0-9]+]], 3, 2, [[REG4]]			; CHECK-NEXT: fmadds 1, 13, 12, 1
	; CHECK-DAG: fmadds [[REG7:[0-9]+]], 5, 4, [[REG5]]			; CHECK-NEXT: fmadds 0, 11, 10, 0
	; CHECK: fadds 1, [[REG7]], [[REG6]]			; CHECK-NEXT: fmadds 1, 3, 2, 1
				; CHECK-NEXT: fmadds 0, 5, 4, 0
				; CHECK-NEXT: fadds 1, 0, 1
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				;
				; FIXPOINT-LABEL: reassociate_mamama_8:
				; FIXPOINT: # %bb.0:
				; FIXPOINT-NEXT: xsmaddasp 1, 3, 2
				; FIXPOINT-NEXT: xsmulsp 2, 5, 4
				; FIXPOINT-NEXT: lxssp 2, 180(1)
				; FIXPOINT-NEXT: lxssp 3, 156(1)
				; FIXPOINT-NEXT: lxssp 4, 164(1)
				; FIXPOINT-NEXT: xsmaddasp 1, 7, 6
				; FIXPOINT-NEXT: xsmaddasp 2, 9, 8
				; FIXPOINT-NEXT: lfs 0, 172(1)
				; FIXPOINT-NEXT: xsmaddasp 2, 13, 12
				; FIXPOINT-NEXT: xsmaddasp 1, 11, 10
				; FIXPOINT-NEXT: xsmaddasp 2, 34, 0
				; FIXPOINT-NEXT: xsmaddasp 1, 36, 35
				; FIXPOINT-NEXT: xsaddsp 1, 1, 2
				; FIXPOINT-NEXT: blr
				float %9, float %10, float %11, float %12, float %13, float %14, float %15, float %16) {
	%18 = fmul contract reassoc nsz float %2, %1			%18 = fmul contract reassoc nsz float %2, %1
	%19 = fadd contract reassoc nsz float %18, %0			%19 = fadd contract reassoc nsz float %18, %0
	%20 = fmul contract reassoc nsz float %4, %3			%20 = fmul contract reassoc nsz float %4, %3
	%21 = fadd contract reassoc nsz float %19, %20			%21 = fadd contract reassoc nsz float %19, %20
	%22 = fmul contract reassoc nsz float %6, %5			%22 = fmul contract reassoc nsz float %6, %5
	%23 = fadd contract reassoc nsz float %21, %22			%23 = fadd contract reassoc nsz float %21, %22
	%24 = fmul contract reassoc nsz float %8, %7			%24 = fmul contract reassoc nsz float %8, %7
	%25 = fadd contract reassoc nsz float %23, %24			%25 = fadd contract reassoc nsz float %23, %24
	%26 = fmul contract reassoc nsz float %10, %9			%26 = fmul contract reassoc nsz float %10, %9
	%27 = fadd contract reassoc nsz float %25, %26			%27 = fadd contract reassoc nsz float %25, %26
	%28 = fmul contract reassoc nsz float %12, %11			%28 = fmul contract reassoc nsz float %12, %11
	%29 = fadd contract reassoc nsz float %27, %28			%29 = fadd contract reassoc nsz float %27, %28
	%30 = fmul contract reassoc nsz float %14, %13			%30 = fmul contract reassoc nsz float %14, %13
	%31 = fadd contract reassoc nsz float %29, %30			%31 = fadd contract reassoc nsz float %29, %30
	%32 = fmul contract reassoc nsz float %16, %15			%32 = fmul contract reassoc nsz float %16, %15
	%33 = fadd contract reassoc nsz float %31, %32			%33 = fadd contract reassoc nsz float %31, %32
	ret float %33			ret float %33
	}			}

				;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
				; CHECK-PWR: {{.*}}

llvm/test/CodeGen/X86/fma_patterns.ll

; AVX512-NEXT: retq
ret <4 x double> %n		ret <4 x double> %n
}		}

; ((a*b) + (c*d)) + n1 --> (a*b) + ((c*d) + n1)		; ((a*b) + (c*d)) + n1 --> (a*b) + ((c*d) + n1)

define double @fadd_fma_fmul_1(double %a, double %b, double %c, double %d, double %n1) nounwind {		define double @fadd_fma_fmul_1(double %a, double %b, double %c, double %d, double %n1) nounwind {
; FMA-LABEL: fadd_fma_fmul_1:		; FMA-LABEL: fadd_fma_fmul_1:
; FMA: # %bb.0:		; FMA: # %bb.0:
; FMA-NEXT: vfmadd213sd {{.*#+}} xmm2 = (xmm3 * xmm2) + xmm4		; FMA-NEXT: vfmadd213sd {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm4
; FMA-NEXT: vfmadd213sd {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm2		; FMA-NEXT: vfmadd231sd {{.*#+}} xmm0 = (xmm3 * xmm2) + xmm0
; FMA-NEXT: retq		; FMA-NEXT: retq
;		;
; FMA4-LABEL: fadd_fma_fmul_1:		; FMA4-LABEL: fadd_fma_fmul_1:
; FMA4: # %bb.0:		; FMA4: # %bb.0:
; FMA4-NEXT: vfmaddsd {{.*#+}} xmm2 = (xmm2 * xmm3) + xmm4		; FMA4-NEXT: vfmaddsd {{.*#+}} xmm0 = (xmm0 * xmm1) + xmm4
; FMA4-NEXT: vfmaddsd {{.*#+}} xmm0 = (xmm0 * xmm1) + xmm2		; FMA4-NEXT: vfmaddsd {{.*#+}} xmm0 = (xmm2 * xmm3) + xmm0
; FMA4-NEXT: retq		; FMA4-NEXT: retq
;		;
; AVX512-LABEL: fadd_fma_fmul_1:		; AVX512-LABEL: fadd_fma_fmul_1:
; AVX512: # %bb.0:		; AVX512: # %bb.0:
; AVX512-NEXT: vfmadd213sd {{.*#+}} xmm2 = (xmm3 * xmm2) + xmm4		; AVX512-NEXT: vfmadd213sd {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm4
; AVX512-NEXT: vfmadd213sd {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm2		; AVX512-NEXT: vfmadd231sd {{.*#+}} xmm0 = (xmm3 * xmm2) + xmm0
; AVX512-NEXT: retq		; AVX512-NEXT: retq
%m1 = fmul fast double %a, %b		%m1 = fmul fast double %a, %b
%m2 = fmul fast double %c, %d		%m2 = fmul fast double %c, %d
%a1 = fadd fast double %m1, %m2		%a1 = fadd fast double %m1, %m2
%a2 = fadd fast double %a1, %n1		%a2 = fadd fast double %a1, %n1
ret double %a2		ret double %a2
}		}
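
The reassociation named in the comment above this test can be sketched in C++ with `std::fma`. This is only an illustration of the arithmetic, not the DAGCombiner code from this patch; the function names `source_order` and `chained_fma` are hypothetical.

```cpp
#include <cmath>

// Source order from the test's IR: ((a*b) + (c*d)) + n1.
// Hypothetical helper name, used only for this illustration.
double source_order(double a, double b, double c, double d, double n1) {
    return ((a * b) + (c * d)) + n1;
}

// Reassociated form from the test comment: (a*b) + ((c*d) + n1),
// written as two chained fused multiply-adds.
// Hypothetical helper name, used only for this illustration.
double chained_fma(double a, double b, double c, double d, double n1) {
    return std::fma(a, b, std::fma(c, d, n1));
}
```

For inputs whose products and sums are exactly representable (for example 2, 3, 4, 5, 1) both forms return the same value. In general they can round differently, which is why the IR in this test carries fast-math flags. The updated check lines on the right fold `a*b` with `n1` first and then add `c*d`, which is the same kind of chain with the two products swapped.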

; Minimum FMF - the 1st fadd is contracted because that combines		; Minimum FMF - the 1st fadd is contracted because that combines
; fmul+fadd as specified by the order of operations; the 2nd fadd		; fmul+fadd as specified by the order of operations; the 2nd fadd
; requires reassociation to fuse with c*d.		; requires reassociation to fuse with c*d.

define float @fadd_fma_fmul_fmf(float %a, float %b, float %c, float %d, float %n0) nounwind {		define float @fadd_fma_fmul_fmf(float %a, float %b, float %c, float %d, float %n0) nounwind {
; FMA-LABEL: fadd_fma_fmul_fmf:		; FMA-LABEL: fadd_fma_fmul_fmf:
; FMA: # %bb.0:		; FMA: # %bb.0:
; FMA-NEXT: vfmadd213ss {{.*#+}} xmm2 = (xmm3 * xmm2) + xmm4		; FMA-NEXT: vfmadd213ss {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm4
; FMA-NEXT: vfmadd213ss {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm2		; FMA-NEXT: vfmadd231ss {{.*#+}} xmm0 = (xmm3 * xmm2) + xmm0
; FMA-NEXT: retq		; FMA-NEXT: retq
;		;
; FMA4-LABEL: fadd_fma_fmul_fmf:		; FMA4-LABEL: fadd_fma_fmul_fmf:
; FMA4: # %bb.0:		; FMA4: # %bb.0:
; FMA4-NEXT: vfmaddss {{.*#+}} xmm2 = (xmm2 * xmm3) + xmm4		; FMA4-NEXT: vfmaddss {{.*#+}} xmm0 = (xmm0 * xmm1) + xmm4
; FMA4-NEXT: vfmaddss {{.*#+}} xmm0 = (xmm0 * xmm1) + xmm2		; FMA4-NEXT: vfmaddss {{.*#+}} xmm0 = (xmm2 * xmm3) + xmm0
; FMA4-NEXT: retq		; FMA4-NEXT: retq
;		;
; AVX512-LABEL: fadd_fma_fmul_fmf:		; AVX512-LABEL: fadd_fma_fmul_fmf:
; AVX512: # %bb.0:		; AVX512: # %bb.0:
; AVX512-NEXT: vfmadd213ss {{.*#+}} xmm2 = (xmm3 * xmm2) + xmm4		; AVX512-NEXT: vfmadd213ss {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm4
; AVX512-NEXT: vfmadd213ss {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm2		; AVX512-NEXT: vfmadd231ss {{.*#+}} xmm0 = (xmm3 * xmm2) + xmm0
; AVX512-NEXT: retq		; AVX512-NEXT: retq
%m1 = fmul float %a, %b		%m1 = fmul float %a, %b
%m2 = fmul float %c, %d		%m2 = fmul float %c, %d
%a1 = fadd contract float %m1, %m2		%a1 = fadd contract float %m1, %m2
%a2 = fadd reassoc float %n0, %a1		%a2 = fadd reassoc float %n0, %a1
ret float %a2		ret float %a2
}		}
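
As background for why this test needs fast-math flags at all, here is a small C++ sketch showing that a fused multiply-add can give a different result from a separate multiply and add. It is an illustration only, assuming IEEE-754 double arithmetic; the helper names `unfused_mul_add` and `fused_mul_add` are hypothetical and not part of the patch.

```cpp
#include <cmath>

// Multiply, round the product to double, then add.
// The volatile store keeps the compiler from contracting the two
// operations into a single FMA on its own.
double unfused_mul_add(double a, double b, double c) {
    volatile double product = a * b;  // product is rounded here
    return product + c;
}

// Fused multiply-add: a*b + c with a single rounding at the end.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With `a = 1 + 2^-27`, `b = 1 - 2^-27` and `c = -1`, the exact product is `1 - 2^-54`, which rounds to `1.0` as a double. The unfused version therefore returns `0`, while the fused version returns `-2^-54`. Because fusing changes the result, the combine is only allowed when the IR opts in through flags such as `contract` and `reassoc`.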


; The final fadd can be folded with either 1 of the leading fmuls.		; The final fadd can be folded with either 1 of the leading fmuls.

define <2 x double> @fadd_fma_fmul_3(<2 x double> %x1, <2 x double> %x2, <2 x double> %x3, <2 x double> %x4, <2 x double> %x5, <2 x double> %x6, <2 x double> %x7, <2 x double> %x8) nounwind {		define <2 x double> @fadd_fma_fmul_3(<2 x double> %x1, <2 x double> %x2, <2 x double> %x3, <2 x double> %x4, <2 x double> %x5, <2 x double> %x6, <2 x double> %x7, <2 x double> %x8) nounwind {
; FMA-LABEL: fadd_fma_fmul_3:		; FMA-LABEL: fadd_fma_fmul_3:
; FMA: # %bb.0:		; FMA: # %bb.0:
; FMA-NEXT: vmulpd %xmm3, %xmm2, %xmm2		; FMA-NEXT: vmulpd %xmm3, %xmm2, %xmm2
; FMA-NEXT: vfmadd231pd {{.*#+}} xmm2 = (xmm1 * xmm0) + xmm2		; FMA-NEXT: vfmadd231pd {{.*#+}} xmm2 = (xmm1 * xmm0) + xmm2
; FMA-NEXT: vfmadd231pd {{.*#+}} xmm2 = (xmm7 * xmm6) + xmm2
; FMA-NEXT: vfmadd231pd {{.*#+}} xmm2 = (xmm5 * xmm4) + xmm2		; FMA-NEXT: vfmadd231pd {{.*#+}} xmm2 = (xmm5 * xmm4) + xmm2
		; FMA-NEXT: vfmadd231pd {{.*#+}} xmm2 = (xmm7 * xmm6) + xmm2
; FMA-NEXT: vmovapd %xmm2, %xmm0		; FMA-NEXT: vmovapd %xmm2, %xmm0
; FMA-NEXT: retq		; FMA-NEXT: retq
;		;
; FMA4-LABEL: fadd_fma_fmul_3:		; FMA4-LABEL: fadd_fma_fmul_3:
; FMA4: # %bb.0:		; FMA4: # %bb.0:
; FMA4-NEXT: vmulpd %xmm3, %xmm2, %xmm2		; FMA4-NEXT: vmulpd %xmm3, %xmm2, %xmm2
; FMA4-NEXT: vfmaddpd {{.*#+}} xmm0 = (xmm0 * xmm1) + xmm2		; FMA4-NEXT: vfmaddpd {{.*#+}} xmm0 = (xmm0 * xmm1) + xmm2
; FMA4-NEXT: vfmaddpd {{.*#+}} xmm0 = (xmm6 * xmm7) + xmm0
; FMA4-NEXT: vfmaddpd {{.*#+}} xmm0 = (xmm4 * xmm5) + xmm0		; FMA4-NEXT: vfmaddpd {{.*#+}} xmm0 = (xmm4 * xmm5) + xmm0
		; FMA4-NEXT: vfmaddpd {{.*#+}} xmm0 = (xmm6 * xmm7) + xmm0
; FMA4-NEXT: retq		; FMA4-NEXT: retq
;		;
; AVX512-LABEL: fadd_fma_fmul_3:		; AVX512-LABEL: fadd_fma_fmul_3:
; AVX512: # %bb.0:		; AVX512: # %bb.0:
; AVX512-NEXT: vmulpd %xmm3, %xmm2, %xmm2		; AVX512-NEXT: vmulpd %xmm3, %xmm2, %xmm2
; AVX512-NEXT: vfmadd231pd {{.*#+}} xmm2 = (xmm1 * xmm0) + xmm2		; AVX512-NEXT: vfmadd231pd {{.*#+}} xmm2 = (xmm1 * xmm0) + xmm2
; AVX512-NEXT: vfmadd231pd {{.*#+}} xmm2 = (xmm7 * xmm6) + xmm2
; AVX512-NEXT: vfmadd231pd {{.*#+}} xmm2 = (xmm5 * xmm4) + xmm2		; AVX512-NEXT: vfmadd231pd {{.*#+}} xmm2 = (xmm5 * xmm4) + xmm2
		; AVX512-NEXT: vfmadd231pd {{.*#+}} xmm2 = (xmm7 * xmm6) + xmm2
; AVX512-NEXT: vmovapd %xmm2, %xmm0		; AVX512-NEXT: vmovapd %xmm2, %xmm0
; AVX512-NEXT: retq		; AVX512-NEXT: retq
%m1 = fmul fast <2 x double> %x1, %x2		%m1 = fmul fast <2 x double> %x1, %x2
%m2 = fmul fast <2 x double> %x3, %x4		%m2 = fmul fast <2 x double> %x3, %x4
%m3 = fmul fast <2 x double> %x5, %x6		%m3 = fmul fast <2 x double> %x5, %x6
%m4 = fmul fast <2 x double> %x7, %x8		%m4 = fmul fast <2 x double> %x7, %x8
%a1 = fadd fast <2 x double> %m1, %m2		%a1 = fadd fast <2 x double> %m1, %m2
%a2 = fadd fast <2 x double> %m3, %m4		%a2 = fadd fast <2 x double> %m3, %m4

Revision Contents

Diff 460670
