This is an archive of the discontinued LLVM Phabricator instance.


Differential D69998

[MacroFusion] Create the missing artificial edges if there are more than 2 SU fused.
Abandoned, Public

Authored by steven.zhang on Nov 8 2019, 2:16 AM.

Details

Reviewers

jsji
nemanjai
hfinkel
fhahn
evandro
MatzeB
arsenm

Group Reviewers

Restricted Project

Summary

Currently, LLVM's MacroFusion fuses adjacent instructions regardless of whether either of them has already been fused. However, in that case some artificial edges are not created, which causes problems.

Assume that we have the code:

int foo(int a, int b, int c, int d) {
  return a + b + c + d;
}

Assume also that ADD and ADD form a fusion pair. This is the dependency graph:

+------+       +------+       +------+       +------+
|  A   |       |  B   |       |  C   |       |  D   |
+--+--++       +---+--+       +--+---+       +--+---+
   ^  ^            ^  ^          ^              ^
   |  |            |  |          |              |
   |  |            |  |New1      +--------------+
   |  |            |  |          |
   |  |            |  |       +--+---+
   |  |New2        |  +-------+ ADD1 |
   |  |            |          +--+---+
   |  |            |    Fuse     ^
   |  |            +-------------+
   |  +------------+
   |               |
   |   Fuse     +--+---+
   +----------->+ ADD2 |
   |            +------+
+--+---+
| ADD3 |
+------+

When ADD1 and ADD2 are fused, we create an artificial edge New1 to make sure that B is scheduled before ADD1. When ADD3 and ADD2 are fused, another artificial edge New2 is created to make sure that A is scheduled before ADD2. However, this is not enough: we also need an artificial edge from ADD1 to A to make sure that A is scheduled before ADD1 as well.
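
The missing edge can be illustrated with a toy model. The sketch below is not LLVM code: `ToySU`, `ToyDAG`, `makeExampleDAG` and `fusePair` are hypothetical names, nodes are plain indices, and only direct predecessors are tracked. It assumes ADD1 = C + D, ADD2 = B + ADD1 and ADD3 = A + ADD2, as in the graph above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-in for a scheduling unit (hypothetical, not llvm::SUnit).
struct ToySU {
  std::vector<int> Preds; // nodes that must be scheduled before this one
  int ClusterPred = -1;   // node fused directly ahead of this one, or -1
};
using ToyDAG = std::vector<ToySU>;

constexpr int A = 0, B = 1, C = 2, D = 3, ADD1 = 4, ADD2 = 5, ADD3 = 6,
              NumNodes = 7;

bool hasPred(const ToyDAG &G, int SU, int Pred) {
  const std::vector<int> &P = G[SU].Preds;
  return std::find(P.begin(), P.end(), Pred) != P.end();
}

// Builds the graph from the summary:
// ADD1 = C + D, ADD2 = B + ADD1, ADD3 = A + ADD2.
ToyDAG makeExampleDAG() {
  ToyDAG G(NumNodes);
  G[ADD1].Preds = {C, D};
  G[ADD2].Preds = {B, ADD1};
  G[ADD3].Preds = {A, ADD2};
  return G;
}

// Fuses First with Second (First is scheduled directly ahead of Second).
// Each predecessor of Second must also precede First. With WalkChain set,
// the same edges are added to every node already fused ahead of First.
void fusePair(ToyDAG &G, int First, int Second, bool WalkChain) {
  assert(First != Second && "cannot fuse a node with itself");
  const std::vector<int> SecondPreds = G[Second].Preds; // copy: G is mutated
  for (int Cur = First; Cur != -1;
       Cur = WalkChain ? G[Cur].ClusterPred : -1) {
    for (int P : SecondPreds) {
      // Skip Cur itself, edges that already exist, and nodes that depend
      // directly on Cur (adding those would create a cycle).
      if (P == Cur || hasPred(G, Cur, P) || hasPred(G, P, Cur))
        continue;
      G[Cur].Preds.push_back(P); // artificial edge: P before Cur
    }
  }
  G[Second].ClusterPred = First;
}
```

Calling `fusePair(G, ADD1, ADD2, false)` and then `fusePair(G, ADD2, ADD3, false)` on `makeExampleDAG()` creates New1 and New2 but leaves ADD1 without a dependency on A. With `WalkChain` set to true, the second call also adds that edge.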

Diff Detail

Event Timeline

steven.zhang created this revision. Nov 8 2019, 2:16 AM

Herald added a project: Restricted Project. Nov 8 2019, 2:16 AM

Herald added subscribers: wuzish, hiraditya, nhaehnle and 2 others.

My understanding is that you are doing two fixes in this patch:

1. Extend the API of shouldScheduleAdjacent to avoid fusing an instruction more than once along the same dependency chain.
2. Extend fuseInstructionPair to add artificial data dependencies for chained fusion.

To me, yes, we should do #1 first, as https://reviews.llvm.org/D36704 was already trying to do so.

#2 might not be necessary since, as you mentioned, all existing targets only support back-to-back fusion.
If we want to extend it to chained fusion (3 or more instructions in the same dependency chain),
then I believe we need more changes than adding data dependencies.
We would also need additional tests for some target that supports it (maybe RISC-V?).

llvm/include/llvm/CodeGen/MacroFusion.h
34	New parameters, should be documented in above comment as well.
llvm/test/CodeGen/AArch64/macro-fusion-verify.ll
4	This comment is confusing. I believe the goal of this patch is to avoid chained-fusion, hence reducing unnecessary dependency, so you would like to verify that there is no `extra dependency`?
25	Maybe we should check that we only fuse `SU4 SU5`, not `SU5 SU6` too.
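
The first item in the comment above (passing the number of already-fused instructions to the target predicate) could look roughly like the sketch below. It is only an illustration: opcodes are modelled as strings, `ToyShouldFuse` and `addAddBackToBack` are hypothetical names rather than the real `ShouldSchedulePredTy` signature, and `NumFused` is assumed to count the instructions already fused ahead of the first one.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical, simplified predicate type. The real one takes
// TargetInstrInfo, TargetSubtargetInfo and MachineInstr arguments.
using ToyShouldFuse = std::function<bool(const std::string *FirstOpc,
                                         const std::string &SecondOpc,
                                         unsigned NumFused)>;

// Accepts ADD/ADD pairs, but only when the first instruction is not already
// the tail of an earlier fused pair (back-to-back fusion only).
bool addAddBackToBack(const std::string *FirstOpc,
                      const std::string &SecondOpc, unsigned NumFused) {
  assert(!SecondOpc.empty() && "second opcode must be specified");
  if (NumFused > 0)
    return false;
  if (SecondOpc != "ADD")
    return false;
  // A null FirstOpc is a wildcard: could SecondOpc be fused at all?
  return FirstOpc == nullptr || *FirstOpc == "ADD";
}
```

A target that supports longer chains could compare `NumFused` against its own limit instead of rejecting every value above zero.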

steven.zhang added a reviewer: arsenm. Nov 10 2019, 6:21 PM

Herald added a subscriber: wdng. Nov 10 2019, 6:21 PM

In D69998#1739692, @jsji wrote:

> My understanding is that you are doing two fixes in this patch:
>
> 1. Extend the API of shouldScheduleAdjacent to avoid fusing an instruction more than once along the same dependency chain.
> 2. Extend fuseInstructionPair to add artificial data dependencies for chained fusion.
>
> To me, yes, we should do #1 first, as https://reviews.llvm.org/D36704 was already trying to do so.

Yes, but there are still problems.

> #2 might not be necessary since, as you mentioned, all existing targets only support back-to-back fusion.
> If we want to extend it to chained fusion (3 or more instructions in the same dependency chain),
> then I believe we need more changes than adding data dependencies.
> We would also need additional tests for some target that supports it (maybe RISC-V?).

The current implementation is already written to support fusing more than 2 SUs. However, it fails to create some dependency edges. This patch tries to fix that bug in the macro fusion infrastructure. From my understanding, adding these missing edges is enough. I don't see LLVM supporting macro fusion for the RISC-V target. AMDGPU supports clusters of more than 2 SUs for loads. @arsenm, could you please help me confirm whether the AMDGPU target supports fusing more than 2 SUs?

qiucf added a subscriber: qiucf. Nov 10 2019, 7:29 PM

This comment was removed by qiucf.

In D69998#1740279, @steven.zhang wrote:

> The current implementation is already written to support fusing more than 2 SUs. However, it fails to create some dependency edges. This patch tries to fix that bug in the macro fusion infrastructure. From my understanding, adding these missing edges is enough. I don't see LLVM supporting macro fusion for the RISC-V target. AMDGPU supports clusters of more than 2 SUs for loads. @arsenm, could you please help me confirm whether the AMDGPU target supports fusing more than 2 SUs?

Load/store clustering should produce > 2 sections of clusters, but I don't remember the details of how the DAG mutation is implemented. Specifically for the MacroFusion mutation, I'm not sure. It may be useful to combine one def with multiple uses, but I'm not sure if that actually happens now.

In D69998#1740295, @arsenm wrote:

> Load/store clustering should produce > 2 sections of clusters, but I don't remember the details of how the DAG mutation is implemented. Specifically for the MacroFusion mutation, I'm not sure. It may be useful to combine one def with multiple uses, but I'm not sure if that actually happens now.

Yeah, from my investigation, the MacroFusion implementation should support it. Do you know whether the AMDGPU hardware supports macro fusion of more than 2 SUs, as with load/store clustering, or only back-to-back fusion as on other targets? I guess it is also back-to-back, but I want to confirm it.

In D69998#1740313, @steven.zhang wrote:

> Yeah, from my investigation, the MacroFusion implementation should support it. Do you know whether the AMDGPU hardware supports macro fusion of more than 2 SUs, as with load/store clustering, or only back-to-back fusion as on other targets? I guess it is also back-to-back, but I want to confirm it.

Load/Store does benefit from multiple instructions back to back.

The MacroFusion doesn't need back to back scheduling. We just want the use of the condition register to follow the def because it usually means we can use a smaller instruction encoding. It doesn't need to be the next instruction, it's just helpful to avoid another condition register def between the two instructions.
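
The point about the condition register can be made concrete with a toy check. This is a sketch under simplifying assumptions, not AMDGPU code: `ToyInstr` and `condRegSurvives` are hypothetical, and each instruction only records whether it writes or reads a single condition register.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction model: one condition register, no other operands.
struct ToyInstr {
  bool DefsCond; // writes the condition register
  bool UsesCond; // reads the condition register
};

// Returns true if the value written at index Def is still in the condition
// register when the instruction at index Use reads it, i.e. no instruction
// strictly between them redefines the register. Def and Use do not have to
// be adjacent.
bool condRegSurvives(const std::vector<ToyInstr> &Seq, std::size_t Def,
                     std::size_t Use) {
  assert(Def < Use && Use < Seq.size() && "indices out of order or range");
  assert(Seq[Def].DefsCond && Seq[Use].UsesCond && "not a def/use pair");
  for (std::size_t I = Def + 1; I < Use; ++I)
    if (Seq[I].DefsCond)
      return false;
  return true;
}
```

In this model, an unrelated instruction between the def and the use is harmless, while a second def in between breaks the pairing.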

I get it. Thank you!

I will split this patch into two in response to Jinsong's comments:

1. Fix the missing edges.
2. Extend the interface to allow the target to specify the maximum number of fused SUs.

This patch fixes the missing edges. I have updated the patch.

steven.zhang added a child revision: D70066: [MacroFusion] Limit the max fused number as 2 to reduce the dependency. Nov 10 2019, 11:53 PM

https://reviews.llvm.org/D70066 has been created to limit the maximum number of fused instructions.

Gentle ping.

At first glance, this patch seems sensible, but I'm not sure that D70066 is necessary.

fhahn mentioned this in D70066: [MacroFusion] Limit the max fused number as 2 to reduce the dependency. Nov 25 2019, 4:38 AM

steven.zhang removed a child revision: D70066: [MacroFusion] Limit the max fused number as 2 to reduce the dependency. Nov 26 2019, 9:11 PM

Gentle ping...

With rGd84b320dfd0a7dbedacc287ede5e5bc4c0f113ba landed, is this still relevant?

In D69998#1768588, @fhahn wrote:

> With rGd84b320dfd0a7dbedacc287ede5e5bc4c0f113ba landed, is this still relevant?

Yes, they are different problems. rGd84b320dfd0a7dbedacc287ede5e5bc4c0f113ba limits the number of chained SUs to two, while this patch fixes a problem that shows up when more than two SUs are chained, even though chains are limited to two for now. By design, this should work well if someone wants to relax the limit later. Put differently: we would be able to chain any number of SUs, currently capped at two, instead of only being able to chain two SUs and having bugs if more are chained.

There won't be any compile-time impact from this patch while the limit is two, as the cluster pred/succ of CurrentSU is always null when only two SUs are chained.

In D69998#1769958, @steven.zhang wrote:

> Yes, they are different problems. rGd84b320dfd0a7dbedacc287ede5e5bc4c0f113ba limits the number of chained SUs to two, while this patch fixes a problem that shows up when more than two SUs are chained, even though chains are limited to two for now. By design, this should work well if someone wants to relax the limit later. Put differently: we would be able to chain any number of SUs, currently capped at two, instead of only being able to chain two SUs and having bugs if more are chained.

Sure, but currently there can be no bug and the interface prevents that case from happening. I don't see why we would need to deal with a case that might happen in the future, if the interface changes. To me it seems like the time to fix that would be when the interface gets extended. Until then, we cannot test the patch. To support fusing more than pairs, I think it would be better to do this in a separate function and deal with those cases there, rather than unnecessarily complicating the code for pairs.

> There won't be any compile-time impact from this patch while the limit is two, as the cluster pred/succ of CurrentSU is always null when only two SUs are chained.

Hm, I do not think that's true: we still need to check all the predecessors/successors of the SUs, which can potentially be a large number for bad inputs. The compile-time impact of this patch on its own might be quite small, but at least in degenerate cases it could be measurable (same for rGd84b320dfd0a7dbedacc287ede5e5bc4c0f113ba).

Hmm, we are leaving a bomb here for anyone who wants to extend this later :P But I agree with you that, since we cannot test this patch now, it is NOT the best time to fix it.

In D69998#1772106, @steven.zhang wrote:

> Hmm, we are leaving a bomb here for anyone who wants to extend this later :P But I agree with you that, since we cannot test this patch now, it is NOT the best time to fix it.

One possible way to deal with that would be to add an assertion that we chain at most 2 instructions together here, with a comment explaining what the issue would be with chains longer than 2 instructions.
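
A rough sketch of such a guard, on a toy representation rather than the real scheduling DAG: `ClusterMap`, `chainLength` and `fuseChecked` are hypothetical names, and the actual follow-up patch may look quite different.

```cpp
#include <cassert>
#include <vector>

// ClusterPred[I] is the node fused directly ahead of node I, or -1.
// Hypothetical toy representation, not LLVM's cluster edges.
using ClusterMap = std::vector<int>;

// Number of nodes in the fused chain that ends at SU.
unsigned chainLength(const ClusterMap &ClusterPred, int SU) {
  unsigned Len = 1;
  for (int Cur = ClusterPred[SU]; Cur != -1; Cur = ClusterPred[Cur])
    ++Len;
  return Len;
}

void fuseChecked(ClusterMap &ClusterPred, int First, int Second) {
  // Chains longer than two would need extra artificial edges so that no
  // unrelated node can be scheduled between the earlier chain members.
  // This sketch, like pair-only fusion, does not create them, so it
  // refuses to extend an existing chain.
  assert(chainLength(ClusterPred, First) == 1 &&
         "fusing more than two instructions is not supported");
  assert(ClusterPred[Second] == -1 && "second node is already fused");
  ClusterPred[Second] = First;
}
```

With assertions enabled, a second call such as `fuseChecked(CP, 1, 2)` after `fuseChecked(CP, 0, 1)` would trip the first assertion instead of silently building a three-instruction chain.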

Good suggestion, thank you! I will post a patch to remove that bomb.

https://reviews.llvm.org/D71180 has been created.

nhaehnle removed a subscriber: nhaehnle. Dec 9 2019, 1:26 AM

Revision Contents

Path                                              Size
llvm/include/llvm/CodeGen/MacroFusion.h           3 lines
llvm/lib/CodeGen/MacroFusion.cpp                  59 lines
llvm/lib/Target/AArch64/AArch64MacroFusion.cpp    7 lines
llvm/lib/Target/AMDGPU/AMDGPUMacroFusion.cpp      7 lines
llvm/lib/Target/ARM/ARMMacroFusion.cpp            7 lines
llvm/lib/Target/X86/X86MacroFusion.cpp            7 lines
llvm/test/CodeGen/AArch64/macro-fusion-verify.ll  40 lines

Diff 228375

llvm/include/llvm/CodeGen/MacroFusion.h

[24 lines not shown]
 class TargetSubtargetInfo;

 /// Check if the instr pair, FirstMI and SecondMI, should be fused
 /// together. Given SecondMI, when FirstMI is unspecified, then check if
 /// SecondMI may be part of a fused pair at all.
 using ShouldSchedulePredTy = std::function<bool(const TargetInstrInfo &TII,
                                                 const TargetSubtargetInfo &TSI,
                                                 const MachineInstr *FirstMI,
-                                                const MachineInstr &SecondMI)>;
+                                                const MachineInstr &SecondMI,
+                                                unsigned NumFused)>;
[Inline comment, jsji (Unsubmitted, Not Done): New parameters, should be documented in above comment as well.]

 /// Create a DAG scheduling mutation to pair instructions back to back
 /// for instructions that benefit according to the target-specific
 /// shouldScheduleAdjacent predicate function.
 std::unique_ptr<ScheduleDAGMutation>
 createMacroFusionDAGMutation(ShouldSchedulePredTy shouldScheduleAdjacent);

 /// Create a DAG scheduling mutation to pair branch instructions with one
 /// of their predecessors back to back for instructions that benefit according
 /// to the target-specific shouldScheduleAdjacent predicate function.
 std::unique_ptr<ScheduleDAGMutation>
 createBranchMacroFusionDAGMutation(ShouldSchedulePredTy shouldScheduleAdjacent);

 } // end namespace llvm

 #endif // LLVM_CODEGEN_MACROFUSION_H

llvm/lib/CodeGen/MacroFusion.cpp

[30 lines not shown]

 static cl::opt<bool> EnableMacroFusion("misched-fusion", cl::Hidden,
   cl::desc("Enable scheduling for macro fusion."), cl::init(true));

 static bool isHazard(const SDep &Dep) {
   return Dep.getKind() == SDep::Anti || Dep.getKind() == SDep::Output;
 }

+namespace {
+
+static SUnit *getPredClusterSU(const SUnit &SU) {
+  for (const SDep &SI : SU.Preds)
+    if (SI.isCluster())
+      return SI.getSUnit();
+
+  return nullptr;
+}
+
+static SUnit *getSuccClusterSU(const SUnit &SU) {
+  for (const SDep &SI : SU.Succs)
+    if (SI.isCluster())
+      return SI.getSUnit();
+
+  return nullptr;
+}
+
+static unsigned getNumOfClusterSU(const SUnit &SU) {
+  unsigned Num = 0;
+  const SUnit *CurrentSU = &SU;
+  while ((CurrentSU = getPredClusterSU(*CurrentSU))) Num ++;
+  return Num;
+}
+
 static bool fuseInstructionPair(ScheduleDAGInstrs &DAG, SUnit &FirstSU,
                                 SUnit &SecondSU) {
   // Check that neither instr is already paired with another along the edge
   // between them.
   for (SDep &SI : FirstSU.Succs)
     if (SI.isCluster())
       return false;

[21 lines not shown, still inside fuseInstructionPair]
   LLVM_DEBUG(
       dbgs() << "Macro fuse: "; DAG.dumpNodeName(FirstSU); dbgs() << " - ";
       DAG.dumpNodeName(SecondSU); dbgs() << " / ";
       dbgs() << DAG.TII->getName(FirstSU.getInstr()->getOpcode()) << " - "
              << DAG.TII->getName(SecondSU.getInstr()->getOpcode()) << '\n';);

   // Make data dependencies from the FirstSU also dependent on the SecondSU to
   // prevent them from being scheduled between the FirstSU and the SecondSU.
-  if (&SecondSU != &DAG.ExitSU)
+  SUnit *CurrentSU = &SecondSU;
+  while (CurrentSU && CurrentSU != &DAG.ExitSU) {
     for (const SDep &SI : FirstSU.Succs) {
       SUnit *SU = SI.getSUnit();
       if (SI.isWeak() || isHazard(SI) ||
-          SU == &DAG.ExitSU || SU == &SecondSU || SU->isPred(&SecondSU))
+          SU == &DAG.ExitSU || SU == CurrentSU ||
+          SU->isPred(CurrentSU))
         continue;
-      LLVM_DEBUG(dbgs() << " Bind "; DAG.dumpNodeName(SecondSU);
+      LLVM_DEBUG(dbgs() << " Bind "; DAG.dumpNodeName(*CurrentSU);
                  dbgs() << " - "; DAG.dumpNodeName(*SU); dbgs() << '\n';);
-      DAG.addEdge(SU, SDep(&SecondSU, SDep::Artificial));
+      DAG.addEdge(SU, SDep(CurrentSU, SDep::Artificial));
     }
+
+    CurrentSU = getSuccClusterSU(*CurrentSU);
+  }

   // Make the FirstSU also dependent on the dependencies of the SecondSU to
   // prevent them from being scheduled between the FirstSU and the SecondSU.
-  if (&FirstSU != &DAG.EntrySU) {
+  CurrentSU = &FirstSU;
+  while (CurrentSU && CurrentSU != &DAG.EntrySU) {
     for (const SDep &SI : SecondSU.Preds) {
       SUnit *SU = SI.getSUnit();
-      if (SI.isWeak() || isHazard(SI) || &FirstSU == SU || FirstSU.isSucc(SU))
+      if (SI.isWeak() || isHazard(SI) || CurrentSU == SU ||
+          CurrentSU->isSucc(SU))
         continue;
       LLVM_DEBUG(dbgs() << " Bind "; DAG.dumpNodeName(*SU); dbgs() << " - ";
-                 DAG.dumpNodeName(FirstSU); dbgs() << '\n';);
+                 DAG.dumpNodeName(*CurrentSU); dbgs() << '\n';);
-      DAG.addEdge(&FirstSU, SDep(SU, SDep::Artificial));
+      DAG.addEdge(CurrentSU, SDep(SU, SDep::Artificial));
     }
     // ExitSU comes last by design, which acts like an implicit dependency
     // between ExitSU and any bottom root in the graph. We should transfer
     // this to FirstSU as well.
     if (&SecondSU == &DAG.ExitSU) {
       for (SUnit &SU : DAG.SUnits) {
         if (SU.Succs.empty())
-          DAG.addEdge(&FirstSU, SDep(&SU, SDep::Artificial));
+          DAG.addEdge(CurrentSU, SDep(&SU, SDep::Artificial));
       }
     }
+
+    CurrentSU = getPredClusterSU(*CurrentSU);
   }

   ++NumFused;
   return true;
 }

-namespace {
-
 /// Post-process the DAG to create cluster edges between instrs that may
 /// be fused by the processor into a single operation.
 class MacroFusion : public ScheduleDAGMutation {
   ShouldSchedulePredTy shouldScheduleAdjacent;
   bool FuseBlock;
   bool scheduleAdjacentImpl(ScheduleDAGInstrs &DAG, SUnit &AnchorSU);

 public:
[20 lines not shown]
 /// Implement the fusion of instr pairs in the scheduling DAG,
 /// anchored at the instr in AnchorSU..
 bool MacroFusion::scheduleAdjacentImpl(ScheduleDAGInstrs &DAG, SUnit &AnchorSU) {
   const MachineInstr &AnchorMI = *AnchorSU.getInstr();
   const TargetInstrInfo &TII = *DAG.TII;
   const TargetSubtargetInfo &ST = DAG.MF.getSubtarget();

   // Check if the anchor instr may be fused.
-  if (!shouldScheduleAdjacent(TII, ST, nullptr, AnchorMI))
+  if (!shouldScheduleAdjacent(TII, ST, nullptr, AnchorMI, 0))
     return false;

   // Explorer for fusion candidates among the dependencies of the anchor instr.
   for (SDep &Dep : AnchorSU.Preds) {
     // Ignore dependencies other than data or strong ordering.
     if (Dep.isWeak() || isHazard(Dep))
       continue;

     SUnit &DepSU = *Dep.getSUnit();
     if (DepSU.isBoundaryNode())
       continue;

     const MachineInstr *DepMI = DepSU.getInstr();
-    if (!shouldScheduleAdjacent(TII, ST, DepMI, AnchorMI))
+    if (!shouldScheduleAdjacent(TII, ST, DepMI, AnchorMI,
+                                getNumOfClusterSU(DepSU)))
       continue;

     if (fuseInstructionPair(DAG, DepSU, AnchorSU))
       return true;
   }

   return false;
 }
[16 lines not shown]

llvm/lib/Target/AArch64/AArch64MacroFusion.cpp

[369 lines not shown]
 }

 /// \brief Check if the instr pair, FirstMI and SecondMI, should be fused
 /// together. Given SecondMI, when FirstMI is unspecified, then check if
 /// SecondMI may be part of a fused pair at all.
 static bool shouldScheduleAdjacent(const TargetInstrInfo &TII,
                                    const TargetSubtargetInfo &TSI,
                                    const MachineInstr *FirstMI,
-                                   const MachineInstr &SecondMI) {
+                                   const MachineInstr &SecondMI,
+                                   unsigned NumFused) {
+  // Only back to back fusion are supported.
+  if (NumFused > 0)
+    return false;
+
   const AArch64Subtarget &ST = static_cast<const AArch64Subtarget&>(TSI);

   // All checking functions assume that the 1st instr is a wildcard if it is
   // unspecified.
   if (ST.hasArithmeticBccFusion() && isArithmeticBccPair(FirstMI, SecondMI))
     return true;
   if (ST.hasArithmeticCbzFusion() && isArithmeticCbzPair(FirstMI, SecondMI))
     return true;
[26 lines not shown]

llvm/lib/Target/AMDGPU/AMDGPUMacroFusion.cpp

[22 lines not shown]
 namespace {

 /// Check if the instr pair, FirstMI and SecondMI, should be fused
 /// together. Given SecondMI, when FirstMI is unspecified, then check if
 /// SecondMI may be part of a fused pair at all.
 static bool shouldScheduleAdjacent(const TargetInstrInfo &TII_,
                                    const TargetSubtargetInfo &TSI,
                                    const MachineInstr *FirstMI,
-                                   const MachineInstr &SecondMI) {
+                                   const MachineInstr &SecondMI,
+                                   unsigned NumFused) {
+  // Only back to back fusion are supported.
+  if (NumFused > 0)
+    return false;
+
   const SIInstrInfo &TII = static_cast<const SIInstrInfo&>(TII_);

   switch (SecondMI.getOpcode()) {
   case AMDGPU::V_ADDC_U32_e64:
   case AMDGPU::V_SUBB_U32_e64:
   case AMDGPU::V_CNDMASK_B32_e64: {
     // Try to cluster defs of condition registers to their uses. This improves
     // the chance VCC will be available which will allow shrinking to VOP2
[28 lines not shown]

llvm/lib/Target/ARM/ARMMacroFusion.cpp

[45 lines not shown]
 }

 /// Check if the instr pair, FirstMI and SecondMI, should be fused
 /// together. Given SecondMI, when FirstMI is unspecified, then check if
 /// SecondMI may be part of a fused pair at all.
 static bool shouldScheduleAdjacent(const TargetInstrInfo &TII,
                                    const TargetSubtargetInfo &TSI,
                                    const MachineInstr *FirstMI,
-                                   const MachineInstr &SecondMI) {
+                                   const MachineInstr &SecondMI,
+                                   unsigned NumFused) {
+  // Only back to back fusion are supported.
+  if (NumFused > 0)
+    return false;
+
   const ARMSubtarget &ST = static_cast<const ARMSubtarget&>(TSI);

   if (ST.hasFuseAES() && isAESPair(FirstMI, SecondMI))
     return true;
   if (ST.hasFuseLiterals() && isLiteralsPair(FirstMI, SecondMI))
     return true;

   return false;
 }

 std::unique_ptr<ScheduleDAGMutation> createARMMacroFusionDAGMutation () {
   return createMacroFusionDAGMutation(shouldScheduleAdjacent);
 }

 } // end namespace llvm

llvm/lib/Target/X86/X86MacroFusion.cpp

[174 lines not shown]
 }

 /// Check if the instr pair, FirstMI and SecondMI, should be fused
 /// together. Given SecondMI, when FirstMI is unspecified, then check if
 /// SecondMI may be part of a fused pair at all.
 static bool shouldScheduleAdjacent(const TargetInstrInfo &TII,
                                    const TargetSubtargetInfo &TSI,
                                    const MachineInstr *FirstMI,
-                                   const MachineInstr &SecondMI) {
+                                   const MachineInstr &SecondMI,
+                                   unsigned NumFused) {
+  // Only back to back fusion are supported.
+  if (NumFused > 0)
+    return false;
+
   const X86Subtarget &ST = static_cast<const X86Subtarget &>(TSI);

   // Check if this processor supports any kind of fusion.
   if (!(ST.hasBranchFusion() || ST.hasMacroFusion()))
     return false;

   const JumpKind BranchKind = classifySecond(SecondMI);

[42 lines not shown]

llvm/test/CodeGen/AArch64/macro-fusion-verify.ll

This file was added.

+; REQUIRES: asserts
+; RUN: llc < %s -mtriple=aarch64-linux-gnu -mattr=+fuse-arith-logic -verify-misched -debug-only=machine-scheduler 2>&1 > /dev/null | FileCheck %s
+
+; Verify that, the macro-fusion won't bring in extra dependency.
[Inline comment, jsji (Unsubmitted, Not Done): This comment is confusing. I believe the goal of this patch is to avoid chained-fusion, hence reducing unnecessary dependency, so you would like to verify that there is no `extra dependency`?]
+define signext i32 @test(i32 signext %a, i32 signext %b, i32 signext %c, i32 signext %d) {
+entry:
+; CHECK: ******** MI Scheduling ********
+; CHECK-LABEL: %bb.0 entry
+; CHECK: Macro fuse: SU([[SU4:[0-9]+]]) - SU([[SU5:[0-9]+]])
+; CHECK: SU([[SU0:[0-9]+]]): %{{[0-9]+}}:gpr32 = COPY $w3
+; CHECK: SU([[SU1:[0-9]+]]): %{{[0-9]+}}:gpr32 = COPY $w2
+; CHECK: SU([[SU2:[0-9]+]]): %{{[0-9]+}}:gpr32 = COPY $w1
+; CHECK: SU([[SU3:[0-9]+]]): %{{[0-9]+}}:gpr32 = COPY $w0
+
+; Because SU(4) and SU(5) are cluster, SU(4) has the predecessor SU(1),
+; which is the predecessor of SU(5), to make sure that, SU(1) cannot
+; be scheduled in between SU(4) and SU(5)
+; CHECK: SU([[SU4:[0-9]+]]): %{{[0-9]+}}:gpr32 = nsw ADDWrr
+; CHECK: Predecessors:
+; CHECK-DAG: SU([[SU3]]):
+; CHECK-DAG: SU([[SU2]]):
+; CHECK-DAG: SU([[SU1]]):
+; CHECK-NOT: SU([[SU0]])
+; CHECK: Successors:
+; CHECK: SU([[SU5]]): Ord Latency=0 Cluster
[Inline comment, jsji (Unsubmitted, Not Done): Maybe we should check that we only fuse `SU4 SU5`, not `SU5 SU6` too.]
+
+; SU(0) has nothing to do with SU(4) and SU(5). They shouldn't have
+; any dependency.
+; CHECK: SU([[SU5]]): %{{[0-9]+}}:gpr32 = nsw ADDWrr
+; CHECK: Predecessors:
+; CHECK-DAG: SU([[SU1]])
+; CHECK-DAG: SU([[SU4]])
+; CHECK-NOT: SU([[SU0]])
+; CHECK: Successors:
+
+  %add = add nsw i32 %b, %a
+  %add1 = add nsw i32 %add, %c
+  %sub = sub nsw i32 %add1, %d
+  ret i32 %sub
+}
