This is an archive of the discontinued LLVM Phabricator instance.

Have you checked the effect on compile time? This will check every edge in the schedule graph. I wonder if we should instead delegate the whole search to the target, so it can restrict the search to the actually interesting instructions instead of checking every edge.

evandro mentioned this in D28491: [AArch64] Add new subtarget feature to fuse AES crypto operations. Jan 9 2017, 3:18 PM

In D28489#640575, @MatzeB wrote:

Have you checked the effect on compile time? This will check every edge in the schedule graph. I wonder if we should instead delegate the whole search to the target, so it can restrict the search to the actually interesting instructions instead of checking every edge.

At least on a rather fast x86 machine, any difference in the compile time was buried below the noise level.

And just to warn you, because I am currently running into these issues: the current macrofusion code fails to work properly for nodes hanging around in the pending queue (which mostly means macrofusion often fails for post-ra schedulers).
If there are no post-ra schedulers, on the other hand, the register allocator sometimes places copy, spill, and reload instructions in between.

I am currently working on patches that form instruction bundles out of macrofusion opportunities, unfortunately this is coming along slowly as instruction bundles pre-ra are not a commonly used feature.

In D28489#640596, @MatzeB wrote:

If there are no post-ra schedulers, on the other hand, the register allocator sometimes places copy, spill, and reload instructions in between.

Yes, I noticed such irritating occurrences.

Would it make sense to have a TII.mayFuseWithPrecedingInstr() to avoid testing all DAG edges? The DAG is quadratic, but this is a rare opportunity.

In D28489#640628, @atrick wrote:

Would it make sense to have a TII.mayFuseWithPrecedingInstr() to avoid testing all DAG edges? The DAG is quadratic, but this is a rare opportunity.

Or let targets subclass or write their own ScheduleDAG mutation with an appropriate search strategy instead of the TII callback?

Yep, this could all be done in the target. SDep::Cluster is effectively a scheduler API for the subtarget to use as it wishes.

On the other hand, that just pushes the problem to the target code, and this is proposed for AArch64.

flyingforyou added a subscriber: flyingforyou. Jan 9 2017, 5:45 PM

mcrosier added a subscriber: mcrosier. Jan 10 2017, 7:21 AM

Pardon my cluelessness, but are you guys on a tangent? I'm truly confused.

evandro added a child revision: D28698: [AArch64] Add new target feature to fuse literal generation. Jan 13 2017, 1:57 PM

evandro updated this revision to Diff 84375. Jan 13 2017, 2:00 PM

evandro edited edge metadata.

sbaranga added a subscriber: sbaranga. Jan 18 2017, 8:11 AM

Ping^1

I think you should find a way to do this without calling shouldScheduleAdjacent on every DAG node. It's fine to say you've tested compilation time but what really matters here are the pathological cases with very large blocks. It's rare for instructions in the middle of blocks to have fusion opportunities, so it's wrong to introduce this potential cost for all blocks.

I have other patches in line that depend on this one and that fuse other pairs of instrs (e.g., D28698), which do occur in the middle of blocks.

But I understand your point. An alternative that I considered before was through TargetSubtargetInfo::adjustSchedDependency(). Thoughts?

Note that adjustSchedDependency is defined as updating the latency. It's very important for target hooks not to mutate data structures behind people's backs.

I think I see @MatzeB's point. Just remove MacroFusion from the target-independent MachineScheduler. Code reuse is not really helpful here. X86 should just have its own MacroFusion, as with AArch64. They still register the ScheduleDAGMutation the same way.

Remove shouldScheduleAdjacent from TargetInstrInfo. Targets can define that helper locally.

X86 MacroFusion doesn't change at all. The code just moves.

In AArch64 MacroFusion, *before* checking the edge, determine if this opcode wants to be fused (e.g. is it MOVK). Only the edges leading to fusible instructions are checked. No need to go through a TargetInstrInfo virtual call.
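
The filtering idea above can be sketched in isolation. The following is a minimal, self-contained model and not LLVM code: `Opcode`, `Node`, `SearchStats`, `mayBeSecondOfPair`, `shouldFuse`, and `findFusionPairs` are hypothetical names, and the two fusible pairs (MOVZ/MOVK and CMP/BCC) are chosen only for illustration. It shows how a cheap per-node opcode test lets the search skip every edge of a node that can never be the second half of a pair.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for scheduler types; none of these are LLVM APIs.
enum Opcode { ADD, MOVZ, MOVK, CMP, BCC };

struct Node {
  Opcode Op;
  std::vector<std::size_t> Preds; // indices of predecessor nodes
};

struct SearchStats {
  std::size_t Fused = 0;        // number of pairs found
  std::size_t EdgesChecked = 0; // number of edges actually inspected
};

// Cheap per-node test: can this opcode ever be the second half of a pair?
static bool mayBeSecondOfPair(Opcode Op) { return Op == MOVK || Op == BCC; }

// Pair-specific test, reached only for nodes that passed the filter above.
static bool shouldFuse(Opcode First, Opcode Second) {
  return (First == MOVZ && Second == MOVK) || (First == CMP && Second == BCC);
}

static SearchStats findFusionPairs(const std::vector<Node> &DAG) {
  SearchStats S;
  for (const Node &Second : DAG) {
    if (!mayBeSecondOfPair(Second.Op))
      continue; // skip all of this node's edges without looking at them
    for (std::size_t P : Second.Preds) {
      assert(P < DAG.size() && "predecessor index out of range");
      ++S.EdgesChecked;
      if (shouldFuse(DAG[P].Op, Second.Op)) {
        ++S.Fused;
        break; // at most one partner per node in this sketch
      }
    }
  }
  return S;
}
```

In a block where most nodes fail the opcode test, the inner loop runs only for the few candidates, so the work is roughly proportional to the node count plus the edges of the candidate nodes rather than to all edges.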

Only the skeleton of MacroFusion was left behind, mainly to keep the misched-fusion option intact and to preserve the createMacroFusionDAGMutation() interface.

Herald added a subscriber: aemerson. Jan 27 2017, 12:55 PM

The targets add the DAG Mutators anyway, so you should be able to remove the whole shouldScheduleAdjacent() callback and let the targets define their own class MacroFusion : public ScheduleDAGMutation { ... } class, which they then add as a mutator.
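
The structure suggested above can be sketched with stand-in types. Everything below is a toy model under stated assumptions: `ToyMutation`, `ToyDAG`, `ToyTargetMacroFusion`, and `createToyTargetMacroFusion` are hypothetical and only mimic the shape of `ScheduleDAGMutation` and `addMutation()`, and the compare-plus-branch rule is purely illustrative. It shows the fusion rule living entirely inside the target's own mutation class, with no callback in the generic scheduler.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins that mimic the shape of the scheduler interfaces
// discussed above; they are not the LLVM classes.
struct ToyDAG;

struct ToyMutation {
  virtual ~ToyMutation() = default;
  virtual void apply(ToyDAG &DAG) = 0;
};

struct ToyDAG {
  std::vector<int> Opcodes;                      // one opcode per node
  std::vector<std::pair<int, int>> ClusterEdges; // (first, second) node ids
  std::vector<std::unique_ptr<ToyMutation>> Mutations;

  void addMutation(std::unique_ptr<ToyMutation> M) {
    if (M) // in this toy, a null mutation means the feature is disabled
      Mutations.push_back(std::move(M));
  }

  void postProcess() {
    for (auto &M : Mutations) {
      assert(M && "null mutations are dropped in addMutation");
      M->apply(*this);
    }
  }
};

namespace {

// Target-local opcodes and fusion rule; generic code never sees them.
constexpr int ToyCmp = 1;
constexpr int ToyBranch = 2;

class ToyTargetMacroFusion : public ToyMutation {
public:
  void apply(ToyDAG &DAG) override {
    if (DAG.Opcodes.size() < 2)
      return;
    int Last = static_cast<int>(DAG.Opcodes.size()) - 1;
    if (DAG.Opcodes[Last] != ToyBranch)
      return; // only the block terminator is considered in this sketch
    for (int I = Last - 1; I >= 0; --I) {
      if (DAG.Opcodes[I] == ToyCmp) {
        DAG.ClusterEdges.push_back({I, Last});
        break;
      }
    }
  }
};

} // end anonymous namespace

// The only symbol the rest of the (toy) target needs to see.
std::unique_ptr<ToyMutation> createToyTargetMacroFusion(bool Enable) {
  if (!Enable)
    return nullptr;
  return std::unique_ptr<ToyMutation>(new ToyTargetMacroFusion());
}
```

With this layout the class declaration can stay private to one source file, and the enable/disable decision is made in the target's factory function rather than in shared scheduler code.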

In D28489#659179, @MatzeB wrote:

The targets add the DAG Mutators anyway, so you should be able to remove the whole shouldScheduleAdjacent() callback and let the targets define their own class MacroFusion : public ScheduleDAGMutation { ... } class, which they then add as a mutator.

That would also mean getting rid of the createMacroFusionDAGMutation(const TargetInstrInfo *TII) function and thereby the EnableMacroFusion flag. This is fine IMO, as on AArch64 you can just as well enable/disable it with FeatureArithmeticBccFusion/FeatureArithmeticCbzFusion.

@MatzeB,

I just thought that it was convenient to control MacroFusion with a global option, misched-fusion, regardless of what the target prefers.

In D28489#659194, @evandro wrote:

@MatzeB,

I just thought that it was convenient to control MacroFusion with a global option, misched-fusion, regardless of what the target prefers.

I don't think a global flag is worth adding a callback and extra functions to MachineScheduler. I don't think there is that much value in the flag to justify that esp. since you can do -mattr=-FeatureArithmeticBccFusion,-FeatureArithmeticCbzFusion as well.

evandro updated this revision to Diff 86350. Jan 30 2017, 2:30 PM

evandro edited the summary of this revision.

Herald added a subscriber: mgorny. Jan 30 2017, 2:30 PM

MatzeB added inline comments. Jan 30 2017, 2:58 PM

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2054–2063 ↗	(On Diff #86350)	This code could be put directly in `AArch64MacroFusion::apply()`. The same comment applies to the X86 version.
llvm/lib/Target/AArch64/AArch64MacroFusion.h
26–35 ↗	(On Diff #86350)	This can stay private to the .cpp file where createMacroFusionDAGMutation() is defined and doesn't need to go into a header. The same comment applies to the X86 version.
llvm/test/CodeGen/AArch64/misched-fusion.ll
10–19 ↗	(On Diff #86350)	Why is there a testcase change? Shouldn't this be NFC?

evandro added inline comments. Jan 30 2017, 3:08 PM

llvm/test/CodeGen/AArch64/misched-fusion.ll
10–19 ↗	(On Diff #86350)	Since I made it common for iOS and Linux (v. line 4), I meant to trim the check to the germane part.

MatzeB added inline comments. Jan 30 2017, 3:09 PM

llvm/test/CodeGen/AArch64/misched-fusion.ll
10–19 ↗	(On Diff #86350)	makes sense

evandro added inline comments. Jan 30 2017, 3:10 PM

llvm/lib/Target/AArch64/AArch64MacroFusion.h
26–35 ↗	(On Diff #86350)	You mean moving the method `scheduleAdjacent()` from `<Target>InstrInfo` to `<Target>MacroFusion` as a private function?

MatzeB added inline comments. Jan 30 2017, 4:38 PM

llvm/lib/Target/AArch64/AArch64MacroFusion.h
26–35 ↗	(On Diff #86350)	Pretty much. After moving the method you will probably realize that the only caller is inside apply() and that apply() consists only of that one call, so you can just as well "inline" it manually and move the code into apply().

evandro updated this revision to Diff 86373. Jan 30 2017, 5:38 PM

LGTM with nitpicks addressed:

llvm/lib/Target/AArch64/AArch64MacroFusion.cpp
10 ↗	(On Diff #86373)	Should be `/// \file This file ...`
191 ↗	(On Diff #86373)	No space before `()`.
195 ↗	(On Diff #86373)	Should be `// end namespace llvm` according to coding standards.
llvm/lib/Target/AArch64/AArch64MacroFusion.h
10 ↗	(On Diff #86373)	Should be `/// \file This file ...`
24–31 ↗	(On Diff #86373)	Please move the class declaration into the AArch64MacroFusion.cpp file and into an anonymous namespace.
llvm/lib/Target/X86/X86MacroFusion.cpp
10 ↗	(On Diff #86373)	Should be `/// \file ...`
245 ↗	(On Diff #86373)	No space before `()`
249 ↗	(On Diff #86373)	Should be `// end namespace llvm`.
llvm/lib/Target/X86/X86MacroFusion.h
10 ↗	(On Diff #86373)	Should be `/// \file This file ...`
24–31 ↗	(On Diff #86373)	Please move the class declaration into the X86MacroFusion.cpp file and into an anonymous namespace.

This revision is now accepted and ready to land. Jan 30 2017, 6:25 PM

evandro marked 6 inline comments as done. Jan 31 2017, 8:20 AM

Final patch after approval.

Closed by commit rL293737: [CodeGen] Move MacroFusion to the target (authored by evandro). Jan 31 2017, 7:05 PM

This revision was automatically updated to reflect the committed changes.

The uses of \param are improper. I tweaked them in r293744 just to eliminate the \param(s).

\param takes at least two parameters.

\param NAME Description...

You may write \param(s) like,

/// \brief Verify that the instruction pair should be scheduled back to back.
/// \param First The first MI to verify
/// \param Second The second MI

Note that trunk clang doesn't recognize forms like "\param First,Second".

llvm/trunk/lib/Target/AArch64/AArch64MacroFusion.cpp
29	First and Second
123	DAG, ASU, and Preds
llvm/trunk/lib/Target/X86/X86MacroFusion.cpp
29	First and Second

Thank you, @chapuni.

Revision Contents

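Path	Size
llvm/trunk/include/llvm/CodeGen/MachineScheduler.h	3 lines
llvm/trunk/include/llvm/Target/TargetInstrInfo.h	9 lines
llvm/trunk/lib/CodeGen/MachineScheduler.cpp	74 lines
llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.h	3 lines
llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp	82 lines
llvm/trunk/lib/Target/AArch64/AArch64MacroFusion.h	29 lines
llvm/trunk/lib/Target/AArch64/AArch64MacroFusion.cpp	209 lines
llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.cpp	3 lines
llvm/trunk/lib/Target/AArch64/CMakeLists.txt	1 line
llvm/trunk/lib/Target/X86/ (file names not preserved)	1 line, 3 lines, 159 lines, 30 lines, 262 lines, 3 lines
llvm/trunk/test/CodeGen/AArch64/misched-fusion.ll	12 lines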

Diff 86556

llvm/trunk/include/llvm/CodeGen/MachineScheduler.h

	createLoadClusterDAGMutation(const TargetInstrInfo *TII,			createLoadClusterDAGMutation(const TargetInstrInfo *TII,
	const TargetRegisterInfo *TRI);			const TargetRegisterInfo *TRI);

	std::unique_ptr<ScheduleDAGMutation>			std::unique_ptr<ScheduleDAGMutation>
	createStoreClusterDAGMutation(const TargetInstrInfo *TII,			createStoreClusterDAGMutation(const TargetInstrInfo *TII,
	const TargetRegisterInfo *TRI);			const TargetRegisterInfo *TRI);

	std::unique_ptr<ScheduleDAGMutation>			std::unique_ptr<ScheduleDAGMutation>
	createMacroFusionDAGMutation(const TargetInstrInfo *TII);

	std::unique_ptr<ScheduleDAGMutation>
	createCopyConstrainDAGMutation(const TargetInstrInfo *TII,			createCopyConstrainDAGMutation(const TargetInstrInfo *TII,
	const TargetRegisterInfo *TRI);			const TargetRegisterInfo *TRI);

	} // end namespace llvm			} // end namespace llvm

	#endif // LLVM_CODEGEN_MACHINESCHEDULER_H			#endif // LLVM_CODEGEN_MACHINESCHEDULER_H

llvm/trunk/include/llvm/Target/TargetInstrInfo.h

public:
/// DAG->addMutation(createStoreClusterDAGMutation(DAG->TII, DAG->TRI));		/// DAG->addMutation(createStoreClusterDAGMutation(DAG->TII, DAG->TRI));
/// to TargetPassConfig::createMachineScheduler() to have an effect.		/// to TargetPassConfig::createMachineScheduler() to have an effect.
virtual bool shouldClusterMemOps(MachineInstr &FirstLdSt,		virtual bool shouldClusterMemOps(MachineInstr &FirstLdSt,
MachineInstr &SecondLdSt,		MachineInstr &SecondLdSt,
unsigned NumLoads) const {		unsigned NumLoads) const {
llvm_unreachable("target did not implement shouldClusterMemOps()");		llvm_unreachable("target did not implement shouldClusterMemOps()");
}		}

/// Can this target fuse the given instructions if they are scheduled
/// adjacent. Note that you have to add:
/// DAG.addMutation(createMacroFusionDAGMutation());
/// to TargetPassConfig::createMachineScheduler() to have an effect.
virtual bool shouldScheduleAdjacent(const MachineInstr &First,
const MachineInstr &Second) const {
llvm_unreachable("target did not implement shouldScheduleAdjacent()");
}

/// Reverses the branch condition of the specified condition list,		/// Reverses the branch condition of the specified condition list,
/// returning false on success and true if it cannot be reversed.		/// returning false on success and true if it cannot be reversed.
virtual		virtual
bool reverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond) const {		bool reverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond) const {
return true;		return true;
}		}

/// Insert a noop into the instruction stream at the specified point.		/// Insert a noop into the instruction stream at the specified point.

llvm/trunk/lib/CodeGen/MachineScheduler.cpp


static cl::opt<bool> EnableCyclicPath("misched-cyclicpath", cl::Hidden,		static cl::opt<bool> EnableCyclicPath("misched-cyclicpath", cl::Hidden,
cl::desc("Enable cyclic critical path analysis."), cl::init(true));		cl::desc("Enable cyclic critical path analysis."), cl::init(true));

static cl::opt<bool> EnableMemOpCluster("misched-cluster", cl::Hidden,		static cl::opt<bool> EnableMemOpCluster("misched-cluster", cl::Hidden,
cl::desc("Enable memop clustering."),		cl::desc("Enable memop clustering."),
cl::init(true));		cl::init(true));

// Experimental heuristics
static cl::opt<bool> EnableMacroFusion("misched-fusion", cl::Hidden,
cl::desc("Enable scheduling for macro fusion."), cl::init(true));

static cl::opt<bool> VerifyScheduling("verify-misched", cl::Hidden,		static cl::opt<bool> VerifyScheduling("verify-misched", cl::Hidden,
cl::desc("Verify machine instrs before and after machine scheduling"));		cl::desc("Verify machine instrs before and after machine scheduling"));

// DAG subtrees must have at least this many nodes.		// DAG subtrees must have at least this many nodes.
static const unsigned MinSubtreeSize = 8;		static const unsigned MinSubtreeSize = 8;

// Pin the vtables to this file.		// Pin the vtables to this file.
void MachineSchedStrategy::anchor() {}		void MachineSchedStrategy::anchor() {}
void BaseMemOpClusterMutation::apply(ScheduleDAGInstrs *DAGInstrs) {
}		}

// Iterate over the store chains.		// Iterate over the store chains.
for (unsigned Idx = 0, End = StoreChainDependents.size(); Idx != End; ++Idx)		for (unsigned Idx = 0, End = StoreChainDependents.size(); Idx != End; ++Idx)
clusterNeighboringMemOps(StoreChainDependents[Idx], DAG);		clusterNeighboringMemOps(StoreChainDependents[Idx], DAG);
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// MacroFusion - DAG post-processing to encourage fusion of macro ops.
//===----------------------------------------------------------------------===//

namespace {
/// \brief Post-process the DAG to create cluster edges between instructions
/// that may be fused by the processor into a single operation.
class MacroFusion : public ScheduleDAGMutation {
const TargetInstrInfo &TII;
public:
MacroFusion(const TargetInstrInfo &TII)
: TII(TII) {}

void apply(ScheduleDAGInstrs *DAGInstrs) override;
};
} // anonymous

namespace llvm {

std::unique_ptr<ScheduleDAGMutation>
createMacroFusionDAGMutation(const TargetInstrInfo *TII) {
return EnableMacroFusion ? make_unique<MacroFusion>(*TII) : nullptr;
}

} // namespace llvm

/// \brief Callback from DAG postProcessing to create cluster edges to encourage
/// fused operations.
void MacroFusion::apply(ScheduleDAGInstrs *DAGInstrs) {
ScheduleDAGMI *DAG = static_cast<ScheduleDAGMI*>(DAGInstrs);

// For now, assume targets can only fuse with the branch.
SUnit &ExitSU = DAG->ExitSU;
MachineInstr *Branch = ExitSU.getInstr();
if (!Branch)
return;

for (SDep &PredDep : ExitSU.Preds) {
if (PredDep.isWeak())
continue;
SUnit &SU = *PredDep.getSUnit();
MachineInstr &Pred = *SU.getInstr();
if (!TII.shouldScheduleAdjacent(Pred, *Branch))
continue;

// Create a single weak edge from SU to ExitSU. The only effect is to cause
// bottom-up scheduling to heavily prioritize the clustered SU. There is no
// need to copy predecessor edges from ExitSU to SU, since top-down
// scheduling cannot prioritize ExitSU anyway. To defer top-down scheduling
// of SU, we could create an artificial edge from the deepest root, but it
// hasn't been needed yet.
bool Success = DAG->addEdge(&ExitSU, SDep(&SU, SDep::Cluster));
(void)Success;
assert(Success && "No DAG nodes should be reachable from ExitSU");

// Adjust latency of data deps between the nodes.
for (SDep &PredDep : ExitSU.Preds) {
if (PredDep.getSUnit() == &SU)
PredDep.setLatency(0);
}
for (SDep &SuccDep : SU.Succs) {
if (SuccDep.getSUnit() == &ExitSU)
SuccDep.setLatency(0);
}

DEBUG(dbgs() << "Macro Fuse SU(" << SU.NodeNum << ")\n");
break;
}
}

//===----------------------------------------------------------------------===//
// CopyConstrain - DAG post-processing to encourage copy elimination.		// CopyConstrain - DAG post-processing to encourage copy elimination.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

namespace {		namespace {
/// \brief Post-process the DAG to create weak edges from all uses of a copy to		/// \brief Post-process the DAG to create weak edges from all uses of a copy to
/// the one use that defines the copy's source vreg, most likely an induction		/// the one use that defines the copy's source vreg, most likely an induction
/// variable increment.		/// variable increment.
class CopyConstrain : public ScheduleDAGMutation {		class CopyConstrain : public ScheduleDAGMutation {

llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.h

public:

bool getMemOpBaseRegImmOfsWidth(MachineInstr &LdSt, unsigned &BaseReg,		bool getMemOpBaseRegImmOfsWidth(MachineInstr &LdSt, unsigned &BaseReg,
int64_t &Offset, unsigned &Width,		int64_t &Offset, unsigned &Width,
const TargetRegisterInfo *TRI) const;		const TargetRegisterInfo *TRI) const;

bool shouldClusterMemOps(MachineInstr &FirstLdSt, MachineInstr &SecondLdSt,		bool shouldClusterMemOps(MachineInstr &FirstLdSt, MachineInstr &SecondLdSt,
unsigned NumLoads) const override;		unsigned NumLoads) const override;

bool shouldScheduleAdjacent(const MachineInstr &First,
const MachineInstr &Second) const override;

MachineInstr *emitFrameIndexDebugValue(MachineFunction &MF, int FrameIx,		MachineInstr *emitFrameIndexDebugValue(MachineFunction &MF, int FrameIx,
uint64_t Offset, const MDNode *Var,		uint64_t Offset, const MDNode *Var,
const MDNode *Expr,		const MDNode *Expr,
const DebugLoc &DL) const;		const DebugLoc &DL) const;
void copyPhysRegTuple(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,		void copyPhysRegTuple(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
const DebugLoc &DL, unsigned DestReg, unsigned SrcReg,		const DebugLoc &DL, unsigned DestReg, unsigned SrcReg,
bool KillSrc, unsigned Opcode,		bool KillSrc, unsigned Opcode,
llvm::ArrayRef<unsigned> Indices) const;		llvm::ArrayRef<unsigned> Indices) const;

llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp

bool AArch64InstrInfo::shouldClusterMemOps(MachineInstr &FirstLdSt,
if (Offset1 > 63 || Offset1 < -64)		if (Offset1 > 63 || Offset1 < -64)
return false;		return false;

// The caller should already have ordered First/SecondLdSt by offset.		// The caller should already have ordered First/SecondLdSt by offset.
assert(Offset1 <= Offset2 && "Caller should have ordered offsets.");		assert(Offset1 <= Offset2 && "Caller should have ordered offsets.");
return Offset1 + 1 == Offset2;		return Offset1 + 1 == Offset2;
}		}

bool AArch64InstrInfo::shouldScheduleAdjacent(
const MachineInstr &First, const MachineInstr &Second) const {
if (Subtarget.hasArithmeticBccFusion()) {
// Fuse CMN, CMP, TST followed by Bcc.
unsigned SecondOpcode = Second.getOpcode();
if (SecondOpcode == AArch64::Bcc) {
switch (First.getOpcode()) {
default:
return false;
case AArch64::ADDSWri:
case AArch64::ADDSWrr:
case AArch64::ADDSXri:
case AArch64::ADDSXrr:
case AArch64::ANDSWri:
case AArch64::ANDSWrr:
case AArch64::ANDSXri:
case AArch64::ANDSXrr:
case AArch64::SUBSWri:
case AArch64::SUBSWrr:
case AArch64::SUBSXri:
case AArch64::SUBSXrr:
case AArch64::BICSWrr:
case AArch64::BICSXrr:
return true;
case AArch64::ADDSWrs:
case AArch64::ADDSXrs:
case AArch64::ANDSWrs:
case AArch64::ANDSXrs:
case AArch64::SUBSWrs:
case AArch64::SUBSXrs:
case AArch64::BICSWrs:
case AArch64::BICSXrs:
// Shift value can be 0 making these behave like the "rr" variant...
return !hasShiftedReg(Second);
}
}
}
if (Subtarget.hasArithmeticCbzFusion()) {
// Fuse ALU operations followed by CBZ/CBNZ.
unsigned SecondOpcode = Second.getOpcode();
if (SecondOpcode == AArch64::CBNZW || SecondOpcode == AArch64::CBNZX ||
SecondOpcode == AArch64::CBZW || SecondOpcode == AArch64::CBZX) {
switch (First.getOpcode()) {
default:
return false;
case AArch64::ADDWri:
case AArch64::ADDWrr:
case AArch64::ADDXri:
case AArch64::ADDXrr:
case AArch64::ANDWri:
case AArch64::ANDWrr:
case AArch64::ANDXri:
case AArch64::ANDXrr:
case AArch64::EORWri:
case AArch64::EORWrr:
case AArch64::EORXri:
case AArch64::EORXrr:
case AArch64::ORRWri:
case AArch64::ORRWrr:
case AArch64::ORRXri:
case AArch64::ORRXrr:
case AArch64::SUBWri:
case AArch64::SUBWrr:
case AArch64::SUBXri:
case AArch64::SUBXrr:
return true;
case AArch64::ADDWrs:
case AArch64::ADDXrs:
case AArch64::ANDWrs:
case AArch64::ANDXrs:
case AArch64::SUBWrs:
case AArch64::SUBXrs:
case AArch64::BICWrs:
case AArch64::BICXrs:
// Shift value can be 0 making these behave like the "rr" variant...
return !hasShiftedReg(Second);
}
}
}
return false;
}

MachineInstr *AArch64InstrInfo::emitFrameIndexDebugValue(		MachineInstr *AArch64InstrInfo::emitFrameIndexDebugValue(
MachineFunction &MF, int FrameIx, uint64_t Offset, const MDNode *Var,		MachineFunction &MF, int FrameIx, uint64_t Offset, const MDNode *Var,
const MDNode *Expr, const DebugLoc &DL) const {		const MDNode *Expr, const DebugLoc &DL) const {
MachineInstrBuilder MIB = BuildMI(MF, DL, get(AArch64::DBG_VALUE))		MachineInstrBuilder MIB = BuildMI(MF, DL, get(AArch64::DBG_VALUE))
.addFrameIndex(FrameIx)		.addFrameIndex(FrameIx)
.addImm(0)		.addImm(0)
.addImm(Offset)		.addImm(Offset)
.addMetadata(Var)		.addMetadata(Var)

llvm/trunk/lib/Target/AArch64/AArch64MacroFusion.h

				//===- AArch64MacroFusion.h - AArch64 Macro Fusion ------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// \fileThis file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the AArch64 definition of the DAG scheduling mutation
				// to pair instructions back to back.
				//
				//===----------------------------------------------------------------------===//

				#include "AArch64InstrInfo.h"
				#include "llvm/CodeGen/MachineScheduler.h"

				//===----------------------------------------------------------------------===//
				// AArch64MacroFusion - DAG post-processing to encourage fusion of macro ops.
				//===----------------------------------------------------------------------===//

				namespace llvm {

				/// Note that you have to add:
				/// DAG.addMutation(createAArch64MacroFusionDAGMutation());
				/// to AArch64PassConfig::createMachineScheduler() to have an effect.
				std::unique_ptr<ScheduleDAGMutation> createAArch64MacroFusionDAGMutation();

				} // llvm

llvm/trunk/lib/Target/AArch64/AArch64MacroFusion.cpp

				//===- AArch64MacroFusion.cpp - AArch64 Macro Fusion ----------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// \file This file contains the AArch64 implementation of the DAG scheduling mutation
				// to pair instructions back to back.
				//
				//===----------------------------------------------------------------------===//

				#include "AArch64MacroFusion.h"
				#include "AArch64Subtarget.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Target/TargetInstrInfo.h"

				#define DEBUG_TYPE "misched"

				using namespace llvm;

				static cl::opt<bool> EnableMacroFusion("aarch64-misched-fusion", cl::Hidden,
				cl::desc("Enable scheduling for macro fusion."), cl::init(true));

				namespace {

				/// \brief Verify that the instruction pair, \param First and \param Second,
				/// should be scheduled back to back. Given an anchor instruction, if the other
				/// instruction is unspecified, then verify that the anchor instruction may be
				/// part of a pair at all.
				static bool shouldScheduleAdjacent(const AArch64InstrInfo &TII,
				const AArch64Subtarget &ST,
				const MachineInstr *First,
				const MachineInstr *Second) {
				unsigned FirstOpcode = First ?
				First->getOpcode() : AArch64::INSTRUCTION_LIST_END;
				unsigned SecondOpcode = Second ?
				Second->getOpcode() : AArch64::INSTRUCTION_LIST_END;

				if (ST.hasArithmeticBccFusion())
				// Fuse CMN, CMP, TST followed by Bcc.
				if (SecondOpcode == AArch64::Bcc)
				switch (FirstOpcode) {
				default:
				return false;
				case AArch64::ADDSWri:
				case AArch64::ADDSWrr:
				case AArch64::ADDSXri:
				case AArch64::ADDSXrr:
				case AArch64::ANDSWri:
				case AArch64::ANDSWrr:
				case AArch64::ANDSXri:
				case AArch64::ANDSXrr:
				case AArch64::SUBSWri:
				case AArch64::SUBSWrr:
				case AArch64::SUBSXri:
				case AArch64::SUBSXrr:
				case AArch64::BICSWrr:
				case AArch64::BICSXrr:
				return true;
				case AArch64::ADDSWrs:
				case AArch64::ADDSXrs:
				case AArch64::ANDSWrs:
				case AArch64::ANDSXrs:
				case AArch64::SUBSWrs:
				case AArch64::SUBSXrs:
				case AArch64::BICSWrs:
				case AArch64::BICSXrs:
				// Shift value can be 0 making these behave like the "rr" variant...
				return !TII.hasShiftedReg(*First);
				case AArch64::INSTRUCTION_LIST_END:
				return true;
				}

				if (ST.hasArithmeticCbzFusion())
				// Fuse ALU operations followed by CBZ/CBNZ.
				if (SecondOpcode == AArch64::CBNZW || SecondOpcode == AArch64::CBNZX ||
				SecondOpcode == AArch64::CBZW || SecondOpcode == AArch64::CBZX)
				switch (FirstOpcode) {
				default:
				return false;
				case AArch64::ADDWri:
				case AArch64::ADDWrr:
				case AArch64::ADDXri:
				case AArch64::ADDXrr:
				case AArch64::ANDWri:
				case AArch64::ANDWrr:
				case AArch64::ANDXri:
				case AArch64::ANDXrr:
				case AArch64::EORWri:
				case AArch64::EORWrr:
				case AArch64::EORXri:
				case AArch64::EORXrr:
				case AArch64::ORRWri:
				case AArch64::ORRWrr:
				case AArch64::ORRXri:
				case AArch64::ORRXrr:
				case AArch64::SUBWri:
				case AArch64::SUBWrr:
				case AArch64::SUBXri:
				case AArch64::SUBXrr:
				return true;
				case AArch64::ADDWrs:
				case AArch64::ADDXrs:
				case AArch64::ANDWrs:
				case AArch64::ANDXrs:
				case AArch64::SUBWrs:
				case AArch64::SUBXrs:
				case AArch64::BICWrs:
				case AArch64::BICXrs:
				// Shift value can be 0 making these behave like the "rr" variant...
				return !TII.hasShiftedReg(*First);
				case AArch64::INSTRUCTION_LIST_END:
				return true;
				}

				return false;
				}

				/// \brief Implement the fusion of instruction pairs in the scheduling
				/// \param DAG, anchored at the instruction in \param ASU. \param Preds
				/// indicates if its dependencies in \param APreds are predecessors instead of
				/// successors.
				static bool scheduleAdjacentImpl(ScheduleDAGMI *DAG, SUnit *ASU,
				SmallVectorImpl<SDep> &APreds, bool Preds) {
				const AArch64InstrInfo *TII = static_cast<const AArch64InstrInfo *>(DAG->TII);
				const AArch64Subtarget &ST = DAG->MF.getSubtarget<AArch64Subtarget>();

				const MachineInstr *AMI = ASU->getInstr();
				if (!AMI || AMI->isPseudo() || AMI->isTransient() ||
				(Preds && !shouldScheduleAdjacent(*TII, ST, nullptr, AMI)) ||
				(!Preds && !shouldScheduleAdjacent(*TII, ST, AMI, nullptr)))
				return false;

				for (SDep &BDep : APreds) {
				if (BDep.isWeak())
				continue;

				SUnit *BSU = BDep.getSUnit();
				const MachineInstr *BMI = BSU->getInstr();
				if (!BMI || BMI->isPseudo() || BMI->isTransient() ||
				(Preds && !shouldScheduleAdjacent(*TII, ST, BMI, AMI)) ||
				(!Preds && !shouldScheduleAdjacent(*TII, ST, AMI, BMI)))
				continue;

				// Create a single weak edge between the adjacent instrs. The only
				// effect is to cause bottom-up scheduling to heavily prioritize the
				// clustered instrs.
				if (Preds)
				DAG->addEdge(ASU, SDep(BSU, SDep::Cluster));
				else
				DAG->addEdge(BSU, SDep(ASU, SDep::Cluster));

				// Adjust the latency between the 1st instr and its predecessors/successors.
				for (SDep &Dep : APreds)
				if (Dep.getSUnit() == BSU)
				Dep.setLatency(0);

				// Adjust the latency between the 2nd instr and its successors/predecessors.
				auto &BSuccs = Preds ? BSU->Succs : BSU->Preds;
				for (SDep &Dep : BSuccs)
				if (Dep.getSUnit() == ASU)
				Dep.setLatency(0);

				DEBUG(dbgs() << "Macro fuse ";
				Preds ? BSU->print(dbgs(), DAG) : ASU->print(dbgs(), DAG);
				dbgs() << " - ";
				Preds ? ASU->print(dbgs(), DAG) : BSU->print(dbgs(), DAG);
				dbgs() << '\n');

				return true;
				}

				return false;
				}

				/// \brief Post-process the DAG to create cluster edges between instructions
				/// that may be fused by the processor into a single operation.
				class AArch64MacroFusion : public ScheduleDAGMutation {
				public:
				AArch64MacroFusion() {}

				void apply(ScheduleDAGInstrs *DAGInstrs) override;
				};

				void AArch64MacroFusion::apply(ScheduleDAGInstrs *DAGInstrs) {
				ScheduleDAGMI *DAG = static_cast<ScheduleDAGMI*>(DAGInstrs);

				// For each of the SUnits in the scheduling block, try to fuse the instruction
				// in it with one in its successors.
				for (SUnit &ASU : DAG->SUnits)
				scheduleAdjacentImpl(DAG, &ASU, ASU.Succs, false);

				// Try to fuse the instruction in the ExitSU with one in its predecessors.
				scheduleAdjacentImpl(DAG, &DAG->ExitSU, DAG->ExitSU.Preds, true);
				}

				} // end namespace


				namespace llvm {

				std::unique_ptr<ScheduleDAGMutation> createAArch64MacroFusionDAGMutation () {
				return EnableMacroFusion ? make_unique<AArch64MacroFusion>() : nullptr;
				}

				} // end namespace llvm

llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.cpp

//===-- AArch64TargetMachine.cpp - Define TargetMachine for AArch64 -------===//		//===-- AArch64TargetMachine.cpp - Define TargetMachine for AArch64 -------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AArch64.h"		#include "AArch64.h"
#include "AArch64CallLowering.h"		#include "AArch64CallLowering.h"
#include "AArch64InstructionSelector.h"		#include "AArch64InstructionSelector.h"
#include "AArch64LegalizerInfo.h"		#include "AArch64LegalizerInfo.h"
		#include "AArch64MacroFusion.h"
#ifdef LLVM_BUILD_GLOBAL_ISEL		#ifdef LLVM_BUILD_GLOBAL_ISEL
#include "AArch64RegisterBankInfo.h"		#include "AArch64RegisterBankInfo.h"
#endif		#endif
#include "AArch64Subtarget.h"		#include "AArch64Subtarget.h"
#include "AArch64TargetMachine.h"		#include "AArch64TargetMachine.h"
#include "AArch64TargetObjectFile.h"		#include "AArch64TargetObjectFile.h"
#include "AArch64TargetTransformInfo.h"		#include "AArch64TargetTransformInfo.h"
#include "MCTargetDesc/AArch64MCTargetDesc.h"		#include "MCTargetDesc/AArch64MCTargetDesc.h"
AArch64TargetMachine &getAArch64TargetMachine() const {		AArch64TargetMachine &getAArch64TargetMachine() const {
return getTM<AArch64TargetMachine>();		return getTM<AArch64TargetMachine>();
}		}

ScheduleDAGInstrs *		ScheduleDAGInstrs *
createMachineScheduler(MachineSchedContext *C) const override {		createMachineScheduler(MachineSchedContext *C) const override {
ScheduleDAGMILive *DAG = createGenericSchedLive(C);		ScheduleDAGMILive *DAG = createGenericSchedLive(C);
DAG->addMutation(createLoadClusterDAGMutation(DAG->TII, DAG->TRI));		DAG->addMutation(createLoadClusterDAGMutation(DAG->TII, DAG->TRI));
DAG->addMutation(createStoreClusterDAGMutation(DAG->TII, DAG->TRI));		DAG->addMutation(createStoreClusterDAGMutation(DAG->TII, DAG->TRI));
DAG->addMutation(createMacroFusionDAGMutation(DAG->TII));		DAG->addMutation(createAArch64MacroFusionDAGMutation());
return DAG;		return DAG;
}		}

void addIRPasses() override;		void addIRPasses() override;
bool addPreISel() override;		bool addPreISel() override;
bool addInstSelector() override;		bool addInstSelector() override;
#ifdef LLVM_BUILD_GLOBAL_ISEL		#ifdef LLVM_BUILD_GLOBAL_ISEL
bool addIRTranslator() override;		bool addIRTranslator() override;

llvm/trunk/lib/Target/AArch64/CMakeLists.txt

add_llvm_target(AArch64CodeGen		add_llvm_target(AArch64CodeGen
AArch64A53Fix835769.cpp		AArch64A53Fix835769.cpp
AArch64FrameLowering.cpp		AArch64FrameLowering.cpp
AArch64ConditionOptimizer.cpp		AArch64ConditionOptimizer.cpp
AArch64RedundantCopyElimination.cpp		AArch64RedundantCopyElimination.cpp
AArch64ISelDAGToDAG.cpp		AArch64ISelDAGToDAG.cpp
AArch64ISelLowering.cpp		AArch64ISelLowering.cpp
AArch64InstrInfo.cpp		AArch64InstrInfo.cpp
AArch64LoadStoreOptimizer.cpp		AArch64LoadStoreOptimizer.cpp
		AArch64MacroFusion.cpp
AArch64MCInstLower.cpp		AArch64MCInstLower.cpp
AArch64PromoteConstant.cpp		AArch64PromoteConstant.cpp
AArch64PBQPRegAlloc.cpp		AArch64PBQPRegAlloc.cpp
AArch64RegisterInfo.cpp		AArch64RegisterInfo.cpp
AArch64SelectionDAGInfo.cpp		AArch64SelectionDAGInfo.cpp
AArch64StorePairSuppress.cpp		AArch64StorePairSuppress.cpp
AArch64Subtarget.cpp		AArch64Subtarget.cpp
AArch64TargetMachine.cpp		AArch64TargetMachine.cpp

llvm/trunk/lib/Target/X86/CMakeLists.txt

set(sources		set(sources
X86ISelDAGToDAG.cpp		X86ISelDAGToDAG.cpp
X86ISelLowering.cpp		X86ISelLowering.cpp
X86InterleavedAccess.cpp		X86InterleavedAccess.cpp
X86InstrFMA3Info.cpp		X86InstrFMA3Info.cpp
X86InstrInfo.cpp		X86InstrInfo.cpp
X86EvexToVex.cpp		X86EvexToVex.cpp
X86MCInstLower.cpp		X86MCInstLower.cpp
X86MachineFunctionInfo.cpp		X86MachineFunctionInfo.cpp
		X86MacroFusion.cpp
X86OptimizeLEAs.cpp		X86OptimizeLEAs.cpp
X86PadShortFunction.cpp		X86PadShortFunction.cpp
X86RegisterInfo.cpp		X86RegisterInfo.cpp
X86SelectionDAGInfo.cpp		X86SelectionDAGInfo.cpp
X86ShuffleDecodeConstantPool.cpp		X86ShuffleDecodeConstantPool.cpp
X86Subtarget.cpp		X86Subtarget.cpp
X86TargetMachine.cpp		X86TargetMachine.cpp
X86TargetObjectFile.cpp		X86TargetObjectFile.cpp

llvm/trunk/lib/Target/X86/X86InstrInfo.h

public:		public:
/// together. This function takes two integers that represent the load offsets		/// together. This function takes two integers that represent the load offsets
/// from the common base address. It returns true if it decides it's desirable		/// from the common base address. It returns true if it decides it's desirable
/// to schedule the two loads together. "NumLoads" is the number of loads that		/// to schedule the two loads together. "NumLoads" is the number of loads that
/// have already been scheduled after Load1.		/// have already been scheduled after Load1.
bool shouldScheduleLoadsNear(SDNode *Load1, SDNode *Load2,		bool shouldScheduleLoadsNear(SDNode *Load1, SDNode *Load2,
int64_t Offset1, int64_t Offset2,		int64_t Offset1, int64_t Offset2,
unsigned NumLoads) const override;		unsigned NumLoads) const override;

bool shouldScheduleAdjacent(const MachineInstr &First,
const MachineInstr &Second) const override;

void getNoopForMachoTarget(MCInst &NopInst) const override;		void getNoopForMachoTarget(MCInst &NopInst) const override;

bool		bool
reverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond) const override;		reverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond) const override;

/// isSafeToMoveRegClassDefs - Return true if it's safe to move a machine		/// isSafeToMoveRegClassDefs - Return true if it's safe to move a machine
/// instruction that defines the specified register class.		/// instruction that defines the specified register class.
bool isSafeToMoveRegClassDefs(const TargetRegisterClass *RC) const override;		bool isSafeToMoveRegClassDefs(const TargetRegisterClass *RC) const override;

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp

case MVT::f64:		case MVT::f64:
if (NumLoads)		if (NumLoads)
return false;		return false;
break;		break;
}		}

return true;		return true;
}		}

bool X86InstrInfo::shouldScheduleAdjacent(const MachineInstr &First,
const MachineInstr &Second) const {
// Check if this processor supports macro-fusion. Since this is a minor
// heuristic, we haven't specifically reserved a feature. hasAVX is a decent
// proxy for SandyBridge+.
if (!Subtarget.hasAVX())
return false;

enum {
FuseTest,
FuseCmp,
FuseInc
} FuseKind;

switch (Second.getOpcode()) {
default:
return false;
case X86::JE_1:
case X86::JNE_1:
case X86::JL_1:
case X86::JLE_1:
case X86::JG_1:
case X86::JGE_1:
FuseKind = FuseInc;
break;
case X86::JB_1:
case X86::JBE_1:
case X86::JA_1:
case X86::JAE_1:
FuseKind = FuseCmp;
break;
case X86::JS_1:
case X86::JNS_1:
case X86::JP_1:
case X86::JNP_1:
case X86::JO_1:
case X86::JNO_1:
FuseKind = FuseTest;
break;
}
switch (First.getOpcode()) {
default:
return false;
case X86::TEST8rr:
case X86::TEST16rr:
case X86::TEST32rr:
case X86::TEST64rr:
case X86::TEST8ri:
case X86::TEST16ri:
case X86::TEST32ri:
case X86::TEST32i32:
case X86::TEST64i32:
case X86::TEST64ri32:
case X86::TEST8rm:
case X86::TEST16rm:
case X86::TEST32rm:
case X86::TEST64rm:
case X86::TEST8ri_NOREX:
case X86::AND16i16:
case X86::AND16ri:
case X86::AND16ri8:
case X86::AND16rm:
case X86::AND16rr:
case X86::AND32i32:
case X86::AND32ri:
case X86::AND32ri8:
case X86::AND32rm:
case X86::AND32rr:
case X86::AND64i32:
case X86::AND64ri32:
case X86::AND64ri8:
case X86::AND64rm:
case X86::AND64rr:
case X86::AND8i8:
case X86::AND8ri:
case X86::AND8rm:
case X86::AND8rr:
return true;
case X86::CMP16i16:
case X86::CMP16ri:
case X86::CMP16ri8:
case X86::CMP16rm:
case X86::CMP16rr:
case X86::CMP32i32:
case X86::CMP32ri:
case X86::CMP32ri8:
case X86::CMP32rm:
case X86::CMP32rr:
case X86::CMP64i32:
case X86::CMP64ri32:
case X86::CMP64ri8:
case X86::CMP64rm:
case X86::CMP64rr:
case X86::CMP8i8:
case X86::CMP8ri:
case X86::CMP8rm:
case X86::CMP8rr:
case X86::ADD16i16:
case X86::ADD16ri:
case X86::ADD16ri8:
case X86::ADD16ri8_DB:
case X86::ADD16ri_DB:
case X86::ADD16rm:
case X86::ADD16rr:
case X86::ADD16rr_DB:
case X86::ADD32i32:
case X86::ADD32ri:
case X86::ADD32ri8:
case X86::ADD32ri8_DB:
case X86::ADD32ri_DB:
case X86::ADD32rm:
case X86::ADD32rr:
case X86::ADD32rr_DB:
case X86::ADD64i32:
case X86::ADD64ri32:
case X86::ADD64ri32_DB:
case X86::ADD64ri8:
case X86::ADD64ri8_DB:
case X86::ADD64rm:
case X86::ADD64rr:
case X86::ADD64rr_DB:
case X86::ADD8i8:
case X86::ADD8mi:
case X86::ADD8mr:
case X86::ADD8ri:
case X86::ADD8rm:
case X86::ADD8rr:
case X86::SUB16i16:
case X86::SUB16ri:
case X86::SUB16ri8:
case X86::SUB16rm:
case X86::SUB16rr:
case X86::SUB32i32:
case X86::SUB32ri:
case X86::SUB32ri8:
case X86::SUB32rm:
case X86::SUB32rr:
case X86::SUB64i32:
case X86::SUB64ri32:
case X86::SUB64ri8:
case X86::SUB64rm:
case X86::SUB64rr:
case X86::SUB8i8:
case X86::SUB8ri:
case X86::SUB8rm:
case X86::SUB8rr:
return FuseKind == FuseCmp || FuseKind == FuseInc;
case X86::INC16r:
case X86::INC32r:
case X86::INC64r:
case X86::INC8r:
case X86::DEC16r:
case X86::DEC32r:
case X86::DEC64r:
case X86::DEC8r:
return FuseKind == FuseInc;
}
}

bool X86InstrInfo::		bool X86InstrInfo::
reverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond) const {		reverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond) const {
assert(Cond.size() == 1 && "Invalid X86 branch condition!");		assert(Cond.size() == 1 && "Invalid X86 branch condition!");
X86::CondCode CC = static_cast<X86::CondCode>(Cond[0].getImm());		X86::CondCode CC = static_cast<X86::CondCode>(Cond[0].getImm());
Cond[0].setImm(GetOppositeBranchCondition(CC));		Cond[0].setImm(GetOppositeBranchCondition(CC));
return false;		return false;
}		}


llvm/trunk/lib/Target/X86/X86MacroFusion.h

				//===- X86MacroFusion.h - X86 Macro Fusion --------------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// \file This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the X86 definition of the DAG scheduling mutation to pair
				// instructions back to back.
				//
				//===----------------------------------------------------------------------===//

				#include "X86InstrInfo.h"
				#include "llvm/CodeGen/MachineScheduler.h"

				//===----------------------------------------------------------------------===//
				// X86MacroFusion - DAG post-processing to encourage fusion of macro ops.
				//===----------------------------------------------------------------------===//

				namespace llvm {

				/// Note that you have to add:
				/// DAG.addMutation(createX86MacroFusionDAGMutation());
				/// to X86PassConfig::createMachineScheduler() to have an effect.
				std::unique_ptr<ScheduleDAGMutation>
				createX86MacroFusionDAGMutation();

				} // end namespace llvm
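
Illustration (not part of the patch): createX86MacroFusionDAGMutation() returns a null pointer when the x86-misched-fusion option is disabled, while the call site passes the result to addMutation() unconditionally, so the scheduler has to tolerate a null mutation. The sketch below models that contract with toy classes. Mutation, Increment, createIncrement, and Scheduler are hypothetical stand-ins, and the null check inside addMutation is an assumption about how the real scheduler behaves.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for ScheduleDAGMutation.
struct Mutation {
  virtual ~Mutation() = default;
  virtual void apply(int &State) = 0;
};

// A mutation with an observable effect, so the test can tell it ran.
struct Increment : Mutation {
  void apply(int &State) override { ++State; }
};

// Factory in the style of the patch: null when the feature is off.
std::unique_ptr<Mutation> createIncrement(bool Enabled) {
  if (!Enabled)
    return nullptr;
  return std::unique_ptr<Mutation>(new Increment());
}

// Toy stand-in for the scheduler that owns the mutations.
class Scheduler {
  std::vector<std::unique_ptr<Mutation>> Mutations;

public:
  // Assumed behavior: silently drop null mutations.
  void addMutation(std::unique_ptr<Mutation> M) {
    if (M)
      Mutations.push_back(std::move(M));
  }

  // Apply every registered mutation to a fresh state and return it.
  int run() {
    int State = 0;
    for (auto &M : Mutations)
      M->apply(State);
    return State;
  }
};
```

With this shape the target's createMachineScheduler() needs no conditional of its own; the command-line option alone decides whether the mutation is registered.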

llvm/trunk/lib/Target/X86/X86MacroFusion.cpp

				//===- X86MacroFusion.cpp - X86 Macro Fusion ------------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// \file This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the X86 implementation of the DAG scheduling mutation to
				// pair instructions back to back.
				//
				//===----------------------------------------------------------------------===//

				#include "X86MacroFusion.h"
				#include "X86Subtarget.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Target/TargetInstrInfo.h"

				#define DEBUG_TYPE "misched"

				using namespace llvm;

				static cl::opt<bool> EnableMacroFusion("x86-misched-fusion", cl::Hidden,
				cl::desc("Enable scheduling for macro fusion."), cl::init(true));

				namespace {

				/// \brief Verify that the instruction pair, \param First and \param Second,
				[Inline comment] chapuni: First and Second
				/// should be scheduled back to back. If either instruction is unspecified,
				/// then verify that the other instruction may be part of a pair at all.
				static bool shouldScheduleAdjacent(const X86Subtarget &ST,
				const MachineInstr *First,
				const MachineInstr *Second) {
				// Check if this processor supports macro-fusion. Since this is a minor
				// heuristic, we haven't specifically reserved a feature. hasAVX is a decent
				// proxy for SandyBridge+.
				if (!ST.hasAVX())
				return false;

				enum {
				FuseTest,
				FuseCmp,
				FuseInc
				} FuseKind;

				unsigned FirstOpcode = First ?
				First->getOpcode() : X86::INSTRUCTION_LIST_END;
				unsigned SecondOpcode = Second ?
				Second->getOpcode() : X86::INSTRUCTION_LIST_END;

				switch (SecondOpcode) {
				default:
				return false;
				case X86::JE_1:
				case X86::JNE_1:
				case X86::JL_1:
				case X86::JLE_1:
				case X86::JG_1:
				case X86::JGE_1:
				FuseKind = FuseInc;
				break;
				case X86::JB_1:
				case X86::JBE_1:
				case X86::JA_1:
				case X86::JAE_1:
				FuseKind = FuseCmp;
				break;
				case X86::JS_1:
				case X86::JNS_1:
				case X86::JP_1:
				case X86::JNP_1:
				case X86::JO_1:
				case X86::JNO_1:
				FuseKind = FuseTest;
				break;
				}

				switch (FirstOpcode) {
				default:
				return false;
				case X86::TEST8rr:
				case X86::TEST16rr:
				case X86::TEST32rr:
				case X86::TEST64rr:
				case X86::TEST8ri:
				case X86::TEST16ri:
				case X86::TEST32ri:
				case X86::TEST32i32:
				case X86::TEST64i32:
				case X86::TEST64ri32:
				case X86::TEST8rm:
				case X86::TEST16rm:
				case X86::TEST32rm:
				case X86::TEST64rm:
				case X86::TEST8ri_NOREX:
				case X86::AND16i16:
				case X86::AND16ri:
				case X86::AND16ri8:
				case X86::AND16rm:
				case X86::AND16rr:
				case X86::AND32i32:
				case X86::AND32ri:
				case X86::AND32ri8:
				case X86::AND32rm:
				case X86::AND32rr:
				case X86::AND64i32:
				case X86::AND64ri32:
				case X86::AND64ri8:
				case X86::AND64rm:
				case X86::AND64rr:
				case X86::AND8i8:
				case X86::AND8ri:
				case X86::AND8rm:
				case X86::AND8rr:
				return true;
				case X86::CMP16i16:
				case X86::CMP16ri:
				case X86::CMP16ri8:
				case X86::CMP16rm:
				case X86::CMP16rr:
				case X86::CMP32i32:
				case X86::CMP32ri:
				case X86::CMP32ri8:
				case X86::CMP32rm:
				case X86::CMP32rr:
				case X86::CMP64i32:
				case X86::CMP64ri32:
				case X86::CMP64ri8:
				case X86::CMP64rm:
				case X86::CMP64rr:
				case X86::CMP8i8:
				case X86::CMP8ri:
				case X86::CMP8rm:
				case X86::CMP8rr:
				case X86::ADD16i16:
				case X86::ADD16ri:
				case X86::ADD16ri8:
				case X86::ADD16ri8_DB:
				case X86::ADD16ri_DB:
				case X86::ADD16rm:
				case X86::ADD16rr:
				case X86::ADD16rr_DB:
				case X86::ADD32i32:
				case X86::ADD32ri:
				case X86::ADD32ri8:
				case X86::ADD32ri8_DB:
				case X86::ADD32ri_DB:
				case X86::ADD32rm:
				case X86::ADD32rr:
				case X86::ADD32rr_DB:
				case X86::ADD64i32:
				case X86::ADD64ri32:
				case X86::ADD64ri32_DB:
				case X86::ADD64ri8:
				case X86::ADD64ri8_DB:
				case X86::ADD64rm:
				case X86::ADD64rr:
				case X86::ADD64rr_DB:
				case X86::ADD8i8:
				case X86::ADD8mi:
				case X86::ADD8mr:
				case X86::ADD8ri:
				case X86::ADD8rm:
				case X86::ADD8rr:
				case X86::SUB16i16:
				case X86::SUB16ri:
				case X86::SUB16ri8:
				case X86::SUB16rm:
				case X86::SUB16rr:
				case X86::SUB32i32:
				case X86::SUB32ri:
				case X86::SUB32ri8:
				case X86::SUB32rm:
				case X86::SUB32rr:
				case X86::SUB64i32:
				case X86::SUB64ri32:
				case X86::SUB64ri8:
				case X86::SUB64rm:
				case X86::SUB64rr:
				case X86::SUB8i8:
				case X86::SUB8ri:
				case X86::SUB8rm:
				case X86::SUB8rr:
				return FuseKind == FuseCmp || FuseKind == FuseInc;
				case X86::INC16r:
				case X86::INC32r:
				case X86::INC64r:
				case X86::INC8r:
				case X86::DEC16r:
				case X86::DEC32r:
				case X86::DEC64r:
				case X86::DEC8r:
				return FuseKind == FuseInc;
				case X86::INSTRUCTION_LIST_END:
				return true;
				}
				}

				/// \brief Post-process the DAG to create cluster edges between instructions
				/// that may be fused by the processor into a single operation.
				class X86MacroFusion : public ScheduleDAGMutation {
				public:
				X86MacroFusion() {}

				void apply(ScheduleDAGInstrs *DAGInstrs) override;
				};

				void X86MacroFusion::apply(ScheduleDAGInstrs *DAGInstrs) {
				ScheduleDAGMI *DAG = static_cast<ScheduleDAGMI*>(DAGInstrs);
				const X86Subtarget &ST = DAG->MF.getSubtarget<X86Subtarget>();

				// For now, assume targets can only fuse with the branch.
				SUnit &ExitSU = DAG->ExitSU;
				MachineInstr *Branch = ExitSU.getInstr();
				if (!shouldScheduleAdjacent(ST, nullptr, Branch))
				return;

				for (SDep &PredDep : ExitSU.Preds) {
				if (PredDep.isWeak())
				continue;
				SUnit &SU = *PredDep.getSUnit();
				MachineInstr &Pred = *SU.getInstr();
				if (!shouldScheduleAdjacent(ST, &Pred, Branch))
				continue;

				// Create a single weak edge from SU to ExitSU. The only effect is to cause
				// bottom-up scheduling to heavily prioritize the clustered SU. There is no
				// need to copy predecessor edges from ExitSU to SU, since top-down
				// scheduling cannot prioritize ExitSU anyway. To defer top-down scheduling
				// of SU, we could create an artificial edge from the deepest root, but it
				// hasn't been needed yet.
				bool Success = DAG->addEdge(&ExitSU, SDep(&SU, SDep::Cluster));
				(void)Success;
				assert(Success && "No DAG nodes should be reachable from ExitSU");

				// Adjust latency of data deps between the nodes.
				for (SDep &PredDep : ExitSU.Preds)
				if (PredDep.getSUnit() == &SU)
				PredDep.setLatency(0);
				for (SDep &SuccDep : SU.Succs)
				if (SuccDep.getSUnit() == &ExitSU)
				SuccDep.setLatency(0);

				DEBUG(dbgs() << "Macro fuse ";
				SU.print(dbgs(), DAG);
				dbgs() << " - ExitSU" << '\n');

				break;
				}
				}

				} // end namespace

				namespace llvm {

				std::unique_ptr<ScheduleDAGMutation>
				createX86MacroFusionDAGMutation () {
				return EnableMacroFusion ? make_unique<X86MacroFusion>() : nullptr;
				}

				} // end namespace llvm
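
Illustration (not part of the patch): stripped of the opcode lists, shouldScheduleAdjacent() first classifies the branch by which flag-setting instructions it can fuse with, then checks the first instruction against that class. The following is a condensed sketch, assuming hypothetical Branch and Arith enums in place of the real X86 opcodes. It omits the subtarget check and the case where one of the two instructions is unspecified.

```cpp
#include <cassert>

// Hypothetical stand-ins for the X86 opcode groups used in the patch.
enum class Branch { JE, JNE, JL, JLE, JG, JGE, JB, JBE, JA, JAE,
                    JS, JNS, JP, JNP, JO, JNO, Other };
enum class Arith { Test, And, Cmp, Add, Sub, Inc, Dec, Other };

// Same three classes as the patch, plus FuseNone for non-fusible branches.
enum FuseKind { FuseTest, FuseCmp, FuseInc, FuseNone };

// Map a conditional branch to the widest class of instructions it fuses with.
FuseKind classifyBranch(Branch B) {
  switch (B) {
  case Branch::JE: case Branch::JNE: case Branch::JL:
  case Branch::JLE: case Branch::JG: case Branch::JGE:
    return FuseInc;
  case Branch::JB: case Branch::JBE: case Branch::JA: case Branch::JAE:
    return FuseCmp;
  case Branch::JS: case Branch::JNS: case Branch::JP:
  case Branch::JNP: case Branch::JO: case Branch::JNO:
    return FuseTest;
  case Branch::Other:
    return FuseNone;
  }
  return FuseNone;
}

// Decide whether First followed by Second is a fusion candidate.
bool mayFuse(Arith First, Branch Second) {
  FuseKind Kind = classifyBranch(Second);
  if (Kind == FuseNone)
    return false;
  switch (First) {
  case Arith::Test: case Arith::And:
    return true; // fuses with every supported branch
  case Arith::Cmp: case Arith::Add: case Arith::Sub:
    return Kind == FuseCmp || Kind == FuseInc;
  case Arith::Inc: case Arith::Dec:
    return Kind == FuseInc;
  case Arith::Other:
    return false;
  }
  return false;
}
```

For example, under this model a CMP followed by JS is rejected, while a TEST followed by JS is accepted.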

llvm/trunk/lib/Target/X86/X86TargetMachine.cpp

//===-- X86TargetMachine.cpp - Define TargetMachine for the X86 -----------===//		//===-- X86TargetMachine.cpp - Define TargetMachine for the X86 -----------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file defines the X86 specific subclass of TargetMachine.		// This file defines the X86 specific subclass of TargetMachine.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "X86TargetMachine.h"		#include "X86TargetMachine.h"
#include "X86.h"		#include "X86.h"
#include "X86CallLowering.h"		#include "X86CallLowering.h"
		#include "X86MacroFusion.h"
#include "X86TargetObjectFile.h"		#include "X86TargetObjectFile.h"
#include "X86TargetTransformInfo.h"		#include "X86TargetTransformInfo.h"
#include "llvm/CodeGen/GlobalISel/GISelAccessor.h"		#include "llvm/CodeGen/GlobalISel/GISelAccessor.h"
#include "llvm/CodeGen/GlobalISel/IRTranslator.h"		#include "llvm/CodeGen/GlobalISel/IRTranslator.h"
#include "llvm/CodeGen/MachineScheduler.h"		#include "llvm/CodeGen/MachineScheduler.h"
#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
#include "llvm/CodeGen/TargetPassConfig.h"		#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
public:		public:

X86TargetMachine &getX86TargetMachine() const {		X86TargetMachine &getX86TargetMachine() const {
return getTM<X86TargetMachine>();		return getTM<X86TargetMachine>();
}		}

ScheduleDAGInstrs *		ScheduleDAGInstrs *
createMachineScheduler(MachineSchedContext *C) const override {		createMachineScheduler(MachineSchedContext *C) const override {
ScheduleDAGMILive *DAG = createGenericSchedLive(C);		ScheduleDAGMILive *DAG = createGenericSchedLive(C);
DAG->addMutation(createMacroFusionDAGMutation(DAG->TII));		DAG->addMutation(createX86MacroFusionDAGMutation());
return DAG;		return DAG;
}		}

void addIRPasses() override;		void addIRPasses() override;
bool addInstSelector() override;		bool addInstSelector() override;
#ifdef LLVM_BUILD_GLOBAL_ISEL		#ifdef LLVM_BUILD_GLOBAL_ISEL
bool addIRTranslator() override;		bool addIRTranslator() override;
bool addLegalizeMachineIR() override;		bool addLegalizeMachineIR() override;

llvm/trunk/test/CodeGen/AArch64/misched-fusion.ll

	; RUN: llc -o - %s -mattr=+arith-cbz-fusion | FileCheck %s			; RUN: llc -o - %s -mattr=+arith-cbz-fusion | FileCheck %s
	; RUN: llc -o - %s -mcpu=cyclone | FileCheck %s			; RUN: llc -o - %s -mcpu=cyclone | FileCheck %s

	target triple = "arm64-apple-ios"			target triple = "aarch64-unknown"

	declare void @foobar(i32 %v0, i32 %v1)			declare void @foobar(i32 %v0, i32 %v1)

	; Make sure sub is scheduled in front of cbnz			; Make sure sub is scheduled in front of cbnz
	; CHECK-LABEL: test_sub_cbz:			; CHECK-LABEL: test_sub_cbz:
	; CHECK: add w[[ADDRES:[0-9]+]], w1, #7
	; CHECK: sub w[[SUBRES:[0-9]+]], w0, #13			; CHECK: sub w[[SUBRES:[0-9]+]], w0, #13
	; CHECK-NEXT: cbnz w[[SUBRES]], [[SKIPBLOCK:LBB[0-9_]+]]			; CHECK-NEXT: cbnz w[[SUBRES]], {{.?LBB[0-9_]+}}
	; CHECK: mov [[REGTY:[x,w]]]0, [[REGTY]][[ADDRES]]
	; CHECK: mov [[REGTY]]1, [[REGTY]][[SUBRES]]
	; CHECK: bl _foobar
	; CHECK: [[SKIPBLOCK]]:
	; CHECK: mov [[REGTY]]0, [[REGTY]][[SUBRES]]
	; CHECK: mov [[REGTY]]1, [[REGTY]][[ADDRES]]
	; CHECK: bl _foobar
	define void @test_sub_cbz(i32 %a0, i32 %a1) {			define void @test_sub_cbz(i32 %a0, i32 %a1) {
	entry:			entry:
	; except for the fusion opportunity the sub/add should be equal so the			; except for the fusion opportunity the sub/add should be equal so the
	; scheduler would leave them in source order if it weren't for the scheduling			; scheduler would leave them in source order if it weren't for the scheduling
	%v0 = sub i32 %a0, 13			%v0 = sub i32 %a0, 13
	%cond = icmp eq i32 %v0, 0			%cond = icmp eq i32 %v0, 0
	%v1 = add i32 %a1, 7			%v1 = add i32 %a1, 7
	br i1 %cond, label %if, label %exit			br i1 %cond, label %if, label %exit

[CodeGen] Move MacroFusion to the target (Closed, Public)

Revision Contents

Diff 86556

llvm/trunk/include/llvm/CodeGen/MachineScheduler.h

llvm/trunk/include/llvm/Target/TargetInstrInfo.h

llvm/trunk/lib/CodeGen/MachineScheduler.cpp

llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.h

llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp

llvm/trunk/lib/Target/AArch64/AArch64MacroFusion.h

llvm/trunk/lib/Target/AArch64/AArch64MacroFusion.cpp

llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.cpp

llvm/trunk/lib/Target/AArch64/CMakeLists.txt

llvm/trunk/lib/Target/X86/CMakeLists.txt

llvm/trunk/lib/Target/X86/X86InstrInfo.h

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp

llvm/trunk/lib/Target/X86/X86MacroFusion.h

llvm/trunk/lib/Target/X86/X86MacroFusion.cpp

llvm/trunk/lib/Target/X86/X86TargetMachine.cpp

llvm/trunk/test/CodeGen/AArch64/misched-fusion.ll