This is an archive of the discontinued LLVM Phabricator instance.

Review for machine combiner pass
Needs Review · Public

Authored by Gerolf on Jul 2 2014, 10:26 PM.

Details

Reviewers

Summary

The late machine instruction combiner may replace an instruction
sequence with combined instruction(s) when it is beneficial to do so. It provides
the infrastructure to evaluate instruction combining patterns like mul+add->madd
based on machine trace information. Currently the DAG Combiner greedily
generates combined instructions, which is usually a win for code size but
can unfortunately cause performance losses. To remedy this, the new pass changes
the logic from always generating the combined instruction(s) to doing so only
when it is beneficial.
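
To illustrate why greedy combining can hurt, here is a minimal sketch of the critical-path comparison the summary describes. It is not the patch's code: the `ToyLatencies` struct, the function names, and the issue rule for madd (it waits for all three operands) are assumptions made for illustration, and a real pass would take latencies from the target's scheduling model.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical latencies in cycles; a real pass would query the target's
// scheduling model instead.
struct ToyLatencies {
  unsigned Mul;
  unsigned Add;
  unsigned Madd;
};

// Cycle at which "mul t, a, b; add r, t, c" produces r, given the cycles at
// which a, b and c become available.
unsigned depthMulAdd(unsigned ReadyA, unsigned ReadyB, unsigned ReadyC,
                     const ToyLatencies &L) {
  unsigned MulDone = std::max(ReadyA, ReadyB) + L.Mul;
  return std::max(MulDone, ReadyC) + L.Add;
}

// Cycle at which "madd r, a, b, c" produces r. This toy model assumes madd
// cannot issue until all three operands are available.
unsigned depthMadd(unsigned ReadyA, unsigned ReadyB, unsigned ReadyC,
                   const ToyLatencies &L) {
  return std::max(std::max(ReadyA, ReadyB), ReadyC) + L.Madd;
}

// Combine only when the fused form does not lengthen the dependency chain.
bool shouldCombine(unsigned ReadyA, unsigned ReadyB, unsigned ReadyC,
                   const ToyLatencies &L) {
  return depthMadd(ReadyA, ReadyB, ReadyC, L) <=
         depthMulAdd(ReadyA, ReadyB, ReadyC, L);
}
```

Under these assumed latencies (Mul = 4, Add = 1, Madd = 4), an addend that arrives late makes the separate mul+add finish earlier, because the multiply overlaps with the wait. When all operands are ready at the same time, the fused form is at least as fast.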

Event Timeline

Gerolf updated this revision to Diff 11041. Jul 2 2014, 10:26 PM

Gerolf retitled this revision to Review for machine combiner pass.

Gerolf updated this object.

Gerolf edited the test plan for this revision.

Gerolf added a reviewer: Gerolf.

Gerolf added a subscriber: Gerolf.

Herald added a subscriber: mcrosier. Jul 2 2014, 10:26 PM

echristo added a reviewer: echristo. Jul 2 2014, 10:45 PM

OlegM added a subscriber: OlegM. Jul 3 2014, 2:47 AM

Hi,
Do you have plans to enable this optimization as well for x86 FMA instructions?

zinovy.nis added a subscriber: zinovy.nis. Jul 3 2014, 7:40 AM

Hi,

I have no plans for x86, but the design is supposed to make adding other targets easy. Please review the code from that angle and let me know of any issues. I will follow up on these aspects as part of this review.

To support a new target you have to add a header with pattern enums, provide an implementation of the combiner interface, and disable the corresponding combinations in the DAG combiner.

Cheers
Gerolf
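
The first two steps above can be sketched with toy stand-ins. This is only an illustration: `ToyInstr`, `ToyPattern`, `ToyCombinerHooks` and `ToyTarget` are hypothetical names, and the hook names merely mirror hasPattern(), genAlternativeCodeSequence() and alwaysCombine() from this patch with simplified signatures. The third step, disabling the combination in the DAG combiner, happens in the target's .td files and is not shown.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Step 1 (hypothetical): the pattern enum a target would put in its header.
enum class ToyPattern { MulAddOp1, MulAddOp2 };

// Toy stand-in for a machine instruction with up to two defining operands.
struct ToyInstr {
  std::string Opcode;
  const ToyInstr *Op1;
  const ToyInstr *Op2;
  explicit ToyInstr(std::string Op, const ToyInstr *A = nullptr,
                    const ToyInstr *B = nullptr)
      : Opcode(Op), Op1(A), Op2(B) {}
};

// Simplified mirror of the hooks added to TargetInstrInfo in this patch.
struct ToyCombinerHooks {
  virtual ~ToyCombinerHooks() = default;
  virtual bool hasPattern(const ToyInstr &, std::vector<ToyPattern> &) const {
    return false;
  }
  virtual void genAlternativeCodeSequence(const ToyInstr &, ToyPattern,
                                          std::vector<ToyInstr> &,
                                          std::vector<const ToyInstr *> &) const {}
  virtual bool alwaysCombine() const { return false; }
};

// Step 2 (hypothetical): a target implementing the interface for mul+add.
struct ToyTarget : ToyCombinerHooks {
  bool hasPattern(const ToyInstr &Root,
                  std::vector<ToyPattern> &Patterns) const override {
    if (Root.Opcode != "add")
      return false;
    // Patterns are appended in priority order.
    if (Root.Op1 && Root.Op1->Opcode == "mul")
      Patterns.push_back(ToyPattern::MulAddOp1);
    if (Root.Op2 && Root.Op2->Opcode == "mul")
      Patterns.push_back(ToyPattern::MulAddOp2);
    return !Patterns.empty();
  }

  void genAlternativeCodeSequence(
      const ToyInstr &Root, ToyPattern P, std::vector<ToyInstr> &InsInstrs,
      std::vector<const ToyInstr *> &DelInstrs) const override {
    const ToyInstr *Mul = (P == ToyPattern::MulAddOp1) ? Root.Op1 : Root.Op2;
    const ToyInstr *Acc = (P == ToyPattern::MulAddOp1) ? Root.Op2 : Root.Op1;
    // The toy madd records the mul it absorbs and the accumulator operand.
    InsInstrs.push_back(ToyInstr("madd", Mul, Acc));
    // Old instructions, including Root, that the new sequence could replace.
    DelInstrs.push_back(Mul);
    DelInstrs.push_back(&Root);
  }
};
```

The hooks only propose a replacement; whether it is actually applied is left to the caller, which matches the division of labor described in the patch's header comments.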

silviu.baranga added a subscriber: silviu.baranga. Jul 6 2014, 3:22 AM

Hi Gerolf,

This looks like a widely applicable change, and great for OOO cores, but I have no data for this. Do you happen to have some (how frequently does the optimization fire in LNT, how much does it shorten the critical path, etc.)?

Some comments about the design:

Ideally the instruction selection algorithm would only need the already existing DAG patterns and the MachineTraceMetrics to be able to make these decisions.

The target specific code for your use case seems to be inferable from the existing selection patterns.
Would it be possible to derive the new selection patterns from the already existing tablegen patterns? I suspect the answer would be yes for the simpler cases, while the more complex ones would require custom handling.

Even if the new patterns are not derivable from the existing ones, I think there is an opportunity to generate the new selection code through tablegen. This might require some changes to the infrastructure, but in principle this would reduce the amount of code required to add support for a new instruction. With the tablegen changes, the size of the AArch64 code would be 2 lines in a .td file. I think your use case is not unique, so doing this would reduce the amount of work required for further changes.

Hope this helps!

Cheers,
Silviu

silviu.baranga added inline comments. Jul 6 2014, 5:57 AM

lib/Target/AArch64/AArch64InstrFormats.td
1356	Doing this will disable MADD/MSUB generation for in-order cores (for example Cortex-A53). Maybe guard this with a predicate?

FWIW Silviu's comments precisely mirror my own that I just hadn't had a chance to work up yet. :)

-eric

We could have a target specific option like always combine. In principle that should have the same effect.

Gerolf

mcrosier removed a subscriber: mcrosier. Jul 7 2014, 7:24 PM

mcrosier added a subscriber: mcrosier.

I certainly agree with everyone else: it would be really nice to generate these replacements using TableGen. I can think of two possible ways of doing this:

1. For all existing patterns of non-trivial complexity, look at how each of the individual pieces would match. Collect those instructions in a corresponding tree and match against that tree to find the instruction with a higher-complexity pattern.
2. Specify separate special "output->output" TableGen patterns, and generate the mapping from those.

If we really believe that this is the "right" way to match instructions with complex pattern inputs, then we should probably try for (1) and use this to completely replace the existing complex-pattern-matching infrastructure (for everything except immediates or other "free" operands).

include/llvm/Target/TargetInstrInfo.h
570	All potential pattern a listed -> All potential patterns are returned
572	Why is this restricted to binary instructions?
582	make the call -> decide
584	(Likewise, why binary?)
588	old instruction including Root that could -> old instructions, including Root, that could
lib/CodeGen/MachineCombiner.cpp
121	This seems like an unnecessary restriction. Why don't you use a SmallVector?
144	If you use a SmallVector, you can replace the 16 here with InstrDepth.size().
239	You don't need to repeat this comment.
241	This loop is the same as the previous one, please make this a function.
285	What "original code" are you referring to? Do you mean the code in DAGCombine?
285	replace -> replaced
301	This ordering should be mentioned in the header where genAlternativeCodeSequence is declared.
lib/CodeGen/MachineTraceMetrics.cpp
1250	Exactly what are you proposing?
lib/Target/AArch64/AArch64InstrInfo.cpp
686	Is this a separable (or unrelated) change?

Changes:

Added bool alwaysCombine() (Target/TargetInstrInfo.h) so targets can decide to always replace a given pattern. This should be equivalent to the current code in DAGCombine when a given pattern is disabled.
InstrDepth is now a small vector (MachineCombiner.cpp)
Added helper function instr2instrSC (MachineCombiner.cpp)
Improved comments as suggested by reviewers

Just some minor issues that I've noticed (comments inlined). Otherwise looks good.

Cheers,
Silviu

include/llvm/Target/TargetInstrInfo.h
609	The insertMove method isn't used anywhere. Could it be removed?
lib/CodeGen/MachineTraceMetrics.cpp
1228	This duplicates (almost) the code above. I think these should be merged.
1247	This FIXME comment now seems confusing since you already added the fix.
test/CodeGen/AArch64/aarch64-neon-mul-div.ll
82 ↗	(On Diff #11483)	How are the div tests related to this change?

Minor changes:
a) Cleaned up tests in test/CodeGen/AArch64: arm64-neon-mul-div.ll,
dp-3source.ll and mul-lohi.ll. The only change to a critical path
length is in the single block of mul-lohi.ll: generating mul-add
instead of madd shortens cpl from 16 to 8 cycles. Removed previous
tests since they were too elaborate for this commit.
b) Added capture extraCycles() in MachineTraceMetrics.cpp
c) Removed insertMove() in TargetInstrInfo.h
d) Comments/Format changes

Please see below. Changes are on phabricator

http://reviews.llvm.org/D4367

Hi Gerolf,

A few comments inline and a general question:

It seems like the matching infrastructure is very verbose on the backend level. What alternatives do we have to make this smaller? Perhaps something ala the existing pattern match infrastructure we have for IR? Something else? It looks even more painful than DAGCombines at the moment to add things :)

Thanks!

-eric

lib/CodeGen/MachineCombiner.cpp
115	Seems to be a very long function... could you break it up into computing the path and then making the determination?
217	Can use a typedef or a SmallVectorImpl instead of writing the number all over the place. Should help with formatting.
284	"beneficial"
321	This conditional is a little hard to read - some way to break it up/hoist things out?
327	Extra space.
lib/Target/AArch64/AArch64InstrInfo.cpp
2103	The name here is a bit limiting. What if you want to combine something else in the future? Same with the rest of these helpers.
2172	Unnecessary cast?
2191	"All potential patterns are..."
2195	This code looks like it could be written ala include/llvm/IR/PatternMatch.h?
2304	Documentation for these functions describing the incoming variables, constraints, etc.
lib/Target/AArch64/AArch64InstrInfo.h
158	"All potential patterns are..."

Minor cleanups + added comments.
New functions getDepth() and getLatency() in MachineCombiner.cpp to simplify preservesCriticalPathLen().

New function doSubstitute() to make a conditional in combineInstructions()
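
The depth computation that getDepth() factors out can be sketched roughly as follows. This is a simplified model rather than the patch's code: `ToyOperand`, `ToyNewInstr` and `depthOfLast` are hypothetical names, and the rule that an instruction's depth is the maximum over its operands of (defining depth + operand latency) is an assumption about the part of the loop not shown in this excerpt.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy operand of a new instruction. If NewIdx >= 0 the value is defined by an
// earlier instruction of the new sequence; otherwise it is defined by an
// instruction already in the trace, whose depth is TraceDepth.
struct ToyOperand {
  int NewIdx;
  unsigned TraceDepth;
  unsigned Latency; // def-to-use latency for this operand
};

struct ToyNewInstr {
  std::vector<ToyOperand> Uses;
};

// Returns the depth of the last instruction in InsInstrs (the "NewRoot"),
// or 0 for an empty sequence.
unsigned depthOfLast(const std::vector<ToyNewInstr> &InsInstrs) {
  std::vector<unsigned> InstrDepth; // depth of each new instruction so far
  unsigned IDepth = 0;
  for (const ToyNewInstr &I : InsInstrs) {
    IDepth = 0;
    for (const ToyOperand &Op : I.Uses) {
      unsigned DepthOp = Op.TraceDepth;
      if (Op.NewIdx >= 0) {
        // Operand is a new virtual register: look up the depth computed
        // earlier in this loop instead of consulting the trace.
        assert(static_cast<std::size_t>(Op.NewIdx) < InstrDepth.size() &&
               "Bad Index");
        DepthOp = InstrDepth[static_cast<std::size_t>(Op.NewIdx)];
      }
      IDepth = std::max(IDepth, DepthOp + Op.Latency);
    }
    InstrDepth.push_back(IDepth);
  }
  return IDepth;
}
```

Growing `InstrDepth` with the sequence, rather than using a fixed-size array, is what lets the bounds check compare against `InstrDepth.size()`, as suggested in the inline comments above.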

Revision Contents

Path                                  Size
include/
  llvm/
    CodeGen/
      MachineCombinerPattern.h        29 lines
      MachineTraceMetrics.h           11 lines
      Passes.h                        4 lines
      TargetSchedule.h                1 line
    InitializePasses.h                1 line
    Target/
      TargetInstrInfo.h               40 lines
lib/
  CodeGen/
                                      1 line
                                      1 line
                                      429 lines
                                      18 lines
    MachineTraceMetrics.cpp           59 lines
    TargetSchedule.cpp                22 lines
  Target/
    AArch64/
      AArch64InstrFormats.td          5 lines
      AArch64InstrInfo.h              17 lines
      AArch64InstrInfo.cpp            471 lines
      AArch64TargetMachine.cpp        6 lines
test/
  CodeGen/
    AArch64/
      arm64-neon-mul-div.ll           10 lines
      dp-3source.ll                   2 lines
      mul-lohi.ll                     12 lines

Diff 11801

include/llvm/CodeGen/MachineCombinerPattern.h

This file was added.

				//===-- llvm/CodeGen/MachineCombinerPattern.h - Instruction pattern supported by
				// combiner ------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file defines instruction pattern supported by combiner
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CODEGEN_MACHINECOMBINERPATTERN_H
				#define LLVM_CODEGEN_MACHINECOMBINERPATTERN_H

				namespace llvm {

				/// Enumeration of instruction pattern supported by machine combiner
				///
				///
				namespace MachineCombinerPattern {
				// Forward declaration
				enum MC_PATTERN : int;
				} // end namespace MachineCombinerPattern
				} // end namespace llvm

				#endif

include/llvm/CodeGen/MachineTraceMetrics.h

Show First 20 Lines • Show All 258 Lines • ▼ Show 20 Lines	public:
/// required to execute the instructions in the trace if they were all		/// required to execute the instructions in the trace if they were all
/// independent, exposing the maximum instruction-level parallelism.		/// independent, exposing the maximum instruction-level parallelism.
///		///
/// Any blocks in Extrablocks are included as if they were part of the		/// Any blocks in Extrablocks are included as if they were part of the
/// trace. Likewise, extra resources required by the specified scheduling		/// trace. Likewise, extra resources required by the specified scheduling
/// classes are included. For the caller to account for extra machine		/// classes are included. For the caller to account for extra machine
/// instructions, it must first resolve each instruction's scheduling class.		/// instructions, it must first resolve each instruction's scheduling class.
unsigned getResourceLength(		unsigned getResourceLength(
ArrayRef<const MachineBasicBlock*> Extrablocks = None,		ArrayRef<const MachineBasicBlock *> Extrablocks = None,
ArrayRef<const MCSchedClassDesc*> ExtraInstrs = None) const;		ArrayRef<const MCSchedClassDesc *> ExtraInstrs = None,
		ArrayRef<const MCSchedClassDesc *> RemoveInstrs = None) const;

/// Return the length of the (data dependency) critical path through the		/// Return the length of the (data dependency) critical path through the
/// trace.		/// trace.
unsigned getCriticalPath() const { return TBI.CriticalPath; }		unsigned getCriticalPath() const { return TBI.CriticalPath; }

/// Return the depth and height of MI. The depth is only valid for		/// Return the depth and height of MI. The depth is only valid for
/// instructions in or above the trace center block. The height is only		/// instructions in or above the trace center block. The height is only
/// valid for instructions in or below the trace center block.		/// valid for instructions in or below the trace center block.
InstrCycles getInstrCycles(const MachineInstr *MI) const {		InstrCycles getInstrCycles(const MachineInstr *MI) const {
return TE.Cycles.lookup(MI);		return TE.Cycles.lookup(MI);
}		}

/// Return the slack of MI. This is the number of cycles MI can be delayed		/// Return the slack of MI. This is the number of cycles MI can be delayed
/// before the critical path becomes longer.		/// before the critical path becomes longer.
/// MI must be an instruction in the trace center block.		/// MI must be an instruction in the trace center block.
unsigned getInstrSlack(const MachineInstr *MI) const;		unsigned getInstrSlack(const MachineInstr *MI) const;

/// Return the Depth of a PHI instruction in a trace center block successor.		/// Return the Depth of a PHI instruction in a trace center block successor.
/// The PHI does not have to be part of the trace.		/// The PHI does not have to be part of the trace.
unsigned getPHIDepth(const MachineInstr *PHI) const;		unsigned getPHIDepth(const MachineInstr *PHI) const;

		/// A dependence is useful if the basic block of the defining instruction
		/// is part of the trace of the user instruction. It is assumed that DefMI
		/// dominates UseMI (see also isUsefulDominator).
		bool isDepInTrace(const MachineInstr *DefMI,
		const MachineInstr *UseMI) const;
};		};

/// A trace ensemble is a collection of traces selected using the same		/// A trace ensemble is a collection of traces selected using the same
/// strategy, for example 'minimum resource height'. There is one trace for		/// strategy, for example 'minimum resource height'. There is one trace for
/// every block in the function.		/// every block in the function.
class Ensemble {		class Ensemble {
SmallVector<TraceBlockInfo, 4> BlockInfo;		SmallVector<TraceBlockInfo, 4> BlockInfo;
DenseMap<const MachineInstr*, InstrCycles> Cycles;		DenseMap<const MachineInstr*, InstrCycles> Cycles;
▲ Show 20 Lines • Show All 95 Lines • Show Last 20 Lines

include/llvm/CodeGen/Passes.h

Show First 20 Lines • Show All 483 Lines • ▼ Show 20 Lines	/// MachineDominanaceFrontier - This pass is a machine dominators analysis pass.
/// MachineTraceMetrics - This pass computes critical path and CPU resource		/// MachineTraceMetrics - This pass computes critical path and CPU resource
/// usage in an ensemble of traces.		/// usage in an ensemble of traces.
extern char &MachineTraceMetricsID;		extern char &MachineTraceMetricsID;

/// EarlyIfConverter - This pass performs if-conversion on SSA form by		/// EarlyIfConverter - This pass performs if-conversion on SSA form by
/// inserting cmov instructions.		/// inserting cmov instructions.
extern char &EarlyIfConverterID;		extern char &EarlyIfConverterID;

		/// This pass performs instruction combining using trace metrics to estimate
		/// critical-path and resource depth.
		extern char &MachineCombinerID;

/// StackSlotColoring - This pass performs stack coloring and merging.		/// StackSlotColoring - This pass performs stack coloring and merging.
/// It merges disjoint allocas to reduce the stack size.		/// It merges disjoint allocas to reduce the stack size.
extern char &StackColoringID;		extern char &StackColoringID;

/// IfConverter - This pass performs machine code if conversion.		/// IfConverter - This pass performs machine code if conversion.
extern char &IfConverterID;		extern char &IfConverterID;

/// MachineBlockPlacement - This pass places basic blocks based on branch		/// MachineBlockPlacement - This pass places basic blocks based on branch
▲ Show 20 Lines • Show All 116 Lines • Show Last 20 Lines

include/llvm/CodeGen/TargetSchedule.h

Show First 20 Lines • Show All 161 Lines • ▼ Show 20 Lines	#endif
/// occasionally useful to help estimate instruction cost.		/// occasionally useful to help estimate instruction cost.
///		///
/// If UseDefaultDefLatency is false and no new machine sched model is		/// If UseDefaultDefLatency is false and no new machine sched model is
/// present this method falls back to TII->getInstrLatency with an empty		/// present this method falls back to TII->getInstrLatency with an empty
/// instruction itinerary (this is so we preserve the previous behavior of the		/// instruction itinerary (this is so we preserve the previous behavior of the
/// if converter after moving it to TargetSchedModel).		/// if converter after moving it to TargetSchedModel).
unsigned computeInstrLatency(const MachineInstr *MI,		unsigned computeInstrLatency(const MachineInstr *MI,
bool UseDefaultDefLatency = true) const;		bool UseDefaultDefLatency = true) const;
		unsigned computeInstrLatency(unsigned Opcode) const;

/// \brief Output dependency latency of a pair of defs of the same register.		/// \brief Output dependency latency of a pair of defs of the same register.
///		///
/// This is typically one cycle.		/// This is typically one cycle.
unsigned computeOutputLatency(const MachineInstr *DefMI, unsigned DefIdx,		unsigned computeOutputLatency(const MachineInstr *DefMI, unsigned DefIdx,
const MachineInstr *DepMI) const;		const MachineInstr *DepMI) const;
};		};

} // namespace llvm		} // namespace llvm

#endif		#endif

include/llvm/InitializePasses.h

	Show First 20 Lines • Show All 271 Lines • ▼ Show 20 Lines
	void initializeInstSimplifierPass(PassRegistry&);			void initializeInstSimplifierPass(PassRegistry&);
	void initializeUnpackMachineBundlesPass(PassRegistry&);			void initializeUnpackMachineBundlesPass(PassRegistry&);
	void initializeFinalizeMachineBundlesPass(PassRegistry&);			void initializeFinalizeMachineBundlesPass(PassRegistry&);
	void initializeLoopVectorizePass(PassRegistry&);			void initializeLoopVectorizePass(PassRegistry&);
	void initializeSLPVectorizerPass(PassRegistry&);			void initializeSLPVectorizerPass(PassRegistry&);
	void initializeBBVectorizePass(PassRegistry&);			void initializeBBVectorizePass(PassRegistry&);
	void initializeMachineFunctionPrinterPassPass(PassRegistry&);			void initializeMachineFunctionPrinterPassPass(PassRegistry&);
	void initializeStackMapLivenessPass(PassRegistry&);			void initializeStackMapLivenessPass(PassRegistry&);
				void initializeMachineCombinerPass(PassRegistry &);
	void initializeLoadCombinePass(PassRegistry&);			void initializeLoadCombinePass(PassRegistry&);
	}			}

	#endif			#endif

include/llvm/Target/TargetInstrInfo.h

Show All 9 Lines
// This file describes the target machine instruction set to the code generator.		// This file describes the target machine instruction set to the code generator.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_TARGET_TARGETINSTRINFO_H		#ifndef LLVM_TARGET_TARGETINSTRINFO_H
#define LLVM_TARGET_TARGETINSTRINFO_H		#define LLVM_TARGET_TARGETINSTRINFO_H

#include "llvm/ADT/SmallSet.h"		#include "llvm/ADT/SmallSet.h"
		#include "llvm/ADT/DenseMap.h"
#include "llvm/CodeGen/DFAPacketizer.h"		#include "llvm/CodeGen/DFAPacketizer.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
		#include "llvm/CodeGen/MachineCombinerPattern.h"
#include "llvm/MC/MCInstrInfo.h"		#include "llvm/MC/MCInstrInfo.h"
		#include "llvm/Target/TargetRegisterInfo.h"

namespace llvm {		namespace llvm {

class InstrItineraryData;		class InstrItineraryData;
class LiveVariables;		class LiveVariables;
class MCAsmInfo;		class MCAsmInfo;
class MachineMemOperand;		class MachineMemOperand;
class MachineRegisterInfo;		class MachineRegisterInfo;
▲ Show 20 Lines • Show All 529 Lines • ▼ Show 20 Lines	public:

/// foldMemoryOperand - Same as the previous version except it allows folding		/// foldMemoryOperand - Same as the previous version except it allows folding
/// of any load and store from / to any address, not just from a specific		/// of any load and store from / to any address, not just from a specific
/// stack slot.		/// stack slot.
MachineInstr* foldMemoryOperand(MachineBasicBlock::iterator MI,		MachineInstr* foldMemoryOperand(MachineBasicBlock::iterator MI,
const SmallVectorImpl<unsigned> &Ops,		const SmallVectorImpl<unsigned> &Ops,
MachineInstr* LoadMI) const;		MachineInstr* LoadMI) const;

		/// hasPattern - return true when there is potentially a faster code sequence
		/// for an instruction chain ending in \p Root. All potential patterns are
		/// returned in the \p Pattern vector. Pattern should be sorted in priority
		/// order since the pattern evaluator stops checking as soon as it finds a
		/// faster sequence.
		/// \param Root - Instruction that could be combined with one of its operands
		/// \param Pattern - Vector of possible combination pattern

		virtual bool hasPattern(
		MachineInstr &Root,
		SmallVectorImpl<MachineCombinerPattern::MC_PATTERN> &Pattern) const {
		return false;
		}

		/// genAlternativeCodeSequence - when hasPattern() finds a pattern this
		/// function generates the instructions that could replace the original code
		/// sequence. The client has to decide whether the actual replacement is
		/// beneficial or not.
		/// \param Root - Instruction that could be combined with one of its operands
		/// \param P - Combination pattern for Root
		/// \param InsInstr - Vector of new instructions that implement P
		/// \param DelInstr - Old instructions, including Root, that could be replaced
		/// by InsInstr
		/// \param InstrIdxForVirtReg - map of virtual register to instruction in
		/// InsInstr that defines it
		virtual void genAlternativeCodeSequence(
		MachineInstr &Root, MachineCombinerPattern::MC_PATTERN P,
		SmallVectorImpl<MachineInstr *> &InsInstrs,
		SmallVectorImpl<MachineInstr *> &DelInstrs,
		DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const {
		return;
		}

		/// alwaysCombine - return true when a given code sequence should always
		/// be replaced when it can be combined
		virtual bool alwaysCombine(void) const { return false; }

protected:		protected:
/// foldMemoryOperandImpl - Target-dependent implementation for		/// foldMemoryOperandImpl - Target-dependent implementation for
/// foldMemoryOperand. Target-independent code in foldMemoryOperand will		/// foldMemoryOperand. Target-independent code in foldMemoryOperand will
/// take care of adding a MachineMemOperand to the newly created instruction.		/// take care of adding a MachineMemOperand to the newly created instruction.
virtual MachineInstr* foldMemoryOperandImpl(MachineFunction &MF,		virtual MachineInstr* foldMemoryOperandImpl(MachineFunction &MF,
MachineInstr* MI,		MachineInstr* MI,
const SmallVectorImpl<unsigned> &Ops,		const SmallVectorImpl<unsigned> &Ops,
int FrameIndex) const {		int FrameIndex) const {
return nullptr;		return nullptr;
}		}

/// foldMemoryOperandImpl - Target-dependent implementation for		/// foldMemoryOperandImpl - Target-dependent implementation for
▲ Show 20 Lines • Show All 454 Lines • Show Last 20 Lines

lib/CodeGen/CMakeLists.txt

Show First 20 Lines • Show All 43 Lines • ▼ Show 20 Lines	add_llvm_library(LLVMCodeGen
LiveVariables.cpp		LiveVariables.cpp
LocalStackSlotAllocation.cpp		LocalStackSlotAllocation.cpp
MachineBasicBlock.cpp		MachineBasicBlock.cpp
MachineBlockFrequencyInfo.cpp		MachineBlockFrequencyInfo.cpp
MachineBlockPlacement.cpp		MachineBlockPlacement.cpp
MachineBranchProbabilityInfo.cpp		MachineBranchProbabilityInfo.cpp
MachineCSE.cpp		MachineCSE.cpp
MachineCodeEmitter.cpp		MachineCodeEmitter.cpp
		MachineCombiner.cpp
MachineCopyPropagation.cpp		MachineCopyPropagation.cpp
MachineDominators.cpp		MachineDominators.cpp
MachineDominanceFrontier.cpp		MachineDominanceFrontier.cpp
MachineFunction.cpp		MachineFunction.cpp
MachineFunctionAnalysis.cpp		MachineFunctionAnalysis.cpp
MachineFunctionPass.cpp		MachineFunctionPass.cpp
MachineFunctionPrinterPass.cpp		MachineFunctionPrinterPass.cpp
MachineInstr.cpp		MachineInstr.cpp
▲ Show 20 Lines • Show All 65 Lines • Show Last 20 Lines

lib/CodeGen/CodeGen.cpp

Show All 35 Lines	void llvm::initializeCodeGen(PassRegistry &Registry) {
initializeLiveIntervalsPass(Registry);		initializeLiveIntervalsPass(Registry);
initializeLiveStacksPass(Registry);		initializeLiveStacksPass(Registry);
initializeLiveVariablesPass(Registry);		initializeLiveVariablesPass(Registry);
initializeLocalStackSlotPassPass(Registry);		initializeLocalStackSlotPassPass(Registry);
initializeMachineBlockFrequencyInfoPass(Registry);		initializeMachineBlockFrequencyInfoPass(Registry);
initializeMachineBlockPlacementPass(Registry);		initializeMachineBlockPlacementPass(Registry);
initializeMachineBlockPlacementStatsPass(Registry);		initializeMachineBlockPlacementStatsPass(Registry);
initializeMachineCopyPropagationPass(Registry);		initializeMachineCopyPropagationPass(Registry);
		initializeMachineCombinerPass(Registry);
initializeMachineCSEPass(Registry);		initializeMachineCSEPass(Registry);
initializeMachineDominatorTreePass(Registry);		initializeMachineDominatorTreePass(Registry);
initializeMachinePostDominatorTreePass(Registry);		initializeMachinePostDominatorTreePass(Registry);
initializeMachineLICMPass(Registry);		initializeMachineLICMPass(Registry);
initializeMachineLoopInfoPass(Registry);		initializeMachineLoopInfoPass(Registry);
initializeMachineModuleInfoPass(Registry);		initializeMachineModuleInfoPass(Registry);
initializeMachineSchedulerPass(Registry);		initializeMachineSchedulerPass(Registry);
initializeMachineSinkingPass(Registry);		initializeMachineSinkingPass(Registry);
Show All 29 Lines

lib/CodeGen/MachineCombiner.cpp

This file was added.

				//===---- MachineCombiner.cpp - Instcombining on SSA form machine code ----===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// The machine combiner pass uses machine trace metrics to ensure the combined
				// instructions do not lengthen the critical path or the resource depth.
				//===----------------------------------------------------------------------===//
				#define DEBUG_TYPE "machine-combiner"

				#include "llvm/ADT/Statistic.h"
				#include "llvm/ADT/DenseMap.h"
				#include "llvm/CodeGen/MachineDominators.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineLoopInfo.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/CodeGen/MachineTraceMetrics.h"
				#include "llvm/CodeGen/Passes.h"
				#include "llvm/CodeGen/TargetSchedule.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Target/TargetInstrInfo.h"
				#include "llvm/Target/TargetRegisterInfo.h"
				#include "llvm/Target/TargetSubtargetInfo.h"

				using namespace llvm;

				STATISTIC(NumInstCombined, "Number of machineinst combined");

				namespace {
				class MachineCombiner : public MachineFunctionPass {
				const TargetInstrInfo *TII;
				const TargetRegisterInfo *TRI;
				const MCSchedModel *SchedModel;
				MachineRegisterInfo *MRI;
				MachineTraceMetrics *Traces;
				MachineTraceMetrics::Ensemble *MinInstr;

				TargetSchedModel TSchedModel;

				/// OptSize - True if optimizing for code size.
				bool OptSize;

				public:
				static char ID;
				MachineCombiner() : MachineFunctionPass(ID) {
				initializeMachineCombinerPass(*PassRegistry::getPassRegistry());
				}
				void getAnalysisUsage(AnalysisUsage &AU) const override;
				bool runOnMachineFunction(MachineFunction &MF) override;
				const char *getPassName() const override { return "Machine InstCombiner"; }

				private:
				bool doSubstitute(unsigned NewSize, unsigned OldSize);
				bool combineInstructions(MachineBasicBlock *);
				MachineInstr *getOperandDef(const MachineOperand &MO);
				unsigned getDepth(SmallVectorImpl<MachineInstr *> &InsInstrs,
				DenseMap<unsigned, unsigned> &InstrIdxForVirtReg,
				MachineTraceMetrics::Trace BlockTrace);
				unsigned getLatency(MachineInstr *Root, MachineInstr *NewRoot,
				MachineTraceMetrics::Trace BlockTrace);
				bool
				preservesCriticalPathLen(MachineBasicBlock *MBB, MachineInstr *Root,
				MachineTraceMetrics::Trace BlockTrace,
				SmallVectorImpl<MachineInstr *> &InsInstrs,
				DenseMap<unsigned, unsigned> &InstrIdxForVirtReg);
				bool preservesResourceLen(MachineBasicBlock *MBB,
				MachineTraceMetrics::Trace BlockTrace,
				SmallVectorImpl<MachineInstr *> &InsInstrs,
				SmallVectorImpl<MachineInstr *> &DelInstrs);
				void instr2instrSC(SmallVectorImpl<MachineInstr *> &Instrs,
				SmallVectorImpl<const MCSchedClassDesc *> &InstrsSC);
				};
				}

				char MachineCombiner::ID = 0;
				char &llvm::MachineCombinerID = MachineCombiner::ID;

				INITIALIZE_PASS_BEGIN(MachineCombiner, "machine-combiner",
				"Machine InstCombiner", false, false)
				INITIALIZE_PASS_DEPENDENCY(MachineTraceMetrics)
				INITIALIZE_PASS_END(MachineCombiner, "machine-combiner", "Machine InstCombiner",
				false, false)

				void MachineCombiner::getAnalysisUsage(AnalysisUsage &AU) const {
				AU.setPreservesCFG();
				AU.addPreserved<MachineDominatorTree>();
				AU.addPreserved<MachineLoopInfo>();
				AU.addRequired<MachineTraceMetrics>();
				AU.addPreserved<MachineTraceMetrics>();
				MachineFunctionPass::getAnalysisUsage(AU);
				}

				MachineInstr *MachineCombiner::getOperandDef(const MachineOperand &MO) {
				MachineInstr *DefInstr = nullptr;
				// We need a virtual register definition.
				if (MO.isReg() && TargetRegisterInfo::isVirtualRegister(MO.getReg()))
				DefInstr = MRI->getUniqueVRegDef(MO.getReg());
				// PHI's have no depth etc.
				if (DefInstr && DefInstr->isPHI())
				DefInstr = nullptr;
				return DefInstr;
				}

				/// getDepth - Computes depth of instructions in vector \InsInstr.
				///
				/// \param InsInstrs is a vector of machine instructions
				/// \param InstrIdxForVirtReg is a dense map of virtual register to index
				/// of defining machine instruction in \p InsInstrs
				/// \param BlockTrace is a trace of machine instructions
				///
				/// \returns Depth of last instruction in \InsInstrs ("NewRoot")
				unsigned
				MachineCombiner::getDepth(SmallVectorImpl<MachineInstr *> &InsInstrs,
				DenseMap<unsigned, unsigned> &InstrIdxForVirtReg,
				MachineTraceMetrics::Trace BlockTrace) {

				SmallVector<unsigned, 16> InstrDepth;

				// For each instruction in the new sequence compute the depth based on the
				// operands. Use the trace information when possible. For new operands which
				// are tracked in the InstrIdxForVirtReg map depth is looked up in InstrDepth
				for (auto *InstrPtr : InsInstrs) { // for each Use
				unsigned IDepth = 0;
				DEBUG(dbgs() << "NEW INSTR "; InstrPtr->dump(); dbgs() << "\n";);
				for (unsigned i = 0, e = InstrPtr->getNumOperands(); i != e; ++i) {
				const MachineOperand &MO = InstrPtr->getOperand(i);
				// Check for virtual register operand.
				if (!(MO.isReg() && TargetRegisterInfo::isVirtualRegister(MO.getReg())))
				continue;
				if (!MO.isUse())
				continue;
				unsigned DepthOp = 0;
				unsigned LatencyOp = 0;
				DenseMap<unsigned, unsigned>::iterator II =
				InstrIdxForVirtReg.find(MO.getReg());
				if (II != InstrIdxForVirtReg.end()) {
				hfinkel: If you use a SmallVector, you can replace the 16 here with InstrDepth.size().
				// Operand is new virtual register not in trace
				assert(II->second >= 0 && II->second < InstrDepth.size() &&
				"Bad Index");
				MachineInstr *DefInstr = InsInstrs[II->second];
				assert(DefInstr &&
				"There must be a definition for a new virtual register");
				DepthOp = InstrDepth[II->second];
				LatencyOp = TSchedModel.computeOperandLatency(
				DefInstr, DefInstr->findRegisterDefOperandIdx(MO.getReg()),
				InstrPtr, InstrPtr->findRegisterUseOperandIdx(MO.getReg()));
				} else {
				MachineInstr *DefInstr = getOperandDef(MO);
				if (DefInstr) {
				DepthOp = BlockTrace.getInstrCycles(DefInstr).Depth;
				LatencyOp = TSchedModel.computeOperandLatency(
				DefInstr, DefInstr->findRegisterDefOperandIdx(MO.getReg()),
				InstrPtr, InstrPtr->findRegisterUseOperandIdx(MO.getReg()));
				}
				}
				IDepth = std::max(IDepth, DepthOp + LatencyOp);
				}
				InstrDepth.push_back(IDepth);
				}
				unsigned NewRootIdx = InsInstrs.size() - 1;
				return InstrDepth[NewRootIdx];
				}
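The depth computation in getDepth above can be illustrated with a minimal standalone sketch. It is not part of the patch: `OperandSketch`, `InstrSketch` and `getDepthSketch` are hypothetical names, and the sketch assumes the depth and latency of every operand are already known instead of being queried from the trace and the scheduling model.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical operand description. The operand is defined either by an
// earlier instruction of the new sequence (DefIdx >= 0) or by an instruction
// that is already part of the trace (DefIdx == -1).
struct OperandSketch {
  int DefIdx;          // index into the new sequence, or -1
  unsigned TraceDepth; // depth of the defining instruction when DefIdx == -1
  unsigned Latency;    // def-to-use latency in cycles
};

// One new instruction, described only by its register use operands.
using InstrSketch = std::vector<OperandSketch>;

// Returns the depth of the last instruction of the sequence ("NewRoot"):
// each instruction's depth is the maximum over its operands of
// (depth of the definition + latency from definition to use).
unsigned getDepthSketch(const std::vector<InstrSketch> &Seq) {
  assert(!Seq.empty() && "need at least one new instruction");
  std::vector<unsigned> InstrDepth;
  for (const InstrSketch &Instr : Seq) {
    unsigned IDepth = 0;
    for (const OperandSketch &Op : Instr) {
      unsigned DepthOp = Op.TraceDepth;
      if (Op.DefIdx >= 0) {
        // Operand is defined inside the new sequence: reuse the depth that
        // was computed for that earlier instruction.
        assert(static_cast<std::size_t>(Op.DefIdx) < InstrDepth.size() &&
               "Bad Index");
        DepthOp = InstrDepth[Op.DefIdx];
      }
      IDepth = std::max(IDepth, DepthOp + Op.Latency);
    }
    InstrDepth.push_back(IDepth);
  }
  return InstrDepth.back();
}
```

In this sketch, a two-instruction sequence whose second instruction consumes the result of the first gets the depth of the longer of its two operand chains.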

				/// getLatency - Computes instruction latency as max of latency of defined
				/// operands
				///
				/// \param Root is a machine instruction that could be replaced by NewRoot.
				/// It is used to compute a more accurate latency information for NewRoot in
				/// case there is a dependent instruction in the same trace (\p BlockTrace)
				/// \param NewRoot is the instruction for which the latency is computed
				/// \param BlockTrace is a trace of machine instructions
				///
				/// \returns Latency of \p NewRoot
				unsigned MachineCombiner::getLatency(MachineInstr *Root, MachineInstr *NewRoot,
				MachineTraceMetrics::Trace BlockTrace) {

				// Check each definition in NewRoot and compute the latency
				unsigned NewRootLatency = 0;
				for (unsigned i = 0, e = NewRoot->getNumOperands(); i != e; ++i) {
				const MachineOperand &MO = NewRoot->getOperand(i);
				// Check for virtual register operand.
				if (!(MO.isReg() && TargetRegisterInfo::isVirtualRegister(MO.getReg())))
				continue;
				if (!MO.isDef())
				continue;
				// Get the first instruction that uses MO
				MachineRegisterInfo::reg_iterator RI = MRI->reg_begin(MO.getReg());
				RI++;
				MachineInstr *UseMO = RI->getParent();
				unsigned LatencyOp = 0;
				if (UseMO && BlockTrace.isDepInTrace(Root, UseMO)) {
				LatencyOp = TSchedModel.computeOperandLatency(
				NewRoot, NewRoot->findRegisterDefOperandIdx(MO.getReg()), UseMO,
				UseMO->findRegisterUseOperandIdx(MO.getReg()));
				} else {
				LatencyOp = TSchedModel.computeInstrLatency(NewRoot->getOpcode());
				}
				NewRootLatency = std::max(NewRootLatency, LatencyOp);
				}
				return NewRootLatency;
				}

				/// preservesCriticalPathLen - True when the new instruction sequence does not
				/// lengthen the critical path. The DAGCombine code sequence ends in MI
				/// (Machine Instruction) Root. The new code sequence ends in MI NewRoot. A
				/// necessary condition for the new sequence to replace the old sequence is that
				/// it cannot lengthen the critical path. This is decided by the formula
				/// (NewRootDepth + NewRootLatency) <= (RootDepth + RootLatency + RootSlack).
				/// The slack is the number of cycles Root can be delayed before the critical
				echristo: Can use a typedef or a SmallVectorImpl instead of writing the number all over the place. Should help with formatting.
				/// path becomes longer.
				bool MachineCombiner::preservesCriticalPathLen(
				MachineBasicBlock *MBB, MachineInstr *Root,
				MachineTraceMetrics::Trace BlockTrace,
				SmallVectorImpl<MachineInstr *> &InsInstrs,
				DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) {

				// NewRoot is the last instruction in the \p InsInstrs vector
				// Get depth and latency of NewRoot
				unsigned NewRootIdx = InsInstrs.size() - 1;
				MachineInstr *NewRoot = InsInstrs[NewRootIdx];
				unsigned NewRootDepth = getDepth(InsInstrs, InstrIdxForVirtReg, BlockTrace);
				unsigned NewRootLatency = getLatency(Root, NewRoot, BlockTrace);

				// Get depth, latency and slack of Root
				unsigned RootDepth = BlockTrace.getInstrCycles(Root).Depth;
				unsigned RootLatency = TSchedModel.computeInstrLatency(Root);
				unsigned RootSlack = BlockTrace.getInstrSlack(Root);

				DEBUG(dbgs() << "DEPENDENCE DATA FOR " << Root << "\n";
				dbgs() << " NewRootDepth: " << NewRootDepth
				<< " NewRootLatency: " << NewRootLatency << "\n";
				hfinkel: You don't need to repeat this comment.
				dbgs() << " RootDepth: " << RootDepth << " RootLatency: " << RootLatency
				<< " RootSlack: " << RootSlack << "\n";
				hfinkel: This loop is the same as the previous one, please make this a function.
				dbgs() << " NewRootDepth + NewRootLatency "
				<< NewRootDepth + NewRootLatency << "\n";
				dbgs() << " RootDepth + RootLatency + RootSlack "
				<< RootDepth + RootLatency + RootSlack << "\n";);

				/// True when the new sequence does not lengthen the critical path.
				return ((NewRootDepth + NewRootLatency) <=
				(RootDepth + RootLatency + RootSlack));
				}
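As a worked example of the inequality used by preservesCriticalPathLen, here is a minimal standalone sketch. It is not part of the patch: `PathInfo` and `preservesCriticalPathLenSketch` are hypothetical names, and the cycle counts are made up to mirror the MUL + ADD -> MADD case where the ADD sits on the critical path.

```cpp
// Hypothetical stand-in for the per-instruction numbers the pass reads from
// the machine trace and the scheduling model. All values are in cycles.
struct PathInfo {
  unsigned Depth;   // earliest cycle at which the instruction can issue
  unsigned Latency; // cycles until its result is available
  unsigned Slack;   // cycles it can be delayed before the critical path grows
};

// Mirrors the inequality from the patch:
// NewRootDepth + NewRootLatency <= RootDepth + RootLatency + RootSlack.
// The slack of NewRoot is not used by the formula.
constexpr bool preservesCriticalPathLenSketch(const PathInfo &Root,
                                              const PathInfo &NewRoot) {
  return NewRoot.Depth + NewRoot.Latency <=
         Root.Depth + Root.Latency + Root.Slack;
}

// ADD (Root) at depth 4 with latency 1 and no slack allows 5 cycles.
// A MADD (NewRoot) at depth 4 with latency 4 needs 8 cycles: rejected.
static_assert(!preservesCriticalPathLenSketch(PathInfo{4, 1, 0},
                                              PathInfo{4, 4, 0}),
              "MADD would lengthen the critical path");

// With 3 cycles of slack on the ADD the bound becomes 8 cycles: accepted.
static_assert(preservesCriticalPathLenSketch(PathInfo{4, 1, 3},
                                             PathInfo{4, 4, 0}),
              "slack absorbs the longer MADD latency");
```

The two `static_assert` lines show why slack matters: the same MADD is rejected when the ADD has no slack and accepted once the ADD can be delayed by three cycles.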

				/// helper routine to convert instructions into SC
				void MachineCombiner::instr2instrSC(
				SmallVectorImpl<MachineInstr *> &Instrs,
				SmallVectorImpl<const MCSchedClassDesc *> &InstrsSC) {
				for (auto *InstrPtr : Instrs) {
				unsigned Opc = InstrPtr->getOpcode();
				unsigned Idx = TII->get(Opc).getSchedClass();
				const MCSchedClassDesc *SC = SchedModel->getSchedClassDesc(Idx);
				InstrsSC.push_back(SC);
				}
				}
				/// preservesResourceLen - True when the new instructions do not increase
				/// resource length
				bool MachineCombiner::preservesResourceLen(
				MachineBasicBlock *MBB, MachineTraceMetrics::Trace BlockTrace,
				SmallVectorImpl<MachineInstr *> &InsInstrs,
				SmallVectorImpl<MachineInstr *> &DelInstrs) {

				// Compute current resource length

				ArrayRef<const MachineBasicBlock *> MBBarr(MBB);
				unsigned ResLenBeforeCombine = BlockTrace.getResourceLength(MBBarr);

				// Deal with SC rather than Instructions.
				SmallVector<const MCSchedClassDesc *, 16> InsInstrsSC;
				SmallVector<const MCSchedClassDesc *, 16> DelInstrsSC;

				instr2instrSC(InsInstrs, InsInstrsSC);
				instr2instrSC(DelInstrs, DelInstrsSC);

				ArrayRef<const MCSchedClassDesc *> MSCInsArr = makeArrayRef(InsInstrsSC);
				ArrayRef<const MCSchedClassDesc *> MSCDelArr = makeArrayRef(DelInstrsSC);

				echristo: "beneficial"
				// Compute new resource length
				hfinkel: What "original code" are you referring to? Do you mean the code in DAGCombine?
				hfinkel: replace -> replaced
				unsigned ResLenAfterCombine =
				BlockTrace.getResourceLength(MBBarr, MSCInsArr, MSCDelArr);

				DEBUG(dbgs() << "RESOURCE DATA: \n";
				dbgs() << " resource len before: " << ResLenBeforeCombine
				<< " after: " << ResLenAfterCombine << "\n";);

				return ResLenAfterCombine <= ResLenBeforeCombine;
				}

				/// \returns true when the new instruction sequence should be generated
				/// regardless of whether it lengthens the critical path.
				bool MachineCombiner::doSubstitute(unsigned NewSize, unsigned OldSize) {
				if (OptSize && (NewSize < OldSize))
				return true;
				if (TII->alwaysCombine())
				hfinkel: This ordering should be mentioned in the header where genAlternativeCodeSequence is declared.
				return true;
				return false;
				}

				/// combineInstructions - substitute a slow code sequence with a faster one by
				/// evaluating instruction combining patterns.
				/// The prototype of such a pattern is MUL + ADD -> MADD. Performs instruction
				/// combining based on machine trace metrics. Only combine a sequence of
				/// instructions when this neither lengthens the critical path nor increases
				/// resource pressure. When optimizing for codesize always combine when the new
				/// sequence is shorter.
				bool MachineCombiner::combineInstructions(MachineBasicBlock *MBB) {
				bool Changed = false;
				DEBUG(dbgs() << "Combining MBB " << MBB->getName() << "\n");

				auto BlockIter = MBB->begin();

				while (BlockIter != MBB->end()) {
				auto &MI = *BlockIter++;

				echristo: This conditional is a little hard to read - some way to break it up/hoist things out?
				DEBUG(dbgs() << "INSTR "; MI.dump(); dbgs() << "\n";);
				SmallVector<MachineCombinerPattern::MC_PATTERN, 16> Pattern;
				// The motivating example is:
				//
				//     MUL   Other         MUL_op1 MUL_op2  Other
				//      \    /               \      |      /
				//      ADD/SUB      =>        MADD/MSUB
				//      (=Root)                (=NewRoot)
				echristo: Extra space.

				// The DAGCombine code always replaced MUL + ADD/SUB by MADD. While this is
				// usually beneficial for code size it unfortunately can hurt performance
				// when the ADD is on the critical path, but the MUL is not. With the
				// substitution the MUL becomes part of the critical path (in form of the
				// MADD) and can lengthen it on architectures where the MADD latency is
				// longer than the ADD latency.
				//
				// For each instruction we check if it can be the root of a combiner
				// pattern. Then for each pattern the new code sequence in form of MI is
				// generated and evaluated. When the efficiency criteria (don't lengthen
				// critical path, don't use more resources) are met the new sequence gets
				// hooked up into the basic block before the old sequence is removed.
				//
				// The algorithm does not try to evaluate all patterns and pick the best.
				// This is only an artificial restriction though. In practice there is
				// mostly one pattern and hasPattern() can order patterns based on an
				// internal cost heuristic.

				if (TII->hasPattern(MI, Pattern)) {
				for (auto P : Pattern) {
				SmallVector<MachineInstr *, 16> InsInstrs;
				SmallVector<MachineInstr *, 16> DelInstrs;
				DenseMap<unsigned, unsigned> InstrIdxForVirtReg;
				if (!MinInstr)
				MinInstr = Traces->getEnsemble(MachineTraceMetrics::TS_MinInstrCount);
				MachineTraceMetrics::Trace BlockTrace = MinInstr->getTrace(MBB);
				Traces->verifyAnalysis();
				TII->genAlternativeCodeSequence(MI, P, InsInstrs, DelInstrs,
				InstrIdxForVirtReg);
				// Found pattern, but did not generate alternative sequence.
				// This can happen e.g. when an immediate could not be materialized
				// in a single instruction.
				if (!InsInstrs.size())
				continue;
				// Substitute when we optimize for codesize and the new sequence has
				// fewer instructions OR
				// the new sequence neither lengthens the critical path nor increases
				// resource pressure.
				if (doSubstitute(InsInstrs.size(), DelInstrs.size()) ||
				(preservesCriticalPathLen(MBB, &MI, BlockTrace, InsInstrs,
				InstrIdxForVirtReg) &&
				preservesResourceLen(MBB, BlockTrace, InsInstrs, DelInstrs))) {
				for (auto *InstrPtr : InsInstrs)
				MBB->insert((MachineBasicBlock::iterator) & MI,
				(MachineInstr *)InstrPtr);
				for (auto *InstrPtr : DelInstrs)
				InstrPtr->eraseFromParent();

				Changed = true;
				++NumInstCombined;

				Traces->invalidate(MBB);
				Traces->verifyAnalysis();
				// Eagerly stop after the first pattern fired
				break;
				} else {
				// Cleanup instructions of the alternative code sequence. There is no
				// use for them.
				for (auto *InstrPtr : InsInstrs) {
				MachineFunction *MF = MBB->getParent();
				MF->DeleteMachineInstr((MachineInstr *)InstrPtr);
				}
				}
				InstrIdxForVirtReg.clear();
				}
				}
				}

				return Changed;
				}
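The substitution condition in combineInstructions combines doSubstitute with the two preserves* checks. A minimal standalone sketch of that decision, assuming the results of the individual checks are already available (`CandidateSketch` and `shouldCombineSketch` are hypothetical names, not part of the patch):

```cpp
// Hypothetical summary of one candidate replacement. In the patch these
// values come from InsInstrs/DelInstrs and from the two preserves* checks.
struct CandidateSketch {
  unsigned NewSize;           // number of instructions in the new sequence
  unsigned OldSize;           // number of instructions being replaced
  bool PreservesCriticalPath; // result of the critical path check
  bool PreservesResourceLen;  // result of the resource length check
};

// Substitute when optimizing for size and the new sequence is shorter, when
// the target asks to always combine, or when neither the critical path nor
// the resource length gets worse.
constexpr bool shouldCombineSketch(bool OptSize, bool AlwaysCombine,
                                   const CandidateSketch &C) {
  return (OptSize && C.NewSize < C.OldSize) || AlwaysCombine ||
         (C.PreservesCriticalPath && C.PreservesResourceLen);
}

// Optimizing for size: a shorter sequence wins even if it is slower.
static_assert(shouldCombineSketch(true, false,
                                  CandidateSketch{1, 2, false, false}),
              "size win is enough under OptSize");

// Otherwise both performance criteria must hold.
static_assert(!shouldCombineSketch(false, false,
                                   CandidateSketch{1, 2, false, true}),
              "longer critical path blocks the combine");
```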

				bool MachineCombiner::runOnMachineFunction(MachineFunction &MF) {
				TII = MF.getTarget().getInstrInfo();
				TRI = MF.getTarget().getRegisterInfo();
				const TargetSubtargetInfo &STI =
				MF.getTarget().getSubtarget<TargetSubtargetInfo>();
				SchedModel = STI.getSchedModel();
				TSchedModel.init(*SchedModel, &STI, TII);
				MRI = &MF.getRegInfo();
				Traces = &getAnalysis<MachineTraceMetrics>();
				MinInstr = 0;

				OptSize = MF.getFunction()->getAttributes().hasAttribute(
				AttributeSet::FunctionIndex, Attribute::OptimizeForSize);

				DEBUG(dbgs() << getPassName() << ": " << MF.getName() << '\n');
				if (!TSchedModel.hasInstrSchedModel()) {
				DEBUG(dbgs() << " Skipping pass: no machine model available\n");
				return false;
				silviu.baranga: Would it be better to always combine if there is no machine model (for example in case the cpu is not specified)? If we bail out here, we would no longer generate the patterns that are handled by this pass.
				}

				bool Changed = false;

				// Try to combine instructions.
				for (auto &MBB : MF)
				Changed |= combineInstructions(&MBB);

				return Changed;
				}

lib/CodeGen/MachineScheduler.cpp

[34 lines not shown]

#define DEBUG_TYPE "misched"		#define DEBUG_TYPE "misched"

namespace llvm {		namespace llvm {
cl::opt<bool> ForceTopDown("misched-topdown", cl::Hidden,		cl::opt<bool> ForceTopDown("misched-topdown", cl::Hidden,
cl::desc("Force top-down list scheduling"));		cl::desc("Force top-down list scheduling"));
cl::opt<bool> ForceBottomUp("misched-bottomup", cl::Hidden,		cl::opt<bool> ForceBottomUp("misched-bottomup", cl::Hidden,
cl::desc("Force bottom-up list scheduling"));		cl::desc("Force bottom-up list scheduling"));
		cl::opt<bool>
		DumpCriticalPathLength("misched-dcpl", cl::Hidden,
		cl::desc("Print critical path length to stdout"));
}		}

#ifndef NDEBUG		#ifndef NDEBUG
static cl::opt<bool> ViewMISchedDAGs("view-misched-dags", cl::Hidden,		static cl::opt<bool> ViewMISchedDAGs("view-misched-dags", cl::Hidden,
cl::desc("Pop up a window to show MISched dags after they are processed"));		cl::desc("Pop up a window to show MISched dags after they are processed"));

static cl::opt<unsigned> MISchedCutoff("misched-cutoff", cl::Hidden,		static cl::opt<unsigned> MISchedCutoff("misched-cutoff", cl::Hidden,
cl::desc("Stop scheduling after N instructions"), cl::init(~0U));		cl::desc("Stop scheduling after N instructions"), cl::init(~0U));
[395 lines not shown]	for(MachineBasicBlock::iterator RegionEnd = MBB->end();
<< "MI Scheduling **********\n");		<< "MI Scheduling **********\n");
DEBUG(dbgs() << MF->getName()		DEBUG(dbgs() << MF->getName()
<< ":BB#" << MBB->getNumber() << " " << MBB->getName()		<< ":BB#" << MBB->getNumber() << " " << MBB->getName()
<< "\n From: " << *I << " To: ";		<< "\n From: " << *I << " To: ";
if (RegionEnd != MBB->end()) dbgs() << *RegionEnd;		if (RegionEnd != MBB->end()) dbgs() << *RegionEnd;
else dbgs() << "End";		else dbgs() << "End";
dbgs() << " RegionInstrs: " << NumRegionInstrs		dbgs() << " RegionInstrs: " << NumRegionInstrs
<< " Remaining: " << RemainingInstrs << "\n");		<< " Remaining: " << RemainingInstrs << "\n");
		if (DumpCriticalPathLength) {
		errs() << MF->getName();
		errs() << ":BB# " << MBB->getNumber();
		errs() << " " << MBB->getName() << " \n";
		}

// Schedule a region: possibly reorder instructions.		// Schedule a region: possibly reorder instructions.
// This invalidates 'RegionEnd' and 'I'.		// This invalidates 'RegionEnd' and 'I'.
Scheduler.schedule();		Scheduler.schedule();

// Close the current region.		// Close the current region.
Scheduler.exitRegion();		Scheduler.exitRegion();

[1,993 lines not shown]	void GenericScheduler::registerRoots() {
Rem.CriticalPath = DAG->ExitSU.getDepth();		Rem.CriticalPath = DAG->ExitSU.getDepth();

// Some roots may not feed into ExitSU. Check all of them in case.		// Some roots may not feed into ExitSU. Check all of them in case.
for (std::vector<SUnit*>::const_iterator		for (std::vector<SUnit*>::const_iterator
I = Bot.Available.begin(), E = Bot.Available.end(); I != E; ++I) {		I = Bot.Available.begin(), E = Bot.Available.end(); I != E; ++I) {
if ((*I)->getDepth() > Rem.CriticalPath)		if ((*I)->getDepth() > Rem.CriticalPath)
Rem.CriticalPath = (*I)->getDepth();		Rem.CriticalPath = (*I)->getDepth();
}		}
DEBUG(dbgs() << "Critical Path: " << Rem.CriticalPath << '\n');		DEBUG(dbgs() << "Critical Path(GS-RR ): " << Rem.CriticalPath << '\n');
		if (DumpCriticalPathLength) {
		errs() << "Critical Path(GS-RR ): " << Rem.CriticalPath << " \n";
		}

if (EnableCyclicPath) {		if (EnableCyclicPath) {
Rem.CyclicCritPath = DAG->computeCyclicCriticalPath();		Rem.CyclicCritPath = DAG->computeCyclicCriticalPath();
checkAcyclicLatency();		checkAcyclicLatency();
}		}
}		}

static bool tryPressure(const PressureChange &TryP,		static bool tryPressure(const PressureChange &TryP,
[425 lines not shown]	void PostGenericScheduler::registerRoots() {
Rem.CriticalPath = DAG->ExitSU.getDepth();		Rem.CriticalPath = DAG->ExitSU.getDepth();

// Some roots may not feed into ExitSU. Check all of them in case.		// Some roots may not feed into ExitSU. Check all of them in case.
for (SmallVectorImpl<SUnit*>::const_iterator		for (SmallVectorImpl<SUnit*>::const_iterator
I = BotRoots.begin(), E = BotRoots.end(); I != E; ++I) {		I = BotRoots.begin(), E = BotRoots.end(); I != E; ++I) {
if ((*I)->getDepth() > Rem.CriticalPath)		if ((*I)->getDepth() > Rem.CriticalPath)
Rem.CriticalPath = (*I)->getDepth();		Rem.CriticalPath = (*I)->getDepth();
}		}
DEBUG(dbgs() << "Critical Path: " << Rem.CriticalPath << '\n');		DEBUG(dbgs() << "Critical Path: (PGS-RR) " << Rem.CriticalPath << '\n');
		if (DumpCriticalPathLength) {
		errs() << "Critical Path(PGS-RR ): " << Rem.CriticalPath << " \n";
		}
}		}

/// Apply a set of heuristics to a new candidate for PostRA scheduling.		/// Apply a set of heuristics to a new candidate for PostRA scheduling.
///		///
/// \param Cand provides the policy and current best candidate.		/// \param Cand provides the policy and current best candidate.
/// \param TryCand refers to the next SUnit candidate, otherwise uninitialized.		/// \param TryCand refers to the next SUnit candidate, otherwise uninitialized.
void PostGenericScheduler::tryCandidate(SchedCandidate &Cand,		void PostGenericScheduler::tryCandidate(SchedCandidate &Cand,
SchedCandidate &TryCand) {		SchedCandidate &TryCand) {
[381 lines not shown]

lib/CodeGen/MachineTraceMetrics.cpp

[1,163 lines not shown]	MachineTraceMetrics::Trace::getPHIDepth(const MachineInstr *PHI) const {
unsigned DepCycle = getInstrCycles(Dep.DefMI).Depth;		unsigned DepCycle = getInstrCycles(Dep.DefMI).Depth;
// Add latency if DefMI is a real instruction. Transients get latency 0.		// Add latency if DefMI is a real instruction. Transients get latency 0.
if (!Dep.DefMI->isTransient())		if (!Dep.DefMI->isTransient())
DepCycle += TE.MTM.SchedModel		DepCycle += TE.MTM.SchedModel
.computeOperandLatency(Dep.DefMI, Dep.DefOp, PHI, Dep.UseOp);		.computeOperandLatency(Dep.DefMI, Dep.DefOp, PHI, Dep.UseOp);
return DepCycle;		return DepCycle;
}		}

		/// When bottom is set include instructions in current block in estimate.
unsigned MachineTraceMetrics::Trace::getResourceDepth(bool Bottom) const {		unsigned MachineTraceMetrics::Trace::getResourceDepth(bool Bottom) const {
// Find the limiting processor resource.		// Find the limiting processor resource.
// Numbers have been pre-scaled to be comparable.		// Numbers have been pre-scaled to be comparable.
unsigned PRMax = 0;		unsigned PRMax = 0;
ArrayRef<unsigned> PRDepths = TE.getProcResourceDepths(getBlockNum());		ArrayRef<unsigned> PRDepths = TE.getProcResourceDepths(getBlockNum());
if (Bottom) {		if (Bottom) {
ArrayRef<unsigned> PRCycles = TE.MTM.getProcResourceCycles(getBlockNum());		ArrayRef<unsigned> PRCycles = TE.MTM.getProcResourceCycles(getBlockNum());
for (unsigned K = 0; K != PRDepths.size(); ++K)		for (unsigned K = 0; K != PRDepths.size(); ++K)
PRMax = std::max(PRMax, PRDepths[K] + PRCycles[K]);		PRMax = std::max(PRMax, PRDepths[K] + PRCycles[K]);
} else {		} else {
for (unsigned K = 0; K != PRDepths.size(); ++K)		for (unsigned K = 0; K != PRDepths.size(); ++K)
PRMax = std::max(PRMax, PRDepths[K]);		PRMax = std::max(PRMax, PRDepths[K]);
}		}
// Convert to cycle count.		// Convert to cycle count.
PRMax = TE.MTM.getCycles(PRMax);		PRMax = TE.MTM.getCycles(PRMax);

		/// All instructions before current block
unsigned Instrs = TBI.InstrDepth;		unsigned Instrs = TBI.InstrDepth;
		// plus instructions in current block
if (Bottom)		if (Bottom)
Instrs += TE.MTM.BlockInfo[getBlockNum()].InstrCount;		Instrs += TE.MTM.BlockInfo[getBlockNum()].InstrCount;
if (unsigned IW = TE.MTM.SchedModel.getIssueWidth())		if (unsigned IW = TE.MTM.SchedModel.getIssueWidth())
Instrs /= IW;		Instrs /= IW;
// Assume issue width 1 without a schedule model.		// Assume issue width 1 without a schedule model.
return std::max(Instrs, PRMax);		return std::max(Instrs, PRMax);
}		}

		unsigned MachineTraceMetrics::Trace::getResourceLength(
unsigned MachineTraceMetrics::Trace::		ArrayRef<const MachineBasicBlock *> Extrablocks,
getResourceLength(ArrayRef<const MachineBasicBlock*> Extrablocks,		ArrayRef<const MCSchedClassDesc *> ExtraInstrs,
ArrayRef<const MCSchedClassDesc*> ExtraInstrs) const {		ArrayRef<const MCSchedClassDesc *> RemoveInstrs) const {
// Add up resources above and below the center block.		// Add up resources above and below the center block.
ArrayRef<unsigned> PRDepths = TE.getProcResourceDepths(getBlockNum());		ArrayRef<unsigned> PRDepths = TE.getProcResourceDepths(getBlockNum());
ArrayRef<unsigned> PRHeights = TE.getProcResourceHeights(getBlockNum());		ArrayRef<unsigned> PRHeights = TE.getProcResourceHeights(getBlockNum());
unsigned PRMax = 0;		unsigned PRMax = 0;
for (unsigned K = 0; K != PRDepths.size(); ++K) {
unsigned PRCycles = PRDepths[K] + PRHeights[K];		// Capture computing cycles from extra instructions
for (unsigned I = 0; I != Extrablocks.size(); ++I)		auto extraCycles = [this](ArrayRef<const MCSchedClassDesc *> Instrs,
PRCycles += TE.MTM.getProcResourceCycles(Extrablocks[I]->getNumber())[K];		unsigned ResourceIdx)
for (unsigned I = 0; I != ExtraInstrs.size(); ++I) {		->unsigned {
const MCSchedClassDesc* SC = ExtraInstrs[I];		unsigned Cycles = 0;
		for (unsigned I = 0; I != Instrs.size(); ++I) {
		const MCSchedClassDesc *SC = Instrs[I];
if (!SC->isValid())		if (!SC->isValid())
continue;		continue;
for (TargetSchedModel::ProcResIter		for (TargetSchedModel::ProcResIter
PI = TE.MTM.SchedModel.getWriteProcResBegin(SC),		PI = TE.MTM.SchedModel.getWriteProcResBegin(SC),
PE = TE.MTM.SchedModel.getWriteProcResEnd(SC); PI != PE; ++PI) {		PE = TE.MTM.SchedModel.getWriteProcResEnd(SC);
if (PI->ProcResourceIdx != K)		PI != PE; ++PI) {
		if (PI->ProcResourceIdx != ResourceIdx)
continue;		continue;
PRCycles += (PI->Cycles * TE.MTM.SchedModel.getResourceFactor(K));		Cycles +=
		(PI->Cycles * TE.MTM.SchedModel.getResourceFactor(ResourceIdx));
}		}
}		}
		return Cycles;
		silviu.baranga: This duplicates (almost) the code above. I think these should be merged.
		};

		for (unsigned K = 0; K != PRDepths.size(); ++K) {
		unsigned PRCycles = PRDepths[K] + PRHeights[K];
		for (unsigned I = 0; I != Extrablocks.size(); ++I)
		PRCycles += TE.MTM.getProcResourceCycles(Extrablocks[I]->getNumber())[K];
		PRCycles += extraCycles(ExtraInstrs, K);
		PRCycles -= extraCycles(RemoveInstrs, K);
PRMax = std::max(PRMax, PRCycles);		PRMax = std::max(PRMax, PRCycles);
}		}
// Convert to cycle count.		// Convert to cycle count.
PRMax = TE.MTM.getCycles(PRMax);		PRMax = TE.MTM.getCycles(PRMax);

		// Instrs: #instructions in current trace outside current block.
unsigned Instrs = TBI.InstrDepth + TBI.InstrHeight;		unsigned Instrs = TBI.InstrDepth + TBI.InstrHeight;
		// Add instruction count from the extra blocks.
for (unsigned i = 0, e = Extrablocks.size(); i != e; ++i)		for (unsigned i = 0, e = Extrablocks.size(); i != e; ++i)
Instrs += TE.MTM.getResources(Extrablocks[i])->InstrCount;		Instrs += TE.MTM.getResources(Extrablocks[i])->InstrCount;
		Instrs += ExtraInstrs.size();
		silviu.baranga: This FIXME comment now seems confusing since you already added the fix.
		Instrs -= RemoveInstrs.size();
if (unsigned IW = TE.MTM.SchedModel.getIssueWidth())		if (unsigned IW = TE.MTM.SchedModel.getIssueWidth())
Instrs /= IW;		Instrs /= IW;
		hfinkel: Exactly what are you proposing?
// Assume issue width 1 without a schedule model.		// Assume issue width 1 without a schedule model.
return std::max(Instrs, PRMax);		return std::max(Instrs, PRMax);
}		}
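The extended getResourceLength adds the cycles of the inserted instructions and subtracts those of the removed ones before taking the maximum over all processor resources. A simplified standalone sketch of that arithmetic follows. It is not the patch's code: `resourceLengthSketch` is a hypothetical name, cycles are assumed to be plain cycle counts (the patch scales them by a resource factor and converts back with getCycles), and extra blocks are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// TraceCycles[K]   : cycles resource K is busy in the current trace
// AddedCycles[K]   : cycles the inserted instructions need on resource K
// RemovedCycles[K] : cycles the deleted instructions used on resource K
unsigned resourceLengthSketch(const std::vector<unsigned> &TraceCycles,
                              const std::vector<unsigned> &AddedCycles,
                              const std::vector<unsigned> &RemovedCycles,
                              unsigned TraceInstrs, unsigned AddedInstrs,
                              unsigned RemovedInstrs, unsigned IssueWidth) {
  assert(TraceCycles.size() == AddedCycles.size() &&
         TraceCycles.size() == RemovedCycles.size() &&
         "one entry per processor resource");
  // Find the limiting processor resource after the substitution.
  unsigned PRMax = 0;
  for (std::size_t K = 0; K != TraceCycles.size(); ++K) {
    unsigned Cycles = TraceCycles[K] + AddedCycles[K];
    assert(Cycles >= RemovedCycles[K] && "removing more than is present");
    PRMax = std::max(PRMax, Cycles - RemovedCycles[K]);
  }
  // Instruction count bound, limited by the issue width when it is known.
  assert(TraceInstrs + AddedInstrs >= RemovedInstrs &&
         "removing more instructions than are present");
  unsigned Instrs = TraceInstrs + AddedInstrs - RemovedInstrs;
  if (IssueWidth)
    Instrs /= IssueWidth;
  return std::max(Instrs, PRMax);
}
```

With made-up numbers, replacing a MUL and an ADD by one MADD on a two-resource, dual-issue machine might be evaluated by calling the function once with empty additions and removals (the "before" length) and once with the MADD added and MUL/ADD removed (the "after" length), then comparing the two results as preservesResourceLen does.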

		bool MachineTraceMetrics::Trace::isDepInTrace(const MachineInstr *DefMI,
		const MachineInstr *UseMI) const {
		if (DefMI->getParent() == UseMI->getParent())
		return true;

		const TraceBlockInfo &DepTBI = TE.BlockInfo[DefMI->getParent()->getNumber()];
		const TraceBlockInfo &TBI = TE.BlockInfo[UseMI->getParent()->getNumber()];

		return DepTBI.isUsefulDominator(TBI);
		}

void MachineTraceMetrics::Ensemble::print(raw_ostream &OS) const {		void MachineTraceMetrics::Ensemble::print(raw_ostream &OS) const {
OS << getName() << " ensemble:\n";		OS << getName() << " ensemble:\n";
for (unsigned i = 0, e = BlockInfo.size(); i != e; ++i) {		for (unsigned i = 0, e = BlockInfo.size(); i != e; ++i) {
OS << " BB#" << i << '\t';		OS << " BB#" << i << '\t';
BlockInfo[i].print(OS);		BlockInfo[i].print(OS);
OS << '\n';		OS << '\n';
}		}
}		}
[56 lines not shown]

lib/CodeGen/TargetSchedule.cpp

[219 lines not shown]	#ifndef NDEBUG
}		}
#endif		#endif
// FIXME: Automatically giving all implicit defs defaultDefLatency is		// FIXME: Automatically giving all implicit defs defaultDefLatency is
// undesirable. We should only do it for defs that are known to the MC		// undesirable. We should only do it for defs that are known to the MC
// desc like flags. Truly implicit defs should get 1 cycle latency.		// desc like flags. Truly implicit defs should get 1 cycle latency.
return DefMI->isTransient() ? 0 : TII->defaultDefLatency(&SchedModel, DefMI);		return DefMI->isTransient() ? 0 : TII->defaultDefLatency(&SchedModel, DefMI);
}		}

		unsigned TargetSchedModel::computeInstrLatency(unsigned Opcode) const {
		assert(hasInstrSchedModel() && "Only call this function with a SchedModel");

		unsigned SCIdx = TII->get(Opcode).getSchedClass();
		const MCSchedClassDesc *SCDesc = SchedModel.getSchedClassDesc(SCIdx);
		unsigned Latency = 0;

		if (SCDesc->isValid() && !SCDesc->isVariant()) {
		for (unsigned DefIdx = 0, DefEnd = SCDesc->NumWriteLatencyEntries;
		DefIdx != DefEnd; ++DefIdx) {
		// Lookup the definition's write latency in SubtargetInfo.
		const MCWriteLatencyEntry *WLEntry =
		STI->getWriteLatencyEntry(SCDesc, DefIdx);
		Latency = std::max(Latency, capLatency(WLEntry->Cycles));
		}
		return Latency;
		}

		assert(Latency && "No MI sched latency");
		return 0;
		}

unsigned		unsigned
TargetSchedModel::computeInstrLatency(const MachineInstr *MI,		TargetSchedModel::computeInstrLatency(const MachineInstr *MI,
bool UseDefaultDefLatency) const {		bool UseDefaultDefLatency) const {
// For the itinerary model, fall back to the old subtarget hook.		// For the itinerary model, fall back to the old subtarget hook.
// Allow subtargets to compute Bundle latencies outside the machine model.		// Allow subtargets to compute Bundle latencies outside the machine model.
if (hasInstrItineraries() || MI->isBundle() ||		if (hasInstrItineraries() || MI->isBundle() ||
(!hasInstrSchedModel() && !UseDefaultDefLatency))		(!hasInstrSchedModel() && !UseDefaultDefLatency))
return TII->getInstrLatency(&InstrItins, MI);		return TII->getInstrLatency(&InstrItins, MI);
[53 lines not shown]

lib/Target/AArch64/AArch64InstrFormats.td


[1,345 lines not shown]	class BaseMulAccum<bit isSub, bits<3> opc, RegisterClass multype,
let Inst{20-16} = Rm;		let Inst{20-16} = Rm;
let Inst{15} = isSub;		let Inst{15} = isSub;
let Inst{14-10} = Ra;		let Inst{14-10} = Ra;
let Inst{9-5} = Rn;		let Inst{9-5} = Rn;
let Inst{4-0} = Rd;		let Inst{4-0} = Rd;
}		}

multiclass MulAccum<bit isSub, string asm, SDNode AccNode> {		multiclass MulAccum<bit isSub, string asm, SDNode AccNode> {
		// MADD/MSUB generation is decided by MachineCombiner.cpp
def Wrrr : BaseMulAccum<isSub, 0b000, GPR32, GPR32, asm,		def Wrrr : BaseMulAccum<isSub, 0b000, GPR32, GPR32, asm,
[(set GPR32:$Rd, (AccNode GPR32:$Ra, (mul GPR32:$Rn, GPR32:$Rm)))]>,		[/*(set GPR32:$Rd, (AccNode GPR32:$Ra, (mul GPR32:$Rn, GPR32:$Rm)))*/]>,
		silviu.baranga: Doing this will disable MADD/MSUB generation for in-order cores (for example Cortex-A53). Maybe guard this with a predicate?
Sched<[WriteIM32, ReadIM, ReadIM, ReadIMA]> {		Sched<[WriteIM32, ReadIM, ReadIM, ReadIMA]> {
let Inst{31} = 0;		let Inst{31} = 0;
}		}

def Xrrr : BaseMulAccum<isSub, 0b000, GPR64, GPR64, asm,		def Xrrr : BaseMulAccum<isSub, 0b000, GPR64, GPR64, asm,
[(set GPR64:$Rd, (AccNode GPR64:$Ra, (mul GPR64:$Rn, GPR64:$Rm)))]>,		[/*(set GPR64:$Rd, (AccNode GPR64:$Ra, (mul GPR64:$Rn, GPR64:$Rm)))*/]>,
Sched<[WriteIM64, ReadIM, ReadIM, ReadIMA]> {		Sched<[WriteIM64, ReadIM, ReadIM, ReadIMA]> {
let Inst{31} = 1;		let Inst{31} = 1;
}		}
}		}

class WideMulAccum<bit isSub, bits<3> opc, string asm,		class WideMulAccum<bit isSub, bits<3> opc, string asm,
SDNode AccNode, SDNode ExtNode>		SDNode AccNode, SDNode ExtNode>
: BaseMulAccum<isSub, opc, GPR32, GPR64, asm,		: BaseMulAccum<isSub, opc, GPR32, GPR64, asm,
▲ Show 20 Lines • Show All 7,256 Lines • Show Last 20 Lines
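
For readers less familiar with the instructions this multiclass defines, here is a minimal C++ sketch of their arithmetic. It follows the operand order noted elsewhere in this diff (MSUB Wd,Wn,Wm,Wi computes Wi - Wn*Wm) and the fact that the combiner treats a MUL as a MADD whose accumulator is the zero register. The names `madd32`, `msub32` and `mul32` are illustrative only and are not LLVM APIs; the model covers nothing beyond 32-bit wraparound arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the 32-bit forms; not part of LLVM.
// MADD Wd, Wn, Wm, Wa computes Wd = Wa + Wn * Wm (mod 2^32).
inline uint32_t madd32(uint32_t N, uint32_t M, uint32_t A) {
  return A + N * M;
}

// MSUB Wd, Wn, Wm, Wa computes Wd = Wa - Wn * Wm (mod 2^32),
// that is, the product is subtracted from the accumulator.
inline uint32_t msub32(uint32_t N, uint32_t M, uint32_t A) {
  return A - N * M;
}

// MUL is modelled as MADD with a zero accumulator (WZR).
inline uint32_t mul32(uint32_t N, uint32_t M) {
  return madd32(N, M, 0);
}
```

Under this model, `mul32(n, m) + c` equals `madd32(n, m, c)` and `c - mul32(n, m)` equals `msub32(n, m, c)`, which is the equivalence the selection patterns commented out above used to encode.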

lib/Target/AArch64/AArch64InstrInfo.h

Show All 11 Lines
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_TARGET_AArch64INSTRINFO_H		#ifndef LLVM_TARGET_AArch64INSTRINFO_H
#define LLVM_TARGET_AArch64INSTRINFO_H		#define LLVM_TARGET_AArch64INSTRINFO_H

#include "AArch64.h"		#include "AArch64.h"
#include "AArch64RegisterInfo.h"		#include "AArch64RegisterInfo.h"
#include "llvm/Target/TargetInstrInfo.h"		#include "llvm/Target/TargetInstrInfo.h"
		#include "llvm/CodeGen/MachineCombinerPattern.h"

#define GET_INSTRINFO_HEADER		#define GET_INSTRINFO_HEADER
#include "AArch64GenInstrInfo.inc"		#include "AArch64GenInstrInfo.inc"

namespace llvm {		namespace llvm {

class AArch64Subtarget;		class AArch64Subtarget;
class AArch64TargetMachine;		class AArch64TargetMachine;
▲ Show 20 Lines • Show All 120 Lines • ▼ Show 20 Lines	public:
bool analyzeCompare(const MachineInstr *MI, unsigned &SrcReg,		bool analyzeCompare(const MachineInstr *MI, unsigned &SrcReg,
unsigned &SrcReg2, int &CmpMask,		unsigned &SrcReg2, int &CmpMask,
int &CmpValue) const override;		int &CmpValue) const override;
/// optimizeCompareInstr - Convert the instruction supplying the argument to		/// optimizeCompareInstr - Convert the instruction supplying the argument to
/// the comparison into one that sets the zero bit in the flags register.		/// the comparison into one that sets the zero bit in the flags register.
bool optimizeCompareInstr(MachineInstr *CmpInstr, unsigned SrcReg,		bool optimizeCompareInstr(MachineInstr *CmpInstr, unsigned SrcReg,
unsigned SrcReg2, int CmpMask, int CmpValue,		unsigned SrcReg2, int CmpMask, int CmpValue,
const MachineRegisterInfo *MRI) const override;		const MachineRegisterInfo *MRI) const override;
		/// hasPattern - return true when there is potentially a faster code sequence
		/// for an instruction chain ending in <Root>. All potential patterns are
		echristo: "All potential patterns are..."
		/// listed
		/// in the <Pattern> array.
		virtual bool hasPattern(
		MachineInstr &Root,
		SmallVectorImpl<MachineCombinerPattern::MC_PATTERN> &Pattern) const;

		/// genAlternativeCodeSequence - when hasPattern() finds a pattern
		/// this function generates the instructions that could replace the
		/// original code sequence
		virtual void genAlternativeCodeSequence(
		MachineInstr &Root, MachineCombinerPattern::MC_PATTERN P,
		SmallVectorImpl<MachineInstr *> &InsInstrs,
		SmallVectorImpl<MachineInstr *> &DelInstrs,
		DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const;

private:		private:
void instantiateCondBranch(MachineBasicBlock &MBB, DebugLoc DL,		void instantiateCondBranch(MachineBasicBlock &MBB, DebugLoc DL,
MachineBasicBlock *TBB,		MachineBasicBlock *TBB,
const SmallVectorImpl<MachineOperand> &Cond) const;		const SmallVectorImpl<MachineOperand> &Cond) const;
};		};

/// emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg		/// emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg
▲ Show 20 Lines • Show All 66 Lines • Show Last 20 Lines
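
The two hooks declared in this header split the work into "find candidate patterns" and "build the replacement". The following is a rough, self-contained sketch of that split using simplified stand-in types. `Instr`, `Opcode`, `Pattern` and `findDef` are hypothetical and are not LLVM classes, and the sketch leaves out the zero-register and single-use checks that the real implementation in this diff performs.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-ins for MachineInstr and MC_PATTERN (assumptions).
enum class Opcode { Mul, Add, Madd };
enum class Pattern { MulAddOp1, MulAddOp2 };

struct Instr {
  Opcode Opc;
  int Def;     // register defined by this instruction
  int Src[3];  // source registers, -1 when unused
};

// Return the instruction in Block that defines Reg, or nullptr.
const Instr *findDef(const std::vector<Instr> &Block, int Reg) {
  for (const Instr &I : Block)
    if (I.Def == Reg)
      return &I;
  return nullptr;
}

// Collect every pattern rooted at Root; true when at least one was found.
bool hasPattern(const std::vector<Instr> &Block, const Instr &Root,
                std::vector<Pattern> &Patterns) {
  if (Root.Opc != Opcode::Add)
    return false;
  bool Found = false;
  for (int Idx = 0; Idx < 2; ++Idx) {
    const Instr *Def = findDef(Block, Root.Src[Idx]);
    if (Def && Def->Opc == Opcode::Mul) {
      Patterns.push_back(Idx == 0 ? Pattern::MulAddOp1 : Pattern::MulAddOp2);
      Found = true;
    }
  }
  return Found;
}

// Build the replacement for pattern P and record what it would delete.
void genAlternativeCodeSequence(const std::vector<Instr> &Block,
                                const Instr &Root, Pattern P,
                                std::vector<Instr> &InsInstrs,
                                std::vector<Instr> &DelInstrs) {
  int MulIdx = (P == Pattern::MulAddOp1) ? 0 : 1;
  const Instr *Mul = findDef(Block, Root.Src[MulIdx]);
  assert(Mul && Mul->Opc == Opcode::Mul);
  Instr Madd{Opcode::Madd, Root.Def,
             {Mul->Src[0], Mul->Src[1], Root.Src[1 - MulIdx]}};
  InsInstrs.push_back(Madd);
  DelInstrs.push_back(*Mul);
  DelInstrs.push_back(Root);
}
```

Keeping the two steps separate lets a caller generate the alternative sequence, compare it with the original, and discard it if it is not an improvement.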

lib/Target/AArch64/AArch64InstrInfo.cpp

//===- AArch64InstrInfo.cpp - AArch64 Instruction Information -------------===//		//===- AArch64InstrInfo.cpp - AArch64 Instruction Information -------------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file contains the AArch64 implementation of the TargetInstrInfo class.		// This file contains the AArch64 implementation of the TargetInstrInfo class.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AArch64InstrInfo.h"		#include "AArch64InstrInfo.h"
#include "AArch64Subtarget.h"		#include "AArch64Subtarget.h"
#include "MCTargetDesc/AArch64AddressingModes.h"		#include "MCTargetDesc/AArch64AddressingModes.h"
		#include "AArch64MachineCombinerPattern.h"
		silviu.baranga: This header file is missing from the review.
#include "llvm/CodeGen/MachineFrameInfo.h"		#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"		#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineMemOperand.h"		#include "llvm/CodeGen/MachineMemOperand.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/PseudoSourceValue.h"		#include "llvm/CodeGen/PseudoSourceValue.h"
#include "llvm/MC/MCInst.h"		#include "llvm/MC/MCInst.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/TargetRegistry.h"		#include "llvm/Support/TargetRegistry.h"
▲ Show 20 Lines • Show All 622 Lines • ▼ Show 20 Lines	for (unsigned OpIdx = 0, EndIdx = Instr->getNumOperands(); OpIdx < EndIdx;
} else if (!OpRegCstraints->hasSubClassEq(MRI->getRegClass(Reg)) &&		} else if (!OpRegCstraints->hasSubClassEq(MRI->getRegClass(Reg)) &&
!MRI->constrainRegClass(Reg, OpRegCstraints))		!MRI->constrainRegClass(Reg, OpRegCstraints))
return false;		return false;
}		}

return true;		return true;
}		}

/// optimizeCompareInstr - Convert the instruction supplying the argument to the		/// convertFlagSettingOpcode - return opcode that does not
/// comparison into one that sets the zero bit in the flags register.		/// set flags when possible. The caller is responsible to do
bool AArch64InstrInfo::optimizeCompareInstr(		/// the actual substitution and legality checking.
MachineInstr *CmpInstr, unsigned SrcReg, unsigned SrcReg2, int CmpMask,		static unsigned convertFlagSettingOpcode(MachineInstr *MI) {
int CmpValue, const MachineRegisterInfo *MRI) const {

// Replace SUBSWrr with SUBWrr if NZCV is not used.
int Cmp_NZCV = CmpInstr->findRegisterDefOperandIdx(AArch64::NZCV, true);
if (Cmp_NZCV != -1) {
unsigned NewOpc;		unsigned NewOpc;
switch (CmpInstr->getOpcode()) {		switch (MI->getOpcode()) {
default:		default:
return false;		return MI->getOpcode(); // no non-flag-setting form: keep the opcode
case AArch64::ADDSWrr: NewOpc = AArch64::ADDWrr; break;		case AArch64::ADDSWrr: NewOpc = AArch64::ADDWrr; break;
case AArch64::ADDSWri: NewOpc = AArch64::ADDWri; break;		case AArch64::ADDSWri: NewOpc = AArch64::ADDWri; break;
case AArch64::ADDSWrs: NewOpc = AArch64::ADDWrs; break;		case AArch64::ADDSWrs: NewOpc = AArch64::ADDWrs; break;
case AArch64::ADDSWrx: NewOpc = AArch64::ADDWrx; break;		case AArch64::ADDSWrx: NewOpc = AArch64::ADDWrx; break;
case AArch64::ADDSXrr: NewOpc = AArch64::ADDXrr; break;		case AArch64::ADDSXrr: NewOpc = AArch64::ADDXrr; break;
case AArch64::ADDSXri: NewOpc = AArch64::ADDXri; break;		case AArch64::ADDSXri: NewOpc = AArch64::ADDXri; break;
case AArch64::ADDSXrs: NewOpc = AArch64::ADDXrs; break;		case AArch64::ADDSXrs: NewOpc = AArch64::ADDXrs; break;
case AArch64::ADDSXrx: NewOpc = AArch64::ADDXrx; break;		case AArch64::ADDSXrx: NewOpc = AArch64::ADDXrx; break;
case AArch64::SUBSWrr: NewOpc = AArch64::SUBWrr; break;		case AArch64::SUBSWrr: NewOpc = AArch64::SUBWrr; break;
case AArch64::SUBSWri: NewOpc = AArch64::SUBWri; break;		case AArch64::SUBSWri: NewOpc = AArch64::SUBWri; break;
case AArch64::SUBSWrs: NewOpc = AArch64::SUBWrs; break;		case AArch64::SUBSWrs: NewOpc = AArch64::SUBWrs; break;
case AArch64::SUBSWrx: NewOpc = AArch64::SUBWrx; break;		case AArch64::SUBSWrx: NewOpc = AArch64::SUBWrx; break;
case AArch64::SUBSXrr: NewOpc = AArch64::SUBXrr; break;		case AArch64::SUBSXrr: NewOpc = AArch64::SUBXrr; break;
case AArch64::SUBSXri: NewOpc = AArch64::SUBXri; break;		case AArch64::SUBSXri: NewOpc = AArch64::SUBXri; break;
case AArch64::SUBSXrs: NewOpc = AArch64::SUBXrs; break;		case AArch64::SUBSXrs: NewOpc = AArch64::SUBXrs; break;
case AArch64::SUBSXrx: NewOpc = AArch64::SUBXrx; break;		case AArch64::SUBSXrx: NewOpc = AArch64::SUBXrx; break;
}		}
		return NewOpc;
		}

		/// optimizeCompareInstr - Convert the instruction supplying the argument to the
		/// comparison into one that sets the zero bit in the flags register.
		bool AArch64InstrInfo::optimizeCompareInstr(
		hfinkel: Is this a separable (or unrelated) change?
		MachineInstr *CmpInstr, unsigned SrcReg, unsigned SrcReg2, int CmpMask,
		int CmpValue, const MachineRegisterInfo *MRI) const {

		// Replace SUBSWrr with SUBWrr if NZCV is not used.
		int Cmp_NZCV = CmpInstr->findRegisterDefOperandIdx(AArch64::NZCV, true);
		if (Cmp_NZCV != -1) {
		unsigned Opc = CmpInstr->getOpcode();
		unsigned NewOpc = convertFlagSettingOpcode(CmpInstr);
		if (NewOpc == Opc)
		return false;
const MCInstrDesc &MCID = get(NewOpc);		const MCInstrDesc &MCID = get(NewOpc);
CmpInstr->setDesc(MCID);		CmpInstr->setDesc(MCID);
CmpInstr->RemoveOperand(Cmp_NZCV);		CmpInstr->RemoveOperand(Cmp_NZCV);
bool succeeded = UpdateOperandRegClass(CmpInstr);		bool succeeded = UpdateOperandRegClass(CmpInstr);
(void)succeeded;		(void)succeeded;
assert(succeeded && "Some operands reg class are incompatible!");		assert(succeeded && "Some operands reg class are incompatible!");
return true;		return true;
}		}
▲ Show 20 Lines • Show All 1,388 Lines • ▼ Show 20 Lines	bool llvm::rewriteAArch64FrameIndex(MachineInstr &MI, unsigned FrameRegIdx,

return false;		return false;
}		}

void AArch64InstrInfo::getNoopForMachoTarget(MCInst &NopInst) const {		void AArch64InstrInfo::getNoopForMachoTarget(MCInst &NopInst) const {
NopInst.setOpcode(AArch64::HINT);		NopInst.setOpcode(AArch64::HINT);
NopInst.addOperand(MCOperand::CreateImm(0));		NopInst.addOperand(MCOperand::CreateImm(0));
}		}
		//
		// True when Opc sets flag
		static bool isCombineInstrSettingFlag(unsigned Opc) {
		echristo: The name here is a bit limiting. What if you want to combine something else in the future? Same with the rest of these helpers.
		switch (Opc) {
		case AArch64::ADDSWrr:
		case AArch64::ADDSWri:
		case AArch64::ADDSXrr:
		case AArch64::ADDSXri:
		case AArch64::SUBSWrr:
		case AArch64::SUBSXrr:
		// Note: MSUB Wd,Wn,Wm,Wi -> Wd = Wi - WnxWm, not Wd=WnxWm - Wi.
		case AArch64::SUBSWri:
		case AArch64::SUBSXri:
		return true;
		default:
		break;
		}
		return false;
		}
		//
		// 32b Opcodes that can be combined with a MUL
		static bool isCombineInstrCandidate32(unsigned Opc) {
		switch (Opc) {
		case AArch64::ADDWrr:
		case AArch64::ADDWri:
		case AArch64::SUBWrr:
		case AArch64::ADDSWrr:
		case AArch64::ADDSWri:
		case AArch64::SUBSWrr:
		// Note: MSUB Wd,Wn,Wm,Wi -> Wd = Wi - WnxWm, not Wd=WnxWm - Wi.
		case AArch64::SUBWri:
		case AArch64::SUBSWri:
		return true;
		default:
		break;
		}
		return false;
		}
		//
		// 64b Opcodes that can be combined with a MUL
		static bool isCombineInstrCandidate64(unsigned Opc) {
		switch (Opc) {
		case AArch64::ADDXrr:
		case AArch64::ADDXri:
		case AArch64::SUBXrr:
		case AArch64::ADDSXrr:
		case AArch64::ADDSXri:
		case AArch64::SUBSXrr:
		// Note: MSUB Wd,Wn,Wm,Wi -> Wd = Wi - WnxWm, not Wd=WnxWm - Wi.
		case AArch64::SUBXri:
		case AArch64::SUBSXri:
		return true;
		default:
		break;
		}
		return false;
		}
		//
		// Opcodes that can be combined with a MUL
		static bool isCombineInstrCandidate(unsigned Opc) {
		return (isCombineInstrCandidate32(Opc) || isCombineInstrCandidate64(Opc));
		}

		static bool canCombineWithMUL(MachineBasicBlock &MBB, MachineOperand &MO,
		unsigned MulOpc, unsigned ZeroReg) {
		MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
		MachineInstr *MI = nullptr;
		// We need a virtual register definition.
		if (MO.isReg() && TargetRegisterInfo::isVirtualRegister(MO.getReg()))
		MI = MRI.getUniqueVRegDef(MO.getReg());
		// And it needs to be in the trace (otherwise, it won't have a depth).
		if (!MI || MI->getParent() != &MBB || (unsigned)MI->getOpcode() != MulOpc)
		echristo: Unnecessary cast?
		return false;

		assert(MI->getNumOperands() >= 4 && MI->getOperand(0).isReg() &&
		MI->getOperand(1).isReg() && MI->getOperand(2).isReg() &&
		MI->getOperand(3).isReg() && "MAdd/MSub must have at least 4 regs");

		// The third input reg must be zero.
		if (MI->getOperand(3).getReg() != ZeroReg)
		return false;

		// Must only be used by the user we combine with.
		if (!MRI.hasOneNonDBGUse(MI->getOperand(0).getReg()))
		return false;

		return true;
		}

		/// hasPattern - return true when there is potentially a faster code sequence
		/// for an instruction chain ending in \p Root. All potential patterns are
		echristo: "All potential patterns are..."
		/// listed
		/// in the \p Pattern vector. Pattern should be sorted in priority order since
		/// the pattern evaluator stops checking as soon as it finds a faster sequence.

		echristo: This code looks like it could be written ala include/llvm/IR/PatternMatch.h?
		bool AArch64InstrInfo::hasPattern(
		MachineInstr &Root,
		SmallVectorImpl<MachineCombinerPattern::MC_PATTERN> &Pattern) const {
		unsigned Opc = Root.getOpcode();
		MachineBasicBlock &MBB = *Root.getParent();
		bool Found = false;

		if (!isCombineInstrCandidate(Opc))
		return 0;
		if (isCombineInstrSettingFlag(Opc)) {
		int Cmp_NZCV = Root.findRegisterDefOperandIdx(AArch64::NZCV, true);
		// When NZCV is live bail out.
		if (Cmp_NZCV == -1)
		return 0;
		unsigned NewOpc = convertFlagSettingOpcode(&Root);
		// When opcode can't change bail out.
		// CHECKME: do we miss any cases for opcode conversion?
		if (NewOpc == Opc)
		return 0;
		Opc = NewOpc;
		}

		switch (Opc) {
		default:
		break;
		case AArch64::ADDWrr:
		assert(Root.getOperand(1).isReg() && Root.getOperand(2).isReg() &&
		"ADDWrr does not have register operands");
		if (canCombineWithMUL(MBB, Root.getOperand(1), AArch64::MADDWrrr,
		AArch64::WZR)) {
		Pattern.push_back(MachineCombinerPattern::MC_MULADDW_OP1);
		Found = true;
		}
		if (canCombineWithMUL(MBB, Root.getOperand(2), AArch64::MADDWrrr,
		AArch64::WZR)) {
		Pattern.push_back(MachineCombinerPattern::MC_MULADDW_OP2);
		Found = true;
		}
		break;
		case AArch64::ADDXrr:
		if (canCombineWithMUL(MBB, Root.getOperand(1), AArch64::MADDXrrr,
		AArch64::XZR)) {
		Pattern.push_back(MachineCombinerPattern::MC_MULADDX_OP1);
		Found = true;
		}
		if (canCombineWithMUL(MBB, Root.getOperand(2), AArch64::MADDXrrr,
		AArch64::XZR)) {
		Pattern.push_back(MachineCombinerPattern::MC_MULADDX_OP2);
		Found = true;
		}
		break;
		case AArch64::SUBWrr:
		if (canCombineWithMUL(MBB, Root.getOperand(1), AArch64::MADDWrrr,
		AArch64::WZR)) {
		Pattern.push_back(MachineCombinerPattern::MC_MULSUBW_OP1);
		Found = true;
		}
		if (canCombineWithMUL(MBB, Root.getOperand(2), AArch64::MADDWrrr,
		AArch64::WZR)) {
		Pattern.push_back(MachineCombinerPattern::MC_MULSUBW_OP2);
		Found = true;
		}
		break;
		case AArch64::SUBXrr:
		if (canCombineWithMUL(MBB, Root.getOperand(1), AArch64::MADDXrrr,
		AArch64::XZR)) {
		Pattern.push_back(MachineCombinerPattern::MC_MULSUBX_OP1);
		Found = true;
		}
		if (canCombineWithMUL(MBB, Root.getOperand(2), AArch64::MADDXrrr,
		AArch64::XZR)) {
		Pattern.push_back(MachineCombinerPattern::MC_MULSUBX_OP2);
		Found = true;
		}
		break;
		case AArch64::ADDWri:
		if (canCombineWithMUL(MBB, Root.getOperand(1), AArch64::MADDWrrr,
		AArch64::WZR)) {
		Pattern.push_back(MachineCombinerPattern::MC_MULADDWI_OP1);
		Found = true;
		}
		break;
		case AArch64::ADDXri:
		if (canCombineWithMUL(MBB, Root.getOperand(1), AArch64::MADDXrrr,
		AArch64::XZR)) {
		Pattern.push_back(MachineCombinerPattern::MC_MULADDXI_OP1);
		Found = true;
		}
		break;
		case AArch64::SUBWri:
		if (canCombineWithMUL(MBB, Root.getOperand(1), AArch64::MADDWrrr,
		AArch64::WZR)) {
		Pattern.push_back(MachineCombinerPattern::MC_MULSUBWI_OP1);
		Found = true;
		}
		break;
		case AArch64::SUBXri:
		if (canCombineWithMUL(MBB, Root.getOperand(1), AArch64::MADDXrrr,
		AArch64::XZR)) {
		Pattern.push_back(MachineCombinerPattern::MC_MULSUBXI_OP1);
		Found = true;
		}
		break;
		}
		return Found;
		}

		/// genMadd - Generate madd instruction and combine mul and add.
		/// Example:
		echristo: Documentation for these functions describing the incoming variables, constraints, etc.
		/// MUL I=A,B,0
		/// ADD R,I,C
		/// ==> MADD R,A,B,C
		/// \param Root is the ADD instruction
		/// \param [out] InsInstr is a vector of machine instructions and will
		/// contain the generated madd instruction
		/// \param IdxMulOpd is index of operand in Root that is the result of
		/// the MUL. In the example above IdxMulOpd is 1.
		/// \param MaddOpc the opcode of the madd instruction
		static MachineInstr *genMadd(MachineFunction &MF, MachineRegisterInfo &MRI,
		const TargetInstrInfo *TII, MachineInstr &Root,
		SmallVectorImpl<MachineInstr *> &InsInstrs,
		unsigned IdxMulOpd, unsigned MaddOpc) {
		assert(IdxMulOpd == 1 || IdxMulOpd == 2);

		unsigned IdxOtherOpd = IdxMulOpd == 1 ? 2 : 1;
		MachineInstr *MUL = MRI.getUniqueVRegDef(Root.getOperand(IdxMulOpd).getReg());
		MachineOperand R = Root.getOperand(0);
		MachineOperand A = MUL->getOperand(1);
		MachineOperand B = MUL->getOperand(2);
		MachineOperand C = Root.getOperand(IdxOtherOpd);
		MachineInstrBuilder MIB = BuildMI(MF, Root.getDebugLoc(), TII->get(MaddOpc))
		.addOperand(R)
		.addOperand(A)
		.addOperand(B)
		.addOperand(C);
		// Insert the MADD
		InsInstrs.push_back(MIB);
		return MUL;
		}

		/// genMaddR - Generate madd instruction and combine mul and add using
		/// an extra virtual register
		/// Example - an ADD intermediate needs to be stored in a register:
		/// MUL I=A,B,0
		/// ADD R,I,Imm
		/// ==> ORR V, ZR, Imm
		/// ==> MADD R,A,B,V
		/// \param Root is the ADD instruction
		/// \param [out] InsInstr is a vector of machine instructions and will
		/// contain the generated madd instruction
		/// \param IdxMulOpd is index of operand in Root that is the result of
		/// the MUL. In the example above IdxMulOpd is 1.
		/// \param MaddOpc the opcode of the madd instruction
		/// \param VR is a virtual register that holds the value of an ADD operand
		/// (V in the example above).
		static MachineInstr *genMaddR(MachineFunction &MF, MachineRegisterInfo &MRI,
		const TargetInstrInfo *TII, MachineInstr &Root,
		SmallVectorImpl<MachineInstr *> &InsInstrs,
		unsigned IdxMulOpd, unsigned MaddOpc,
		unsigned VR) {
		assert(IdxMulOpd == 1 || IdxMulOpd == 2);

		MachineInstr *MUL = MRI.getUniqueVRegDef(Root.getOperand(IdxMulOpd).getReg());
		MachineOperand R = Root.getOperand(0);
		MachineOperand A = MUL->getOperand(1);
		MachineOperand B = MUL->getOperand(2);
		MachineInstrBuilder MIB = BuildMI(MF, Root.getDebugLoc(), TII->get(MaddOpc))
		.addOperand(R)
		.addOperand(A)
		.addOperand(B)
		.addReg(VR);
		// Insert the MADD
		InsInstrs.push_back(MIB);
		return MUL;
		}
		/// genAlternativeCodeSequence - when hasPattern() finds a pattern
		/// this function generates the instructions that could replace the
		/// original code sequence
		void AArch64InstrInfo::genAlternativeCodeSequence(
		MachineInstr &Root, MachineCombinerPattern::MC_PATTERN Pattern,
		SmallVectorImpl<MachineInstr *> &InsInstrs,
		SmallVectorImpl<MachineInstr *> &DelInstrs,
		DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const {
		MachineBasicBlock &MBB = *Root.getParent();
		MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
		MachineFunction &MF = *MBB.getParent();
		const TargetInstrInfo *TII = MF.getTarget().getInstrInfo();

		MachineInstr *MUL;
		unsigned Opc;
		switch (Pattern) {
		default:
		// signal error.
		break;
		case MachineCombinerPattern::MC_MULADDW_OP1:
		case MachineCombinerPattern::MC_MULADDX_OP1:
		// MUL I=A,B,0
		// ADD R,I,C
		// ==> MADD R,A,B,C
		// --- Create(MADD);
		Opc = Pattern == MachineCombinerPattern::MC_MULADDW_OP1 ? AArch64::MADDWrrr
		: AArch64::MADDXrrr;
		MUL = genMadd(MF, MRI, TII, Root, InsInstrs, 1, Opc);
		break;
		case MachineCombinerPattern::MC_MULADDW_OP2:
		case MachineCombinerPattern::MC_MULADDX_OP2:
		// MUL I=A,B,0
		// ADD R,C,I
		// ==> MADD R,A,B,C
		// --- Create(MADD);
		Opc = Pattern == MachineCombinerPattern::MC_MULADDW_OP2 ? AArch64::MADDWrrr
		: AArch64::MADDXrrr;
		MUL = genMadd(MF, MRI, TII, Root, InsInstrs, 2, Opc);
		break;
		case MachineCombinerPattern::MC_MULADDWI_OP1:
		case MachineCombinerPattern::MC_MULADDXI_OP1:
		// MUL I=A,B,0
		// ADD R,I,Imm
		// ==> ORR V, ZR, Imm
		// ==> MADD R,A,B,V
		// --- Create(MADD);
		{
		const TargetRegisterClass *RC =
		MRI.getRegClass(Root.getOperand(1).getReg());
		unsigned NewVR = MRI.createVirtualRegister(RC);
		unsigned BitSize, OrrOpc, ZeroReg;
		if (Pattern == MachineCombinerPattern::MC_MULADDWI_OP1) {
		BitSize = 32;
		OrrOpc = AArch64::ORRWri;
		ZeroReg = AArch64::WZR;
		Opc = AArch64::MADDWrrr;
		} else {
		OrrOpc = AArch64::ORRXri;
		BitSize = 64;
		ZeroReg = AArch64::XZR;
		Opc = AArch64::MADDXrrr;
		}
		uint64_t Imm = Root.getOperand(2).getImm();

		if (Root.getOperand(3).isImm()) {
		unsigned val = Root.getOperand(3).getImm();
		Imm = Imm << val;
		}
		uint64_t UImm = Imm << (64 - BitSize) >> (64 - BitSize);
		uint64_t Encoding;

		if (AArch64_AM::processLogicalImmediate(UImm, BitSize, Encoding)) {
		MachineInstrBuilder MIB1 =
		BuildMI(MF, Root.getDebugLoc(), TII->get(OrrOpc))
		.addOperand(MachineOperand::CreateReg(NewVR, RegState::Define))
		.addReg(ZeroReg)
		.addImm(Encoding);
		InsInstrs.push_back(MIB1);
		InstrIdxForVirtReg.insert(std::make_pair(NewVR, 0));
		MUL = genMaddR(MF, MRI, TII, Root, InsInstrs, 1, Opc, NewVR);
		}
		}
		break;
		case MachineCombinerPattern::MC_MULSUBW_OP1:
		case MachineCombinerPattern::MC_MULSUBX_OP1: {
		// MUL I=A,B,0
		// SUB R,I, C
		// ==> SUB V, 0, C
		// ==> MADD R,A,B,V // = -C + A*B
		// --- Create(MADD);
		const TargetRegisterClass *RC =
		MRI.getRegClass(Root.getOperand(1).getReg());
		unsigned NewVR = MRI.createVirtualRegister(RC);
		unsigned SubOpc, ZeroReg;
		if (Pattern == MachineCombinerPattern::MC_MULSUBW_OP1) {
		SubOpc = AArch64::SUBWrr;
		ZeroReg = AArch64::WZR;
		Opc = AArch64::MADDWrrr;
		} else {
		SubOpc = AArch64::SUBXrr;
		ZeroReg = AArch64::XZR;
		Opc = AArch64::MADDXrrr;
		}
		// SUB NewVR, 0, C
		MachineInstrBuilder MIB1 =
		BuildMI(MF, Root.getDebugLoc(), TII->get(SubOpc))
		.addOperand(MachineOperand::CreateReg(NewVR, RegState::Define))
		.addReg(ZeroReg)
		.addOperand(Root.getOperand(2));
		InsInstrs.push_back(MIB1);
		InstrIdxForVirtReg.insert(std::make_pair(NewVR, 0));
		MUL = genMaddR(MF, MRI, TII, Root, InsInstrs, 1, Opc, NewVR);
		} break;
		case MachineCombinerPattern::MC_MULSUBW_OP2:
		case MachineCombinerPattern::MC_MULSUBX_OP2:
		// MUL I=A,B,0
		// SUB R,C,I
		// ==> MSUB R,A,B,C (computes C - A*B)
		// --- Create(MSUB);
		Opc = Pattern == MachineCombinerPattern::MC_MULSUBW_OP2 ? AArch64::MSUBWrrr
		: AArch64::MSUBXrrr;
		MUL = genMadd(MF, MRI, TII, Root, InsInstrs, 2, Opc);
		break;
		case MachineCombinerPattern::MC_MULSUBWI_OP1:
		case MachineCombinerPattern::MC_MULSUBXI_OP1: {
		// MUL I=A,B,0
		// SUB R,I, Imm
		// ==> ORR V, ZR, -Imm
		// ==> MADD R,A,B,V // = -Imm + A*B
		// --- Create(MADD);
		const TargetRegisterClass *RC =
		MRI.getRegClass(Root.getOperand(1).getReg());
		unsigned NewVR = MRI.createVirtualRegister(RC);
		unsigned BitSize, OrrOpc, ZeroReg;
		if (Pattern == MachineCombinerPattern::MC_MULSUBWI_OP1) {
		BitSize = 32;
		OrrOpc = AArch64::ORRWri;
		ZeroReg = AArch64::WZR;
		Opc = AArch64::MADDWrrr;
		} else {
		OrrOpc = AArch64::ORRXri;
		BitSize = 64;
		ZeroReg = AArch64::XZR;
		Opc = AArch64::MADDXrrr;
		}
		int Imm = Root.getOperand(2).getImm();
		if (Root.getOperand(3).isImm()) {
		unsigned val = Root.getOperand(3).getImm();
		Imm = Imm << val;
		}
		uint64_t UImm = -Imm << (64 - BitSize) >> (64 - BitSize);
		uint64_t Encoding;
		if (AArch64_AM::processLogicalImmediate(UImm, BitSize, Encoding)) {
		MachineInstrBuilder MIB1 =
		BuildMI(MF, Root.getDebugLoc(), TII->get(OrrOpc))
		.addOperand(MachineOperand::CreateReg(NewVR, RegState::Define))
		.addReg(ZeroReg)
		.addImm(Encoding);
		InsInstrs.push_back(MIB1);
		InstrIdxForVirtReg.insert(std::make_pair(NewVR, 0));
		MUL = genMaddR(MF, MRI, TII, Root, InsInstrs, 1, Opc, NewVR);
		}
		} break;
		}
		// Record MUL and ADD/SUB for deletion
		DelInstrs.push_back(MUL);
		DelInstrs.push_back(&Root);

		return;
		}
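
The immediate cases above narrow the (optionally shifted) immediate to BitSize bits with a shift-left/shift-right pair before asking whether it can be encoded as a logical immediate. In the MULSUB immediate case the value is held in an `int`, and shifting a 32-bit `int` by 32 is undefined behavior in C++, so doing the arithmetic in `uint64_t` may be the safer formulation. The sketch below shows one way to write that. `truncToBitSize` and `negatedShiftedImm` are hypothetical helper names, not functions from this patch or from LLVM, and the encodability check is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low BitSize bits of Value. BitSize is assumed to be
// 32 or 64, so the shift amount is 32 or 0 and always below 64.
inline uint64_t truncToBitSize(uint64_t Value, unsigned BitSize) {
  assert(BitSize == 32 || BitSize == 64);
  unsigned Shift = 64 - BitSize;
  return (Value << Shift) >> Shift;
}

// Two's-complement bit pattern of -(Imm << ShiftAmt), BitSize bits wide.
// ShiftAmt is assumed to be below 64 (AArch64 ADD/SUB immediates use a
// left shift of 0 or 12). Unsigned arithmetic wraps, so there is no
// overflow or shift-width undefined behavior here.
inline uint64_t negatedShiftedImm(uint64_t Imm, unsigned ShiftAmt,
                                  unsigned BitSize) {
  assert(ShiftAmt < 64);
  uint64_t Shifted = Imm << ShiftAmt;
  return truncToBitSize(uint64_t{0} - Shifted, BitSize);
}
```

For example, `negatedShiftedImm(5, 0, 32)` yields the 32-bit pattern for -5, which could then be passed to the logical-immediate check.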

lib/Target/AArch64/AArch64TargetMachine.cpp

Show All 18 Lines
#include "llvm/Target/TargetOptions.h"		#include "llvm/Target/TargetOptions.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
using namespace llvm;		using namespace llvm;

static cl::opt<bool>		static cl::opt<bool>
EnableCCMP("aarch64-ccmp", cl::desc("Enable the CCMP formation pass"),		EnableCCMP("aarch64-ccmp", cl::desc("Enable the CCMP formation pass"),
cl::init(true), cl::Hidden);		cl::init(true), cl::Hidden);

		static cl::opt<bool> EnableMCR("aarch64-mcr",
		cl::desc("Enable the machine combiner pass"),
		cl::init(true), cl::Hidden);

static cl::opt<bool>		static cl::opt<bool>
EnableStPairSuppress("aarch64-stp-suppress", cl::desc("Suppress STP for AArch64"),		EnableStPairSuppress("aarch64-stp-suppress", cl::desc("Suppress STP for AArch64"),
cl::init(true), cl::Hidden);		cl::init(true), cl::Hidden);

static cl::opt<bool>		static cl::opt<bool>
EnableAdvSIMDScalar("aarch64-simd-scalar", cl::desc("Enable use of AdvSIMD scalar"		EnableAdvSIMDScalar("aarch64-simd-scalar", cl::desc("Enable use of AdvSIMD scalar"
" integer instructions"), cl::init(false), cl::Hidden);		" integer instructions"), cl::init(false), cl::Hidden);

▲ Show 20 Lines • Show All 136 Lines • ▼ Show 20 Lines	if (TM->getSubtarget<AArch64Subtarget>().isTargetELF() &&
addPass(createAArch64CleanupLocalDynamicTLSPass());		addPass(createAArch64CleanupLocalDynamicTLSPass());

return false;		return false;
}		}

bool AArch64PassConfig::addILPOpts() {		bool AArch64PassConfig::addILPOpts() {
if (EnableCCMP)		if (EnableCCMP)
addPass(createAArch64ConditionalCompares());		addPass(createAArch64ConditionalCompares());
		if (EnableMCR)
		addPass(&MachineCombinerID);
addPass(&EarlyIfConverterID);		addPass(&EarlyIfConverterID);
if (EnableStPairSuppress)		if (EnableStPairSuppress)
addPass(createAArch64StorePairSuppressPass());		addPass(createAArch64StorePairSuppressPass());
return true;		return true;
}		}

bool AArch64PassConfig::addPreRegAlloc() {		bool AArch64PassConfig::addPreRegAlloc() {
// Use AdvSIMD scalar instructions whenever profitable.		// Use AdvSIMD scalar instructions whenever profitable.
Show All 30 Lines
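
The summary says the new pass combines only when it is beneficial, judged from machine trace information. As a rough illustration of why a combined instruction is not always a win, the sketch below compares the cycle at which the result becomes available for MUL followed by ADD against a single MADD. The latencies, the `Latencies` struct, and the assumption that MADD needs all three inputs before it can issue are illustrative choices of this sketch; they are not taken from any scheduling model or from the pass itself.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical instruction latencies in cycles.
struct Latencies {
  unsigned Mul, Add, Madd;
};

// DA, DB, DC are the cycles at which inputs A, B and C become available.
// MUL starts when A and B are ready; ADD starts when MUL and C are ready.
inline unsigned depthMulThenAdd(unsigned DA, unsigned DB, unsigned DC,
                                const Latencies &L) {
  unsigned MulReady = std::max(DA, DB) + L.Mul;
  return std::max(MulReady, DC) + L.Add;
}

// A single MADD, assumed to wait for all three inputs before issuing.
inline unsigned depthMadd(unsigned DA, unsigned DB, unsigned DC,
                          const Latencies &L) {
  return std::max({DA, DB, DC}) + L.Madd;
}

// Combine when the MADD result is ready no later than the MUL+ADD result.
// Ties favor MADD here because it is one instruction instead of two.
inline bool combineIsBeneficial(unsigned DA, unsigned DB, unsigned DC,
                                const Latencies &L) {
  return depthMadd(DA, DB, DC, L) <= depthMulThenAdd(DA, DB, DC, L);
}
```

With latencies of 4, 1 and 4 cycles for MUL, ADD and MADD, combining helps when all inputs are ready at cycle 0. It hurts when the addend C arrives at cycle 10, because the separate MUL can run while C is still being computed.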

test/CodeGen/AArch64/arm64-neon-mul-div.ll

	; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-none-linux-gnu -mattr=+neon | FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-none-linux-gnu -mattr=+neon -mcpu=cyclone | FileCheck %s
	; arm64 has its own copy of this because of the intrinsics			; arm64 has its own copy of this because of the intrinsics

	define <8 x i8> @mul8xi8(<8 x i8> %A, <8 x i8> %B) {			define <8 x i8> @mul8xi8(<8 x i8> %A, <8 x i8> %B) {
	; CHECK-LABEL: mul8xi8:			; CHECK-LABEL: mul8xi8:
	; CHECK: mul {{v[0-9]+}}.8b, {{v[0-9]+}}.8b, {{v[0-9]+}}.8b			; CHECK: mul {{v[0-9]+}}.8b, {{v[0-9]+}}.8b, {{v[0-9]+}}.8b
	%tmp3 = mul <8 x i8> %A, %B;			%tmp3 = mul <8 x i8> %A, %B;
	ret <8 x i8> %tmp3			ret <8 x i8> %tmp3
	}			}
	▲ Show 20 Lines • Show All 435 Lines • ▼ Show 20 Lines
	; CHECK: msub {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: msub {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}
	%tmp3 = srem <1 x i32> %A, %B;			%tmp3 = srem <1 x i32> %A, %B;
	ret <1 x i32> %tmp3			ret <1 x i32> %tmp3
	}			}

	define <2 x i32> @srem2x32(<2 x i32> %A, <2 x i32> %B) {			define <2 x i32> @srem2x32(<2 x i32> %A, <2 x i32> %B) {
	; CHECK-LABEL: srem2x32:			; CHECK-LABEL: srem2x32:
	; CHECK: sdiv {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: sdiv {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}
	; CHECK: sdiv {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}
	; CHECK: msub {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: msub {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}
				; CHECK: sdiv {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}
	; CHECK: msub {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: msub {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}
	%tmp3 = srem <2 x i32> %A, %B;			%tmp3 = srem <2 x i32> %A, %B;
	ret <2 x i32> %tmp3			ret <2 x i32> %tmp3
	}			}

	define <4 x i32> @srem4x32(<4 x i32> %A, <4 x i32> %B) {			define <4 x i32> @srem4x32(<4 x i32> %A, <4 x i32> %B) {
	; CHECK-LABEL: srem4x32:			; CHECK-LABEL: srem4x32:
	; CHECK: sdiv {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: sdiv {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}
	Show All 14 Lines
	; CHECK: msub {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}			; CHECK: msub {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}
	%tmp3 = srem <1 x i64> %A, %B;			%tmp3 = srem <1 x i64> %A, %B;
	ret <1 x i64> %tmp3			ret <1 x i64> %tmp3
	}			}

	define <2 x i64> @srem2x64(<2 x i64> %A, <2 x i64> %B) {			define <2 x i64> @srem2x64(<2 x i64> %A, <2 x i64> %B) {
	; CHECK-LABEL: srem2x64:			; CHECK-LABEL: srem2x64:
	; CHECK: sdiv {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}			; CHECK: sdiv {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}
	; CHECK: sdiv {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}
	; CHECK: msub {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}			; CHECK: msub {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}
				; CHECK: sdiv {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}
	; CHECK: msub {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}			; CHECK: msub {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}
	%tmp3 = srem <2 x i64> %A, %B;			%tmp3 = srem <2 x i64> %A, %B;
	ret <2 x i64> %tmp3			ret <2 x i64> %tmp3
	}			}

	define <1 x i8> @urem1x8(<1 x i8> %A, <1 x i8> %B) {			define <1 x i8> @urem1x8(<1 x i8> %A, <1 x i8> %B) {
	; CHECK-LABEL: urem1x8:			; CHECK-LABEL: urem1x8:
	; CHECK: udiv {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: udiv {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}
	▲ Show 20 Lines • Show All 112 Lines • ▼ Show 20 Lines
	; CHECK: msub {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: msub {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}
	%tmp3 = urem <1 x i32> %A, %B;			%tmp3 = urem <1 x i32> %A, %B;
	ret <1 x i32> %tmp3			ret <1 x i32> %tmp3
	}			}

	define <2 x i32> @urem2x32(<2 x i32> %A, <2 x i32> %B) {			define <2 x i32> @urem2x32(<2 x i32> %A, <2 x i32> %B) {
	; CHECK-LABEL: urem2x32:			; CHECK-LABEL: urem2x32:
	; CHECK: udiv {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: udiv {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}
	; CHECK: udiv {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}
	; CHECK: msub {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: msub {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}
				; CHECK: udiv {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}
	; CHECK: msub {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: msub {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}
	%tmp3 = urem <2 x i32> %A, %B;			%tmp3 = urem <2 x i32> %A, %B;
	ret <2 x i32> %tmp3			ret <2 x i32> %tmp3
	}			}

	define <4 x i32> @urem4x32(<4 x i32> %A, <4 x i32> %B) {			define <4 x i32> @urem4x32(<4 x i32> %A, <4 x i32> %B) {
	; CHECK-LABEL: urem4x32:			; CHECK-LABEL: urem4x32:
	; CHECK: udiv {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: udiv {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}
	[... 14 lines not shown ...]
	; CHECK: msub {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}			; CHECK: msub {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}
	%tmp3 = urem <1 x i64> %A, %B;			%tmp3 = urem <1 x i64> %A, %B;
	ret <1 x i64> %tmp3			ret <1 x i64> %tmp3
	}			}

	define <2 x i64> @urem2x64(<2 x i64> %A, <2 x i64> %B) {			define <2 x i64> @urem2x64(<2 x i64> %A, <2 x i64> %B) {
	; CHECK-LABEL: urem2x64:			; CHECK-LABEL: urem2x64:
	; CHECK: udiv {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}			; CHECK: udiv {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}
	; CHECK: udiv {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}
	; CHECK: msub {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}			; CHECK: msub {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}
				; CHECK: udiv {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}
	; CHECK: msub {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}			; CHECK: msub {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}
	%tmp3 = urem <2 x i64> %A, %B;			%tmp3 = urem <2 x i64> %A, %B;
	ret <2 x i64> %tmp3			ret <2 x i64> %tmp3
	}			}

	define <2 x float> @frem2f32(<2 x float> %A, <2 x float> %B) {			define <2 x float> @frem2f32(<2 x float> %A, <2 x float> %B) {
	; CHECK-LABEL: frem2f32:			; CHECK-LABEL: frem2f32:
	; CHECK: bl fmodf			; CHECK: bl fmodf
	[... 141 lines not shown ...]

test/CodeGen/AArch64/dp-3source.ll

	; RUN: llc -verify-machineinstrs -o - %s -mtriple=arm64-apple-ios7.0 | FileCheck %s			; RUN: llc -verify-machineinstrs -o - %s -mtriple=arm64-apple-ios7.0 -mcpu=cyclone | FileCheck %s

	define i32 @test_madd32(i32 %val0, i32 %val1, i32 %val2) {			define i32 @test_madd32(i32 %val0, i32 %val1, i32 %val2) {
	; CHECK-LABEL: test_madd32:			; CHECK-LABEL: test_madd32:
	%mid = mul i32 %val1, %val2			%mid = mul i32 %val1, %val2
	%res = add i32 %val0, %mid			%res = add i32 %val0, %mid
	; CHECK: madd {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: madd {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}
	ret i32 %res			ret i32 %res
	}			}
	[... 154 lines not shown ...]

test/CodeGen/AArch64/mul-lohi.ll

	; RUN: llc -mtriple=arm64-apple-ios7.0 %s -o - | FileCheck %s			; RUN: llc -mtriple=arm64-apple-ios7.0 -mcpu=cyclone %s -o - | FileCheck %s
	; RUN: llc -mtriple=arm64_be-linux-gnu %s -o - | FileCheck --check-prefix=CHECK-BE %s			; RUN: llc -mtriple=arm64_be-linux-gnu -mcpu=cyclone %s -o - | FileCheck --check-prefix=CHECK-BE %s

	define i128 @test_128bitmul(i128 %lhs, i128 %rhs) {			define i128 @test_128bitmul(i128 %lhs, i128 %rhs) {
	; CHECK-LABEL: test_128bitmul:			; CHECK-LABEL: test_128bitmul:
				; CHECK-DAG: mul [[PART1:x[0-9]+]], x0, x3
	; CHECK-DAG: umulh [[CARRY:x[0-9]+]], x0, x2			; CHECK-DAG: umulh [[CARRY:x[0-9]+]], x0, x2
	; CHECK-DAG: madd [[PART1:x[0-9]+]], x0, x3, [[CARRY]]			; CHECK: mul [[PART2:x[0-9]+]], x1, x2
	; CHECK: madd x1, x1, x2, [[PART1]]
	; CHECK: mul x0, x0, x2			; CHECK: mul x0, x0, x2

	; CHECK-BE-LABEL: test_128bitmul:			; CHECK-BE-LABEL: test_128bitmul:
				; CHECK-BE-DAG: mul [[PART1:x[0-9]+]], x1, x2
	; CHECK-BE-DAG: umulh [[CARRY:x[0-9]+]], x1, x3			; CHECK-BE-DAG: umulh [[CARRY:x[0-9]+]], x1, x3
	; CHECK-BE-DAG: madd [[PART1:x[0-9]+]], x1, x2, [[CARRY]]			; CHECK-BE: mul [[PART2:x[0-9]+]], x0, x3
	; CHECK-BE: madd x0, x0, x3, [[PART1]]
	; CHECK-BE: mul x1, x1, x3			; CHECK-BE: mul x1, x1, x3

	%prod = mul i128 %lhs, %rhs			%prod = mul i128 %lhs, %rhs
	ret i128 %prod			ret i128 %prod
	}			}
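
The changed expectations in test_128bitmul are easier to read against the arithmetic they encode. The following is a minimal C++ sketch, not part of the patch, of the two ways the high half of a 128-bit product can be formed: with the multiply-adds fused (the old CHECK lines, two madd) and with them split (the new CHECK lines, separate mul). The names U128, umulh, mul128_fused and mul128_split are hypothetical. The register comments assume the little-endian case, where x0/x1 hold the low/high halves of %lhs and x2/x3 those of %rhs. The final additions in the split variant are an assumption, since the CHECK lines do not show them.

```cpp
#include <cstdint>

// A 128-bit value as two 64-bit halves (hypothetical helper type).
struct U128 { uint64_t lo; uint64_t hi; };

// High 64 bits of the unsigned 64x64 product, i.e. what AArch64 umulh
// computes. Built from 32-bit halves so no 128-bit integer type is needed.
static uint64_t umulh(uint64_t a, uint64_t b) {
  const uint64_t mask = 0xffffffffULL;
  uint64_t a_lo = a & mask, a_hi = a >> 32;
  uint64_t b_lo = b & mask, b_hi = b >> 32;
  uint64_t p0 = a_lo * b_lo;
  uint64_t p1 = a_lo * b_hi;
  uint64_t p2 = a_hi * b_lo;
  uint64_t p3 = a_hi * b_hi;
  // Sum of the pieces that land in bits 32..63; its carry feeds the high half.
  uint64_t mid = (p0 >> 32) + (p1 & mask) + (p2 & mask);
  return p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

// Fused form, mirroring the old expectations:
//   umulh CARRY, x0, x2
//   madd  PART1, x0, x3, CARRY
//   madd  x1, x1, x2, PART1
//   mul   x0, x0, x2
static U128 mul128_fused(U128 lhs, U128 rhs) {
  uint64_t carry = umulh(lhs.lo, rhs.lo);
  uint64_t part1 = lhs.lo * rhs.hi + carry;   // first madd
  uint64_t hi    = lhs.hi * rhs.lo + part1;   // second madd
  return U128{lhs.lo * rhs.lo, hi};
}

// Split form, mirroring the new expectations:
//   mul   PART1, x0, x3
//   umulh CARRY, x0, x2
//   mul   PART2, x1, x2
//   mul   x0, x0, x2
// The additions below are assumed; the test does not check them.
static U128 mul128_split(U128 lhs, U128 rhs) {
  uint64_t part1 = lhs.lo * rhs.hi;
  uint64_t carry = umulh(lhs.lo, rhs.lo);
  uint64_t part2 = lhs.hi * rhs.lo;
  return U128{lhs.lo * rhs.lo, part1 + carry + part2};
}
```

Both functions return the same value for every input, because 64-bit unsigned addition wraps and is associative. Only the instruction mix and the dependency chain differ, which is the trade-off the machine combiner is meant to evaluate.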

Revision Contents

Diff 11801

include/llvm/CodeGen/MachineCombinerPattern.h

include/llvm/CodeGen/MachineTraceMetrics.h

include/llvm/CodeGen/Passes.h

include/llvm/CodeGen/TargetSchedule.h

include/llvm/InitializePasses.h

include/llvm/Target/TargetInstrInfo.h

lib/CodeGen/CMakeLists.txt

lib/CodeGen/CodeGen.cpp

lib/CodeGen/MachineCombiner.cpp

lib/CodeGen/MachineScheduler.cpp

lib/CodeGen/MachineTraceMetrics.cpp

lib/CodeGen/TargetSchedule.cpp

lib/Target/AArch64/AArch64InstrFormats.td

lib/Target/AArch64/AArch64InstrInfo.h

lib/Target/AArch64/AArch64InstrInfo.cpp

lib/Target/AArch64/AArch64TargetMachine.cpp

test/CodeGen/AArch64/arm64-neon-mul-div.ll

test/CodeGen/AArch64/dp-3source.ll

test/CodeGen/AArch64/mul-lohi.ll
