This is an archive of the discontinued LLVM Phabricator instance.

Differential D27329

AArch64CollectLOH: Rewrite as block-local analysis.
ClosedPublic

Authored by MatzeB on Dec 1 2016, 6:22 PM.

Details

Reviewers

qcolombet

Commits

rG258b847c4f47: AArch64CollectLOH: Rewrite as block-local analysis.
rGe813cf457ae2: AArch64CollectLOH: Rewrite as block-local analysis.
rG1fbb0f6dd997: AArch64CollectLOH: Rewrite as block-local analysis.
rL290026: AArch64CollectLOH: Rewrite as block-local analysis.
rL288561: AArch64CollectLOH: Rewrite as block-local analysis.

Summary

Previously this pass used up to 5% of compile time in some cases, which
is a bit much for what it does. The pass featured a full-blown
dataflow analysis which, in the default configuration, was restricted to a
single block.

This rewrites the pass under the assumption that we only ever work on a
single block. It now works in a single pass, maintaining just a small state
machine per tracked physreg. This makes the pass 5-10x faster.
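
A minimal sketch of what "a small state machine per tracked physreg" could look like. The flag names mirror the `LOHInfo` struct in the diff (`IsCandidate`, `OneUser`, `MultiUsers`), but the `RegState` type and the `noteUse` helper are simplified, hypothetical stand-ins for illustration and not the pass's actual code.

```cpp
// Hypothetical, simplified per-register state. The real pass also tracks
// the LOH type and the instructions involved in the chain.
struct RegState {
  bool IsCandidate = false; // a LOH chain may still end in this register
  bool OneUser = false;     // at least one user has been seen
  bool MultiUsers = false;  // more than one user has been seen
};

// Record one more use of the tracked register. StartsChain says whether
// this user is an instruction that could end a LOH chain (an assumption
// of this sketch; the pass decides that from the opcode).
void noteUse(RegState &S, bool StartsChain) {
  // A second use disqualifies the register: a chain needs a single user.
  if (S.OneUser || S.MultiUsers) {
    S.IsCandidate = false;
    S.MultiUsers = true;
    return;
  }
  S.OneUser = true;
  S.IsCandidate = StartsChain;
}
```

Each register moves at most from "no user" to "one user" to "multiple users", so the state is constant-size and updating it per operand costs O(1), which is consistent with the single-pass design described above.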

Diff Detail

Repository: rL LLVM

Event Timeline

MatzeB updated this revision to Diff 80013. Dec 1 2016, 6:22 PM

MatzeB retitled this revision to "AArch64CollectLOH: Rewrite as block-local analysis."

MatzeB updated this object.

MatzeB added a reviewer: qcolombet.

MatzeB set the repository for this revision to rL LLVM.

MatzeB added a subscriber: llvm-commits.

Herald added subscribers: mcrosier, rengolin, aemerson. Dec 1 2016, 6:22 PM

Hi Matthias,

This looks indeed much simpler! That helps to narrow down the supported cases :).
LGTM, with a couple of nitpicks.
Also, the pass does not have any DEBUG statement now AFAICT. Could you add some?

Thanks,
-Quentin

lib/Target/AArch64/AArch64CollectLOH.cpp
274	Given how the code is structured, i.e., this definition and all the helper functions before the main algorithm, I would add a comment saying that this structure will be populated via a bottom up traversal of the basic blocks. Then, a LOH will be recorded when we reach an ADRP with a candidate chain (or chains when we have the ADRP_ADRP case on top of the other).
278	I wouldn't bother defining bitfield size.
328	Typo -> s/an/a
355	I'd also assert that the operand is a GOT entry.
505	Shouldn't we have a default case to avoid warnings?
test/CodeGen/AArch64/arm64-collect-loh.ll
637	Why is this not on the next line anymore?
670	There shouldn't be any nondeterminism in the output, so I would rather stick to whatever order you get now.
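
The bottom-up traversal described in the comment on line 274 can be sketched on a toy instruction model. Everything here is an assumption made for illustration: the `Op` and `Inst` types, the `collectAdrpAdd` function, and the restriction to a single ADRP + ADD pattern. The real pass works on `MachineInstr`, handles register aliases and clobbers, and recognizes more LOH kinds.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Toy instruction model (hypothetical, not LLVM's MachineInstr).
enum class Op { Adrp, Add, Other };

struct Inst {
  Op Kind;
  int Def; // register written, or -1 if none
  int Use; // register read, or -1 if none
};

// Walk the block from the last instruction to the first. A register
// becomes a candidate when its only user so far is an ADD; a pair
// (ADRP index, ADD index) is recorded when an ADRP defining that
// register is reached.
std::vector<std::pair<std::size_t, std::size_t>>
collectAdrpAdd(const std::vector<Inst> &Block) {
  struct State {
    bool Candidate = false;
    bool Used = false;
    std::size_t User = 0;
  };
  std::map<int, State> Regs;
  std::vector<std::pair<std::size_t, std::size_t>> Pairs;

  for (std::size_t I = Block.size(); I-- > 0;) {
    const Inst &MI = Block[I];
    // Handle the def first: in reverse order, the def closes the range
    // of later uses before this instruction's own use opens a new one.
    if (MI.Def >= 0) {
      State &S = Regs[MI.Def];
      if (MI.Kind == Op::Adrp && S.Candidate)
        Pairs.push_back({I, S.User});
      S = State();
    }
    if (MI.Use >= 0) {
      State &S = Regs[MI.Use];
      // Only the first use seen (the last in program order) can start a
      // chain; any additional use clears the candidate flag.
      S.Candidate = !S.Used && MI.Kind == Op::Add;
      S.User = I;
      S.Used = true;
    }
  }
  return Pairs;
}
```

For a block consisting of an ADRP into register 0 followed by an ADD that reads and writes register 0, the sketch records one pair; if a second instruction also reads register 0 after the ADRP, nothing is recorded.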

This revision is now accepted and ready to land. Dec 2 2016, 10:49 AM

Thanks for the review. I added some debug statements and spent some time creating a .mir test for all situations happening in the code (and fixed a bug discovered while doing so).

lib/Target/AArch64/AArch64CollectLOH.cpp
328	I imagine this as "an 'el' 'oh' 'haitch'"
355	Good point.
505	We're switching over `unsigned Opcode` and don't get a warning.
test/CodeGen/AArch64/arm64-collect-loh.ll
637	This example contained a perfectly fine LOH opportunity that wasn't caught before (probably because the old algorithm bailed out on the double q reg and didn't resume properly on the LDR address). I updated the test to make it apparent.
670	ok.
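
The exchange on line 505 presumably rests on how compilers diagnose `switch` statements: the `-Wswitch` warning in GCC and Clang applies when the controlling expression has an enumeration type and an enumerator has no case label. A switch over a plain `unsigned` is not checked that way. A small illustration, using a made-up `Kind` enum and made-up function names rather than anything from LLVM:

```cpp
// Hypothetical enum standing in for an opcode list.
enum Kind { KindA, KindB, KindC };

// The controlling expression has enum type. Dropping the KindC case
// without adding a default would draw a -Wswitch warning, so every
// enumerator is listed here.
int describeKind(Kind K) {
  switch (K) {
  case KindA:
    return 1;
  case KindB:
    return 2;
  case KindC:
    return 3;
  }
  return 0;
}

// The controlling expression is a plain unsigned. The compiler has no
// enumerator list to check against, so the missing KindC case is not
// diagnosed by -Wswitch and control simply falls past the switch.
int describeOpcode(unsigned Opcode) {
  switch (Opcode) {
  case KindA:
    return 1;
  case KindB:
    return 2;
  }
  return 0;
}
```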

Closed by commit rL288561: AArch64CollectLOH: Rewrite as block-local analysis. (authored by matze). Dec 2 2016, 5:03 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                       Size
lib/Target/AArch64/AArch64CollectLOH.cpp                   1078 lines
test/CodeGen/AArch64/arm64-collect-loh-garbage-crash.ll    2 lines
test/CodeGen/AArch64/arm64-collect-loh-str.ll              2 lines
test/CodeGen/AArch64/arm64-collect-loh.ll                  14 lines

Diff 80013

lib/Target/AArch64/AArch64CollectLOH.cpp

	#include "MCTargetDesc/AArch64AddressingModes.h"			#include "MCTargetDesc/AArch64AddressingModes.h"
	#include "llvm/ADT/BitVector.h"			#include "llvm/ADT/BitVector.h"
	#include "llvm/ADT/DenseMap.h"			#include "llvm/ADT/DenseMap.h"
	#include "llvm/ADT/MapVector.h"			#include "llvm/ADT/MapVector.h"
	#include "llvm/ADT/SetVector.h"			#include "llvm/ADT/SetVector.h"
	#include "llvm/ADT/SmallVector.h"			#include "llvm/ADT/SmallVector.h"
	#include "llvm/ADT/Statistic.h"			#include "llvm/ADT/Statistic.h"
	#include "llvm/CodeGen/MachineBasicBlock.h"			#include "llvm/CodeGen/MachineBasicBlock.h"
	#include "llvm/CodeGen/MachineDominators.h"
	#include "llvm/CodeGen/MachineFunctionPass.h"			#include "llvm/CodeGen/MachineFunctionPass.h"
	#include "llvm/CodeGen/MachineInstr.h"			#include "llvm/CodeGen/MachineInstr.h"
	#include "llvm/CodeGen/MachineInstrBuilder.h"			#include "llvm/CodeGen/MachineInstrBuilder.h"
	#include "llvm/Support/CommandLine.h"
	#include "llvm/Support/Debug.h"			#include "llvm/Support/Debug.h"
	#include "llvm/Support/ErrorHandling.h"			#include "llvm/Support/ErrorHandling.h"
	#include "llvm/Support/raw_ostream.h"			#include "llvm/Support/raw_ostream.h"
	#include "llvm/Target/TargetInstrInfo.h"
	#include "llvm/Target/TargetMachine.h"			#include "llvm/Target/TargetMachine.h"
	#include "llvm/Target/TargetRegisterInfo.h"			#include "llvm/Target/TargetRegisterInfo.h"
	using namespace llvm;			using namespace llvm;

	#define DEBUG_TYPE "aarch64-collect-loh"			#define DEBUG_TYPE "aarch64-collect-loh"

	static cl::opt<bool>
	PreCollectRegister("aarch64-collect-loh-pre-collect-register", cl::Hidden,
	cl::desc("Restrict analysis to registers invovled"
	" in LOHs"),
	cl::init(true));

	static cl::opt<bool>
	BasicBlockScopeOnly("aarch64-collect-loh-bb-only", cl::Hidden,
	cl::desc("Restrict analysis at basic block scope"),
	cl::init(true));

	STATISTIC(NumADRPSimpleCandidate,			STATISTIC(NumADRPSimpleCandidate,
	"Number of simplifiable ADRP dominate by another");			"Number of simplifiable ADRP dominate by another");
	#ifndef NDEBUG
	STATISTIC(NumADRPComplexCandidate2,
	"Number of simplifiable ADRP reachable by 2 defs");
	STATISTIC(NumADRPComplexCandidate3,
	"Number of simplifiable ADRP reachable by 3 defs");
	STATISTIC(NumADRPComplexCandidateOther,
	"Number of simplifiable ADRP reachable by 4 or more defs");
	STATISTIC(NumADDToSTRWithImm,
	"Number of simplifiable STR with imm reachable by ADD");
	STATISTIC(NumLDRToSTRWithImm,
	"Number of simplifiable STR with imm reachable by LDR");
	STATISTIC(NumADDToSTR, "Number of simplifiable STR reachable by ADD");			STATISTIC(NumADDToSTR, "Number of simplifiable STR reachable by ADD");
	STATISTIC(NumLDRToSTR, "Number of simplifiable STR reachable by LDR");			STATISTIC(NumLDRToSTR, "Number of simplifiable STR reachable by LDR");
	STATISTIC(NumADDToLDRWithImm,
	"Number of simplifiable LDR with imm reachable by ADD");
	STATISTIC(NumLDRToLDRWithImm,
	"Number of simplifiable LDR with imm reachable by LDR");
	STATISTIC(NumADDToLDR, "Number of simplifiable LDR reachable by ADD");			STATISTIC(NumADDToLDR, "Number of simplifiable LDR reachable by ADD");
	STATISTIC(NumLDRToLDR, "Number of simplifiable LDR reachable by LDR");			STATISTIC(NumLDRToLDR, "Number of simplifiable LDR reachable by LDR");
	#endif // NDEBUG
	STATISTIC(NumADRPToLDR, "Number of simplifiable LDR reachable by ADRP");			STATISTIC(NumADRPToLDR, "Number of simplifiable LDR reachable by ADRP");
	#ifndef NDEBUG
	STATISTIC(NumCplxLvl1, "Number of complex case of level 1");
	STATISTIC(NumTooCplxLvl1, "Number of too complex case of level 1");
	STATISTIC(NumCplxLvl2, "Number of complex case of level 2");
	STATISTIC(NumTooCplxLvl2, "Number of too complex case of level 2");
	#endif // NDEBUG
	STATISTIC(NumADRSimpleCandidate, "Number of simplifiable ADRP + ADD");			STATISTIC(NumADRSimpleCandidate, "Number of simplifiable ADRP + ADD");
	STATISTIC(NumADRComplexCandidate, "Number of too complex ADRP + ADD");

	#define AARCH64_COLLECT_LOH_NAME "AArch64 Collect Linker Optimization Hint (LOH)"			#define AARCH64_COLLECT_LOH_NAME "AArch64 Collect Linker Optimization Hint (LOH)"

	namespace {			namespace {

	struct AArch64CollectLOH : public MachineFunctionPass {			struct AArch64CollectLOH : public MachineFunctionPass {
	static char ID;			static char ID;
	AArch64CollectLOH() : MachineFunctionPass(ID) {			AArch64CollectLOH() : MachineFunctionPass(ID) {}
	initializeAArch64CollectLOHPass(*PassRegistry::getPassRegistry());
	}

	bool runOnMachineFunction(MachineFunction &MF) override;			bool runOnMachineFunction(MachineFunction &MF) override;

	MachineFunctionProperties getRequiredProperties() const override {			MachineFunctionProperties getRequiredProperties() const override {
	return MachineFunctionProperties().set(			return MachineFunctionProperties().set(
	MachineFunctionProperties::Property::NoVRegs);			MachineFunctionProperties::Property::NoVRegs);
	}			}

	StringRef getPassName() const override { return AARCH64_COLLECT_LOH_NAME; }			StringRef getPassName() const override { return AARCH64_COLLECT_LOH_NAME; }

	void getAnalysisUsage(AnalysisUsage &AU) const override {			void getAnalysisUsage(AnalysisUsage &AU) const override {
	AU.setPreservesAll();
	MachineFunctionPass::getAnalysisUsage(AU);			MachineFunctionPass::getAnalysisUsage(AU);
	AU.addRequired<MachineDominatorTree>();			AU.setPreservesAll();
	}			}

	private:
	};			};

	/// A set of MachineInstruction.
	typedef SetVector<const MachineInstr *> SetOfMachineInstr;
	/// Map a basic block to a set of instructions per register.
	/// This is used to represent the exposed uses of a basic block
	/// per register.
	typedef MapVector<const MachineBasicBlock *,
	std::unique_ptr<SetOfMachineInstr[]>>
	BlockToSetOfInstrsPerColor;
	/// Map a basic block to an instruction per register.
	/// This is used to represent the live-out definitions of a basic block
	/// per register.
	typedef MapVector<const MachineBasicBlock *,
	std::unique_ptr<const MachineInstr *[]>>
	BlockToInstrPerColor;
	/// Map an instruction to a set of instructions. Used to represent the
	/// mapping def to reachable uses or use to definitions.
	typedef MapVector<const MachineInstr *, SetOfMachineInstr> InstrToInstrs;
	/// Map a basic block to a BitVector.
	/// This is used to record the kill registers per basic block.
	typedef MapVector<const MachineBasicBlock *, BitVector> BlockToRegSet;

	/// Map a register to a dense id.
	typedef DenseMap<unsigned, unsigned> MapRegToId;
	/// Map a dense id to a register. Used for debug purposes.
	typedef SmallVector<unsigned, 32> MapIdToReg;
	} // end anonymous namespace.

	char AArch64CollectLOH::ID = 0;			char AArch64CollectLOH::ID = 0;

	INITIALIZE_PASS_BEGIN(AArch64CollectLOH, "aarch64-collect-loh",			} // end anonymous namespace.
	AARCH64_COLLECT_LOH_NAME, false, false)
	INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
	INITIALIZE_PASS_END(AArch64CollectLOH, "aarch64-collect-loh",
	AARCH64_COLLECT_LOH_NAME, false, false)

	/// Given a couple (MBB, reg) get the corresponding set of instruction from
	/// the given "sets".
	/// If this couple does not reference any set, an empty set is added to "sets"
	/// for this couple and returned.
	/// \param nbRegs is used internally allocate some memory. It must be consistent
	/// with the way sets is used.
	static SetOfMachineInstr &getSet(BlockToSetOfInstrsPerColor &sets,
	const MachineBasicBlock &MBB, unsigned reg,
	unsigned nbRegs) {
	SetOfMachineInstr *result;
	BlockToSetOfInstrsPerColor::iterator it = sets.find(&MBB);
	if (it != sets.end())
	result = it->second.get();
	else
	result = (sets[&MBB] = make_unique<SetOfMachineInstr[]>(nbRegs)).get();

	return result[reg];
	}

	/// Given a couple (reg, MI) get the corresponding set of instructions from the
	/// the given "sets".
	/// This is used to get the uses record in sets of a definition identified by
	/// MI and reg, i.e., MI defines reg.
	/// If the couple does not reference anything, an empty set is added to
	/// "sets[reg]".
	/// \pre set[reg] is valid.
	static SetOfMachineInstr &getUses(InstrToInstrs *sets, unsigned reg,
	const MachineInstr &MI) {
	return sets[reg][&MI];
	}

	/// Same as getUses but does not modify the input map: sets.
	/// \return NULL if the couple (reg, MI) is not in sets.
	static const SetOfMachineInstr *getUses(const InstrToInstrs *sets, unsigned reg,
	const MachineInstr &MI) {
	InstrToInstrs::const_iterator Res = sets[reg].find(&MI);
	if (Res != sets[reg].end())
	return &(Res->second);
	return nullptr;
	}

	/// Initialize the reaching definition algorithm:
	/// For each basic block BB in MF, record:
	/// - its kill set.
	/// - its reachable uses (uses that are exposed to BB's predecessors).
	/// - its the generated definitions.
	/// \param DummyOp if not NULL, specifies a Dummy Operation to be added to
	/// the list of uses of exposed defintions.
	/// \param ADRPMode specifies to only consider ADRP instructions for generated
	/// definition. It also consider definitions of ADRP instructions as uses and
	/// ignore other uses. The ADRPMode is used to collect the information for LHO
	/// that involve ADRP operation only.
	static void initReachingDef(const MachineFunction &MF,
	InstrToInstrs *ColorOpToReachedUses,
	BlockToInstrPerColor &Gen, BlockToRegSet &Kill,
	BlockToSetOfInstrsPerColor &ReachableUses,
	const MapRegToId &RegToId,
	const MachineInstr *DummyOp, bool ADRPMode) {
	const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
	unsigned NbReg = RegToId.size();

	for (const MachineBasicBlock &MBB : MF) {
	auto &BBGen = Gen[&MBB];
	BBGen = make_unique<const MachineInstr *[]>(NbReg);
	std::fill(BBGen.get(), BBGen.get() + NbReg, nullptr);

	BitVector &BBKillSet = Kill[&MBB];
	BBKillSet.resize(NbReg);
	for (const MachineInstr &MI : MBB) {
	bool IsADRP = MI.getOpcode() == AArch64::ADRP;

	// Process uses first.
	if (IsADRP || !ADRPMode)
	for (const MachineOperand &MO : MI.operands()) {
	// Treat ADRP def as use, as the goal of the analysis is to find
	// ADRP defs reached by other ADRP defs.
	if (!MO.isReg() || (!ADRPMode && !MO.isUse()) ||
	(ADRPMode && (!IsADRP || !MO.isDef())))
	continue;
	unsigned CurReg = MO.getReg();
	MapRegToId::const_iterator ItCurRegId = RegToId.find(CurReg);
	if (ItCurRegId == RegToId.end())
	continue;
	CurReg = ItCurRegId->second;

	// if CurReg has not been defined, this use is reachable.
	if (!BBGen[CurReg] && !BBKillSet.test(CurReg))
	getSet(ReachableUses, MBB, CurReg, NbReg).insert(&MI);
	// current basic block definition for this color, if any, is in Gen.
	if (BBGen[CurReg])
	getUses(ColorOpToReachedUses, CurReg, *BBGen[CurReg]).insert(&MI);
	}

	// Process clobbers.
	for (const MachineOperand &MO : MI.operands()) {
	if (!MO.isRegMask())
	continue;
	// Clobbers kill the related colors.
	const uint32_t *PreservedRegs = MO.getRegMask();

	// Set generated regs.
	for (const auto &Entry : RegToId) {
	unsigned Reg = Entry.second;
	// Use the global register ID when querying APIs external to this
	// pass.
	if (MachineOperand::clobbersPhysReg(PreservedRegs, Entry.first)) {
	// Do not register clobbered definition for no ADRP.
	// This definition is not used anyway (otherwise register
	// allocation is wrong).
	BBGen[Reg] = ADRPMode ? &MI : nullptr;
	BBKillSet.set(Reg);
	}
	}
	}

	// Process register defs.
	for (const MachineOperand &MO : MI.operands()) {
	if (!MO.isReg() || !MO.isDef())
	continue;
	unsigned CurReg = MO.getReg();
	MapRegToId::const_iterator ItCurRegId = RegToId.find(CurReg);
	if (ItCurRegId == RegToId.end())
	continue;

	for (MCRegAliasIterator AI(CurReg, TRI, true); AI.isValid(); ++AI) {
	MapRegToId::const_iterator ItRegId = RegToId.find(*AI);
	// If this alias has not been recorded, then it is not interesting
	// for the current analysis.
	// We can end up in this situation because of tuple registers.
	// E.g., Let say we are interested in S1. When we register
	// S1, we will also register its aliases and in particular
	// the tuple Q1_Q2.
	// Now, when we encounter Q1_Q2, we will look through its aliases
	// and will find that S2 is not registered.
	if (ItRegId == RegToId.end())
	continue;

	BBKillSet.set(ItRegId->second);
	BBGen[ItRegId->second] = &MI;
	}
	BBGen[ItCurRegId->second] = &MI;
	}
	}

	// If we restrict our analysis to basic block scope, conservatively add a
	// dummy
	// use for each generated value.
	if (!ADRPMode && DummyOp && !MBB.succ_empty())
	for (unsigned CurReg = 0; CurReg < NbReg; ++CurReg)
	if (BBGen[CurReg])
	getUses(ColorOpToReachedUses, CurReg, *BBGen[CurReg]).insert(DummyOp);
	}
	}

	/// Reaching def core algorithm:
	/// while an Out has changed
	/// for each bb
	/// for each color
	/// In[bb][color] = U Out[bb.predecessors][color]
	/// insert reachableUses[bb][color] in each in[bb][color]
	/// op.reachedUses
	///
	/// Out[bb] = Gen[bb] U (In[bb] - Kill[bb])
	static void reachingDefAlgorithm(const MachineFunction &MF,
	InstrToInstrs *ColorOpToReachedUses,
	BlockToSetOfInstrsPerColor &In,
	BlockToSetOfInstrsPerColor &Out,
	BlockToInstrPerColor &Gen, BlockToRegSet &Kill,
	BlockToSetOfInstrsPerColor &ReachableUses,
	unsigned NbReg) {
	bool HasChanged;
	do {
	HasChanged = false;
	for (const MachineBasicBlock &MBB : MF) {
	unsigned CurReg;
	for (CurReg = 0; CurReg < NbReg; ++CurReg) {
	SetOfMachineInstr &BBInSet = getSet(In, MBB, CurReg, NbReg);
	SetOfMachineInstr &BBReachableUses =
	getSet(ReachableUses, MBB, CurReg, NbReg);
	SetOfMachineInstr &BBOutSet = getSet(Out, MBB, CurReg, NbReg);
	unsigned Size = BBOutSet.size();
	// In[bb][color] = U Out[bb.predecessors][color]
	for (const MachineBasicBlock *PredMBB : MBB.predecessors()) {
	SetOfMachineInstr &PredOutSet = getSet(Out, *PredMBB, CurReg, NbReg);
	BBInSet.insert(PredOutSet.begin(), PredOutSet.end());
	}
	// insert reachableUses[bb][color] in each in[bb][color] op.reachedses
	for (const MachineInstr *MI : BBInSet) {
	SetOfMachineInstr &OpReachedUses =
	getUses(ColorOpToReachedUses, CurReg, *MI);
	OpReachedUses.insert(BBReachableUses.begin(), BBReachableUses.end());
	}
	// Out[bb] = Gen[bb] U (In[bb] - Kill[bb])
	if (!Kill[&MBB].test(CurReg))
	BBOutSet.insert(BBInSet.begin(), BBInSet.end());
	if (Gen[&MBB][CurReg])
	BBOutSet.insert(Gen[&MBB][CurReg]);
	HasChanged |= BBOutSet.size() != Size;
	}
	}
	} while (HasChanged);
	}

	/// Reaching definition algorithm.
	/// \param MF function on which the algorithm will operate.
	/// \param[out] ColorOpToReachedUses will contain the result of the reaching
	/// def algorithm.
	/// \param ADRPMode specify whether the reaching def algorithm should be tuned
	/// for ADRP optimization. \see initReachingDef for more details.
	/// \param DummyOp if not NULL, the algorithm will work at
	/// basic block scope and will set for every exposed definition a use to
	/// @p DummyOp.
	/// \pre ColorOpToReachedUses is an array of at least number of registers of
	/// InstrToInstrs.
	static void reachingDef(const MachineFunction &MF,
	InstrToInstrs *ColorOpToReachedUses,
	const MapRegToId &RegToId, bool ADRPMode = false,
	const MachineInstr *DummyOp = nullptr) {
	// structures:
	// For each basic block.
	// Out: a set per color of definitions that reach the
	// out boundary of this block.
	// In: Same as Out but for in boundary.
	// Gen: generated color in this block (one operation per color).
	// Kill: register set of killed color in this block.
	// ReachableUses: a set per color of uses (operation) reachable
	// for "In" definitions.
	BlockToSetOfInstrsPerColor Out, In, ReachableUses;
	BlockToInstrPerColor Gen;
	BlockToRegSet Kill;

	// Initialize Gen, kill and reachableUses.
	initReachingDef(MF, ColorOpToReachedUses, Gen, Kill, ReachableUses, RegToId,
	DummyOp, ADRPMode);

	// Algo.
	if (!DummyOp)
	reachingDefAlgorithm(MF, ColorOpToReachedUses, In, Out, Gen, Kill,
	ReachableUses, RegToId.size());
	}

	#ifndef NDEBUG
	/// print the result of the reaching definition algorithm.
	static void printReachingDef(const InstrToInstrs *ColorOpToReachedUses,
	unsigned NbReg, const TargetRegisterInfo *TRI,
	const MapIdToReg &IdToReg) {
	unsigned CurReg;
	for (CurReg = 0; CurReg < NbReg; ++CurReg) {
	if (ColorOpToReachedUses[CurReg].empty())
	continue;
	DEBUG(dbgs() << "* Reg " << PrintReg(IdToReg[CurReg], TRI) << " *\n");

	for (const auto &DefsIt : ColorOpToReachedUses[CurReg]) {			INITIALIZE_PASS(AArch64CollectLOH, "aarch64-collect-loh",
	DEBUG(dbgs() << "Def:\n");			AARCH64_COLLECT_LOH_NAME, false, false)
	DEBUG(DefsIt.first->print(dbgs()));
	DEBUG(dbgs() << "Reachable uses:\n");
	for (const MachineInstr *MI : DefsIt.second) {
	DEBUG(MI->print(dbgs()));
	}
	}
	}
	}
	#endif // NDEBUG

	/// Answer the following question: Can Def be one of the definition			/// Answer the following question: Can Def be one of the definition
	/// involved in a part of a LOH?			/// involved in a part of a LOH?
	static bool canDefBePartOfLOH(const MachineInstr *Def) {			static bool canDefBePartOfLOH(const MachineInstr &MI) {
	unsigned Opc = Def->getOpcode();
	// Accept ADRP, ADDLow and LOADGot.			// Accept ADRP, ADDLow and LOADGot.
	switch (Opc) {			switch (MI.getOpcode()) {
	default:			default:
	return false;			return false;
	case AArch64::ADRP:			case AArch64::ADRP:
	return true;			return true;
	case AArch64::ADDXri:			case AArch64::ADDXri:
	// Check immediate to see if the immediate is an address.			// Check immediate to see if the immediate is an address.
	switch (Def->getOperand(2).getType()) {			switch (MI.getOperand(2).getType()) {
	default:			default:
	return false;			return false;
	case MachineOperand::MO_GlobalAddress:			case MachineOperand::MO_GlobalAddress:
	case MachineOperand::MO_JumpTableIndex:			case MachineOperand::MO_JumpTableIndex:
	case MachineOperand::MO_ConstantPoolIndex:			case MachineOperand::MO_ConstantPoolIndex:
	case MachineOperand::MO_BlockAddress:			case MachineOperand::MO_BlockAddress:
	return true;			return true;
	}			}
	case AArch64::LDRXui:			case AArch64::LDRXui:
	// Check immediate to see if the immediate is an address.			// Check immediate to see if the immediate is an address.
	switch (Def->getOperand(2).getType()) {			switch (MI.getOperand(2).getType()) {
	default:			default:
	return false;			return false;
	case MachineOperand::MO_GlobalAddress:			case MachineOperand::MO_GlobalAddress:
	return true;			return true;
	}			}
	}			}
	// Unreachable.
	return false;
	}			}

	/// Check whether the given instruction can the end of a LOH chain involving a			/// Check whether the given instruction can the end of a LOH chain involving a
	/// store.			/// store.
	static bool isCandidateStore(const MachineInstr *Instr) {			static bool isCandidateStore(const MachineInstr &MI) {
	switch (Instr->getOpcode()) {			switch (MI.getOpcode()) {
	default:			default:
	return false;			return false;
	case AArch64::STRBBui:			case AArch64::STRBBui:
	case AArch64::STRHHui:			case AArch64::STRHHui:
	case AArch64::STRBui:			case AArch64::STRBui:
	case AArch64::STRHui:			case AArch64::STRHui:
	case AArch64::STRWui:			case AArch64::STRWui:
	case AArch64::STRXui:			case AArch64::STRXui:
	case AArch64::STRSui:			case AArch64::STRSui:
	case AArch64::STRDui:			case AArch64::STRDui:
	case AArch64::STRQui:			case AArch64::STRQui:
	// In case we have str xA, [xA, #imm], this is two different uses			// In case we have str xA, [xA, #imm], this is two different uses
	// of xA and we cannot fold, otherwise the xA stored may be wrong,			// of xA and we cannot fold, otherwise the xA stored may be wrong,
	// even if #imm == 0.			// even if #imm == 0.
	if (Instr->getOperand(0).getReg() != Instr->getOperand(1).getReg())			return MI.getOperand(0).getReg() != MI.getOperand(1).getReg();
	return true;
	}
	return false;
	}

	/// Given the result of a reaching definition algorithm in ColorOpToReachedUses,
	/// Build the Use to Defs information and filter out obvious non-LOH candidates.
	/// In ADRPMode, non-LOH candidates are "uses" with non-ADRP definitions.
	/// In non-ADRPMode, non-LOH candidates are "uses" with several definition,
	/// i.e., no simple chain.
	/// \param ADRPMode -- \see initReachingDef.
	static void reachedUsesToDefs(InstrToInstrs &UseToReachingDefs,
	const InstrToInstrs *ColorOpToReachedUses,
	const MapRegToId &RegToId,
	bool ADRPMode = false) {

	SetOfMachineInstr NotCandidate;
	unsigned NbReg = RegToId.size();
	MapRegToId::const_iterator EndIt = RegToId.end();
	for (unsigned CurReg = 0; CurReg < NbReg; ++CurReg) {
	// If this color is never defined, continue.
	if (ColorOpToReachedUses[CurReg].empty())
	continue;

	for (const auto &DefsIt : ColorOpToReachedUses[CurReg]) {
	for (const MachineInstr *MI : DefsIt.second) {
	const MachineInstr *Def = DefsIt.first;
	MapRegToId::const_iterator It;
	// if all the reaching defs are not adrp, this use will not be
	// simplifiable.
	if ((ADRPMode && Def->getOpcode() != AArch64::ADRP) ||
	(!ADRPMode && !canDefBePartOfLOH(Def)) ||
	(!ADRPMode && isCandidateStore(MI) &&
	// store are LOH candidate iff the end of the chain is used as
	// base.
	((It = RegToId.find((MI)->getOperand(1).getReg())) == EndIt ||
	It->second != CurReg))) {
	NotCandidate.insert(MI);
	continue;
	}
	// Do not consider self reaching as a simplifiable case for ADRP.
	if (!ADRPMode || MI != DefsIt.first) {
	UseToReachingDefs[MI].insert(DefsIt.first);
	// If UsesIt has several reaching definitions, it is not
	// candidate for simplificaton in non-ADRPMode.
	if (!ADRPMode && UseToReachingDefs[MI].size() > 1)
	NotCandidate.insert(MI);
	}
	}
	}
	}
	for (const MachineInstr *Elem : NotCandidate) {
	DEBUG(dbgs() << "Too many reaching defs: " << *Elem << "\n");
	// It would have been better if we could just remove the entry
	// from the map. Because of that, we have to filter the garbage
	// (second.empty) in the subsequence analysis.
	UseToReachingDefs[Elem].clear();
	}
	}

	/// Based on the use to defs information (in ADRPMode), compute the
	/// opportunities of LOH ADRP-related.
	static void computeADRP(const InstrToInstrs &UseToDefs,
	AArch64FunctionInfo &AArch64FI,
	const MachineDominatorTree *MDT) {
	DEBUG(dbgs() << "*** Compute LOH for ADRP\n");
	for (const auto &Entry : UseToDefs) {
	unsigned Size = Entry.second.size();
	if (Size == 0)
	continue;
	if (Size == 1) {
	const MachineInstr *L2 = *Entry.second.begin();
	const MachineInstr *L1 = Entry.first;
	if (!MDT->dominates(L2, L1)) {
	DEBUG(dbgs() << "Dominance check failed:\n" << *L2 << '\n' << *L1
	<< '\n');
	continue;
	}
	DEBUG(dbgs() << "Record AdrpAdrp:\n" << *L2 << '\n' << *L1 << '\n');
	AArch64FI.addLOHDirective(MCLOH_AdrpAdrp, {L2, L1});
	++NumADRPSimpleCandidate;
	}
	#ifndef NDEBUG
	else if (Size == 2)
	++NumADRPComplexCandidate2;
	else if (Size == 3)
	++NumADRPComplexCandidate3;
	else
	++NumADRPComplexCandidateOther;
	#endif
	// if Size < 1, the use should have been removed from the candidates
	assert(Size >= 1 && "No reaching defs for that use!");
	}			}
	}			}

	/// Check whether the given instruction can be the end of a LOH chain			/// Check whether the given instruction can be the end of a LOH chain
	/// involving a load.			/// involving a load.
	static bool isCandidateLoad(const MachineInstr *Instr) {			static bool isCandidateLoad(const MachineInstr &MI) {
	switch (Instr->getOpcode()) {			switch (MI.getOpcode()) {
	default:			default:
	return false;			return false;
	case AArch64::LDRSBWui:			case AArch64::LDRSBWui:
	case AArch64::LDRSBXui:			case AArch64::LDRSBXui:
	case AArch64::LDRSHWui:			case AArch64::LDRSHWui:
	case AArch64::LDRSHXui:			case AArch64::LDRSHXui:
	case AArch64::LDRSWui:			case AArch64::LDRSWui:
	case AArch64::LDRBui:			case AArch64::LDRBui:
	case AArch64::LDRHui:			case AArch64::LDRHui:
	case AArch64::LDRWui:			case AArch64::LDRWui:
	case AArch64::LDRXui:			case AArch64::LDRXui:
	case AArch64::LDRSui:			case AArch64::LDRSui:
	case AArch64::LDRDui:			case AArch64::LDRDui:
	case AArch64::LDRQui:			case AArch64::LDRQui:
	if (Instr->getOperand(2).getTargetFlags() & AArch64II::MO_GOT)			return !(MI.getOperand(2).getTargetFlags() & AArch64II::MO_GOT);
	return false;
	return true;
	}			}
	// Unreachable.
	return false;
	}			}

	/// Check whether the given instruction can load a litteral.			/// Check whether the given instruction can load a litteral.
	static bool supportLoadFromLiteral(const MachineInstr *Instr) {			static bool supportLoadFromLiteral(const MachineInstr &MI) {
	switch (Instr->getOpcode()) {			switch (MI.getOpcode()) {
	default:			default:
	return false;			return false;
	case AArch64::LDRSWui:			case AArch64::LDRSWui:
	case AArch64::LDRWui:			case AArch64::LDRWui:
	case AArch64::LDRXui:			case AArch64::LDRXui:
	case AArch64::LDRSui:			case AArch64::LDRSui:
	case AArch64::LDRDui:			case AArch64::LDRDui:
	case AArch64::LDRQui:			case AArch64::LDRQui:
	return true;			return true;
	}			}
	// Unreachable.
	return false;
	}			}

	/// Check whether the given instruction is a LOH candidate.			/// Number of GPR registers traked by mapRegToGPRIndex()
	/// \param UseToDefs is used to check that Instr is at the end of LOH supported			static const unsigned N_GPR_REGS = 31;
	/// chain.			/// Map register number to index from 0-30.
	/// \pre UseToDefs contains only on def per use, i.e., obvious non candidate are			static int mapRegToGPRIndex(unsigned Reg) {
	/// already been filtered out.			static_assert(AArch64::X28 - AArch64::X0 + 3 == N_GPR_REGS, "Number of GPRs");
	static bool isCandidate(const MachineInstr *Instr,			static_assert(AArch64::W30 - AArch64::W0 + 1 == N_GPR_REGS, "Number of GPRs");
	const InstrToInstrs &UseToDefs,			if (AArch64::X0 <= Reg && Reg <= AArch64::X28)
	const MachineDominatorTree *MDT) {			return Reg - AArch64::X0;
	if (!isCandidateLoad(Instr) && !isCandidateStore(Instr))			if (AArch64::W0 <= Reg && Reg <= AArch64::W30)
	return false;			return Reg - AArch64::W0;
				// TableGen gives "FP" and "LR" an index not adjacent to X28 so we have to
	const MachineInstr *Def = *UseToDefs.find(Instr)->second.begin();			// handle them as special cases.
	if (Def->getOpcode() != AArch64::ADRP) {			if (Reg == AArch64::FP)
	// At this point, Def is ADDXri or LDRXui of the right type of			return 29;
	// symbol, because we filtered out the uses that were not defined			if (Reg == AArch64::LR)
	// by these kind of instructions (+ ADRP).			return 30;
				return -1;
	// Check if this forms a simple chain: each intermediate node must			}
	// dominates the next one.
	if (!MDT->dominates(Def, Instr))			/// State tracked per physical register.
	return false;			struct LOHInfo {
	// Move one node up in the simple chain.			MCLOHType Type : 8; ///< "Best" type of LOH possible.
	if (UseToDefs.find(Def) ==			bool IsCandidate : 1; ///< Possible LOH candidate.
	UseToDefs.end()			bool OneUser : 1; ///< Found exactly one user (yet).
	// The map may contain garbage we have to ignore.			bool MultiUsers : 1; ///< Found multiple users.
	||			const MachineInstr *MI0; ///< First instruction involved in the LOH.
	UseToDefs.find(Def)->second.empty())			const MachineInstr *MI1; ///< Second instruction involved in the LOH
	return false;			/// (if any).
	Instr = Def;			const MachineInstr *LastADRP; ///< Last ADRP in same register.
	Def = *UseToDefs.find(Def)->second.begin();			};
	}
	// Check if we reached the top of the simple chain:
	// - top is ADRP.
	// - check the simple chain property: each intermediate node must
	// dominates the next one.
	if (Def->getOpcode() == AArch64::ADRP)
	return MDT->dominates(Def, Instr);
	return false;
	}

	static bool registerADRCandidate(const MachineInstr &Use,			/// Update state \p Info given \p MI uses the tracked register.
	const InstrToInstrs &UseToDefs,			static void handleUse(const MachineInstr &MI, LOHInfo &Info) {
	const InstrToInstrs *DefsPerColorToUses,			// We have multiple uses if we already found one before.
	AArch64FunctionInfo &AArch64FI,			if (Info.MultiUsers || Info.OneUser) {
	SetOfMachineInstr *InvolvedInLOHs,			Info.IsCandidate = false;
	const MapRegToId &RegToId) {			Info.MultiUsers = true;
	// Look for opportunities to turn ADRP -> ADD or			return;
	// ADRP -> LDR GOTPAGEOFF into ADR.
	// If ADRP has more than one use. Give up.
	if (Use.getOpcode() != AArch64::ADDXri &&
	(Use.getOpcode() != AArch64::LDRXui ||
	!(Use.getOperand(2).getTargetFlags() & AArch64II::MO_GOT)))
	return false;
	InstrToInstrs::const_iterator It = UseToDefs.find(&Use);
	// The map may contain garbage that we need to ignore.
	if (It == UseToDefs.end() || It->second.empty())
	return false;
	const MachineInstr &Def = **It->second.begin();
	if (Def.getOpcode() != AArch64::ADRP)
	return false;
	// Check the number of users of ADRP.
	const SetOfMachineInstr *Users =
	getUses(DefsPerColorToUses,
	RegToId.find(Def.getOperand(0).getReg())->second, Def);
	if (Users->size() > 1) {
	++NumADRComplexCandidate;
	return false;
	}
	++NumADRSimpleCandidate;
	assert((!InvolvedInLOHs || InvolvedInLOHs->insert(&Def)) &&
	"ADRP already involved in LOH.");
	assert((!InvolvedInLOHs || InvolvedInLOHs->insert(&Use)) &&
	"ADD already involved in LOH.");
	DEBUG(dbgs() << "Record AdrpAdd\n" << Def << '\n' << Use << '\n');

	AArch64FI.addLOHDirective(
	Use.getOpcode() == AArch64::ADDXri ? MCLOH_AdrpAdd : MCLOH_AdrpLdrGot,
	{&Def, &Use});
	return true;
	}			}
				Info.OneUser = true;

	/// Based on the use to defs information (in non-ADRPMode), compute the			// Start new LOHInfo if applicable.
	/// opportunities of LOH non-ADRP-related			if (isCandidateLoad(MI)) {
	static void computeOthers(const InstrToInstrs &UseToDefs,			Info.Type = MCLOH_AdrpLdr;
	const InstrToInstrs *DefsPerColorToUses,			Info.IsCandidate = true;
	AArch64FunctionInfo &AArch64FI, const MapRegToId &RegToId,			Info.MI0 = &MI;
	const MachineDominatorTree *MDT) {			// Note that even this is AdrpLdr now, we can switch to a Ldr variant
	SetOfMachineInstr *InvolvedInLOHs = nullptr;			// later.
	#ifndef NDEBUG			} else if (isCandidateStore(MI)) {
	SetOfMachineInstr InvolvedInLOHsStorage;			Info.Type = MCLOH_AdrpAddStr;
	InvolvedInLOHs = &InvolvedInLOHsStorage;			Info.IsCandidate = true;
	#endif // NDEBUG			Info.MI0 = &MI;
	DEBUG(dbgs() << "*** Compute LOH for Others\n");			Info.MI1 = nullptr;
	// ADRP -> ADD/LDR -> LDR/STR pattern.			} else if (MI.getOpcode() == AArch64::ADDXri) {
	// Fall back to ADRP -> ADD pattern if we fail to catch the bigger pattern.			Info.Type = MCLOH_AdrpAdd;
				Info.IsCandidate = true;
	// FIXME: When the statistics are not important,			Info.MI0 = &MI;
	// This initial filtering loop can be merged into the next loop.			} else if (MI.getOpcode() == AArch64::LDRXui &&
	// Currently, we didn't do it to have the same code for both DEBUG and			MI.getOperand(2).getTargetFlags() & AArch64II::MO_GOT) {
	// NDEBUG builds. Indeed, the iterator of the second loop would need			Info.Type = MCLOH_AdrpLdrGot;
	// to be changed.			Info.IsCandidate = true;
	SetOfMachineInstr PotentialCandidates;			Info.MI0 = &MI;
	SetOfMachineInstr PotentialADROpportunities;
	for (auto &Use : UseToDefs) {
	// If no definition is available, this is a non candidate.
	if (Use.second.empty())
	continue;
	// Keep only instructions that are load or store and at the end of
	// a ADRP -> ADD/LDR/Nothing chain.
	// We already filtered out the no-chain cases.
	if (!isCandidate(Use.first, UseToDefs, MDT)) {
	PotentialADROpportunities.insert(Use.first);
	continue;
	}			}
	PotentialCandidates.insert(Use.first);
	}			}

	// Make the following distinctions for statistics as the linker does			/// Update state \p Info given the tracked register is clobbered.
	// know how to decode instructions:			static void handleClobber(LOHInfo &Info) {
	// - ADD/LDR/Nothing make there different patterns.			Info.IsCandidate = false;
	// - LDR/STR make two different patterns.			Info.OneUser = false;
	// Hence, 6 - 1 base patterns.			Info.MultiUsers = false;
	// (because ADRP-> Nothing -> STR is not simplifiable)			Info.LastADRP = nullptr;
				}
	// The linker is only able to have a simple semantic, i.e., if pattern A
	// do B.			/// Update state \p Info given that \p MI is possibly the middle instruction
	// However, we want to see the opportunity we may miss if we were able to			/// of an LOH involving 3 instructions.
				qcolombet: Typo -> s/an/a
				MatzeB: I imagine this as "an 'el' 'oh' 'haitch'"
	// catch more complex cases.			static bool handleMiddleInst(const MachineInstr &MI, LOHInfo &DefInfo,
				LOHInfo &OpInfo) {
	// PotentialCandidates are result of a chain ADRP -> ADD/LDR ->			if (!DefInfo.IsCandidate || (&DefInfo != &OpInfo && OpInfo.OneUser))
	// A potential candidate becomes a candidate, if its current immediate			return false;
	// operand is zero and all nodes of the chain have respectively only one user			// Copy LOHInfo for dest register to LOHInfo for source register.
	#ifndef NDEBUG			if (&DefInfo != &OpInfo) {
	SetOfMachineInstr DefsOfPotentialCandidates;			OpInfo = DefInfo;
	#endif			// Invalidate \p DefInfo because we track it in \p OpInfo now.
	for (const MachineInstr *Candidate : PotentialCandidates) {			handleClobber(DefInfo);
	// Get the definition of the candidate i.e., ADD or LDR.			}
	const MachineInstr *Def = *UseToDefs.find(Candidate)->second.begin();
	// Record the elements of the chain.			// Advance state machine.
	const MachineInstr *L1 = Def;			assert(OpInfo.IsCandidate && "Expect valid state");
	const MachineInstr *L2 = nullptr;			if (MI.getOpcode() == AArch64::ADDXri) {
	unsigned ImmediateDefOpc = Def->getOpcode();			if (OpInfo.Type == MCLOH_AdrpLdr) {
	if (Def->getOpcode() != AArch64::ADRP) {			OpInfo.Type = MCLOH_AdrpAddLdr;
	// Check the number of users of this node.			OpInfo.IsCandidate = true;
	const SetOfMachineInstr *Users =			OpInfo.MI1 = &MI;
	getUses(DefsPerColorToUses,			return true;
	RegToId.find(Def->getOperand(0).getReg())->second, *Def);			} else if (OpInfo.Type == MCLOH_AdrpAddStr && OpInfo.MI1 == nullptr) {
	if (Users->size() > 1) {			OpInfo.Type = MCLOH_AdrpAddStr;
	#ifndef NDEBUG			OpInfo.IsCandidate = true;
	// if all the uses of this def are in potential candidate, this is			OpInfo.MI1 = &MI;
	// a complex candidate of level 2.			return true;
	bool IsLevel2 = true;
	for (const MachineInstr *MI : *Users) {
	if (!PotentialCandidates.count(MI)) {
	++NumTooCplxLvl2;
	IsLevel2 = false;
	break;
	}
	}
	if (IsLevel2)
	++NumCplxLvl2;
	#endif // NDEBUG
	PotentialADROpportunities.insert(Def);
	continue;
	}			}
	L2 = Def;			} else {
	Def = *UseToDefs.find(Def)->second.begin();			assert(MI.getOpcode() == AArch64::LDRXui && "Expect LDRXui");
				qcolombet: I'd also assert that the operand is a GOT entry.
				MatzeB: Good point.
	L1 = Def;			if (OpInfo.Type == MCLOH_AdrpAddStr && OpInfo.MI1 == nullptr) {
	} // else the element in the middle of the chain is nothing, thus			OpInfo.Type = MCLOH_AdrpLdrGotStr;
	// Def already contains the first element of the chain.			OpInfo.IsCandidate = true;
				OpInfo.MI1 = &MI;
	// Check the number of users of the first node in the chain, i.e., ADRP			return true;
	const SetOfMachineInstr *Users =			} else if (OpInfo.Type == MCLOH_AdrpLdr) {
	getUses(DefsPerColorToUses,			OpInfo.Type = MCLOH_AdrpLdrGotLdr;
	RegToId.find(Def->getOperand(0).getReg())->second, *Def);			OpInfo.IsCandidate = true;
	if (Users->size() > 1) {			OpInfo.MI1 = &MI;
	#ifndef NDEBUG			return true;
	// if all the uses of this def are in the defs of the potential candidate,
	// this is a complex candidate of level 1
	if (DefsOfPotentialCandidates.empty()) {
	// lazy init
	DefsOfPotentialCandidates = PotentialCandidates;
	for (const MachineInstr *Candidate : PotentialCandidates) {
	if (!UseToDefs.find(Candidate)->second.empty())
	DefsOfPotentialCandidates.insert(
	*UseToDefs.find(Candidate)->second.begin());
	}
	}
	bool Found = false;
	for (auto &Use : *Users) {
	if (!DefsOfPotentialCandidates.count(Use)) {
	++NumTooCplxLvl1;
	Found = true;
	break;
	}			}
	}			}
	if (!Found)			return false;
	++NumCplxLvl1;
	#endif // NDEBUG
	continue;
	}			}

	bool IsL2Add = (ImmediateDefOpc == AArch64::ADDXri);			/// Update state when seeing and ADRP instruction.
	// If the chain is three instructions long and ldr is the second element,			static void handleADRP(const MachineInstr &MI, AArch64FunctionInfo &AFI,
	// then this ldr must load form GOT, otherwise this is not a correct chain.			LOHInfo &Info) {
	if (L2 && !IsL2Add &&			if (Info.LastADRP != nullptr) {
	!(L2->getOperand(2).getTargetFlags() & AArch64II::MO_GOT))			AFI.addLOHDirective(MCLOH_AdrpAdrp, {&MI, Info.LastADRP});
	continue;			++NumADRPSimpleCandidate;
	SmallVector<const MachineInstr *, 3> Args;			}
	MCLOHType Kind;
	if (isCandidateLoad(Candidate)) {
	if (!L2) {
	// At this point, the candidate LOH indicates that the ldr instruction
	// may use a direct access to the symbol. There is not such encoding
	// for loads of byte and half.
	if (!supportLoadFromLiteral(Candidate))
	continue;

	DEBUG(dbgs() << "Record AdrpLdr:\n" << *L1 << '\n' << *Candidate			// Produce LOH directive if possible.
	<< '\n');			switch (Info.Type) {
	Kind = MCLOH_AdrpLdr;			case MCLOH_AdrpAdd:
	Args.push_back(L1);			AFI.addLOHDirective(MCLOH_AdrpAdd, {&MI, Info.MI0});
	Args.push_back(Candidate);			++NumADRSimpleCandidate;
	assert((!InvolvedInLOHs || InvolvedInLOHs->insert(L1)) &&			break;
	"L1 already involved in LOH.");			case MCLOH_AdrpLdr:
	assert((!InvolvedInLOHs || InvolvedInLOHs->insert(Candidate)) &&			if (supportLoadFromLiteral(*Info.MI0)) {
	"Candidate already involved in LOH.");			AFI.addLOHDirective(MCLOH_AdrpLdr, {&MI, Info.MI0});
	++NumADRPToLDR;			++NumADRPToLDR;
	} else {			}
	DEBUG(dbgs() << "Record Adrp" << (IsL2Add ? "Add" : "LdrGot")			break;
	<< "Ldr:\n" << *L1 << '\n' << *L2 << '\n' << *Candidate			case MCLOH_AdrpAddLdr:
	<< '\n');			AFI.addLOHDirective(MCLOH_AdrpAddLdr, {&MI, Info.MI1, Info.MI0});

	Kind = IsL2Add ? MCLOH_AdrpAddLdr : MCLOH_AdrpLdrGotLdr;
	Args.push_back(L1);
	Args.push_back(L2);
	Args.push_back(Candidate);

	PotentialADROpportunities.remove(L2);
	assert((!InvolvedInLOHs || InvolvedInLOHs->insert(L1)) &&
	"L1 already involved in LOH.");
	assert((!InvolvedInLOHs || InvolvedInLOHs->insert(L2)) &&
	"L2 already involved in LOH.");
	assert((!InvolvedInLOHs || InvolvedInLOHs->insert(Candidate)) &&
	"Candidate already involved in LOH.");
	#ifndef NDEBUG
	// get the immediate of the load
	if (Candidate->getOperand(2).getImm() == 0)
	if (ImmediateDefOpc == AArch64::ADDXri)
	++NumADDToLDR;			++NumADDToLDR;
	else			break;
				case MCLOH_AdrpAddStr:
				if (Info.MI1 != nullptr) {
				AFI.addLOHDirective(MCLOH_AdrpAddStr, {&MI, Info.MI1, Info.MI0});
				++NumADDToSTR;
				}
				break;
				case MCLOH_AdrpLdrGotLdr:
				AFI.addLOHDirective(MCLOH_AdrpLdrGotLdr, {&MI, Info.MI1, Info.MI0});
	++NumLDRToLDR;			++NumLDRToLDR;
	else if (ImmediateDefOpc == AArch64::ADDXri)			break;
	++NumADDToLDRWithImm;			case MCLOH_AdrpLdrGotStr:
	else			AFI.addLOHDirective(MCLOH_AdrpLdrGotStr, {&MI, Info.MI1, Info.MI0});
	++NumLDRToLDRWithImm;			++NumLDRToSTR;
	#endif // NDEBUG			break;
				case MCLOH_AdrpLdrGot:
				AFI.addLOHDirective(MCLOH_AdrpLdrGot, {&MI, Info.MI0});
				break;
				case MCLOH_AdrpAdrp:
				llvm_unreachable("MCLOH_AdrpAdrp not used in state machine");
	}			}
	} else {
	if (ImmediateDefOpc == AArch64::ADRP)
	continue;
	else {

	DEBUG(dbgs() << "Record Adrp" << (IsL2Add ? "Add" : "LdrGot")			handleClobber(Info);
	<< "Str:\n" << *L1 << '\n' << *L2 << '\n' << *Candidate			Info.LastADRP = &MI;
	<< '\n');

	Kind = IsL2Add ? MCLOH_AdrpAddStr : MCLOH_AdrpLdrGotStr;
	Args.push_back(L1);
	Args.push_back(L2);
	Args.push_back(Candidate);

	PotentialADROpportunities.remove(L2);
	assert((!InvolvedInLOHs || InvolvedInLOHs->insert(L1)) &&
	"L1 already involved in LOH.");
	assert((!InvolvedInLOHs || InvolvedInLOHs->insert(L2)) &&
	"L2 already involved in LOH.");
	assert((!InvolvedInLOHs || InvolvedInLOHs->insert(Candidate)) &&
	"Candidate already involved in LOH.");
	#ifndef NDEBUG
	// get the immediate of the store
	if (Candidate->getOperand(2).getImm() == 0)
	if (ImmediateDefOpc == AArch64::ADDXri)
	++NumADDToSTR;
	else
	++NumLDRToSTR;
	else if (ImmediateDefOpc == AArch64::ADDXri)
	++NumADDToSTRWithImm;
	else
	++NumLDRToSTRWithImm;
	#endif // DEBUG
	}
	}
	AArch64FI.addLOHDirective(Kind, Args);
	}

	// Now, we grabbed all the big patterns, check ADR opportunities.
	for (const MachineInstr *Candidate : PotentialADROpportunities)
	registerADRCandidate(*Candidate, UseToDefs, DefsPerColorToUses, AArch64FI,
	InvolvedInLOHs, RegToId);
	}

	/// Look for every register defined by potential LOHs candidates.
	/// Map these registers with dense id in @p RegToId and vice-versa in
	/// @p IdToReg. @p IdToReg is populated only in DEBUG mode.
	static void collectInvolvedReg(const MachineFunction &MF, MapRegToId &RegToId,
	MapIdToReg &IdToReg,
	const TargetRegisterInfo *TRI) {
	unsigned CurRegId = 0;
	if (!PreCollectRegister) {
	unsigned NbReg = TRI->getNumRegs();
	for (; CurRegId < NbReg; ++CurRegId) {
	RegToId[CurRegId] = CurRegId;
	DEBUG(IdToReg.push_back(CurRegId));
	DEBUG(assert(IdToReg[CurRegId] == CurRegId && "Reg index mismatches"));
	}			}

				static void handleRegMaskClobber(const uint32_t *RegMask, MCPhysReg Reg,
				LOHInfo *LOHInfos) {
				if (!MachineOperand::clobbersPhysReg(RegMask, Reg))
	return;			return;
				int Idx = mapRegToGPRIndex(Reg);
				if (Idx >= 0)
				handleClobber(LOHInfos[Idx]);
	}			}

	DEBUG(dbgs() << "** Collect Involved Register\n");			static void handleNormalInst(const MachineInstr &MI, LOHInfo *LOHInfos) {
	for (const auto &MBB : MF) {			// Handle defs and regmasks.
	for (const MachineInstr &MI : MBB) {			for (const MachineOperand &MO : MI.operands()) {
	if (!canDefBePartOfLOH(&MI) &&			if (MO.isRegMask()) {
	!isCandidateLoad(&MI) && !isCandidateStore(&MI))			const uint32_t *RegMask = MO.getRegMask();
	continue;			for (MCPhysReg Reg : AArch64::GPR32RegClass)
				handleRegMaskClobber(RegMask, Reg, LOHInfos);
	// Process defs			for (MCPhysReg Reg : AArch64::GPR64RegClass)
	for (MachineInstr::const_mop_iterator IO = MI.operands_begin(),			handleRegMaskClobber(RegMask, Reg, LOHInfos);
	IOEnd = MI.operands_end();
	IO != IOEnd; ++IO) {
	if (!IO->isReg() || !IO->isDef())
	continue;			continue;
	unsigned CurReg = IO->getReg();
	for (MCRegAliasIterator AI(CurReg, TRI, true); AI.isValid(); ++AI)
	if (RegToId.find(*AI) == RegToId.end()) {
	DEBUG(IdToReg.push_back(*AI);
	assert(IdToReg[CurRegId] == *AI &&
	"Reg index mismatches insertion index."));
	RegToId[*AI] = CurRegId++;
	DEBUG(dbgs() << "Register: " << PrintReg(*AI, TRI) << '\n');
	}
	}			}
				if (!MO.isReg() || !MO.isDef())
				continue;
				int Idx = mapRegToGPRIndex(MO.getReg());
				if (Idx < 0)
				continue;
				handleClobber(LOHInfos[Idx]);
	}			}
				// Handle uses.
				for (const MachineOperand &MO : MI.uses()) {
				if (!MO.isReg() || !MO.readsReg())
				continue;
				int Idx = mapRegToGPRIndex(MO.getReg());
				if (Idx < 0)
				continue;
				handleUse(MI, LOHInfos[Idx]);
	}			}
	}			}

	bool AArch64CollectLOH::runOnMachineFunction(MachineFunction &MF) {			bool AArch64CollectLOH::runOnMachineFunction(MachineFunction &MF) {
	if (skipFunction(*MF.getFunction()))			if (skipFunction(*MF.getFunction()))
	return false;			return false;

	const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
	const MachineDominatorTree *MDT = &getAnalysis<MachineDominatorTree>();

	MapRegToId RegToId;
	MapIdToReg IdToReg;
	AArch64FunctionInfo *AArch64FI = MF.getInfo<AArch64FunctionInfo>();
	assert(AArch64FI && "No MachineFunctionInfo for this function!");

	DEBUG(dbgs() << "Looking for LOH in " << MF.getName() << '\n');			DEBUG(dbgs() << "Looking for LOH in " << MF.getName() << '\n');

	collectInvolvedReg(MF, RegToId, IdToReg, TRI);			LOHInfo LOHInfos[N_GPR_REGS];
	if (RegToId.empty())			AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
	return false;			for (const MachineBasicBlock &MBB : MF) {
				// Reset register tracking state.
	MachineInstr *DummyOp = nullptr;			memset(LOHInfos, 0, sizeof(LOHInfos));
	if (BasicBlockScopeOnly) {			// Live-out registers are used.
	const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();			for (const MachineBasicBlock *Succ : MBB.successors()) {
	// For local analysis, create a dummy operation to record uses that are not			for (const auto &LI : Succ->liveins()) {
	// local.			int RegIdx = mapRegToGPRIndex(LI.PhysReg);
	DummyOp = MF.CreateMachineInstr(TII->get(AArch64::COPY), DebugLoc());			if (RegIdx >= 0)
				LOHInfos[RegIdx].OneUser = true;
				}
	}			}

	unsigned NbReg = RegToId.size();			// Walk the basic block backwards and update the per register state machine
	bool Modified = false;			// in the process.
				for (const MachineInstr &MI : make_range(MBB.rbegin(), MBB.rend())) {
	// Start with ADRP.			unsigned Opcode = MI.getOpcode();
	InstrToInstrs *ColorOpToReachedUses = new InstrToInstrs[NbReg];			switch (Opcode) {
				case AArch64::ADDXri:
	// Compute the reaching def in ADRP mode, meaning ADRP definitions			case AArch64::LDRXui:
	// are first considered as uses.			if (canDefBePartOfLOH(MI)) {
	reachingDef(MF, ColorOpToReachedUses, RegToId, true, DummyOp);			const MachineOperand &Def = MI.getOperand(0);
	DEBUG(dbgs() << "ADRP reaching defs\n");			const MachineOperand &Op = MI.getOperand(1);
	DEBUG(printReachingDef(ColorOpToReachedUses, NbReg, TRI, IdToReg));			assert(Def.isReg() && Def.isDef() && "Expected reg def");
				assert(Op.isReg() && Op.isUse() && "Expected reg use");
	// Translate the definition to uses map into a use to definitions map to ease			int DefIdx = mapRegToGPRIndex(Def.getReg());
	// statistic computation.			int OpIdx = mapRegToGPRIndex(Op.getReg());
	InstrToInstrs ADRPToReachingDefs;			if (DefIdx >= 0 && OpIdx >= 0 &&
	reachedUsesToDefs(ADRPToReachingDefs, ColorOpToReachedUses, RegToId, true);			handleMiddleInst(MI, LOHInfos[DefIdx], LOHInfos[OpIdx]))
				continue;
	// Compute LOH for ADRP.			}
	computeADRP(ADRPToReachingDefs, *AArch64FI, MDT);			break;
	delete[] ColorOpToReachedUses;			case AArch64::ADRP:
				const MachineOperand &Op0 = MI.getOperand(0);
	// Continue with general ADRP -> ADD/LDR -> LDR/STR pattern.			int Idx = mapRegToGPRIndex(Op0.getReg());
	ColorOpToReachedUses = new InstrToInstrs[NbReg];			if (Idx >= 0) {
				handleADRP(MI, AFI, LOHInfos[Idx]);
	// first perform a regular reaching def analysis.			continue;
	reachingDef(MF, ColorOpToReachedUses, RegToId, false, DummyOp);			}
	DEBUG(dbgs() << "All reaching defs\n");			break;
	DEBUG(printReachingDef(ColorOpToReachedUses, NbReg, TRI, IdToReg));			}
				qcolombet: Shouldn't we have a default case to avoid warnings?
				MatzeB: We're switching over `unsigned Opcode` and don't get a warning.
				handleNormalInst(MI, LOHInfos);
	// Turn that into a use to defs to ease statistic computation.			}
	InstrToInstrs UsesToReachingDefs;			}
	reachedUsesToDefs(UsesToReachingDefs, ColorOpToReachedUses, RegToId, false);

	// Compute other than AdrpAdrp LOH.
	computeOthers(UsesToReachingDefs, ColorOpToReachedUses, *AArch64FI, RegToId,
	MDT);
	delete[] ColorOpToReachedUses;

	if (BasicBlockScopeOnly)
	MF.DeleteMachineInstr(DummyOp);

	return Modified;			// Return "no change": The pass only collects information.
				return false;
	}			}
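
The inline exchange about the missing `default` case in the `switch` over `Opcode` turns on the type being switched on. As documented for GCC and Clang, `-Wswitch` checks case coverage only when the controlling expression has an enumeration type, so a switch over a plain `unsigned` draws no such warning. The following is a minimal sketch of that behaviour; the opcode constants and the function name are hypothetical stand-ins, not the real AArch64 opcode values or anything from the patch.

```cpp
#include <cassert>

// Hypothetical opcode numbers for illustration only (not real AArch64 values).
constexpr unsigned OpAddXri = 1;
constexpr unsigned OpLdrXui = 2;
constexpr unsigned OpAdrp = 3;

// Switching over a plain `unsigned`: there is no enumeration whose coverage
// the compiler could check, so omitting `default:` does not trigger -Wswitch.
// Opcodes that are not listed simply fall out of the switch.
bool isSpeciallyHandledOpcode(unsigned Opcode) {
  switch (Opcode) {
  case OpAddXri:
  case OpLdrXui:
  case OpAdrp:
    return true;
  }
  return false;
}
```

Had `Opcode` been declared with an enum type and some enumerators left unhandled, the same switch would be flagged unless a `default:` label were added.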

	/// createAArch64CollectLOHPass - returns an instance of the Statistic for
	/// linker optimization pass.
	FunctionPass *llvm::createAArch64CollectLOHPass() {			FunctionPass *llvm::createAArch64CollectLOHPass() {
	return new AArch64CollectLOH();			return new AArch64CollectLOH();
	}			}
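
For readers skimming the diff, the core idea of the rewritten pass (walk each block backwards and keep a small state per tracked register) can be sketched outside of LLVM. The code below is a heavily simplified illustration, not the patch itself: `Inst`, `Kind`, `RegState`, `Hint` and `collectHints` are hypothetical names, only a two-instruction ADRP -> ADD pattern is modelled, and live-outs, regmasks, loads/stores and three-instruction chains are ignored.

```cpp
#include <cassert>
#include <vector>

// Toy instruction model; every name here is hypothetical, not an LLVM type.
enum class Kind { Adrp, Add, Other };

struct Inst {
  Kind kind;
  int def; // register written, or -1 if none
  int use; // register read, or -1 if none
};

// Per-register state, loosely mirroring the LOHInfo flags in the patch.
struct RegState {
  bool isCandidate = false; // a single ADD user has been seen below
  bool oneUser = false;     // exactly one user seen so far
  bool multiUsers = false;  // more than one user seen
  int userIdx = -1;         // index of the candidate user
};

struct Hint {
  int adrpIdx;
  int userIdx;
};

// Walk the block bottom-up; record a hint when an ADRP is reached whose
// register has exactly one user and that user is an ADD.
std::vector<Hint> collectHints(const std::vector<Inst> &block, int numRegs) {
  std::vector<RegState> state(numRegs);
  std::vector<Hint> hints;
  for (int i = static_cast<int>(block.size()) - 1; i >= 0; --i) {
    const Inst &I = block[i];
    if (I.kind == Kind::Adrp) {
      RegState &S = state[I.def];
      if (S.isCandidate)
        hints.push_back({i, S.userIdx});
      S = RegState{}; // the ADRP redefines the register: reset tracking
      continue;
    }
    // A definition clobbers whatever was tracked for that register.
    if (I.def >= 0)
      state[I.def] = RegState{};
    // A use either starts a candidate or demotes the register to multi-user.
    if (I.use >= 0) {
      RegState &S = state[I.use];
      if (S.oneUser || S.multiUsers) {
        S.isCandidate = false;
        S.multiUsers = true;
      } else {
        S.oneUser = true;
        if (I.kind == Kind::Add) {
          S.isCandidate = true;
          S.userIdx = i;
        }
      }
    }
  }
  return hints;
}
```

In this sketch, a block consisting of an ADRP defining register 0 followed by an ADD reading register 0 yields one hint pairing the two instructions, while adding a second reader of register 0 yields none. Because each instruction is visited once and the state is a fixed-size array, the walk is linear in the block size.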

test/CodeGen/AArch64/arm64-collect-loh-garbage-crash.ll

	; RUN: llc -mtriple=arm64-apple-ios -O3 -aarch64-enable-collect-loh -aarch64-collect-loh-bb-only=true -aarch64-collect-loh-pre-collect-register=false < %s -o - | FileCheck %s			; RUN: llc -o - %s -mtriple=arm64-apple-ios -O3 -aarch64-enable-collect-loh | FileCheck %s
	; Check that the LOH analysis does not crash when the analysed chained			; Check that the LOH analysis does not crash when the analysed chained
	; contains instructions that are filtered out.			; contains instructions that are filtered out.
	;			;
	; Before the fix for <rdar://problem/16041712>, these cases were removed			; Before the fix for <rdar://problem/16041712>, these cases were removed
	; from the main container. Now, the deterministic container does not allow			; from the main container. Now, the deterministic container does not allow
	; to remove arbitrary values, so we have to live with garbage values.			; to remove arbitrary values, so we have to live with garbage values.
	; <rdar://problem/16041712>			; <rdar://problem/16041712>

	[... 28 lines not shown ...]

test/CodeGen/AArch64/arm64-collect-loh-str.ll

	; RUN: llc -mtriple=arm64-apple-ios -O2 -aarch64-enable-collect-loh -aarch64-collect-loh-bb-only=false < %s -o - | FileCheck %s			; RUN: llc -o - %s -mtriple=arm64-apple-ios -O2 | FileCheck %s
	; Test case for <rdar://problem/15942912>.			; Test case for <rdar://problem/15942912>.
	; AdrpAddStr cannot be used when the store uses same			; AdrpAddStr cannot be used when the store uses same
	; register as address and value. Indeed, the related			; register as address and value. Indeed, the related
	; if applied, may completely remove the definition or			; if applied, may completely remove the definition or
	; at least provide a wrong one (with the offset folded			; at least provide a wrong one (with the offset folded
	; into the definition).			; into the definition).

	%struct.anon = type { i32, i32* }			%struct.anon = type { i32, i32* }
	[... 14 lines not shown ...]

test/CodeGen/AArch64/arm64-collect-loh.ll

	; RUN: llc -mtriple=arm64-apple-ios -O2 -aarch64-enable-collect-loh -aarch64-collect-loh-bb-only=false < %s -o - | FileCheck %s			; RUN: llc -o - %s -mtriple=arm64-apple-ios -O2 | FileCheck %s
	; RUN: llc -mtriple=arm64-linux-gnu -O2 -aarch64-enable-collect-loh -aarch64-collect-loh-bb-only=false < %s -o - | FileCheck %s --check-prefix=CHECK-ELF			; RUN: llc -o - %s -mtriple=arm64-linux-gnu -O2 | FileCheck %s --check-prefix=CHECK-ELF

	; CHECK-ELF-NOT: .loh			; CHECK-ELF-NOT: .loh
	; CHECK-ELF-NOT: AdrpAdrp			; CHECK-ELF-NOT: AdrpAdrp
	; CHECK-ELF-NOT: AdrpAdd			; CHECK-ELF-NOT: AdrpAdd
	; CHECK-ELF-NOT: AdrpLdrGot			; CHECK-ELF-NOT: AdrpLdrGot

	@a = internal unnamed_addr global i32 0, align 4			@a = internal unnamed_addr global i32 0, align 4
	@b = external global i32			@b = external global i32
	[... 618 lines not shown ...]
	; Indeed the tuple register can be tracked because of			; Indeed the tuple register can be tracked because of
	; one of its element, but the other elements of the tuple			; one of its element, but the other elements of the tuple
	; do not need to be tracked and we used to assert on that.			; do not need to be tracked and we used to assert on that.
	; Note: The test case is fragile in the sense that we need			; Note: The test case is fragile in the sense that we need
	; a tuple register to appear in the lowering. Thus, the target			; a tuple register to appear in the lowering. Thus, the target
	; cpu is required to have the problem reproduced.			; cpu is required to have the problem reproduced.
	; CHECK-LABEL: _uninterestingSub			; CHECK-LABEL: _uninterestingSub
	; CHECK: adrp [[ADRP_REG:x[0-9]+]], [[CONSTPOOL:lCPI[0-9]+_[0-9]+]]@PAGE			; CHECK: adrp [[ADRP_REG:x[0-9]+]], [[CONSTPOOL:lCPI[0-9]+_[0-9]+]]@PAGE
	; CHECK-NEXT: ldr q[[IDX:[0-9]+]], {{\[}}[[ADRP_REG]], [[CONSTPOOL]]@PAGEOFF]			; CHECK: ldr q[[IDX:[0-9]+]], {{\[}}[[ADRP_REG]], [[CONSTPOOL]]@PAGEOFF]
				qcolombet: Why is this not on the next line anymore?
				MatzeB: This example contained a perfectly fine LOH opportunity that wasn't caught before (probably because the old algorithm bailed out on the double q reg and didn't resume properly on the LDR address). I updated the test to make it apparent.
	; The tuple comes from the next instruction.			; The tuple comes from the next instruction.
	; CHECK-NEXT: tbl.16b v{{[0-9]+}}, { v{{[0-9]+}}, v{{[0-9]+}} }, v[[IDX]]			; CHECK-NEXT: tbl.16b v{{[0-9]+}}, { v{{[0-9]+}}, v{{[0-9]+}} }, v[[IDX]]
	; CHECK: ret			; CHECK: ret
	define void @uninterestingSub(i8* nocapture %row) #0 {			define void @uninterestingSub(i8* nocapture %row) #0 {
	%tmp = bitcast i8* %row to <16 x i8>*			%tmp = bitcast i8* %row to <16 x i8>*
	%tmp1 = load <16 x i8>, <16 x i8>* %tmp, align 16			%tmp1 = load <16 x i8>, <16 x i8>* %tmp, align 16
	%vext43 = shufflevector <16 x i8> <i8 undef, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0>, <16 x i8> %tmp1, <16 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16>			%vext43 = shufflevector <16 x i8> <i8 undef, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0>, <16 x i8> %tmp1, <16 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16>
	%add.i.414 = add <16 x i8> zeroinitializer, %vext43			%add.i.414 = add <16 x i8> zeroinitializer, %vext43
	[... 13 lines not shown ...]
	@.str.90 = external unnamed_addr constant [5 x i8], align 1			@.str.90 = external unnamed_addr constant [5 x i8], align 1
	; CHECK-LABEL: test_r274582			; CHECK-LABEL: test_r274582
	define void @test_r274582() {			define void @test_r274582() {
	entry:			entry:
	br i1 undef, label %if.then.i, label %if.end.i			br i1 undef, label %if.then.i, label %if.end.i
	if.then.i:			if.then.i:
	ret void			ret void
	if.end.i:			if.end.i:
	; CHECK: .loh AdrpAdrp Lloh91, Lloh93			; CHECK-DAG: .loh AdrpAdrp
	; CHECK: .loh AdrpLdr Lloh91, Lloh92			; CHECK-DAG: .loh AdrpLdr
	; CHECK: .loh AdrpLdrGot Lloh93, Lloh95			; CHECK-DAG: .loh AdrpLdrGot
	; CHECK: .loh AdrpLdrGot Lloh94, Lloh96			; CHECK-DAG: .loh AdrpLdrGot
				qcolombet: There shouldn't be any nondeterminism in the output, so I would rather stick to whatever order you get now.
				MatzeB: ok.
	%mul.i.i.i = fmul double undef, 1.000000e-06			%mul.i.i.i = fmul double undef, 1.000000e-06
	%add.i.i.i = fadd double undef, %mul.i.i.i			%add.i.i.i = fadd double undef, %mul.i.i.i
	%sub.i.i = fsub double %add.i.i.i, undef			%sub.i.i = fsub double %add.i.i.i, undef
	call void (i8*, ...) @callee(i8* getelementptr inbounds ([12 x i8], [12 x i8]* @.str.89, i64 0, i64 0), i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str.90, i64 0, i64 0), double %sub.i.i)			call void (i8*, ...) @callee(i8* getelementptr inbounds ([12 x i8], [12 x i8]* @.str.89, i64 0, i64 0), i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str.90, i64 0, i64 0), double %sub.i.i)
	unreachable			unreachable
	}			}
	declare void @callee(i8* nocapture readonly, ...)			declare void @callee(i8* nocapture readonly, ...)

	attributes #0 = { "target-cpu"="cyclone" }			attributes #0 = { "target-cpu"="cyclone" }