This is an archive of the discontinued LLVM Phabricator instance.

ScheduleDAGInstrs::buildSchedGraph() rewritten.
AbandonedPublic

Authored by jonpa on Feb 24 2015, 2:17 AM.

Details

Reviewers
atrick
hfinkel
Summary

The buildSchedGraph() was in need of reworking, as the AA features had been added on top of earlier code, making it very difficult to understand, and buggy.

What was really awkward before was that a dependency would be checked, and then the potential successor could, instead of becoming a successor, be put into RejectMemNodes and otherwise be forgotten about. It was then very difficult to know how to make sure that later nodes actually did all the right dependency checks. RejectMemNodes was checked basically all the time, which was slow, and not very clever, because in that set all information about the SUs, such as their mapping to Values, was lost.

New design:

Basically, I have removed RejectMemNodes, adjustChainDeps(), iterateChainSucc(), and the AliasChain concept. An unknown store used to become the AliasChain, but now becomes a store mapped to 'unknownValue'. RejectMemNodes and adjustChainDeps() used to be a safety-net for everything, but this is not needed anymore because the lists of SUs mapped from any Value are not cleared.

There are now four maps: Stores, Loads, NonAliasStores and NonAliasLoads. What used to be PendingLoads is instead the list of SUs mapped from 'unknownValue' in Loads, just like unknown stores. In addition to this, there is the BarrierChain node.

To build memory dependencies, each SUnit is analyzed in turn, handling the different cases of 1) a global memory object, 2) an unknown store, 3) an unknown load, and 4-7) Value-mapped (NonAlias)Stores / (NonAlias)Loads. Each SUnit either becomes the BarrierChain or is put into one of the maps. For each SUnit encountered, all the information about previous ones is still available, and the proper dependencies are easily checked.
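The case split above could look roughly like this. This is an illustrative stand-in, not the actual patch: SUnit is reduced to a stub, SchedState and handleMemAccess are invented names, and string keys stand in for the real Value* / 'unknownValue' keys.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>

// Simplified sketch of the rewritten buildSchedGraph() bookkeeping.
struct SUnit { int NodeNum; };

using SUList = std::list<SUnit *>;
// One list of SUs per underlying IR Value; the reserved key "unknown"
// plays the role of the patch's 'unknownValue' for unanalyzable accesses.
using Value2SUsMap = std::map<std::string, SUList>;

struct SchedState {
  Value2SUsMap Stores, Loads;                 // may alias
  Value2SUsMap NonAliasStores, NonAliasLoads; // proven not to alias
  SUnit *BarrierChain = nullptr;              // global memory object, etc.
};

// Dispatch for one memory SUnit, mirroring cases 1-7 of the summary.
void handleMemAccess(SchedState &S, SUnit *SU, bool IsStore, bool IsBarrier,
                     bool KnownValue, bool MayAlias, const std::string &V) {
  if (IsBarrier) {          // case 1: becomes the new BarrierChain
    S.BarrierChain = SU;
    return;
  }
  const std::string Key = KnownValue ? V : "unknown"; // cases 2-3
  Value2SUsMap &M = MayAlias ? (IsStore ? S.Stores : S.Loads)
                             : (IsStore ? S.NonAliasStores : S.NonAliasLoads);
  // In the real patch, dependencies against earlier SUs in the relevant
  // lists are added here before the insertion.
  M[Key].push_back(SU);     // cases 4-7
}
```

Since the lists are never cleared, every earlier SU stays reachable through its map entry, which is what makes the old RejectMemNodes safety-net unnecessary.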

I tried exactly this, expecting it to be significantly slower than the previous implementation, since there is no clearing of lists and no iteration limits. I was however surprised to find that it does just as well, if not better, on my out-of-tree target test suite (a VLIW architecture). I am guessing the brute-force approach of RejectMemNodes is the reason for this.

Test suite totals (buildSchedGraph()):

      MIsNeedChainEdge() calls   AA queries   Num Edges in DAG   compile time (clang)
Old   738465                     249456       2829228            1785.37s
New   394762                     183551       2848443            1599.43s

AA queries are those from within MIsNeedChainEdge(). Compile time is for the clang invocation, not just the scheduler pass time, and please note as usual that the timing results are preliminary. The code output was practically identical. Similar results were obtained without AA. The old implementation also seemed to have the worst single compile time.

The code owner's main requirements for a new implementation are:

  • AA-aware dependencies are still off by default
  • default behavior is (obviously) not impacted
  • default compile time does not significantly increase

I think the new implementation meets these requirements, but I do not consider my patch necessarily ready to commit. Rather, I would first like feedback from the various target owners, who should evaluate it, and of course I would appreciate any opinions and suggestions for improvements. The point I feel most unsure about is how everyone wants to deal with compile time and huge regions; see below.

Note that test/CodeGen/PowerPC/vec-abi-align.ll currently fails with this patch.

Regarding search bounds and huge regions:

The scheduler will have to compromise on huge regions, although these should be relatively rare. This used to be handled in iterateChainSucc() with a limit on depth. The general motto so far seems to have been to let the scheduler do its work without time limits on any reasonable code, since depth < 200 is very generous, and yet it loses to this new implementation even though the latter is unbounded.

I have begun to experiment with this, and in my patch I have set a limit on the size of each of the four maps (Stores, Loads, NonAliasStores and NonAliasLoads). This is achieved by reducing the lists contained in a map in order of size. The limit is very high by default (500), but can be controlled with the option -max-build-sched-nodes. When this limit is reached, the map is reduced to 3/4 of its size. I found that I could set it to 50 with nearly no change in the output, and this improved clang total compile time by ~3% compared to the value of 500 (or ~6% compared to the original implementation). Of course, this was probably due to just a few big test cases out of the many.
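A minimal sketch of such a reduction, assuming the policy is to trim the largest lists first, dropping their oldest (FIFO) entries; the patch's exact policy may differ, and reduceMap / totalSize are illustrative names:

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <map>
#include <string>
#include <vector>

using SUList = std::list<int>; // SU NodeNums, oldest first (FIFO)
using Value2SUsMap = std::map<std::string, SUList>;

unsigned totalSize(const Value2SUsMap &M) {
  unsigned N = 0;
  for (const auto &KV : M)
    N += KV.second.size();
  return N;
}

// When the map reaches Limit SUs in total, shrink it back to 3/4 of Limit.
void reduceMap(Value2SUsMap &M, unsigned Limit) {
  unsigned Cur = totalSize(M);
  if (Cur < Limit)
    return;
  const unsigned Target = (Limit * 3) / 4;
  // Visit the lists from largest to smallest.
  std::vector<SUList *> Lists;
  for (auto &KV : M)
    Lists.push_back(&KV.second);
  std::sort(Lists.begin(), Lists.end(),
            [](const SUList *A, const SUList *B) {
              return A->size() > B->size();
            });
  for (SUList *L : Lists) {
    while (Cur > Target && !L->empty()) {
      L->pop_front(); // FIFO: the oldest SUs are dropped first
      --Cur;
    }
    if (Cur <= Target)
      break;
  }
}
```

In the real patch an SU dropped from a list still needs a conservative dependency (e.g. via BarrierChain); that part is omitted here.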

There are different approaches to this, and I list some additional ideas for everyone to consider:

  • Call addPred() instead of addChainDependency() after some limit on the number of calls to addChainDependency() has been reached. This would be most similar to the old version.
  • Set a limit on each list size in a map, instead of on the whole map. A bit simpler.
  • Very simplistic: make next memory access BarrierChain after a certain number of memory accesses have been analyzed. This might suffice if all we want is to guard against compile time explosion in very rare, super huge cases.
  • Add one more level of mapping: Value -> TargetMemAccessAnalysisResult -> SUs. I suspect that some lists grow very big, but they could be split by the target. I for instance had a list of 100+ unknown stores, which were all 'no-alias' due to calls to TII->areMemAccessesTriviallyDisjoint(). Instead of reducing this list when it grows too large, it would be better to split it into a map keyed by something like a TargetMemAccessAnalysisResult {register, offset}.

Diff Detail

Event Timeline

jonpa updated this revision to Diff 20573.Feb 24 2015, 2:17 AM
jonpa retitled this revision from to ScheduleDAGInstrs::buildSchedGraph() rewritten..
jonpa updated this object.
jonpa edited the test plan for this revision. (Show Details)
jonpa added a subscriber: Unknown Object (MLST).
materi added a subscriber: materi.Feb 24 2015, 5:14 AM
hfinkel edited edge metadata.Feb 24 2015, 4:12 PM

Wow! This sounds like really great progress.

include/llvm/CodeGen/ScheduleDAGInstrs.h
78

Indent?

83

Don't add a blank line here.

147

There may be a better option compared to deriving this from std::list; for one thing, std::list tends to have terrible cache locality.

Given that the SUs are numbered, and we're limiting the depth to something in the 100s, would using a SparseSet (from llvm/ADT/SparseSet.h) be better? It has constant-time clear, ordered vector-speed iteration, and constant-time find/insert/erase.
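For reference, the constant-time-clear trick behind SparseSet can be shown with a minimal standalone sparse/dense pair. This is not llvm::SparseSet itself (whose API and template parameters differ), just the underlying idea:

```cpp
#include <cassert>
#include <vector>

// Minimal sparse-set sketch; keys are SU NodeNums in [0, Universe).
class SimpleSparseSet {
  std::vector<unsigned> Sparse; // NodeNum -> index into Dense (may be stale)
  std::vector<unsigned> Dense;  // insertion-ordered members
public:
  explicit SimpleSparseSet(unsigned Universe) : Sparse(Universe, 0) {}
  bool contains(unsigned K) const {
    // Valid only if the back-pointer round-trips through Dense.
    return Sparse[K] < Dense.size() && Dense[Sparse[K]] == K;
  }
  void insert(unsigned K) {
    if (contains(K))
      return;
    Sparse[K] = Dense.size();
    Dense.push_back(K);
  }
  void clear() { Dense.clear(); } // O(1): Sparse may stay dirty
  unsigned size() const { return Dense.size(); }
};
```

Note that the usual sparse-set erase swaps with the last element, which breaks insertion order, a point that matters for the FIFO reductions discussed later in this thread.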

261

Line too long.

266

SUList &sulist -> SUList &SL
(or something like that)

287

unknownValue -> UnknownValue

lib/CodeGen/ScheduleDAGInstrs.cpp
116

Smaller than 0?

143

Line too long.

jonpa updated this revision to Diff 20842.Feb 27 2015, 4:43 AM
jonpa edited the test plan for this revision. (Show Details)
jonpa edited edge metadata.

Minor fixes, as requested.

Regarding SUList and use of std::list:

I have tried a few alternatives to std::list for SUList, but unfortunately I cannot find anything faster:

  • SparseSet: Quite memory expensive, considering that a map of max size MaxS will in the worst case contain MaxS lists of SUs, each of size one but each with a SparseSet::Universe of SUnits.size(). This will certainly also degrade cache performance. Furthermore, this data structure does not guarantee iteration in insertion order after an element has been erased, which is needed for the repeated FIFO-order reductions of the list.
  • CircularSmallVector: I tried to derive a vector from SmallVector that was reducible in FIFO order without needless re-allocation. The class uses Head and Tail indexes into the vector; instead of erasing elements, the indexes are advanced and may wrap around after repeated calls to reduce(). It was however marginally slower than std::list.
  • SmallVector: Just using a vector was of course very slow, due to all the copying during reductions.
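A minimal sketch of such a circular, FIFO-reducible vector. This is a stand-in using std::vector rather than deriving from llvm::SmallVector, and CircularVector with its fixed capacity is an illustrative simplification, not the tried class:

```cpp
#include <cassert>
#include <vector>

// Ring buffer over a plain vector: reduce() advances Head instead of
// erasing, so FIFO trimming needs no element copying.
template <typename T> class CircularVector {
  std::vector<T> Buf;
  unsigned Head = 0, Count = 0;
public:
  explicit CircularVector(unsigned Cap) : Buf(Cap) {}
  bool full() const { return Count == Buf.size(); }
  void push_back(const T &V) {
    assert(!full() && "the real class would grow/reallocate here");
    Buf[(Head + Count) % Buf.size()] = V;
    ++Count;
  }
  // Drop the N oldest elements; Head may wrap around.
  void reduce(unsigned N) {
    assert(N <= Count && "cannot drop more than we hold");
    Head = (Head + N) % Buf.size();
    Count -= N;
  }
  unsigned size() const { return Count; }
  T &front() { return Buf[Head]; }
};
```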

FIFO-order reduction should be desirable: in a region with many unknown stores and loads, it is probably better that any conservative dependencies end up between SUs with very different NodeNums (far apart in the MBB) than between close ones.
It does not matter in which order the list is iterated over, and indexed access is not needed.

If anyone has a good idea on this, or wants to see my circular vector, let me know. Until then, std::list remains.

In D7850#131115, @jonpa wrote:

Minor fixes, as requested.

Regarding SUList and use of std::list:

I have tried a few alternatives to std::list for SUList, but unfortunately I cannot find anything faster:

  • SparseSet: Quite memory expensive, considering that a map of max size MaxS will in the worst case contain MaxS lists of SUs, each of size one but each with a SparseSet::Universe of SUnits.size(). This will certainly also degrade cache performance. Furthermore, this data structure does not guarantee iteration in insertion order after an element has been erased, which is needed for the repeated FIFO-order reductions of the list.
  • CircularSmallVector: I tried to derive a vector from SmallVector that was reducible in FIFO order without needless re-allocation. The class uses Head and Tail indexes into the vector; instead of erasing elements, the indexes are advanced and may wrap around after repeated calls to reduce(). It was however marginally slower than std::list.
  • SmallVector: Just using a vector was of course very slow, due to all the copying during reductions.

FIFO-order reduction should be desirable: in a region with many unknown stores and loads, it is probably better that any conservative dependencies end up between SUs with very different NodeNums (far apart in the MBB) than between close ones.
It does not matter in which order the list is iterated over, and indexed access is not needed.

If anyone has a good idea on this, or wants to see my circular vector, let me know. Until then, std::list remains.

Okay, fair enough. I'd think that what you really want is a linked list stored in approximately-cache-line-sized groups. Unfortunately, we don't have this data structure. What you have certainly seems like a significant improvement (std::list or not), so I'd not insist on a different data structure now.
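The "linked list stored in approximately-cache-line-sized groups" described here is essentially an unrolled linked list. A rough sketch, purely illustrative since no such ADT exists in llvm/ADT as of this review:

```cpp
#include <array>
#include <cassert>
#include <memory>

// Unrolled linked list: each node holds a small array of elements, so
// iteration touches one cache-friendly chunk at a time.
template <typename T, unsigned N = 8> class UnrolledList {
  struct Node {
    std::array<T, N> Elts;
    unsigned Used = 0;
    std::unique_ptr<Node> Next;
  };
  std::unique_ptr<Node> HeadNode;
  Node *TailNode = nullptr;
  unsigned Count = 0;
public:
  void push_back(const T &V) {
    if (!TailNode || TailNode->Used == N) { // start a new chunk
      auto NewNode = std::make_unique<Node>();
      Node *Raw = NewNode.get();
      if (TailNode)
        TailNode->Next = std::move(NewNode);
      else
        HeadNode = std::move(NewNode);
      TailNode = Raw;
    }
    TailNode->Elts[TailNode->Used++] = V;
    ++Count;
  }
  unsigned size() const { return Count; }
  // Visit elements in insertion order.
  template <typename Fn> void forEach(Fn F) const {
    for (const Node *P = HeadNode.get(); P; P = P->Next.get())
      for (unsigned I = 0; I < P->Used; ++I)
        F(P->Elts[I]);
  }
};
```

A FIFO reduce() would additionally need a head index into the first chunk, along the lines of the circular vector above.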

jonpa added a comment.Feb 27 2015, 8:13 AM

Ok, great.

What remains then before I commit this, is that it gets tested on other targets than my own (out-of-tree) VLIW target.

I would be happy to give it a try if no one else has the time, but then I need detailed instructions. The easiest would be if somebody with testing / benchmarking experience with the various targets could try the patch and evaluate compile time and code output and give an okay for the patch.

Don't forget that there is the size limit on the maps to be tested for various values. 50 seemed pretty good for me, at least, and 500 was still faster than the old version :-)

In D7850#131190, @jonpa wrote:

Ok, great.

What remains then before I commit this, is that it gets tested on other targets than my own (out-of-tree) VLIW target.

I apologize for the delay, but I've not had a chance to look at this until today. Unfortunately, the patch no longer cleanly applies. Can you please rebase it?

jonpa abandoned this revision.Dec 3 2015, 12:47 AM

This issue was unfortunately continued months ago on a new review: D8705. Abandoning this one to avoid confusion.