This is an archive of the discontinued LLVM Phabricator instance.

ScheduleDAGInstrs::buildSchedGraph() handling of memory dependencies rewritten.
ClosedPublic

Authored by jonpa on Mar 30 2015, 10:21 AM.


Details

Reviewers

• tstellarAMD
atrick
hfinkel

Summary

buildSchedGraph() was in need of reworking, as the AA features had been
added on top of earlier code. It was very difficult to understand, and buggy.
Cases had been found where scheduling dependencies had actually been
missed (see r228686).

AliasChain, RejectMemNodes, adjustChainDeps() and iterateChainSucc() have
been removed. There are instead now just the four maps from Value to SUs, which
have been renamed to Stores, Loads, NonAliasStores and NonAliasLoads.

An unknown store used to become the AliasChain, but now becomes a store mapped
to 'unknownValue' (in Stores). What used to be PendingLoads is instead the
list of SUs mapped to 'unknownValue' in Loads.

RejectMemNodes and adjustChainDeps() used to be a safety-net for everything.
The SU maps were sometimes cleared and SUs were put in RejectMemNodes, where
adjustChainDeps() would look. Instead of this, a more straightforward approach
is used: the SU maps are maintained without clearing them and are simply allowed
to grow over time. Instead of the cutoff in the adjustChainDeps() search, a
reduction of maps will be done if needed (see below).

Each SUnit either becomes the BarrierChain, or is put into one of the maps. For
each SUnit encountered, all the information about previous ones is still
available until a new BarrierChain is set, at which point the maps are cleared.

For huge regions the algorithm becomes slow, so the maps will get
reduced at a threshold (default 1000 nodes), by a fraction (default 1/2).
These values can be tuned by use of CL options in case some test case shows that
they need to be changed for some target (-dag-maps-huge-region and
-dag-maps-reduction-size).
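
As a rough illustration of the map reduction described above, here is a minimal, self-contained sketch. It is not the code from the patch. It assumes that SUs can be modelled as increasing integer ids (older SUs have smaller ids) and that memory objects are plain integer keys. The class below is a simplified stand-in that only borrows the name Value2SUsMap, and reduceOldest() and insertWithReduction() are hypothetical helpers. The step where the removed SUs get chained to a new BarrierChain is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <vector>

// Hypothetical stand-ins: an int id plays the role of an SUnit*, and an int
// key plays the role of the underlying Value.
using SUList = std::list<int>;

class Value2SUsMap {
  std::map<int, SUList> Map;
  std::size_t NumNodes = 0;

public:
  // Record that scheduling unit SU accesses memory object V.
  void insert(int SU, int V) {
    Map[V].push_back(SU);
    ++NumNodes;
  }

  std::size_t size() const { return NumNodes; }

  bool contains(int SU) const {
    for (const auto &Entry : Map)
      if (std::find(Entry.second.begin(), Entry.second.end(), SU) !=
          Entry.second.end())
        return true;
    return false;
  }

  // Forget the N oldest entries (smallest ids), in FIFO order, and
  // recompute the node count afterwards.
  void reduceOldest(std::size_t N) {
    if (N == 0)
      return;
    assert(N <= NumNodes && "cannot remove more nodes than are stored");
    std::vector<int> All;
    for (const auto &Entry : Map)
      All.insert(All.end(), Entry.second.begin(), Entry.second.end());
    std::sort(All.begin(), All.end());
    const int Cutoff = All[N - 1];

    NumNodes = 0;
    for (auto It = Map.begin(); It != Map.end();) {
      It->second.remove_if([Cutoff](int SU) { return SU <= Cutoff; });
      if (It->second.empty()) {
        It = Map.erase(It);
      } else {
        NumNodes += It->second.size();
        ++It;
      }
    }
  }
};

// Insert, then shrink the map once it has reached HugeRegion entries.
void insertWithReduction(Value2SUsMap &M, int SU, int V,
                         std::size_t HugeRegion, std::size_t ReductionSize) {
  M.insert(SU, V);
  if (M.size() >= HugeRegion)
    M.reduceOldest(ReductionSize);
}
```

With a threshold of 8 and a reduction size of 4, inserting ten SUs triggers one reduction after the eighth insert and leaves six entries in the map.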

There has not been any considerable change observed in output quality or compile
time. There may now be more DAG edges inserted than before (e.g. an edge A->C
that is not needed when A->B->C already exists). However, in a comparison run
there were fewer total calls to AA, and a somewhat improved compile time, so this
does not seem to be a problem.
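
To make the remark about unneeded edges concrete, here is a small sketch of how one could test whether a direct edge is implied by a longer path. This is an illustration only: the Graph type and reachableWithoutDirectEdge() are hypothetical and are not part of ScheduleDAGInstrs.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical DAG representation: node id -> set of successor ids.
using Graph = std::map<int, std::set<int>>;

// Return true if To can be reached from From by a path that does not use
// the direct edge From->To. If so, a direct edge From->To adds no ordering
// constraint that the DAG does not already have.
bool reachableWithoutDirectEdge(const Graph &G, int From, int To) {
  assert(From != To && "expects two distinct nodes");
  std::set<int> Visited{From};
  std::vector<int> Work;

  auto Succs = G.find(From);
  if (Succs == G.end())
    return false;
  for (int S : Succs->second)
    if (S != To && Visited.insert(S).second)
      Work.push_back(S);

  while (!Work.empty()) {
    int N = Work.back();
    Work.pop_back();
    if (N == To)
      return true;
    auto It = G.find(N);
    if (It == G.end())
      continue;
    for (int S : It->second)
      if (Visited.insert(S).second)
        Work.push_back(S);
  }
  return false;
}
```

For a graph with edges A->B, B->C and A->C (numbered 0, 1, 2), the check reports A->C as implied, while B->C is not.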

Diff Detail

Event Timeline

jonpa updated this revision to Diff 22892.Mar 30 2015, 10:21 AM

jonpa retitled this revision from to ScheduleDAGInstrs::buildSchedGraph() rewritten..

jonpa updated this object.

jonpa edited the test plan for this revision. (Show Details)

jonpa added reviewers: hfinkel, atrick.

jonpa added a subscriber: Unknown Object (MLST).

This looks like a great improvement except for all the data structure boilerplate in the header. I'll take it on your word and Hal's review that the functionality is preserved and that compile time is positively impacted.

include/llvm/CodeGen/ScheduleDAGInstrs.h
167	I don't really understand the value of wrapping the std::list API. Can't you just use a typedef or expose the underlying list?
171	Uppercase variable name.

You're right, the addChainDependencies() method that called sus.getTrueMemOrderLatency() was no longer used, so I could remove the TrueMemOrderLatency member, and then also change SUList to a typedef, as you suggested.
This will hopefully change in the future, if there is a better implementation of SUList - I added a TODO in the comment on this.

This implementation of buildSchedGraph() was faster last time I checked on *my* machine and *my* test-suite. Of course, the main improvement was meant to be on readability and simplicity.

jonpa updated this object.Apr 1 2015, 1:33 AM

Thanks again for working on this; a few minor comments below.

I ran through the test suite and self-hosting with this patch applied; all tests passed and I saw no significant performance changes, so I'm happy on that front!

include/llvm/CodeGen/ScheduleDAGInstrs.h
163	This comment is confusing. "this SU" is that barrier chain?
190	This needs to be: llvm_unreachable("Don't use"); (or, preferably, some more informative message)
lib/CodeGen/ScheduleDAGInstrs.cpp
914	Please make this a cl::opt command-line parameter.
918	This too (a command-line parameter) -- having it default to HugeRegion/2 likely makes sense. (You can use the getNumOccurrences() function on a cl::opt to see if it has actually been set or not).
1110	Should this be >=? We can add multiple dependencies at a time, so we might miss an exact threshold crossing?
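
The last two comments can be illustrated with a small sketch. It is a model under stated assumptions, not the patch: UnsignedOption is a hypothetical stand-in for a command-line option that records how often it was given (the role getNumOccurrences() plays for a cl::opt), and countTriggers() models a reduction simply as resetting the node count to zero.

```cpp
#include <cassert>

// Hypothetical stand-in for a command-line option.
struct UnsignedOption {
  unsigned Value;
  unsigned NumOccurrences; // how many times it was given on the command line
};

// If the reduction size was not given explicitly, default to half of the
// huge-region threshold.
unsigned effectiveReductionSize(const UnsignedOption &HugeRegion,
                                const UnsignedOption &ReductionSize) {
  if (ReductionSize.NumOccurrences > 0)
    return ReductionSize.Value;
  return HugeRegion.Value / 2;
}

// Count how often the threshold check fires while the node count grows by
// Step per iteration. A firing check is modelled as a reset to zero.
unsigned countTriggers(unsigned Threshold, unsigned Step, unsigned Iterations,
                       bool UseGreaterEqual) {
  assert(Step > 0 && "the node count must grow");
  unsigned NumNodes = 0, Triggers = 0;
  for (unsigned I = 0; I < Iterations; ++I) {
    NumNodes += Step;
    bool Hit = UseGreaterEqual ? NumNodes >= Threshold : NumNodes == Threshold;
    if (Hit) {
      ++Triggers;
      NumNodes = 0;
    }
  }
  return Triggers;
}
```

When several dependencies are added at once (Step of 3 against a threshold of 10), an equality test never fires, whereas the >= test fires each time the count passes the threshold.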

Fixes according to Hal Finkel's suggestions.

Bugfixing in reduceHugeMemNodeMaps() (Some details had gotten lost in the revision process):

After a reduction of maps:
  Don't forget to recompute NumNodes.
  Don't forget to add dep to old BarrierChain.
  Check for potential loop with newBarrierChain.

Andy, have your concerns been addressed at this point?

include/llvm/CodeGen/PseudoSourceValue.h
24	Line is too long?

Unless it can be shown by measuring compile time that the implementation needs to be defined in the header, I would:

(1) define insertBarrierChain in the .cpp file

(2) Forward declare and define Value2SUsMap and methods that call it in the .cpp file.

This revision now requires changes to proceed.Apr 14 2015, 1:09 PM

Definitions of class / methods moved out of .h file to .cpp file.

Patch rebased.

Some PowerPC regressions:
ppc64-fastcc.ll (A copy of an argument now became coalesced)
vsx-fma-sp.ll and vsx-fma-m.ll (not sure about)

I didn't mean to hold this up! I really have to say this is so much easier for me to understand than the previous code. And it's not just a good cleanup but an important bug fix (in the !AADep areMemAccessTriviallyDisjoint case where the Stores queue is cleared) and a performance improvement (FIFO draining of dependencies).

There's one thing that's a little scary to me. We assume that every NonAliasing load or store must be analyzable by getUnderlyingObjects:

+ if (Objs.empty()) {
+ // An unknown store depends on all stores and loads, except
+ // NonAliasStores and NonAliasLoads.
+ addChainDependencies(SU, Stores);
+ addChainDependencies(SU, Loads);

and

+ if (Objs.empty()) {
+ // An unknown load depends on all stores, except NonAliasStores.
+ addChainDependencies(SU, Stores);

If we fail to return an underlying object--for example:

+ if (MFI->hasTailCall())
+ return;

Then how can we be sure that load or store doesn't alias with the "NonAliasing" set?

Can you or Hal comment on why that is safe? I can't prove to myself that it's correct, so I think at least a comment would be nice.

Other than that, if this still passes all target tests, please go ahead and check it in.

lib/CodeGen/ScheduleDAGInstrs.cpp
1022–1029	80-column

jonpa mentioned this in D7850: ScheduleDAGInstrs::buildSchedGraph() rewritten..Dec 3 2015, 12:47 AM

uabelho added a subscriber: uabelho.Dec 3 2015, 1:10 AM

Minor update of patch to make it apply cleanly.

Andy,

I think that when looking for underlying objects, objects will be considered not to alias only if PSV->mayAlias(MFI) returns false:

if (!PSV->isAliased(MFI)) {

bool MayAlias = PSV->mayAlias(MFI);
Objects.push_back(UnderlyingObjectsVector::value_type(PSV, MayAlias));

}

Any other case will return either empty Objects or aliasing objects. So, if MFI->hasTailCall(), an empty objects list is returned instead, which will make the instruction be treated as an unknown store / load.

So I believe this is correct as long as the memory operands are correct, of course.

What do you think?

OK. My concern is this:

Two stores both access the same nonaliased location.
One store has its memory operand stripped for some reason--we don't have any guarantee that memoperands are preserved.
The scheduler now assumes the two stores are independent.

Aha - I guess you are right that if we can't assume that the register allocator is correctly adding memory operands for spill instructions, then we can't safely separate the Alias / NonAlias accesses.

To me, it makes sense to have this assumption, as it is something that AFAIK should work for all targets since the register allocator is already correctly adding those memory operands everywhere. But of course, there is the case of e.g. expanding a spill instruction and forgetting to add a memory operand...

Having memory operands is important for scheduling, so it is natural for targets to want to add them. Perhaps there is a way to enforce it? On a target I worked on, there were separate opcodes for spill instructions, which made this easy to verify.

I guess it should work to have just one set of Stores / Loads, and insert all SUs there. The underlying object of stack accesses should still make them "no-alias" to LLVM Values. I kept this design of splitting Alias / NonAlias because I assumed it was important for performance. Merging the sets would give a bigger map, but also easier to read code. It might be worth a try...

/Jonas

I think we need to be conservative in the event that mem_operands are missing, since we haven't made that guarantee--and that's a separate discussion. For example, non-aliasing operations might exist prior to register allocation to handle calling convention (I'm not sure if this is purely theoretical without running tests on all targets).

You can probably be just as aggressive when the instruction has mem_operands. You may just need to distinguish between a load/store without a single mem_operand vs. one without a reachable underlying object. i.e. if the Store has a mem_operand and is known not to be a PseudoSourceValue, but has a null Value, then your implementation should still be fine.

In D8705#302648, @atrick wrote:

I think we need to be conservative in the event that mem_operands are missing, since we haven't made that guarantee--and that's a separate discussion. For example, non-aliasing operations might exist prior to register allocation to handle calling convention (I'm not sure if this is purely theoretical without running tests on all targets).

I agree. We need to make conservative assumptions when mem_operands is missing.

OK - I have now changed the patch so it does not assume that unanalyzable stores / loads are never NonAlias.

I guess what is needed is for the scheduler to print out a line each time there is a missing mem-operand, as it would then be simple to check that they are not forgotten somewhere. When that is done, this change should not matter.

diff --git a/lib/CodeGen/ScheduleDAGInstrs.cpp b/lib/CodeGen/ScheduleDAGInstrs.cpp
index d6a72b5..c719c78 100644
--- a/lib/CodeGen/ScheduleDAGInstrs.cpp
+++ b/lib/CodeGen/ScheduleDAGInstrs.cpp
@@ -1030,10 +1030,11 @@ void ScheduleDAGInstrs::buildSchedGraph(AliasAnalysis *AA,
 if (MI->mayStore()) {
   if (Objs.empty()) {
-    // An unknown store depends on all stores and loads, except
-    // NonAliasStores and NonAliasLoads.
+    // An unknown store depends on all stores and loads.
     addChainDependencies(SU, Stores);
+    addChainDependencies(SU, NonAliasStores);
     addChainDependencies(SU, Loads);
+    addChainDependencies(SU, NonAliasLoads);
@@ -1067,17 +1068,16 @@ void ScheduleDAGInstrs::buildSchedGraph(AliasAnalysis *AA,
-  if (MayAlias) {
-    // The store is not 'NonAlias', and may therefore have
-    // dependencies to unanalyzable loads and stores.
-    addChainDependencies(SU, Loads, UnknownValue);
-    addChainDependencies(SU, Stores, UnknownValue);
-  }
+  // The store may have dependencies to unanalyzable loads and
+  // stores.
+  addChainDependencies(SU, Loads, UnknownValue);
+  addChainDependencies(SU, Stores, UnknownValue);
 }
 else { // SU is a load.
   if (Objs.empty()) {
-    // An unknown load depends on all stores, except NonAliasStores.
+    // An unknown load depends on all stores.
     addChainDependencies(SU, Stores);
+    addChainDependencies(SU, NonAliasStores);
     Loads.insert(SU, UnknownValue);
     continue;
@@ -1097,10 +1097,8 @@ void ScheduleDAGInstrs::buildSchedGraph(AliasAnalysis *AA,
     // Map this load to V.
     (ThisMayAlias ? Loads : NonAliasLoads).insert(SU, V);
   }
-  if (MayAlias)
-    // The load is not 'NonAlias', and may therefore have
-    // dependencies to unanalyzable stores.
-    addChainDependencies(SU, Stores, UnknownValue);
+  // The load may have dependencies to unanalyzable stores.
+  addChainDependencies(SU, Stores, UnknownValue);

Removed unused MayAlias variables.
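
The conservative rule from the diff above can be sketched as a small standalone model. This is only an illustration under simplifying assumptions: SUs are plain integer ids, the four maps are flattened into vectors, and MemMaps and depsForUnknownAccess() are hypothetical names that do not exist in the patch.

```cpp
#include <vector>

// Hypothetical, flattened view of the four SU maps kept by the DAG builder.
struct MemMaps {
  std::vector<int> Stores, NonAliasStores, Loads, NonAliasLoads;
};

// Collect the SUs that an unanalyzable access (no underlying object found)
// must be checked against. Nothing is assumed about what it can alias:
// an unknown store is checked against all four maps, an unknown load
// against both store maps.
std::vector<int> depsForUnknownAccess(bool IsStore, const MemMaps &M) {
  std::vector<int> Deps;
  auto Append = [&Deps](const std::vector<int> &SUs) {
    Deps.insert(Deps.end(), SUs.begin(), SUs.end());
  };
  Append(M.Stores);
  Append(M.NonAliasStores);
  if (IsStore) {
    Append(M.Loads);
    Append(M.NonAliasLoads);
  }
  return Deps;
}
```

In the scenario Andy describes, a spill store recorded in NonAliasStores is still returned as a dependency candidate for a later store that has lost its memory operand.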

Note: https://llvm.org/bugs/show_bug.cgi?id=25794 is an example of a huge region not handled well by the current scheduler. This reminds me that we might want to find good values for -dag-maps-huge-region. Default is now 1000, which is very high. I think 50-100 might actually work well generally, but this I have only tested on an out-of-tree target. Regardless, this patch was much faster already due to the reduction of maps, at least in this case.

Jonas,

I just tried this patch on a known-good LLVM+Clang, and it breaks generated code in a number of benchmark suites for the Cortex-A53 and Cortex-A57 targets:

llvm-test-suite/MultiSource/Benchmarks
spec2006: mcf
cpu2000: mcf

I won't have time to investigate further in the next 2 weeks.

Thanks for giving it a try!

This patch has been tested on quite an extensive test-suite out-of-tree, and I would therefore say that it isn't impossible that a scheduler difference might expose bugs elsewhere than in the buildSchedGraph() method. Therefore I do think we have to do testing on all targets before this is committed, as things are going to be scrambled around quite a bit. While doing so, we could tune the parameters for huge region reduction as well as check the overall compile time, preferably.

Ka-Ka added a subscriber: Ka-Ka.Dec 17 2015, 1:48 AM

Thanks for following through with this and getting it tested. I think this fix should go in even if there are some regressions or fragile scheduling tests that need to be temporarily disabled. There aren't any miscompiles, right?

This revision is now accepted and ready to land.Jan 27 2016, 10:22 AM

mcrosier added a subscriber: mcrosier.Jan 27 2016, 11:06 AM

mcrosier edited edge metadata.Jan 27 2016, 11:09 AM

mcrosier added a subscriber: gberry.

mcrosier mentioned this in D16636: [ScheduleDAGInstrs] Make a conservative assumption about MIs with multiple MMOs..Jan 27 2016, 11:53 AM

mcrosier mentioned this in D16369: [AArch64] Don't drop MMOs in the load/store optimizer when forming ldp/stp instructions or pre-/post-index loads/stores..Jan 28 2016, 7:17 AM

OK, submitted as r259201.

Did not see any regressions that called for disabling any tests at least on my machine.

As mentioned before:

The default value of 1000 for the -dag-maps-huge-region option is very high ("unlimited"). It may very well be that 50 is more reasonable if compile time is precious.

The other thing that could be done to speed things up is to use a memory pool for the SUList class - see comment in ScheduleDAGInstrs.h.

Thanks!

Sorry for the buildbot regressions. I could find one bug in the patch, but for the other four I am not sure if it is the test cases that need to be fixed, or if there is an error somewhere. Could people with knowledge of these backends please have a look?

test/CodeGen/AArch64/arm64-misched-memdep-bug.ll:

bug in patch for the case of !AADeps: Stores get cleared, and must always be chained, also against a load.
Test case updated to allow edges SU(2)<-SU(3)<-SU(4) instead of SU(2)<-SU(4)

The fix in MIsNeedChainEdge() is:
if (!AA && MIa->mayStore() && MIb->mayStore())

=>

if (!AA && MIb->mayStore())

/home/jonas/llvm/llvm-dev/test/CodeGen/AMDGPU/split-vector-memoperand-offsets.ll

Don't know if this is a bug or test case needs fixing. Two read / write instructions have been reordered.

/home/jonas/llvm/llvm-dev/test/CodeGen/PowerPC/ppc64-fastcc.ll

Looks like a change in register allocation. Test case updated with different physical registers

Looks like a change in register allocation:
/home/jonas/llvm/llvm-dev/test/CodeGen/PowerPC/vsx-fma-m.ll:
/home/jonas/llvm/llvm-dev/test/CodeGen/PowerPC/vsx-fma-sp.ll:95:14:

Herald added a subscriber: MatzeB. Jan 29 2016, 11:54 AM

gberry added inline comments.Jan 29 2016, 12:13 PM

test/CodeGen/AArch64/arm64-misched-memdep-bug.ll
11–14	I'm not sure I follow the reasoning behind this change. It seems like this is a regression. The 3->4 edge (load %ptr1_plus1 -> store %ptr1) should be unnecessary since areMemAccessTriviallyDisjoint can tell you these two accesses don't overlap.

jonpa added inline comments.Jan 29 2016, 12:36 PM

test/CodeGen/AArch64/arm64-misched-memdep-bug.ll
11–14	I believe this would be the case with AliasAnalysis. However, without AA, all stores to the same Value get chained, and only the last seen store is kept for future reference. Therefore, it could never be legal to skip a dependency to a store to same value, since there may be other stores chained below it that will not be checked. I believe it is default for this target/cpu to not use AA, since when I debugged this test case, AAForDep was nullptr.
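
The argument above can be modelled with a small sketch. It is an illustration only, under simplifying assumptions that are not taken from the patch: accesses are processed in program order, all of them map to the same Value, "trivially disjoint" just means different offsets, and Access, buildChain() and ordered() are hypothetical names.

```cpp
#include <set>
#include <utility>
#include <vector>

// Hypothetical model of a memory access to one Value.
struct Access {
  int Id;
  int Offset;
  bool IsStore;
};

using Edges = std::set<std::pair<int, int>>;

bool triviallyDisjoint(const Access &A, const Access &B) {
  return A.Offset != B.Offset;
}

// Non-AA policy: stores are always chained and only the last store is
// remembered. If SkipDisjointLoads is set, a load gets no edge to the
// remembered store when the two are trivially disjoint.
Edges buildChain(const std::vector<Access> &Seq, bool SkipDisjointLoads) {
  Edges E;
  const Access *LastStore = nullptr;
  for (const Access &A : Seq) {
    if (LastStore) {
      bool MustChain = A.IsStore || !SkipDisjointLoads ||
                       !triviallyDisjoint(*LastStore, A);
      if (MustChain)
        E.insert({LastStore->Id, A.Id});
    }
    if (A.IsStore)
      LastStore = &A;
  }
  return E;
}

// Return true if To is reachable from From through the chain edges.
bool ordered(const Edges &E, int From, int To) {
  std::vector<int> Work{From};
  std::set<int> Seen{From};
  while (!Work.empty()) {
    int N = Work.back();
    Work.pop_back();
    for (const auto &Edge : E)
      if (Edge.first == N && Seen.insert(Edge.second).second) {
        if (Edge.second == To)
          return true;
        Work.push_back(Edge.second);
      }
  }
  return false;
}
```

For a store at offset 0, a store at offset 4 and then a load at offset 0, skipping the edge from the remembered store to the load leaves the load unordered against the first store, which it overlaps. Chaining unconditionally keeps them ordered through the second store.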

gberry added inline comments.Jan 29 2016, 2:23 PM

lib/CodeGen/ScheduleDAGInstrs.cpp
1062	This is indented weird (tabs?)
test/CodeGen/AArch64/arm64-misched-memdep-bug.ll
11–14	I'm not sure what you mean by stores to the same Value (since there aren't any in this example). I kind of get what you're saying about the non-AA case, but isn't the point of areMemAccessTriviallyDisjoint() to have a cheap approximation of AA? It seems like with this change it only has an effect if AA is being used, in which case is it really adding any extra disambiguation?

Considering the nasty bugs we've had in tree for months (as a result of some attempts to fake AA without enabling it), and based on the discussion today, I can see that it's time to converge the AA and non-AA scheduling. The --enable-aa-sched-mi flag and ST.useAA hook can simply control whether we actually call the AA analysis. That way targets can still isolate themselves from potential bugs in MI-level AliasAnalysis, but the DAG builder logic will be common across targets and free of special cases.

As a result, non-AA targets may have to deal with bloated DAGs. But AA targets have to deal with the same problem. We can deal with that through bug reports, all work toward a common solution, and encourage more targets to adopt AA scheduling.

This may contradict my earlier advice, but now I can see that it's the only way to get this code in a maintainable state. I also realize that targets that care most about the scheduler's compile time (GPU) really need the same level of DAG support for alias analysis as targets that have been using AA already. We might as well have a common solution.

So my suggestion for this patch is:

--- a/lib/CodeGen/ScheduleDAGInstrs.cpp
+++ b/lib/CodeGen/ScheduleDAGInstrs.cpp
@@ -588,11 +588,6 @@ static bool MIsNeedChainEdge(AliasAnalysis *AA, const MachineFrameInfo *MFI,
 assert ((MIa->mayStore() || MIb->mayStore()) &&
         "Dependency checked between two loads");

-// buildSchedGraph() will clear list of stores if not using AA,
-// which means all stores have to be chained without AA.
-if (!AA && MIb->mayStore())
-  return true;
-
 // Let the target decide if memory accesses cannot possibly overlap.
 if (TII->areMemAccessesTriviallyDisjoint(MIa, MIb, AA))
   return false;
@@ -1059,11 +1054,6 @@ void ScheduleDAGInstrs::buildSchedGraph(AliasAnalysis *AA,
 addChainDependencies(SU, Loads);
 addChainDependencies(SU, NonAliasLoads);

-// If we're not using AA, clear Stores map since all stores
-// will be chained.
-if (!AAForDep)
-  Stores.clear();
-
 // Map this store to 'UnknownValue'.
 Stores.insert(SU, UnknownValue);
 continue;
@@ -1081,10 +1071,6 @@ void ScheduleDAGInstrs::buildSchedGraph(AliasAnalysis *AA,
   addChainDependencies(SU, stores_, V);
   addChainDependencies(SU, (ThisMayAlias ? Loads : NonAliasLoads), V);

-  // If we're not using AA, then we only need one store per object.
-  if (!AAForDep)
-    stores_.clearList(V);
-
   // Map this store to V.
   stores_.insert(SU, V);
 }

diff --git a/test/CodeGen/AArch64/arm64-misched-memdep-bug.ll b/test/CodeGen/AArch64/arm64-misched-memdep-bug.ll
index a9ddb73..292fbb7 100644

--- a/test/CodeGen/AArch64/arm64-misched-memdep-bug.ll

+++ b/test/CodeGen/AArch64/arm64-misched-memdep-bug.ll
@@ -8,7 +8,7 @@
; CHECK: SU(2): %vreg2<def> = LDRWui %vreg0, 1; mem:LD4[%ptr1_plus1] GPR32:%vreg2 GPR64common:%vreg0
; CHECK: Successors:
; CHECK-NEXT: val SU(5): Latency=4 Reg=%vreg2
-; CHECK-NEXT: ch SU(3): Latency=0
+; CHECK-NEXT: ch SU(4): Latency=0
; CHECK: SU(3): STRWui %WZR, %vreg0, 0; mem:ST4[%ptr1] GPR64common:%vreg0
; CHECK: Successors:
; CHECK: ch SU(4): Latency=0
diff --git a/test/CodeGen/AArch64/tailcall_misched_graph.ll b/test/CodeGen/AArch64/tailcall_misched_graph.ll
index 343ffab..59a3be9 100644

--- a/test/CodeGen/AArch64/tailcall_misched_graph.ll

+++ b/test/CodeGen/AArch64/tailcall_misched_graph.ll
@@ -37,6 +37,8 @@ declare void @callee2(i8*, i8*, i8*, i8*, i8*,
; CHECK: SU({{.*}}): [[VRB]]<def> = LDRXui <fi#-2>
; CHECK-NOT: SU
; CHECK: Successors:
-; CHECK: ch SU([[DEPSTORE:.*]]): Latency=0
+; CHECK: ch SU([[DEPSTOREB:.*]]): Latency=0
+; CHECK: ch SU([[DEPSTOREA:.*]]): Latency=0

-; CHECK: SU([[DEPSTORE]]): STRXui %vreg0, <fi#-4>
+; CHECK: SU([[DEPSTOREA]]): STRXui %vreg{{.*}}, <fi#-4>
+; CHECK: SU([[DEPSTOREB]]): STRXui %vreg{{.*}}, <fi#-3>

Patch updated by removing the handling for !AA, as suggested. To
me this is a welcome change - it was awkward to chain stores and
then handle them specially for the !AA case.

The following tests have been disabled temporarily. Someone
working with these backends, please take a look before I commit
the patch.

Failing Tests (4):

LLVM :: CodeGen/AMDGPU/split-vector-memoperand-offsets.ll
LLVM :: CodeGen/PowerPC/ppc64-fastcc.ll
LLVM :: CodeGen/PowerPC/vsx-fma-m.ll
LLVM :: CodeGen/PowerPC/vsx-fma-sp.ll

I tried the test-suite on my laptop and it seems to pass. I have
no idea what would be a preferred value for -dag-maps-huge-region
for various targets, but the value does make a difference regarding
compile time, as seen below (X86).

Results:

-dag-maps-huge-region   IMPROVED   REGRESSED   PASS
1000                    218        294         1480
50                      290        208         1494
20                      282        209         1501

I guess one would have to both measure compile time and run
benchmarks to find a good compromise value. Since (sub)targets
are divided in interest regarding performance / compile time, it
perhaps would make sense to let the subtarget define this value,
with the CodeModel in mind (consider JIT). But I guess this might
lie a bit in the future, after getting the patch in...

Herald added a reviewer: • tstellarAMD. Jan 30 2016, 5:16 AM

srhines added a subscriber: srhines.Jan 31 2016, 11:49 PM

hfinkel added inline comments.Feb 1 2016, 4:21 AM

test/CodeGen/PowerPC/vsx-fma-m.ll
3	You can disable tests by just adding "; XFAIL: *"; you don't need to change the run lines. It looks like we'll just need to fix up the register numbers in some of these.

; XFAIL: *
used instead in test cases.

• tstellarAMD closed this revision.Feb 9 2016, 12:51 PM

christof added a subscriber: christof.Feb 15 2016, 7:10 AM

chapuni added a subscriber: chapuni.May 2 2016, 10:45 AM

chapuni added inline comments.

lib/CodeGen/ScheduleDAGInstrs.cpp
851	SUItr might point to SUEE here. Fixed in r268257.

Revision Contents

Path                                                      Size
include/llvm/CodeGen/PseudoSourceValue.h                  4 lines
include/llvm/CodeGen/ScheduleDAG.h                        11 lines
include/llvm/CodeGen/ScheduleDAGInstrs.h                  70 lines
lib/CodeGen/ScheduleDAGInstrs.cpp                         709 lines
test/CodeGen/AArch64/arm64-misched-memdep-bug.ll          3 lines
test/CodeGen/AArch64/tailcall_misched_graph.ll            6 lines
test/CodeGen/AMDGPU/split-vector-memoperand-offsets.ll    1 line
test/CodeGen/PowerPC/ppc64-fastcc.ll                      6 lines
test/CodeGen/PowerPC/vsx-fma-m.ll                         1 line
test/CodeGen/PowerPC/vsx-fma-sp.ll                        2 lines

Diff 46801

include/llvm/CodeGen/PseudoSourceValue.h

[15 lines not shown]
 #include "llvm/ADT/StringMap.h"
 #include "llvm/IR/GlobalValue.h"
 #include "llvm/IR/Value.h"
 #include "llvm/IR/ValueMap.h"
 #include <map>

 namespace llvm {

[inline comment] hfinkel: Line is too long?
 class MachineFrameInfo;
 class MachineMemOperand;
 class raw_ostream;

 raw_ostream &operator<<(raw_ostream &OS, const MachineMemOperand &MMO);
+class PseudoSourceValue;
+raw_ostream &operator<<(raw_ostream &OS, const PseudoSourceValue* PSV);

 /// Special value supplied for machine level alias analysis. It indicates that
 /// a memory access references the functions stack frame (e.g., a spill slot),
 /// below the stack frame (e.g., argument space), or constant pool.
 class PseudoSourceValue {
 public:
   enum PSVKind {
     Stack,
     GOT,
     JumpTable,
     ConstantPool,
     FixedStack,
     GlobalValueCallEntry,
     ExternalSymbolCallEntry
   };

 private:
   PSVKind Kind;
+  friend raw_ostream &llvm::operator<<(raw_ostream &OS,
+                                       const PseudoSourceValue* PSV);

   friend class MachineMemOperand; // For printCustom().

   /// Implement printing for PseudoSourceValue. This is called from
   /// Value::print or Value's operator<<.
   virtual void printCustom(raw_ostream &O) const;

 public:
[128 lines not shown]

include/llvm/CodeGen/ScheduleDAG.h

[390 lines not shown]
 MachineInstr *getInstr() const {
   return Instr;
 }

 /// addPred - This adds the specified edge as a pred of the current node if
 /// not already. It also adds the current node as a successor of the
 /// specified node.
 bool addPred(const SDep &D, bool Required = true);

+/// addPredBarrier - This adds a barrier edge to SU by calling
+/// addPred(), with latency 0 generally or latency 1 for a store
+/// followed by a load.
+bool addPredBarrier(SUnit *SU) {
+  SDep Dep(SU, SDep::Barrier);
+  unsigned TrueMemOrderLatency =
+    ((SU->getInstr()->mayStore() && this->getInstr()->mayLoad()) ? 1 : 0);
+  Dep.setLatency(TrueMemOrderLatency);
+  return addPred(Dep);
+}

 /// removePred - This removes the specified edge as a pred of the current
 /// node if it exists. It also removes the current node as a successor of
 /// the specified node.
 void removePred(const SDep &D);

 /// getDepth - Return the depth of this node, which is the length of the
 /// maximum path up to any node which has no predecessors.
 unsigned getDepth() const {
[354 lines not shown]

include/llvm/CodeGen/ScheduleDAGInstrs.h

Show All 9 Lines
// This file implements the ScheduleDAGInstrs class, which implements		// This file implements the ScheduleDAGInstrs class, which implements
// scheduling for a MachineInstr-based dependency graph.		// scheduling for a MachineInstr-based dependency graph.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_CODEGEN_SCHEDULEDAGINSTRS_H		#ifndef LLVM_CODEGEN_SCHEDULEDAGINSTRS_H
#define LLVM_CODEGEN_SCHEDULEDAGINSTRS_H		#define LLVM_CODEGEN_SCHEDULEDAGINSTRS_H

		#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/SparseMultiSet.h"		#include "llvm/ADT/SparseMultiSet.h"
#include "llvm/ADT/SparseSet.h"		#include "llvm/ADT/SparseSet.h"
#include "llvm/CodeGen/ScheduleDAG.h"		#include "llvm/CodeGen/ScheduleDAG.h"
#include "llvm/CodeGen/TargetSchedule.h"		#include "llvm/CodeGen/TargetSchedule.h"
#include "llvm/Support/Compiler.h"		#include "llvm/Support/Compiler.h"
#include "llvm/Target/TargetRegisterInfo.h"		#include "llvm/Target/TargetRegisterInfo.h"
		#include <list>

namespace llvm {		namespace llvm {
class MachineFrameInfo;		class MachineFrameInfo;
class MachineLoopInfo;		class MachineLoopInfo;
class MachineDominatorTree;		class MachineDominatorTree;
class RegPressureTracker;		class RegPressureTracker;
class PressureDiffs;		class PressureDiffs;

▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines	namespace llvm {
/// Track local uses of virtual registers. These uses are gathered by the DAG		/// Track local uses of virtual registers. These uses are gathered by the DAG
/// builder and may be consulted by the scheduler to avoid iterating an entire		/// builder and may be consulted by the scheduler to avoid iterating an entire
/// vreg use list.		/// vreg use list.
typedef SparseMultiSet<VReg2SUnit, VirtReg2IndexFunctor> VReg2SUnitMultiMap;		typedef SparseMultiSet<VReg2SUnit, VirtReg2IndexFunctor> VReg2SUnitMultiMap;

typedef SparseMultiSet<VReg2SUnitOperIdx, VirtReg2IndexFunctor>		typedef SparseMultiSet<VReg2SUnitOperIdx, VirtReg2IndexFunctor>
VReg2SUnitOperIdxMultiMap;		VReg2SUnitOperIdxMultiMap;

		typedef PointerUnion<const Value , const PseudoSourceValue > ValueType;
		typedef SmallVector<PointerIntPair<ValueType, 1, bool>, 4>
		UnderlyingObjectsVector;

/// ScheduleDAGInstrs - A ScheduleDAG subclass for scheduling lists of		/// ScheduleDAGInstrs - A ScheduleDAG subclass for scheduling lists of
/// MachineInstrs.		/// MachineInstrs.
class ScheduleDAGInstrs : public ScheduleDAG {		class ScheduleDAGInstrs : public ScheduleDAG {
protected:		protected:
const MachineLoopInfo *MLI;		const MachineLoopInfo *MLI;
const MachineFrameInfo *MFI;		const MachineFrameInfo *MFI;

/// TargetSchedModel provides an interface to the machine model.		/// TargetSchedModel provides an interface to the machine model.
▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines	protected:

/// Tracks the last instruction(s) in this region defining each virtual		/// Tracks the last instruction(s) in this region defining each virtual
/// register. There may be multiple current definitions for a register with		/// register. There may be multiple current definitions for a register with
/// disjunct lanemasks.		/// disjunct lanemasks.
VReg2SUnitMultiMap CurrentVRegDefs;		VReg2SUnitMultiMap CurrentVRegDefs;
/// Tracks the last instructions in this region using each virtual register.		/// Tracks the last instructions in this region using each virtual register.
VReg2SUnitOperIdxMultiMap CurrentVRegUses;		VReg2SUnitOperIdxMultiMap CurrentVRegUses;

/// PendingLoads - Remember where unknown loads are after the most recent		AliasAnalysis *AAForDep;
/// unknown store, as we iterate. As with Defs and Uses, this is here
/// to minimize construction/destruction.		/// Remember a generic side-effecting instruction as we proceed.
std::vector<SUnit *> PendingLoads;		/// No other SU ever gets scheduled around it (except in the special
		/// case of a huge region that gets reduced).
		SUnit *BarrierChain;
		hfinkel: This comment is confusing. "this SU" is that barrier chain?

		public:

		/// A list of SUnits, used in Value2SUsMap, during DAG construction.
		atrick: I don't really understand the value of wrapping the std::list API. Can't you just use a typedef or expose the underlying list?
		/// Note: to gain speed it might be worth investigating an optimized
		/// implementation of this data structure, such as a singly linked list
		/// with a memory pool (SmallVector was tried but slow and SparseSet is not
		/// applicable).
		atrick: Uppercase variable name.
		typedef std::list<SUnit *> SUList;
		protected:
		/// A map from ValueType to SUList, used during DAG construction,
		/// as a means of remembering which SUs depend on which memory
		/// locations.
		class Value2SUsMap;

		/// Remove in FIFO order some SUs from huge maps.
		void reduceHugeMemNodeMaps(Value2SUsMap &stores,
		Value2SUsMap &loads, unsigned N);

		/// Add a chain edge between SUa and SUb, but only if both AliasAnalysis
		/// and Target fail to deny the dependency.
		void addChainDependency(SUnit *SUa, SUnit *SUb,
		unsigned Latency = 0);

		/// Add dependencies as needed from all SUs in list to SU.
		void addChainDependencies(SUnit *SU, SUList &sus, unsigned Latency) {
		for (auto *su : sus)
		hfinkel: This needs to be: llvm_unreachable("Don't use"); (or, preferably, some more informative message)
		addChainDependency(SU, su, Latency);
		}

		/// Add dependencies as needed from all SUs in map, to SU.
		void addChainDependencies(SUnit *SU, Value2SUsMap &Val2SUsMap);

		/// Add dependencies as needed to SU, from all SUs mapped to V.
		void addChainDependencies(SUnit *SU, Value2SUsMap &Val2SUsMap,
		ValueType V);

		/// Add barrier chain edges from all SUs in map, and then clear
		/// the map. This is equivalent to insertBarrierChain(), but
		/// optimized for the common case where the new BarrierChain (a
		/// global memory object) has a higher NodeNum than all SUs in
		/// map. It is assumed BarrierChain has been set before calling
		/// this.
		void addBarrierChain(Value2SUsMap &map);

		/// Insert a barrier chain in a huge region, far below current
		/// SU. Add barrier chain edges from all SUs in map with higher
		/// NodeNums than this new BarrierChain, and remove them from
		/// map. It is assumed BarrierChain has been set before calling
		/// this.
		void insertBarrierChain(Value2SUsMap &map);

		/// For an unanalyzable memory access, this Value is used in maps.
		UndefValue *UnknownValue;

/// DbgValues - Remember instruction that precedes DBG_VALUE.		/// DbgValues - Remember instruction that precedes DBG_VALUE.
/// These are generated by buildSchedGraph but persist so they can be		/// These are generated by buildSchedGraph but persist so they can be
/// referenced when emitting the final schedule.		/// referenced when emitting the final schedule.
typedef std::vector<std::pair<MachineInstr *, MachineInstr *> >		typedef std::vector<std::pair<MachineInstr *, MachineInstr *> >
DbgValueVector;		DbgValueVector;
DbgValueVector DbgValues;		DbgValueVector DbgValues;
MachineInstr *FirstDbgValue;		MachineInstr *FirstDbgValue;
▲ Show 20 Lines • Show All 131 Lines • Show Last 20 Lines
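
Note on the header changes above: the Value2SUsMap idea (a map from a memory Value to a list of SUs, with a running node count, pruned when a barrier is inserted) can be sketched in isolation. This is only a rough illustration under stated assumptions, not the patch's implementation: `Node`, `CountedListMap` and `pruneAtBarrier` are hypothetical names, `std::map` with an `int` key stands in for `MapVector<ValueType, SUList>`, and each list is assumed to hold nodes in descending NodeNum order, as a bottom-up walk would produce.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>

// Hypothetical stand-in for SUnit: only the node number matters here.
struct Node {
  unsigned NodeNum;
};

// Simplified analogue of a value-to-SU-list map that tracks its total size.
class CountedListMap {
  std::map<int, std::list<Node *>> Lists; // assumption: int key replaces ValueType
  std::size_t NumNodes = 0;               // total nodes across all lists

public:
  // Append N to the list for Key and bump the running total.
  void insert(Node *N, int Key) {
    Lists[Key].push_back(N);
    ++NumNodes;
  }

  // Empty the list mapped to Key (if any), keeping the total in sync.
  void clearList(int Key) {
    auto It = Lists.find(Key);
    if (It == Lists.end())
      return;
    assert(NumNodes >= It->second.size());
    NumNodes -= It->second.size();
    It->second.clear();
  }

  // Drop everything, as when a new barrier makes older entries irrelevant.
  void clear() {
    Lists.clear();
    NumNodes = 0;
  }

  std::size_t size() const { return NumNodes; }

  // Remove, from the front of every list, all nodes numbered above the
  // barrier, plus the barrier node itself if present. The iterator is
  // checked against end() before it is dereferenced.
  void pruneAtBarrier(unsigned BarrierNum) {
    for (auto It = Lists.begin(); It != Lists.end();) {
      std::list<Node *> &L = It->second;
      auto NI = L.begin();
      while (NI != L.end() && (*NI)->NodeNum > BarrierNum)
        ++NI;
      if (NI != L.end() && (*NI)->NodeNum == BarrierNum)
        ++NI;
      L.erase(L.begin(), NI);
      // Drop entries whose lists became empty.
      if (L.empty())
        It = Lists.erase(It);
      else
        ++It;
    }
    // Recompute the total after the reduction.
    NumNodes = 0;
    for (const auto &Entry : Lists)
      NumNodes += Entry.second.size();
  }
};
```

Keeping the count inside the container is what makes a "has this region become huge?" check cheap; the real patch additionally adds barrier edges for every pruned SU, which this sketch leaves out.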

lib/CodeGen/ScheduleDAGInstrs.cpp

//===---- ScheduleDAGInstrs.cpp - MachineInstr Rescheduling ---------------===//		//===---- ScheduleDAGInstrs.cpp - MachineInstr Rescheduling ---------------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This implements the ScheduleDAGInstrs class, which implements re-scheduling		// This implements the ScheduleDAGInstrs class, which implements re-scheduling
// of MachineInstrs.		// of MachineInstrs.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/CodeGen/ScheduleDAGInstrs.h"		#include "llvm/CodeGen/ScheduleDAGInstrs.h"
#include "llvm/ADT/IntEqClasses.h"		#include "llvm/ADT/IntEqClasses.h"
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallSet.h"		#include "llvm/ADT/SmallSet.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/CodeGen/LiveIntervalAnalysis.h"		#include "llvm/CodeGen/LiveIntervalAnalysis.h"
#include "llvm/CodeGen/MachineFunctionPass.h"		#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineFrameInfo.h"		#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"		#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineMemOperand.h"		#include "llvm/CodeGen/MachineMemOperand.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/PseudoSourceValue.h"		#include "llvm/CodeGen/PseudoSourceValue.h"
#include "llvm/CodeGen/RegisterPressure.h"		#include "llvm/CodeGen/RegisterPressure.h"
#include "llvm/CodeGen/ScheduleDFS.h"		#include "llvm/CodeGen/ScheduleDFS.h"
		#include "llvm/IR/Function.h"
		#include "llvm/IR/Type.h"
#include "llvm/IR/Operator.h"		#include "llvm/IR/Operator.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/Format.h"		#include "llvm/Support/Format.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetInstrInfo.h"		#include "llvm/Target/TargetInstrInfo.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetRegisterInfo.h"		#include "llvm/Target/TargetRegisterInfo.h"
#include "llvm/Target/TargetSubtargetInfo.h"		#include "llvm/Target/TargetSubtargetInfo.h"
#include <queue>		#include <queue>

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "misched"		#define DEBUG_TYPE "misched"

static cl::opt<bool> EnableAASchedMI("enable-aa-sched-mi", cl::Hidden,		static cl::opt<bool> EnableAASchedMI("enable-aa-sched-mi", cl::Hidden,
cl::ZeroOrMore, cl::init(false),		cl::ZeroOrMore, cl::init(false),
cl::desc("Enable use of AA during MI DAG construction"));		cl::desc("Enable use of AA during MI DAG construction"));

static cl::opt<bool> UseTBAA("use-tbaa-in-sched-mi", cl::Hidden,		static cl::opt<bool> UseTBAA("use-tbaa-in-sched-mi", cl::Hidden,
cl::init(true), cl::desc("Enable use of TBAA during MI DAG construction"));		cl::init(true), cl::desc("Enable use of TBAA during MI DAG construction"));

		// Note: the two options below might be used in tuning compile time vs
		// output quality. Setting HugeRegion so large that it will never be
		// reached means best-effort, but may be slow.

		// When Stores and Loads maps (or NonAliasStores and NonAliasLoads)
		// together hold this many SUs, a reduction of maps will be done.
		static cl::opt<unsigned> HugeRegion("dag-maps-huge-region", cl::Hidden,
		cl::init(1000), cl::desc("The limit to use while constructing the DAG "
		"prior to scheduling, at which point a trade-off "
		"is made to avoid excessive compile time."));

		static cl::opt<unsigned> ReductionSize("dag-maps-reduction-size", cl::Hidden,
		cl::desc("A huge scheduling region will have maps reduced by this many "
		"nodes at a time. Defaults to HugeRegion / 2."));

		static void dumpSUList(ScheduleDAGInstrs::SUList &L) {
		#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
		dbgs() << "{ ";
		for (auto *su : L) {
		dbgs() << "SU(" << su->NodeNum << ")";
		if (su != L.back())
		dbgs() << ", ";
		}
		dbgs() << "}\n";
		#endif
		}

ScheduleDAGInstrs::ScheduleDAGInstrs(MachineFunction &mf,		ScheduleDAGInstrs::ScheduleDAGInstrs(MachineFunction &mf,
const MachineLoopInfo *mli,		const MachineLoopInfo *mli,
bool RemoveKillFlags)		bool RemoveKillFlags)
: ScheduleDAG(mf), MLI(mli), MFI(mf.getFrameInfo()),		: ScheduleDAG(mf), MLI(mli), MFI(mf.getFrameInfo()),
RemoveKillFlags(RemoveKillFlags), CanHandleTerminators(false),		RemoveKillFlags(RemoveKillFlags), CanHandleTerminators(false),
TrackLaneMasks(false), FirstDbgValue(nullptr) {		TrackLaneMasks(false), AAForDep(nullptr), BarrierChain(nullptr),
		UnknownValue(UndefValue::get(
		Type::getVoidTy(mf.getFunction()->getContext()))),
		FirstDbgValue(nullptr) {
DbgValues.clear();		DbgValues.clear();

const TargetSubtargetInfo &ST = mf.getSubtarget();		const TargetSubtargetInfo &ST = mf.getSubtarget();
SchedModel.init(ST.getSchedModel(), &ST, TII);		SchedModel.init(ST.getSchedModel(), &ST, TII);
}		}

/// getUnderlyingObjectFromInt - This is the function that does the work of		/// getUnderlyingObjectFromInt - This is the function that does the work of
/// looking through basic ptrtoint+arithmetic+inttoptr sequences.		/// looking through basic ptrtoint+arithmetic+inttoptr sequences.
▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines	for (SmallVectorImpl<Value *>::iterator I = Objs.begin(), IE = Objs.end();
continue;		continue;
}		}
}		}
Objects.push_back(const_cast<Value *>(V));		Objects.push_back(const_cast<Value *>(V));
}		}
} while (!Working.empty());		} while (!Working.empty());
}		}

typedef PointerUnion<const Value *, const PseudoSourceValue *> ValueType;
typedef SmallVector<PointerIntPair<ValueType, 1, bool>, 4>
UnderlyingObjectsVector;

/// getUnderlyingObjectsForInstr - If this machine instr has memory reference		/// getUnderlyingObjectsForInstr - If this machine instr has memory reference
/// information and it can be tracked to a normal reference to a known		/// information and it can be tracked to a normal reference to a known
/// object, return the Value for that object.		/// object, return the Value for that object.
static void getUnderlyingObjectsForInstr(const MachineInstr *MI,		static void getUnderlyingObjectsForInstr(const MachineInstr *MI,
const MachineFrameInfo *MFI,		const MachineFrameInfo *MFI,
UnderlyingObjectsVector &Objects,		UnderlyingObjectsVector &Objects,
const DataLayout &DL) {		const DataLayout &DL) {
if (!MI->hasOneMemOperand() ||		if (!MI->hasOneMemOperand() ||
▲ Show 20 Lines • Show All 403 Lines • ▼ Show 20 Lines	static inline bool isUnsafeMemoryObject(MachineInstr *MI,
if ((*MI->memoperands_begin())->getPseudoValue()) {		if ((*MI->memoperands_begin())->getPseudoValue()) {
// Similarly to getUnderlyingObjectForInstr:		// Similarly to getUnderlyingObjectForInstr:
// For now, ignore PseudoSourceValues which may alias LLVM IR values		// For now, ignore PseudoSourceValues which may alias LLVM IR values
// because the code that uses this function has no way to cope with		// because the code that uses this function has no way to cope with
// such aliases.		// such aliases.
return true;		return true;
}		}

const Value *V = (*MI->memoperands_begin())->getValue();		if ((*MI->memoperands_begin())->getValue() == nullptr)
if (!V)
return true;		return true;

SmallVector<Value *, 4> Objs;
getUnderlyingObjects(V, Objs, DL);
for (Value *V : Objs) {
// Does this pointer refer to a distinct and identifiable object?
if (!isIdentifiedObject(V))
return true;
}

return false;		return false;
}		}

/// This returns true if the two MIs need a chain edge between them.		/// This returns true if the two MIs need a chain edge between them.
/// If these are not even memory operations, we still may need		/// This is called on normal stores and loads.
/// chain deps between them. The question really is - could
/// these two MIs be reordered during scheduling from memory dependency
/// point of view.
static bool MIsNeedChainEdge(AliasAnalysis *AA, const MachineFrameInfo *MFI,		static bool MIsNeedChainEdge(AliasAnalysis *AA, const MachineFrameInfo *MFI,
const DataLayout &DL, MachineInstr *MIa,		const DataLayout &DL, MachineInstr *MIa,
MachineInstr *MIb) {		MachineInstr *MIb) {
const MachineFunction *MF = MIa->getParent()->getParent();		const MachineFunction *MF = MIa->getParent()->getParent();
const TargetInstrInfo *TII = MF->getSubtarget().getInstrInfo();		const TargetInstrInfo *TII = MF->getSubtarget().getInstrInfo();

// Cover a trivial case - no edge is needed to itself.		assert ((MIa->mayStore() || MIb->mayStore()) &&
if (MIa == MIb)		"Dependency checked between two loads");
return false;

// Let the target decide if memory accesses cannot possibly overlap.		// Let the target decide if memory accesses cannot possibly overlap.
if ((MIa->mayLoad() || MIa->mayStore()) &&
(MIb->mayLoad() || MIb->mayStore()))
if (TII->areMemAccessesTriviallyDisjoint(MIa, MIb, AA))		if (TII->areMemAccessesTriviallyDisjoint(MIa, MIb, AA))
return false;		return false;

// FIXME: Need to handle multiple memory operands to support all targets.		// FIXME: Need to handle multiple memory operands to support all targets.
if (!MIa->hasOneMemOperand() || !MIb->hasOneMemOperand())		if (!MIa->hasOneMemOperand() || !MIb->hasOneMemOperand())
return true;		return true;

if (isUnsafeMemoryObject(MIa, MFI, DL) || isUnsafeMemoryObject(MIb, MFI, DL))		if (isUnsafeMemoryObject(MIa, MFI, DL) || isUnsafeMemoryObject(MIb, MFI, DL))
return true;		return true;

// If we are dealing with two "normal" loads, we do not need an edge
// between them - they could be reordered.
if (!MIa->mayStore() && !MIb->mayStore())
return false;

// To this point analysis is generic. From here on we do need AA.		// To this point analysis is generic. From here on we do need AA.
if (!AA)		if (!AA)
return true;		return true;

MachineMemOperand *MMOa = *MIa->memoperands_begin();		MachineMemOperand *MMOa = *MIa->memoperands_begin();
MachineMemOperand *MMOb = *MIb->memoperands_begin();		MachineMemOperand *MMOb = *MIb->memoperands_begin();

if (!MMOa->getValue() || !MMOb->getValue())		if (!MMOa->getValue() || !MMOb->getValue())
Show All 26 Lines	AliasResult AAResult =
AA->alias(MemoryLocation(MMOa->getValue(), Overlapa,		AA->alias(MemoryLocation(MMOa->getValue(), Overlapa,
UseTBAA ? MMOa->getAAInfo() : AAMDNodes()),		UseTBAA ? MMOa->getAAInfo() : AAMDNodes()),
MemoryLocation(MMOb->getValue(), Overlapb,		MemoryLocation(MMOb->getValue(), Overlapb,
UseTBAA ? MMOb->getAAInfo() : AAMDNodes()));		UseTBAA ? MMOb->getAAInfo() : AAMDNodes()));

return (AAResult != NoAlias);		return (AAResult != NoAlias);
}		}

/// This recursive function iterates over chain deps of SUb looking for		/// Check whether two objects need a chain edge and add it if needed.
/// "latest" node that needs a chain edge to SUa.		void ScheduleDAGInstrs::addChainDependency (SUnit *SUa, SUnit *SUb,
static unsigned iterateChainSucc(AliasAnalysis *AA, const MachineFrameInfo *MFI,		unsigned Latency) {
const DataLayout &DL, SUnit *SUa, SUnit *SUb,		if (MIsNeedChainEdge(AAForDep, MFI, MF.getDataLayout(), SUa->getInstr(),
SUnit *ExitSU, unsigned *Depth,		SUb->getInstr())) {
SmallPtrSetImpl<const SUnit *> &Visited) {		SDep Dep(SUa, SDep::MayAliasMem);
if (!SUa || !SUb || SUb == ExitSU)		Dep.setLatency(Latency);
return *Depth;

// Remember visited nodes.
if (!Visited.insert(SUb).second)
return *Depth;
// If there is _some_ dependency already in place, do not
// descend any further.
// TODO: Need to make sure that if that dependency got eliminated or ignored
// for any reason in the future, we would not violate DAG topology.
// Currently it does not happen, but makes an implicit assumption about
// future implementation.
//
// Independently, if we encounter node that is some sort of global
// object (like a call) we already have full set of dependencies to it
// and we can stop descending.
if (SUa->isSucc(SUb) ||
isGlobalMemoryObject(AA, SUb->getInstr()))
return *Depth;

// If we do need an edge, or we have exceeded depth budget,
// add that edge to the predecessors chain of SUb,
// and stop descending.
if (*Depth > 200 ||
MIsNeedChainEdge(AA, MFI, DL, SUa->getInstr(), SUb->getInstr())) {
SUb->addPred(SDep(SUa, SDep::MayAliasMem));
return *Depth;
}
// Track current depth.
(*Depth)++;
// Iterate over memory dependencies only.
for (SUnit::const_succ_iterator I = SUb->Succs.begin(), E = SUb->Succs.end();
I != E; ++I)
if (I->isNormalMemoryOrBarrier())
iterateChainSucc(AA, MFI, DL, SUa, I->getSUnit(), ExitSU, Depth, Visited);
return *Depth;
}

/// This function assumes that "downward" from SU there exist
/// tail/leaf of already constructed DAG. It iterates downward and
/// checks whether SU can be aliasing any node dominated
/// by it.
static void adjustChainDeps(AliasAnalysis *AA, const MachineFrameInfo *MFI,
const DataLayout &DL, SUnit *SU, SUnit *ExitSU,
std::set<SUnit *> &CheckList,
unsigned LatencyToLoad) {
if (!SU)
return;

SmallPtrSet<const SUnit*, 16> Visited;
unsigned Depth = 0;

for (std::set<SUnit *>::iterator I = CheckList.begin(), IE = CheckList.end();
I != IE; ++I) {
if (SU == *I)
continue;
if (MIsNeedChainEdge(AA, MFI, DL, SU->getInstr(), (*I)->getInstr())) {
SDep Dep(SU, SDep::MayAliasMem);
Dep.setLatency(((*I)->getInstr()->mayLoad()) ? LatencyToLoad : 0);
(*I)->addPred(Dep);
}

// Iterate recursively over all previously added memory chain
// successors. Keep track of visited nodes.
for (SUnit::const_succ_iterator J = (*I)->Succs.begin(),
JE = (*I)->Succs.end(); J != JE; ++J)
if (J->isNormalMemoryOrBarrier())
iterateChainSucc(AA, MFI, DL, SU, J->getSUnit(), ExitSU, &Depth,
Visited);
}
}

/// Check whether two objects need a chain edge, if so, add it
/// otherwise remember the rejected SU.
static inline void addChainDependency(AliasAnalysis *AA,
const MachineFrameInfo *MFI,
const DataLayout &DL, SUnit *SUa,
SUnit *SUb, std::set<SUnit *> &RejectList,
unsigned TrueMemOrderLatency = 0,
bool isNormalMemory = false) {
// If this is a false dependency,
// do not add the edge, but remember the rejected node.
if (MIsNeedChainEdge(AA, MFI, DL, SUa->getInstr(), SUb->getInstr())) {
SDep Dep(SUa, isNormalMemory ? SDep::MayAliasMem : SDep::Barrier);
Dep.setLatency(TrueMemOrderLatency);
SUb->addPred(Dep);		SUb->addPred(Dep);
}		}
else {
// Duplicate entries should be ignored.
RejectList.insert(SUb);
DEBUG(dbgs() << "\tReject chain dep between SU("
<< SUa->NodeNum << ") and SU("
<< SUb->NodeNum << ")\n");
}
}		}

/// Create an SUnit for each real instruction, numbered in top-down topological		/// Create an SUnit for each real instruction, numbered in top-down topological
/// order. The instruction order A < B, implies that no edge exists from B to A.		/// order. The instruction order A < B, implies that no edge exists from B to A.
///		///
/// Map each real instruction to its SUnit.		/// Map each real instruction to its SUnit.
///		///
/// After initSUnits, the SUnits vector cannot be resized and the scheduler may		/// After initSUnits, the SUnits vector cannot be resized and the scheduler may
▲ Show 20 Lines • Show All 82 Lines • ▼ Show 20 Lines	for (; UI != VRegUses.end(); ++UI) {
if (UI->SU == SU)		if (UI->SU == SU)
break;		break;
}		}
if (UI == VRegUses.end())		if (UI == VRegUses.end())
VRegUses.insert(VReg2SUnit(Reg, 0, SU));		VRegUses.insert(VReg2SUnit(Reg, 0, SU));
}		}
}		}

		class ScheduleDAGInstrs::Value2SUsMap : public MapVector<ValueType, SUList> {

		/// Current total number of SUs in map.
		unsigned NumNodes;

		/// 1 for loads, 0 for stores. (see comment in SUList)
		unsigned TrueMemOrderLatency;
		public:

		Value2SUsMap(unsigned lat = 0) : NumNodes(0), TrueMemOrderLatency(lat) {}

		/// To keep NumNodes up to date, insert() is used instead of
		/// this operator w/ push_back().
		ValueType &operator[](const SUList &Key) {
		llvm_unreachable("Don't use. Use insert() instead."); };

		/// Add SU to the SUList of V. If Map grows huge, reduce its size
		/// by calling reduce().
		void inline insert(SUnit *SU, ValueType V) {
		MapVector::operator[](V).push_back(SU);
		NumNodes++;
		}

		/// Clears the list of SUs mapped to V.
		void inline clearList(ValueType V) {
		iterator Itr = find(V);
		if (Itr != end()) {
		assert (NumNodes >= Itr->second.size());
		NumNodes -= Itr->second.size();

		Itr->second.clear();
		}
		}

		/// Clears map from all contents.
		void clear() {
		MapVector<ValueType, SUList>::clear();
		NumNodes = 0;
		}

		unsigned inline size() const { return NumNodes; }

		/// Count the number of SUs in this map after a reduction.
		void reComputeSize(void) {
		NumNodes = 0;
		for (auto &I : *this)
		NumNodes += I.second.size();
		}

		unsigned inline getTrueMemOrderLatency() const {
		return TrueMemOrderLatency;
		}

		void dump();
		};

		void ScheduleDAGInstrs::addChainDependencies(SUnit *SU,
		Value2SUsMap &Val2SUsMap) {
		for (auto &I : Val2SUsMap)
		addChainDependencies(SU, I.second,
		Val2SUsMap.getTrueMemOrderLatency());
		}

		void ScheduleDAGInstrs::addChainDependencies(SUnit *SU,
		Value2SUsMap &Val2SUsMap,
		ValueType V) {
		Value2SUsMap::iterator Itr = Val2SUsMap.find(V);
		if (Itr != Val2SUsMap.end())
		addChainDependencies(SU, Itr->second,
		Val2SUsMap.getTrueMemOrderLatency());
		}

		void ScheduleDAGInstrs::addBarrierChain(Value2SUsMap &map) {
		assert (BarrierChain != nullptr);

		for (auto &I : map) {
		SUList &sus = I.second;
		for (auto *SU : sus)
		SU->addPredBarrier(BarrierChain);
		}
		map.clear();
		}

		void ScheduleDAGInstrs::insertBarrierChain(Value2SUsMap &map) {
		assert (BarrierChain != nullptr);

		// Go through all lists of SUs.
		for (Value2SUsMap::iterator I = map.begin(), EE = map.end(); I != EE;) {
		Value2SUsMap::iterator CurrItr = I++;
		SUList &sus = CurrItr->second;
		SUList::iterator SUItr = sus.begin(), SUEE = sus.end();
		for (; SUItr != SUEE; ++SUItr) {
		// Stop on BarrierChain or any instruction above it.
		if ((*SUItr)->NodeNum <= BarrierChain->NodeNum)
		break;

		(*SUItr)->addPredBarrier(BarrierChain);
		}

		// Remove also the BarrierChain from list if present.
		if (*SUItr == BarrierChain)
		chapuni: SUItr might point SUEE here. Fixed in r268257.
		SUItr++;

		// Remove all SUs that are now successors of BarrierChain.
		if (SUItr != sus.begin())
		sus.erase(sus.begin(), SUItr);
		}

		// Remove all entries with empty su lists.
		map.remove_if([&](std::pair<ValueType, SUList> &mapEntry) {
		return (mapEntry.second.empty()); });

		// Recompute the size of the map (NumNodes).
		map.reComputeSize();
		}

/// If RegPressure is non-null, compute register pressure as a side effect. The		/// If RegPressure is non-null, compute register pressure as a side effect. The
/// DAG builder is an efficient place to do it because it already visits		/// DAG builder is an efficient place to do it because it already visits
/// operands.		/// operands.
void ScheduleDAGInstrs::buildSchedGraph(AliasAnalysis *AA,		void ScheduleDAGInstrs::buildSchedGraph(AliasAnalysis *AA,
RegPressureTracker *RPTracker,		RegPressureTracker *RPTracker,
PressureDiffs *PDiffs,		PressureDiffs *PDiffs,
LiveIntervals *LIS,		LiveIntervals *LIS,
bool TrackLaneMasks) {		bool TrackLaneMasks) {
const TargetSubtargetInfo &ST = MF.getSubtarget();		const TargetSubtargetInfo &ST = MF.getSubtarget();
bool UseAA = EnableAASchedMI.getNumOccurrences() > 0 ? EnableAASchedMI		bool UseAA = EnableAASchedMI.getNumOccurrences() > 0 ? EnableAASchedMI
: ST.useAA();		: ST.useAA();
AliasAnalysis *AAForDep = UseAA ? AA : nullptr;		AAForDep = UseAA ? AA : nullptr;

		BarrierChain = nullptr;

this->TrackLaneMasks = TrackLaneMasks;		this->TrackLaneMasks = TrackLaneMasks;
MISUnitMap.clear();		MISUnitMap.clear();
ScheduleDAG::clearDAG();		ScheduleDAG::clearDAG();

// Create an SUnit for each real instruction.		// Create an SUnit for each real instruction.
initSUnits();		initSUnits();

if (PDiffs)		if (PDiffs)
PDiffs->init(SUnits.size());		PDiffs->init(SUnits.size());

// We build scheduling units by walking a block's instruction list from bottom		// We build scheduling units by walking a block's instruction list
// to top.		// from bottom to top.

// Remember where a generic side-effecting instruction is as we proceed.		// Each MIs' memory operand(s) is analyzed to a list of underlying
SUnit *BarrierChain = nullptr, *AliasChain = nullptr;		// objects. The SU is then inserted in the SUList(s) mapped from
		// that Value(s). Each Value thus gets mapped to a list of SUs
// Memory references to specific known memory locations are tracked		// depending on it, defs and uses kept separately. Two SUs are
// so that they can be given more precise dependencies. We track		// non-aliasing to each other if they depend on different Values
// separately the known memory locations that may alias and those		// exclusively.
// that are known not to alias		Value2SUsMap Stores, Loads(1 /*TrueMemOrderLatency*/);
MapVector<ValueType, std::vector<SUnit *> > AliasMemDefs, NonAliasMemDefs;
MapVector<ValueType, std::vector<SUnit *> > AliasMemUses, NonAliasMemUses;		// Certain memory accesses are known to not alias any SU in Stores
std::set<SUnit*> RejectMemNodes;		// or Loads, and have therefore their own 'NonAlias'
		// domain. E.g. spill / reload instructions never alias LLVM I/R
		// Values. It is assumed that this type of memory accesses always
		// have a proper memory operand modelling, and are therefore never
		// unanalyzable. This means they are non aliasing against all nodes
		// in Stores and Loads, including the unanalyzable ones.
		Value2SUsMap NonAliasStores, NonAliasLoads(1 /*TrueMemOrderLatency*/);

		// Always reduce a huge region with half of the elements, except
		// when user sets this number explicitly.
		if (ReductionSize.getNumOccurrences() == 0)
		hfinkel: Please make this a cl::opt command-line parameter.
		ReductionSize = (HugeRegion / 2);

// Remove any stale debug info; sometimes BuildSchedGraph is called again		// Remove any stale debug info; sometimes BuildSchedGraph is called again
// without emitting the info from the previous call.		// without emitting the info from the previous call.
		hfinkel: This too (a command-line parameter) -- having it default to HugeRegion/2 likely makes sense. (You can use the getNumOccurrences() function on a cl::opt to see if it has actually been set or not).
DbgValues.clear();		DbgValues.clear();
FirstDbgValue = nullptr;		FirstDbgValue = nullptr;

assert(Defs.empty() && Uses.empty() &&		assert(Defs.empty() && Uses.empty() &&
"Only BuildGraph should update Defs/Uses");		"Only BuildGraph should update Defs/Uses");
Defs.setUniverse(TRI->getNumRegs());		Defs.setUniverse(TRI->getNumRegs());
Uses.setUniverse(TRI->getNumRegs());		Uses.setUniverse(TRI->getNumRegs());

▲ Show 20 Lines • Show All 75 Lines • ▼ Show 20 Lines	for (MachineBasicBlock::iterator MII = RegionEnd, MIE = RegionBegin;
// check currently relies on being called before adding chain deps.		// check currently relies on being called before adding chain deps.
if (SU->NumSuccs == 0 && SU->Latency > 1		if (SU->NumSuccs == 0 && SU->Latency > 1
&& (HasVRegDef || MI->mayLoad())) {		&& (HasVRegDef || MI->mayLoad())) {
SDep Dep(SU, SDep::Artificial);		SDep Dep(SU, SDep::Artificial);
Dep.setLatency(SU->Latency - 1);		Dep.setLatency(SU->Latency - 1);
ExitSU.addPred(Dep);		ExitSU.addPred(Dep);
}		}

// Add chain dependencies.		// Add memory dependencies (Note: isStoreToStackSlot and
// Chain dependencies used to enforce memory order should have		// isLoadFromStackSLot are not usable after stack slots are lowered to
// latency of 0 (except for true dependency of Store followed by		// actual addresses).
// aliased Load... we estimate that with a single cycle of latency
// assuming the hardware will bypass)		// This is a barrier event that acts as a pivotal node in the DAG.
// Note that isStoreToStackSlot and isLoadFromStackSLot are not usable
// after stack slots are lowered to actual addresses.
// TODO: Use an AliasAnalysis and do real alias-analysis queries, and
// produce more precise dependence information.
unsigned TrueMemOrderLatency = MI->mayStore() ? 1 : 0;
if (isGlobalMemoryObject(AA, MI)) {		if (isGlobalMemoryObject(AA, MI)) {
// Be conservative with these and add dependencies on all memory
// references, even those that are known to not alias.		// Become the barrier chain.
for (MapVector<ValueType, std::vector<SUnit *> >::iterator I =
NonAliasMemDefs.begin(), E = NonAliasMemDefs.end(); I != E; ++I) {
for (unsigned i = 0, e = I->second.size(); i != e; ++i) {
I->second[i]->addPred(SDep(SU, SDep::Barrier));
}
}
for (MapVector<ValueType, std::vector<SUnit *> >::iterator I =
NonAliasMemUses.begin(), E = NonAliasMemUses.end(); I != E; ++I) {
for (unsigned i = 0, e = I->second.size(); i != e; ++i) {
SDep Dep(SU, SDep::Barrier);
Dep.setLatency(TrueMemOrderLatency);
I->second[i]->addPred(Dep);
}
}
// Add SU to the barrier chain.
if (BarrierChain)		if (BarrierChain)
BarrierChain->addPred(SDep(SU, SDep::Barrier));		BarrierChain->addPredBarrier(SU);
BarrierChain = SU;		BarrierChain = SU;
// This is a barrier event that acts as a pivotal node in the DAG,
// so it is safe to clear list of exposed nodes.		DEBUG(dbgs() << "Global memory object and new barrier chain: SU("
adjustChainDeps(AA, MFI, MF.getDataLayout(), SU, &ExitSU, RejectMemNodes,		<< BarrierChain->NodeNum << ").\n";);
TrueMemOrderLatency);
RejectMemNodes.clear();		// Add dependencies against everything below it and clear maps.
NonAliasMemDefs.clear();		addBarrierChain(Stores);
NonAliasMemUses.clear();		addBarrierChain(Loads);
		addBarrierChain(NonAliasStores);
// fall-through		addBarrierChain(NonAliasLoads);
		atrick: 80-column
new_alias_chain:
// Chain all possibly aliasing memory references through SU.		continue;
if (AliasChain) {		}
unsigned ChainLatency = 0;
if (AliasChain->getInstr()->mayLoad())		// If it's not a store or a variant load, we're done.
ChainLatency = TrueMemOrderLatency;		if (!MI->mayStore() && !(MI->mayLoad() && !MI->isInvariantLoad(AA)))
addChainDependency(AAForDep, MFI, MF.getDataLayout(), SU, AliasChain,		continue;
RejectMemNodes, ChainLatency);
}		// Always add dependency edge to BarrierChain if present.
AliasChain = SU;
for (unsigned k = 0, m = PendingLoads.size(); k != m; ++k)
addChainDependency(AAForDep, MFI, MF.getDataLayout(), SU,
PendingLoads[k], RejectMemNodes,
TrueMemOrderLatency);
for (MapVector<ValueType, std::vector<SUnit *> >::iterator I =
AliasMemDefs.begin(), E = AliasMemDefs.end(); I != E; ++I) {
for (unsigned i = 0, e = I->second.size(); i != e; ++i)
addChainDependency(AAForDep, MFI, MF.getDataLayout(), SU,
I->second[i], RejectMemNodes);
}
for (MapVector<ValueType, std::vector<SUnit *> >::iterator I =
AliasMemUses.begin(), E = AliasMemUses.end(); I != E; ++I) {
for (unsigned i = 0, e = I->second.size(); i != e; ++i)
addChainDependency(AAForDep, MFI, MF.getDataLayout(), SU,
I->second[i], RejectMemNodes, TrueMemOrderLatency);
}
// This call must come after calls to addChainDependency() since it
// consumes the 'RejectMemNodes' list that addChainDependency() possibly
// adds to.
adjustChainDeps(AA, MFI, MF.getDataLayout(), SU, &ExitSU, RejectMemNodes,
TrueMemOrderLatency);
PendingLoads.clear();
AliasMemDefs.clear();
AliasMemUses.clear();
} else if (MI->mayStore()) {
// Add dependence on barrier chain, if needed.
// There is no point to check aliasing on barrier event. Even if
// SU and barrier _could_ be reordered, they should not. In addition,
// we have lost all RejectMemNodes below barrier.
if (BarrierChain)		if (BarrierChain)
BarrierChain->addPred(SDep(SU, SDep::Barrier));		BarrierChain->addPredBarrier(SU);

		// Find the underlying objects for MI. The Objs vector is either
		// empty, or filled with the Values of memory locations which this
		// SU depends on. An empty vector means the memory location is
		// unknown, and may alias anything except NonAlias nodes.
UnderlyingObjectsVector Objs;		UnderlyingObjectsVector Objs;
getUnderlyingObjectsForInstr(MI, MFI, Objs, MF.getDataLayout());		getUnderlyingObjectsForInstr(MI, MFI, Objs, MF.getDataLayout());

		if (MI->mayStore()) {
if (Objs.empty()) {		if (Objs.empty()) {
// Treat all other stores conservatively.		// An unknown store depends on all stores and loads.
goto new_alias_chain;		addChainDependencies(SU, Stores);
		addChainDependencies(SU, NonAliasStores);
		addChainDependencies(SU, Loads);
		addChainDependencies(SU, NonAliasLoads);

		// Map this store to 'UnknownValue'.
		Stores.insert(SU, UnknownValue);
		continue;
}		}

bool MayAlias = false;		// Add precise dependencies against all previously seen memory
		gberry: This is indented weird (tabs?)
for (UnderlyingObjectsVector::iterator K = Objs.begin(), KE = Objs.end();		// accesses mapped to the same Value(s).
K != KE; ++K) {		for (auto &underlObj : Objs) {
ValueType V = K->getPointer();		ValueType V = underlObj.getPointer();
bool ThisMayAlias = K->getInt();		bool ThisMayAlias = underlObj.getInt();
if (ThisMayAlias)
MayAlias = true;

// A store to a specific PseudoSourceValue. Add precise dependencies.
// Record the def in MemDefs, first adding a dep if there is
// an existing def.
MapVector<ValueType, std::vector<SUnit *> >::iterator I =
((ThisMayAlias) ? AliasMemDefs.find(V) : NonAliasMemDefs.find(V));
MapVector<ValueType, std::vector<SUnit *> >::iterator IE =
((ThisMayAlias) ? AliasMemDefs.end() : NonAliasMemDefs.end());
if (I != IE) {
for (unsigned i = 0, e = I->second.size(); i != e; ++i)
addChainDependency(AAForDep, MFI, MF.getDataLayout(), SU,
I->second[i], RejectMemNodes, 0, true);

// If we're not using AA, then we only need one store per object.
if (!AAForDep)
I->second.clear();
I->second.push_back(SU);
} else {
if (ThisMayAlias) {
if (!AAForDep)
AliasMemDefs[V].clear();
AliasMemDefs[V].push_back(SU);
} else {
if (!AAForDep)
NonAliasMemDefs[V].clear();
NonAliasMemDefs[V].push_back(SU);
}
}
// Handle the uses in MemUses, if there are any.
MapVector<ValueType, std::vector<SUnit *> >::iterator J =
((ThisMayAlias) ? AliasMemUses.find(V) : NonAliasMemUses.find(V));
MapVector<ValueType, std::vector<SUnit *> >::iterator JE =
((ThisMayAlias) ? AliasMemUses.end() : NonAliasMemUses.end());
if (J != JE) {
for (unsigned i = 0, e = J->second.size(); i != e; ++i)
addChainDependency(AAForDep, MFI, MF.getDataLayout(), SU,
J->second[i], RejectMemNodes,
TrueMemOrderLatency, true);
J->second.clear();
}
}
if (MayAlias) {
// Add dependencies from all the PendingLoads, i.e. loads
// with no underlying object.
for (unsigned k = 0, m = PendingLoads.size(); k != m; ++k)
addChainDependency(AAForDep, MFI, MF.getDataLayout(), SU,
PendingLoads[k], RejectMemNodes,
TrueMemOrderLatency);
// Add dependence on alias chain, if needed.
if (AliasChain)
addChainDependency(AAForDep, MFI, MF.getDataLayout(), SU, AliasChain,
RejectMemNodes);
}
// This call must come after calls to addChainDependency() since it
// consumes the 'RejectMemNodes' list that addChainDependency() possibly
// adds to.
adjustChainDeps(AA, MFI, MF.getDataLayout(), SU, &ExitSU, RejectMemNodes,
TrueMemOrderLatency);
} else if (MI->mayLoad()) {
bool MayAlias = true;
if (MI->isInvariantLoad(AA)) {
// Invariant load, no chain dependencies needed!
} else {
UnderlyingObjectsVector Objs;
getUnderlyingObjectsForInstr(MI, MFI, Objs, MF.getDataLayout());

		Value2SUsMap &stores_ = (ThisMayAlias ? Stores : NonAliasStores);

		// Add dependencies to previous stores and loads mapped to V.
		addChainDependencies(SU, stores_, V);
		addChainDependencies(SU, (ThisMayAlias ? Loads : NonAliasLoads), V);

		// Map this store to V.
		stores_.insert(SU, V);
		}
		// The store may have dependencies to unanalyzable loads and
		// stores.
		addChainDependencies(SU, Loads, UnknownValue);
		addChainDependencies(SU, Stores, UnknownValue);
		}
		else { // SU is a load.
if (Objs.empty()) {		if (Objs.empty()) {
// A load with no underlying object. Depend on all		// An unknown load depends on all stores.
// potentially aliasing stores.		addChainDependencies(SU, Stores);
for (MapVector<ValueType, std::vector<SUnit *> >::iterator I =		addChainDependencies(SU, NonAliasStores);
AliasMemDefs.begin(), E = AliasMemDefs.end(); I != E; ++I)
for (unsigned i = 0, e = I->second.size(); i != e; ++i)
addChainDependency(AAForDep, MFI, MF.getDataLayout(), SU,
I->second[i], RejectMemNodes);

PendingLoads.push_back(SU);		Loads.insert(SU, UnknownValue);
MayAlias = true;		continue;
} else {
MayAlias = false;
}		}

for (UnderlyingObjectsVector::iterator		for (auto &underlObj : Objs) {
J = Objs.begin(), JE = Objs.end(); J != JE; ++J) {		ValueType V = underlObj.getPointer();
ValueType V = J->getPointer();		bool ThisMayAlias = underlObj.getInt();
bool ThisMayAlias = J->getInt();
		// Add precise dependencies against all previously seen stores
if (ThisMayAlias)		// mapping to the same Value(s).
MayAlias = true;		addChainDependencies(SU, (ThisMayAlias ? Stores : NonAliasStores), V);

// A load from a specific PseudoSourceValue. Add precise dependencies.		// Map this load to V.
MapVector<ValueType, std::vector<SUnit *> >::iterator I =		(ThisMayAlias ? Loads : NonAliasLoads).insert(SU, V);
((ThisMayAlias) ? AliasMemDefs.find(V) : NonAliasMemDefs.find(V));
MapVector<ValueType, std::vector<SUnit *> >::iterator IE =
((ThisMayAlias) ? AliasMemDefs.end() : NonAliasMemDefs.end());
if (I != IE)
for (unsigned i = 0, e = I->second.size(); i != e; ++i)
addChainDependency(AAForDep, MFI, MF.getDataLayout(), SU,
I->second[i], RejectMemNodes, 0, true);
if (ThisMayAlias)
AliasMemUses[V].push_back(SU);
else
NonAliasMemUses[V].push_back(SU);
}		}
// Add dependencies on alias and barrier chains, if needed.		// The load may have dependencies to unanalyzable stores.
if (MayAlias && AliasChain)		addChainDependencies(SU, Stores, UnknownValue);
addChainDependency(AAForDep, MFI, MF.getDataLayout(), SU, AliasChain,
RejectMemNodes);
if (MayAlias)
// This call must come after calls to addChainDependency() since it
// consumes the 'RejectMemNodes' list that addChainDependency()
// possibly adds to.
adjustChainDeps(AA, MFI, MF.getDataLayout(), SU, &ExitSU,
RejectMemNodes, /*Latency=*/0);
if (BarrierChain)
BarrierChain->addPred(SDep(SU, SDep::Barrier));
}		}

		// Reduce maps if they grow huge.
		if (Stores.size() + Loads.size() >= HugeRegion) {
		DEBUG(dbgs() << "Reducing Stores and Loads maps.\n";);
		reduceHugeMemNodeMaps(Stores, Loads, ReductionSize);
		hfinkel: Should this be >=? We can add multiple dependencies at a time, so we might miss an exact threshold crossing?
		}
		if (NonAliasStores.size() + NonAliasLoads.size() >= HugeRegion) {
		DEBUG(dbgs() << "Reducing NonAliasStores and NonAliasLoads maps.\n";);
		reduceHugeMemNodeMaps(NonAliasStores, NonAliasLoads, ReductionSize);
}		}
}		}

if (DbgMI)		if (DbgMI)
FirstDbgValue = DbgMI;		FirstDbgValue = DbgMI;

Defs.clear();		Defs.clear();
Uses.clear();		Uses.clear();
CurrentVRegDefs.clear();		CurrentVRegDefs.clear();
CurrentVRegUses.clear();		CurrentVRegUses.clear();
PendingLoads.clear();		}
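
The new bookkeeping in buildSchedGraph() can be summarized with a small stand-alone model. This is only a sketch under simplifying assumptions: values are plain strings, SUs are plain numbers, the alias/non-alias split and AA queries are left out, and `MemDepModel` with its member names is hypothetical rather than part of the patch. It shows how a store depends on earlier stores and loads of the same value plus the unknown bucket, while a load depends only on earlier stores.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of the Stores/Loads maps in the patch.
struct MemDepModel {
  using SUMap = std::map<std::string, std::vector<unsigned>>;
  const std::string Unknown = "<unknown>"; // stands in for UnknownValue
  SUMap Stores, Loads;
  std::set<std::pair<unsigned, unsigned>> Edges; // (new SU, earlier SU)

  // Add edges from SU to every earlier SU mapped to V in M.
  void dependOn(unsigned SU, const SUMap &M, const std::string &V) {
    auto It = M.find(V);
    if (It == M.end())
      return;
    for (unsigned Prev : It->second)
      Edges.insert(std::make_pair(SU, Prev));
  }

  // Add edges from SU to every earlier SU in M, whatever its value.
  void dependOnAll(unsigned SU, const SUMap &M) {
    for (const auto &Entry : M)
      for (unsigned Prev : Entry.second)
        Edges.insert(std::make_pair(SU, Prev));
  }

  // Pass Unknown as V for an access with no identifiable underlying object.
  void addStore(unsigned SU, const std::string &V) {
    if (V == Unknown) {
      dependOnAll(SU, Stores);
      dependOnAll(SU, Loads);
    } else {
      dependOn(SU, Stores, V);
      dependOn(SU, Loads, V);
      dependOn(SU, Stores, Unknown);
      dependOn(SU, Loads, Unknown);
    }
    Stores[V].push_back(SU);
  }

  void addLoad(unsigned SU, const std::string &V) {
    if (V == Unknown) {
      dependOnAll(SU, Stores);
    } else {
      dependOn(SU, Stores, V);
      dependOn(SU, Stores, Unknown);
    }
    Loads[V].push_back(SU);
  }
};
```

In this model two loads never get an edge between them, and an access to one value only reaches accesses to another value through the unknown bucket.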

		raw_ostream &llvm::operator<<(raw_ostream &OS, const PseudoSourceValue* PSV) {
		PSV->printCustom(OS);
		return OS;
		}

		void ScheduleDAGInstrs::Value2SUsMap::dump() {
		for (auto &Itr : *this) {
		if (Itr.first.is<const Value*>()) {
		const Value *V = Itr.first.get<const Value*>();
		if (isa<UndefValue>(V))
		dbgs() << "Unknown";
		else
		V->printAsOperand(dbgs());
		}
		else if (Itr.first.is<const PseudoSourceValue*>())
		dbgs() << Itr.first.get<const PseudoSourceValue*>();
		else
		llvm_unreachable("Unknown Value type.");

		dbgs() << " : ";
		dumpSUList(Itr.second);
		}
		}

		/// Reduce maps in FIFO order, by N SUs. This is better than turning
		/// every Nth memory SU into BarrierChain in buildSchedGraph(), since
		/// it avoids unnecessary edges between seen SUs above the new
		/// BarrierChain, and those below it.
		void ScheduleDAGInstrs::reduceHugeMemNodeMaps(Value2SUsMap &stores,
		Value2SUsMap &loads, unsigned N) {
		DEBUG(dbgs() << "Before reduction:\nStoring SUnits:\n";
		stores.dump();
		dbgs() << "Loading SUnits:\n";
		loads.dump());

		// Insert all SU's NodeNums into a vector and sort it.
		std::vector<unsigned> NodeNums;
		NodeNums.reserve(stores.size() + loads.size());
		for (auto &I : stores)
		for (auto *SU : I.second)
		NodeNums.push_back(SU->NodeNum);
		for (auto &I : loads)
		for (auto *SU : I.second)
		NodeNums.push_back(SU->NodeNum);
		std::sort(NodeNums.begin(), NodeNums.end());

		// The N last elements in NodeNums will be removed, and the SU with
		// the lowest NodeNum of them will become the new BarrierChain to
		// let the not yet seen SUs have a dependency to the removed SUs.
		assert (N <= NodeNums.size());
		SUnit *newBarrierChain = &SUnits[*(NodeNums.end() - N)];
		if (BarrierChain) {
		// The aliasing and non-aliasing maps reduce independently of each
		// other, but share a common BarrierChain. Check if the
		// newBarrierChain is above the former one. If it is not, it may
		// introduce a loop to use newBarrierChain, so keep the old one.
		if (newBarrierChain->NodeNum < BarrierChain->NodeNum) {
		BarrierChain->addPredBarrier(newBarrierChain);
		BarrierChain = newBarrierChain;
		DEBUG(dbgs() << "Inserting new barrier chain: SU("
		<< BarrierChain->NodeNum << ").\n";);
		}
		else
		DEBUG(dbgs() << "Keeping old barrier chain: SU("
		<< BarrierChain->NodeNum << ").\n";);
		}
		else
		BarrierChain = newBarrierChain;

		insertBarrierChain(stores);
		insertBarrierChain(loads);

		DEBUG(dbgs() << "After reduction:\nStoring SUnits:\n";
		stores.dump();
		dbgs() << "Loading SUnits:\n";
		loads.dump());
}		}
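
The selection step in reduceHugeMemNodeMaps() can be illustrated in isolation. The sketch below is not the patch's code: `NodeMap` and `reduceMaps` are hypothetical names, SUs are reduced to their node numbers, and it assumes that insertBarrierChain() (not shown in this part of the diff) drops every entry whose node number is at or above the new barrier. No barrier edges are modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a Value -> list-of-node-numbers map.
using NodeMap = std::map<int, std::vector<unsigned>>;

// Picks the lowest of the N highest node numbers as the new barrier and
// removes every node numbered at or above it from both maps.
unsigned reduceMaps(NodeMap &Stores, NodeMap &Loads, std::size_t N) {
  std::vector<unsigned> NodeNums;
  auto Collect = [&NodeNums](const NodeMap &M) {
    for (const auto &Entry : M)
      NodeNums.insert(NodeNums.end(), Entry.second.begin(),
                      Entry.second.end());
  };
  Collect(Stores);
  Collect(Loads);
  std::sort(NodeNums.begin(), NodeNums.end());

  assert(N > 0 && N <= NodeNums.size());
  unsigned Barrier = *(NodeNums.end() - N);

  auto Prune = [Barrier](NodeMap &M) {
    for (auto &Entry : M) {
      auto &SUs = Entry.second;
      SUs.erase(std::remove_if(SUs.begin(), SUs.end(),
                               [Barrier](unsigned Num) {
                                 return Num >= Barrier;
                               }),
                SUs.end());
    }
  };
  Prune(Stores);
  Prune(Loads);
  return Barrier;
}
```

Because the graph is built bottom-up, the highest node numbers belong to the SUs that were seen first, which is why removing them amounts to FIFO order.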

/// \brief Initialize register live-range state for updating kills.		/// \brief Initialize register live-range state for updating kills.
void ScheduleDAGInstrs::startBlockForKills(MachineBasicBlock *BB) {		void ScheduleDAGInstrs::startBlockForKills(MachineBasicBlock *BB) {
// Start with no live registers.		// Start with no live registers.
LiveRegs.reset();		LiveRegs.reset();

// Examine the live-in regs of all successors.		// Examine the live-in regs of all successors.

test/CodeGen/AArch64/arm64-misched-memdep-bug.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: llc < %s -mtriple=arm64-linux-gnu -mcpu=cortex-a57 -enable-misched -verify-misched -debug-only=misched -o - 2>&1 > /dev/null | FileCheck %s			; RUN: llc < %s -mtriple=arm64-linux-gnu -mcpu=cortex-a57 -enable-misched -verify-misched -debug-only=misched -o - 2>&1 > /dev/null | FileCheck %s
	;			;
	; Test for bug in misched memory dependency calculation.			; Test for bug in misched memory dependency calculation.
	;			;
	; CHECK: ******** MI Scheduling ********			; CHECK: ******** MI Scheduling ********
	; CHECK: misched_bug:BB#0 entry			; CHECK: misched_bug:BB#0 entry
	; CHECK: SU(2): %vreg2<def> = LDRWui %vreg0, 1; mem:LD4[%ptr1_plus1] GPR32:%vreg2 GPR64common:%vreg0			; CHECK: SU(2): %vreg2<def> = LDRWui %vreg0, 1; mem:LD4[%ptr1_plus1] GPR32:%vreg2 GPR64common:%vreg0
	; CHECK: Successors:			; CHECK: Successors:
	; CHECK-NEXT: val SU(5): Latency=4 Reg=%vreg2			; CHECK-NEXT: val SU(5): Latency=4 Reg=%vreg2
	; CHECK-NEXT: ch SU(4): Latency=0			; CHECK-NEXT: ch SU(4): Latency=0
				; CHECK: SU(3): STRWui %WZR, %vreg0, 0; mem:ST4[%ptr1] GPR64common:%vreg0
				; CHECK: Successors:
				; CHECK: ch SU(4): Latency=0
				gberry: I'm not sure I follow the reasoning behind this change. It seems like this is a regression. The 3->4 edge (load %ptr1_plus1 -> store %ptr1) should be unnecessary since areMemAccessTriviallyDisjoint can tell you these two accesses don't overlap.
				jonpa: I believe this would be the case with AliasAnalysis. However, without AA, all stores to the same Value get chained, and only the last seen store is kept for future reference. Therefore, it could never be legal to skip a dependency to a store to the same Value, since there may be other stores chained below it that will not be checked. I believe it is default for this target/cpu to not use AA, since when I debugged this test case, AAForDep was nullptr.
				gberry: I'm not sure what you mean by stores to the same Value (since there aren't any in this example). I kind of get what you're saying about the non-AA case, but isn't the point of areMemAccessTriviallyDisjoint() to have a cheap approximation of AA? It seems like with this change it only has an effect if AA is being used, in which case is it really adding any extra disambiguation?
	; CHECK: SU(4): STRWui %WZR, %vreg1, 0; mem:ST4[%ptr2] GPR64common:%vreg1			; CHECK: SU(4): STRWui %WZR, %vreg1, 0; mem:ST4[%ptr2] GPR64common:%vreg1
	; CHECK: SU(5): %W0<def> = COPY %vreg2; GPR32:%vreg2			; CHECK: SU(5): %W0<def> = COPY %vreg2; GPR32:%vreg2
	; CHECK: ** ScheduleDAGMI::schedule picking next node			; CHECK: ** ScheduleDAGMI::schedule picking next node
	define i32 @misched_bug(i32* %ptr1, i32* %ptr2) {			define i32 @misched_bug(i32* %ptr1, i32* %ptr2) {
	entry:			entry:
	%ptr1_plus1 = getelementptr inbounds i32, i32* %ptr1, i64 1			%ptr1_plus1 = getelementptr inbounds i32, i32* %ptr1, i64 1
	%val1 = load i32, i32* %ptr1_plus1, align 4			%val1 = load i32, i32* %ptr1_plus1, align 4
	store i32 0, i32* %ptr1, align 4			store i32 0, i32* %ptr1, align 4
	store i32 0, i32* %ptr2, align 4			store i32 0, i32* %ptr2, align 4
	ret i32 %val1			ret i32 %val1
	}			}
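
The inline discussion in this test turns on whether the load from %ptr1_plus1 and the store to %ptr1 can be shown not to overlap without full alias analysis. As a rough illustration of that kind of cheap check, here is a minimal sketch; `triviallyDisjoint` and its parameters are hypothetical and this is not the target hook's actual signature or implementation. It only compares byte ranges when both accesses use the same base.

```cpp
#include <cstdint>

// Hypothetical sketch: two accesses are trivially disjoint when they share
// a base and their byte ranges [Off, Off + Width) do not overlap.
bool triviallyDisjoint(int BaseA, std::int64_t OffA, std::int64_t WidthA,
                       int BaseB, std::int64_t OffB, std::int64_t WidthB) {
  if (BaseA != BaseB)
    return false; // different bases: nothing can be concluded cheaply
  std::int64_t LowOff = OffA < OffB ? OffA : OffB;
  std::int64_t HighOff = OffA < OffB ? OffB : OffA;
  std::int64_t LowWidth = OffA < OffB ? WidthA : WidthB;
  return LowOff + LowWidth <= HighOff;
}
```

For the test above, a 4-byte load at offset 4 and a 4-byte store at offset 0 from the same base would be reported as disjoint, while the store through %ptr2 has a different base and stays undecided.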

test/CodeGen/AArch64/tailcall_misched_graph.ll

	; CHECK: STRXui [[VRB]], <fi#-3>			; CHECK: STRXui [[VRB]], <fi#-3>

	; Make sure that there is a dependence edge between fi#-2 and fi#-4.			; Make sure that there is a dependence edge between fi#-2 and fi#-4.
	; Without this edge the scheduler would be free to move the store across the load.			; Without this edge the scheduler would be free to move the store across the load.

	; CHECK: SU({{.*}}): [[VRB]]<def> = LDRXui <fi#-2>			; CHECK: SU({{.*}}): [[VRB]]<def> = LDRXui <fi#-2>
	; CHECK-NOT: SU			; CHECK-NOT: SU
	; CHECK: Successors:			; CHECK: Successors:
	; CHECK: ch SU([[DEPSTORE:.*]]): Latency=0			; CHECK: ch SU([[DEPSTOREB:.*]]): Latency=0
				; CHECK: ch SU([[DEPSTOREA:.*]]): Latency=0

	; CHECK: SU([[DEPSTORE]]): STRXui %vreg0, <fi#-4>			; CHECK: SU([[DEPSTOREA]]): STRXui %vreg{{.*}}, <fi#-4>
				; CHECK: SU([[DEPSTOREB]]): STRXui %vreg{{.*}}, <fi#-3>

test/CodeGen/AMDGPU/split-vector-memoperand-offsets.ll

	; RUN: llc -march=amdgcn -mcpu=hawaii -verify-machineinstrs -mattr=-promote-alloca < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -march=amdgcn -mcpu=hawaii -verify-machineinstrs -mattr=-promote-alloca < %s | FileCheck -check-prefix=GCN %s
				; XFAIL: *

	@sPrivateStorage = external addrspace(3) global [256 x [8 x <4 x i64>]]			@sPrivateStorage = external addrspace(3) global [256 x [8 x <4 x i64>]]

	; GCN-LABEL: {{^}}ds_reorder_vector_split:			; GCN-LABEL: {{^}}ds_reorder_vector_split:

	; Write zeroinitializer			; Write zeroinitializer
	; GCN-DAG: ds_write_b64 [[PTR:v[0-9]+]], [[VAL:v\[[0-9]+:[0-9]+\]]] offset:24			; GCN-DAG: ds_write_b64 [[PTR:v[0-9]+]], [[VAL:v\[[0-9]+:[0-9]+\]]] offset:24
	; GCN-DAG: ds_write_b64 [[PTR]], [[VAL]] offset:16			; GCN-DAG: ds_write_b64 [[PTR]], [[VAL]] offset:16

test/CodeGen/PowerPC/ppc64-fastcc.ll

	; RUN: llc -mcpu=pwr7 -mattr=-vsx < %s | FileCheck %s			; RUN: llc -mcpu=pwr7 -mattr=-vsx < %s | FileCheck %s
				; XFAIL: *

	target datalayout = "E-m:e-i64:64-n32:64"			target datalayout = "E-m:e-i64:64-n32:64"
	target triple = "powerpc64-unknown-linux-gnu"			target triple = "powerpc64-unknown-linux-gnu"

	define fastcc i64 @g1(i64 %g1, double %f1, <4 x i32> %v1, i64 %g2, double %f2, <4 x i32> %v2, i64 %g3, double %f3, <4 x i32> %v3, i64 %g4, double %f4, <4 x i32> %v4, i64 %g5, double %f5, <4 x i32> %v5, i64 %g6, double %f6, <4 x i32> %v6, i64 %g7, double %f7, <4 x i32> %v7, i64 %g8, double %f8, <4 x i32> %v8, i64 %g9, double %f9, <4 x i32> %v9, i64 %g10, double %f10, <4 x i32> %v10, i64 %g11, double %f11, <4 x i32> %v11, i64 %g12, double %f12, <4 x i32> %v12, i64 %g13, double %f13, <4 x i32> %v13, i64 %g14, double %f14, <4 x i32> %v14, i64 %g15, double %f15, <4 x i32> %v15, i64 %g16, double %f16, <4 x i32> %v16) #0 {			define fastcc i64 @g1(i64 %g1, double %f1, <4 x i32> %v1, i64 %g2, double %f2, <4 x i32> %v2, i64 %g3, double %f3, <4 x i32> %v3, i64 %g4, double %f4, <4 x i32> %v4, i64 %g5, double %f5, <4 x i32> %v5, i64 %g6, double %f6, <4 x i32> %v6, i64 %g7, double %f7, <4 x i32> %v7, i64 %g8, double %f8, <4 x i32> %v8, i64 %g9, double %f9, <4 x i32> %v9, i64 %g10, double %f10, <4 x i32> %v10, i64 %g11, double %f11, <4 x i32> %v11, i64 %g12, double %f12, <4 x i32> %v12, i64 %g13, double %f13, <4 x i32> %v13, i64 %g14, double %f14, <4 x i32> %v14, i64 %g15, double %f15, <4 x i32> %v15, i64 %g16, double %f16, <4 x i32> %v16) #0 {
	ret i64 %g1			ret i64 %g1

	; CHECK-LABEL: @g1			; CHECK-LABEL: @g1
	; CHECK-NOT: mr 3,			; CHECK-NOT: mr 3,
	}			}

	define void @cv13(<4 x i32> %v) #0 {			define void @cv13(<4 x i32> %v) #0 {
	tail call fastcc i64 @g1(i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> %v, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>)			tail call fastcc i64 @g1(i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> %v, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>)
	ret void			ret void

	; CHECK-LABEL: @cv13			; CHECK-LABEL: @cv13
	; CHECK-DAG: li [[REG1:[0-9]+]], 96			; CHECK-DAG: li [[REG1:[0-9]+]], 96
	; CHECK-DAG: vor [[REG2:[0-9]+]], 2, 2			; CHECK-DAG: vor [[REG2:[0-9]+]], 3, 3
	; CHECK: stvx [[REG2]], 1, [[REG1]]			; CHECK: stvx [[REG2]], 1, [[REG1]]
	; CHECK: blr			; CHECK: blr
	}			}

	define void @cv14(<4 x i32> %v) #0 {			define void @cv14(<4 x i32> %v) #0 {
	tail call fastcc i64 @g1(i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> %v, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>)			tail call fastcc i64 @g1(i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> %v, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i64 0, double 0.0, <4 x i32> <i32 0, i32 0, i32 0, i32 0>)
	ret void			ret void

	; CHECK-LABEL: @cv14			; CHECK-LABEL: @cv14
	; CHECK-DAG: li [[REG1:[0-9]+]], 128			; CHECK-DAG: li [[REG1:[0-9]+]], 128
	; CHECK-DAG: vor [[REG2:[0-9]+]], 2, 2			; CHECK-DAG: vor [[REG2:[0-9]+]], 3, 3
	; CHECK: stvx [[REG2]], 1, [[REG1]]			; CHECK: stvx [[REG2]], 1, [[REG1]]
	; CHECK: blr			; CHECK: blr
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }

test/CodeGen/PowerPC/vsx-fma-m.ll

	; RUN: llc < %s -mcpu=pwr7 -mattr=+vsx | FileCheck %s			; RUN: llc < %s -mcpu=pwr7 -mattr=+vsx | FileCheck %s
	; RUN: llc < %s -mcpu=pwr7 -mattr=+vsx -fast-isel -O0 | FileCheck -check-prefix=CHECK-FISL %s			; RUN: llc < %s -mcpu=pwr7 -mattr=+vsx -fast-isel -O0 | FileCheck -check-prefix=CHECK-FISL %s
				; XFAIL: *
				hfinkel: You can disable tests by just adding: ; XFAIL: * you don't need to change the run lines. It looks like we'll just need to fixup the register numbers in some of these.

	; Also run with -schedule-ppc-vsx-fma-mutation-early as a stress test for the			; Also run with -schedule-ppc-vsx-fma-mutation-early as a stress test for the
	; live-interval-updating logic.			; live-interval-updating logic.
	; RUN: llc < %s -mcpu=pwr7 -mattr=+vsx -schedule-ppc-vsx-fma-mutation-early			; RUN: llc < %s -mcpu=pwr7 -mattr=+vsx -schedule-ppc-vsx-fma-mutation-early
	target datalayout = "E-m:e-i64:64-n32:64"			target datalayout = "E-m:e-i64:64-n32:64"
	target triple = "powerpc64-unknown-linux-gnu"			target triple = "powerpc64-unknown-linux-gnu"

	define void @test1(double %a, double %b, double %c, double %e, double* nocapture %d) #0 {			define void @test1(double %a, double %b, double %c, double %e, double* nocapture %d) #0 {

test/CodeGen/PowerPC/vsx-fma-sp.ll

	; RUN: llc < %s -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr8 -mattr=+vsx | FileCheck %s			; RUN: llc < %s -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr8 -mattr=+vsx | FileCheck %s
	; RUN: llc < %s -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr8 -mattr=+vsx -fast-isel -O0 | FileCheck -check-prefix=CHECK-FISL %s			; RUN: llc < %s -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr8 -mattr=+vsx -fast-isel -O0 | FileCheck -check-prefix=CHECK-FISL %s
				; XFAIL: *

	define void @test1sp(float %a, float %b, float %c, float %e, float* nocapture %d) #0 {			define void @test1sp(float %a, float %b, float %c, float %e, float* nocapture %d) #0 {
	entry:			entry:
	%0 = tail call float @llvm.fma.f32(float %b, float %c, float %a)			%0 = tail call float @llvm.fma.f32(float %b, float %c, float %a)
	store float %0, float* %d, align 4			store float %0, float* %d, align 4
	%1 = tail call float @llvm.fma.f32(float %b, float %e, float %a)			%1 = tail call float @llvm.fma.f32(float %b, float %e, float %a)
	%arrayidx1 = getelementptr inbounds float, float* %d, i64 1			%arrayidx1 = getelementptr inbounds float, float* %d, i64 1
	store float %1, float* %arrayidx1, align 4			store float %1, float* %arrayidx1, align 4
	ret void			ret void

Revision Contents

Diff 46801

include/llvm/CodeGen/PseudoSourceValue.h

include/llvm/CodeGen/ScheduleDAG.h

include/llvm/CodeGen/ScheduleDAGInstrs.h

lib/CodeGen/ScheduleDAGInstrs.cpp

test/CodeGen/AArch64/arm64-misched-memdep-bug.ll

test/CodeGen/AArch64/tailcall_misched_graph.ll

test/CodeGen/AMDGPU/split-vector-memoperand-offsets.ll

test/CodeGen/PowerPC/ppc64-fastcc.ll

test/CodeGen/PowerPC/vsx-fma-m.ll

test/CodeGen/PowerPC/vsx-fma-sp.ll
