This is an archive of the discontinued LLVM Phabricator instance.

Differential D38691

Add anti- and output loop carried dependences in SwingScheduler
Accepted, Public

Authored by ning4827 on Oct 9 2017, 9:02 AM.

Details

Reviewers

bcahoon

Summary

Consider anti- and output dependences in the addLoopCarriedDependences function
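For readers less familiar with the terminology, the dependence kinds named in the summary can be sketched as below. This is only an illustration, not code from the patch: `AccessKind` and `classifyDependence` are hypothetical names, and the sketch assumes the two accesses may touch the same memory location.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified model of a memory access: only whether it reads
// or writes matters for naming the dependence.
enum class AccessKind { Load, Store };

// Name the dependence from an earlier access to a later access, assuming
// both may touch the same memory location.
std::string classifyDependence(AccessKind Earlier, AccessKind Later) {
  if (Earlier == AccessKind::Store && Later == AccessKind::Load)
    return "true";   // read after write
  if (Earlier == AccessKind::Load && Later == AccessKind::Store)
    return "anti";   // write after read
  if (Earlier == AccessKind::Store && Later == AccessKind::Store)
    return "output"; // write after write
  return "none";     // two loads never conflict
}
```

Only the load/load pair needs no ordering, which matches the `LdStMI.mayLoad() && MI.mayLoad()` skip in the diff further down.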

Diff Detail

Event Timeline

ning4827 created this revision. Oct 9 2017, 9:02 AM

Hi Ning,

Just a couple of minor comments, but I think the change looks good. I tried your example, provided in an earlier version, but it doesn't have any output or anti dependences by the time it gets to the pipeliner. The load of a[i-1] gets optimized away. I used the following to compile it, so perhaps some other command-line options are needed?

clang -fno-unroll-loops -target hexagon -O2 example.c -o example.s -S

But I did notice that a lit test, test/CodeGen/Hexagon/vect/vect-v4i16.ll, fails with the patch due to extra output dependence edges. It contains a loop that is no longer pipelined, so the generated code differs and the checks in the test fail. I think it's fine to remove these checks from this lit test:
; CHECK: memuh(r{{[0-9]+}}+#6)
; CHECK: combine(r{{[0-9]+}},r{{[0-9]+}})

and leave the check for the vaddh, which shows that the test is vectorized.

Thanks,
Brendon

lib/CodeGen/MachinePipeliner.cpp
1026	Update the comment to reflect the new functionality.
1052	Change the variable name, Load, since this can be either a load or store.

Change the variable name and update the comments

Fix the lit test

Thanks Brendon for the comments. I have updated the diff. Let me know if it's okay.

Best,
Ning

In D38691#896376, @bcahoon wrote:

Hi Ning,

Just a couple of minor comments, but I think the change looks good. I tried your example, provided in an earlier version, but it doesn't have any output or anti dependences by the time it gets to the pipeliner. The load of a[i-1] gets optimized away. I used the following to compile it, so perhaps some other command-line options are needed?

I worked on another platform and didn't realize that a[i-1] would be optimized away on Hexagon.

clang -fno-unroll-loops -target hexagon -O2 example.c -o example.s -S

But I did notice that a lit test, test/CodeGen/Hexagon/vect/vect-v4i16.ll, fails with the patch due to extra output dependence edges. It contains a loop that is no longer pipelined, so the generated code differs and the checks in the test fail. I think it's fine to remove these checks from this lit test:
; CHECK: memuh(r{{[0-9]+}}+#6)
; CHECK: combine(r{{[0-9]+}},r{{[0-9]+}})

and leave the check for the vaddh, which shows that the test is vectorized.

Thanks,
Brendon

Add back the changes to the pipeliner

Sorry for the delay in responding. I've been trying to create a simple test case for this patch, but no luck yet. Otherwise, the patch looks good to me. Thanks!

This revision is now accepted and ready to land. Oct 27 2017, 4:03 PM

Hi Ning,

Let me know if you need me to commit this patch. Or, if you're able to commit it, then that would be great too.

Thanks,
Brendon

Hi Brendon,

As a first-time LLVM patch uploader, I don't think I can commit. :-)
Thank you for reviewing this patch and for committing it.

Best,
Ning

Revision Contents

Path	Size
lib/CodeGen/MachinePipeliner.cpp	130 lines
test/CodeGen/Hexagon/vect/vect-v4i16.ll	2 lines

Diff 119025

lib/CodeGen/MachinePipeliner.cpp

static void getUnderlyingObjects(MachineInstr *MI,
if (!MI->hasOneMemOperand())		if (!MI->hasOneMemOperand())
return;		return;
MachineMemOperand *MM = *MI->memoperands_begin();		MachineMemOperand *MM = *MI->memoperands_begin();
if (!MM->getValue())		if (!MM->getValue())
return;		return;
GetUnderlyingObjects(const_cast<Value *>(MM->getValue()), Objs, DL);		GetUnderlyingObjects(const_cast<Value *>(MM->getValue()), Objs, DL);
}		}

/// Add a chain edge between a load and store if the store can be an		/// Add a chain edge between a load and store if the store can be an
/// alias of the load on a subsequent iteration, i.e., a loop carried		/// alias of the load on a subsequent iteration, i.e., a loop carried
/// dependence. This code is very similar to the code in ScheduleDAGInstrs		/// dependence.
		/// Similarly, add a chain edge between a store and load or between a
		/// store and store.
		/// This code is very similar to the code in ScheduleDAGInstrs
/// but that code doesn't create loop carried dependences.		/// but that code doesn't create loop carried dependences.
void SwingSchedulerDAG::addLoopCarriedDependences(AliasAnalysis *AA) {		void SwingSchedulerDAG::addLoopCarriedDependences(AliasAnalysis *AA) {
MapVector<Value *, SmallVector<SUnit *, 4>> PendingLoads;		MapVector<Value *, SmallVector<SUnit *, 8>> PendingLoadsStores;
for (auto &SU : SUnits) {		for (auto &SU : SUnits) {
MachineInstr &MI = *SU.getInstr();		MachineInstr &MI = *SU.getInstr();
if (isDependenceBarrier(MI, AA))		if (isDependenceBarrier(MI, AA))
PendingLoads.clear();		PendingLoadsStores.clear();
else if (MI.mayLoad()) {		else if (!MI.mayLoad() && !MI.mayStore())
SmallVector<Value *, 4> Objs;		continue;

		SmallVector<Value *, 8> Objs;
getUnderlyingObjects(&MI, Objs, MF.getDataLayout());		getUnderlyingObjects(&MI, Objs, MF.getDataLayout());
for (auto V : Objs) {		for (auto V : Objs) {
SmallVector<SUnit *, 4> &SUs = PendingLoads[V];		SmallVector<SUnit *, 8> &SUs = PendingLoadsStores[V];
SUs.push_back(&SU);		SUs.push_back(&SU);
}
} else if (MI.mayStore()) {		MapVector<Value *, SmallVector<SUnit *, 8>>::iterator I =
SmallVector<Value *, 4> Objs;		PendingLoadsStores.find(V);
getUnderlyingObjects(&MI, Objs, MF.getDataLayout());		if (I == PendingLoadsStores.end())
for (auto V : Objs) {		continue;
MapVector<Value *, SmallVector<SUnit *, 4>>::iterator I =		for (auto LdSt : I->second) {
PendingLoads.find(V);		if (isSuccOrder(LdSt, &SU))
if (I == PendingLoads.end())
continue;		continue;
for (auto Load : I->second) {
if (isSuccOrder(Load, &SU))		MachineInstr &LdStMI = *LdSt->getInstr();
		if ((LdStMI.mayLoad() && MI.mayLoad()) || &LdStMI == &MI)
continue;		continue;
MachineInstr &LdMI = *Load->getInstr();
// First, perform the cheaper check that compares the base register.		// First, perform the cheaper check that compares the base register.
// If they are the same and the load offset is less than the store		// If they are the same and the load offset is less than the store
// offset, then mark the dependence as loop carried potentially.		// offset, then mark the dependence as loop carried potentially.
unsigned BaseReg1, BaseReg2;		unsigned BaseReg1, BaseReg2;
int64_t Offset1, Offset2;		int64_t Offset1, Offset2;
if (!TII->getMemOpBaseRegImmOfs(LdMI, BaseReg1, Offset1, TRI) ||		if (!TII->getMemOpBaseRegImmOfs(LdStMI, BaseReg1, Offset1, TRI) ||
!TII->getMemOpBaseRegImmOfs(MI, BaseReg2, Offset2, TRI)) {		!TII->getMemOpBaseRegImmOfs(MI, BaseReg2, Offset2, TRI)) {
SU.addPred(SDep(Load, SDep::Barrier));		SU.addPred(SDep(LdSt, SDep::Barrier));
continue;		continue;
}		}
if (BaseReg1 == BaseReg2 && (int)Offset1 < (int)Offset2) {		if (BaseReg1 == BaseReg2 && (int)Offset1 < (int)Offset2) {
assert(TII->areMemAccessesTriviallyDisjoint(LdMI, MI, AA) &&		assert(TII->areMemAccessesTriviallyDisjoint(LdStMI, MI, AA) &&
"What happened to the chain edge?");		"What happened to the chain edge?");
SU.addPred(SDep(Load, SDep::Barrier));		SU.addPred(SDep(LdSt, SDep::Barrier));
continue;		continue;
}		}
// Second, the more expensive check that uses alias analysis on the		// Second, the more expensive check that uses alias analysis on the
// base registers. If they alias, and the load offset is less than		// base registers. If they alias, and the load offset is less than
// the store offset, the mark the dependence as loop carried.		// the store offset, the mark the dependence as loop carried.
if (!AA) {		if (!AA) {
SU.addPred(SDep(Load, SDep::Barrier));		SU.addPred(SDep(LdSt, SDep::Barrier));
continue;		continue;
}		}
MachineMemOperand *MMO1 = *LdMI.memoperands_begin();		MachineMemOperand *MMO1 = *LdStMI.memoperands_begin();
MachineMemOperand *MMO2 = *MI.memoperands_begin();		MachineMemOperand *MMO2 = *MI.memoperands_begin();
if (!MMO1->getValue() || !MMO2->getValue()) {		if (!MMO1->getValue() || !MMO2->getValue()) {
SU.addPred(SDep(Load, SDep::Barrier));		SU.addPred(SDep(LdSt, SDep::Barrier));
continue;		continue;
}		}
if (MMO1->getValue() == MMO2->getValue() &&		if (MMO1->getValue() == MMO2->getValue() &&
MMO1->getOffset() <= MMO2->getOffset()) {		MMO1->getOffset() <= MMO2->getOffset()) {
SU.addPred(SDep(Load, SDep::Barrier));		SU.addPred(SDep(LdSt, SDep::Barrier));
continue;		continue;
}		}
AliasResult AAResult = AA->alias(		AliasResult AAResult = AA->alias(
MemoryLocation(MMO1->getValue(), MemoryLocation::UnknownSize,		MemoryLocation(MMO1->getValue(), MemoryLocation::UnknownSize,
MMO1->getAAInfo()),		MMO1->getAAInfo()),
MemoryLocation(MMO2->getValue(), MemoryLocation::UnknownSize,		MemoryLocation(MMO2->getValue(), MemoryLocation::UnknownSize,
MMO2->getAAInfo()));		MMO2->getAAInfo()));

if (AAResult != NoAlias)		if (AAResult != NoAlias)
SU.addPred(SDep(Load, SDep::Barrier));		SU.addPred(SDep(LdSt, SDep::Barrier));
}
}		}
}		}
}		}
}		}

/// Update the phi dependences to the DAG because ScheduleDAGInstrs no longer		/// Update the phi dependences to the DAG because ScheduleDAGInstrs no longer
/// processes dependences for PHIs. This function adds true dependences		/// processes dependences for PHIs. This function adds true dependences
/// from a PHI to a use, and a loop carried dependence from the use to the		/// from a PHI to a use, and a loop carried dependence from the use to the
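The overall shape of the updated loop in addLoopCarriedDependences can be sketched outside of LLVM roughly as follows. This is a simplified model under stated assumptions, not the patch itself: `Access` and `collectCandidateEdges` are hypothetical names, each access is assumed to have exactly one underlying object, and the isDependenceBarrier, isSuccOrder, base register/offset, and alias analysis checks from the real code are all omitted.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scheduling unit: its index in program order,
// the id of the underlying object it accesses, and whether it is a load
// (otherwise it is treated as a store).
struct Access {
  int Index;
  int Object;
  bool IsLoad;
};

// Return (earlier, later) index pairs that would need a conservative
// ordering edge. Each access is compared against the earlier accesses to
// the same object; only load/load pairs are skipped.
std::vector<std::pair<int, int>>
collectCandidateEdges(const std::vector<Access> &Accesses) {
  // Maps an object id to the accesses already seen for that object.
  std::map<int, std::vector<const Access *>> Pending;
  std::vector<std::pair<int, int>> Edges;
  for (const Access &A : Accesses) {
    std::vector<const Access *> &Earlier = Pending[A.Object];
    for (const Access *E : Earlier) {
      if (E->IsLoad && A.IsLoad)
        continue; // two loads cannot conflict
      Edges.emplace_back(E->Index, A.Index);
    }
    // Record this access after the comparison, so it is never paired
    // with itself.
    Earlier.push_back(&A);
  }
  return Edges;
}
```

Compared with a pending list of loads only, keeping both loads and stores in one pending list is what lets store/load and store/store pairs be considered as well. In the real function the remaining checks then decide whether an SDep::Barrier edge is actually added.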

test/CodeGen/Hexagon/vect/vect-v4i16.ll

	; RUN: llc -march=hexagon -mcpu=hexagonv5 -disable-hsdr < %s | FileCheck %s			; RUN: llc -march=hexagon -mcpu=hexagonv5 -disable-hsdr < %s | FileCheck %s

	; Check that store is post-incremented.			; Check that store is post-incremented.
	; CHECK: memuh(r{{[0-9]+}}+#6)
	; CHECK: combine(r{{[0-9]+}},r{{[0-9]+}})
	; CHECK: vaddh			; CHECK: vaddh

	target datalayout = "e-p:32:32:32-i64:64:64-i32:32:32-i16:16:16-i1:32:32-f64:64:64-f32:32:32-v64:64:64-v32:32:32-a0:0-n16:32"			target datalayout = "e-p:32:32:32-i64:64:64-i32:32:32-i16:16:16-i1:32:32-f64:64:64-f32:32:32-v64:64:64-v32:32:32-a0:0-n16:32"
	target triple = "hexagon"			target triple = "hexagon"

	define void @matrix_add_const(i32 %N, i16* nocapture %A, i16 signext %val) #0 {			define void @matrix_add_const(i32 %N, i16* nocapture %A, i16 signext %val) #0 {
	entry:			entry:
	%cmp5 = icmp eq i32 %N, 0			%cmp5 = icmp eq i32 %N, 0