This is an archive of the discontinued LLVM Phabricator instance.

Differential D38097

[IVUsers] Changes to make IVUsers's results robust to instruction and uselist ordering
Needs Review, Public

Authored by dneilson on Sep 20 2017, 1:45 PM.

Details

Reviewers

hfinkel

Summary

Full discussion of the motivation, and some background, for this patch can be
found on the LLVM dev mailing list:

http://lists.llvm.org/pipermail/llvm-dev/2017-September/117424.html

This patch makes two changes to the implementation of IVUsers' analysis
to make the result of the analysis more robust to instruction and
use list ordering:
A) Memoize the result of AddUsersImpl() for each given instruction. The presence of
Processed.count(User) in the conditions that test for adding an instruction to the
current SCEV (approx. lines 235 & 241) effectively changes the return value of
AddUsersImpl() for “interesting” instructions from true to false once they have
already been processed, which results in different behaviour depending on the
order in which instructions are visited.

B) Don’t let the depth-first traversal (DFT) continue into Users that are phi nodes
in the loop header. We’re going to visit every phi node in the loop header as the
root of a DFT anyway, so this prevents the possibility of revisiting the same
instruction multiple times in the same def-use chain.
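
As a rough illustration of change A, the toy sketch below contrasts a visitor whose answer for a node changes on a revisit with one that replays its first answer. It is not the LLVM code: ToyInst, UnmemoizedVisitor and MemoizedVisitor are hypothetical names invented for this example, and the sketch leaves out the caller-side Processed.count(User) checks that the real pre-patch code used.

```cpp
#include <unordered_set>

// Hypothetical stand-in for an instruction: an id plus a flag saying whether
// the toy analysis can reduce it. Neither name comes from LLVM.
struct ToyInst {
  int Id;
  bool Reducible;
};

// Pre-patch shape: a revisit always answers "true", whatever the first visit
// decided, so the answer depends on whether the node was seen before.
class UnmemoizedVisitor {
  std::unordered_set<int> Processed; // every node seen so far

public:
  bool visit(const ToyInst &I) {
    if (!Processed.insert(I.Id).second)
      return true; // already handled
    return I.Reducible;
  }
};

// Patched shape: remember which nodes answered "true" and replay that answer
// on every later visit, so visiting order no longer changes the result.
class MemoizedVisitor {
  std::unordered_set<int> Processed; // every node seen so far
  std::unordered_set<int> MemoTrue;  // nodes for which visit() returned true

public:
  bool visit(const ToyInst &I) {
    if (!Processed.insert(I.Id).second)
      return MemoTrue.count(I.Id) != 0; // replay the earlier answer
    if (!I.Reducible)
      return false;
    MemoTrue.insert(I.Id);
    return true;
  }
};
```

With a non-reducible ToyInst, UnmemoizedVisitor::visit returns false on the first call and true on the second, while MemoizedVisitor::visit returns false both times.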

Diff Detail

Build Status

Buildable 10464
Build 10464: arc lint + arc unit

Event Timeline

dneilson created this revision. Sep 20 2017, 1:45 PM

Harbormaster completed remote builds in B10464: Diff 116064. Sep 20 2017, 1:46 PM

Any performance changes in the test suite?

In D38097#876806, @hfinkel wrote:

Any performance changes in the test suite?

I don't have access to SPEC source, so I cannot test those. I'm unfamiliar with the performance test suite, so I don't know what all else is there. I'm going to set up a run against our internal (Java-based) performance suite to see where that stands.

Functionally, this patch doesn't regress anything in 'ninja check-all'

In D38097#876817, @dneilson wrote:

In D38097#876806, @hfinkel wrote:

Any performance changes in the test suite?

I don't have access to SPEC source, so I cannot test those. I'm unfamiliar with the performance test suite, so I don't know what all else is there. I'm going to set up a run against our internal (Java-based) performance suite to see where that stands.

Great.

You can run LLVM's test suite using LNT by following the directions here: http://llvm.org/docs/lnt/quickstart.html

Functionally, this patch doesn't regress anything in 'ninja check-all'

Sounds good (although I have no idea how good our coverage is in this regard).

In D38097#876869, @hfinkel wrote:

In D38097#876817, @dneilson wrote:

In D38097#876806, @hfinkel wrote:

Any performance changes in the test suite?

You can run LLVM's test suite using LNT by following the directions here: http://llvm.org/docs/lnt/quickstart.html

Thanks for the LNT pointer -- seems like a pretty awesome tool!

I ran the benchmarks in the test-suite (excluding all external source ones, since I don't have them) on a Linux-x86 box, with 5x multisampling. Looks like 2 runtime regressions, and a runtime improvement:
MultiSource/Benchmarks/llubenchmark/llu: 3.56% (3.8200 -> 3.9560, sigma=0.0433). REGRESSION
MultiSource/Benchmarks/TSVC/CrossingThresholds-fit/CrossingThresholds-fit: 1.11% (2.8800 -> 2.9120, sigma=0.0054). REGRESSION
SingleSource/Benchmarks/Misc/flops-4: -5.26% (1.9760 -> 1.8720, sigma=0.0178). IMPROVEMENT

Clearly we'd be missing one or more IVUsers as input to LSR in these benchmarks. It's interesting that it resulted in a 5% improvement in one benchmark. I don't understand LSR at all, and the code looks like a bit of a beast to wrap my head around, so this is going to require a lot more time & digging around to understand why these benchmarks are acting as they are with this patch.

Functionally, this patch doesn't regress anything in 'ninja check-all'

Sounds good (although I have no idea how good our coverage is in this regard).

Yeah, I don't know either. The IVUsers-specific tests are basically non-existent, so we'd pretty much be relying on the LSR tests to catch things.

So, TL;DR: I'm not sure how much you really care, but this isn't going to make your ordering completely consistent in the face of use list reordering or instruction ordering. It should work if there is a single cycle, but not if there are nested cycles (i.e. nested phi cycles).

If you do care, the only complete solution I know of would be "form sccs of ssa graph, sort them if necessary, perform whatever filtering you want".

Forming scc's guarantees you have all instructions that you could ever want to process for a given node.
You can then sort the SCC's by dominance order (DT dfs numbers, then local dfs numbers) if you don't like the ordering it produces, and process.

That will guarantee completely consistent ordering, as tarjan scc's are maximal.

(This will have the same time bound as the current DFS based solution)

In D38097#878856, @dberlin wrote:

So, TL;DR, i'm not sure how much you really care, this isn't going to make your ordering completely consistent in the face of use list reordering or instruction ordering. It should work if there is a single cycle, but not if there are nested cycles.

Is there a reason not to use a complete solution, which would be "form sccs of ssa graph, sort them if necessary, perform whatever filtering you want".

Forming scc's guarantees you have all instructions that you could ever want to process.
You can then sort the SCC's by dominance order (DT dfs numbers, then local dfs numbers) if you don't like the ordering it produces, and process.

That will guarantee completely consistent ordering, as tarjan scc's are maximal.

(This will have the same time bound as the current DFS based solution)

No reason other than "didn't think of it" -- my first stab at this is trying to retain as much of the existing code/behaviour as possible.

Can you point me to another pass that uses the technique that you suggest? I'd like to see a sample of how it's implemented & how it works.
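
For readers who want a concrete picture of the "form SCCs, sort them, then process" idea, here is a minimal standalone sketch. It is not taken from any LLVM pass and does not use LLVM's own scc_iterator (llvm/ADT/SCCIterator.h). The Graph alias and findSCCs are hypothetical names, and plain integer node ids stand in for the dominance-order numbers mentioned above, which is an assumption made purely for illustration.

```cpp
#include <algorithm>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy graph: node i has edges to every node in Succs[i].
using Graph = std::vector<std::vector<int>>;

// Tarjan's algorithm, followed by a canonical sort so that the result does
// not depend on the order of the successor lists.
std::vector<std::vector<int>> findSCCs(const Graph &Succs) {
  const int N = static_cast<int>(Succs.size());
  std::vector<int> Index(N, -1), Low(N, 0), Stack;
  std::vector<bool> OnStack(N, false);
  std::vector<std::vector<int>> SCCs;
  int Next = 0;

  std::function<void(int)> Visit = [&](int V) {
    Index[V] = Low[V] = Next++;
    Stack.push_back(V);
    OnStack[V] = true;
    for (int W : Succs[V]) {
      if (Index[W] < 0) { // tree edge: recurse, then inherit the low-link
        Visit(W);
        Low[V] = std::min(Low[V], Low[W]);
      } else if (OnStack[W]) { // edge back into the current stack
        Low[V] = std::min(Low[V], Index[W]);
      }
    }
    if (Low[V] == Index[V]) { // V is the root of a component
      std::vector<int> Component;
      int W;
      do {
        W = Stack.back();
        Stack.pop_back();
        OnStack[W] = false;
        Component.push_back(W);
      } while (W != V);
      // Sort members by id (stand-in for a dominance-order number).
      std::sort(Component.begin(), Component.end());
      SCCs.push_back(std::move(Component));
    }
  };

  for (int V = 0; V < N; ++V)
    if (Index[V] < 0)
      Visit(V);

  // Components are disjoint and sorted, so this orders them by their
  // smallest member.
  std::sort(SCCs.begin(), SCCs.end());
  return SCCs;
}
```

Because each component is maximal and both levels are sorted, two graphs that differ only in the order of their successor lists, for example {{1}, {2}, {0, 3}, {4}, {3}} and {{1}, {2}, {3, 0}, {4}, {3}}, produce the same list of components.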

Revision Contents

Path                                      Size
include/llvm/Analysis/IVUsers.h           3 lines
lib/Analysis/IVUsers.cpp                  48 lines
test/Analysis/IVUsers/invariant-out.ll    86 lines

Diff 116064

include/llvm/Analysis/IVUsers.h

[... 94 lines not shown ...]	class IVUsers {
friend class IVStrideUse;		friend class IVStrideUse;
Loop *L;		Loop *L;
AssumptionCache *AC;		AssumptionCache *AC;
LoopInfo *LI;		LoopInfo *LI;
DominatorTree *DT;		DominatorTree *DT;
ScalarEvolution *SE;		ScalarEvolution *SE;
SmallPtrSet<Instruction*, 16> Processed;		SmallPtrSet<Instruction*, 16> Processed;

		// Set of instructions for which AddUsersImpl returned true.
		SmallPtrSet<Instruction *, 16> AddUsersMemoTrue;

/// IVUses - A list of all tracked IV uses of induction variable expressions		/// IVUses - A list of all tracked IV uses of induction variable expressions
/// we are interested in.		/// we are interested in.
ilist<IVStrideUse> IVUses;		ilist<IVStrideUse> IVUses;

// Ephemeral values used by @llvm.assume in this function.		// Ephemeral values used by @llvm.assume in this function.
SmallPtrSet<const Value *, 32> EphValues;		SmallPtrSet<const Value *, 32> EphValues;

public:		public:
[... 92 lines not shown ...]

lib/Analysis/IVUsers.cpp

[... 164 lines not shown ...]
/// reducible SCEV, recursively add its users to the IVUsesByStride set and		/// reducible SCEV, recursively add its users to the IVUsesByStride set and
/// return true. Otherwise, return false.		/// return true. Otherwise, return false.
bool IVUsers::AddUsersImpl(Instruction *I,		bool IVUsers::AddUsersImpl(Instruction *I,
SmallPtrSetImpl<Loop*> &SimpleLoopNests) {		SmallPtrSetImpl<Loop*> &SimpleLoopNests) {
const DataLayout &DL = I->getModule()->getDataLayout();		const DataLayout &DL = I->getModule()->getDataLayout();

// Add this IV user to the Processed set before returning false to ensure that		// Add this IV user to the Processed set before returning false to ensure that
// all IV users are members of the set. See IVUsers::isIVUserOrOperand.		// all IV users are members of the set. See IVUsers::isIVUserOrOperand.
if (!Processed.insert(I).second)		if (!Processed.insert(I).second) {
return true; // Instruction already handled.		// Instruction already handled
		// If the instruction is in AddUsersMemoTrue, then we returned
		// true for the instruction the last time we saw it, and should
		// do so again.
		return AddUsersMemoTrue.count(I) != 0;
		}

if (!SE->isSCEVable(I->getType()))		if (!SE->isSCEVable(I->getType()))
return false; // Void and FP expressions cannot be reduced.		return false; // Void and FP expressions cannot be reduced.

// IVUsers is used by LSR which assumes that all SCEV expressions are safe to		// IVUsers is used by LSR which assumes that all SCEV expressions are safe to
// pass to SCEVExpander. Expressions are not safe to expand if they represent		// pass to SCEVExpander. Expressions are not safe to expand if they represent
// operations that are not safe to speculate, namely integer division.		// operations that are not safe to speculate, namely integer division.
if (!isa<PHINode>(I) && !isSafeToSpeculativelyExecute(I))		if (!isa<PHINode>(I) && !isSafeToSpeculativelyExecute(I))
[... 21 lines not shown ...]	bool IVUsers::AddUsersImpl(Instruction *I,

SmallPtrSet<Instruction *, 4> UniqueUsers;		SmallPtrSet<Instruction *, 4> UniqueUsers;
for (Use &U : I->uses()) {		for (Use &U : I->uses()) {
Instruction *User = cast<Instruction>(U.getUser());		Instruction *User = cast<Instruction>(U.getUser());
if (!UniqueUsers.insert(User).second)		if (!UniqueUsers.insert(User).second)
continue;		continue;

// Do not infinitely recurse on PHI nodes.		// Do not infinitely recurse on PHI nodes.
if (isa<PHINode>(User) && Processed.count(User))		// Also, do not visit phi's in the loop header; these will be searched
		// from later as the root of a DFT, and doing so now would make it
		// possible to re-encounter the same instruction twice in the same
		// DFT stack (i.e. we're still in the middle of trying to resolve
		// the instruction)
		if (isa<PHINode>(User) &&
		(Processed.count(User) || (User->getParent() == L->getHeader())))
continue;		continue;

// Only consider IVUsers that are dominated by simplified loop		// Only consider IVUsers that are dominated by simplified loop
// headers. Otherwise, SCEVExpander will crash.		// headers. Otherwise, SCEVExpander will crash.
BasicBlock *UseBB = User->getParent();		BasicBlock *UseBB = User->getParent();
// A phi's use is live out of its predecessor block.		// A phi's use is live out of its predecessor block.
if (PHINode *PHI = dyn_cast<PHINode>(User)) {		if (PHINode *PHI = dyn_cast<PHINode>(User)) {
unsigned OperandNo = U.getOperandNo();		unsigned OperandNo = U.getOperandNo();
unsigned ValNo = PHINode::getIncomingValueNumForOperand(OperandNo);		unsigned ValNo = PHINode::getIncomingValueNumForOperand(OperandNo);
UseBB = PHI->getIncomingBlock(ValNo);		UseBB = PHI->getIncomingBlock(ValNo);
}		}
if (!isSimplifiedLoopNest(UseBB, DT, LI, SimpleLoopNests))		if (!isSimplifiedLoopNest(UseBB, DT, LI, SimpleLoopNests))
return false;		return false;

// Descend recursively, but not into PHI nodes outside the current loop.		// Descend recursively, but not into PHI nodes outside the current loop.
// It's important to see the entire expression outside the loop to get		// It's important to see the entire expression outside the loop to get
// choices that depend on addressing mode use right, although we won't		// choices that depend on addressing mode use right, although we won't
// consider references outside the loop in all cases.		// consider references outside the loop in all cases.
// If User is already in Processed, we don't want to recurse into it again,
// but do want to record a second reference in the same instruction.
bool AddUserToIVUsers = false;		bool AddUserToIVUsers = false;
if (LI->getLoopFor(User->getParent()) != L) {		if (LI->getLoopFor(User->getParent()) != L) {
if (isa<PHINode>(User) || Processed.count(User) ||		if (isa<PHINode>(User) || !AddUsersImpl(User, SimpleLoopNests)) {
!AddUsersImpl(User, SimpleLoopNests)) {
DEBUG(dbgs() << "FOUND USER in other loop: " << *User << '\n'		DEBUG(dbgs() << "FOUND USER in other loop: " << *User << '\n'
<< " OF SCEV: " << *ISE << '\n');		<< " OF SCEV: " << *ISE << '\n');
AddUserToIVUsers = true;		AddUserToIVUsers = true;
}		}
} else if (Processed.count(User) || !AddUsersImpl(User, SimpleLoopNests)) {		} else if (!AddUsersImpl(User, SimpleLoopNests)) {
DEBUG(dbgs() << "FOUND USER: " << *User << '\n'		DEBUG(dbgs() << "FOUND USER: " << *User << '\n'
<< " OF SCEV: " << *ISE << '\n');		<< " OF SCEV: " << *ISE << '\n');
AddUserToIVUsers = true;		AddUserToIVUsers = true;
}		}

if (AddUserToIVUsers) {		if (AddUserToIVUsers) {
// Okay, we found a user that we cannot reduce.		// Okay, we found a user that we cannot reduce.
IVStrideUse &NewUse = AddUser(User, I);		IVStrideUse &NewUse = AddUser(User, I);
[... 28 lines not shown ...]	if (AddUserToIVUsers) {
IVUses.pop_back();		IVUses.pop_back();
return false;		return false;
}		}
}		}
DEBUG(if (SE->getSCEV(I) != ISE)		DEBUG(if (SE->getSCEV(I) != ISE)
dbgs() << " NORMALIZED TO: " << *ISE << '\n');		dbgs() << " NORMALIZED TO: " << *ISE << '\n');
}		}
}		}
		AddUsersMemoTrue.insert(I);
return true;		return true;
}		}

bool IVUsers::AddUsersIfInteresting(Instruction *I) {		bool IVUsers::AddUsersIfInteresting(Instruction *I) {
		// (( Note: This comment block is reconstructed from reverse engineering
		// the implementation. Do not take it to be the gospel, but instead take
		// it as a framework from which to understand the implementation. Please
		// add-to or correct as you see fit. ))
		//
		// AddUsersIfInteresting performs a depth-first traversal (DFT) of def-use
		// chains starting with the given I. Each DFT path is a def-use chain:
		// I -> ... -> S -> T
		// such that:
		// * The chain starts with some I that is a phi in the loop header.
		// * The chain ends with some instruction T such that AddUsersImpl(T)
		// returns false. i.e. T does not have an interesting SCEV, or is
		// an ephemeral value, or... etc.
		// * All instructions in the chain except for T have interesting SCEVs,
		// and AddUsersImpl() for these instructions will return true.
		//
		// Given such a def-use chain found via this DFT, we add to the IVUsers set:
		// T as a user of the SCEV of S

		assert(isa<PHINode>(I) && "Expected a phi node to start the DFT");
		assert(I->getParent() == L->getHeader() && "Expected I in the loop header");

// SCEVExpander can only handle users that are dominated by simplified loop		// SCEVExpander can only handle users that are dominated by simplified loop
// entries. Keep track of all loops that are only dominated by other simple		// entries. Keep track of all loops that are only dominated by other simple
// loops so we don't traverse the domtree for each user.		// loops so we don't traverse the domtree for each user.
SmallPtrSet<Loop*,16> SimpleLoopNests;		SmallPtrSet<Loop*,16> SimpleLoopNests;

return AddUsersImpl(I, SimpleLoopNests);		return AddUsersImpl(I, SimpleLoopNests);
}		}

[... 43 lines not shown ...]
}		}

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)		#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
LLVM_DUMP_METHOD void IVUsers::dump() const { print(dbgs()); }		LLVM_DUMP_METHOD void IVUsers::dump() const { print(dbgs()); }
#endif		#endif

void IVUsers::releaseMemory() {		void IVUsers::releaseMemory() {
Processed.clear();		Processed.clear();
		AddUsersMemoTrue.clear();
IVUses.clear();		IVUses.clear();
}		}

IVUsersWrapperPass::IVUsersWrapperPass() : LoopPass(ID) {		IVUsersWrapperPass::IVUsersWrapperPass() : LoopPass(ID) {
initializeIVUsersWrapperPassPass(*PassRegistry::getPassRegistry());		initializeIVUsersWrapperPassPass(*PassRegistry::getPassRegistry());
}		}

void IVUsersWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const {		void IVUsersWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const {
[... 69 lines not shown ...]

test/Analysis/IVUsers/invariant-out.ll

This file was added.

				; This tests to make sure that the IV-Users analysis's results are
				; invariant of input order. We have the same IR with some small differences
				; in the ordering of instructions, and ensure that all variants generate
				; the same set of IVUsers.

				; RUN: opt -analyze -iv-users -S < %s | grep -e '%.*=' | sort | FileCheck %s
				; CHECK: %2 = {{.*}} in %3 = sitofp i32 %2 to double
				; CHECK: %2 = {{.*}} in %3 = sitofp i32 %2 to double
				; CHECK: %2 = {{.*}} in %3 = sitofp i32 %2 to double
				; CHECK: %iv.inc = {{.*}} in store i64 %iv.inc, i64* %addr, align 8
				; CHECK: %iv.inc = {{.*}} in store i64 %iv.inc, i64* %addr, align 8
				; CHECK: %iv.inc = {{.*}} in store i64 %iv.inc, i64* %addr, align 8

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:1"
				target triple = "x86_64-unknown-linux-gnu"

				define void @order1(i64 %v1, i32 %v2, i64* %addr) {
				entry:
				br label %loop

				loop:
				%iv = phi i64 [%v1, %entry], [%iv.inc, %loop]
				%iv2 = phi i32 [%v2, %entry], [%6, %loop]
				%0 = add nuw i64 %iv, 1
				%1 = trunc i64 %0 to i32
				%2 = sub i32 %iv2, %1
				%3 = sitofp i32 %2 to double
				%4 = sub i64 1, %iv
				%5 = trunc i64 %4 to i32
				%6 = sub i32 %2, %5
				%iv.inc = add i64 %iv, 1
				store i64 %iv.inc, i64* %addr, align 8
				br i1 undef, label %loop, label %exit

				exit:
				ret void
				}

				; Reorder the phi instructions from @order1
				define void @order2(i64 %v1, i32 %v2, i64* %addr) {
				entry:
				br label %loop

				loop:
				%iv2 = phi i32 [%v2, %entry], [%6, %loop]
				%iv = phi i64 [%v1, %entry], [%iv.inc, %loop]
				%0 = add nuw i64 %iv, 1
				%1 = trunc i64 %0 to i32
				%2 = sub i32 %iv2, %1
				%3 = sitofp i32 %2 to double
				%4 = sub i64 1, %iv
				%5 = trunc i64 %4 to i32
				%6 = sub i32 %2, %5
				%iv.inc = add i64 %iv, 1
				store i64 %iv.inc, i64* %addr, align 8
				br i1 undef, label %loop, label %exit

				exit:
				ret void
				}

				; Reorder the uselist in %iv's phi compared to @order1
				define void @order3(i64 %v1, i32 %v2, i64* %addr) {
				entry:
				br label %loop

				loop:
				%iv = phi i64 [%v1, %entry], [%iv.inc, %loop]
				%iv2 = phi i32 [%v2, %entry], [%6, %loop]
				%0 = add nuw i64 %iv, 1
				%1 = trunc i64 %0 to i32
				%2 = sub i32 %iv2, %1
				%3 = sitofp i32 %2 to double
				%4 = sub i64 1, %iv
				%5 = trunc i64 %4 to i32
				%6 = sub i32 %2, %5
				%iv.inc = add i64 %iv, 1
				store i64 %iv.inc, i64* %addr, align 8
				br i1 undef, label %loop, label %exit

				exit:
				ret void

				; uselistorder directives ----
				uselistorder i64 %iv, {2, 1, 0}
				}
