
[LoopInstSimplify] Re-implement the core logic of loop-instsimplify to be both simpler and substantially more efficient.
ClosedPublic

Authored by chandlerc on May 26 2018, 2:55 AM.

Details

Summary

Rather than use a hand-rolled iteration technique that isn't quite the
same as RPO, use the pre-built RPO loop body traversal utility.

Once we visit the loop body in RPO, we can rely on visiting defs before
uses. Given that, the only time we need to iterate is when simplifying a
def that is used by a PHI node along a back-edge.
With this patch, the first pass over the loop body is just a complete
simplification of every instruction across the loop body. When we
encounter a use of a simplified instruction that stems from a PHI node
in the loop body that has already been visited (due to some cyclic CFG,
potentially the loop itself, or a nested loop, or unstructured control
flow), we recall that specific PHI node for the second iteration.
Nothing else needs to be preserved from iteration to iteration.

On the second and later iterations, only instructions known to have
simplified inputs are considered, each time starting from a set of PHIs
that had simplified inputs along the backedges.

Dead instructions are collected along the way, but deleted in a batch at
the end of each iteration making the iterations themselves substantially
simpler. This uses a new batch API for recursively deleting dead
instructions.

This also changes the routine to visit subloops. Because simplification
is fundamentally transitive, we may need to visit the entire loop body,
including subloops, to handle knock-on simplification.

I've added a basic test file that helps demonstrate that all of these
changes work. It includes both straightforward loops with
simplifications as well as interesting PHI-structures, CFG-structures,
and a nested loop case.

Diff Detail

Repository
rL LLVM

Event Timeline

chandlerc created this revision.May 26 2018, 2:55 AM
chandlerc updated this revision to Diff 148711.May 26 2018, 2:55 AM

Fix some formatting goofs that snuck in w/o a run of clang-format to clean them up.

asbirlea accepted this revision.May 29 2018, 10:46 AM

LGTM. Looks like a clean simplification to me.

llvm/lib/Transforms/Scalar/LoopInstSimplify.cpp
131 ↗(On Diff #148711)

Nit: perhaps "an iteration over all instructions in all blocks."
IMO that would make the code easier to read. Alternatively, "end loop over instructions" and "end loop over blocks" for the closing braces above, but that seems like too much.

This revision is now accepted and ready to land.May 29 2018, 10:46 AM
sanjoy accepted this revision.May 29 2018, 12:07 PM
sanjoy added inline comments.
llvm/lib/Transforms/Scalar/LoopInstSimplify.cpp
92 ↗(On Diff #148711)

How about pulling out ToSimplify->empty() into a bool IsFirstIteration? I think that will make the intent clearer.

118 ↗(On Diff #148711)

Might be worth asserting that if !L.contains(UserI) then UserI is a PHI node. Up to you.

asbirlea added inline comments.May 29 2018, 1:10 PM
llvm/lib/Transforms/Utils/Local.cpp
448 ↗(On Diff #148711)

assert for isInstructionTriviallyDead too?

chandlerc marked 4 inline comments as done.May 29 2018, 1:18 PM

All suggestions implemented, submitting now, thanks!

This revision was automatically updated to reflect the committed changes.
mzolotukhin added inline comments.
llvm/trunk/lib/Transforms/Scalar/LoopInstSimplify.cpp
82

I wonder if it would be more efficient to iterate through ToSimplify instead of all instructions in all blocks. What do you think?

chandlerc added inline comments.May 30 2018, 9:57 AM
llvm/trunk/lib/Transforms/Scalar/LoopInstSimplify.cpp
82

I think it's trickier than that, but let me know if you see a way that seems more promising.

We need to visit defs before uses to reliably converge on the simplified result without revisiting instructions even more times. And we don't know all of the instructions we will need to visit, we only know the first ones. Each time we simplify an instruction we (potentially) grow the ToSimplify set.

So to only look at the instructions in the ToSimplify set, I think we'd need a system like the following:

  • First build a mapping from every instruction in the loop body to an integer so that they can be cheaply sorted in RPO. We could do this in the first iteration.
  • Build a sorted worklist in addition to a ToSimplify set, and every time we add an instruction to the ToSimplify set, insert it into the worklist in the correct (based on sort) position.
  • Walk that worklist rather than all the instructions in the loop body.

To build a worklist like this which has good algorithmic properties, what you'd probably want is to make ToSimplify actually a std::set or some other ordered search tree with cheap in-order insertion because we'll constantly be inserting into all kinds of different positions.

A std::set (or similar) tree data structure combined with the std::less needing to do two hash table lookups to get the sort key seemed like it would end up having a really frustratingly high overhead, and that's why I didn't immediately jump to this solution.

Another approach which doesn't have any better *algorithmic* properties in the worst case than the current one but might have hilariously better practical properties would be as follows:

  • Build a vector of instruction pointers in RPO order during the initial traversal, and a map from instruction to vector index.
  • Instead of using a ToSimplify set, use a sparse bitvector where the bit represents that the instruction at this index is in the set.
  • Walk the sparse bitvector in order in all subsequent iterations, and then use the index of the set bits to find the instruction at that position in the RPO.

Technically, in the worst case, this is the same as the current approach -- for a loop with N instructions that we iterate on M times we do O(N*M) work. But with this approach, that work involves looking up a bit in a fairly cache-friendly data structure rather than walking to a linked list node and testing it in a hash table. And as the iteration becomes sparse, we actually get algorithmic improvements that would likely help in the average case.

The down side is that it is much more complex and requires a decent amount of memory, which is why I didn't implement it at first.

I'm happy to pursue either of these if you and others think it is worthwhile. I'm not sure we'll ever have a test case where this shows up as a practical problem, but that doesn't mean one won't show up. =D I mostly had trouble making the trade-offs here, and this patch seemed like a strict improvement.

I'll admit I'm more reluctant about the first approach as I suspect its algorithmic scaling will be completely undermined by the practical cost of maintaining the sorted data structure.

But the second approach is super appealing. I didn't think carefully enough about it when I first wrote this to convince myself it would be effective, but I'm happy to go do this work if you think it's worthwhile based on my description.

mzolotukhin added inline comments.May 30 2018, 11:14 AM
llvm/trunk/lib/Transforms/Scalar/LoopInstSimplify.cpp
82

A std::set (or similar) tree data structure combined with the std::less needing to do two hash table lookups to get the sort key seemed like it would end up having a really frustratingly high overhead, and that's why I didn't immediately jump to this solution.

I was thinking about priority_queue here, which should have all the desired properties and supposedly less overhead than set. We'd still have to spend a lot of memory enumerating all instructions, though, so maybe it's not worth it after all.

...this patch seemed a strict improvement.

I don't argue with that :)