This is an archive of the discontinued LLVM Phabricator instance.

Differential D44282

[PR16756] JumpThreading: explicitly update SSA rather than use SSAUpdater.
Closed (Public)

Authored by mzolotukhin on Mar 8 2018, 5:16 PM.

Details

Reviewers

• dberlin
• davide
• MatzeB

Commits

rG52b064f3d375: [PR16756] Add SSAUpdaterBulk.
rGc6d2d65f37b2: [PR16756] Use SSAUpdaterBulk in JumpThreading.
rL329644: [PR16756] Use SSAUpdaterBulk in JumpThreading.
rL329643: [PR16756] Add SSAUpdaterBulk.

Summary

SSAUpdater is often a bottleneck in JumpThreading, and one of the reasons is
that it performs a lot of unnecessary computation (DT/IDF) over and over
again. This patch implements a classic algorithm for PHI-node placement and
uses JT-specific properties to optimize it (namely: only two blocks contain
definitions, and these blocks are the same for all instructions we're going to
rewrite).
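A minimal sketch of the classic placement scheme the summary refers to (dominance frontiers, then the iterated dominance frontier, or IDF, of the defining blocks) is shown below on a toy CFG. It is an illustration only, not code from the patch: the `ToyCFG` struct, the hand-supplied immediate dominators, and the Cooper-Harvey-Kennedy frontier walk are assumptions chosen for brevity.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy CFG (assumption for this sketch): blocks are 0..N-1, block 0 is the entry.
struct ToyCFG {
  std::vector<std::vector<int>> Preds; // predecessors of each block
  std::vector<int> IDom;               // immediate dominator, -1 for the entry
};

// Dominance frontiers via the Cooper-Harvey-Kennedy walk: for each join block
// B, every block on the dominator-tree path from a predecessor of B up to (but
// excluding) IDom[B] has B in its frontier.
std::vector<std::set<int>> dominanceFrontiers(const ToyCFG &G) {
  assert(G.Preds.size() == G.IDom.size() && "one IDom per block");
  std::vector<std::set<int>> DF(G.Preds.size());
  for (int B = 0; B < (int)G.Preds.size(); ++B) {
    if (G.Preds[B].size() < 2)
      continue;
    for (int P : G.Preds[B])
      for (int R = P; R != G.IDom[B]; R = G.IDom[R])
        DF[R].insert(B);
  }
  return DF;
}

// Iterated dominance frontier of DefBlocks (Cytron et al.): the blocks that
// need a PHI. A newly placed PHI is itself a definition, so it is processed too.
std::set<int> iteratedDF(const ToyCFG &G, const std::set<int> &DefBlocks) {
  std::vector<std::set<int>> DF = dominanceFrontiers(G);
  std::set<int> IDF;
  std::vector<int> Work(DefBlocks.begin(), DefBlocks.end());
  while (!Work.empty()) {
    int X = Work.back();
    Work.pop_back();
    for (int Y : DF[X])
      if (IDF.insert(Y).second)
        Work.push_back(Y);
  }
  return IDF;
}
```

On a diamond where blocks 1 and 2 both branch to block 3, `iteratedDF(G, {1, 2})` returns `{3}`, a single PHI in the merge block. Since in JumpThreading every rewritten value is defined in the same two blocks, such a result could in principle be computed once and shared by all of them.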

With this patch the test from PR16756 speeds up by ~2x, while the time spent in
JumpThreading goes down by ~4x.

Before the patch:

Total Execution Time: 26.6205 seconds (26.6232 wall clock)
 ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
13.3016 ( 50.6%)   0.0167 (  4.7%)  13.3183 ( 50.0%)  13.3190 ( 50.0%)  Jump Threading
 5.3226 ( 20.3%)   0.0170 (  4.8%)   5.3397 ( 20.1%)   5.3408 ( 20.1%)  Jump Threading
 1.7753 (  6.8%)   0.0020 (  0.6%)   1.7772 (  6.7%)   1.7772 (  6.7%)  SLP Vectorizer
 1.1617 (  4.4%)   0.1579 ( 44.1%)   1.3195 (  5.0%)   1.3199 (  5.0%)  X86 DAG->DAG Instruction Selection

With the patch:

Total Execution Time: 12.8328 seconds (12.8335 wall clock)
 ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
 4.5331 ( 36.3%)   0.0135 (  4.1%)   4.5466 ( 35.4%)   4.5471 ( 35.4%)  Jump Threading
 1.7649 ( 14.1%)   0.0015 (  0.5%)   1.7664 ( 13.8%)   1.7665 ( 13.8%)  SLP Vectorizer
 1.1914 (  9.5%)   0.1594 ( 47.9%)   1.3507 ( 10.5%)   1.3510 ( 10.5%)  X86 DAG->DAG Instruction Selection
 0.8300 (  6.6%)   0.0034 (  1.0%)   0.8333 (  6.5%)   0.8333 (  6.5%)  Jump Threading
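For reference, the headline ratios can be recomputed from the User+System column of the two tables above. Jump Threading appears twice in each table, so both rows are summed. The constants are copied from the tables; the helper name `speedup` is only for illustration.

```cpp
// User+System seconds copied from the "Before the patch" / "With the patch" tables.
constexpr double TotalBefore = 26.6205;
constexpr double TotalAfter = 12.8328;
// Both Jump Threading rows are summed in each table.
constexpr double JTBefore = 13.3183 + 5.3397;
constexpr double JTAfter = 4.5466 + 0.8333;

// How many times less time the "after" run takes.
constexpr double speedup(double Before, double After) { return Before / After; }

// Whole compile: about 2.07x. Jump Threading alone: about 3.47x.
static_assert(speedup(TotalBefore, TotalAfter) > 2.0, "whole compile ~2x");
static_assert(speedup(JTBefore, JTAfter) > 3.4, "Jump Threading ~3.5x");
```

By this accounting the whole run is about 2.07x faster and Jump Threading about 3.47x faster; the User and Wall columns give nearly the same factors.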

Also, some archaeology. Some time ago there was a patch for an alternative
SSAUpdater implementation (https://reviews.llvm.org/D28934). It was not
committed back then, but I decided to try it first. Initially, it also showed
huge speedups, but then I discovered a couple of bugs, and the fixes for them ate a
big chunk of the speedups (although the new implementation was still
significantly faster than the existing one). I can upload an updated version of
that patch if there is interest, but in JumpThreading I decided to pursue
another path: the D28934 implementation, while much faster than the existing
one, still does not scale very well. The algorithm used there is very efficient
for small incremental updates, but not for bulk updates (e.g.
when we have to update a large number of instructions), as there is little to
share across invocations of the algorithm. In contrast, the approach I propose here
aims to reuse as much computation as possible.

Compile time impact: no noticeable changes on CTMark, a big improvement on the
test from PR16756.

Diff Detail

Repository: rL LLVM

Event Timeline

mzolotukhin created this revision. Mar 8 2018, 5:16 PM

Herald added a subscriber: hiraditya. Mar 8 2018, 5:16 PM

Harbormaster completed remote builds in B15854: Diff 137679. Mar 8 2018, 5:17 PM

I'm going to review this in detail, but I have a general comment about the design.
I'm a little torn about the approach you're using. I think it's a good experiment because you're leveraging the structure of the problem to find a fundamentally more efficient solution. This is, FWIW, not really surprising.
OTOH, this is bad because you're de facto writing a specialized SSA updater. We might need another one for, e.g., LCSSA, and another one for GVN/PRE (until NewGVN is in or we rewrite PRE to not use the updater).
This basically results in a bunch of duplicated code. History shows (and you know this very well) that when we did something similar with the dominator tree (specialized per-pass updates), it turned into a long tail of bugs. If we really want to go this route, we need to analyze all the implications carefully.

Yeah, I have similar feelings about this patch.

I've been thinking about how to make this more general and reusable: it should be possible to implement something like BulkSSAUpdater (it could probably use a better name), which at least won't recompute DT on every iteration. I don't think it would be worth exposing in its interface a flag indicating that all definitions are in the same blocks or something like this, so we'll have to recompute IDF for every instruction we process. That will still be a win compared to the existing implementation, but a loss in terms of compile time compared to this implementation.
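One hypothetical way a general bulk updater could recover part of the JT-specific sharing without such a flag is to memoize the unpruned IDF keyed by the set of defining blocks, so variables defined in the same blocks pay for one computation. This is a sketch, not something the patch does: `IDFCache` and its `ComputeIDF` callback are invented for illustration, and a pruned (liveness-restricted) IDF could not be cached this way because it also depends on each variable's uses.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <utility>

// Hypothetical helper: caches the (unpruned) IDF per set of defining blocks,
// so that many variables defined in the same blocks share one computation.
class IDFCache {
public:
  using BlockSet = std::set<int>;
  using IDFFn = std::function<BlockSet(const BlockSet &)>;

  explicit IDFCache(IDFFn ComputeIDF) : ComputeIDF(std::move(ComputeIDF)) {}

  // Returns the IDF for DefBlocks, computing it only on the first request.
  const BlockSet &get(const BlockSet &DefBlocks) {
    assert(!DefBlocks.empty() && "a variable needs at least one definition");
    auto It = Cache.find(DefBlocks);
    if (It == Cache.end())
      It = Cache.emplace(DefBlocks, ComputeIDF(DefBlocks)).first;
    return It->second;
  }

private:
  IDFFn ComputeIDF;
  std::map<BlockSet, BlockSet> Cache;
};
```

With 100 variables all defined in blocks `{1, 2}`, `ComputeIDF` runs once; a variable with a different set of defining blocks triggers a new computation.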

When I submitted this patch, I expected that it would trigger discussion, and that was one of the reasons I wanted to put it out :) I'm eager to hear suggestions on how to do this better.

Thanks,
Michael

So, FWIW:
It's definitely possible to make the SSAUpdater rewrite faster in bulk insertion. I just never bothered because it wasn't used that way :)
You will never make it as fast as exploiting the problem structure, but it could be made a *lot* faster.

As for this one:
There are approaches to reuse a lot of the phi placement computation in bulk. It's also possible to close IDF recomputation under DT updates (i.e., incrementally recompute IDF).

You can see an example of something like that in the TDMSC algorithms here: https://dl.acm.org/citation.cfm?id=1065890

It only relies on the DJ graph, and as DT changes, at most the merge sets need to be invalidated.
(We'd need to get info from DT about what happened during an update, but that seems easy enough)

It all depends on how far down this rabbit hole you want to go.
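To make the reuse idea concrete, this sketch takes the merge set of a block n to be DF+({n}), the blocks reachable from n through one or more dominance-frontier edges; the IDF of any set of defining blocks is then the union of their merge sets, because the iterated frontier distributes over union. Merge sets are computed once and then answer many per-variable queries. The frontier sets are supplied by hand, the function names are illustrative, and the incremental invalidation under DT updates described above is not shown.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Merge set of each block N (assumed here to be DF+({N})): every block
// reachable from N along one or more dominance-frontier edges.
std::vector<std::set<int>> mergeSets(const std::vector<std::set<int>> &DF) {
  std::vector<std::set<int>> M(DF.size());
  for (int N = 0; N < (int)DF.size(); ++N) {
    std::vector<int> Work(DF[N].begin(), DF[N].end());
    while (!Work.empty()) {
      int X = Work.back();
      Work.pop_back();
      if (!M[N].insert(X).second)
        continue; // already reached from N
      for (int Y : DF[X])
        Work.push_back(Y);
    }
  }
  return M;
}

// IDF of one variable's defining blocks, answered from the cached merge sets.
std::set<int> idfFromMergeSets(const std::vector<std::set<int>> &M,
                               const std::set<int> &DefBlocks) {
  std::set<int> Result;
  for (int D : DefBlocks) {
    assert(D >= 0 && D < (int)M.size() && "unknown block");
    Result.insert(M[D].begin(), M[D].end());
  }
  return Result;
}
```

For a loop with header 1 that branches to 2 and 3, which rejoin at latch 4 (back edge 4 to 1, exit to 5), the frontiers are DF(2) = DF(3) = {4} and DF(1) = DF(4) = {1}, so a value defined in block 2 needs PHIs in `{1, 4}`.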

As for the pass-specific vs bulk updating, yeah, this is the eternal debate.
JT itself doesn't make too many types of transforms, and I wonder if you could just verify that they all work with the SSA updater on all the types of CFGs or something.
(This would be fragile in other ways, but at least would give you confidence the thing works)

I think it also depends on how much faster you can make the generic bulk updater. For example, if you get it to 0.9x the speed of the impl you have here, I suspect that's plenty good enough.
But i'm also not the one doing the work :)

I think so far the consensus was that I need to try implementing it in a more general way. I'll do that and come back!

Thanks,
Michael

brzycki added a subscriber: brzycki. Mar 9 2018, 12:32 PM

If you want to update using DJ-graphs and merge sets, I have implemented them in the global scheduler, which I thought could be reused: https://reviews.llvm.org/D32140
Please let me know if you find this useful.

Thanks! My current plan is to simply move the part that I wrote to a separate class and make it usable from other places. If I manage to keep most of the gains, then it would probably be the easiest way to resolve the issue.

Michael

Factor out SSA updating logic to a separate class SSAUpdaterBulk.

Harbormaster completed remote builds in B15989: Diff 138097. Mar 12 2018, 3:08 PM

Herald added a subscriber: mgorny. Mar 12 2018, 3:08 PM

Hi,

I created a separate class for bulk SSA updates. With this implementation, we recompute IDF for every individual variable, but the subgraph we're working with is usually smaller. In the previous implementation we computed IDF once for the union of these subgraphs: that was faster, but we might also accidentally insert unneeded phi-nodes, which we later had to clean up.
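The per-variable steps described here can be modeled outside LLVM to see how they fit together: compute the blocks where the value is live-in, restrict the IDF of the defining blocks to those blocks, insert PHIs there, then resolve PHI operands and uses by walking up the dominator tree, roughly as `RewriteAllUses` and `computeValueAt` in this revision's diff do. This is a simplified sketch, not the patch's code: blocks are integers with hand-supplied immediate dominators, values are strings, `"phi@B"` stands for a PHI inserted in block B, and uses are identified only by their block (the real code maps a PHI use to the corresponding incoming block).

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy CFG (assumption): blocks 0..N-1, block 0 is the entry with no preds.
struct ToyCFG {
  std::vector<std::vector<int>> Preds; // predecessors of each block
  std::vector<int> IDom;               // immediate dominator, -1 for the entry
};

struct ToyRewrite {
  // PHI block -> (predecessor, incoming value) pairs.
  std::map<int, std::vector<std::pair<int, std::string>>> Phis;
  // Using block -> value the use is rewritten to.
  std::map<int, std::string> UseValues;
};

// Like ComputeLiveInBlocks: walk predecessors from the using blocks, stopping
// at blocks that define the value.
static std::set<int> liveInBlocks(const ToyCFG &G, const std::set<int> &Uses,
                                  const std::set<int> &Defs) {
  std::set<int> LiveIn;
  std::vector<int> Work(Uses.begin(), Uses.end());
  while (!Work.empty()) {
    int B = Work.back();
    Work.pop_back();
    if (!LiveIn.insert(B).second)
      continue;
    for (int P : G.Preds[B])
      if (!Defs.count(P))
        Work.push_back(P);
  }
  return LiveIn;
}

ToyRewrite rewriteVariable(const ToyCFG &G, std::map<int, std::string> Defines,
                           const std::set<int> &UseBlocks) {
  assert(G.Preds.size() == G.IDom.size() && "one IDom per block");
  // Dominance frontiers (Cooper-Harvey-Kennedy walk up from each predecessor).
  std::vector<std::set<int>> DF(G.Preds.size());
  for (int B = 0; B < (int)G.Preds.size(); ++B)
    if (G.Preds[B].size() >= 2)
      for (int P : G.Preds[B])
        for (int R = P; R != G.IDom[B]; R = G.IDom[R])
          DF[R].insert(B);

  // Pruned IDF: only blocks where the value is live-in get a PHI.
  std::set<int> DefBlocks;
  for (const auto &D : Defines)
    DefBlocks.insert(D.first);
  std::set<int> LiveIn = liveInBlocks(G, UseBlocks, DefBlocks);
  std::set<int> IDF;
  std::vector<int> Work(DefBlocks.begin(), DefBlocks.end());
  while (!Work.empty()) {
    int X = Work.back();
    Work.pop_back();
    for (int Y : DF[X])
      if (LiveIn.count(Y) && IDF.insert(Y).second)
        Work.push_back(Y);
  }
  for (int B : IDF)
    Defines[B] = "phi@" + std::to_string(B);

  // Like computeValueAt: a known value, else the IDom's value, else undef.
  std::function<std::string(int)> ValueAt = [&](int B) -> std::string {
    auto It = Defines.find(B);
    if (It != Defines.end())
      return It->second;
    std::string V = G.Preds[B].empty() ? "undef" : ValueAt(G.IDom[B]);
    Defines[B] = V; // memoize, as the real code does in R.Defines
    return V;
  };

  ToyRewrite Result;
  for (int B : IDF)
    for (int P : G.Preds[B])
      Result.Phis[B].push_back({P, ValueAt(P)});
  for (int U : UseBlocks)
    Result.UseValues[U] = ValueAt(U);
  return Result;
}
```

Encoding the `Irreducible` unit test's CFG as blocks 0 (`if`), 1 (`loopstart`), 2 (`loopmain`) and 3 (`afterloop`), the value defined in `loopstart` as `%2` and used in `afterloop` gets exactly one PHI, in `loopmain`, with incoming values `undef` from `if` and `%2` from `loopstart`, which matches what that test expects.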

I tried to make the interface close to the existing one, and included an example of how it can be used in JumpThreading (if this change is approved, I'll commit these parts separately). With this change, the speedup on the original test is smaller, but still quite good:

===-------------------------------------------------------------------------===
  Total Execution Time: 15.0158 seconds (15.0163 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   4.5962 ( 31.5%)   0.0170 (  3.8%)   4.6132 ( 30.7%)   4.6142 ( 30.7%)  Jump Threading
   3.1789 ( 21.8%)   0.0049 (  1.1%)   3.1838 ( 21.2%)   3.1838 ( 21.2%)  Jump Threading
   1.2104 (  8.3%)   0.1836 ( 41.5%)   1.3940 (  9.3%)   1.3942 (  9.3%)  X86 DAG->DAG Instruction Selection
   1.3558 (  9.3%)   0.0020 (  0.5%)   1.3578 (  9.0%)   1.3577 (  9.0%)  SLP Vectorizer
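The same kind of arithmetic, again on the User+System column and summing both Jump Threading rows, places this version between trunk and the first, specialized version of the patch. The constants come from the three timing tables in this review; `ratio` is an illustrative helper.

```cpp
// User+System seconds from the timing tables in this review.
constexpr double TotalTrunk = 26.6205;            // before the patch
constexpr double JTTrunk = 13.3183 + 5.3397;      // both Jump Threading rows
constexpr double JTSpecialized = 4.5466 + 0.8333; // first (JT-specific) version
constexpr double TotalBulk = 15.0158;             // SSAUpdaterBulk version
constexpr double JTBulk = 4.6132 + 3.1838;

constexpr double ratio(double A, double B) { return A / B; }

// Whole run vs. trunk: about 1.77x faster.
static_assert(ratio(TotalTrunk, TotalBulk) > 1.7, "whole run");
// Jump Threading vs. trunk: about 2.39x faster.
static_assert(ratio(JTTrunk, JTBulk) > 2.3, "jump threading vs trunk");
// Jump Threading vs. the specialized version: about 0.69x its speed.
static_assert(ratio(JTSpecialized, JTBulk) < 0.7, "generic vs specialized");
```

In other words, the generic class keeps most of the win over trunk, while running Jump Threading at roughly 0.69x the speed of the specialized version.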

What do you think?

Thanks,
Michael

PS: Ideas for a better class name are welcome!

Ping!

I don't mind this approach, but as discussed offline we should consider also moving LCSSA to make sure the API makes sense.
In general, what's your plan for this? You want it to replace the SSA updater for all the instances in llvm? If so, we should carefully plan the transition costs.

@zhendongsu could do a round of testing to shake out bugs before we decide whether this can go in.

Also, do you know why the second invocation of Jump threading is taking so long?

llvm/lib/Transforms/Utils/SSAUpdaterBulk.cpp
Lines 25–26 (On Diff #138097): should this be `ssaupdaterbulk`?
Lines 52–54 (On Diff #138097): ternary
Line 111 (On Diff #138097): auto
Line 113 (On Diff #138097): auto

davide added inline comments. Mar 25 2018, 6:45 PM

llvm/lib/Transforms/Scalar/JumpThreading.cpp
Line 2013 (On Diff #138097): typo: s/ee/we/
Line 2036 (On Diff #138097): Why do you need to flush the dominator here? Please add a comment.

Rebase.
Address Davide's remarks.

Harbormaster completed remote builds in B16520: Diff 140157. Mar 28 2018, 4:16 PM

I don't mind this approach, but as discussed offline we should consider also moving LCSSA to make sure the API makes sense.
In general, what's your plan for this? You want it to replace the SSA updater for all the instances in llvm? If so, we should carefully plan the transition costs.

I looked into LCSSA, and indeed it seems that it can also be improved. I tried directly replacing the old SSAUpdater with the new one, but that didn't give any benefit. However, I think we can simplify the code in LCSSA by passing LoopInfo to SSAUpdaterBulk, which would then insert a phi node whenever it crosses a loop boundary while rewriting a use. I don't know yet how much work it would take, but I don't think it would require much rewriting; it would probably be an addition to what we currently have in this patch.

Also, do you know why the second invocation of Jump threading is taking so long?

I assume by 'taking so long' you mean 'taking so long compared to the first version of this patch', because even this "slow" version is ~50% faster than what we have in trunk. It is most likely caused by the fact that we recompute IDF for every variable we rewrite, while in the first version we computed it once for the union of the subgraphs. Actually, both approaches have "good" and "bad" cases: I can imagine a set of subgraphs for which computing individual IDFs N times would be faster than computing a single IDF for their union, but this particular case seems to be the opposite.

Thanks,
Michael

I'm ... confused.

I have no other comments on this, which looks fine. I'd add some unit tests, as we have for SSAUpdater, and I think we should be good to go.
@dberlin what do you think?

I'm fine with this, we can always improve it more later.

davide accepted this revision. Apr 9 2018, 12:35 PM

This revision is now accepted and ready to land. Apr 9 2018, 12:35 PM

Closed by commit rL329643: [PR16756] Add SSAUpdaterBulk. (authored by mzolotukhin). Apr 9 2018, 4:40 PM

This revision was automatically updated to reflect the committed changes.

Thanks! I committed the patch in two parts: r329643 (Add SSAUpdaterBulk) and r329644 (Use SSAUpdaterBulk in JumpThreading).

Michael

The JumpThreading pass fails with an assertion here:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-android/builds/9491,
please check it out

brzycki mentioned this in D48111: [JumpThreading] Don't try to rewrite a use if it's already valid. Jun 21 2018, 8:16 AM

I've bisected a recent regression we discovered in Rust to this commit and cc'd some of y'all on the bug there, but if anyone else would like to help take a look at the performance regression there that'd be much appreciated!

a.elovikov added a subscriber: a.elovikov. Jul 10 2018, 12:29 AM

Revision Contents

Path                                                       Size
llvm/trunk/include/llvm/Transforms/Utils/SSAUpdaterBulk.h  91 lines
llvm/trunk/lib/Transforms/Utils/CMakeLists.txt             1 line
llvm/trunk/lib/Transforms/Utils/SSAUpdaterBulk.cpp         173 lines
llvm/trunk/unittests/Transforms/Utils/CMakeLists.txt       1 line
llvm/trunk/unittests/Transforms/Utils/SSAUpdaterBulk.cpp   195 lines

Diff 141766

llvm/trunk/include/llvm/Transforms/Utils/SSAUpdaterBulk.h

				//===- SSAUpdaterBulk.h - Unstructured SSA Update Tool ----------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file declares the SSAUpdaterBulk class.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_UTILS_SSAUPDATERBULK_H
				#define LLVM_TRANSFORMS_UTILS_SSAUPDATERBULK_H

				#include "llvm/ADT/DenseMap.h"
				#include "llvm/ADT/SmallPtrSet.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/IR/PredIteratorCache.h"

				namespace llvm {

				class BasicBlock;
				class PHINode;
				template <typename T> class SmallVectorImpl;
				class Type;
				class Use;
				class Value;
				class DominatorTree;

				/// Helper class for SSA formation on a set of values defined in multiple
				/// blocks.
				///
				/// This is used when code duplication or another unstructured transformation
				/// wants to rewrite a set of uses of one value with uses of a set of values.
				/// The update is done only when RewriteAllUses is called, all other methods are
				/// used for book-keeping. That helps to share some common computations between
				/// updates of different uses (which is not the case when traditional SSAUpdater
				/// is used).
				class SSAUpdaterBulk {
				struct RewriteInfo {
				DenseMap<BasicBlock *, Value *> Defines;
				SmallPtrSet<Use *, 4> Uses;
				StringRef Name;
				Type *Ty;
				RewriteInfo(){};
				RewriteInfo(StringRef &N, Type *T) : Name(N), Ty(T){};
				};
				DenseMap<unsigned, RewriteInfo> Rewrites;

				PredIteratorCache PredCache;

				Value *computeValueAt(BasicBlock *BB, RewriteInfo &R, DominatorTree *DT);

				public:
				explicit SSAUpdaterBulk(){};
				SSAUpdaterBulk(const SSAUpdaterBulk &) = delete;
				SSAUpdaterBulk &operator=(const SSAUpdaterBulk &) = delete;
				~SSAUpdaterBulk(){};

				/// Add a new variable to the SSA rewriter. This needs to be called before
				/// AddAvailableValue or AddUse calls.
				void AddVariable(unsigned Var, StringRef Name, Type *Ty);

				/// Indicate that a rewritten value is available in the specified block with
				/// the specified value.
				void AddAvailableValue(unsigned Var, BasicBlock *BB, Value *V);

				/// Record a use of the symbolic value. This use will be updated with a
				/// rewritten value when RewriteAllUses is called.
				void AddUse(unsigned Var, Use *U);

				/// Return true if the SSAUpdater already has a value for the specified
				/// variable in the specified block.
				bool HasValueForBlock(unsigned Var, BasicBlock *BB);

				/// Perform all the necessary updates, including new PHI-nodes insertion and
				/// the requested uses update.
				///
				/// The function requires dominator tree DT, which is used for computing
				/// locations for new phi-nodes insertions. If a nonnull pointer to a vector
				/// InsertedPHIs is passed, all the new phi-nodes will be added to this
				/// vector.
				void RewriteAllUses(DominatorTree *DT,
				SmallVectorImpl<PHINode *> *InsertedPHIs = nullptr);
				};

				} // end namespace llvm

				#endif // LLVM_TRANSFORMS_UTILS_SSAUPDATERBULK_H

llvm/trunk/lib/Transforms/Utils/CMakeLists.txt

[38 unchanged lines not shown, inside add_llvm_library for LLVMTransformUtils]
  MetaRenamer.cpp
  ModuleUtils.cpp
  NameAnonGlobals.cpp
  OrderedInstructions.cpp
  PredicateInfo.cpp
  PromoteMemoryToRegister.cpp
  StripGCRelocates.cpp
  SSAUpdater.cpp
+ SSAUpdaterBulk.cpp
  SanitizerStats.cpp
  SimplifyCFG.cpp
  SimplifyIndVar.cpp
  SimplifyInstructions.cpp
  SimplifyLibCalls.cpp
  SplitModule.cpp
  StripNonLineTableDebugInfo.cpp
  SymbolRewriter.cpp
[12 unchanged lines not shown]

llvm/trunk/lib/Transforms/Utils/SSAUpdaterBulk.cpp

				//===- SSAUpdaterBulk.cpp - Unstructured SSA Update Tool ------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements the SSAUpdaterBulk class.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Transforms/Utils/SSAUpdaterBulk.h"
				#include "llvm/Analysis/IteratedDominanceFrontier.h"
				#include "llvm/IR/BasicBlock.h"
				#include "llvm/IR/Dominators.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/IR/Use.h"
				#include "llvm/IR/Value.h"

				using namespace llvm;

				#define DEBUG_TYPE "ssaupdaterbulk"

				/// Add a new variable to the SSA rewriter. This needs to be called before
				/// AddAvailableValue or AddUse calls.
				void SSAUpdaterBulk::AddVariable(unsigned Var, StringRef Name, Type *Ty) {
				assert(Rewrites.find(Var) == Rewrites.end() && "Variable added twice!");
				RewriteInfo RI(Name, Ty);
				Rewrites[Var] = RI;
				}

				/// Indicate that a rewritten value is available in the specified block with the
				/// specified value.
				void SSAUpdaterBulk::AddAvailableValue(unsigned Var, BasicBlock *BB, Value *V) {
				assert(Rewrites.find(Var) != Rewrites.end() && "Should add variable first!");
				Rewrites[Var].Defines[BB] = V;
				}

				/// Record a use of the symbolic value. This use will be updated with a
				/// rewritten value when RewriteAllUses is called.
				void SSAUpdaterBulk::AddUse(unsigned Var, Use *U) {
				assert(Rewrites.find(Var) != Rewrites.end() && "Should add variable first!");
				Rewrites[Var].Uses.insert(U);
				}

				/// Return true if the SSAUpdater already has a value for the specified variable
				/// in the specified block.
				bool SSAUpdaterBulk::HasValueForBlock(unsigned Var, BasicBlock *BB) {
				return Rewrites.count(Var) ? Rewrites[Var].Defines.count(BB) : false;
				}

				// Compute value at the given block BB. We either should already know it, or we
				// should be able to recursively reach it going up dominator tree.
				Value *SSAUpdaterBulk::computeValueAt(BasicBlock *BB, RewriteInfo &R,
				DominatorTree *DT) {
				if (!R.Defines.count(BB)) {
				if (PredCache.get(BB).size()) {
				BasicBlock *IDom = DT->getNode(BB)->getIDom()->getBlock();
				R.Defines[BB] = computeValueAt(IDom, R, DT);
				} else
				R.Defines[BB] = UndefValue::get(R.Ty);
				}
				return R.Defines[BB];
				}

				/// Given sets of UsingBlocks and DefBlocks, compute the set of LiveInBlocks.
				/// This is basically a subgraph limited by DefBlocks and UsingBlocks.
				static void
				ComputeLiveInBlocks(const SmallPtrSetImpl<BasicBlock *> &UsingBlocks,
				const SmallPtrSetImpl<BasicBlock *> &DefBlocks,
				SmallPtrSetImpl<BasicBlock *> &LiveInBlocks) {
				// To determine liveness, we must iterate through the predecessors of blocks
				// where the def is live. Blocks are added to the worklist if we need to
				// check their predecessors. Start with all the using blocks.
				SmallVector<BasicBlock *, 64> LiveInBlockWorklist(UsingBlocks.begin(),
				UsingBlocks.end());

				// Now that we have a set of blocks where the phi is live-in, recursively add
				// their predecessors until we find the full region the value is live.
				while (!LiveInBlockWorklist.empty()) {
				BasicBlock *BB = LiveInBlockWorklist.pop_back_val();

				// The block really is live in here, insert it into the set. If already in
				// the set, then it has already been processed.
				if (!LiveInBlocks.insert(BB).second)
				continue;

				// Since the value is live into BB, it is either defined in a predecessor or
				// live into it too. Add the preds to the worklist unless they are a
				// defining block.
				for (BasicBlock *P : predecessors(BB)) {
				// The value is not live into a predecessor if it defines the value.
				if (DefBlocks.count(P))
				continue;

				// Otherwise it is, add to the worklist.
				LiveInBlockWorklist.push_back(P);
				}
				}
				}

				/// Helper function for finding a block which should have a value for the given
				/// user. For PHI-nodes this block is the corresponding predecessor, for other
				/// instructions it's their parent block.
				static BasicBlock *getUserBB(Use *U) {
				auto *User = cast<Instruction>(U->getUser());

				if (auto *UserPN = dyn_cast<PHINode>(User))
				return UserPN->getIncomingBlock(*U);
				else
				return User->getParent();
				}

				/// Perform all the necessary updates, including new PHI-nodes insertion and the
				/// requested uses update.
				void SSAUpdaterBulk::RewriteAllUses(DominatorTree *DT,
				SmallVectorImpl<PHINode *> *InsertedPHIs) {
				for (auto &P : Rewrites) {
				// Compute locations for new phi-nodes.
				// For that we need to initialize DefBlocks from definitions in R.Defines,
				// UsingBlocks from uses in R.Uses, then compute LiveInBlocks, and then use
				// this set for computing iterated dominance frontier (IDF).
				// The IDF blocks are the blocks where we need to insert new phi-nodes.
				ForwardIDFCalculator IDF(*DT);
				RewriteInfo &R = P.second;
				SmallPtrSet<BasicBlock *, 2> DefBlocks;
				for (auto Def : R.Defines)
				DefBlocks.insert(Def.first);
				IDF.setDefiningBlocks(DefBlocks);

				SmallPtrSet<BasicBlock *, 2> UsingBlocks;
				for (auto U : R.Uses)
				UsingBlocks.insert(getUserBB(U));

				SmallVector<BasicBlock *, 32> IDFBlocks;
				SmallPtrSet<BasicBlock *, 32> LiveInBlocks;
				ComputeLiveInBlocks(UsingBlocks, DefBlocks, LiveInBlocks);
				IDF.resetLiveInBlocks();
				IDF.setLiveInBlocks(LiveInBlocks);
				IDF.calculate(IDFBlocks);

				// We've computed IDF, now insert new phi-nodes there.
				SmallVector<PHINode *, 4> InsertedPHIsForVar;
				for (auto FrontierBB : IDFBlocks) {
				IRBuilder<> B(FrontierBB, FrontierBB->begin());
				PHINode *PN = B.CreatePHI(R.Ty, 0, R.Name);
				R.Defines[FrontierBB] = PN;
				InsertedPHIsForVar.push_back(PN);
				if (InsertedPHIs)
				InsertedPHIs->push_back(PN);
				}

				// Fill in arguments of the inserted PHIs.
				for (auto PN : InsertedPHIsForVar) {
				BasicBlock *PBB = PN->getParent();
				for (BasicBlock *Pred : PredCache.get(PBB))
				PN->addIncoming(computeValueAt(Pred, R, DT), Pred);
				}

				// Rewrite actual uses with the inserted definitions.
				for (auto U : R.Uses) {
				Value *V = computeValueAt(getUserBB(U), R, DT);
				Value *OldVal = U->get();
				// Notify that users of the existing value that it is being replaced.
				if (OldVal != V && OldVal->hasValueHandle())
				ValueHandleBase::ValueIsRAUWd(OldVal, V);
				U->set(V);
				}
				}
				}

llvm/trunk/unittests/Transforms/Utils/CMakeLists.txt

[9 unchanged lines not shown, inside add_llvm_unittest for UtilsTests]
  ASanStackFrameLayoutTest.cpp
  BasicBlockUtils.cpp
  Cloning.cpp
  CodeExtractor.cpp
  FunctionComparator.cpp
  IntegerDivision.cpp
  Local.cpp
  OrderedInstructions.cpp
+ SSAUpdaterBulk.cpp
  ValueMapperTest.cpp
  )

llvm/trunk/unittests/Transforms/Utils/SSAUpdaterBulk.cpp

				//===- SSAUpdaterBulk.cpp - Unit tests for SSAUpdaterBulk -----------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Transforms/Utils/SSAUpdaterBulk.h"
				#include "llvm/AsmParser/Parser.h"
				#include "llvm/IR/BasicBlock.h"
				#include "llvm/IR/Dominators.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/IR/LLVMContext.h"
				#include "llvm/IR/Module.h"
				#include "gtest/gtest.h"

				using namespace llvm;

				TEST(SSAUpdaterBulk, SimpleMerge) {
				SSAUpdaterBulk Updater;
				LLVMContext C;
				Module M("SSAUpdaterTest", C);
				IRBuilder<> B(C);
				Type *I32Ty = B.getInt32Ty();
				auto *F = Function::Create(FunctionType::get(B.getVoidTy(), {I32Ty}, false),
				GlobalValue::ExternalLinkage, "F", &M);

				// Generate a simple program:
				// if:
				// br i1 true, label %true, label %false
				// true:
				// %1 = add i32 %0, 1
				// %2 = sub i32 %0, 2
				// br label %merge
				// false:
				// %3 = add i32 %0, 3
				// %4 = sub i32 %0, 4
				// br label %merge
				// merge:
				// %5 = add i32 %1, 5
				// %6 = add i32 %3, 6
				// %7 = add i32 %2, %4
				// %8 = sub i32 %2, %4
				Argument *FirstArg = &*(F->arg_begin());
				BasicBlock *IfBB = BasicBlock::Create(C, "if", F);
				BasicBlock *TrueBB = BasicBlock::Create(C, "true", F);
				BasicBlock *FalseBB = BasicBlock::Create(C, "false", F);
				BasicBlock *MergeBB = BasicBlock::Create(C, "merge", F);

				B.SetInsertPoint(IfBB);
				B.CreateCondBr(B.getTrue(), TrueBB, FalseBB);

				B.SetInsertPoint(TrueBB);
				Value *AddOp1 = B.CreateAdd(FirstArg, ConstantInt::get(I32Ty, 1));
				Value *SubOp1 = B.CreateSub(FirstArg, ConstantInt::get(I32Ty, 2));
				B.CreateBr(MergeBB);

				B.SetInsertPoint(FalseBB);
				Value *AddOp2 = B.CreateAdd(FirstArg, ConstantInt::get(I32Ty, 3));
				Value *SubOp2 = B.CreateSub(FirstArg, ConstantInt::get(I32Ty, 4));
				B.CreateBr(MergeBB);

				B.SetInsertPoint(MergeBB, MergeBB->begin());
				auto *I1 = cast<Instruction>(B.CreateAdd(AddOp1, ConstantInt::get(I32Ty, 5)));
				auto *I2 = cast<Instruction>(B.CreateAdd(AddOp2, ConstantInt::get(I32Ty, 6)));
				auto *I3 = cast<Instruction>(B.CreateAdd(SubOp1, SubOp2));
				auto *I4 = cast<Instruction>(B.CreateSub(SubOp1, SubOp2));

				// Now rewrite uses in instructions %5, %6, %7. They need to use a phi, which
				// SSAUpdater should insert into %merge.
				// Intentionally don't touch %8 to see that SSAUpdater only changes
				// instructions that were explicitly specified.
				Updater.AddVariable(0, "a", I32Ty);
				Updater.AddAvailableValue(0, TrueBB, AddOp1);
				Updater.AddAvailableValue(0, FalseBB, AddOp2);
				Updater.AddUse(0, &I1->getOperandUse(0));
				Updater.AddUse(0, &I2->getOperandUse(0));

				Updater.AddVariable(1, "b", I32Ty);
				Updater.AddAvailableValue(1, TrueBB, SubOp1);
				Updater.AddAvailableValue(1, FalseBB, SubOp2);
				Updater.AddUse(1, &I3->getOperandUse(0));
				Updater.AddUse(1, &I3->getOperandUse(1));

				DominatorTree DT(*F);
				Updater.RewriteAllUses(&DT);

				// Check how %5 and %6 were rewritten.
				PHINode *UpdatePhiA = dyn_cast_or_null<PHINode>(I1->getOperand(0));
				EXPECT_NE(UpdatePhiA, nullptr);
				EXPECT_EQ(UpdatePhiA->getIncomingValueForBlock(TrueBB), AddOp1);
				EXPECT_EQ(UpdatePhiA->getIncomingValueForBlock(FalseBB), AddOp2);
				EXPECT_EQ(UpdatePhiA, dyn_cast_or_null<PHINode>(I2->getOperand(0)));

				// Check how %7 was rewritten.
				PHINode *UpdatePhiB = dyn_cast_or_null<PHINode>(I3->getOperand(0));
				EXPECT_EQ(UpdatePhiB->getIncomingValueForBlock(TrueBB), SubOp1);
				EXPECT_EQ(UpdatePhiB->getIncomingValueForBlock(FalseBB), SubOp2);
				EXPECT_EQ(UpdatePhiB, dyn_cast_or_null<PHINode>(I3->getOperand(1)));

				// Check that %8 was kept untouched.
				EXPECT_EQ(I4->getOperand(0), SubOp1);
				EXPECT_EQ(I4->getOperand(1), SubOp2);
				}

				TEST(SSAUpdaterBulk, Irreducible) {
				SSAUpdaterBulk Updater;
				LLVMContext C;
				Module M("SSAUpdaterTest", C);
				IRBuilder<> B(C);
				Type *I32Ty = B.getInt32Ty();
				auto *F = Function::Create(FunctionType::get(B.getVoidTy(), {I32Ty}, false),
				GlobalValue::ExternalLinkage, "F", &M);

				// Generate a small program with a multi-entry loop:
				// if:
				// %1 = add i32 %0, 1
				// br i1 true, label %loopmain, label %loopstart
				//
				// loopstart:
				// %2 = add i32 %0, 2
				// br label %loopmain
				//
				// loopmain:
				// %3 = add i32 %1, 3
				// br i1 true, label %loopstart, label %afterloop
				//
				// afterloop:
				// %4 = add i32 %2, 4
				// ret i32 %0
				Argument *FirstArg = &*F->arg_begin();
				BasicBlock *IfBB = BasicBlock::Create(C, "if", F);
				BasicBlock *LoopStartBB = BasicBlock::Create(C, "loopstart", F);
				BasicBlock *LoopMainBB = BasicBlock::Create(C, "loopmain", F);
				BasicBlock *AfterLoopBB = BasicBlock::Create(C, "afterloop", F);

				B.SetInsertPoint(IfBB);
				Value *AddOp1 = B.CreateAdd(FirstArg, ConstantInt::get(I32Ty, 1));
				B.CreateCondBr(B.getTrue(), LoopMainBB, LoopStartBB);

				B.SetInsertPoint(LoopStartBB);
				Value *AddOp2 = B.CreateAdd(FirstArg, ConstantInt::get(I32Ty, 2));
				B.CreateBr(LoopMainBB);

				B.SetInsertPoint(LoopMainBB);
				auto *I1 = cast<Instruction>(B.CreateAdd(AddOp1, ConstantInt::get(I32Ty, 3)));
				B.CreateCondBr(B.getTrue(), LoopStartBB, AfterLoopBB);

				B.SetInsertPoint(AfterLoopBB);
				auto *I2 = cast<Instruction>(B.CreateAdd(AddOp2, ConstantInt::get(I32Ty, 4)));
				ReturnInst *Return = B.CreateRet(FirstArg);

				// Now rewrite uses in instructions %3, %4, and 'ret i32 %0'. Only %4 needs a
				// new phi, others should be able to work with existing values.
				// The phi for %4 should be inserted into LoopMainBB and should look like
				// this:
				// %b = phi i32 [ %2, %loopstart ], [ undef, %if ]
				// No other rewrites should be made.

				// Add use in %3.
				Updater.AddVariable(0, "c", I32Ty);
				Updater.AddAvailableValue(0, IfBB, AddOp1);
				Updater.AddUse(0, &I1->getOperandUse(0));

				// Add use in %4.
				Updater.AddVariable(1, "b", I32Ty);
				Updater.AddAvailableValue(1, LoopStartBB, AddOp2);
				Updater.AddUse(1, &I2->getOperandUse(0));

				// Add use in the return instruction.
				Updater.AddVariable(2, "a", I32Ty);
				Updater.AddAvailableValue(2, &F->getEntryBlock(), FirstArg);
				Updater.AddUse(2, &Return->getOperandUse(0));

				// Save all inserted phis into a vector.
				SmallVector<PHINode *, 8> Inserted;
				DominatorTree DT(*F);
				Updater.RewriteAllUses(&DT, &Inserted);

				// Only one phi should have been inserted.
				EXPECT_EQ(Inserted.size(), 1u);

				// I1 and Return should use the same values as they used before.
				EXPECT_EQ(I1->getOperand(0), AddOp1);
				EXPECT_EQ(Return->getOperand(0), FirstArg);

				// I2 should use the new phi.
				PHINode *UpdatePhi = dyn_cast_or_null<PHINode>(I2->getOperand(0));
				EXPECT_NE(UpdatePhi, nullptr);
				EXPECT_EQ(UpdatePhi->getIncomingValueForBlock(LoopStartBB), AddOp2);
				EXPECT_EQ(UpdatePhi->getIncomingValueForBlock(IfBB), UndefValue::get(I32Ty));
				}
