This is an archive of the discontinued LLVM Phabricator instance.

Differential D13697

[SimplifyCFG] Merge conditional stores
Abandoned, Public

Authored by jmolloy on Oct 13 2015, 7:30 AM.

Details

Reviewers

majnemer
sanjoy
hfinkel

Summary

We can often end up with conditional stores that cannot be speculated. They can come from fairly simple, idiomatic code:

if (c & flag1)
  *a = x;
if (c & flag2)
  *a = y;
...

There is no dominating or post-dominating store to a, so it is not legal to move the store unconditionally to the end of the sequence and cache the intermediate result in a register, as we would like to.

It is, however, legal to merge the stores together and do the store once:

tmp = undef;
if (c & flag1)
  tmp = x;
if (c & flag2)
  tmp = y;
if (c & flag1 || c & flag2)
  *a = tmp;

The real power in this optimization is that it allows arbitrary-length ladders such as these to be completely and trivially if-converted. The typical code I'd expect this to trigger on often uses binary-AND with constants as the condition (as in the above example), which means the ending condition can simply be folded into a single binary-AND too: 'if (c & (flag1|flag2))'. Since the conditions in such ladders are generally built from bitwise operators, the ladder can often be optimized further too.
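To make the equivalence concrete, here is a minimal standalone C++ sketch (not part of the patch) that writes the ladder both ways and compares them over every flag combination. The flag values and the names `storeLadder`, `storeMerged` and `formsAgree` are assumptions chosen for illustration, and `tmp` is zero-initialized only because plain C++ has no `undef`.

```cpp
#include <cassert>

// Hypothetical flag values, chosen only for illustration.
constexpr unsigned kFlag1 = 0x1;
constexpr unsigned kFlag2 = 0x2;

// Original ladder: two conditional stores to *a.
inline void storeLadder(unsigned c, int *a, int x, int y) {
  if (c & kFlag1)
    *a = x;
  if (c & kFlag2)
    *a = y;
}

// Merged form: thread the value through a temporary and store once.
inline void storeMerged(unsigned c, int *a, int x, int y) {
  int tmp = 0; // stands in for `undef`; only stored if some flag is set
  if (c & kFlag1)
    tmp = x;
  if (c & kFlag2)
    tmp = y;
  if (c & (kFlag1 | kFlag2)) // same as (c & kFlag1) || (c & kFlag2)
    *a = tmp;
}

// Returns true when both forms leave *a in the same state for every
// combination of the two flags.
inline bool formsAgree(int initial, int x, int y) {
  for (unsigned c = 0; c < 4; ++c) {
    int lhs = initial;
    int rhs = initial;
    storeLadder(c, &lhs, x, y);
    storeMerged(c, &rhs, x, y);
    if (lhs != rhs)
      return false;
  }
  return true;
}
```

Calling `formsAgree(7, 10, 20)` exercises all four flag combinations, including the one where neither flag is set and `*a` must be left untouched.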

This optimization involves potentially increasing register pressure. Even in the simplest case, the lifetime of the first predicate is extended. This can be elided in some cases such as using binary-AND on constants, but not in the general case. Threading 'tmp' through all branches can also increase register pressure.

The optimization as implemented in this patch is enabled by default but kept in a very conservative mode. It will only optimize if it thinks the resultant code should be if-convertible, and additionally if it can thread 'tmp' through at least one existing PHI, so in the worst case it will only ever create one more PHI and extend the lifetime of a predicate.

This doesn't trigger much in LNT, unfortunately, but it does trigger in a big way in a third party test suite.

LNT diff: one test regresses by 2%, another improves by 3%.

Diff Detail

Repository: rL LLVM

Event Timeline

jmolloy retitled this revision to [SimplifyCFG] Merge conditional stores. Oct 13 2015, 7:30 AM

jmolloy updated this object.

jmolloy added reviewers: sanjoy, majnemer.

jmolloy updated this revision to Diff 37250. Oct 13 2015, 7:30 AM

jmolloy set the repository for this revision to rL LLVM.

jmolloy added a subscriber: llvm-commits.

jmolloy added a subscriber: mcrosier.

reames added a subscriber: reames. Oct 14 2015, 2:37 PM

reames added inline comments.

include/llvm/Transforms/Utils/Local.h
141 ↗	(On Diff #37250)	This seems potentially dangerous. Since SimplifyCFG doesn't preserve AA, are there any situations where a transform could have invalidated cached information, and then your transform runs?
lib/Transforms/Utils/SimplifyCFG.cpp
2407	Is there a simple form of this which doesn't require anything other than trivial AA? If so, I'd start with that, get the transform working and in, then generalize.
2436	There's a good chance that pstore isn't from the conditional block at all. You might want to check dominance. Optionally, you can speculate pstore out of the conditional block if it's safe to speculate.
2461	This long wall of code is hard to follow. Please use some well named helper functions.
2505	One thing you might consider: are there cases where we can insert a store down the other path? Doing so in general is clearly a violation of both dereferenceability and the memory model, but what about a conditional store to a dereferenceable location which is known to be thread local? I ran across a similar case in LICM's store promotion which I'm thinking about implementing since it would really help one of my benchmarks. Just throwing out the idea in case you find it helpful.
2559	Spacing. Clang-format?
2562	This chain of checks is hard to read. Could it be restructured along with the above code to make it simpler?
2571	Some of these are legality, some are profitability, some are implementation limits. Please separate and comment each class.

Forgot to say: very interesting transform. I'm glad to see you're proposing this. I'm a bit concerned about the profitability aspects though. You might want to further restrict the conditions to be things which we know can be combined/folded cheaply.

Hi Philip,

Thanks very much for the review. A new version is attached.

Cheers,

James

jmolloy marked 4 inline comments as done. Oct 21 2015, 6:36 AM

jmolloy added inline comments.

lib/Transforms/Utils/SimplifyCFG.cpp
2440	Actually, PStore must come from a conditional block (see line 2410).
2509	This is a good possible extension to this optimization! Probably best to wait until it has gone in in its current, simpler form though.

I haven't grokked the whole patch yet, but some comments inline based off of what I've understood so far. Also, please upload the diff with full context.

lib/Transforms/Utils/SimplifyCFG.cpp
2385	These are suggestions, I'm not 100% sure that these will actually make the code more readable: (1) extract out a `BasicBlock *OtherIncomingBlock` variable; (2) put an assert in the loop verifying that there are only two incoming blocks (or remove the loop and have an if/else in its place). IOW, make it obvious that we're not dealing with blocks with an arbitrary number of preds.
2416	I think you need `QStore->getPointerOperand()->getType()` here. All `StoreInst` s have the type `void`.
2450	What about things like `AtomicRMWInst`, `AtomicCmpXchgInst` and `FenceInst`? I think you're better off querying `Instruction::mayReadOrWriteMemory`.
2517	However, this is a pessimization if both conditions are always false, and the resulting code does not simplify further, right?
2597	Nit: spacing should be `SmallVector<Value *, 4>`.
2598	I'm not sure this is correct -- `std::set_intersection` expects both the ranges to be sorted, and I don't think `SmallPtrSet` is guaranteed to iterate over values in sorted order. I think it is best to use `llvm::set_intersect` here.
2662	Wrap the line? Actually, I'll just assume you'll run clang-format before checkin. :)
3621	Unrelated whitespace changes?
5101	Whitespace damage?
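The point made on line 2598 is that `std::set_intersection` assumes sorted input ranges, whereas an in-place intersection that probes membership makes no ordering assumption. A minimal sketch of that second approach, using standard containers rather than LLVM's ADT, might look like this; the names `intersectInPlace` and `demoIntersect` are hypothetical and this is not the `llvm::set_intersect` implementation itself.

```cpp
#include <unordered_set>

// Hypothetical stand-in for an in-place intersection over unordered sets:
// walks S1 and erases every element that S2 does not contain. No ordering
// of either container is assumed.
template <class S1Ty, class S2Ty>
void intersectInPlace(S1Ty &S1, const S2Ty &S2) {
  for (auto I = S1.begin(); I != S1.end();) {
    if (S2.count(*I))
      ++I;
    else
      I = S1.erase(I); // erase returns the iterator to the next element
  }
}

// Small demonstration on two unordered sets.
inline std::unordered_set<int> demoIntersect() {
  std::unordered_set<int> A = {5, 1, 9, 3};
  const std::unordered_set<int> B = {3, 7, 5};
  intersectInPlace(A, B);
  return A; // contains exactly the elements common to A and B
}
```

Because membership is tested with `count`, the result does not depend on the iteration order of either container, which is the property the review comment is after.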

sanjoy requested changes to this revision. Oct 24 2015, 1:23 AM

sanjoy edited edge metadata.

This revision now requires changes to proceed. Oct 24 2015, 1:23 AM

Hi Sanjoy,

Thanks for the review! New patch attached (with full context, sorry!).

Cheers,

James

jmolloy marked 5 inline comments as done. Oct 26 2015, 8:27 AM

jmolloy added inline comments.

include/llvm/ADT/SetOperations.h
42	This was required because SmallPtrSet doesn't have ::key_type. auto is obviously a better way to do this. I'll obviously apply this as a separate commit.
lib/Transforms/Utils/SimplifyCFG.cpp
2416	Ouch! good catch, thanks!
2517	In that case, yes it is. The important part is "if the code doesn't simplify further" though - even if both conditions are always false, if we can if-convert this is very likely to be a win. The heuristics at the moment are trying to catch cases where we know we can if-convert.
2673	I've been selectively running clang-format on the bits I've touched, and missed this bit :(
5112	Yes, this was me undoing a change and blindly running clang-format. More below, where clang-format has ripped trailing whitespace away :(

sanjoy requested changes to this revision. Oct 27 2015, 1:38 AM

sanjoy edited edge metadata.

sanjoy added inline comments.

lib/Transforms/Utils/SimplifyCFG.cpp
2352	I'd just iterate through the two blocks instead of iterating over all of `Address`'s uses, unless you have reason to believe that `Address` will have only a small number of uses. Also see comment on `IsWorthwhile`.
2366	Have you considered using `SSAUpdater` here? This looks like its duplicating logic that already exists there. Sorry for not bringing this up earlier!
2417	Why do you need to check the types? Aren't they both stores to `Address`?
2424	When you say "if-converted" what transform are you talking about specifically? Is it `CodeGen/EarlyIfConversion.cpp` or `CodeGen/IfConversion.cpp` or something else?
2426	If you move this check on the block size to before the calls to `findUniqueStoreInBlocks` then you'll know in `findUniqueStoreInBlocks` that the blocks are guaranteed to be small (assuming `MergeCondStoresAggressively` is `false`) and scanning through all of the instructions in the blocks won't hurt much.
2429	Might be better to have this as a whitelist -- "all instructions that do not touch memory and the stores we already know about" or something like that.
2450	I'm not sure what the lambda adds here -- why not directly call `mayReadOrWriteMemory`?
2476	Why do you need to do this? The only uses of TB I see are the `XStore->getParent() == XTB` checks; and they should be fine with a `nullptr` `XTB`, no?
2487	This one isn't used.
2544	Minor nit: I'd use `InvertPCond` as "swapping" a predicate means something slightly different in LLVM.
2566	Why do you need to check `BB` for `nullptr`?
2582	You can shorten this using initializer lists: for (auto BB : { PTB, PFB }) { if (!BB) continue; for (auto &I : BB) if (StoreInst *SI = ... and possibly even more with initializer lists of `std::pair`.
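The initializer-list loop suggested on line 2582 can be tried outside LLVM with toy types. The sketch below is an illustration only: `ToyInst`, `ToyBlock` and `findUniqueToyStore` are hypothetical stand-ins, not LLVM classes, and an instruction counts as a store when its opcode string is "store".

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Toy stand-ins for LLVM types (hypothetical, for illustration only).
struct ToyInst {
  std::string Opcode;
};
using ToyBlock = std::vector<ToyInst>;

// Returns the only "store" found across BB1 and BB2, or nullptr if there
// is none or more than one. Either block pointer may be null.
inline const ToyInst *findUniqueToyStore(const ToyBlock *BB1,
                                         const ToyBlock *BB2) {
  const ToyInst *S = nullptr;
  for (const ToyBlock *BB : {BB1, BB2}) { // range-for over a braced list
    if (!BB)
      continue; // early continue keeps the nesting shallow
    for (const ToyInst &I : *BB) {
      if (I.Opcode != "store")
        continue;
      if (S)
        return nullptr; // multiple stores seen
      S = &I;
    }
  }
  return S;
}
```

Iterating over `{BB1, BB2}` avoids duplicating the inner loop for each block, and the null check lets callers pass a missing block without special-casing it.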

This revision now requires changes to proceed. Oct 27 2015, 1:38 AM

Hi Sanjoy,

Thanks again for this fantastic review! I agree with most of the comments; patch updated.

Cheers,

James

lib/Transforms/Utils/SimplifyCFG.cpp
2366	I've just tried using SSAUpdater. Actually it really doesn't help much - the search for an appropriate already-existing PHI here is more than SSAUpdater can do itself, so replacing this function with SSAUpdater pessimizes some code (we end up with another PHI, which becomes another select).
2424	I'm talking about the if-conversion done by SimplifyCFG itself. SimplifyCFG refers to this as PHI node folding (see -phi-node-folding-threshold).
2429	This kind of already is a whitelist, is it not? We're saying that if there's any instruction that isn't one of those types, it's not worthwhile. Those are explicitly the instructions that we believe can be if-converted. StoreInsts don't matter because we know (from findUniqueStoreInBlocks) that there can only be zero or one store.
2566	I don't. Removed.
2582	Unfortunately type deduction wasn't clever enough to work out an initializer list of std::pair (and it turns out the std::pair elements end up being const too), so I've just done the simpler transform.
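The remark about `std::pair` on line 2582 can be reproduced in isolation. A loop written as `for (auto P : {{a, b}, {c, d}})` does not compile, because a nested braced list gives `auto` nothing to deduce from; spelling out the `std::initializer_list` type works, and its elements are const. The function name `sumOfProducts` and the use of `int` pairs below are assumptions for illustration, not code from the patch.

```cpp
#include <initializer_list>
#include <type_traits>
#include <utility>

// Hypothetical helper: sums first * second over a fixed list of pairs.
inline int sumOfProducts(int a, int b, int c, int d) {
  int total = 0;
  // The element type must be named explicitly; the nested braces alone
  // would leave `auto` with no type to deduce.
  for (auto &P :
       std::initializer_list<std::pair<int, int>>{{a, b}, {c, d}}) {
    // Elements of an initializer_list are const, so P binds as a
    // reference to a const pair.
    static_assert(
        std::is_const<
            typename std::remove_reference<decltype(P)>::type>::value,
        "initializer_list elements are const");
    total += P.first * P.second;
  }
  return total;
}
```

The constness is why structured updates through such a loop are not possible, and it is consistent with the author settling for the simpler list of plain block pointers.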

jmolloy updated this revision to Diff 38534. Oct 27 2015, 6:18 AM

jmolloy edited edge metadata.

jmolloy removed rL LLVM as the repository for this revision.

jmolloy marked an inline comment as done.

Hi Sanjoy,

Do you have time to do another review round on this patch?

Cheers,

James

I think this is fairly close to being ready to check in; mostly minor stuff inline.

While I'm comfortable giving a final LGTM once these comments are addressed, it would be preferable if you could recruit someone more familiar with SimplifyCFG than me to take a final look.

lib/Transforms/Utils/SimplifyCFG.cpp
2353	I'd use an early `continue` here, will reduce the nesting a bit.
2356	This check isn't necessary any more -- `SI` will always be either in `BB1` or `BB2`.
2383	This will insert a redundant PHI if the first PHI in `Succ` only has one correct incoming value, but is followed by a PHI that has both `V` and `AlternateV` correctly as incoming values. How about extracting out a lambda that checks if a phi node does what you want (both for `V` and `AlternateV`) and use that in this loop? This is minor though, and if you don't prefer doing this then I'm fine with it.
2414	Fix the wrapping here.
2558	Nit: "dominating"

This revision now requires changes to proceed. Nov 2 2015, 7:04 PM

Hi Sanjoy,

Thanks again for the review! I'll try and get David or Hal to give it a once-over before I commit.

Cheers,

James

jmolloy updated this revision to Diff 39048. Nov 3 2015, 5:24 AM

jmolloy edited edge metadata.

jmolloy set the repository for this revision to rL LLVM.

hfinkel added a subscriber: hfinkel. Nov 3 2015, 7:54 AM

hfinkel added inline comments.

lib/Transforms/Utils/SimplifyCFG.cpp
2349	This function signature and comment are out of date (Address is not used here).
2368	Please comment on what AlternativeV means (when it is not null).
2388	This seems somewhat dangerous as an assertion, unless you specifically check it as a precondition somewhere. Otherwise, it can be violated in non-obvious ways (the block could have its address taken, for example).
2423	You also need to skip debug intrinsics. Bitcasts, but only on pointer types, are probably also a good idea.
2499	You're not preserving here any of the (aliasing) metadata that might have been present on the stores.

Addressed Hal's comments.

LGTM.

Thanks Hal, committed in r252051.

junbuml added a subscriber: junbuml. Nov 4 2015, 11:37 AM

Revision Contents

Path	Size
include/llvm/ADT/SetOperations.h	2 lines
lib/Transforms/Utils/SimplifyCFG.cpp	315 lines
test/Transforms/SimplifyCFG/merge-cond-stores.ll	241 lines

Diff 39194

include/llvm/ADT/SetOperations.h

	[33 lines not shown]
	/// set_intersect(A, B) - Compute A := A ^ B			/// set_intersect(A, B) - Compute A := A ^ B
	/// Identical to set_intersection, except that it works on set<>'s and			/// Identical to set_intersection, except that it works on set<>'s and
	/// is nicer to use. Functionally, this iterates through S1, removing			/// is nicer to use. Functionally, this iterates through S1, removing
	/// elements that are not contained in S2.			/// elements that are not contained in S2.
	///			///
	template <class S1Ty, class S2Ty>			template <class S1Ty, class S2Ty>
	void set_intersect(S1Ty &S1, const S2Ty &S2) {			void set_intersect(S1Ty &S1, const S2Ty &S2) {
	for (typename S1Ty::iterator I = S1.begin(); I != S1.end();) {			for (typename S1Ty::iterator I = S1.begin(); I != S1.end();) {
	const typename S1Ty::key_type &E = *I;			const auto &E = *I;
	++I;			++I;
	if (!S2.count(E)) S1.erase(E); // Erase element if not in S2			if (!S2.count(E)) S1.erase(E); // Erase element if not in S2
	}			}
	}			}

	/// set_difference(A, B) - Return A - B			/// set_difference(A, B) - Return A - B
	///			///
	template <class S1Ty, class S2Ty>			template <class S1Ty, class S2Ty>
	[21 lines not shown]

lib/Transforms/Utils/SimplifyCFG.cpp

//===- SimplifyCFG.cpp - Code to perform CFG simplification ---------------===//		//===- SimplifyCFG.cpp - Code to perform CFG simplification ---------------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// Peephole optimize the CFG.		// Peephole optimize the CFG.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
		#include "llvm/ADT/SetOperations.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/ConstantFolding.h"		#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
[43 lines not shown]
static cl::opt<bool>		static cl::opt<bool>
SinkCommon("simplifycfg-sink-common", cl::Hidden, cl::init(true),		SinkCommon("simplifycfg-sink-common", cl::Hidden, cl::init(true),
cl::desc("Sink common instructions down to the end block"));		cl::desc("Sink common instructions down to the end block"));

static cl::opt<bool> HoistCondStores(		static cl::opt<bool> HoistCondStores(
"simplifycfg-hoist-cond-stores", cl::Hidden, cl::init(true),		"simplifycfg-hoist-cond-stores", cl::Hidden, cl::init(true),
cl::desc("Hoist conditional stores if an unconditional store precedes"));		cl::desc("Hoist conditional stores if an unconditional store precedes"));

		static cl::opt<bool> MergeCondStores(
		"simplifycfg-merge-cond-stores", cl::Hidden, cl::init(true),
		cl::desc("Hoist conditional stores even if an unconditional store does not "
		"precede - hoist multiple conditional stores into a single "
		"predicated store"));

		static cl::opt<bool> MergeCondStoresAggressively(
		"simplifycfg-merge-cond-stores-aggressively", cl::Hidden, cl::init(false),
		cl::desc("When merging conditional stores, do so even if the resultant "
		"basic blocks are unlikely to be if-converted as a result"));

STATISTIC(NumBitMaps, "Number of switch instructions turned into bitmaps");		STATISTIC(NumBitMaps, "Number of switch instructions turned into bitmaps");
STATISTIC(NumLinearMaps, "Number of switch instructions turned into linear mapping");		STATISTIC(NumLinearMaps, "Number of switch instructions turned into linear mapping");
STATISTIC(NumLookupTables, "Number of switch instructions turned into lookup tables");		STATISTIC(NumLookupTables, "Number of switch instructions turned into lookup tables");
STATISTIC(NumLookupTablesHoles, "Number of switch instructions turned into lookup tables (holes checked)");		STATISTIC(NumLookupTablesHoles, "Number of switch instructions turned into lookup tables (holes checked)");
STATISTIC(NumTableCmpReuses, "Number of reused switch table lookup compares");		STATISTIC(NumTableCmpReuses, "Number of reused switch table lookup compares");
STATISTIC(NumSinkCommons, "Number of common instructions sunk down to the end block");		STATISTIC(NumSinkCommons, "Number of common instructions sunk down to the end block");
STATISTIC(NumSpeculations, "Number of speculative executed instructions");		STATISTIC(NumSpeculations, "Number of speculative executed instructions");

[2,243 lines not shown]
	for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I)
if (isa<DbgInfoIntrinsic>(*I))		if (isa<DbgInfoIntrinsic>(*I))
I->clone()->insertBefore(PBI);		I->clone()->insertBefore(PBI);

return true;		return true;
}		}
return false;		return false;
}		}

		// If there is only one store in BB1 and BB2, return it, otherwise return
		// nullptr.
		static StoreInst *findUniqueStoreInBlocks(BasicBlock *BB1, BasicBlock *BB2) {
		  StoreInst *S = nullptr;
		  for (auto *BB : {BB1, BB2}) {
		    if (!BB)
		      continue;
		    for (auto &I : *BB)
		      if (auto *SI = dyn_cast<StoreInst>(&I)) {
		        if (S)
		          // Multiple stores seen.
		          return nullptr;
		        else
		          S = SI;
		      }
		  }
		  return S;
		}

		static Value *ensureValueAvailableInSuccessor(Value *V, BasicBlock *BB,
		                                              Value *AlternativeV = nullptr) {
		  // PHI is going to be a PHI node that allows the value V that is defined in
		  // BB to be referenced in BB's only successor.
		  //
		  // If AlternativeV is nullptr, the only value we care about in PHI is V. It
		  // doesn't matter to us what the other operand is (it'll never get used). We
		  // could just create a new PHI with an undef incoming value, but that could
		  // increase register pressure if EarlyCSE/InstCombine can't fold it with some
		  // other PHI. So here we directly look for some PHI in BB's successor with V
		  // as an incoming operand. If we find one, we use it, else we create a new
		  // one.
		  //
		  // If AlternativeV is not nullptr, we care about both incoming values in PHI.
		  // PHI must be exactly: phi <ty> [ %BB, %V ], [ %OtherBB, %AlternativeV]
		  // where OtherBB is the single other predecessor of BB's only successor.
		  PHINode *PHI = nullptr;
		  BasicBlock *Succ = BB->getSingleSuccessor();

		  for (auto I = Succ->begin(); isa<PHINode>(I); ++I)
		    if (cast<PHINode>(I)->getIncomingValueForBlock(BB) == V) {
		      PHI = cast<PHINode>(I);
		      if (!AlternativeV)
		        break;

		      assert(std::distance(pred_begin(Succ), pred_end(Succ)) == 2);
		      auto PredI = pred_begin(Succ);
		      BasicBlock *OtherPredBB = *PredI == BB ? *++PredI : *PredI;
		      if (PHI->getIncomingValueForBlock(OtherPredBB) == AlternativeV)
		        break;
		      PHI = nullptr;
		    }
		  if (PHI)
		    return PHI;

		  PHI = PHINode::Create(V->getType(), 2, "simplifycfg.merge", Succ->begin());
		  PHI->addIncoming(V, BB);
		  for (BasicBlock *PredBB : predecessors(Succ))
		    if (PredBB != BB)
		      PHI->addIncoming(AlternativeV ? AlternativeV : UndefValue::get(V->getType()),
		                       PredBB);
		  return PHI;
		}

		static bool mergeConditionalStoreToAddress(BasicBlock *PTB, BasicBlock *PFB,
		                                           BasicBlock *QTB, BasicBlock *QFB,
		                                           BasicBlock *PostBB, Value *Address,
		                                           bool InvertPCond, bool InvertQCond) {
		  auto IsaBitcastOfPointerType = [](const Instruction &I) {
		    return Operator::getOpcode(&I) == Instruction::BitCast &&
		           I.getType()->isPointerTy();
		  };

		  // If we're not in aggressive mode, we only optimize if we have some
		  // confidence that by optimizing we'll allow P and/or Q to be if-converted.
		  auto IsWorthwhile = [&](BasicBlock *BB) {
		    if (!BB)
		      return true;
		    // Heuristic: if the block can be if-converted/phi-folded and the
		    // instructions inside are all cheap (arithmetic/GEPs), it's worthwhile to
		    // thread this store.
		    if (BB->size() > PHINodeFoldingThreshold)
		      return false;
		    for (auto &I : *BB)
		      if (!isa<BinaryOperator>(I) && !isa<GetElementPtrInst>(I) &&
		          !isa<StoreInst>(I) && !isa<TerminatorInst>(I) &&
		          !isa<DbgInfoIntrinsic>(I) && !IsaBitcastOfPointerType(I))
		        return false;
		    return true;
		  };

		  if (!MergeCondStoresAggressively && (!IsWorthwhile(PTB) ||
		                                       !IsWorthwhile(PFB) ||
		                                       !IsWorthwhile(QTB) ||
		                                       !IsWorthwhile(QFB)))
		    return false;

		  // For every pointer, there must be exactly two stores, one coming from
		  // PTB or PFB, and the other from QTB or QFB. We don't support more than one
		  // store (to any address) in PTB,PFB or QTB,QFB.
		  // FIXME: We could relax this restriction with a bit more work and performance
		  // testing.
		  StoreInst *PStore = findUniqueStoreInBlocks(PTB, PFB);
		  StoreInst *QStore = findUniqueStoreInBlocks(QTB, QFB);
		  if (!PStore || !QStore)
		    return false;

		  // Now check the stores are compatible.
		  if (!QStore->isUnordered() || !PStore->isUnordered())
		    return false;

		// Check that sinking the store won't cause program behavior changes. Sinking
		// the store out of the Q blocks won't change any behavior as we're sinking
		// from a block to its unconditional successor. But we're moving a store from
		// the P blocks down through the middle block (QBI) and past both QFB and QTB.
		// So we need to check that there are no aliasing loads or stores in
		reamesUnsubmitted Done Reply Inline Actions This long wall of code is hard to follow. Please use some well named helper functions. reames: This long wall of code is hard to follow. Please use some well named helper functions.
		// QBI, QTB and QFB. We also need to check there are no conflicting memory
		// operations between PStore and the end of its parent block.
		//
		// The ideal way to do this is to query AliasAnalysis, but we don't
		// preserve AA currently so that is dangerous. Be super safe and just
		// check there are no other memory operations at all.
		for (auto &I : *QFB->getSinglePredecessor())
		if (I.mayReadOrWriteMemory())
		return false;
		for (auto &I : *QFB)
		if (&I != QStore && I.mayReadOrWriteMemory())
		return false;
		if (QTB)
		for (auto &I : *QTB)
		if (&I != QStore && I.mayReadOrWriteMemory())
		sanjoy (Done): Why do you need to do this? The only uses of TB I see are the `XStore->getParent() == XTB` checks; and they should be fine with a `nullptr` `XTB`, no?
		return false;
		for (auto I = BasicBlock::iterator(PStore), E = PStore->getParent()->end();
		I != E; ++I)
		if (&*I != PStore && I->mayReadOrWriteMemory())
		return false;

		// OK, we're going to sink the stores to PostBB. The store has to be
		// conditional though, so first create the predicate.
		Value *PCond = cast<BranchInst>(PFB->getSinglePredecessor()->getTerminator())
		->getCondition();
		Value *QCond = cast<BranchInst>(QFB->getSinglePredecessor()->getTerminator())
		sanjoy (Done): This one isn't used.
		->getCondition();

		Value *PPHI = ensureValueAvailableInSuccessor(PStore->getValueOperand(),
		PStore->getParent());
		Value *QPHI = ensureValueAvailableInSuccessor(QStore->getValueOperand(),
		QStore->getParent(), PPHI);

		IRBuilder<> QB(PostBB->getFirstInsertionPt());

		Value *PPred = PStore->getParent() == PTB ? PCond : QB.CreateNot(PCond);
		Value *QPred = QStore->getParent() == QTB ? QCond : QB.CreateNot(QCond);

		hfinkel (Not Done): You're not preserving here any of the (aliasing) metadata that might have been present on the stores.
		if (InvertPCond)
		PPred = QB.CreateNot(PPred);
		if (InvertQCond)
		QPred = QB.CreateNot(QPred);
		Value *CombinedPred = QB.CreateOr(PPred, QPred);

		reames (Not Done): One thing you might consider: are there cases where we can insert a store down the other path? Doing so in general is clearly a violation of both dereferenceability and the memory model, but what about a conditional store to a dereferenceable location which is known to be thread local? I ran across a similar case in LICM's store promotion which I'm thinking about implementing since it would really help one of my benchmarks. Just throwing out the idea in case you find it helpful.
		auto *T = SplitBlockAndInsertIfThen(CombinedPred, QB.GetInsertPoint(), false);
		QB.SetInsertPoint(T);
		StoreInst *SI = cast<StoreInst>(QB.CreateStore(QPHI, Address));
		AAMDNodes AAMD;
		jmolloy (author, Not Done): This is a good possible extension to this optimization! Probably best to wait until it's gone in in its current, more simple form though.
		PStore->getAAMetadata(AAMD, /*Merge=*/false);
		PStore->getAAMetadata(AAMD, /*Merge=*/true);
		SI->setAAMetadata(AAMD);

		QStore->eraseFromParent();
		PStore->eraseFromParent();

		return true;
		sanjoy (Not Done): However, this is a pessimization if both conditions are always false, and the resulting code does not simplify further, right?
		jmolloy (author, Not Done): In that case, yes it is. The important part is "if the code doesn't simplify further" though - even if both conditions are always false, if we can if-convert this is very likely to be a win. The heuristics at the moment are trying to catch cases where we know we can if-convert.
		}
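
For readers who prefer source-level code to IR, the effect of the function above can be modelled roughly as follows. This is a minimal sketch, not the patch's implementation: the names `storeLadder`, `storeMerged`, `sameResultForAllFlags`, `Flag1` and `Flag2` are hypothetical, and `tmp` stands in for the PHI that the patch threads through the blocks (it is `undef` in IR; the sketch initializes it only to stay well-defined C++).

```cpp
#include <cassert>

// Hypothetical flags, for illustration only.
constexpr unsigned Flag1 = 1u << 0;
constexpr unsigned Flag2 = 1u << 1;

// Before: two conditional stores to the same address.
void storeLadder(unsigned c, int *a, int x, int y) {
  if (c & Flag1)
    *a = x;
  if (c & Flag2)
    *a = y;
}

// After: the stored value is carried in a temporary and a single store is
// predicated on the OR of both conditions.
void storeMerged(unsigned c, int *a, int x, int y) {
  int tmp = 0; // never stored unless at least one condition held
  if (c & Flag1)
    tmp = x;
  if (c & Flag2)
    tmp = y;
  if (c & (Flag1 | Flag2))
    *a = tmp;
}

// Returns true if both versions leave *a in the same state for every
// combination of the two flags.
bool sameResultForAllFlags() {
  for (unsigned c = 0; c < 4; ++c) {
    int before = -1, after = -1;
    storeLadder(c, &before, 10, 20);
    storeMerged(c, &after, 10, 20);
    if (before != after)
      return false;
  }
  return true;
}
```

Calling `sameResultForAllFlags()` checks the four flag combinations, including the case where neither condition holds and no store may happen at all.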

		static bool mergeConditionalStores(BranchInst *PBI, BranchInst *QBI) {
		// The intention here is to find diamonds or triangles (see below) where each
		// conditional block contains a store to the same address. Both of these
		// stores are conditional, so they can't be unconditionally sunk. But it may
		// be profitable to speculatively sink the stores into one merged store at the
		// end, and predicate the merged store on the union of the two conditions of
		// PBI and QBI.
		//
		// This can reduce the number of stores executed if both of the conditions are
		// true, and can allow the blocks to become small enough to be if-converted.
		// This optimization will also chain, so that ladders of test-and-set
		// sequences can be if-converted away.
		//
		// We only deal with simple diamonds or triangles:
		//
		//     PBI       or      PBI        or a combination of the two
		//    /   \               | \
		//   PTB  PFB             |  PFB
		//    \   /               | /
		//     QBI                QBI
		//    /  \                | \
		//   QTB  QFB             |  QFB
		//    \  /                | /
		//    PostBB            PostBB
		//
		sanjoy (Done): Minor nit: I'd use `InvertPCond` as "swapping" a predicate means something slightly different in LLVM.
		// We model triangles as a type of diamond with a nullptr "true" block.
		// Triangles are canonicalized so that the fallthrough edge is represented by
		// a true condition, as in the diagram above.
		//
		BasicBlock *PTB = PBI->getSuccessor(0);
		BasicBlock *PFB = PBI->getSuccessor(1);
		BasicBlock *QTB = QBI->getSuccessor(0);
		BasicBlock *QFB = QBI->getSuccessor(1);
		BasicBlock *PostBB = QFB->getSingleSuccessor();

		bool InvertPCond = false, InvertQCond = false;
		// Canonicalize fallthroughs to the true branches.
		if (PFB == QBI->getParent()) {
		std::swap(PFB, PTB);
		sanjoy (Done): Nit: "dominating"
		InvertPCond = true;
		reames (Not Done): Spacing. Clang-format?
		}
		if (QFB == PostBB) {
		std::swap(QFB, QTB);
		reames (Done): This chain of checks is hard to read. Could it be restructured along with the above code to make it simpler?
		InvertQCond = true;
		}

		// From this point on we can assume PTB or QTB may be fallthroughs but PFB
		sanjoy (Done): Why do you need to check `BB` for `nullptr`?
		jmolloy (author, Not Done): I don't. Removed.
		// and QFB may not. Model fallthroughs as a nullptr block.
		if (PTB == QBI->getParent())
		PTB = nullptr;
		if (QTB == PostBB)
		QTB = nullptr;
		reames (Done): Some of these are legality, some are profitability, some are implementation limits. Please separate and comment each class.

		// Legality bailouts. We must have at least the non-fallthrough blocks and
		// the post-dominating block, and the non-fallthroughs must only have one
		// predecessor.
		auto HasOnePredAndOneSucc = [](BasicBlock *BB, BasicBlock *P, BasicBlock *S) {
		return BB->getSinglePredecessor() == P &&
		BB->getSingleSuccessor() == S;
		};
		if (!PostBB ||
		!HasOnePredAndOneSucc(PFB, PBI->getParent(), QBI->getParent()) ||
		!HasOnePredAndOneSucc(QFB, QBI->getParent(), PostBB))
		sanjoy (Not Done): You can shorten this using initializer lists: `for (auto *BB : { PTB, PFB }) { if (!BB) continue; for (auto &I : *BB) if (StoreInst *SI = ...` and possibly even more with initializer lists of `std::pair`.
		jmolloy (author, Not Done): Unfortunately type deduction wasn't clever enough to work out an initializer list of std::pair (and it turns out the std::pair elements end up being const too), so I've just done the simpler transform.
		return false;
		if ((PTB && !HasOnePredAndOneSucc(PTB, PBI->getParent(), QBI->getParent())) ||
		(QTB && !HasOnePredAndOneSucc(QTB, QBI->getParent(), PostBB)))
		return false;
		if (PostBB->getNumUses() != 2 || QBI->getParent()->getNumUses() != 2)
		return false;

		// OK, this is a sequence of two diamonds or triangles.
		// Check if there are stores in PTB or PFB that are repeated in QTB or QFB.
		SmallPtrSet<Value *,4> PStoreAddresses, QStoreAddresses;
		for (auto *BB : {PTB, PFB}) {
		if (!BB)
		continue;
		for (auto &I : *BB)
		if (StoreInst *SI = dyn_cast<StoreInst>(&I))
		sanjoy (Done): Nit: spacing should be `SmallVector<Value *, 4>`.
		PStoreAddresses.insert(SI->getPointerOperand());
		sanjoy (Done): I'm not sure this is correct -- `std::set_intersection` expects both the ranges to be sorted, and I don't think `SmallPtrSet` is guaranteed to iterate over values in sorted order. I think it is best to use `llvm::set_intersect` here.
		}
		for (auto *BB : {QTB, QFB}) {
		if (!BB)
		continue;
		for (auto &I : *BB)
		if (StoreInst *SI = dyn_cast<StoreInst>(&I))
		QStoreAddresses.insert(SI->getPointerOperand());
		}

		set_intersect(PStoreAddresses, QStoreAddresses);
		// set_intersect mutates PStoreAddresses in place. Rename it here to make it
		// clear what it contains.
		auto &CommonAddresses = PStoreAddresses;

		bool Changed = false;
		for (auto *Address : CommonAddresses)
		Changed |= mergeConditionalStoreToAddress(
		PTB, PFB, QTB, QFB, PostBB, Address, InvertPCond, InvertQCond);
		return Changed;
		}
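
The review point about `std::set_intersection` needing sorted ranges can be illustrated with standard containers. The following is a minimal sketch, assuming an in-place, membership-based filter in the spirit of the `set_intersect` call above (which, per the comment in the patch, mutates its first argument). The helper names `intersectInPlace` and `commonAddresses` are hypothetical and are not LLVM APIs; `std::unordered_set<int>` stands in for `SmallPtrSet<Value *, 4>`.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical helper: keep in S1 only the elements that also occur in S2.
// It relies on membership queries only, so neither set has to be sorted.
template <typename Set>
void intersectInPlace(Set &S1, const Set &S2) {
  for (auto It = S1.begin(); It != S1.end();) {
    if (S2.count(*It) == 0)
      It = S1.erase(It); // erase returns the iterator to the next element
    else
      ++It;
  }
}

// Builds two unordered sets and returns their intersection.
std::unordered_set<int> commonAddresses() {
  std::unordered_set<int> P = {1, 2, 3};
  std::unordered_set<int> Q = {3, 4, 2};
  intersectInPlace(P, Q);
  return P;
}
```

`std::set_intersection`, by contrast, walks both ranges in lockstep and has a precondition that both are sorted, which an unordered pointer set does not guarantee.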

/// If we have a conditional branch as a predecessor of another block,
/// this function tries to simplify it. We know
/// that PBI and BI are both conditional branches, and BI is in one of the
/// successor blocks of PBI - PBI branches to BI.
static bool SimplifyCondBranchToCondBranch(BranchInst *PBI, BranchInst *BI,
const DataLayout &DL) {
assert(PBI->isConditional() && BI->isConditional());
BasicBlock *BB = BI->getParent();
[26 lines not shown; context: if (BlockIsSimpleEnoughToThreadThrough(BB)) {]
// Any predecessor where the condition is not computable we keep symbolic.
for (pred_iterator PI = PB; PI != PE; ++PI) {
BasicBlock *P = *PI;
if ((PBI = dyn_cast<BranchInst>(P->getTerminator())) &&
PBI != BI && PBI->isConditional() &&
PBI->getCondition() == BI->getCondition() &&
PBI->getSuccessor(0) != PBI->getSuccessor(1)) {
bool CondIsTrue = PBI->getSuccessor(0) == BB;
NewPN->addIncoming(ConstantInt::get(Type::getInt1Ty(BB->getContext()),
		sanjoy (Not Done): Wrap the line? Actually, I'll just assume you'll run clang-format before checkin. :)
CondIsTrue), P);
} else {
NewPN->addIncoming(BI->getCondition(), P);
}
}

BI->setCondition(NewPN);
return true;
}
}

		jmolloy (author, Not Done): I've been selectively running clang-format on the bits I've touched, and missed this bit :(
if (auto *CE = dyn_cast<ConstantExpr>(BI->getCondition()))
if (CE->canTrap())
return false;

		// If both branches are conditional and both contain stores to the same
		// address, remove the stores from the conditionals and create a conditional
		// merged store at the end.
		if (MergeCondStores && mergeConditionalStores(PBI, BI))
		return true;

// If this is a conditional branch in an empty block, and if any
// predecessors are a conditional branch to one of our destinations,
// fold the conditions into logical ops and one cond br.
BasicBlock::iterator BBI = BB->begin();
// Ignore dbg intrinsics.
while (isa<DbgInfoIntrinsic>(BBI))
++BBI;
if (&*BBI != BI)
[558 lines not shown; context: bool SimplifyCFGOpt::SimplifyCleanupReturn(CleanupReturnInst *RI) {]
// are both EH pads).
if (UnwindDest) {
// First, go through the PHI nodes in UnwindDest and update any nodes that
// reference the block we are removing
for (BasicBlock::iterator I = UnwindDest->begin(),
IE = UnwindDest->getFirstNonPHI()->getIterator();
I != IE; ++I) {
PHINode *DestPN = cast<PHINode>(I);

int Idx = DestPN->getBasicBlockIndex(BB);
// Since BB unwinds to UnwindDest, it has to be in the PHI node.
assert(Idx != -1);
// This PHI node has an incoming value that corresponds to a control
// path through the cleanup pad we are removing. If the incoming
// value is in the cleanup pad, it must be a PHINode (because we
// verified above that the block is otherwise empty). Otherwise, the
// value is either a constant or a value that dominates the cleanup
// pad being removed.
//
// Because BB and UnwindDest are both EH pads, all of their
// predecessors must unwind to these blocks, and since no instruction
// can have multiple unwind destinations, there will be no overlap in
// incoming blocks between SrcPN and DestPN.
Value *SrcVal = DestPN->getIncomingValue(Idx);
PHINode *SrcPN = dyn_cast<PHINode>(SrcVal);

// Remove the entry for the block we are deleting.
DestPN->removeIncomingValue(Idx, false);

if (SrcPN && SrcPN->getParent() == BB) {
// If the incoming value was a PHI node in the cleanup pad we are
// removing, we need to merge that PHI node's incoming values into
// DestPN.
for (unsigned SrcIdx = 0, SrcE = SrcPN->getNumIncomingValues();
SrcIdx != SrcE; ++SrcIdx) {
DestPN->addIncoming(SrcPN->getIncomingValue(SrcIdx),
SrcPN->getIncomingBlock(SrcIdx));
}
} else {
// Otherwise, the incoming value came from above BB and
// so we can just reuse it. We must associate all of BB's
// predecessors with this value.
[321 lines not shown; context: static bool TurnSwitchRangeIntoICmp(SwitchInst *SI, IRBuilder<> &Builder) {]
return true;
}

/// Compute masked bits for the condition of a switch
/// and use it to remove dead cases.
static bool EliminateDeadSwitchCases(SwitchInst *SI, AssumptionCache *AC,
const DataLayout &DL) {
Value *Cond = SI->getCondition();
unsigned Bits = Cond->getType()->getIntegerBitWidth();
		sanjoy (Not Done): Unrelated whitespace changes?
APInt KnownZero(Bits, 0), KnownOne(Bits, 0);
computeKnownBits(Cond, KnownZero, KnownOne, DL, 0, AC, SI);

// Gather dead cases.
SmallVector<ConstantInt*, 8> DeadCases;
for (SwitchInst::CaseIt I = SI->case_begin(), E = SI->case_end(); I != E; ++I) {
if ((I.getCaseValue()->getValue() & KnownZero) != 0 ||
(I.getCaseValue()->getValue() & KnownOne) != KnownOne) {
[764 lines not shown; context: static void reuseTableCompare(User *PhiUser, BasicBlock *PhiBlock,]
for (auto ValuePair : Values) {
Constant *CaseConst = ConstantExpr::getICmp(CmpInst->getPredicate(),
ValuePair.second, CmpOp1, true);
if (!CaseConst || CaseConst == DefaultConst)
return;
assert((CaseConst == TrueConst || CaseConst == FalseConst) &&
"Expect true or false as compare result.");
}

// Check if the branch instruction dominates the phi node. It's a simple
// dominance check, but sufficient for our needs.
// Although this check is invariant in the calling loops, it's better to do it
// at this late stage. Practically we do it at most once for a switch.
BasicBlock *BranchBlock = RangeCheckBranch->getParent();
for (auto PI = pred_begin(PhiBlock), E = pred_end(PhiBlock); PI != E; ++PI) {
BasicBlock *Pred = *PI;
if (Pred != BranchBlock && Pred->getUniquePredecessor() != BranchBlock)
[448 lines not shown; context: bool SimplifyCFGOpt::SimplifyUncondBranch(BranchInst *BI, IRBuilder<> &Builder){]
// branches to us and our successor, fold the comparison into the
// predecessor and use logical operations to update the incoming value
// for PHI nodes in common successor.
if (FoldBranchToCommonDest(BI, BonusInstThreshold))
return SimplifyCFG(BB, TTI, BonusInstThreshold, AC) | true;
return false;
}

		static BasicBlock *allPredecessorsComeFromSameSource(BasicBlock *BB) {
		BasicBlock *PredPred = nullptr;
		for (auto *P : predecessors(BB)) {
		BasicBlock *PPred = P->getSinglePredecessor();
		if (!PPred || (PredPred && PredPred != PPred))
		return nullptr;
		PredPred = PPred;
		}
		return PredPred;
		}
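
The helper above can be exercised on a toy CFG to see which shapes it accepts. This is a minimal sketch under stated assumptions: `Block`, `singlePred`, `commonGrandparent` and the two checker functions are hypothetical stand-ins, with a plain predecessor vector in place of LLVM's `BasicBlock` and `predecessors()`.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a basic block: only the predecessor list is modelled.
struct Block {
  std::vector<Block *> Preds;
};

// Returns the only predecessor of B, or nullptr if there is not exactly one.
Block *singlePred(Block *B) {
  return B->Preds.size() == 1 ? B->Preds[0] : nullptr;
}

// Same logic as the helper in the patch: every predecessor of BB must have
// exactly one predecessor, and it must be the same block for all of them.
Block *commonGrandparent(Block *BB) {
  Block *PredPred = nullptr;
  for (Block *P : BB->Preds) {
    Block *PPred = singlePred(P);
    if (!PPred || (PredPred && PredPred != PPred))
      return nullptr;
    PredPred = PPred;
  }
  return PredPred;
}

// Diamond: Top -> {L, R} -> Join.
bool diamondHasCommonSource() {
  Block Top, L, R, Join;
  L.Preds = {&Top};
  R.Preds = {&Top};
  Join.Preds = {&L, &R};
  return commonGrandparent(&Join) == &Top;
}

// Triangle: Top -> L -> Join, plus a direct edge Top -> Join.
bool triangleHasCommonSource() {
  Block Top, L, Join;
  L.Preds = {&Top};
  Join.Preds = {&L, &Top};
  return commonGrandparent(&Join) != nullptr;
}
```

In this toy model the diamond yields `Top`, while the triangle yields `nullptr` because `Top` itself has no single predecessor.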

bool SimplifyCFGOpt::SimplifyCondBranch(BranchInst *BI, IRBuilder<> &Builder) {
BasicBlock *BB = BI->getParent();

// Conditional branch
if (isValueEqualityComparison(BI)) {
// If we only have one predecessor, and if it is a branch on this value,
// see if that predecessor totally determines the outcome of this
[67 lines not shown; context: bool SimplifyCFGOpt::SimplifyCondBranch(BranchInst *BI, IRBuilder<> &Builder) {]

// Scan predecessor blocks for conditional branches.
for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI)
if (BranchInst *PBI = dyn_cast<BranchInst>((*PI)->getTerminator()))
if (PBI != BI && PBI->isConditional())
if (SimplifyCondBranchToCondBranch(PBI, BI, DL))
return SimplifyCFG(BB, TTI, BonusInstThreshold, AC) | true;

		// Look for diamond patterns.
		if (MergeCondStores)
		if (BasicBlock *PrevBB = allPredecessorsComeFromSameSource(BB))
		if (BranchInst *PBI = dyn_cast<BranchInst>(PrevBB->getTerminator()))
		if (PBI != BI && PBI->isConditional())
		if (mergeConditionalStores(PBI, BI))
		return SimplifyCFG(BB, TTI, BonusInstThreshold, AC) | true;

return false;
}

/// Check if passing a value to an instruction will cause undefined behavior.
static bool passingValueIsAlwaysUndefined(Value *V, Instruction *I) {
Constant *C = dyn_cast<Constant>(V);
if (!C)
return false;
[117 lines not shown; context: if (BranchInst *BI = dyn_cast<BranchInst>(BB->getTerminator())) {]
if (SimplifyCleanupReturn(RI)) return true;
} else if (SwitchInst *SI = dyn_cast<SwitchInst>(BB->getTerminator())) {
if (SimplifySwitch(SI, Builder)) return true;
} else if (UnreachableInst *UI =
dyn_cast<UnreachableInst>(BB->getTerminator())) {
if (SimplifyUnreachable(UI)) return true;
} else if (IndirectBrInst *IBI =
dyn_cast<IndirectBrInst>(BB->getTerminator())) {
if (SimplifyIndirectBr(IBI)) return true;
		sanjoy (Not Done): Whitespace damage?
}

return Changed;
}

/// This function is used to do simplification of a CFG.
/// For example, it adjusts branches to branches to eliminate the extra hop,
/// eliminates unreachable basic blocks, and does other "peephole" optimization
/// of the CFG. It returns true if a modification was made.
///
bool llvm::SimplifyCFG(BasicBlock *BB, const TargetTransformInfo &TTI,
		jmolloy (author, Not Done): Yes, this was me undoing a change and blindly running clang-format. More below, where clang-format has ripped trailing whitespace away :(
unsigned BonusInstThreshold, AssumptionCache *AC) {
return SimplifyCFGOpt(TTI, BB->getModule()->getDataLayout(),
BonusInstThreshold, AC).run(BB);
}

test/Transforms/SimplifyCFG/merge-cond-stores.ll

This file was added.

				; RUN: opt -simplifycfg -instcombine < %s -simplifycfg-merge-cond-stores=true -simplifycfg-merge-cond-stores-aggressively=false -phi-node-folding-threshold=2 -S | FileCheck %s

				; CHECK-LABEL: @test_simple
				; This test should succeed and end up if-converted.
				; CHECK: icmp eq i32 %b, 0
				; CHECK-NEXT: icmp ne i32 %a, 0
				; CHECK-NEXT: xor i1 %x2, true
				; CHECK-NEXT: %[[x:.*]] = or i1 %{{.*}}, %{{.*}}
				; CHECK-NEXT: br i1 %[[x]]
				; CHECK: store
				; CHECK-NOT: store
				; CHECK: ret
				define void @test_simple(i32* %p, i32 %a, i32 %b) {
				entry:
				%x1 = icmp eq i32 %a, 0
				br i1 %x1, label %fallthrough, label %yes1

				yes1:
				store i32 0, i32* %p
				br label %fallthrough

				fallthrough:
				%x2 = icmp eq i32 %b, 0
				br i1 %x2, label %end, label %yes2

				yes2:
				store i32 1, i32* %p
				br label %end

				end:
				ret void
				}

				; CHECK-LABEL: @test_recursive
				; This test should entirely fold away, leaving one large basic block.
				; CHECK: store
				; CHECK-NOT: store
				; CHECK: ret
				define void @test_recursive(i32* %p, i32 %a, i32 %b, i32 %c, i32 %d) {
				entry:
				%x1 = icmp eq i32 %a, 0
				br i1 %x1, label %fallthrough, label %yes1

				yes1:
				store i32 0, i32* %p
				br label %fallthrough

				fallthrough:
				%x2 = icmp eq i32 %b, 0
				br i1 %x2, label %next, label %yes2

				yes2:
				store i32 1, i32* %p
				br label %next

				next:
				%x3 = icmp eq i32 %c, 0
				br i1 %x3, label %fallthrough2, label %yes3

				yes3:
				store i32 2, i32* %p
				br label %fallthrough2

				fallthrough2:
				%x4 = icmp eq i32 %d, 0
				br i1 %x4, label %end, label %yes4

				yes4:
				store i32 3, i32* %p
				br label %end


				end:
				ret void
				}

				; CHECK-LABEL: @test_not_ifconverted
				; The code in each diamond is too large - it won't be if-converted so our
				; heuristics should say no.
				; CHECK: store
				; CHECK: store
				; CHECK: ret
				define void @test_not_ifconverted(i32* %p, i32 %a, i32 %b) {
				entry:
				%x1 = icmp eq i32 %a, 0
				br i1 %x1, label %fallthrough, label %yes1

				yes1:
				%y1 = or i32 %b, 55
				%y2 = add i32 %y1, 24
				%y3 = and i32 %y2, 67
				store i32 %y3, i32* %p
				br label %fallthrough

				fallthrough:
				%x2 = icmp eq i32 %b, 0
				br i1 %x2, label %end, label %yes2

				yes2:
				%z1 = or i32 %a, 55
				%z2 = add i32 %z1, 24
				%z3 = and i32 %z2, 67
				store i32 %z3, i32* %p
				br label %end

				end:
				ret void
				}

				; CHECK-LABEL: @test_aliasing1
				; The store to %p clobbers the previous store, so if-converting this would
				; be illegal.
				; CHECK: store
				; CHECK: store
				; CHECK: ret
				define void @test_aliasing1(i32* %p, i32 %a, i32 %b) {
				entry:
				%x1 = icmp eq i32 %a, 0
				br i1 %x1, label %fallthrough, label %yes1

				yes1:
				store i32 0, i32* %p
				br label %fallthrough

				fallthrough:
				%y1 = load i32, i32* %p
				%x2 = icmp eq i32 %y1, 0
				br i1 %x2, label %end, label %yes2

				yes2:
				store i32 1, i32* %p
				br label %end

				end:
				ret void
				}

				; CHECK-LABEL: @test_aliasing2
				; The load from %q aliases with %p, so if-converting this would be illegal.
				; CHECK: store
				; CHECK: store
				; CHECK: ret
				define void @test_aliasing2(i32* %p, i32* %q, i32 %a, i32 %b) {
				entry:
				%x1 = icmp eq i32 %a, 0
				br i1 %x1, label %fallthrough, label %yes1

				yes1:
				store i32 0, i32* %p
				br label %fallthrough

				fallthrough:
				%y1 = load i32, i32* %q
				%x2 = icmp eq i32 %y1, 0
				br i1 %x2, label %end, label %yes2

				yes2:
				store i32 1, i32* %p
				br label %end

				end:
				ret void
				}

				declare void @f()

				; CHECK-LABEL: @test_diamond_simple
				; This should get if-converted.
				; CHECK: store
				; CHECK-NOT: store
				; CHECK: ret
				define i32 @test_diamond_simple(i32* %p, i32* %q, i32 %a, i32 %b) {
				entry:
				%x1 = icmp eq i32 %a, 0
				br i1 %x1, label %no1, label %yes1

				yes1:
				store i32 0, i32* %p
				br label %fallthrough

				no1:
				%z1 = add i32 %a, %b
				br label %fallthrough

				fallthrough:
				%z2 = phi i32 [ %z1, %no1 ], [ 0, %yes1 ]
				%x2 = icmp eq i32 %b, 0
				br i1 %x2, label %no2, label %yes2

				yes2:
				store i32 1, i32* %p
				br label %end

				no2:
				%z3 = sub i32 %z2, %b
				br label %end

				end:
				%z4 = phi i32 [ %z3, %no2 ], [ 3, %yes2 ]
				ret i32 %z4
				}

				; CHECK-LABEL: @test_diamond_alias3
				; Now there is a call to f() in the bottom branch. The store in the first
				; branch would now be reordered with respect to the call if we if-converted,
				; so we must not.
				; CHECK: store
				; CHECK: store
				; CHECK: ret
				define i32 @test_diamond_alias3(i32* %p, i32* %q, i32 %a, i32 %b) {
				entry:
				%x1 = icmp eq i32 %a, 0
				br i1 %x1, label %no1, label %yes1

				yes1:
				store i32 0, i32* %p
				br label %fallthrough

				no1:
				call void @f()
				%z1 = add i32 %a, %b
				br label %fallthrough

				fallthrough:
				%z2 = phi i32 [ %z1, %no1 ], [ 0, %yes1 ]
				%x2 = icmp eq i32 %b, 0
				br i1 %x2, label %no2, label %yes2

				yes2:
				store i32 1, i32* %p
				br label %end

				no2:
				call void @f()
				%z3 = sub i32 %z2, %b
				br label %end

				end:
				%z4 = phi i32 [ %z3, %no2 ], [ 3, %yes2 ]
				ret i32 %z4
				}
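
To see why `@test_aliasing1` must keep both stores, the control flow can be modelled in C++. This is a minimal sketch: the function names are hypothetical, and `aliasingNaivelyMerged` shows what a merge that ignored the intervening load would compute, not anything the patch does.

```cpp
#include <cassert>

// Model of @test_aliasing1: the second condition depends on a load of *p
// that sits between the two conditional stores.
void aliasingOriginal(int *p, int a) {
  if (a != 0)
    *p = 0;
  int y1 = *p; // observes the first store when it happened
  if (y1 != 0)
    *p = 1;
}

// Hypothetical illegal merge: the first store is sunk past the load, so the
// load now observes the old contents of *p.
void aliasingNaivelyMerged(int *p, int a) {
  int tmp = 0;
  bool c1 = a != 0;
  if (c1)
    tmp = 0;
  int y1 = *p; // reads the value from before the (sunk) first store
  bool c2 = y1 != 0;
  if (c2)
    tmp = 1;
  if (c1 || c2)
    *p = tmp;
}

// Small drivers returning the final contents of the stored-to location.
int runOriginal(int initial, int a) {
  int v = initial;
  aliasingOriginal(&v, a);
  return v;
}

int runNaivelyMerged(int initial, int a) {
  int v = initial;
  aliasingNaivelyMerged(&v, a);
  return v;
}
```

With an initial value of 5 and `a == 1`, the original leaves 0 in the location while the naive merge leaves 1, which is the behavior change the "no other memory operations" bailout guards against.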

Diff 39194