This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/trunk/
-
trunk/
-
lib/Analysis/
-
Analysis/
-
ValueTracking.cpp
-
test/Transforms/InstCombine/
-
Transforms/
-
InstCombine/
-
dom-conditions.ll

Differential D7708

Infer known bits from dominating conditions
ClosedPublic

Authored by reames on Feb 17 2015, 3:59 PM.

Download Raw Diff

Details

Reviewers

nlewycky
hfinkel

Commits

rG1c2922714486: Infer known bits from dominating conditions
rL231879: Infer known bits from dominating conditions

Summary

This patch adds limited support in ValueTracking for inferring known bits of a value from conditional expressions which must be true to reach the instruction we're trying to optimize. At this time, the feature is off by default. Once landed, I'm hoping for feedback from others on both profitability and compile time impact.

Forms of conditional value propagation have been tried in LLVM before and have failed due to compile time problems. In an attempt to side step that, this patch only considers conditions where the edge leaving the branch dominates the context instruction. Rather than walking back along the dominator tree for every value/context instruction pair, we only consider uses of the value itself when searching for useful conditions. It's also limited to only the first level call in the known bits recursion search.

Even with these restrictions, it handles many interesting cases:

Early exits from functions
Early exits from loops (for context instructions in the loop and after the check)
Conditions which control entry into loops, including multi-version loops (such as those produced during vectorization, IRCE, loop unswitch, etc..)

Possible applications include optimizing using information provided by constructs such as: preconditions, assumptions, null checks, & range checks.

When compiling sqlite3.c (using a Release+Asserts build), I get the following timings:
-instcombine -value-tracking-dom-conditions=1: 0.788s-0.793
-instcombine -value-tracking-dom-conditions=0: 0.771-0.773
-O3 -value-tracking-dom-conditions=1: 14.993-15.391
-O3 -value-tracking-dom-conditions=0: 14.961-14.984

There is some slightly slow down in instcombine, but the difference in the O3 times are essentially noise. I measured an outlier (which I discarded in the above) of 16.408 for "-O3 -value-tracking-dom-conditions=0".

Diff Detail

Repository: rL LLVM

Event Timeline

reames updated this revision to Diff 20111.Feb 17 2015, 3:59 PM

reames retitled this revision from to Infer known bits from dominating conditions.

reames updated this object.

reames edited the test plan for this revision. (Show Details)

reames added reviewers: nlewycky, hfinkel.

reames added a subscriber: Unknown Object (MLST).

Here are the O3 numbers for a Release build (without asserts):
-O3 -value-tracking-dom-conditions=1: 11.045-11.319
-O3 -value-tracking-dom-conditions=0: 10.989-11.300

Taking the two extremes, that's about a 300 ms difference at worst. That's about 2.7%. Given the (informally measured) variance between runs, that's within the noise.

In terms of effectiveness, this allows an extra 500 inst combines (out of ~50k). The resulting IR is slightly larger, though I can't easily determine why. The biggest difference is a lot more nsw and nuw markers are being added. It appears that instcombine is now handling some cases that LVI and GVN would have previously gotten since the stats on those drop slightly. (This has the potential of making those passes more powerful due by avoiding pass ordering issues.) I do see both loop vectorizer and memcpyopt kicking in one extra time each. I have not measured execution time of sqlite.

Thanks for working on this!

I'd really like to see a compile-time comparison with an implementation that just walks up the dominator tree. It seems like that technique would be more powerful (because it could catch conditions where the variable is not a direct argument to the icmp). It might be manageable (with a cutoff for really deep trees, and noting that we can stop searching after we reach the definition of the variable itself).

Also, full-context patches please: http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface

lib/Analysis/ValueTracking.cpp
42 ↗	(On Diff #20111)	Line too long?
498 ↗	(On Diff #20111)	Please make this cutoff a command-line parameter.
510 ↗	(On Diff #20111)	Why do you care that the cmp has only one use? Why not look at all of the branches that use the cmp result?
541 ↗	(On Diff #20111)	Indeed, can't you share all of that logic that is used by the assume implementation? Although to really do that you'd have to start with the dominating branches, walking up?

Comments inline.

lib/Analysis/ValueTracking.cpp
498 ↗	(On Diff #20111)	I remember hearing somewhere that the walking the use-list in LLVM is cache-miss'y. If (a big if) that is true, then for variables with a large number of uses you're going to spend a lot of time just skimming through the use list here. To that end, does it make sense to have this cutoff in the loop where you're looping through the use list?
508 ↗	(On Diff #20111)	Nit: should be `ICmpInst *Cmp`. In general, I'd suggest just running this patch through `clang-format` before checking in.
529 ↗	(On Diff #20111)	I don't think this will work if `BI->getSuccessor(0) == BI->getSuccessor(1)`. You need to check if the `false` branch of `BI` actually branches to something other than `BB0`.
531 ↗	(On Diff #20111)	The only other place that uses `Q.DT` checks to see if it isn't `nullptr`. Is there reason to believe that we actually have the dominator tree?
533 ↗	(On Diff #20111)	Actually, it might be cleaner and more powerful to just use the `BasicBlockEdge` version of `dominates` here. `dominates` will also handle the case where `BB0` has multiple preds but `BB0` dominates all of them.
574 ↗	(On Diff #20111)	Nit: should this be 'high bits in RHS'?

Responding to Sanjoy's review comments.

lib/Analysis/ValueTracking.cpp
498 ↗	(On Diff #20111)	This hasn't been a measurable impact. I'd be a bit leery of effecting the code generation of the loop as well. Besides, once somethings in cache (from the first check), accessing it again (in the loop) is cheap.
510 ↗	(On Diff #20111)	Compile time. I want to avoid walking potentially long use lists. I'm matching specifically the common pattern of a branch controlled by a comparison right before it. See my comment to Hal about possible future extensions.
529 ↗	(On Diff #20111)	You're right. I need to account for that case. Thanks for finding this!
531 ↗	(On Diff #20111)	It's checked at the top of the function. We do nothing if we don't have a dom tree.
533 ↗	(On Diff #20111)	This is a good idea. Hadn't thought of that interface. Will definitely have to look into that.
574 ↗	(On Diff #20111)	Will fix.

To be clear, is this a request for me to evaluate this before submission? I really don't see the need given that a) the current implementation is fast, and b) this is an experimental flag.

Yes, it is. My feeling is that this is not particularly complicated, and if it is, then there's something I'm not considering properly.

To be clear, this is a very important missing capability, and I want to make sure that we don't prematurely optimize here: inconsistencies here will be *very* user visible, and I'm concerned that with your approach, small local code changes will cause drastic changes in the output by turning on and off this logic. If a more general approach is not feasible, then so be it, I'll take what we can get, but I want to make his decision quantitatively -- and I don't want to start with a design that will need to be rewritten to make forward progress.

Why do you care that the cmp has only one use? Why not look at all of the branches that use the cmp result?

We could, and may someday. Doing so involves a potential increase in
compile time. I'm deliberately going for minimal compile time impact,
even at the loss of some effectiveness. I want to be able to turn this
on by default. Most of my actual motivating cases are really simple.

Please measure this. Unless this is really needed, please take it out. Users can change codes to introduce extra uses of comparison results in many ways, and doing so should not change the optimizers ability to make use of the information.

I would strongly prefer going with the fastest implementation now and
evaluating extensions at a later time. If you think it matters and is
cheap enough, feel free to contribute a patch. :)

But I don't know that it is the fastest implementation, or how much that matters. Fast matters, but it is not the only thing that matters. Using a DT walk with a cutoff might actually be faster. Stability of optimization results to small source changes is also really important. Consistency with other optimizer capabilities is also really important.

Let's work together on this. Yes, if you don't want to do this, I'll get to it in a couple of months.

I think all of the issues you raised should be evaluated and addressed. I'm even willing to do most of the work to do so. However, I would really strongly prefer doing that *in tree*. You have access to workloads I don't. So does Nick. So do other interested parties. What I want is to check in a shared implementation (or two, or three), and give each of us a chance to measure on our workloads. Running with a set of flags is generally much easier than building a custom build with a potentially tweaked patch and keeping track of which patch generated which set of results.

Okay. However, being asked to try an alternate implementation technique is part of the code review process. I think there are very good reasons to prefer this alternate strategy if it is practical...

Now, on the technical side again, I think we're coming from two different perspectives. My overwhelming concern is compile time. We've tried various attempts at value property propagation in the past and all of them have failed due to that issue. What I'm aiming to do is find *something* which can at least catch the simple cases and can be turned on without impacting performance. I'd like to push that as far as possible, but that's a second order concern. You seem to be focused more on the effectiveness side, with a second order interest in performance. Given past history, I don't think that's the right approach to take here.

This seems a bit unfair. I am concerned with compile time, but...

ValueTracking is slow currently, and the way it is used by InstCombine (etc.) make it very expensive. Really, we need a way of caching the results (instead of recomputing things each time each instruction is visited, which can happen many times as each instructions users are processed). To really fix the compile time issues here, we need to fix this.
Should we settle for a less-general analysis to work around (1)? It obviously depends on the price. But let's not compound our problems unnecessarily. Walking all the way up the DT is likely to be too expensive, but maybe searching N dominators (for some small N) is not too expensive. This is the implementation we should desire, however, because it will allow us to do the analysis in a general way, and share that code with the code that handles @llvm.assume. Then, once (1) is fixed, we might be able to raise N somewhat. If we cannot do this, even for a small N, then we'll have to settle for doing something even cheaper, at the price of effectiveness, consistency, and future development time. We might choose to do this, but let's not go there without evidence that it is necessary. Otherwise, if there is some value of N that hits your use cases and meets your compile-time impact targets, let's begin as we mean to go on.

Also, can you give an example of a code change that concerns you? At least on the surface this seems pretty stable to me. Early exits from functions and loops gives you information the optimizer can exploit. Anything else might, but might not. That seems entirely understandable.

I'm actually concerned about patterns like this:

$ cat /tmp/c.c 
void bar(); // pretend this can be inlined.
void har(); // and/or this too.

void foo1(int a) {
  if (a < 5)
    bar();
  else
    har();
}

bool foo2(int a) {
  if (a < 5)
    bar();
  else
    har();

  return a < 5;
}

where, as you might expect, in foo2 there is a second use of the comparison. Introducing this second use of the comparison should not affect whether or not ValueTracking can deduce some fact about 'a' when processing the inlined code.

clang++ -O3 -emit-llvm -S -o - /tmp/c.c

target datalayout = "E-m:e-i64:64-n32:64"
target triple = "powerpc64-unknown-linux-gnu"

define void @_Z4foo1i(i32 signext %a) #0 {
entry:
  %cmp = icmp slt i32 %a, 5
  br i1 %cmp, label %if.then, label %if.else

if.then:                                          ; preds = %entry
  tail call void @_Z3barv()
  br label %if.end

if.else:                                          ; preds = %entry
  tail call void @_Z3harv()
  br label %if.end

if.end:                                           ; preds = %if.else, %if.then
  ret void
}

declare void @_Z3barv() #0
declare void @_Z3harv() #0

define zeroext i1 @_Z4foo2i(i32 signext %a) #0 {
entry:
  %cmp = icmp slt i32 %a, 5
  br i1 %cmp, label %if.then, label %if.else

if.then:                                          ; preds = %entry
  tail call void @_Z3barv()
  br label %if.end

if.else:                                          ; preds = %entry
  tail call void @_Z3harv()
  br label %if.end

if.end:                                           ; preds = %if.else, %if.then
  ret i1 %cmp
}

a user can "do nothing wrong", but by repeating the comparison as part of some other expression, cause known bit information to change inside otherwise unrelated blocks. This would be, as I'm sure you appreciate, quite unfortunate.

Thanks again!

Updated to incorporate code comments. Additional feature flags and implementation approaches to follow separately.

Look at all uses of the comparison as requested by Hal.

On the sqlite3 example I used before, the difference between the two configurations is indistinguishable. I can quantify on a quieter box later, but on my local machine I don't see much difference.

Hal - preliminary numbers on the dominance based approach make it look like that might be viable. I need to cleanup my hacked implementation and fix at least one bug, but my initial impression is that it appears to be fast enough. I'm a bit surprised by this, but it's what the data says.

Worth discussing is whether we should actually incorporate both. They seem to be good at different cases and maybe we should exploit that. The use based approach is good for grabbing facts far from the use site (i.e. function entry guards), while the dominance approach is likely more powerful since it's not restricted to direct uses, but is probably best limited to a local scope. Thoughts?

In D7708#134110, @reames wrote:

Hal - preliminary numbers on the dominance based approach make it look like that might be viable. I need to cleanup my hacked implementation and fix at least one bug, but my initial impression is that it appears to be fast enough. I'm a bit surprised by this, but it's what the data says.

Worth discussing is whether we should actually incorporate both. They seem to be good at different cases and maybe we should exploit that. The use based approach is good for grabbing facts far from the use site (i.e. function entry guards), while the dominance approach is likely more powerful since it's not restricted to direct uses, but is probably best limited to a local scope. Thoughts?

Yes, perhaps with a FIXME if there is some other "right" way to fix the problem. I've been thinking of something like LVI that keeps track of known bits. What do you think?

Updated to include a cleaned up dom tree based implementation. This version should allow performance comparisons between the two approaches.

With the inline comments addressed, I think this is okay to check in, but I'll defer to someone more familiar with the area for an LGTM.

Optional: ou might want to add some negative tests too -- where optimizing based on a dominating condition would be a miscompile.

lib/Analysis/ValueTracking.cpp
638 ↗	(On Diff #21537)	`DominatorTree::dominates` asserts `BBE.isSingleEdge()`, so you should check that before calling `dominates`.
1051 ↗	(On Diff #21537)	Whitespace is a little off (elsewhere too), please run `clang-format` before checkin.
test/Transforms/InstCombine/dom-conditions.ll
63 ↗	(On Diff #21537)	This is an interesting example -- we should be able to fold the addition completely. (Not commenting on this change) Can `instcombine` be taught to do something if `(KnownZero \| KnowOne).isAllOnesValue()` (i.e. we have complete information about a value)?

Before anyone bothers running test against this, please let me upload one more patch. I took another look and realized that I have a return where I should have a continue or break. (for the dominating check) The good news (I guess) is that I introduced this bug when adding features Hal requested, so my original numbers are still good. Everything after that is meaningless of course. :(

Updated to fix the early return bug. Unfortunately, I'm now getting the performance impact I really expected to see all along. Short version: We will need to be very selective about this.

Baseline (sqlite -03): 16.128-16.805

Note the high variance. Also, NEW BASELINE NUMBERS. I have no idea why the baseline changed by 50%. I've rebased on ToT, but nothing else should have changed.

Both Dom & Value based (unlimited): 18.416, 18.427 (ouch!)
Value only (unlimited): 17.349, 17.492
Dom only (unlimited): 17.462, 17.613
Value only (< 20 uses): 16.516 - 17.107
Dom + Value (< 20 each): 17.512 - 18.054
Value only (< 4 uses): 16.540-16.927
Dom only (< 4 blocks): 16.991 - 17.366

My take away here is that the dominance walk based implementation is more expensive than the value based one.

@Hal, can you give me a LGTM here? I'd really prefer to work on this incrementally in tree with proper review at each change. In particular, that likely would have caught my return typo issue and saved us all some time.

@sanjoy - will fix up whitespace and add a negative test or two

Address Sanjoy's review comments.

LGTM, thanks!

test/Transforms/InstCombine/dom-conditions.ll
50 ↗	(On Diff #21611)	Extra blank lines (if you want to separate these into sections, please add a comment, don't just leave more blank lines).
154 ↗	(On Diff #21611)	Extra blank line.

This revision is now accepted and ready to land.Mar 10 2015, 1:35 PM

Closed by commit rL231879: Infer known bits from dominating conditions (authored by reames). · Explain WhyMar 10 2015, 3:49 PM

This revision was automatically updated to reflect the committed changes.

Sorry for taking so long to get back to you. This has been put on the
back burner for the moment.

To answer your question, I don't believe this line of work will help
you. KnownBits would need to infer that some specific bit is non-zero
to prove that the value in question is non-zero. Unless I'm misreading
your example, you don't know this. You really want something more along
the lines of a constant range analysis pass. We have something like that
in LVI today, but I've given absolutely no thought to how to use that in
alias analysis. Daniel's concerns seem entirely reasonable to me and I
suspect would be prohibitive of any attempt before the pass manager has
been *finally* finished. (Feel free to go bug/harass Chandler!)

To give a quick update on this line of work as a whole:

Nick was able to run some internal tests for me. His overall results

showed that a) compile time wasn't too badly impacted (2-3% on average),
but b) lots of bugs were exposed by enabling this. I haven't had a
chance to go stomp all those bugs and would definitely welcome help here.

I need to get around to implementing a couple of known performance

tweaks and completing the branch checking. Currently the code is
somewhat incomplete (but functional.)

I plan to pursue turning this on by default once the previous two

issues are addressed.

I've also been looking in to other approaches for solving the same

issues. I see no reason to only solve this one way and have been trying
to get some of the easy cases dealt with. Examples include:
http://reviews.llvm.org/D9312, http://reviews.llvm.org/D9763,
https://llvm.org/bugs/show_bug.cgi?id=23333.

Philip

aleksandr.popov mentioned this in D152838: [ValueTracking] Add range computation from dominating conditions.Jun 13 2023, 11:26 AM

Revision Contents

Path

Size

llvm/

trunk/

lib/

Analysis/

ValueTracking.cpp

212 lines

test/

Transforms/

InstCombine/

dom-conditions.ll

152 lines

Diff 21654

llvm/trunk/lib/Analysis/ValueTracking.cpp

Show All 33 Lines
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/MathExtras.h"		#include "llvm/Support/MathExtras.h"
#include <cstring>		#include <cstring>
using namespace llvm;		using namespace llvm;
using namespace llvm::PatternMatch;		using namespace llvm::PatternMatch;

const unsigned MaxDepth = 6;		const unsigned MaxDepth = 6;

		/// Enable an experimental feature to leverage information about dominating
		/// conditions to compute known bits. The individual options below control how
		/// hard we search. The defaults are choosen to be fairly aggressive. If you
		/// run into compile time problems when testing, scale them back and report
		/// your findings.
		static cl::opt<bool> EnableDomConditions("value-tracking-dom-conditions",
		cl::Hidden, cl::init(false));

		// This is expensive, so we only do it for the top level query value.
		// (TODO: evaluate cost vs profit, consider higher thresholds)
		static cl::opt<unsigned> DomConditionsMaxDepth("dom-conditions-max-depth",
		cl::Hidden, cl::init(1));

		/// How many dominating blocks should be scanned looking for dominating
		/// conditions?
		static cl::opt<unsigned> DomConditionsMaxDomBlocks("dom-conditions-dom-blocks",
		cl::Hidden,
		cl::init(20000));

		// Controls the number of uses of the value searched for possible
		// dominating comparisons.
		static cl::opt<unsigned> DomConditionsMaxUses("dom-conditions-max-uses",
		cl::Hidden, cl::init(2000));

		// If true, don't consider only compares whose only use is a branch.
		static cl::opt<bool> DomConditionsSingleCmpUse("dom-conditions-single-cmp-use",
		cl::Hidden, cl::init(false));

/// Returns the bitwidth of the given scalar or pointer type (if unknown returns		/// Returns the bitwidth of the given scalar or pointer type (if unknown returns
/// 0). For vector types, returns the element type's bitwidth.		/// 0). For vector types, returns the element type's bitwidth.
static unsigned getBitWidth(Type *Ty, const DataLayout &DL) {		static unsigned getBitWidth(Type *Ty, const DataLayout &DL) {
if (unsigned BitWidth = Ty->getScalarSizeInBits())		if (unsigned BitWidth = Ty->getScalarSizeInBits())
return BitWidth;		return BitWidth;

return DL.getPointerTypeSizeInBits(Ty);		return DL.getPointerTypeSizeInBits(Ty);
}		}
▲ Show 20 Lines • Show All 415 Lines • ▼ Show 20 Lines

template<typename LHS, typename RHS>		template<typename LHS, typename RHS>
inline match_combine_or<BinaryOp_match<LHS, RHS, Instruction::Xor>,		inline match_combine_or<BinaryOp_match<LHS, RHS, Instruction::Xor>,
BinaryOp_match<RHS, LHS, Instruction::Xor>>		BinaryOp_match<RHS, LHS, Instruction::Xor>>
m_c_Xor(const LHS &L, const RHS &R) {		m_c_Xor(const LHS &L, const RHS &R) {
return m_CombineOr(m_Xor(L, R), m_Xor(R, L));		return m_CombineOr(m_Xor(L, R), m_Xor(R, L));
}		}

		/// Compute known bits in 'V' under the assumption that the condition 'Cmp' is
		/// true (at the context instruction.) This is mostly a utility function for
		/// the prototype dominating conditions reasoning below.
		static void computeKnownBitsFromTrueCondition(Value V, ICmpInst Cmp,
		APInt &KnownZero,
		APInt &KnownOne,
		const DataLayout &DL,
		unsigned Depth, const Query &Q) {
		Value *LHS = Cmp->getOperand(0);
		Value *RHS = Cmp->getOperand(1);
		// TODO: We could potentially be more aggressive here. This would be worth
		// evaluating. If we can, explore commoning this code with the assume
		// handling logic.
		if (LHS != V && RHS != V)
		return;

		const unsigned BitWidth = KnownZero.getBitWidth();

		switch (Cmp->getPredicate()) {
		default:
		// We know nothing from this condition
		break;
		// TODO: implement unsigned bound from below (known one bits)
		// TODO: common condition check implementations with assumes
		// TODO: implement other patterns from assume (e.g. V & B == A)
		case ICmpInst::ICMP_SGT:
		if (LHS == V) {
		APInt KnownZeroTemp(BitWidth, 0), KnownOneTemp(BitWidth, 0);
		computeKnownBits(RHS, KnownZeroTemp, KnownOneTemp, DL, Depth + 1, Q);
		if (KnownOneTemp.isAllOnesValue() \|\| KnownZeroTemp.isNegative()) {
		// We know that the sign bit is zero.
		KnownZero \|= APInt::getSignBit(BitWidth);
		}
		}
		break;
		case ICmpInst::ICMP_EQ:
		if (LHS == V)
		computeKnownBits(RHS, KnownZero, KnownOne, DL, Depth + 1, Q);
		else if (RHS == V)
		computeKnownBits(LHS, KnownZero, KnownOne, DL, Depth + 1, Q);
		else
		llvm_unreachable("missing use?");
		break;
		case ICmpInst::ICMP_ULE:
		if (LHS == V) {
		APInt KnownZeroTemp(BitWidth, 0), KnownOneTemp(BitWidth, 0);
		computeKnownBits(RHS, KnownZeroTemp, KnownOneTemp, DL, Depth + 1, Q);
		// The known zero bits carry over
		unsigned SignBits = KnownZeroTemp.countLeadingOnes();
		KnownZero \|= APInt::getHighBitsSet(BitWidth, SignBits);
		}
		break;
		case ICmpInst::ICMP_ULT:
		if (LHS == V) {
		APInt KnownZeroTemp(BitWidth, 0), KnownOneTemp(BitWidth, 0);
		computeKnownBits(RHS, KnownZeroTemp, KnownOneTemp, DL, Depth + 1, Q);
		// Whatever high bits in rhs are zero are known to be zero (if rhs is a
		// power of 2, then one more).
		unsigned SignBits = KnownZeroTemp.countLeadingOnes();
		if (isKnownToBeAPowerOfTwo(RHS, false, Depth + 1, Query(Q, Cmp), DL))
		SignBits++;
		KnownZero \|= APInt::getHighBitsSet(BitWidth, SignBits);
		}
		break;
		};
		}

		/// Compute known bits in 'V' from conditions which are known to be true along
		/// all paths leading to the context instruction. In particular, look for
		/// cases where one branch of an interesting condition dominates the context
		/// instruction. This does not do general dataflow.
		/// NOTE: This code is EXPERIMENTAL and currently off by default.
		static void computeKnownBitsFromDominatingCondition(Value *V, APInt &KnownZero,
		APInt &KnownOne,
		const DataLayout &DL,
		unsigned Depth,
		const Query &Q) {
		// Need both the dominator tree and the query location to do anything useful
		if (!Q.DT \|\| !Q.CxtI)
		return;
		Instruction Cxt = const_cast<Instruction >(Q.CxtI);

		// Avoid useless work
		if (auto VI = dyn_cast<Instruction>(V))
		if (VI->getParent() == Cxt->getParent())
		return;

		// Note: We currently implement two options. It's not clear which of these
		// will survive long term, we need data for that.
		// Option 1 - Try walking the dominator tree looking for conditions which
		// might apply. This works well for local conditions (loop guards, etc..),
		// but not as well for things far from the context instruction (presuming a
		// low max blocks explored). If we can set an high enough limit, this would
		// be all we need.
		// Option 2 - We restrict out search to those conditions which are uses of
		// the value we're interested in. This is independent of dom structure,
		// but is slightly less powerful without looking through lots of use chains.
		// It does handle conditions far from the context instruction (e.g. early
		// function exits on entry) really well though.

		// Option 1 - Search the dom tree
		unsigned NumBlocksExplored = 0;
		BasicBlock *Current = Cxt->getParent();
		while (true) {
		// Stop searching if we've gone too far up the chain
		if (NumBlocksExplored >= DomConditionsMaxDomBlocks)
		break;
		NumBlocksExplored++;

		if (!Q.DT->getNode(Current)->getIDom())
		break;
		Current = Q.DT->getNode(Current)->getIDom()->getBlock();
		if (!Current)
		// found function entry
		break;

		BranchInst *BI = dyn_cast<BranchInst>(Current->getTerminator());
		if (!BI \|\| BI->isUnconditional())
		continue;
		ICmpInst *Cmp = dyn_cast<ICmpInst>(BI->getCondition());
		if (!Cmp)
		continue;

		// We're looking for conditions that are guaranteed to hold at the context
		// instruction. Finding a condition where one path dominates the context
		// isn't enough because both the true and false cases could merge before
		// the context instruction we're actually interested in. Instead, we need
		// to ensure that the taken edge dominates the context instruction.
		BasicBlock *BB0 = BI->getSuccessor(0);
		BasicBlockEdge Edge(BI->getParent(), BB0);
		if (!Edge.isSingleEdge() \|\| !Q.DT->dominates(Edge, Q.CxtI->getParent()))
		continue;

		computeKnownBitsFromTrueCondition(V, Cmp, KnownZero, KnownOne, DL, Depth,
		Q);
		}

		// Option 2 - Search the other uses of V
		unsigned NumUsesExplored = 0;
		for (auto U : V->users()) {
		// Avoid massive lists
		if (NumUsesExplored >= DomConditionsMaxUses)
		break;
		NumUsesExplored++;
		// Consider only compare instructions uniquely controlling a branch
		ICmpInst *Cmp = dyn_cast<ICmpInst>(U);
		if (!Cmp)
		continue;

		if (DomConditionsSingleCmpUse && !Cmp->hasOneUse())
		continue;

		for (auto *CmpU : Cmp->users()) {
		BranchInst *BI = dyn_cast<BranchInst>(CmpU);
		if (!BI \|\| BI->isUnconditional())
		continue;
		// We're looking for conditions that are guaranteed to hold at the
		// context instruction. Finding a condition where one path dominates
		// the context isn't enough because both the true and false cases could
		// merge before the context instruction we're actually interested in.
		// Instead, we need to ensure that the taken edge dominates the context
		// instruction.
		BasicBlock *BB0 = BI->getSuccessor(0);
		BasicBlockEdge Edge(BI->getParent(), BB0);
		if (!Edge.isSingleEdge() \|\| !Q.DT->dominates(Edge, Q.CxtI->getParent()))
		continue;

		computeKnownBitsFromTrueCondition(V, Cmp, KnownZero, KnownOne, DL, Depth,
		Q);
		}
		}
		}

static void computeKnownBitsFromAssume(Value *V, APInt &KnownZero,		static void computeKnownBitsFromAssume(Value *V, APInt &KnownZero,
APInt &KnownOne, const DataLayout &DL,		APInt &KnownOne, const DataLayout &DL,
unsigned Depth, const Query &Q) {		unsigned Depth, const Query &Q) {
// Use of assumptions is context-sensitive. If we don't have a context, we		// Use of assumptions is context-sensitive. If we don't have a context, we
// cannot use them!		// cannot use them!
if (!Q.AC \|\| !Q.CxtI)		if (!Q.AC \|\| !Q.CxtI)
return;		return;

▲ Show 20 Lines • Show All 338 Lines • ▼ Show 20 Lines	if (Align)
KnownZero = APInt::getLowBitsSet(BitWidth, countTrailingZeros(Align));		KnownZero = APInt::getLowBitsSet(BitWidth, countTrailingZeros(Align));
else		else
KnownZero.clearAllBits();		KnownZero.clearAllBits();
KnownOne.clearAllBits();		KnownOne.clearAllBits();

// Don't give up yet... there might be an assumption that provides more		// Don't give up yet... there might be an assumption that provides more
// information...		// information...
computeKnownBitsFromAssume(V, KnownZero, KnownOne, DL, Depth, Q);		computeKnownBitsFromAssume(V, KnownZero, KnownOne, DL, Depth, Q);

		// Or a dominating condition for that matter
		if (EnableDomConditions && Depth <= DomConditionsMaxDepth)
		computeKnownBitsFromDominatingCondition(V, KnownZero, KnownOne, DL,
		Depth, Q);
return;		return;
}		}

// Start out not knowing anything.		// Start out not knowing anything.
KnownZero.clearAllBits(); KnownOne.clearAllBits();		KnownZero.clearAllBits(); KnownOne.clearAllBits();

// Limit search depth.		// Limit search depth.
// All recursive calls that increase depth must come after this.		// All recursive calls that increase depth must come after this.
if (Depth == MaxDepth)		if (Depth == MaxDepth)
return;		return;

// A weak GlobalAlias is totally unknown. A non-weak GlobalAlias has		// A weak GlobalAlias is totally unknown. A non-weak GlobalAlias has
// the bits of its aliasee.		// the bits of its aliasee.
if (GlobalAlias *GA = dyn_cast<GlobalAlias>(V)) {		if (GlobalAlias *GA = dyn_cast<GlobalAlias>(V)) {
if (!GA->mayBeOverridden())		if (!GA->mayBeOverridden())
computeKnownBits(GA->getAliasee(), KnownZero, KnownOne, DL, Depth + 1, Q);		computeKnownBits(GA->getAliasee(), KnownZero, KnownOne, DL, Depth + 1, Q);
return;		return;
}		}

// Check whether a nearby assume intrinsic can determine some known bits.		// Check whether a nearby assume intrinsic can determine some known bits.
computeKnownBitsFromAssume(V, KnownZero, KnownOne, DL, Depth, Q);		computeKnownBitsFromAssume(V, KnownZero, KnownOne, DL, Depth, Q);

		// Check whether there's a dominating condition which implies something about
		// this value at the given context.
		if (EnableDomConditions && Depth <= DomConditionsMaxDepth)
		computeKnownBitsFromDominatingCondition(V, KnownZero, KnownOne, DL, Depth,
		Q);

Operator *I = dyn_cast<Operator>(V);		Operator *I = dyn_cast<Operator>(V);
if (!I) return;		if (!I) return;

APInt KnownZero2(KnownZero), KnownOne2(KnownOne);		APInt KnownZero2(KnownZero), KnownOne2(KnownOne);
switch (I->getOpcode()) {		switch (I->getOpcode()) {
default: break;		default: break;
case Instruction::Load:		case Instruction::Load:
if (MDNode *MD = cast<LoadInst>(I)->getMetadata(LLVMContext::MD_range))		if (MDNode *MD = cast<LoadInst>(I)->getMetadata(LLVMContext::MD_range))
▲ Show 20 Lines • Show All 1,950 Lines • Show Last 20 Lines

llvm/trunk/test/Transforms/InstCombine/dom-conditions.ll

				; RUN: opt -instcombine -value-tracking-dom-conditions=1 -S < %s \| FileCheck %s

				target datalayout = "e-p:64:64:64-p1:16:16:16-p2:32:32:32-p3:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"

				define i1 @test_cmp_ult(i64 %A) {
				; CHECK-LABEL: @test_cmp_ult
				entry:
				%cmp = icmp ult i64 %A, 64
				br i1 %cmp, label %taken, label %untaken

				taken:
				; CHECK-LABEL: taken:
				; CHECK-NEXT: ret i1 false
				%cmp2 = icmp ugt i64 %A, 64
				ret i1 %cmp2
				untaken:
				ret i1 true
				}

				define i1 @test_cmp_ule(i64 %A) {
				; CHECK-LABEL: @test_cmp_ule
				entry:
				%cmp = icmp ule i64 %A, 64
				br i1 %cmp, label %taken, label %untaken

				taken:
				; CHECK-LABEL: taken:
				; CHECK-NEXT: ret i1 false
				%cmp2 = icmp ugt i64 %A, 128
				ret i1 %cmp2
				untaken:
				ret i1 true
				}

				define i1 @test_cmp_sgt(i32 %A) {
				; CHECK-LABEL: @test_cmp_sgt
				entry:
				%cmp = icmp sgt i32 %A, 10
				br i1 %cmp, label %taken, label %untaken

				taken:
				; CHECK-LABEL: taken:
				; CHECK-NEXT: ret i1 true
				%cmp2 = icmp sgt i32 %A, -1
				ret i1 %cmp2
				untaken:
				ret i1 true
				}

				define i64 @test_add_zero_bits(i64 %A) {
				; CHECK-LABEL: @test_add_zero_bits
				entry:
				%cmp = icmp eq i64 %A, 2
				br i1 %cmp, label %taken, label %untaken

				taken:
				; CHECK-LABEL: taken:
				; CHECK-NEXT: ret i64 3
				%add = add i64 %A, 1
				ret i64 %add
				untaken:
				ret i64 %A
				}

				define i64 @test_add_nsw(i64 %A) {
				; CHECK-LABEL: @test_add_nsw
				entry:
				%cmp = icmp ult i64 %A, 20
				br i1 %cmp, label %taken, label %untaken

				taken:
				; CHECK-LABEL: taken:
				; CHECK-NEXT: %add = add nuw nsw i64 %A, 1
				; CHECK-NEXT: ret i64 %add
				%add = add i64 %A, 1
				ret i64 %add
				untaken:
				ret i64 %A
				}

				; After sinking the instructions into the if block, check that we
				; can simplify some of them using dominating conditions.
				define i32 @test_add_zero_bits_sink(i32 %x) nounwind ssp {
				; CHECK-LABEL: @test_add_zero_bits_sink(
				; CHECK-NOT: sdiv i32
				entry:
				%a = add nsw i32 %x, 16
				%b = sdiv i32 %a, %x
				%cmp = icmp ult i32 %x, 7
				br i1 %cmp, label %bb1, label %bb2

				bb1:
				; CHECK-LABEL: bb1:
				; CHECK-NEXT: or i32 %x, 16
				; CHECK-NEXT: udiv i32
				ret i32 %b

				bb2:
				ret i32 %x
				}

				; A condition in the same block gives no information
				define i32 @test_neg1(i32 %x) nounwind ssp {
				; CHECK-LABEL: @test_neg1
				; CHECK: add
				; CHECK: sdiv
				; CHECK: icmp
				; CHECK: select
				entry:
				%a = add nsw i32 %x, 16
				%b = sdiv i32 %a, %x
				%cmp = icmp ult i32 %x, 7
				%ret = select i1 %cmp, i32 %a, i32 %b
				ret i32 %ret
				}

				; A non-dominating edge gives no information
				define i32 @test_neg2(i32 %x) {
				; CHECK-LABEL: @test_neg2
				entry:
				%cmp = icmp ult i32 %x, 7
				br i1 %cmp, label %bb1, label %merge

				bb1:
				br label %merge

				merge:
				; CHECK-LABEL: merge:
				; CHECK: icmp
				; CHECK: select
				%cmp2 = icmp ult i32 %x, 7
				%ret = select i1 %cmp2, i32 %x, i32 0
				ret i32 %ret
				}

				; A unconditional branch expressed as a condition one gives no
				; information (and shouldn't trip any asserts.)
				define i32 @test_neg3(i32 %x) {
				; CHECK-LABEL: @test_neg3
				entry:
				%cmp = icmp ult i32 %x, 7
				br i1 %cmp, label %merge, label %merge
				merge:
				; CHECK-LABEL: merge:
				; CHECK: icmp
				; CHECK: select
				%cmp2 = icmp ult i32 %x, 7
				%ret = select i1 %cmp2, i32 %x, i32 0
				ret i32 %ret
				}

				declare i32 @bar()