This is an archive of the discontinued LLVM Phabricator instance.

Handle non-constant shifts in computeKnownBits, and use computeKnownBits for constant folding in InstCombine/Simplify
ClosedPublic

Authored by hfinkel on Sep 8 2015, 2:12 PM.

Details

Summary

First, the motivation: LLVM currently does not realize that:

(((2072 >> (L == 0)) >> 7) & 1) == 0

where L is some arbitrary value. Whether you right-shift 2072 by 7 or by 8, the lowest-order bit is always zero. There are obviously several ways to go about fixing this, but the generic solution pursued in this patch is to teach computeKnownBits something about shifts by a non-constant amount. Currently, we give up completely on these. Instead, in cases where we know something about the low-order bits of the shift-amount operand, we can combine (AND together) the associated restrictions for all shift amounts consistent with that knowledge. As a further generalization, I refactored all of the logic for all three kinds of shifts to have this capability. This works well in the above case, for example, because the dynamic shift amount can only be 0 or 1, and thus we can say a lot about the known bits of the result.
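
As a rough illustration of the approach (a standalone sketch, not the patch itself; the fixed 64-bit width, the struct, and the function names are invented for this example), the loop below ANDs together the known bits of the result over every shift amount consistent with what is known about the shift-amount operand:

#include <cstdint>
#include <cstdio>

// A bit of a value may be known zero, known one, or unknown (neither set).
struct Known {
  uint64_t Zero; // bits known to be 0
  uint64_t One;  // bits known to be 1
};

// Known bits of a 64-bit lshr whose shift amount is only partially known.
Known knownLshr(Known Val, Known Amt) {
  Known Result = {~0ULL, ~0ULL}; // start "fully known"; ANDing only weakens this
  for (unsigned S = 0; S < 64; ++S) {
    // Skip shift amounts that contradict the known bits of Amt.
    if ((S & Amt.Zero) != 0 || (S & Amt.One) != Amt.One)
      continue;
    // Shifting right by S makes the top S bits of the result known zero.
    uint64_t TopS = S ? ~(~0ULL >> S) : 0;
    Result.Zero &= (Val.Zero >> S) | TopS;
    Result.One &= Val.One >> S;
  }
  // If no shift amount below the bit width was compatible, the shift is
  // undefined; report nothing known so the two sets stay disjoint.
  if (Result.Zero & Result.One)
    Result = {0, 0};
  return Result;
}

int main() {
  Known Amt = {~1ULL, 0};           // (L == 0) is 0 or 1, so bits 1..63 are known zero
  Known Val = {~2072ULL, 2072ULL};  // the constant 2072 is fully known
  Known R = knownLshr(Val, Amt);
  // Bit 7 of (2072 >> (L == 0)) is known zero, so ((... >> 7) & 1) folds to 0.
  printf("bit 7 is known zero: %d\n", (int)((R.Zero >> 7) & 1));
  return 0;
}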

This brings us to the second part of this patch: Even when we know all of the bits of a value via computeKnownBits, nothing currently constant-folds the result. The patch introduces the necessary code into InstCombine and InstSimplify (a minimal sketch of the fold follows the list below). I've added it to both because:

  1. InstCombine won't automatically pick up the associated logic in InstSimplify (InstCombine uses InstSimplify, but not via the API that passes in the original instruction).
  2. Putting the logic in InstCombine allows the resulting simplifications to become part of the iterative worklist.
  3. Putting the logic in InstSimplify allows the resulting simplifications to be used everywhere else that calls SimplifyInstruction (inlining, unrolling, and many others).
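
Here is a minimal sketch of the fold itself (not the patch's exact code; it uses the 2015-era computeKnownBits signature from llvm/Analysis/ValueTracking.h, and the helper name is invented):

#include "llvm/ADT/APInt.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"

using namespace llvm;

// If computeKnownBits proves every bit of V, return the equivalent constant;
// otherwise return nullptr and fall through to the existing simplifications.
static Constant *foldIfAllBitsKnown(Value *V, const DataLayout &DL) {
  Type *Ty = V->getType();
  if (!Ty->isIntegerTy()) // the fold only applies to integer-typed values
    return nullptr;
  unsigned BitWidth = Ty->getIntegerBitWidth();
  APInt KnownZero(BitWidth, 0), KnownOne(BitWidth, 0);
  computeKnownBits(V, KnownZero, KnownOne, DL);
  // Every bit is either known zero or known one, so the value must be KnownOne.
  if ((KnownZero | KnownOne).isAllOnesValue())
    return ConstantInt::get(Ty, KnownOne);
  return nullptr;
}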

And this requires a small change to our definition of an ephemeral value so that we don't break the test case from r246696 (where the icmp feeding the @llvm.assume is also feeding a br). Under the old definition, the icmp would not be considered ephemeral (because it is used by the br), but this causes the assume to remove itself (in addition to simplifying the branch structure), and it seems more useful to prevent that from happening.

Please review, and thanks again!

Diff Detail

Repository
rL LLVM

Event Timeline

hfinkel updated this revision to Diff 34250.Sep 8 2015, 2:12 PM
hfinkel retitled this revision from to Handle non-constant shifts in computeKnownBits, and use computeKnownBits for constant folding in InstCombine/Simplify.
hfinkel updated this object.
hfinkel added reviewers: sanjoy, majnemer, reames.
hfinkel set the repository for this revision to rL LLVM.
hfinkel added a subscriber: llvm-commits.
hfinkel updated this revision to Diff 34270.Sep 8 2015, 3:06 PM

Include the update for a regression test I missed in the last patch (load-combine-metadata.ll). Once we simplify based on known bits, if we have range metadata that restricts a value to a single constant, then we'll use it to constant-fold the result. This is not what the test was supposed to be testing, so I've just changed the ranges on the metadata in the test.

Roman Divacky asked on IRC about statistics from self-hosting. Here are some:

During a self-host (PPC64/Linux - LLVM + Clang + compiler-rt), computeKnownBits returns non-trivial results for shifts with dynamic shift amounts:

shl - About 350k times
lshr - About 9k times
ashr - About 125 times

And the logic that constant-folds values when we know all of the bits fires:

InstSimplify - About 3.5k times
InstCombine - About 2k times
sanjoy edited edge metadata.Sep 9 2015, 8:33 PM

Some comments inline:

lib/Analysis/ValueTracking.cpp
965

Should we cap BitWidth at some reasonable value (like 128) so that the loop below does not run for too long in the worst case?

987

Why not start KnownZero and KnownOne as allOnesValue and then unconditionally

KnownZero &= KZF(KnownZero2, ShiftAmt);
KnownOne  &= KOF(KnownOne2, ShiftAmt);

in the loop?

If no shift amount is consistent with ShiftAmtKZ and ShiftAmtKO then KnownZero and KnownOne will remain allOnesValue; but that's fine since we proved that the shift's result is undef, and it's allowed to be 0 and -1 at the same time.

1180

Why not just return APIntOps::ashr(KnownZero, ShiftAmt)?

1188

Why not just return APIntOps::ashr(KnownOne, ShiftAmt)?

jmolloy added inline comments.
lib/Analysis/ValueTracking.cpp
983

It seems we could do slightly better here: if the shift-amount operand is known to be nonzero, we know that we're shifting by at least 1:

if (isKnownNonZero(I->getOperand(1), DL, Depth + 1, Q)) { // operand 1 is the shift amount
  KnownZero = KZF(KnownZero, 1);
  KnownOne = KOF(KnownOne, 1);
}

I have a similar patch that I was just about to send upstream, but obviously it conflicts with yours and yours handles many more cases.

Would you be able to put this test in too or should I wait until you've committed this and add it myself?

regehr added a subscriber: regehr.Sep 23 2015, 2:30 AM

Hi Hal, I just discovered this code. I've been working on a related patch, based on the imprecisions that Souper found, which I posted about on llvm-dev.

My strategy was simply to find the imprecisions with the highest profile count and fix those, plus some random other cases that were easy.

I'd like to pick the best parts of my patch and get them integrated with yours, if that is somehow possible. I'm totally new to Phabricator, so for now I'll just link to my patch. Any comments about how best to proceed would be appreciated.

http://www.cs.utah.edu/~regehr/knownbits1.txt

I should add that my patch has no measurable effect on the time taken to build LLVM/clang/compiler-rt. Also, a Debug build of the patched compiler passes all tests.

Also, the clang that gets compiled by a patched compiler is about 4KB smaller than one built with a clang lacking my patch, so it is certainly enabling a few optimizations to fire.

Hi Duncan,

Of course I agree in principle -- but keep in mind that Souper's computation of known bits is effectively optimal, and that my patch eliminated the vast majority of observed imprecisions.

Also I have implemented precise abstract transfer functions for things like this in the past, and always found them hard to get right! No fun at all.

But really, I was hoping to simply take Hal's patch and add in the parts of mine that fail to overlap with his! For example his patch does not address bswap or ctpop.

John

hfinkel updated this revision to Diff 35792.Sep 25 2015, 9:21 PM
hfinkel edited edge metadata.
hfinkel removed rL LLVM as the repository for this revision.

Updated to reflect review suggestions.

lib/Analysis/ValueTracking.cpp
965

I don't think so. Our canonicalization sometimes creates large integer types, and there's nothing really expensive in the loop (and it is linear in the bit width). If this shows up on a profile somewhere because of large integer sizes, I'll certainly change my mind.

983

Either way is fine, but in the name of keeping changes small, it is probably better if you just add it yourself.

987

We actually can't return non-disjoint bit sets, even for undef, without breaking other code (there are at least two asserts in ValueTracking.cpp that check this explicitly, and I'm afraid of how much else).

That aside, I agree. I'll update the loop to use that form with a check at the end.
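
Concretely, the end-of-loop check presumably amounts to something like this (a sketch; the wrapper function is invented, and the variable names follow the discussion rather than the final code):

#include "llvm/ADT/APInt.h"

using namespace llvm;

// If no shift amount was compatible with the known bits of the shift-amount
// operand, the accumulated masks are still all-ones and conflict, so clear
// them to keep the known-zero and known-one sets disjoint.
static void clearConflictingKnownBits(APInt &KnownZero, APInt &KnownOne) {
  if ((KnownZero & KnownOne) != 0) {
    KnownZero.clearAllBits();
    KnownOne.clearAllBits();
  }
}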

1180

Good idea (the logic was like this before, but using ashr certainly seems better).

1188

Good idea.
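
For ashr, the agreed-upon form presumably ends up looking something like this (a sketch assuming the patch's KZF/KOF callback shape; the exact captures and surrounding code are guesses):

#include "llvm/ADT/APInt.h"

using namespace llvm;

// Inside the AShr case, the two callbacks can simply arithmetic-shift the
// corresponding known-bit mask, which propagates the sign-bit knowledge into
// the vacated high bits.
auto KZF = [](const APInt &KnownZero, unsigned ShiftAmt) {
  return APIntOps::ashr(KnownZero, ShiftAmt);
};
auto KOF = [](const APInt &KnownOne, unsigned ShiftAmt) {
  return APIntOps::ashr(KnownOne, ShiftAmt);
};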

reames edited edge metadata.Sep 27 2015, 8:52 AM

Drive by review.

lib/Analysis/InstructionSimplify.cpp
4032

As a future enhancement, extending this to vectors and floating point types would make sense. A future patch is fine.

lib/Analysis/ValueTracking.cpp
959

Reading the code, it's really not clear what KnownX vs. KnownX2 mean. Are there better names that could be used here? Same with KZF and KOF. At a minimum, there's some documentation needed here.

1114

It's not immediately clear to me that the new code handles all of the cases the old code did. It would have been much easier to review if you'd first refactored out the helper function with the existing code and then added the new functionality. I'm not asking you to do that now, but doing it that way would have made the review easier.

lib/Transforms/InstCombine/InstructionCombining.cpp
2728

Doesn't instcombine call SimplifyInstruction internally? If so, why do we need to duplicate this block of code here?

hfinkel added inline comments.Sep 28 2015, 9:05 AM
lib/Analysis/InstructionSimplify.cpp
4032

Agreed.

lib/Analysis/ValueTracking.cpp
959

I'll improve this.

1114

Noted and agreed.

lib/Transforms/InstCombine/InstructionCombining.cpp
2728

It does, in a sense, but it does not call SimplifyInstruction directly. It calls the various helper functions that only get the opcode and operands, and so there's no instruction on which to call computeKnownBits.
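
To illustrate the distinction (signatures as in the 2015-era llvm/Analysis/InstructionSimplify.h; the wrapper function names here are invented):

#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

// The opcode-and-operands path InstCombine's visitors use: no Instruction*
// reaches the helper, so there is nothing to hand to computeKnownBits.
Value *viaOpcodeHelpers(BinaryOperator *I, const DataLayout &DL) {
  return SimplifyBinOp(I->getOpcode(), I->getOperand(0), I->getOperand(1), DL);
}

// The whole-instruction path used by inlining, unrolling, and other callers:
// the instruction itself is available as a context for computeKnownBits.
Value *viaSimplifyInstruction(Instruction *I, const DataLayout &DL) {
  return SimplifyInstruction(I, DL);
}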

> I should add that my patch has no measurable effect on the time taken to build LLVM/clang/compiler-rt. Also, a Debug build of the patched compiler passes all tests.
>
> Also, the clang that gets compiled by a patched compiler is about 4KB smaller than one built with a clang lacking my patch, so it is certainly enabling a few optimizations to fire.

This sounds great. There is a lot of follow-up work to be done here. I'm somewhat fearful of adding more functionality to this patch; it is already pretty far-reaching in terms of potential complications. Once I get this settled, I think it would be best to post follow-up patches for review.

Hal, sounds great, I'll follow up. Philip Reames has offered to help, which is great since I'm not too familiar with the process or tools here!

hfinkel updated this revision to Diff 37199.Oct 12 2015, 5:02 PM
hfinkel edited edge metadata.

Rebased (and added a better comment explaining the function parameters of computeKnownBitsFromShiftOperator).

reames accepted this revision.Oct 14 2015, 10:48 AM
reames edited edge metadata.

LGTM w/minor comment addressed

Are you planning on implementing the follow-on suggestions? If not, we should make sure they get tracked either as TODOs or bugs.

lib/Analysis/ValueTracking.cpp
983

It would be more clearly correct to use the two temporaries for this calculation. The current code is correct, but slightly confusing.

This revision is now accepted and ready to land.Oct 14 2015, 10:49 AM
majnemer accepted this revision.Oct 23 2015, 9:53 AM
majnemer edited edge metadata.

LGTM

lib/Analysis/ValueTracking.cpp
974

Could you make this auto *SA?

1008–1014

InstSimplify already has logic to handle shift amounts >= bitwidth. Should we care whether or not computeKnownBits gives the same result?

hfinkel closed this revision.Oct 23 2015, 1:44 PM

I apologize for the delay, and thanks for the reviews! r251146.

lib/Analysis/ValueTracking.cpp
983

I added an additional comment about this before I committed.

1008–1014

I don't think that it can give the same result, because I can't return 'undef' here. We might be able to do that by returning conflicting KnownZero/KnownOne masks, but we'd need to fix some downstream code first.