
Details

Reviewers

Commits

rG57776b8159eb: Handle resolvable branches in complete loop unroll heuristic.
rL243084: Handle resolvable branches in complete loop unroll heuristic.

Summary

Resolving a branch allows us to ignore blocks that won't be executed, and thus make our estimate more accurate.
This patch is intended to be applied after D10205 (though it could be applied independently).
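To make the summary concrete, here is a minimal sketch of a worklist walk over a toy CFG. It is not the patch's code. A branch whose condition has already been folded to a constant contributes only its taken successor, so blocks behind the dead edge are never visited or costed. `ToyBlock` and `liveBlocks` are hypothetical. In the patch, the equivalent walk runs over LLVM `BasicBlock`s in `BBWorklist`, using the `SimplifiedValues` map.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <vector>

// Hypothetical toy CFG block: Succs[0] is the "true" edge and Succs[1] the
// "false" edge of a conditional branch. KnownCond is set only when the
// analysis has folded that branch condition to a constant.
struct ToyBlock {
  std::vector<int> Succs;
  std::optional<bool> KnownCond;
};

// Return the set of blocks reachable from Entry. A resolved conditional
// branch enqueues only its taken successor; everything else enqueues all of
// its successors.
std::set<int> liveBlocks(const std::vector<ToyBlock> &CFG, int Entry) {
  std::vector<int> Worklist{Entry};
  std::set<int> Seen{Entry};
  for (std::size_t Idx = 0; Idx != Worklist.size(); ++Idx) {
    const ToyBlock &BB = CFG[Worklist[Idx]];
    std::vector<int> Next;
    if (BB.Succs.size() == 2 && BB.KnownCond)
      Next.push_back(BB.Succs[*BB.KnownCond ? 0 : 1]); // only the taken edge
    else
      Next = BB.Succs;                                 // cannot resolve: all edges
    for (int S : Next)
      if (Seen.insert(S).second)
        Worklist.push_back(S);
  }
  return Seen;
}
```

Consider block 0 branching to block 1 or block 2 on a condition known to be true, with both paths joining at block 3. Here `liveBlocks` returns {0, 1, 3}, so block 2 never enters the estimate.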

Diff Detail

Event Timeline

mzolotukhin updated this revision to Diff 27021. (Jun 2 2015, 6:23 PM)

mzolotukhin retitled this revision to Handle resolvable branches in complete loop unroll heuristic.

mzolotukhin updated this object.

mzolotukhin edited the test plan for this revision.

mzolotukhin added a reviewer: chandlerc.

mzolotukhin added a subscriber: Unknown Object (MLST).

mzolotukhin mentioned this in D10208: Add tests for full unroll heuristic: folding CFG, folding IVs. (Jun 2 2015, 6:27 PM)

mzolotukhin added a parent revision: D10205: Remove SCEVCache and FindConstantPointers from complete loop unrolling heuristic. (Jun 2 2015, 6:29 PM)

In two-operand instructions, use simplified values only if both operands have the same base pointer (or neither has one).

Totally the right direction, just some quick comments here. This patch will likely change somewhat based on my comments in D10205, but these are independent issues.

lib/Transforms/Scalar/LoopUnrollPass.cpp
528–529	I don't think the 'Cond' variable is buying you much. I would also write the condition differently, as the only null case is where we have no simplified value. The type must always be an int. So: if (Constant *SimpleCond = SimplifiedValues.lookup(BI->getCondition())) { BBWorklist.insert(BI->getSuccessor(cast<ConstantInt>(SimpleCond)->isZero() ? 1 : 0));
535–538	Same comment as above.
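The successor-selection rule behind both inline comments can be sketched outside LLVM. For a conditional branch, successor 0 is the true target and successor 1 the false target, so `isZero() ? 1 : 0` picks the taken edge. For a switch, the patch uses `findCaseValue(...).getCaseSuccessor()`. Assuming the usual `SwitchInst` semantics, that call yields the default destination when no case matches. The toy types and function names below are hypothetical stand-ins, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Conditional branch: successor 0 is taken when the folded condition is
// non-zero, successor 1 when it is zero (mirrors `isZero() ? 1 : 0`).
unsigned takenBranchSuccessor(uint64_t FoldedCond) {
  return FoldedCond == 0 ? 1 : 0;
}

// Hypothetical toy switch: case value -> destination block id, plus the
// destination used when no case matches.
struct ToySwitch {
  std::map<int64_t, int> Cases;
  int DefaultDest;
};

// Pick the single live successor of a switch whose condition folded to a
// constant; an unmatched value falls through to the default destination.
int takenSwitchSuccessor(const ToySwitch &SI, int64_t FoldedCond) {
  auto It = SI.Cases.find(FoldedCond);
  return It != SI.Cases.end() ? It->second : SI.DefaultDest;
}
```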

This revision now requires changes to proceed. (Jun 4 2015, 5:19 PM)

Rebase.
Rename SCEVSimplify to simplifyInstWithSCEV in visitCmpInst.
Fall back to the Base visitor if constant folding fails (and only then call simplifyInstWithSCEV).
Rewrite conditional expression for adding successors.
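The reordering in this update follows a cheapest-first fallback pattern. The already-simplified operands are constant-folded first, and the costlier simplification paths run only when that fails. A minimal sketch of that control-flow shape only: the strategies are hypothetical lambdas, and plain `long` results stand in for LLVM `Constant`s. It is not the pass's visitor code.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// A simplification strategy either produces a folded value or gives up.
using Strategy = std::function<std::optional<long>()>;

// Try strategies in order, cheapest first, and return the first result.
// Later (more expensive) strategies run only if every earlier one failed.
std::optional<long> firstSuccess(const std::vector<Strategy> &Strategies) {
  for (const Strategy &S : Strategies)
    if (std::optional<long> V = S())
      return V;
  return std::nullopt;
}
```

The point of the ordering is that an expensive strategy is never invoked once a cheaper one has already produced a value.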

mzolotukhin mentioned this in D10205: Remove SCEVCache and FindConstantPointers from complete loop unrolling heuristic. (Jun 5 2015, 10:44 PM)

Rebase on master.

Ping!

Looks really close, just need to sort out the offset simplification.

lib/Transforms/Scalar/LoopUnrollPass.cpp
367–369	This doesn't really seem correct to me... For example, multiplication such as "(B + X) * (B + Y)" does not simplify to "X * Y". Even addition doesn't simplify that way. I think it would be clearer (and correct) to explicitly handle the math that simplifies here rather than trying to share a routine. Test for subtraction and that LHS and RHS are both in the simplified-addresses mapping. If they are, you can write a comment about how the base addresses cancel and the result is the ConstantExpr difference of the offsets. I don't think you even need to consider falling through to the fancy InstSimplify logic here, because you can only do anything when you have boring constant offsets.
434	If you take my advice above, I would also inline the logic here. I'm not sure there is really going to be that much shared between the two when you're done. You can really only handle subtraction above, but here you can handle any comparison and really want to just fall back on the same logic.
543	I would leave a hint in this comment that this is the fallback if we can't directly fold the successor above.
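The point about subtraction can be shown with a toy model. This is a minimal sketch, assuming a simplified address is just a base identity plus a constant offset. The `SimplifiedAddress` struct here is a hypothetical stand-in, not the pass's own type, and only subtraction is folded. Offsets are kept as 32-bit and widened to 64 bits so the toy subtraction itself cannot overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical model: an unknown base pointer, identified here only by
// name, plus a known constant offset.
struct SimplifiedAddress {
  std::string Base;
  int32_t Offset;
};

// (B + X) - (B + Y) == X - Y because B cancels. No analogous identity exists
// for, e.g., (B + X) * (B + Y), which still depends on B, so subtraction is
// the only operator modeled.
std::optional<int64_t> foldSameBaseSub(const SimplifiedAddress &LHS,
                                       const SimplifiedAddress &RHS) {
  if (LHS.Base != RHS.Base)
    return std::nullopt; // different bases do not cancel
  return int64_t(LHS.Offset) - int64_t(RHS.Offset);
}
```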

This revision now requires changes to proceed. (Jul 14 2015, 2:59 PM)

Address review remarks:

Add tests.
Remove simplifyUsingOffsets routine.
.. and rebase on the trunk.

Hi Chandler,

I updated the patch - does it look good now?

I decided not to handle operands in (Base+Offset) form in visitBinaryOperator for now, because I think we could get into trouble even if we only look at SUB (I'm concerned about possible overflow errors here). I think it's possible to do, but we can do it later if we ever need it.

Tests are also added to the patch now (mostly they are from D10208).

Thanks,
Michael
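One way the overflow concern about folding SUB can show up: the exact difference of two offsets need not fit in the fixed-width type the subtraction is computed in. This is a minimal sketch that refuses to fold in that case. The 32-bit width is an assumption chosen purely for illustration, and `checkedSub32` is a hypothetical helper, not part of the pass.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Subtract two 32-bit offsets as a 32-bit operation would, but give up when
// the exact result does not fit in 32 bits (i.e. the subtraction overflows).
std::optional<int32_t> checkedSub32(int32_t X, int32_t Y) {
  int64_t Exact = int64_t(X) - int64_t(Y); // exact: cannot overflow in 64 bits
  if (Exact < INT32_MIN || Exact > INT32_MAX)
    return std::nullopt;
  return static_cast<int32_t>(Exact);
}
```

For example, `checkedSub32(INT32_MAX, -1)` is rejected because the exact answer, 2^31, is one past the largest 32-bit signed value.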

This looks fantastic. Nit-picky indenting and layout below. Feel free to commit with that addressed.

lib/Transforms/Scalar/LoopUnrollPass.cpp

436–444

I actually think it's better to nest these. Because we're using fall-through to "continue trying", the early exit is hard to spot here. It will also save some map lookups:

auto SimplifiedLHS = SimplifiedAddresses.find(LHS);
if (SimplifiedLHS != SimplifiedAddresses.end()) {
  auto SimplifiedRHS = SimplifiedAddresses.find(RHS);
  if (SimplifiedRHS != SimplifiedAddresses.end()) {
    SimplifiedAddress &LHSAddr = SimplifiedLHS->second;
    SimplifiedAddress &RHSAddr = SimplifiedRHS->second;
    if (LHSAddr.Base == RHSAddr.Base) {
      LHS = LHSAddr.Offset;
      ...

447–449

If you're going to skip the middle {}s (which I'm fine with) skip the outer ones as well.
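The nested-lookup shape suggested for lines 436–444 can be sketched with standard containers. This is a minimal sketch built on three assumptions. First, pointer values are named by strings. Second, only the `<` predicate is folded. Third, comparing two addresses with a common base reduces to comparing their offsets, which presumes the offsets do not wrap. `SimplifiedAddress`, `AddressMap` and `foldSameBaseLess` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins: a simplified address is an unknown base plus a
// known constant offset, keyed by the name of the pointer value.
struct SimplifiedAddress {
  std::string Base;
  int64_t Offset;
};
using AddressMap = std::unordered_map<std::string, SimplifiedAddress>;

// Fold `LHS < RHS` when both values map to addresses with the same base:
// B + X < B + Y reduces to X < Y. The lookups are nested, so RHS is looked up
// only if LHS was found and each value is looked up at most once.
// std::nullopt means "not folded here; fall back to the generic logic".
std::optional<bool> foldSameBaseLess(const AddressMap &SimplifiedAddresses,
                                     const std::string &LHS,
                                     const std::string &RHS) {
  auto SimplifiedLHS = SimplifiedAddresses.find(LHS);
  if (SimplifiedLHS != SimplifiedAddresses.end()) {
    auto SimplifiedRHS = SimplifiedAddresses.find(RHS);
    if (SimplifiedRHS != SimplifiedAddresses.end()) {
      const SimplifiedAddress &LHSAddr = SimplifiedLHS->second;
      const SimplifiedAddress &RHSAddr = SimplifiedRHS->second;
      if (LHSAddr.Base == RHSAddr.Base)
        return LHSAddr.Offset < RHSAddr.Offset;
    }
  }
  return std::nullopt;
}
```

Because the early exits are nested rather than relying on fall-through, the single "not folded" return at the end is easy to spot.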

This revision is now accepted and ready to land. (Jul 23 2015, 5:33 PM)

Closed by commit rL243084: Handle resolvable branches in complete loop unroll heuristic. (authored by mzolotukhin). (Jul 23 2015, 6:53 PM)

This revision was automatically updated to reflect the committed changes.

Thanks, Chandler! It's committed.

Michael

Diff 27021

lib/Transforms/Scalar/LoopUnrollPass.cpp

@@ 358 lines not shown @@ if (SCEVSimplify(&I))
       return true;
     Value *LHS = I.getOperand(0), *RHS = I.getOperand(1);
     if (!isa<Constant>(LHS))
       if (Constant *SimpleLHS = SimplifiedValues.lookup(LHS))
         LHS = SimpleLHS;
     if (!isa<Constant>(RHS))
       if (Constant *SimpleRHS = SimplifiedValues.lookup(RHS))
         RHS = SimpleRHS;
     Value *SimpleV = nullptr;
     const DataLayout &DL = I.getModule()->getDataLayout();
     if (auto FI = dyn_cast<FPMathOperator>(&I))
       SimpleV =
           SimplifyFPBinOp(I.getOpcode(), LHS, RHS, FI->getFastMathFlags(), DL);
     else
       SimpleV = SimplifyBinOp(I.getOpcode(), LHS, RHS, DL);

     if (Constant *C = dyn_cast_or_null<Constant>(SimpleV))
       SimplifiedValues[&I] = C;

@@ 35 lines not shown @@ bool visitLoad(LoadInst &I) {
     }

     Constant *CV = CDS->getElementAsConstant(Index);
     assert(CV && "Constant expected.");
     SimplifiedValues[&I] = CV;

     return true;
   }

+  bool visitCmpInst(CmpInst &I) {
+    if (SCEVSimplify(&I))
+      return true;
+    Value *LHS = I.getOperand(0), *RHS = I.getOperand(1);
+    // First try to handle simplified comparisons.
+    if (!isa<Constant>(LHS))
+      if (Constant *SimpleLHS = SimplifiedValues.lookup(LHS))
+        LHS = SimpleLHS;
+    if (!isa<Constant>(RHS))
+      if (Constant *SimpleRHS = SimplifiedValues.lookup(RHS))
+        RHS = SimpleRHS;
+    if (Constant *CLHS = dyn_cast<Constant>(LHS)) {
+      if (Constant *CRHS = dyn_cast<Constant>(RHS))
+        if (Constant *C = ConstantExpr::getCompare(I.getPredicate(), CLHS, CRHS)) {
+          SimplifiedValues[&I] = C;
+          return true;
+        }
+    }
+
+    return false;
+  }
 };
 } // namespace


 namespace {
 struct EstimatedUnrollCost {
   /// \brief Count the number of optimized instructions.
   unsigned NumberOfOptimizedInstructions;

   /// \brief Count the total number of instructions.
   unsigned UnrolledLoopSize;
 };
 }

 /// \brief Figure out if the loop is worth full unrolling.
@@ 55 lines not shown @@ for (unsigned Idx = 0; Idx != BBWorklist.size(); ++Idx) {
         NumberOfOptimizedInstructions += TTI.getUserCost(&I);

         // If unrolled body turns out to be too big, bail out.
         if (UnrolledLoopSize - NumberOfOptimizedInstructions >
             MaxUnrolledLoopSize)
           return None;
       }

+      TerminatorInst *TI = BB->getTerminator();
+
+      // Add in the live successors by first checking whether we have terminator
+      // that may be simplified based on the values simplified by this call.
+      if (BranchInst *BI = dyn_cast<BranchInst>(TI)) {
+        if (BI->isConditional()) {
+          Value *Cond = BI->getCondition();
+          if (ConstantInt *SimpleCond
+                = dyn_cast_or_null<ConstantInt>(SimplifiedValues.lookup(Cond))) {
+            BBWorklist.insert(BI->getSuccessor(SimpleCond->isZero() ? 1 : 0));
+            continue;
+          }
+        }
+      } else if (SwitchInst *SI = dyn_cast<SwitchInst>(TI)) {
+        Value *Cond = SI->getCondition();
+        if (ConstantInt *SimpleCond
+              = dyn_cast_or_null<ConstantInt>(SimplifiedValues.lookup(Cond))) {
+          BBWorklist.insert(SI->findCaseValue(SimpleCond).getCaseSuccessor());
+          continue;
+        }
+      }

       // Add BB's successors to the worklist.
       for (BasicBlock *Succ : successors(BB))
         if (L->contains(Succ))
           BBWorklist.insert(Succ);
     }

     // If we found no optimization opportunities on the first iteration, we
     // won't find them on later ones too.
     if (!NumberOfOptimizedInstructions)
@@ 372 lines not shown @@

This is an archive of the discontinued LLVM Phabricator instance.

Handle resolvable branches in complete loop unroll heuristic.
Closed, Public
