This is an archive of the discontinued LLVM Phabricator instance.

Instcombine: try to avoid wasted work in ConstantFold
Needs ReviewPublic

Authored by escha on Mar 14 2016, 11:58 AM.

Download Raw Diff

Details

Reviewers

bogner
resistor
reames

Summary

This is a bit tricky, and if someone has a better idea, please feel free to suggest it!

InstCombine currently does the following:

Call ConstantFoldInstruction on the instruction, and replace it with a constant if appropriate. This involves (internally) iterating over its operands and optimizing each ConstantExpr operand with ConstantFoldConstantExpression, then calling ConstantFold on the instruction with operands.
If that didn't work, optimize each ConstantExpr operand with ConstantFoldConstantExpression.

However, if you have lots of instructions with ConstantExpr operands but which cannot be folded (ex: loads from constant locations in memory -- in our case, uniform buffers are a common case), this means you call ConstantFoldConstantExpression twice on every operand, every time, because the constant folding fails.

One possible solution (as written here) is to provide an option to ConstantFoldInstruction to tell it not to try to optimize the ConstantExpr arguments, and then to reorder 1) and 2) so that we optimize the ConstantExprs first. I don't know if this is the best option, but it saves ~3-4% of time in instcombine for us.

Diff Detail

Repository: rL LLVM

Event Timeline

escha updated this revision to Diff 50624.Mar 14 2016, 11:58 AM

escha retitled this revision from to Instcombine: try to avoid wasted work in ConstantFold.

escha updated this object.

escha added reviewers: resistor, bogner.

escha set the repository for this revision to rL LLVM.

escha added a subscriber: llvm-commits.

Please upload with full context; I can't review this in the current form.

In general, I'm bothered by this change. Let me back up and ask a question, why do we need to fold the operand of the constant explicitly? Isn't the canonical form fully folded? (i.e. why isn't the argument already folded?) Is there some subtlety specific to constantexprs here?

This revision now requires changes to proceed.Mar 15 2016, 5:23 PM

What git option do I use to get a diff with full context?

If it's already folded, why does InstCombine try to optimize it? I did this under the assumption that InstCombine was doing something reasonable (and just being redundant about it); are you saying InstCombine shouldn't be doing this at all?

Updated diff with some context.

In D18155#376016, @escha wrote:

If it's already folded, why does InstCombine try to optimize it? I did this under the assumption that InstCombine was doing something reasonable (and just being redundant about it); are you saying InstCombine shouldn't be doing this at all?

This is exactly what I'm asking. In general, our canonical form is to have constants fully folded. Given InstCombine is iterative, we only need to do a single folding step at a time to reach an optimal fixed point. Why does this code need to fold constantexprs specially? Is there a reason we can't have a loop per instruction which tries to fold constexpr operands and updates the instruction operands?

I believe this code is in the initial portion of instcombine that creates the initial worklist; it isn't run iteratively at all, at least if I remember right.

To try to be more clear, this code appears to:

Drop instructions that are dead.
If constant, fold the constant. [this step folds ConstExprs as part of its internals, because that's how ConstantFolding works]
If not constant-foldable, check to see if any ConstExpr operands are foldable [this is where we duplicate work]
Add to the initial instcombine worklist.

In D18155#376087, @escha wrote:

To try to be more clear, this code appears to:

Drop instructions that are dead.

If constant, fold the constant. [this step folds ConstExprs as part of its internals, because that's how ConstantFolding works]

If not constant-foldable, check to see if any ConstExpr operands are foldable [this is where we duplicate work]

Add to the initial instcombine worklist.

Why not do the following:

Drop instructions that are dead.
Check to see if any ConstExpr operands are foldable, and update the instruction
If constant, fold the constant. [no longer needs to consider constexprs]
Add to the initial instcombine worklist.

Also, steps 1-3 are repeated inside the InstCombine run loop. See InstCombiner::run. I believe the code AddReachableCodeToWorklist is merely intended to be an optimization to avoid adding trivially constant instructions to the worklist (i.e. reduce the memory consumed by the worklist).

Why not do the following:

Drop instructions that are dead.

Check to see if any ConstExpr operands are foldable, and update the instruction

If constant, fold the constant. [no longer needs to consider constexprs]

Add to the initial instcombine worklist.

that's literally what this patch does

You're correct, I got confused by the changes to the ConstantFolding interface.

What happens if you *only* reorder the two steps? (i.e. do we see cases where we have non-foldable constant exprs?) What performance impact do you see there?

What's really confusing me about this code is that my expectation would be that constant expressions were fully folded *at construction* and that there would be very few places (linking two modules?) which would introduce new foldable constructs in a previously fully folded constant expr. If that's correct, why aren't we conicalizing in those presumable very rare places? (i.e. why is this code in ConstantFoldingInstruction at *all*?)

Yeah, I don't understand that part either!

Reordering them alone isn't enough because the slow part isn't the creation of a new simplified ConstExpr; it's running the simplify function, which takes time even if it ends up doing nothing, AFAIK.

Ah okay, I think now (looking at this a bit more) I understand why this cost keeps showing up in the profile. Like you said, it also happens during the main InstCombine run as well. Specifically, we have lots of loads and intrinsics with all constant operands that can't be folded (since they're runtime, not compile-time constants). So when running ConstantFoldInstruction, it's forced to call SimplifyConstantExpr on every operand, even though that ends up doing absolutely nothing. The results then get thrown away because you can't constant-fold a load to something whose value isn't known at compile time.

This cost shows up about 3 times in InstCombine:

~4% of time: constantfoldinstruction on init (this patch tries to eliminate that cost)
~4% of time: constantfoldconstantexpr on init (in this patch, but not saved)
~4% of time: constantfoldinstruction during the main loop

My guess is most of these are doing absolutely nothing useful, but I don't know enough about constexprs or the design of instcombine to understand why it's done this way.

In D18155#376179, @escha wrote:

My guess is most of these are doing absolutely nothing useful, but I don't know enough about constexprs or the design of instcombine to understand why it's done this way.

My suspicion is that you've stumbled onto a deeper issue with the design of constexpr canonicalization. I think it would be good to raise this on llvm-dev for further discussion. I suspect there's a better design here which could address InstCombine and all other users of ConstantFolding in a single go. In particular, it *really* bothers me that ConstantFolding is doing a recursive search in this case.

p.s. If you want to split out the reordering change, I'd be happy to sign off on that part without waiting for the broader design discussion. Just make sure you make the change in the main loop as well. That's obviously a good idea.

Resigning from a stale review. Feel free to re-add if thread ever revived.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 25 2020, 9:08 AM

nikic mentioned this in D75634: [ConstantFolding] Accept FoldedOps cache as parameter.Mar 4 2020, 12:38 PM

I know I'm a bit late here, but wondering if you have any IR / test case which can be used for perf testing? I've created D75634 as a different approach to this issue, but don't have a test case for this myself.

Revision Contents

Path

Size

include/

llvm/

Analysis/

ConstantFolding.h

5 lines

lib/

Analysis/

ConstantFolding.cpp

9 lines

Transforms/

InstCombine/

InstructionCombining.cpp

24 lines

Diff 50624

include/llvm/Analysis/ConstantFolding.h

Context not available.
	/// Note that this fails if not all of the operands are constant. Otherwise,	/// Note that this fails if not all of the operands are constant. Otherwise,
	/// this function can only fail when attempting to fold instructions like loads	/// this function can only fail when attempting to fold instructions like loads
	/// and stores, which have no constant expression form.	/// and stores, which have no constant expression form.
		/// Only attempt to constant-fold ConstantExpr operands if FoldOperands
		/// is set to true; set this to false if they are already known to be folded.
	Constant ConstantFoldInstruction(Instruction I, const DataLayout &DL,	Constant ConstantFoldInstruction(Instruction I, const DataLayout &DL,
	const TargetLibraryInfo *TLI = nullptr);	const TargetLibraryInfo *TLI = nullptr,
		bool FoldOperands = true);

	/// ConstantFoldConstantExpression - Attempt to fold the constant expression	/// ConstantFoldConstantExpression - Attempt to fold the constant expression
	/// using the specified DataLayout. If successful, the constant result is	/// using the specified DataLayout. If successful, the constant result is
Context not available.

lib/Analysis/ConstantFolding.cpp

Context not available.
	/// this function can only fail when attempting to fold instructions like loads	/// this function can only fail when attempting to fold instructions like loads
	/// and stores, which have no constant expression form.	/// and stores, which have no constant expression form.
	Constant llvm::ConstantFoldInstruction(Instruction I, const DataLayout &DL,	Constant llvm::ConstantFoldInstruction(Instruction I, const DataLayout &DL,
	const TargetLibraryInfo *TLI) {	const TargetLibraryInfo *TLI,
		bool FoldOperands) {
	// Handle PHI nodes quickly here...	// Handle PHI nodes quickly here...
	if (PHINode *PN = dyn_cast<PHINode>(I)) {	if (PHINode *PN = dyn_cast<PHINode>(I)) {
	Constant *CommonValue = nullptr;	Constant *CommonValue = nullptr;
Context not available.
	if (!C)	if (!C)
	return nullptr;	return nullptr;
	// Fold the PHI's operands.	// Fold the PHI's operands.
	if (ConstantExpr *NewC = dyn_cast<ConstantExpr>(C))	ConstantExpr *NewC = dyn_cast<ConstantExpr>(C);
		if (NewC && FoldOperands)
	C = ConstantFoldConstantExpression(NewC, DL, TLI);	C = ConstantFoldConstantExpression(NewC, DL, TLI);
	// If the incoming value is a different constant to	// If the incoming value is a different constant to
	// the one we saw previously, then give up.	// the one we saw previously, then give up.
Context not available.
	for (User::op_iterator i = I->op_begin(), e = I->op_end(); i != e; ++i) {	for (User::op_iterator i = I->op_begin(), e = I->op_end(); i != e; ++i) {
	Constant Op = cast<Constant>(i);	Constant Op = cast<Constant>(i);
	// Fold the Instruction's operands.	// Fold the Instruction's operands.
	if (ConstantExpr *NewCE = dyn_cast<ConstantExpr>(Op))	ConstantExpr *NewCE = dyn_cast<ConstantExpr>(Op);
		if (NewCE && FoldOperands)
	Op = ConstantFoldConstantExpression(NewCE, DL, TLI);	Op = ConstantFoldConstantExpression(NewCE, DL, TLI);

	Ops.push_back(Op);	Ops.push_back(Op);
Context not available.

lib/Transforms/InstCombine/InstructionCombining.cpp

Context not available.
	continue;	continue;
	}	}

	// ConstantProp instruction if trivially constant.
	if (!Inst->use_empty() &&
	(Inst->getNumOperands() == 0 \|\| isa<Constant>(Inst->getOperand(0))))
	if (Constant *C = ConstantFoldInstruction(Inst, DL, TLI)) {
	DEBUG(dbgs() << "IC: ConstFold to: " << *C << " from: "
	<< *Inst << '\n');
	Inst->replaceAllUsesWith(C);
	++NumConstProp;
	Inst->eraseFromParent();
	continue;
	}

	// See if we can constant fold its operands.	// See if we can constant fold its operands.
	for (User::op_iterator i = Inst->op_begin(), e = Inst->op_end(); i != e;	for (User::op_iterator i = Inst->op_begin(), e = Inst->op_end(); i != e;
	++i) {	++i) {
Context not available.
	}	}
	}	}

		// ConstantProp instruction if trivially constant.
		if (!Inst->use_empty() &&
		(Inst->getNumOperands() == 0 \|\| isa<Constant>(Inst->getOperand(0))))
		if (Constant *C = ConstantFoldInstruction(Inst, DL, TLI, false )) {
		DEBUG(dbgs() << "IC: ConstFold to: " << *C << " from: "
		<< *Inst << '\n');
		Inst->replaceAllUsesWith(C);
		++NumConstProp;
		Inst->eraseFromParent();
		continue;
		}

	InstrsForInstCombineWorklist.push_back(Inst);	InstrsForInstCombineWorklist.push_back(Inst);
	}	}

Context not available.