This is an archive of the discontinued LLVM Phabricator instance.

[Reassociate]: Add intermediate subtract instructions created while negating to be redone later for more reassociate opportunities
ClosedPublic

Authored by aditya_nandakumar on Aug 25 2015, 4:21 PM.

Details

Reviewers

chandlerc
majnemer
dberlin
llvm-commits
mcrosier

Summary

This tackles the same issue as http://reviews.llvm.org/D12096. Reassociate is currently unable to simplify expressions such as (2 * b - (5 * a - 3 * b)).
As David Majnemer pointed out, running Reassociate twice does simplify this expression.
Redoing the intermediate instructions created while breaking up a subtract (negating) can expose more opportunities for reassociation, and in this case simplifies the above expression to 5 * (b - a).
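
As a sanity check on the algebra in the summary, the two forms can be compared directly. This is only a sketch: the function names are hypothetical, and unsigned 32-bit arithmetic is used as a stand-in for LLVM integer operations without nsw/nuw flags, since both wrap modulo 2^32.

```cpp
#include <cassert>
#include <cstdint>

// Unsimplified form from the summary: 2*b - (5*a - 3*b).
// Three multiplies and two subtracts.
std::uint32_t originalExpr(std::uint32_t A, std::uint32_t B) {
  return 2u * B - (5u * A - 3u * B);
}

// Form the summary says Reassociate should reach: 5*(b - a).
// One subtract and one multiply.
std::uint32_t simplifiedExpr(std::uint32_t A, std::uint32_t B) {
  return 5u * (B - A);
}

// The identity holds for every input because unsigned arithmetic is
// arithmetic modulo 2^32, where distribution and regrouping are valid.
bool formsAgree(std::uint32_t A, std::uint32_t B) {
  return originalExpr(A, B) == simplifiedExpr(A, B);
}

// A few spot checks, including a case where b - a wraps around.
void checkSamples() {
  assert(formsAgree(3u, 7u));
  assert(formsAgree(7u, 3u));
  assert(formsAgree(0xFFFFFFFFu, 12345u));
}
```

Expanding by hand gives the same result: 2b - 5a + 3b = 5b - 5a = 5(b - a).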

Diff Detail

Repository: rL LLVM

Event Timeline

aditya_nandakumar updated this revision to Diff 33149.Aug 25 2015, 4:21 PM

aditya_nandakumar retitled this revision to [Reassociate]: Add intermediate subtract instructions created while negating to be redone later for more reassociate opportunities.

aditya_nandakumar updated this object.

aditya_nandakumar added reviewers: llvm-commits, majnemer.

aditya_nandakumar set the repository for this revision to rL LLVM.

This approach looks reasonable to me but I'd appreciate it if @dberlin or @chandlerc could take a look.

Thanks for working on this, Aditya. I tend to agree with David; I much prefer this solution over the InstCombine equivalent. I added a few minor nits, but overall this looks good.

lib/Transforms/Scalar/Reassociate.cpp
896	Perhaps, /// Also add intermediate instructions to the redo list that are modified while pushing the negates through adds. These will be revisited to see if additional opportunities have been exposed.
936	open up -> expose
2106	Perhaps something like: // If the negate was simplified, revisit the users to see if we can reassociate further.
2130	Perhaps something like: // If the negate was simplified, revisit the users to see if we can reassociate further.
test/Transforms/Reassociate/reassoc-intermediate-fnegs.ll
2	Please use the CHECK-LABEL directive.
17	CHECK-LABEL:

Thanks Chad. I will fix the comments shortly.
I think I might have found another convergence issue with reassociate. I expect when the pass finishes, running reassociate again should not make any changes (the output should already be in canonicalized form - please correct me if this is not a valid expectation). It currently takes three runs of reassociate for the output to converge while running secondary.ll (above change). Should this also be tackled in this change?

In D12345#233382, @aditya_nandakumar wrote:

I expect when the pass finishes, running reassociate again should not make any changes (the output should already be in canonicalized form - please correct me if this is not a valid expectation).

I think this should be the goal, but, as you're finding out, this isn't reality. IIRC, a similar question was asked and I believe David provided a similar comment. The approach you're taking seems to be moving us closer and that's a good thing. We just need to make sure we're not going overboard; we should only revisit things that have changed and only when that change is likely to expose other opportunities.
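
The expectation discussed here (a second run of the pass should change nothing) can be phrased as a small experiment: rerun a pass until it reports no change and count the runs. The sketch below uses a toy stand-in, not Reassociate itself; `foldFirstPair`, `foldToFixedPoint`, and `countChangingRuns` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <vector>

// Toy "pass": folds only the first adjacent pair of terms into their sum
// and reports whether it changed anything. It never revisits its own output,
// so one run can leave work behind.
bool foldFirstPair(std::vector<int> &Terms) {
  if (Terms.size() < 2)
    return false;
  Terms[0] += Terms[1];
  Terms.erase(Terms.begin() + 1);
  return true;
}

// Same rewrite, but it keeps revisiting what it just changed until nothing
// is left to fold, so a second run finds nothing to do.
bool foldToFixedPoint(std::vector<int> &Terms) {
  bool Changed = false;
  while (foldFirstPair(Terms))
    Changed = true;
  return Changed;
}

// Counts how many consecutive runs of Pass still change the state.
// An idempotent pass yields at most 1.
template <typename State, typename PassFn>
unsigned countChangingRuns(State &S, PassFn Pass, unsigned MaxRuns = 16) {
  assert(MaxRuns > 0 && "need at least one run");
  unsigned Runs = 0;
  while (Runs < MaxRuns && Pass(S))
    ++Runs;
  return Runs;
}
```

On the terms {1, 2, 3, 4}, the first toy pass needs three changing runs while the second needs one, which is the gap between the current behaviour and the stated goal.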

I see. I'll try to see what it takes for the above case to converge in one iteration.
The reason I asked is that I see a reduction in instruction count (in the assembly) in several tests when I change the pass pipeline to run two consecutive reassociates instead of just one. I will try to narrow down the patterns/cases that the second reassociate is exposing and/or the cases that instcombine is missing.

First, you can get it to converge in O(size of the largest SCC of the
expressions being evaluated). Right now, reassociate does not look
through phi nodes, so that size will be 1 :)
Thus, it is possible to converge in one iteration with the right ordering.

Second, rather than spend lots of time on that, i would suggest other
approaches. Jingyu recently suggested a global reassociation algorithm
(https://docs.google.com/document/d/1momWzKFf4D6h8H3YlfgKQ3qeZy5ayvMRh6yR-Xn2hUE/edit#heading=h.pc7256itmioz)

(These are extensions of the existing n-ary reassociate)

This is likely a much better approach than trying to make the local
reassociation pass reprocess things repeatedly, or even once,
because the *output* will be better :)

In particular,
A. I expect what is there now will not fixpoint in all cases, you'd
have to fix some things.
B. We already know the heuristics the local reassociation uses are
not only "bad for CSE" in a lot of cases, we know they are optimally
bad in a lot of cases

(IE it will transform things into the least canonical form).

For example:

; foo(a + c);
; foo(a + (b + c));

Reassociate on both:

RAIn: add i32 [ %a, #3] [ %c, #5]
RAOut: add i32 [ %c, #5] [ %a, #3]

and
for the second:
RAIn: add i32 [ %a, #3] [ %b, #4] [ %c, #5]
RAOut: add i32 [ %c, #5] [ %b, #4] [ %a, #3]

(IE c+ a, (c + b) + a)

The longer the expression, the worse it gets.

It is not possible to fix this without a global view of the
expressions, because you need to know "how did i pair the expressions
the last time i saw them" so you can pair them the same way.

Local reassociate, being a local algorithm, only has info about the
current expression chain, and thus, can't do something like this.

TL;DR While this patch seems great, doing a lot of work on local
reassociate is probably a mistake. Even if you get it to converge in
one iteration (which should be possible given the right processing
order), it'll still give not great results in a lot of cases.

This was also all discussed a few times on the mailing list, if you
look back over threads mentioning reassociate.
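
The pairing problem in the RAIn/RAOut example above can be reproduced with a small model. This is a sketch under stated assumptions, not the pass's actual code: `RankedOperand` and `emitLocalOrder` are hypothetical, and the model only assumes what the RAOut lines show, namely that operands are ordered by descending rank and then combined left to right.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// An operand together with the rank shown after '#' in the RAIn/RAOut dumps.
struct RankedOperand {
  std::string Name;
  unsigned Rank;
};

// Orders operands by descending rank, then builds a left-linear add chain,
// looking only at the current expression (no knowledge of other chains).
std::string emitLocalOrder(std::vector<RankedOperand> Ops) {
  assert(!Ops.empty() && "expression chain must have at least one operand");
  std::stable_sort(Ops.begin(), Ops.end(),
                   [](const RankedOperand &L, const RankedOperand &R) {
                     return L.Rank > R.Rank;
                   });
  std::string Expr = Ops.front().Name;
  for (std::size_t I = 1; I < Ops.size(); ++I) {
    if (I > 1)
      Expr = "(" + Expr + ")"; // parenthesize the chain built so far
    Expr += " + " + Ops[I].Name;
  }
  return Expr;
}
```

For a + c this yields "c + a", and for a + (b + c) it yields "(c + b) + a". The second result never contains "c + a" as a subexpression, so CSE has nothing to share between the two calls, which is the point being made about needing a global view.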

Thanks Daniel. I definitely missed that conversation on the mailing list and I see the drawbacks of the local reassociate. I'll try seeing what little needs to be done to converge in the couple cases that I have (one above) and a superficial look at why the second reassociate improves codegen.
Is there already an implementation for global reassociate?

Ping? Was there any further feedback on this? We're seeing pretty horrible regressions caused by this.

I'm confused ;-)

The last status i saw was: "I'll try seeing what little needs to be done to
converge in the couple cases that I have (one above) and a superficial look
at why the second reassociate improves codegen."

I saw no update on that ;-)

and
"Is there already an implementation for global reassociate?"

Which i missed.

The answer to this is "yes". N-Ary reassociate is already in tree, and
making it "better" shouldn't be that hard (if it turns out to be hard,
great, let's hack up local reassociate if we have to)

In D12345#288471, @dberlin wrote:

The last status i saw was: "I'll try seeing what little needs to be done to
converge in the couple cases that I have (one above) and a superficial look
at why the second reassociate improves codegen."

I saw no update on that ;-)

Would it be reasonable to stage that investigation? AFAICT the changes already proposed (with Chad's comments incorporated) are a strict improvement, and I at least am seeing pretty severe performance regressions from their absence. Would you be alright with going ahead and landing those?

and
"Is there already an implementation for global reassociate?"

Which i missed.

The answer to this is "yes". N-Ary reassociate is already in tree, and
making it "better" shouldn't be that hard (if it turns out to be hard,
great, let's hack up local reassociate if we have to)

For my use case, at least, N-ary reassociation is not really appropriate as I also heavily depend on reassociation of floating point arithmetic. So, I'm stuck with local reassociation, and the problem described here is making today's LLVM integer factors worse than last year's for me.

Thanks Owen. Sorry about not updating this. This patch caused some regressions on some tests and I hadn't fully figured out/isolated the regression.

In D12345#301762, @aditya_nandakumar wrote:

Thanks Owen. Sorry about not updating this. This patch caused some regressions on some tests and I hadn't fully figured out/isolated the regression.

What kinds of regressions? Can you replicate these by running Reassociate twice in an otherwise-standard pipeline? We really should get this figured out.

Danny, are you proposing that we extend N-ary reassociation to work on floating-point values?

There were both improvements and regressions in instruction count with this patch on our internal test suite.

Modifying our pass pipeline to run reassociate twice resulted in some differences in instruction count. On investigating why the second reassociate made a difference, I found that when we revisit instructions, valid expression tree roots sometimes don't get simplified (e.g. factorized) if they are revisited before dead instructions are erased, because the dead instructions still count as additional (false) uses.
This patch erases dead instructions before we redo the instructions. This improves codegen slightly (lower instruction count) and brings the Reassociate pass closer to being idempotent.
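
The ordering issue described here can be illustrated with a toy use-count model. This is a minimal sketch, not the code in the patch: `ToyInst` and `eraseDeadThenCollectRedo` are hypothetical, instructions are plain indices into a vector, and the redo list is assumed to hold no duplicates (as with a SetVector).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction: operands are indices of other instructions.
struct ToyInst {
  std::vector<std::size_t> Operands;
  unsigned NumUses = 0;        // how many live instructions use this one
  bool HasSideEffects = false; // e.g. a return; never trivially dead
  bool Erased = false;
};

// First erases every trivially dead instruction reachable from the redo
// list (dropping its uses of its operands, which may make them dead too),
// and only then returns the instructions that are still worth revisiting.
std::vector<std::size_t>
eraseDeadThenCollectRedo(std::vector<ToyInst> &Insts,
                         const std::vector<std::size_t> &RedoList) {
  std::vector<std::size_t> Pending(RedoList);
  while (!Pending.empty()) {
    std::size_t Idx = Pending.back();
    Pending.pop_back();
    assert(Idx < Insts.size() && "redo entry out of range");
    ToyInst &I = Insts[Idx];
    if (I.Erased || I.HasSideEffects || I.NumUses != 0)
      continue; // still live, or already handled
    I.Erased = true;
    for (std::size_t Op : I.Operands) {
      --Insts[Op].NumUses;   // the false use disappears
      Pending.push_back(Op); // the operand may now be dead as well
    }
  }
  std::vector<std::size_t> Survivors;
  for (std::size_t Idx : RedoList)
    if (!Insts[Idx].Erased)
      Survivors.push_back(Idx);
  return Survivors;
}
```

With a multiply used by both a dead add and a return, the multiply has two uses until the add is erased. Erasing first leaves it with a single use, so a later single-use check on the revisited instruction sees the true use count.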

mcrosier added inline comments.Dec 23 2015, 7:13 AM

lib/Transforms/Scalar/Reassociate.cpp
623	I assume this condition was removed because it will always be true, correct? If so, please commit this in isolation.
1943	How about: Remove dead instructions and if any operands are trivially dead add them to Insts so they will be removed as well.
1946	Ins -> Insts
1948	Can we use a for loop to loop over all the Instruction operands, rather than pushing/popping each operand onto/off of a SetVector?
2275	Please add a period. Comments should be written with proper capitalization, punctuation, etc.
2277	How about something like: Iterate over all instructions to be reevaluated and remove trivially dead instructions. If any operand of the trivially dead instruction becomes dead mark it for deletion as well. Continue this process until all trivially dead instructions have been removed.
2279	Please don't add extra curly brackets.
2289	Please add a period.

aditya_nandakumar marked 7 inline comments as done.Dec 23 2015, 11:09 AM

aditya_nandakumar added inline comments.

lib/Transforms/Scalar/Reassociate.cpp
623	Yes - I'll remove it and commit it separately

Updated based on feedback.

LGTM once the minor nits have been fixed.

lib/Transforms/Scalar/Reassociate.cpp
198	To conform to the surrounding coding style, would you mind adding argument names?
test/Transforms/Reassociate/factorize-again.ll
2 ↗	(On Diff #43550)	You can probably drop this comment.
4 ↗	(On Diff #43550)	Please use a CHECK-LABEL directive.
27 ↗	(On Diff #43550)	Drop comment
28 ↗	(On Diff #43550)	Remove dead "#0"

This revision is now accepted and ready to land.Dec 29 2015, 7:53 AM

Committed in r256773.

Aditya,
Once committed (r256773) please be sure to close the Phabricator review.

Chad

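Revision Contents

Path	Size
lib/Transforms/Scalar/Reassociate.cpp	49 lines
test/Transforms/Reassociate/fast-ReassociateVector.ll	10 lines
(file name not preserved in the archive)	2 lines
(file name not preserved in the archive)	4 lines
(file name not preserved in the archive)	6 lines
(file name not preserved in the archive)	6 lines
test/Transforms/Reassociate/reassoc-intermediate-fnegs.ll	30 lines
test/Transforms/Reassociate/secondary.ll	2 lines
test/Transforms/Reassociate/xor_reassoc.ll	4 lines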

Diff 33149

lib/Transforms/Scalar/Reassociate.cpp

[... 189 lines not shown ...]	bool CombineXorOpnd(Instruction *I, XorOpnd *Opnd1, XorOpnd *Opnd2,
APInt &ConstOpnd, Value *&Res);		APInt &ConstOpnd, Value *&Res);
bool collectMultiplyFactors(SmallVectorImpl<ValueEntry> &Ops,		bool collectMultiplyFactors(SmallVectorImpl<ValueEntry> &Ops,
SmallVectorImpl<Factor> &Factors);		SmallVectorImpl<Factor> &Factors);
Value *buildMinimalMultiplyDAG(IRBuilder<> &Builder,		Value *buildMinimalMultiplyDAG(IRBuilder<> &Builder,
SmallVectorImpl<Factor> &Factors);		SmallVectorImpl<Factor> &Factors);
Value *OptimizeMul(BinaryOperator *I, SmallVectorImpl<ValueEntry> &Ops);		Value *OptimizeMul(BinaryOperator *I, SmallVectorImpl<ValueEntry> &Ops);
Value *RemoveFactorFromExpression(Value *V, Value *Factor);		Value *RemoveFactorFromExpression(Value *V, Value *Factor);
void EraseInst(Instruction *I);		void EraseInst(Instruction *I);
void OptimizeInst(Instruction *I);		void OptimizeInst(Instruction *I);
Instruction *canonicalizeNegConstExpr(Instruction *I);		Instruction *canonicalizeNegConstExpr(Instruction *I);
};		};
}		}

XorOpnd::XorOpnd(Value *V) {		XorOpnd::XorOpnd(Value *V) {
assert(!isa<ConstantInt>(V) && "No ConstantInt");		assert(!isa<ConstantInt>(V) && "No ConstantInt");
OrigVal = V;		OrigVal = V;
Instruction *I = dyn_cast<Instruction>(V);		Instruction *I = dyn_cast<Instruction>(V);
[... 408 lines not shown ...]	for (unsigned OpIdx = 0; OpIdx < 2; ++OpIdx) { // Visit operands.
// This value has uses not accounted for by the expression, so it is		// This value has uses not accounted for by the expression, so it is
// not safe to modify. Mark it as being a leaf.		// not safe to modify. Mark it as being a leaf.
DEBUG(dbgs() << "ADD USES LEAF: " << *Op << " (" << Weight << ")\n");		DEBUG(dbgs() << "ADD USES LEAF: " << *Op << " (" << Weight << ")\n");
LeafOrder.push_back(Op);		LeafOrder.push_back(Op);
Leaves[Op] = Weight;		Leaves[Op] = Weight;
continue;		continue;
}		}
// No uses outside the expression, try morphing it.		// No uses outside the expression, try morphing it.
} else if (It != Leaves.end()) {		} else if (It != Leaves.end()) {
// Already in the leaf map.		// Already in the leaf map.
assert(Visited.count(Op) && "In leaf map but not visited!");		assert(Visited.count(Op) && "In leaf map but not visited!");

// Update the number of paths to the leaf.		// Update the number of paths to the leaf.
IncorporateWeight(It->second, Weight, Opcode);		IncorporateWeight(It->second, Weight, Opcode);

#if 0 // TODO: Re-enable once PR13021 is fixed.		#if 0 // TODO: Re-enable once PR13021 is fixed.
// The leaf already has one use from inside the expression. As we want		// The leaf already has one use from inside the expression. As we want
[... 256 lines not shown ...]	void Reassociate::RewriteExprTree(BinaryOperator *I,
for (unsigned i = 0, e = NodesToRewrite.size(); i != e; ++i)		for (unsigned i = 0, e = NodesToRewrite.size(); i != e; ++i)
RedoInsts.insert(NodesToRewrite[i]);		RedoInsts.insert(NodesToRewrite[i]);
}		}

/// Insert instructions before the instruction pointed to by BI,		/// Insert instructions before the instruction pointed to by BI,
/// that computes the negative version of the value specified. The negative		/// that computes the negative version of the value specified. The negative
/// version of the value is returned, and BI is left pointing at the instruction		/// version of the value is returned, and BI is left pointing at the instruction
/// that should be processed next by the reassociation pass.		/// that should be processed next by the reassociation pass.
static Value *NegateValue(Value *V, Instruction *BI) {		/// Also adds the intermediate instructions that it creates to be redone at the
		/// end to see if we have more opportunities to reassociate.
		static Value *NegateValue(Value *V, Instruction *BI,
		SetVector<AssertingVH<Instruction>> &ToRedo) {
if (Constant *C = dyn_cast<Constant>(V)) {		if (Constant *C = dyn_cast<Constant>(V)) {
if (C->getType()->isFPOrFPVectorTy()) {		if (C->getType()->isFPOrFPVectorTy()) {
return ConstantExpr::getFNeg(C);		return ConstantExpr::getFNeg(C);
}		}
return ConstantExpr::getNeg(C);		return ConstantExpr::getNeg(C);
}		}


// We are trying to expose opportunity for reassociation. One of the things		// We are trying to expose opportunity for reassociation. One of the things
// that we want to do to achieve this is to push a negation as deep into an		// that we want to do to achieve this is to push a negation as deep into an
// expression chain as possible, to expose the add instructions. In practice,		// expression chain as possible, to expose the add instructions. In practice,
// this means that we turn this:		// this means that we turn this:
// X = -(A+12+C+D) into X = -A + -12 + -C + -D = -12 + -A + -C + -D		// X = -(A+12+C+D) into X = -A + -12 + -C + -D = -12 + -A + -C + -D
// so that later, a: Y = 12+X could get reassociated with the -12 to eliminate		// so that later, a: Y = 12+X could get reassociated with the -12 to eliminate
// the constants. We assume that instcombine will clean up the mess later if		// the constants. We assume that instcombine will clean up the mess later if
// we introduce tons of unnecessary negation instructions.		// we introduce tons of unnecessary negation instructions.
//		//
if (BinaryOperator *I =		if (BinaryOperator *I =
isReassociableOp(V, Instruction::Add, Instruction::FAdd)) {		isReassociableOp(V, Instruction::Add, Instruction::FAdd)) {
// Push the negates through the add.		// Push the negates through the add.
I->setOperand(0, NegateValue(I->getOperand(0), BI));		I->setOperand(0, NegateValue(I->getOperand(0), BI, ToRedo));
I->setOperand(1, NegateValue(I->getOperand(1), BI));		I->setOperand(1, NegateValue(I->getOperand(1), BI, ToRedo));
if (I->getOpcode() == Instruction::Add) {		if (I->getOpcode() == Instruction::Add) {
I->setHasNoUnsignedWrap(false);		I->setHasNoUnsignedWrap(false);
I->setHasNoSignedWrap(false);		I->setHasNoSignedWrap(false);
}		}

// We must move the add instruction here, because the neg instructions do		// We must move the add instruction here, because the neg instructions do
// not dominate the old add instruction in general. By moving it, we are		// not dominate the old add instruction in general. By moving it, we are
// assured that the neg instructions we just inserted dominate the		// assured that the neg instructions we just inserted dominate the
// instruction we are about to insert after them.		// instruction we are about to insert after them.
//		//
I->moveBefore(BI);		I->moveBefore(BI);
I->setName(I->getName()+".neg");		I->setName(I->getName()+".neg");

		// Add the intermediate negates to the redo list as processing them later
		// could open up more reassociating opportunities.
		ToRedo.insert(I);
return I;		return I;
}		}

// Okay, we need to materialize a negated version of V with an instruction.		// Okay, we need to materialize a negated version of V with an instruction.
// Scan the use lists of V to see if we have one already.		// Scan the use lists of V to see if we have one already.
for (User *U : V->users()) {		for (User *U : V->users()) {
if (!BinaryOperator::isNeg(U) && !BinaryOperator::isFNeg(U))		if (!BinaryOperator::isNeg(U) && !BinaryOperator::isFNeg(U))
continue;		continue;
[... 24 lines not shown ...]	for (User *U : V->users()) {
}		}
TheNeg->moveBefore(InsertPt);		TheNeg->moveBefore(InsertPt);
if (TheNeg->getOpcode() == Instruction::Sub) {		if (TheNeg->getOpcode() == Instruction::Sub) {
TheNeg->setHasNoUnsignedWrap(false);		TheNeg->setHasNoUnsignedWrap(false);
TheNeg->setHasNoSignedWrap(false);		TheNeg->setHasNoSignedWrap(false);
} else {		} else {
TheNeg->andIRFlags(BI);		TheNeg->andIRFlags(BI);
}		}
		ToRedo.insert(TheNeg);
return TheNeg;		return TheNeg;
}		}

// Insert a 'neg' instruction that subtracts the value from zero to get the		// Insert a 'neg' instruction that subtracts the value from zero to get the
// negation.		// negation.
return CreateNeg(V, V->getName() + ".neg", BI, BI);		BinaryOperator *NewNeg = CreateNeg(V, V->getName() + ".neg", BI, BI);
		ToRedo.insert(NewNeg);
		return NewNeg;
}		}

/// Return true if we should break up this subtract of X-Y into (X + -Y).		/// Return true if we should break up this subtract of X-Y into (X + -Y).
static bool ShouldBreakUpSubtract(Instruction *Sub) {		static bool ShouldBreakUpSubtract(Instruction *Sub) {
// If this is a negation, we can't split it up!		// If this is a negation, we can't split it up!
if (BinaryOperator::isNeg(Sub) || BinaryOperator::isFNeg(Sub))		if (BinaryOperator::isNeg(Sub) || BinaryOperator::isFNeg(Sub))
return false;		return false;

[... 17 lines not shown ...]	if (Sub->hasOneUse() &&
isReassociableOp(VB, Instruction::Sub, Instruction::FSub)))		isReassociableOp(VB, Instruction::Sub, Instruction::FSub)))
return true;		return true;

return false;		return false;
}		}

/// If we have (X-Y), and if either X is an add, or if this is only used by an		/// If we have (X-Y), and if either X is an add, or if this is only used by an
/// add, transform this into (X+(0-Y)) to promote better reassociation.		/// add, transform this into (X+(0-Y)) to promote better reassociation.
static BinaryOperator *BreakUpSubtract(Instruction *Sub) {		static BinaryOperator *
		BreakUpSubtract(Instruction *Sub, SetVector<AssertingVH<Instruction>> &ToRedo) {
// Convert a subtract into an add and a neg instruction. This allows sub		// Convert a subtract into an add and a neg instruction. This allows sub
// instructions to be commuted with other add instructions.		// instructions to be commuted with other add instructions.
//		//
// Calculate the negative value of Operand 1 of the sub instruction,		// Calculate the negative value of Operand 1 of the sub instruction,
// and set it as the RHS of the add instruction we just made.		// and set it as the RHS of the add instruction we just made.
//		//
Value *NegVal = NegateValue(Sub->getOperand(1), Sub);		Value *NegVal = NegateValue(Sub->getOperand(1), Sub, ToRedo);
BinaryOperator *New = CreateAdd(Sub->getOperand(0), NegVal, "", Sub, Sub);		BinaryOperator *New = CreateAdd(Sub->getOperand(0), NegVal, "", Sub, Sub);
Sub->setOperand(0, Constant::getNullValue(Sub->getType())); // Drop use of op.		Sub->setOperand(0, Constant::getNullValue(Sub->getType())); // Drop use of op.
Sub->setOperand(1, Constant::getNullValue(Sub->getType())); // Drop use of op.		Sub->setOperand(1, Constant::getNullValue(Sub->getType())); // Drop use of op.
New->takeName(Sub);		New->takeName(Sub);

// Everyone now refers to the add instruction.		// Everyone now refers to the add instruction.
Sub->replaceAllUsesWith(New);		Sub->replaceAllUsesWith(New);
New->setDebugLoc(Sub->getDebugLoc());		New->setDebugLoc(Sub->getDebugLoc());
[... 898 lines not shown ...]	case Instruction::FMul:
break;		break;
}		}

if (Ops.size() != NumOps)		if (Ops.size() != NumOps)
return OptimizeExpression(I, Ops);		return OptimizeExpression(I, Ops);
return nullptr;		return nullptr;
}		}

/// Zap the given instruction, adding interesting operands to the work list.		/// Zap the given instruction, adding interesting operands to the work list.
void Reassociate::EraseInst(Instruction *I) {		void Reassociate::EraseInst(Instruction *I) {
assert(isInstructionTriviallyDead(I) && "Trivially dead instructions only!");		assert(isInstructionTriviallyDead(I) && "Trivially dead instructions only!");
SmallVector<Value*, 8> Ops(I->op_begin(), I->op_end());		SmallVector<Value*, 8> Ops(I->op_begin(), I->op_end());
// Erase the dead instruction.		// Erase the dead instruction.
ValueRankMap.erase(I);		ValueRankMap.erase(I);
RedoInsts.remove(I);		RedoInsts.remove(I);
I->eraseFromParent();		I->eraseFromParent();
// Optimize its operands.		// Optimize its operands.
SmallPtrSet<Instruction *, 8> Visited; // Detect self-referential nodes.		SmallPtrSet<Instruction *, 8> Visited; // Detect self-referential nodes.
for (unsigned i = 0, e = Ops.size(); i != e; ++i)		for (unsigned i = 0, e = Ops.size(); i != e; ++i)
if (Instruction *Op = dyn_cast<Instruction>(Ops[i])) {		if (Instruction *Op = dyn_cast<Instruction>(Ops[i])) {
// If this is a node in an expression tree, climb to the expression root		// If this is a node in an expression tree, climb to the expression root
// and add that since that's where optimization actually happens.		// and add that since that's where optimization actually happens.
[... 130 lines not shown ...]	void Reassociate::OptimizeInst(Instruction *I) {
// optimized for the most likely conditions.		// optimized for the most likely conditions.
if (I->getType()->isIntegerTy(1))		if (I->getType()->isIntegerTy(1))
return;		return;

// If this is a subtract instruction which is not already in negate form,		// If this is a subtract instruction which is not already in negate form,
// see if we can convert it to X+-Y.		// see if we can convert it to X+-Y.
if (I->getOpcode() == Instruction::Sub) {		if (I->getOpcode() == Instruction::Sub) {
if (ShouldBreakUpSubtract(I)) {		if (ShouldBreakUpSubtract(I)) {
Instruction *NI = BreakUpSubtract(I);		Instruction *NI = BreakUpSubtract(I, RedoInsts);
RedoInsts.insert(I);		RedoInsts.insert(I);
MadeChange = true;		MadeChange = true;
I = NI;		I = NI;
} else if (BinaryOperator::isNeg(I)) {		} else if (BinaryOperator::isNeg(I)) {
// Otherwise, this is a negation. See if the operand is a multiply tree		// Otherwise, this is a negation. See if the operand is a multiply tree
// and if this is not an inner node of a multiply tree.		// and if this is not an inner node of a multiply tree.
if (isReassociableOp(I->getOperand(1), Instruction::Mul) &&		if (isReassociableOp(I->getOperand(1), Instruction::Mul) &&
(!I->hasOneUse() ||		(!I->hasOneUse() ||
!isReassociableOp(I->user_back(), Instruction::Mul))) {		!isReassociableOp(I->user_back(), Instruction::Mul))) {
Instruction *NI = LowerNegateToMultiply(I);		Instruction *NI = LowerNegateToMultiply(I);
		// If the negate got simplified, add the users to be
		// redone later to see if we can reassociate further (especially
		// when we optimizing this instruction while redoing)
		for (User *U : NI->users()) {
		if (BinaryOperator *Tmp = dyn_cast<BinaryOperator>(U))
		RedoInsts.insert(Tmp);
		}
RedoInsts.insert(I);		RedoInsts.insert(I);
MadeChange = true;		MadeChange = true;
I = NI;		I = NI;
}		}
}		}
} else if (I->getOpcode() == Instruction::FSub) {		} else if (I->getOpcode() == Instruction::FSub) {
if (ShouldBreakUpSubtract(I)) {		if (ShouldBreakUpSubtract(I)) {
Instruction *NI = BreakUpSubtract(I);		Instruction *NI = BreakUpSubtract(I, RedoInsts);
RedoInsts.insert(I);		RedoInsts.insert(I);
MadeChange = true;		MadeChange = true;
I = NI;		I = NI;
} else if (BinaryOperator::isFNeg(I)) {		} else if (BinaryOperator::isFNeg(I)) {
// Otherwise, this is a negation. See if the operand is a multiply tree		// Otherwise, this is a negation. See if the operand is a multiply tree
// and if this is not an inner node of a multiply tree.		// and if this is not an inner node of a multiply tree.
if (isReassociableOp(I->getOperand(1), Instruction::FMul) &&		if (isReassociableOp(I->getOperand(1), Instruction::FMul) &&
(!I->hasOneUse() ||		(!I->hasOneUse() ||
!isReassociableOp(I->user_back(), Instruction::FMul))) {		!isReassociableOp(I->user_back(), Instruction::FMul))) {
		// If the negate got simplified, add the users to be
		// redone later to see if we can reassociate further (especially
		// when we optimizing this instruction while redoing)
Instruction *NI = LowerNegateToMultiply(I);		Instruction *NI = LowerNegateToMultiply(I);
		for (User *U : NI->users()) {
		if (BinaryOperator *Tmp = dyn_cast<BinaryOperator>(U))
		RedoInsts.insert(Tmp);
		}
RedoInsts.insert(I);		RedoInsts.insert(I);
MadeChange = true;		MadeChange = true;
I = NI;		I = NI;
}		}
}		}
}		}

// If this instruction is an associative binary operator, process it.		// If this instruction is an associative binary operator, process it.
if (!I->isAssociative()) return;		if (!I->isAssociative()) return;
BinaryOperator *BO = cast<BinaryOperator>(I);		BinaryOperator *BO = cast<BinaryOperator>(I);

// If this is an interior node of a reassociable tree, ignore it until we		// If this is an interior node of a reassociable tree, ignore it until we
// get to the root of the tree, to avoid N^2 analysis.		// get to the root of the tree, to avoid N^2 analysis.
unsigned Opcode = BO->getOpcode();		unsigned Opcode = BO->getOpcode();
if (BO->hasOneUse() && BO->user_back()->getOpcode() == Opcode)		if (BO->hasOneUse() && BO->user_back()->getOpcode() == Opcode) {
		// During the initial run we will get to the root of the tree.
		// But if we get here while we are redoing instructions, there is no
		// guarantee that the root will be visited. So Redo later
		if (BO->user_back() != BO)
		RedoInsts.insert(BO->user_back());
return;		return;
		}

// If this is an add tree that is used by a sub instruction, ignore it		// If this is an add tree that is used by a sub instruction, ignore it
// until we process the subtract.		// until we process the subtract.
if (BO->hasOneUse() && BO->getOpcode() == Instruction::Add &&		if (BO->hasOneUse() && BO->getOpcode() == Instruction::Add &&
cast<Instruction>(BO->user_back())->getOpcode() == Instruction::Sub)		cast<Instruction>(BO->user_back())->getOpcode() == Instruction::Sub)
return;		return;
if (BO->hasOneUse() && BO->getOpcode() == Instruction::FAdd &&		if (BO->hasOneUse() && BO->getOpcode() == Instruction::FAdd &&
cast<Instruction>(BO->user_back())->getOpcode() == Instruction::FSub)		cast<Instruction>(BO->user_back())->getOpcode() == Instruction::FSub)
[... 99 lines not shown ...]	for (BasicBlock::iterator II = BI->begin(), IE = BI->end(); II != IE; )
EraseInst(II++);		EraseInst(II++);
} else {		} else {
OptimizeInst(II);		OptimizeInst(II);
assert(II->getParent() == BI && "Moved to a different block!");		assert(II->getParent() == BI && "Moved to a different block!");
++II;		++II;
}		}

// If this produced extra instructions to optimize, handle them now.		// If this produced extra instructions to optimize, handle them now.
while (!RedoInsts.empty()) {		while (!RedoInsts.empty()) {
Instruction *I = RedoInsts.pop_back_val();		Instruction *I = RedoInsts.pop_back_val();
if (isInstructionTriviallyDead(I))		if (isInstructionTriviallyDead(I))
		mcrosier: How about something like: // Iterate over all instructions to be reevaluated and remove trivially dead instructions. If any operand of the trivially dead instruction becomes dead mark it for deletion as well. Continue this process until all trivially dead instructions have been removed.
EraseInst(I);		EraseInst(I);
else		else
		mcrosier: Please don't add extra curly brackets.
OptimizeInst(I);		OptimizeInst(I);
}		}
}		}

// We are done with the rank map.		// We are done with the rank map.
RankMap.clear();		RankMap.clear();
ValueRankMap.clear();		ValueRankMap.clear();

return MadeChange;		return MadeChange;
}		}
		mcrosier: Please add a period.
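
The redo-list loop discussed in the inline comments above can be tried outside LLVM. What follows is only a minimal sketch under stated assumptions: `ToyInst`, `isTriviallyDead`, and `drainRedoList` are hypothetical stand-ins written for this note, not the `Instruction`, `isInstructionTriviallyDead`, or `RedoInsts` code in Reassociate.cpp, and the point where the real pass would call `OptimizeInst` is reduced to a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy instruction; this is not an LLVM class.
struct ToyInst {
  std::vector<int> operands;   // indices of the instructions this one uses
  int numUses = 0;             // how many live instructions use this one
  bool hasSideEffects = false; // e.g. a store; never trivially dead
  bool erased = false;
};

// Trivially dead in this toy model: not yet erased, no users, no side effects.
bool isTriviallyDead(const ToyInst &I) {
  return !I.erased && I.numUses == 0 && !I.hasSideEffects;
}

// Drain the redo list. Erasing an instruction drops one use from each of its
// operands and queues those operands, so a chain of instructions that becomes
// dead is removed in the same loop. Returns the number of erased instructions.
int drainRedoList(std::vector<ToyInst> &insts, std::vector<int> redo) {
  int erasedCount = 0;
  while (!redo.empty()) {
    int idx = redo.back();
    redo.pop_back();
    assert(idx >= 0 && static_cast<std::size_t>(idx) < insts.size());
    ToyInst &I = insts[idx];
    if (!isTriviallyDead(I))
      continue; // A real pass would try to re-optimize the instruction here.
    I.erased = true;
    ++erasedCount;
    for (int op : I.operands) {
      --insts[op].numUses;
      redo.push_back(op); // The operand may have just become dead.
    }
  }
  return erasedCount;
}
```

For example, with a chain in which instruction 2 uses 1 and 1 uses 0, plus a side-effecting instruction 3 that also uses 0, queuing only instruction 2 erases 2 and then 1, while 0 survives because 3 still uses it.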

test/Transforms/Reassociate/fast-ReassociateVector.ll

Show All 10 Lines	; CHECK-NEXT: ret <4 x float> %tmp1
%mul1 = fmul fast <4 x float> %b, %c		%mul1 = fmul fast <4 x float> %b, %c
%add = fadd fast <4 x float> %mul, %mul1		%add = fadd fast <4 x float> %mul, %mul1
ret <4 x float> %add		ret <4 x float> %add
}		}

; Check that aab+aac is turned into a(a(b+c)).		; Check that aab+aac is turned into a(a(b+c)).
define <2 x float> @test2(<2 x float> %a, <2 x float> %b, <2 x float> %c) {		define <2 x float> @test2(<2 x float> %a, <2 x float> %b, <2 x float> %c) {
; CHECK-LABEL: @test2		; CHECK-LABEL: @test2
; CHECK-NEXT: fadd fast <2 x float> %c, %b		; CHECK-NEXT: [[TMP1:%tmp.*]] = fadd fast <2 x float> %c, %b
; CHECK-NEXT: fmul fast <2 x float> %a, %tmp2		; CHECK-NEXT: [[TMP2:%tmp.*]] = fmul fast <2 x float> %a, %a
; CHECK-NEXT: fmul fast <2 x float> %tmp3, %a		; CHECK-NEXT: fmul fast <2 x float> [[TMP2]], [[TMP1]]
; CHECK-NEXT: ret <2 x float>		; CHECK-NEXT: ret <2 x float>

%t0 = fmul fast <2 x float> %a, %b		%t0 = fmul fast <2 x float> %a, %b
%t1 = fmul fast <2 x float> %a, %t0		%t1 = fmul fast <2 x float> %a, %t0
%t2 = fmul fast <2 x float> %a, %c		%t2 = fmul fast <2 x float> %a, %c
%t3 = fmul fast <2 x float> %a, %t2		%t3 = fmul fast <2 x float> %a, %t2
%t4 = fadd fast <2 x float> %t1, %t3		%t4 = fadd fast <2 x float> %t1, %t3
ret <2 x float> %t4		ret <2 x float> %t4
▲ Show 20 Lines • Show All 98 Lines • ▼ Show 20 Lines	; CHECK-NEXT: ret <2 x float>
%e = fmul fast <2 x float> %a, %c		%e = fmul fast <2 x float> %a, %c
%f = fsub fast <2 x float> <float 0.000000e+00, float 0.000000e+00>, %e		%f = fsub fast <2 x float> <float 0.000000e+00, float 0.000000e+00>, %e
ret <2 x float> %f		ret <2 x float> %f
}		}

; Check xy+yx -> xy2.		; Check xy+yx -> xy2.
define <2 x double> @test11(<2 x double> %x, <2 x double> %y) {		define <2 x double> @test11(<2 x double> %x, <2 x double> %y) {
; CHECK-LABEL: @test11		; CHECK-LABEL: @test11
; CHECK-NEXT: %factor = fmul fast <2 x double> %y, <double 2.000000e+00, double 2.000000e+00>		; CHECK-NEXT: %factor = fmul fast <2 x double> %x, <double 2.000000e+00, double 2.000000e+00>
; CHECK-NEXT: %tmp1 = fmul fast <2 x double> %factor, %x		; CHECK-NEXT: %tmp1 = fmul fast <2 x double> %factor, %y
; CHECK-NEXT: ret <2 x double> %tmp1		; CHECK-NEXT: ret <2 x double> %tmp1

%1 = fmul fast <2 x double> %x, %y		%1 = fmul fast <2 x double> %x, %y
%2 = fmul fast <2 x double> %y, %x		%2 = fmul fast <2 x double> %y, %x
%3 = fadd fast <2 x double> %1, %2		%3 = fadd fast <2 x double> %1, %2
ret <2 x double> %3		ret <2 x double> %3
}		}


test/Transforms/Reassociate/fast-basictest.ll

Show First 20 Lines • Show All 102 Lines • ▼ Show 20 Lines	; CHECK: ret void
store float %t4, float* @ff		store float %t4, float* @ff
ret void		ret void
}		}

define float @test7(float %A, float %B, float %C) {		define float @test7(float %A, float %B, float %C) {
; CHECK-LABEL: @test7		; CHECK-LABEL: @test7
; CHECK-NEXT: fadd fast float %C, %B		; CHECK-NEXT: fadd fast float %C, %B
; CHECK-NEXT: fmul fast float %A, %A		; CHECK-NEXT: fmul fast float %A, %A
; CHECK-NEXT: fmul fast float %1, %tmp2		; CHECK-NEXT: fmul fast float %tmp3, %tmp2
; CHECK-NEXT: ret float		; CHECK-NEXT: ret float

%aa = fmul fast float %A, %A		%aa = fmul fast float %A, %A
%aab = fmul fast float %aa, %B		%aab = fmul fast float %aa, %B
%ac = fmul fast float %A, %C		%ac = fmul fast float %A, %C
%aac = fmul fast float %ac, %A		%aac = fmul fast float %ac, %A
%r = fadd fast float %aab, %aac		%r = fadd fast float %aab, %aac
ret float %r		ret float %r

test/Transforms/Reassociate/fast-fp-commute.ll

Show All 27 Lines	; CHECK-NEXT: ret float %3
%1 = fmul fast float %x, %y		%1 = fmul fast float %x, %y
%2 = fmul fast float %y, %x		%2 = fmul fast float %y, %x
%3 = fsub fast float %1, %2		%3 = fsub fast float %1, %2
ret float %3		ret float %3
}		}

define float @test3(float %x, float %y) {		define float @test3(float %x, float %y) {
; CHECK-LABEL: test3		; CHECK-LABEL: test3
; CHECK-NEXT: %factor = fmul fast float %y, 2.000000e+00		; CHECK-NEXT: %factor = fmul fast float %x, 2.000000e+00
; CHECK-NEXT: %tmp1 = fmul fast float %factor, %x		; CHECK-NEXT: %tmp1 = fmul fast float %factor, %y
; CHECK-NEXT: ret float %tmp1		; CHECK-NEXT: ret float %tmp1

%1 = fmul fast float %x, %y		%1 = fmul fast float %x, %y
%2 = fmul fast float %y, %x		%2 = fmul fast float %y, %x
%3 = fadd fast float %1, %2		%3 = fadd fast float %1, %2
ret float %3		ret float %3
}		}

test/Transforms/Reassociate/fast-multistep.ll

	; RUN: opt < %s -reassociate -S | FileCheck %s			; RUN: opt < %s -reassociate -S | FileCheck %s

	define float @fmultistep1(float %a, float %b, float %c) {			define float @fmultistep1(float %a, float %b, float %c) {
	; Check that aab+aac is turned into a(a(b+c)).			; Check that aab+aac is turned into a(a(b+c)).
	; CHECK-LABEL: @fmultistep1			; CHECK-LABEL: @fmultistep1
	; CHECK-NEXT: fadd fast float %c, %b			; CHECK-NEXT: [[TMP1:%tmp.*]] = fadd fast float %c, %b
	; CHECK-NEXT: fmul fast float %a, %tmp2			; CHECK-NEXT: [[TMP2:%tmp.*]] = fmul fast float %a, %a
	; CHECK-NEXT: fmul fast float %tmp3, %a			; CHECK-NEXT: fmul fast float [[TMP2]], [[TMP1]]
	; CHECK-NEXT: ret float			; CHECK-NEXT: ret float

	%t0 = fmul fast float %a, %b			%t0 = fmul fast float %a, %b
	%t1 = fmul fast float %a, %t0 ; a(ab)			%t1 = fmul fast float %a, %t0 ; a(ab)
	%t2 = fmul fast float %a, %c			%t2 = fmul fast float %a, %c
	%t3 = fmul fast float %a, %t2 ; a(ac)			%t3 = fmul fast float %a, %t2 ; a(ac)
	%t4 = fadd fast float %t1, %t3			%t4 = fadd fast float %t1, %t3
	ret float %t4			ret float %t4

test/Transforms/Reassociate/multistep.ll

	; RUN: opt < %s -reassociate -S | FileCheck %s			; RUN: opt < %s -reassociate -S | FileCheck %s

	define i64 @multistep1(i64 %a, i64 %b, i64 %c) {			define i64 @multistep1(i64 %a, i64 %b, i64 %c) {
	; Check that aab+aac is turned into a(a(b+c)).			; Check that aab+aac is turned into a(a(b+c)).
	; CHECK-LABEL: @multistep1(			; CHECK-LABEL: @multistep1(
	%t0 = mul i64 %a, %b			%t0 = mul i64 %a, %b
	%t1 = mul i64 %a, %t0 ; a(ab)			%t1 = mul i64 %a, %t0 ; a(ab)
	%t2 = mul i64 %a, %c			%t2 = mul i64 %a, %c
	%t3 = mul i64 %a, %t2 ; a(ac)			%t3 = mul i64 %a, %t2 ; a(ac)
	%t4 = add i64 %t1, %t3			%t4 = add i64 %t1, %t3
	; CHECK-NEXT: add i64 %c, %b			; CHECK-NEXT: [[TMP1:%tmp.*]] = add i64 %c, %b
	; CHECK-NEXT: mul i64 %a, %tmp{{.*}}			; CHECK-NEXT: [[TMP2:%tmp.*]] = mul i64 %a, %a
	; CHECK-NEXT: mul i64 %tmp{{.*}}, %a			; CHECK-NEXT: mul i64 [[TMP2]], [[TMP1]]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	ret i64 %t4			ret i64 %t4
	}			}

	define i64 @multistep2(i64 %a, i64 %b, i64 %c, i64 %d) {			define i64 @multistep2(i64 %a, i64 %b, i64 %c, i64 %d) {
	; Check that ab+ac+d is turned into a*(b+c)+d.			; Check that ab+ac+d is turned into a*(b+c)+d.
	; CHECK-LABEL: @multistep2(			; CHECK-LABEL: @multistep2(
	%t0 = mul i64 %a, %b			%t0 = mul i64 %a, %b

test/Transforms/Reassociate/reassoc-intermediate-fnegs.ll

This file was added.

				; RUN: opt < %s -reassociate -S | FileCheck %s
				; CHECK: faddsubAssoc1
				mcrosier: Please use the CHECK-LABEL directive.
				; CHECK: [[TMP1:%tmp.*]] = fmul fast half %a, 0xH4500
				; CHECK: [[TMP2:%tmp.*]] = fmul fast half %b, 0xH4500
				; CHECK: fsub fast half [[TMP2]], [[TMP1]]
				; CHECK: ret
				; Input is A op (B op C)
				define half @faddsubAssoc1(half %a, half %b) {
				%tmp1 = fmul fast half %b, 0xH4200 ; 3*b
				%tmp2 = fmul fast half %a, 0xH4500 ; 5*a
				%tmp3 = fmul fast half %b, 0xH4000 ; 2*b
				%tmp4 = fsub fast half %tmp2, %tmp1 ; 5 * a - 3 * b
				%tmp5 = fsub fast half %tmp3, %tmp4 ; 2 * b - ( 5 * a - 3 * b)
				ret half %tmp5 ; = 5 * (b - a)
				}

				; CHECK: faddsubAssoc2
				mcrosier: CHECK-LABEL:
				; CHECK: [[TMP1:%tmp.*]] = fmul fast half %a, 0xH4500
				; CHECK: [[TMP2:%tmp.*]] = fmul fast half %b, 0xH3C00
				; CHECK: fadd fast half [[TMP2]], [[TMP1]]
				; CHECK: ret
				; Input is (A op B) op C
				define half @faddsubAssoc2(half %a, half %b) {
				%tmp1 = fmul fast half %b, 0xH4200 ; 3*b
				%tmp2 = fmul fast half %a, 0xH4500 ; 5*a
				%tmp3 = fmul fast half %b, 0xH4000 ; 2*b
				%tmp4 = fadd fast half %tmp2, %tmp1 ; 5 * a + 3 * b
				%tmp5 = fsub fast half %tmp4, %tmp3 ; (5 * a + 3 * b) - (2 * b)
				ret half %tmp5 ; = 5 * a + b
				}
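
The `0xH` constants in this test are IEEE 754 half-precision bit patterns, so the `3*b`, `5*a`, and `2*b` annotations can be checked by decoding them. The sketch below is an illustration only: `halfBitsToFloat`, `faddsubAssoc1Input`, and `faddsubAssoc2Input` are hypothetical helper names written for this note, not part of LLVM or of the patch, and the arithmetic is done in `float` rather than `half`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Decode an IEEE 754 binary16 bit pattern (the hex digits after LLVM's 0xH).
float halfBitsToFloat(std::uint16_t bits) {
  int sign = (bits >> 15) & 0x1;
  int exponent = (bits >> 10) & 0x1F;
  int mantissa = bits & 0x3FF;
  float value;
  if (exponent == 0) // zero or subnormal: mantissa * 2^-24
    value = std::ldexp(static_cast<float>(mantissa), -24);
  else if (exponent == 0x1F) // infinity or NaN
    value = mantissa ? NAN : INFINITY;
  else // normal: (1024 + mantissa) * 2^(exponent - 25)
    value = std::ldexp(static_cast<float>(mantissa + 1024), exponent - 25);
  return sign ? -value : value;
}

// Input expression of @faddsubAssoc1: 2 * b - (5 * a - 3 * b).
float faddsubAssoc1Input(float a, float b) {
  const float three = halfBitsToFloat(0x4200);
  const float five = halfBitsToFloat(0x4500);
  const float two = halfBitsToFloat(0x4000);
  return two * b - (five * a - three * b);
}

// Input expression of @faddsubAssoc2: (5 * a + 3 * b) - (2 * b).
float faddsubAssoc2Input(float a, float b) {
  const float three = halfBitsToFloat(0x4200);
  const float five = halfBitsToFloat(0x4500);
  const float two = halfBitsToFloat(0x4000);
  return (five * a + three * b) - two * b;
}
```

Decoding gives 0xH4200 = 3.0, 0xH4500 = 5.0, 0xH4000 = 2.0, and 0xH3C00 = 1.0. With a = 1 and b = 2, the first input evaluates to 5, matching 5 * (b - a), and the second to 7, matching 5 * a + b.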

test/Transforms/Reassociate/secondary.ll

	; RUN: opt -S -reassociate < %s | FileCheck %s			; RUN: opt -S -reassociate < %s | FileCheck %s
	; rdar://9167457			; rdar://9167457

	; Reassociate shouldn't break this testcase involving a secondary			; Reassociate shouldn't break this testcase involving a secondary
	; reassociation.			; reassociation.

	; CHECK: define			; CHECK: define
	; CHECK-NOT: undef			; CHECK-NOT: undef
	; CHECK: %factor = mul i32 %tmp3, -2			; CHECK: %factor = mul i32 %tmp3.neg, 2
	; CHECK-NOT: undef			; CHECK-NOT: undef
	; CHECK: }			; CHECK: }

	define void @x0f2f640ab6718391b59ce96d9fdeda54(i32 %arg, i32 %arg1, i32 %arg2, i32* %.out) nounwind {			define void @x0f2f640ab6718391b59ce96d9fdeda54(i32 %arg, i32 %arg1, i32 %arg2, i32* %.out) nounwind {
	_:			_:
	%tmp = sub i32 %arg, %arg1			%tmp = sub i32 %arg, %arg1
	%tmp3 = mul i32 %tmp, -1268345047			%tmp3 = mul i32 %tmp, -1268345047
	%tmp4 = add i32 %tmp3, 2014710503			%tmp4 = add i32 %tmp3, 2014710503
	%tmp5 = add i32 %tmp3, -1048397418			%tmp5 = add i32 %tmp3, -1048397418
	%tmp6 = sub i32 %tmp4, %tmp5			%tmp6 = sub i32 %tmp4, %tmp5
	%tmp7 = sub i32 -2014710503, %tmp3			%tmp7 = sub i32 -2014710503, %tmp3
	%tmp8 = add i32 %tmp6, %tmp7			%tmp8 = add i32 %tmp6, %tmp7
	store i32 %tmp8, i32* %.out			store i32 %tmp8, i32* %.out
	ret void			ret void
	}			}

test/Transforms/Reassociate/xor_reassoc.ll

	; (x | c1) ^ (x & c1) = x ^ c1			; (x | c1) ^ (x & c1) = x ^ c1
	define i32 @xor_special2(i32 %x, i32 %y) {			define i32 @xor_special2(i32 %x, i32 %y) {
	%or = or i32 %x, 123			%or = or i32 %x, 123
	%xor = xor i32 %or, %y			%xor = xor i32 %or, %y
	%and = and i32 %x, 123			%and = and i32 %x, 123
	%xor1 = xor i32 %xor, %and			%xor1 = xor i32 %xor, %and
	ret i32 %xor1			ret i32 %xor1
	; CHECK-LABEL: @xor_special2(			; CHECK-LABEL: @xor_special2(
	; CHECK: %xor = xor i32 %y, 123			; CHECK: %xor = xor i32 %x, 123
	; CHECK: %xor1 = xor i32 %xor, %x			; CHECK: %xor1 = xor i32 %xor, %y
	; CHECK: ret i32 %xor1			; CHECK: ret i32 %xor1
	}			}

	; (x | c1) ^ (x | c1) => 0			; (x | c1) ^ (x | c1) => 0
	define i32 @xor_special3(i32 %x) {			define i32 @xor_special3(i32 %x) {
	%or = or i32 %x, 123			%or = or i32 %x, 123
	%or1 = or i32 %x, 123			%or1 = or i32 %x, 123
	%xor = xor i32 %or, %or1			%xor = xor i32 %or, %or1