This is an archive of the discontinued LLVM Phabricator instance.

[PATCH] Global reassociation for improved CSE
Closed, Public

Authored by escha on Nov 14 2017, 1:54 PM.

Details

Reviewers

hfinkel
arsenm
fhahn
spatel
dberlin
qcolombet

Summary

This has been improved from the original patch to take into account suggestions, fix bugs, and reduce complexity, so it's no longer an RFC and now a patch ;-)

When playing around with reassociate I noticed a seemingly obvious optimization that was not getting done anywhere in LLVM, nor in GCC or ICC.

Consider the following trivial function:

void foo(int a, int b, int c, int d, int e, int *res) {
  res[0] = (e * a) * d;
  res[1] = (e * b) * d;
  res[2] = (e * c) * d;
}

This function can be optimized down to 4 multiplies instead of 6 by reassociating such that (e*d) is the common subexpression. However, no compiler I’ve tested does this. I wrote a slightly hacky heuristic algorithm to augment reassociate to do this and tested it.
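To make the target concrete, here is a hand-written sketch of the reassociated form. The name `foo_reassociated` and the temporary `ed` are made up for illustration and do not appear in the patch. For signed `int` at the C level the two versions agree as long as no intermediate product overflows.

```cpp
// Original form from the summary: six multiplies.
void foo(int a, int b, int c, int d, int e, int *res) {
  res[0] = (e * a) * d;
  res[1] = (e * b) * d;
  res[2] = (e * c) * d;
}

// Hypothetical reassociated form: (e * d) is computed once, so only four
// multiplies remain.
void foo_reassociated(int a, int b, int c, int d, int e, int *res) {
  int ed = e * d; // common subexpression exposed by reassociation
  res[0] = ed * a;
  res[1] = ed * b;
  res[2] = ed * c;
}
```

The patch does not rewrite source like this directly; it only reorders operands so that a later CSE pass can find the shared `e * d`.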

First, before the details, the results: on a large offline test suite of graphics shaders it cut down total instruction count by ~0.9% (!) and float math instruction count by 1.5% (!).

Here’s how it works:

Do reassociate once as normal.

Create a “pair map” consisting of a mapping from <Instr, Instr> to <unsigned>. Have one pair map for each type of BinaryOperation. This map represents how often a given operand pair occurs in the source code for a given BinaryOperation. But in addition to counting each actual instruction, we also count every possible pair (O(N^2) of them) within each linear operand chain. So for example, if the operand chain is this:

a*b*c

we do:

PairMap[Multiply][{a, b}]++;
PairMap[Multiply][{b, c}]++;
PairMap[Multiply][{a, c}]++;

The chain length is capped at an arbitrary but low value to avoid possible quadratic behavior.

Furthermore, we do *not* count duplicate pairs in a single chain, so for example, if our chain is

a*a*b*c

we only count a*b and a*c once, to avoid bias.
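The counting rule can be sketched in standalone C++ as follows. This is a simplified illustration, not the patch's code: operands are plain strings instead of `Value *`, `PairCounts` stands in for the pair map of a single opcode, and `countPairs` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for the patch's data structures (assumptions for illustration).
using Pair = std::pair<std::string, std::string>;
using PairCounts = std::map<Pair, unsigned>;

// Count every unordered operand pair of one chain, at most once per chain.
void countPairs(const std::vector<std::string> &Chain, PairCounts &Counts) {
  std::set<Pair> Seen; // pairs already counted for this chain
  for (std::size_t i = 0; i + 1 < Chain.size(); ++i) {
    for (std::size_t j = i + 1; j < Chain.size(); ++j) {
      // Canonicalize so that {a, b} and {b, a} share one key.
      Pair P(Chain[i], Chain[j]);
      if (P.second < P.first)
        std::swap(P.first, P.second);
      // Skip duplicates within the same chain to avoid bias.
      if (!Seen.insert(P).second)
        continue;
      ++Counts[P];
    }
  }
}
```

For the chain `a*a*b*c`, this counts `{a, b}` and `{a, c}` once each even though `a` appears twice.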

Run reassociate again. All the information is saved from the first time around so hopefully this won’t be very expensive except for the changes we actually make. But this time, whenever emitting a linear operand chain, pick the operand pair that’s *most common* in the source code (using PairMap) and make that one the first operation. Thus, for example:

(((a*b)*c)*d)*e

if “b*e” is the most common, this becomes:

(((b*e)*a)*c)*d

Now b*e can be CSE’d later! Magic!

Also, as a tiebreaker, I currently pick the pair with the lowest maximum rank of its two operands. This makes sense because in this example “a*b” is the first operation in the chain, so we want to pick the duplicates that are also higher up in the program rather than closer to the leaf. No other tiebreaker I tried seemed to work as well.
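The selection step, including the tiebreaker, might look roughly like the sketch below. It is a simplified standalone version under stated assumptions: operands are strings with an explicit `Rank` field, `PairCounts` is a stand-in for the pair map of one opcode, and `pickBestPair` is a hypothetical helper. As in the diff, only pairs seen more than once are considered.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Operand {
  std::string Name;
  unsigned Rank;
};

using Pair = std::pair<std::string, std::string>;
using PairCounts = std::map<Pair, unsigned>;
using IndexPair = std::pair<std::size_t, std::size_t>;

// Find the operand pair with the highest count; break ties by preferring the
// pair whose larger rank is smallest. Returns false if no pair occurs more
// than once, in which case Out is left untouched.
bool pickBestPair(const std::vector<Operand> &Ops, const PairCounts &Counts,
                  IndexPair &Out) {
  bool Found = false;
  unsigned BestScore = 1; // a pair must be seen more than once to qualify
  unsigned BestRank = 0;
  for (std::size_t i = 0; i + 1 < Ops.size(); ++i) {
    for (std::size_t j = i + 1; j < Ops.size(); ++j) {
      Pair Key(Ops[i].Name, Ops[j].Name);
      if (Key.second < Key.first)
        std::swap(Key.first, Key.second);
      auto It = Counts.find(Key);
      unsigned Score = It == Counts.end() ? 0 : It->second;
      unsigned MaxRank = std::max(Ops[i].Rank, Ops[j].Rank);
      if (Score > BestScore ||
          (Found && Score == BestScore && MaxRank < BestRank)) {
        Out = IndexPair(i, j);
        BestScore = Score;
        BestRank = MaxRank;
        Found = true;
      }
    }
  }
  return Found;
}
```

In the diff itself, the chosen operands are erased from `Ops` and pushed to the back before `RewriteExprTree` runs; the sketch stops at choosing the indices.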

Overall this patch was structured to be a minimally invasive change to avoid major structural changes in Reassociate. It does nothing except change the order in which operands are emitted during the final step, which should be much safer and less complex than actually changing the core algorithm. The core algorithm of reassociate is actually agnostic to this; it only serves to determine *which* operands are picked, not which order they're emitted in.

This patch unfortunately has some overlap with N-ary reassociate, but because they work in such different ways they are probably not unifiable (N-ary reassociate uses a very different algorithm that doesn't catch as many cases, but is less heuristic-based and designed around addressing expressions. And because it uses SCEV, it can't work on float, which is my primary use-case).

Most of the test changes are caused by dead code that was eliminated or other effectively no-op changes that come from the fact reassociate is now iterated twice.

Diff Detail

Repository: rL LLVM

Event Timeline

escha created this revision. Nov 14 2017, 1:54 PM

Herald added a subscriber: wdng. Nov 14 2017, 1:54 PM

escha edited the summary of this revision. Nov 14 2017, 1:55 PM

escha edited the summary of this revision.

*gentle bump*

turkey week is over pls glance at patch, thank

My understanding of 'reassociate' isn't good enough to approve this, but I'm curious about running the main loop multiple times (ReassociateStep) because I noticed an improvement in some other code by running the pass twice.

Can we split that part into a preliminary patch?
Is it too expensive in compile time to run that to a fixed-point (like instcombine)?

In D40049#939097, @spatel wrote:

My understanding of 'reassociate' isn't good enough to approve this, but I'm curious about running the main loop multiple times (ReassociateStep) because I noticed an improvement in some other code by running the pass twice.

Can we split that part into a preliminary patch?

Is it too expensive in compile time to run that to a fixed-point (like instcombine)?

I'm extremely nervous about running it to a fixed point; I'm not confident that even the *existing pass* will always converge to a fixed point.

Running it twice without any changes seems fairly pointless; the changes are mostly just removal of dead instructions and such that would get instcombined away later.

In D40049#939097, @spatel wrote:

My understanding of 'reassociate' isn't good enough to approve this, but I'm curious about running the main loop multiple times (ReassociateStep) because I noticed an improvement in some other code by running the pass twice.

Can we split that part into a preliminary patch?

Is it too expensive in compile time to run that to a fixed-point (like instcombine)?

The compile time impact here is a secondary concern.
You first need to prove this reaches a fixpoint, pencil & paper. Not necessarily looking forward to @zhendongsu reports of this pass indefinitely looping on fuzzer generated testcases.

Tests look good.

lib/Transforms/Scalar/Reassociate.cpp
2190	Please provide some high level comments here explaining what is happening like all the other steps in the same function.
2197	Style nitpick: Might be clearer to read if both for loops have brackets.
2198	Could there be a way to do this without a nesting loop? If not ignore this.
2265	This next block of nested for loops looks similar to the above best pair code (that is also part of this patch). Could you possibly refactor it to reuse this same pattern in a helper?
2296	Add comment explaining the reason for having 2 reassociate steps. Also consider assigning 2 to a value, call it const unsigned MaxReassociatreSteps = 2; And put the comment with the assignment.
2334	Could we populate the pair map prior to the first iteration? If so, Move this before the loop starts.

Updated patch and added comments.

My main change is that it no longer runs reassociate twice; it turns out despite my intuition this helps EXTREMELY little on my shader test suite! It looks like it's better to just build the pair map at the start even if it's not 100% accurate.

scanon added a subscriber: scanon. Dec 4 2017, 11:13 AM

arsenm added inline comments. Dec 4 2017, 11:15 AM

lib/Transforms/Scalar/Reassociate.cpp
2253	Capitalize
2271	I'm not sure what std::less on a Value * means. Is this sorting by pointer value?

escha added inline comments. Dec 4 2017, 11:17 AM

lib/Transforms/Scalar/Reassociate.cpp
2271	yes. it's canonically ordering them so we don't have to worry about checking both [a,b] and [b,a].

qcolombet added a subscriber: qcolombet. Dec 4 2017, 11:32 AM

qcolombet added inline comments.

lib/Transforms/Scalar/Reassociate.cpp
2271	We would rather use getComplexity(Value *V) for that.

qcolombet added inline comments. Dec 4 2017, 11:43 AM

lib/Transforms/Scalar/Reassociate.cpp
2271	(Though you'll probably still want something to break ties)

escha added inline comments. Dec 4 2017, 12:52 PM

lib/Transforms/Scalar/Reassociate.cpp
2271	(talked about this one offline: conclusion was std::less is okay, I think, so long as there's no ordering dependency, which there shouldn't be)
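The canonical ordering discussed in this thread can be sketched in isolation. `canonicalPair` is a hypothetical helper name, not something in the patch. `std::less` on pointers is guaranteed to give a strict total order, whereas the built-in `<` is unspecified for pointers into unrelated objects.

```cpp
#include <functional>
#include <utility>

// Order a pointer pair so that {A, B} and {B, A} produce the same map key.
// The resulting order depends on addresses, so it should only be used for
// lookups, never to decide the order in which output is emitted.
template <typename T>
std::pair<T *, T *> canonicalPair(T *A, T *B) {
  if (std::less<T *>()(B, A))
    std::swap(A, B);
  return {A, B};
}
```

Because the key is only used for `find` and `insert`, the address-dependent order should not leak into the generated code, which matches the "no ordering dependency" condition above.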

got the approval from puyan so i'm gonna push this later unless someone else has some other comments!

LGTM

This revision is now accepted and ready to land. Dec 12 2017, 10:26 AM

Thanks @escha ! lgtm too :)

A couple minor comments (feel free to address as you commit).

include/llvm/Transforms/Scalar/Reassociate.h
77	Can you make GlobalReassociateLimit a cl::opt (in case we feel like experimenting with tuning it at some point)?
lib/Transforms/Scalar/Reassociate.cpp
2242	This comment is now out of date (reassociate is not run first now, right?).

Looks like this has been committed in rL320515

spatel mentioned this in D45842: [Reassociate] swap binop operands to increase factoring potential. Apr 19 2018, 3:30 PM

spatel mentioned this in rL341288: [Reassociate] swap binop operands to increase factoring potential. Sep 2 2018, 7:27 AM

Revision Contents

Path                                                        Size
include/llvm/Transforms/Scalar/Reassociate.h                9 lines
lib/Transforms/Scalar/Reassociate.cpp                       159 lines
test/Transforms/Reassociate/basictest.ll                    15 lines
test/Transforms/Reassociate/canonicalize-neg-const.ll       5 lines
test/Transforms/Reassociate/fast-ReassociateVector.ll       6 lines
test/Transforms/Reassociate/fast-basictest.ll               6 lines
test/Transforms/Reassociate/mulfactor.ll                    4 lines
test/Transforms/Reassociate/reassoc-intermediate-fnegs.ll   1 line

Diff 122909

include/llvm/Transforms/Scalar/Reassociate.h

[66 unchanged lines not shown]

} // end namespace reassociate		} // end namespace reassociate

/// Reassociate commutative expressions.		/// Reassociate commutative expressions.
class ReassociatePass : public PassInfoMixin<ReassociatePass> {		class ReassociatePass : public PassInfoMixin<ReassociatePass> {
DenseMap<BasicBlock *, unsigned> RankMap;		DenseMap<BasicBlock *, unsigned> RankMap;
DenseMap<AssertingVH<Value>, unsigned> ValueRankMap;		DenseMap<AssertingVH<Value>, unsigned> ValueRankMap;
SetVector<AssertingVH<Instruction>> RedoInsts;		SetVector<AssertingVH<Instruction>> RedoInsts;

		unsigned ReassociateStep;
		// Arbitrary, but prevents quadratic behavior.
		static const unsigned GlobalReassociateLimit = 10;
		static const unsigned NumBinaryOps =
		Instruction::BinaryOpsEnd - Instruction::BinaryOpsBegin;
		DenseMap<std::pair<Value *, Value *>, unsigned> PairMap[NumBinaryOps];

bool MadeChange;		bool MadeChange;

public:		public:
PreservedAnalyses run(Function &F, FunctionAnalysisManager &);		PreservedAnalyses run(Function &F, FunctionAnalysisManager &);

private:		private:
void BuildRankMap(Function &F, ReversePostOrderTraversal<Function *> &RPOT);		void BuildRankMap(Function &F, ReversePostOrderTraversal<Function *> &RPOT);
unsigned getRank(Value *V);		unsigned getRank(Value *V);
[17 unchanged lines not shown]
Value *OptimizeMul(BinaryOperator *I,		Value *OptimizeMul(BinaryOperator *I,
SmallVectorImpl<reassociate::ValueEntry> &Ops);		SmallVectorImpl<reassociate::ValueEntry> &Ops);
Value *RemoveFactorFromExpression(Value *V, Value *Factor);		Value *RemoveFactorFromExpression(Value *V, Value *Factor);
void EraseInst(Instruction *I);		void EraseInst(Instruction *I);
void RecursivelyEraseDeadInsts(Instruction *I,		void RecursivelyEraseDeadInsts(Instruction *I,
SetVector<AssertingVH<Instruction>> &Insts);		SetVector<AssertingVH<Instruction>> &Insts);
void OptimizeInst(Instruction *I);		void OptimizeInst(Instruction *I);
Instruction *canonicalizeNegConstExpr(Instruction *I);		Instruction *canonicalizeNegConstExpr(Instruction *I);
		void BuildPairMap(ReversePostOrderTraversal<Function *> &RPOT);
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_TRANSFORMS_SCALAR_REASSOCIATE_H		#endif // LLVM_TRANSFORMS_SCALAR_REASSOCIATE_H

lib/Transforms/Scalar/Reassociate.cpp

[21 unchanged lines not shown]

#include "llvm/Transforms/Scalar/Reassociate.h"		#include "llvm/Transforms/Scalar/Reassociate.h"
#include "llvm/ADT/APFloat.h"		#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/APInt.h"		#include "llvm/ADT/APInt.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/PostOrderIterator.h"		#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
		#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/GlobalsModRef.h"		#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Argument.h"		#include "llvm/IR/Argument.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
[2,143 unchanged lines not shown]
if (Ops.size() == 1) {		if (Ops.size() == 1) {
// This expression tree simplified to something that isn't a tree,		// This expression tree simplified to something that isn't a tree,
// eliminate it.		// eliminate it.
I->replaceAllUsesWith(Ops[0].Op);		I->replaceAllUsesWith(Ops[0].Op);
if (Instruction *OI = dyn_cast<Instruction>(Ops[0].Op))		if (Instruction *OI = dyn_cast<Instruction>(Ops[0].Op))
OI->setDebugLoc(I->getDebugLoc());		OI->setDebugLoc(I->getDebugLoc());
RedoInsts.insert(I);		RedoInsts.insert(I);
return;		return;
}		}

		if (ReassociateStep == 1 && Ops.size() > 2 &&
		Ops.size() <= GlobalReassociateLimit) {
		unsigned Max = 1;
		unsigned BestRank = 0;
		std::pair<unsigned, unsigned> BestPair;
		unsigned Idx = I->getOpcode() - Instruction::BinaryOpsBegin;
		for (unsigned i = 0; i < Ops.size() - 1; ++i)
		for (unsigned j = i + 1; j < Ops.size(); ++j) {
		unsigned Score = 0;
		Value *Op0 = Ops[i].Op;
		Value *Op1 = Ops[j].Op;
		if (std::less<Value *>()(Op1, Op0))
		std::swap(Op0, Op1);
		auto it = PairMap[Idx].find({Op0, Op1});
		if (it != PairMap[Idx].end())
		Score += it->second;

		unsigned MaxRank = std::max(Ops[i].Rank, Ops[j].Rank);
		if (Score > Max || (Score == Max && MaxRank < BestRank)) {
		BestPair = {i, j};
		Max = Score;
		BestRank = MaxRank;
		}
		}
		if (Max > 1) {
		auto Op0 = Ops[BestPair.first];
		auto Op1 = Ops[BestPair.second];
		Ops.erase(&Ops[BestPair.second]);
		Ops.erase(&Ops[BestPair.first]);
		Ops.push_back(Op0);
		Ops.push_back(Op1);
		}
		}
// Now that we ordered and optimized the expressions, splat them back into		// Now that we ordered and optimized the expressions, splat them back into
// the expression tree, removing any unneeded nodes.		// the expression tree, removing any unneeded nodes.
RewriteExprTree(I, Ops);		RewriteExprTree(I, Ops);
}		}

		void
		ReassociatePass::BuildPairMap(ReversePostOrderTraversal<Function *> &RPOT) {
		// Make a "pairmap" of how often each operand pair occurs.
		for (BasicBlock *BI : RPOT) {
		for (Instruction &I : *BI) {
		if (!I.isAssociative())
		continue;

		// Ignore nodes that aren't at the root of trees.
		if (I.hasOneUse() && I.user_back()->getOpcode() == I.getOpcode())
		continue;

		// Collect all operands in a single reassociable expression.
		// Since Reassociate has already been run once, we can assume things
		// are already canonical according to Reassociation's regime.
		SmallVector<Value *, 8> Worklist = { I.getOperand(0), I.getOperand(1) };
		SmallVector<Value *, 8> Ops;
		while (!Worklist.empty() && Ops.size() <= GlobalReassociateLimit) {
		Value *Op = Worklist.pop_back_val();
		Instruction *OpI = dyn_cast<Instruction>(Op);
		if (!OpI || OpI->getOpcode() != I.getOpcode() || !OpI->hasOneUse()) {
		Ops.push_back(Op);
		continue;
		}
		// be paranoid about self-referencing expressions in unreachable code
		if (OpI->getOperand(0) != OpI)
		Worklist.push_back(OpI->getOperand(0));
		if (OpI->getOperand(1) != OpI)
		Worklist.push_back(OpI->getOperand(1));
		}
		// Skip extremely long expressions.
		if (Ops.size() > GlobalReassociateLimit)
		continue;

		// Add all pairwise combinations of operands to the pair map.
		unsigned BinaryIdx = I.getOpcode() - Instruction::BinaryOpsBegin;
		SmallSet<std::pair<Value *, Value *>, 32> Visited;
		for (unsigned i = 0; i < Ops.size() - 1; ++i) {
		for (unsigned j = i + 1; j < Ops.size(); ++j) {
		// Canonicalize operand orderings.
		Value *Op0 = Ops[i];
		Value *Op1 = Ops[j];
		if (std::less<Value *>()(Op1, Op0))
		std::swap(Op0, Op1);
		if (!Visited.insert({Op0, Op1}).second)
		continue;
		auto res = PairMap[BinaryIdx].insert({{Op0, Op1}, 1});
		if (!res.second)
		++res.first->second;
		}
		}
		}
		}
		}

PreservedAnalyses ReassociatePass::run(Function &F, FunctionAnalysisManager &) {		PreservedAnalyses ReassociatePass::run(Function &F, FunctionAnalysisManager &) {
// Get the functions basic blocks in Reverse Post Order. This order is used by		// Get the functions basic blocks in Reverse Post Order. This order is used by
// BuildRankMap to pre calculate ranks correctly. It also excludes dead basic		// BuildRankMap to pre calculate ranks correctly. It also excludes dead basic
// blocks (it has been seen that the analysis in this pass could hang when		// blocks (it has been seen that the analysis in this pass could hang when
// analysing dead basic blocks).		// analysing dead basic blocks).
ReversePostOrderTraversal<Function *> RPOT(&F);		ReversePostOrderTraversal<Function *> RPOT(&F);

// Calculate the rank map for F.		// Calculate the rank map for F.
BuildRankMap(F, RPOT);		BuildRankMap(F, RPOT);

MadeChange = false;		MadeChange = false;
// Traverse the same blocks that was analysed by BuildRankMap.		// Traverse the same blocks that was analysed by BuildRankMap.
		for (ReassociateStep = 0; ReassociateStep < 2; ++ReassociateStep) {
for (BasicBlock *BI : RPOT) {		for (BasicBlock *BI : RPOT) {
assert(RankMap.count(&*BI) && "BB should be ranked.");		assert(RankMap.count(&*BI) && "BB should be ranked.");
// Optimize every instruction in the basic block.		// Optimize every instruction in the basic block.
for (BasicBlock::iterator II = BI->begin(), IE = BI->end(); II != IE;)		for (BasicBlock::iterator II = BI->begin(), IE = BI->end(); II != IE;)
if (isInstructionTriviallyDead(&*II)) {		if (isInstructionTriviallyDead(&*II)) {
EraseInst(&*II++);		EraseInst(&*II++);
} else {		} else {
OptimizeInst(&*II);		OptimizeInst(&*II);
assert(II->getParent() == &*BI && "Moved to a different block!");		assert(II->getParent() == &*BI && "Moved to a different block!");
++II;		++II;
}		}

// Make a copy of all the instructions to be redone so we can remove dead		// Make a copy of all the instructions to be redone so we can remove dead
// instructions.		// instructions.
SetVector<AssertingVH<Instruction>> ToRedo(RedoInsts);		SetVector<AssertingVH<Instruction>> ToRedo(RedoInsts);
// Iterate over all instructions to be reevaluated and remove trivially dead		// Iterate over all instructions to be reevaluated and remove trivially dead
// instructions. If any operand of the trivially dead instruction becomes		// instructions. If any operand of the trivially dead instruction becomes
// dead mark it for deletion as well. Continue this process until all		// dead mark it for deletion as well. Continue this process until all
// trivially dead instructions have been removed.		// trivially dead instructions have been removed.
while (!ToRedo.empty()) {		while (!ToRedo.empty()) {
Instruction *I = ToRedo.pop_back_val();		Instruction *I = ToRedo.pop_back_val();
if (isInstructionTriviallyDead(I)) {		if (isInstructionTriviallyDead(I)) {
RecursivelyEraseDeadInsts(I, ToRedo);		RecursivelyEraseDeadInsts(I, ToRedo);
MadeChange = true;		MadeChange = true;
}		}
}		}

// Now that we have removed dead instructions, we can reoptimize the		// Now that we have removed dead instructions, we can reoptimize the
// remaining instructions.		// remaining instructions.
while (!RedoInsts.empty()) {		while (!RedoInsts.empty()) {
Instruction *I = RedoInsts.pop_back_val();		Instruction *I = RedoInsts.pop_back_val();
if (isInstructionTriviallyDead(I))		if (isInstructionTriviallyDead(I))
EraseInst(I);		EraseInst(I);
else		else
OptimizeInst(I);		OptimizeInst(I);
}		}
}		}
		if (ReassociateStep == 0)
		BuildPairMap(RPOT);
		}
		for (auto &Entry : PairMap)
		Entry.clear();

// We are done with the rank map.		// We are done with the rank map.
RankMap.clear();		RankMap.clear();
ValueRankMap.clear();		ValueRankMap.clear();

if (MadeChange) {		if (MadeChange) {
PreservedAnalyses PA;		PreservedAnalyses PA;
PA.preserveSet<CFGAnalyses>();		PA.preserveSet<CFGAnalyses>();
[45 unchanged lines not shown]

test/Transforms/Reassociate/basictest.ll

[236 unchanged lines not shown]
	if.then: ; preds = %entry			if.then: ; preds = %entry
	%add1 = add i64 %shl.neg, %shl.neg			%add1 = add i64 %shl.neg, %shl.neg
	%add2 = add i64 %add1, %b			%add2 = add i64 %add1, %b
	ret i64 %add2			ret i64 %add2

	if.end: ; preds = %entry			if.end: ; preds = %entry
	ret i64 0			ret i64 0
	}			}

				; CHECK-LABEL: @test17
				; CHECK: %[[A:.*]] = mul i32 %X4, %X3
				; CHECK-NEXT: %[[C:.*]] = mul i32 %[[A]], %X1
				; CHECK-NEXT: %[[D:.*]] = mul i32 %[[A]], %X2
				; CHECK-NEXT: %[[E:.*]] = xor i32 %[[C]], %[[D]]
				; CHECK-NEXT: ret i32 %[[E]]
				define i32 @test17(i32 %X1, i32 %X2, i32 %X3, i32 %X4) {
				%A = mul i32 %X3, %X1
				%B = mul i32 %X3, %X2
				%C = mul i32 %A, %X4
				%D = mul i32 %B, %X4
				%E = xor i32 %C, %D
				ret i32 %E
				}

test/Transforms/Reassociate/canonicalize-neg-const.ll

[163 unchanged lines not shown]
	; break up the subtract.			; break up the subtract.
	;			;
	; Check to make sure we don't canonicalize			; Check to make sure we don't canonicalize
	; (%pow2*-5.0 + %sub) -> (%sub - %pow2*5.0)			; (%pow2*-5.0 + %sub) -> (%sub - %pow2*5.0)
	; as we would later break up this subtract causing a cycle.			; as we would later break up this subtract causing a cycle.

	define double @pr34078(double %A) {			define double @pr34078(double %A) {
	; CHECK-LABEL: @pr34078(			; CHECK-LABEL: @pr34078(
	; CHECK-NEXT: [[SUB:%.*]] = fsub fast double 1.000000e+00, %A			; CHECK-NEXT: [[A_NEG:%.*]] = fsub fast double -0.000000e+00, %A
				; CHECK-NEXT: [[SUB:%.*]] = fadd fast double [[A_NEG]], 1.000000e+00
	; CHECK-NEXT: [[POW2:%.*]] = fmul double %A, %A			; CHECK-NEXT: [[POW2:%.*]] = fmul double %A, %A
	; CHECK-NEXT: [[MUL5_NEG:%.*]] = fmul fast double [[POW2]], -5.000000e-01			; CHECK-NEXT: [[MUL5_NEG:%.*]] = fmul fast double [[POW2]], -5.000000e-01
	; CHECK-NEXT: [[SUB1:%.*]] = fadd fast double [[MUL5_NEG]], [[SUB]]			; CHECK-NEXT: [[SUB1:%.*]] = fadd fast double [[SUB]], [[MUL5_NEG]]
	; CHECK-NEXT: [[FACTOR:%.*]] = fmul fast double [[SUB1]], 2.000000e+00			; CHECK-NEXT: [[FACTOR:%.*]] = fmul fast double [[SUB1]], 2.000000e+00
	; CHECK-NEXT: ret double [[FACTOR]]			; CHECK-NEXT: ret double [[FACTOR]]
	;			;
	%sub = fsub fast double 1.000000e+00, %A			%sub = fsub fast double 1.000000e+00, %A
	%pow2 = fmul double %A, %A			%pow2 = fmul double %A, %A
	%mul5 = fmul fast double %pow2, 5.000000e-01			%mul5 = fmul fast double %pow2, 5.000000e-01
	%sub1 = fsub fast double %sub, %mul5			%sub1 = fsub fast double %sub, %mul5
	%add = fadd fast double %sub1, %sub1			%add = fadd fast double %sub1, %sub1
	ret double %add			ret double %add
	}			}

test/Transforms/Reassociate/fast-ReassociateVector.ll

[215 unchanged lines not shown]
%tmp2 = fmul reassoc <2 x float> %tmp1, <float 1.200000e+01, float 1.200000e+01>		%tmp2 = fmul reassoc <2 x float> %tmp1, <float 1.200000e+01, float 1.200000e+01>
ret <2 x float> %tmp2		ret <2 x float> %tmp2
}		}

; Check (b+(a+1234))+-a -> b+1234.		; Check (b+(a+1234))+-a -> b+1234.

define <2 x double> @test9(<2 x double> %b, <2 x double> %a) {		define <2 x double> @test9(<2 x double> %b, <2 x double> %a) {
; CHECK-LABEL: @test9(		; CHECK-LABEL: @test9(
; CHECK-NEXT: [[TMP1:%.*]] = fsub fast <2 x double> zeroinitializer, [[A:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = fadd fast <2 x double> [[B:%.*]], <double 1.234000e+03, double 1.234000e+03>
; CHECK-NEXT: [[TMP2:%.*]] = fadd fast <2 x double> [[B:%.*]], <double 1.234000e+03, double 1.234000e+03>		; CHECK-NEXT: ret <2 x double> [[TMP1]]
; CHECK-NEXT: ret <2 x double> [[TMP2]]
;		;
%1 = fadd fast <2 x double> %a, <double 1.234000e+03, double 1.234000e+03>		%1 = fadd fast <2 x double> %a, <double 1.234000e+03, double 1.234000e+03>
%2 = fadd fast <2 x double> %b, %1		%2 = fadd fast <2 x double> %b, %1
%3 = fsub fast <2 x double> <double 0.000000e+00, double 0.000000e+00>, %a		%3 = fsub fast <2 x double> <double 0.000000e+00, double 0.000000e+00>, %a
%4 = fadd fast <2 x double> %2, %3		%4 = fadd fast <2 x double> %2, %3
ret <2 x double> %4		ret <2 x double> %4
}		}

[13 unchanged lines not shown]
%4 = fadd reassoc <2 x double> %2, %3		%4 = fadd reassoc <2 x double> %2, %3
ret <2 x double> %4		ret <2 x double> %4
}		}

; Check -(-(z*40)*a) -> a*40*z.		; Check -(-(z*40)*a) -> a*40*z.

define <2 x float> @test10(<2 x float> %a, <2 x float> %b, <2 x float> %z) {		define <2 x float> @test10(<2 x float> %a, <2 x float> %b, <2 x float> %z) {
; CHECK-LABEL: @test10(		; CHECK-LABEL: @test10(
; CHECK-NEXT: [[TMP1:%.*]] = fsub fast <2 x float> zeroinitializer, zeroinitializer
; CHECK-NEXT: [[E:%.*]] = fmul fast <2 x float> [[A:%.*]], <float 4.000000e+01, float 4.000000e+01>		; CHECK-NEXT: [[E:%.*]] = fmul fast <2 x float> [[A:%.*]], <float 4.000000e+01, float 4.000000e+01>
; CHECK-NEXT: [[F:%.*]] = fmul fast <2 x float> [[E]], [[Z:%.*]]		; CHECK-NEXT: [[F:%.*]] = fmul fast <2 x float> [[E]], [[Z:%.*]]
; CHECK-NEXT: ret <2 x float> [[F]]		; CHECK-NEXT: ret <2 x float> [[F]]
;		;
%d = fmul fast <2 x float> %z, <float 4.000000e+01, float 4.000000e+01>		%d = fmul fast <2 x float> %z, <float 4.000000e+01, float 4.000000e+01>
%c = fsub fast <2 x float> <float 0.000000e+00, float 0.000000e+00>, %d		%c = fsub fast <2 x float> <float 0.000000e+00, float 0.000000e+00>, %d
%e = fmul fast <2 x float> %a, %c		%e = fmul fast <2 x float> %a, %c
%f = fsub fast <2 x float> <float 0.000000e+00, float 0.000000e+00>, %e		%f = fsub fast <2 x float> <float 0.000000e+00, float 0.000000e+00>, %e
[136 unchanged lines not shown]

test/Transforms/Reassociate/fast-basictest.ll

[170 unchanged lines not shown]
%r = fadd reassoc float %aab, %aac		%r = fadd reassoc float %aab, %aac
ret float %r		ret float %r
}		}

; (-X)*Y + Z -> Z-X*Y		; (-X)*Y + Z -> Z-X*Y

define float @test8(float %X, float %Y, float %Z) {		define float @test8(float %X, float %Y, float %Z) {
; CHECK-LABEL: @test8(		; CHECK-LABEL: @test8(
; CHECK-NEXT: [[A:%.*]] = fmul fast float %Y, %X		; CHECK-NEXT: [[A:%.*]] = fmul fast float %X, %Y
; CHECK-NEXT: [[C:%.*]] = fsub fast float %Z, [[A]]		; CHECK-NEXT: [[C:%.*]] = fsub fast float %Z, [[A]]
; CHECK-NEXT: ret float [[C]]		; CHECK-NEXT: ret float [[C]]
;		;
%A = fsub fast float 0.0, %X		%A = fsub fast float 0.0, %X
%B = fmul fast float %A, %Y		%B = fmul fast float %A, %Y
%C = fadd fast float %B, %Z		%C = fadd fast float %B, %Z
ret float %C		ret float %C
}		}
[74 unchanged lines not shown]
%X = fmul reassoc float %W, 127.0		%X = fmul reassoc float %W, 127.0
%Y = fadd reassoc float %X ,%X		%Y = fadd reassoc float %X ,%X
%Z = fadd reassoc float %Y, %X		%Z = fadd reassoc float %Y, %X
ret float %Z		ret float %Z
}		}

define float @test12(float %X) {		define float @test12(float %X) {
; CHECK-LABEL: @test12(		; CHECK-LABEL: @test12(
; CHECK-NEXT: [[FACTOR:%.*]] = fmul fast float %X, -3.000000e+00		; CHECK-NEXT: [[FACTOR:%.*]] = fmul fast float %X, 3.000000e+00
; CHECK-NEXT: [[Z:%.*]] = fadd fast float [[FACTOR]], 6.000000e+00		; CHECK-NEXT: [[Z:%.*]] = fsub fast float 6.000000e+00, [[FACTOR]]
; CHECK-NEXT: ret float [[Z]]		; CHECK-NEXT: ret float [[Z]]
;		;
%A = fsub fast float 1.000000e+00, %X		%A = fsub fast float 1.000000e+00, %X
%B = fsub fast float 2.000000e+00, %X		%B = fsub fast float 2.000000e+00, %X
%C = fsub fast float 3.000000e+00, %X		%C = fsub fast float 3.000000e+00, %X
%Y = fadd fast float %A ,%B		%Y = fadd fast float %A ,%B
%Z = fadd fast float %Y, %C		%Z = fadd fast float %Y, %C
ret float %Z		ret float %Z
[238 unchanged lines not shown]

test/Transforms/Reassociate/mulfactor.ll

[80 unchanged lines not shown]
%d = mul i32 %c, %x		%d = mul i32 %c, %x
%e = mul i32 %d, %x		%e = mul i32 %d, %x
ret i32 %e		ret i32 %e
}		}

; (x^5) * (y^3) * z		; (x^5) * (y^3) * z
define i32 @test6(i32 %x, i32 %y, i32 %z) {		define i32 @test6(i32 %x, i32 %y, i32 %z) {
; CHECK-LABEL: @test6(		; CHECK-LABEL: @test6(
; CHECK-NEXT: [[TMP1:%.*]] = mul i32 %x, %x		; CHECK-NEXT: [[TMP1:%.*]] = mul i32 %y, %x
; CHECK-NEXT: [[TMP2:%.*]] = mul i32 [[TMP1]], %y		; CHECK-NEXT: [[TMP2:%.*]] = mul i32 [[TMP1]], %x
; CHECK-NEXT: [[F:%.*]] = mul i32 %y, %x		; CHECK-NEXT: [[F:%.*]] = mul i32 %y, %x
; CHECK-NEXT: [[G:%.*]] = mul i32 [[F]], [[TMP2]]		; CHECK-NEXT: [[G:%.*]] = mul i32 [[F]], [[TMP2]]
; CHECK-NEXT: [[TMP3:%.*]] = mul i32 [[G]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = mul i32 [[G]], [[TMP2]]
; CHECK-NEXT: [[H:%.*]] = mul i32 [[TMP3]], %z		; CHECK-NEXT: [[H:%.*]] = mul i32 [[TMP3]], %z
; CHECK-NEXT: ret i32 [[H]]		; CHECK-NEXT: ret i32 [[H]]
;		;
%a = mul i32 %x, %y		%a = mul i32 %x, %y
%b = mul i32 %a, %x		%b = mul i32 %a, %x
[30 unchanged lines not shown]

test/Transforms/Reassociate/reassoc-intermediate-fnegs.ll

	; RUN: opt < %s -reassociate -S | FileCheck %s			; RUN: opt < %s -reassociate -S | FileCheck %s

	; Input is A op (B op C)			; Input is A op (B op C)

	define half @faddsubAssoc1(half %a, half %b) {			define half @faddsubAssoc1(half %a, half %b) {
	; CHECK-LABEL: @faddsubAssoc1(			; CHECK-LABEL: @faddsubAssoc1(
	; CHECK-NEXT: [[T2_NEG:%.*]] = fmul fast half %a, 0xH4500			; CHECK-NEXT: [[T2_NEG:%.*]] = fmul fast half %a, 0xH4500
	; CHECK-NEXT: [[REASS_MUL:%.*]] = fmul fast half %b, 0xH4500			; CHECK-NEXT: [[REASS_MUL:%.*]] = fmul fast half %b, 0xH4500
	; CHECK-NEXT: [[T51:%.*]] = fsub fast half [[REASS_MUL]], [[T2_NEG]]			; CHECK-NEXT: [[T51:%.*]] = fsub fast half [[REASS_MUL]], [[T2_NEG]]
	; CHECK-NEXT: [[T5:%.*]] = fadd fast half [[REASS_MUL]], [[T2_NEG]]
	; CHECK-NEXT: ret half [[T51]]			; CHECK-NEXT: ret half [[T51]]
	;			;
	%t1 = fmul fast half %b, 0xH4200 ; 3*b			%t1 = fmul fast half %b, 0xH4200 ; 3*b
	%t2 = fmul fast half %a, 0xH4500 ; 5*a			%t2 = fmul fast half %a, 0xH4500 ; 5*a
	%t3 = fmul fast half %b, 0xH4000 ; 2*b			%t3 = fmul fast half %b, 0xH4000 ; 2*b
	%t4 = fsub fast half %t2, %t1 ; 5 * a - 3 * b			%t4 = fsub fast half %t2, %t1 ; 5 * a - 3 * b
	%t5 = fsub fast half %t3, %t4 ; 2 * b - ( 5 * a - 3 * b)			%t5 = fsub fast half %t3, %t4 ; 2 * b - ( 5 * a - 3 * b)
	ret half %t5 ; = 5 * (b - a)			ret half %t5 ; = 5 * (b - a)
[19 unchanged lines not shown]
