This is an archive of the discontinued LLVM Phabricator instance.

Modifying reassociate for improved CSE
AbandonedPublic

Authored by escha on Oct 26 2017, 11:45 AM.

Details

Summary

Moved here from the llvm-dev list on their advice.

Copy-paste of description:

When playing around with reassociate I noticed a seemingly obvious optimization that was not getting done anywhere in LLVM, nor in GCC or ICC.

Consider the following trivial function:

void foo(int a, int b, int c, int d, int e, int *res) {
  res[0] = (e * a) * d;
  res[1] = (e * b) * d;
  res[2] = (e * c) * d;
}

This function can be optimized down to 4 multiplies instead of 6 by reassociating so that (e*d) is the common subexpression. However, no compiler I’ve tested does this. I wrote a slightly hacky heuristic algorithm to augment reassociate to do this and tested it.
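For illustration, here is a hand-reassociated sketch of what the optimized output looks like (the function name is mine, not from the patch): since (e*a)*d == (e*d)*a, the product e*d can be computed once and reused.

```cpp
#include <cassert>

// Hypothetical hand-reassociated version of foo: the shared factor
// (e * d) is hoisted, so only 4 multiplies remain instead of 6.
void foo_reassoc(int a, int b, int c, int d, int e, int *res) {
  int ed = e * d;   // 1 multiply, shared by all three results
  res[0] = ed * a;
  res[1] = ed * b;
  res[2] = ed * c;
}
```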

First, before the details, the results: on a large offline test suite of graphics shaders it cut down total instruction count by ~0.9% (!) and float math instruction count by 1.5% (!). I ran it on SPEC out of curiosity and the one standout was that with fast-math, it cut down 470.lbm by ~6% runtime (see Note 1 at the bottom for the probable reason why). Not coincidentally, this is probably the SPEC program that looks most like a graphics shader.

Here’s how it works:

  1. Do reassociate once as normal. Keep track of the linear chains of operations that are saved to the final IR for later.
  2. Create a “pair map”: a mapping from <Instr, Instr> to <unsigned>, with one pair map for each type of BinaryOperation. This map records how often a given operand pair occurs in the source code for a given BinaryOperation. But in addition to counting each actual instruction, we also count all O(N^2) pairs of each linear operand chain. So for example, if the operand chain is this:
a*b*c

we do:

PairMap[Multiply][{a, b}]++;
PairMap[Multiply][{b, c}]++;
PairMap[Multiply][{a, c}]++;

Obviously, if runtime is a concern we can skip this counting for extremely long chains, which are unlikely to occur in real programs, to avoid asymptotically quadratic behavior.

  3. Run reassociate again. All the information is saved from the first time around, so hopefully this won’t be very expensive except for the changes we actually make. But this time, whenever emitting a linear operand chain, pick the operand pair that’s *most common* in the source code (using PairMap) and make that one the first operation. Thus, for example:
(((a*b)*c)*d)*e

if “b*e” is the most common, this becomes:

(((b*e)*a)*c)*d

Now b*e can be CSE’d later! Magic!

This could probably be enhanced to allow for multiple operations to be CSE’d, e.g. by doing something like ((b*e)*(a*c))*d, but that would change the canonicalization regime and make it a more invasive change, I think.

Also, as a tiebreaker, the one I’m currently using is the pair with the lowest maximum rank of its two operands. This makes sense because in this example “a*b” is the first operation in the chain, so we want to pick duplicates that are higher up in the program rather than closer to the leaf. No other tiebreaker I tried seemed to work as well.
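The selection step with this tiebreaker can be sketched as follows (function and variable names are mine; the real code operates on Reassociate's internal operand lists, and only reorders when a pair's count actually exceeds 1):

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Pair = std::pair<std::string, std::string>;

// Return the indices of the chain pair with the highest global count;
// break ties by preferring the pair whose larger operand rank is
// smallest, i.e. the duplicate that sits higher up in the program.
std::pair<size_t, size_t>
pickLeadingPair(const std::vector<std::string> &Chain,
                const std::map<Pair, unsigned> &Counts,
                const std::map<std::string, unsigned> &Rank) {
  size_t BestI = 0, BestJ = 1;
  unsigned BestCount = 0, BestMaxRank = ~0u;
  for (size_t I = 0; I < Chain.size(); ++I)
    for (size_t J = I + 1; J < Chain.size(); ++J) {
      Pair Key = std::minmax(Chain[I], Chain[J]);
      auto It = Counts.find(Key);
      unsigned Count = It == Counts.end() ? 0 : It->second;
      unsigned MaxRank = std::max(Rank.at(Chain[I]), Rank.at(Chain[J]));
      if (Count > BestCount ||
          (Count == BestCount && MaxRank < BestMaxRank)) {
        BestCount = Count;
        BestMaxRank = MaxRank;
        BestI = I;
        BestJ = J;
      }
    }
  return {BestI, BestJ};
}
```

On the chain (((a*b)*c)*d)*e from the example, if {b,e} has the highest count this picks b and e, which the pass would then emit as the first operation.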

This is not the cleanest implementation or algorithm, but it’s the most straightforward one I could come up with, it gives us some unreasonably large perf gains, and of course it optimizes “foo” correctly down to 4 multiplies.

Any thoughts? The gains here are… well, more than I expected, to say the least ;-)

—escha

Note 1: see how (1.0 - u2) is a common factor here.

DST_C ( dstGrid ) = (1.0-OMEGA)*SRC_C ( srcGrid ) + DFL1*OMEGA*rho*(1.0                                 - u2);
DST_N ( dstGrid ) = (1.0-OMEGA)*SRC_N ( srcGrid ) + DFL2*OMEGA*rho*(1.0 +       uy*(4.5*uy       + 3.0) - u2);
DST_S ( dstGrid ) = (1.0-OMEGA)*SRC_S ( srcGrid ) + DFL2*OMEGA*rho*(1.0 +       uy*(4.5*uy       - 3.0) - u2);
DST_E ( dstGrid ) = (1.0-OMEGA)*SRC_E ( srcGrid ) + DFL2*OMEGA*rho*(1.0 +       ux*(4.5*ux       + 3.0) - u2);

(etc)

Diff Detail

Repository
rL LLVM

Event Timeline

escha created this revision.Oct 26 2017, 11:45 AM
escha edited the summary of this revision. (Show Details)Oct 26 2017, 12:18 PM
fhahn added a subscriber: fhahn.Oct 26 2017, 3:38 PM
arsenm added inline comments.Oct 27 2017, 3:23 AM
include/llvm/Transforms/Scalar/Reassociate.h
76

I find the use of this somewhat confusing. It seems to really be a loop counter stuck in the class to pass it to the other functions. Is it particularly inconvenient to pass this in to the functions where it's needed? Also a better name might be ReassociateStep or Level (possibly with an enum for the steps)? Pass is pretty overloaded (plus lldb doesn't understand what to do with this since there is also a class named Pass)

lib/Transforms/Scalar/Reassociate.cpp
2222

Leftover debug printing

fhahn added inline comments.Oct 27 2017, 5:15 AM
include/llvm/Transforms/Scalar/Reassociate.h
76

I am probably missing something, but would it be possible to collect the linear chains as a pre-processing step and then do the full reassociate only once? You might have to move out the code to linearize binary ops, but I think it would be a more lightweight and clearer solution.

hiraditya added inline comments.
lib/Transforms/Scalar/Reassociate.cpp
2217

Does this ensure that the original order of evaluation is emitted when there is no preferred order?

escha added inline comments.Oct 27 2017, 10:23 AM
include/llvm/Transforms/Scalar/Reassociate.h
76

Mmm, I'm not sure. The problem is that the chains can *change* as a result of reassociation, which can rewrite instructions. e.g. changing use-counts can change the chains.

You're probably right; the main reason I didn't do that was that I wanted to make minimally invasive changes because I really don't feel I understand the pass deeply enough to restructure it like that safely.

76

And to the name, absolutely, will change on next iteration regardless.

lib/Transforms/Scalar/Reassociate.cpp
2217

I *think* it should? Not 100% sure. It currently doesn't reorder if Score never exceeds 1 but I don't fully trust this code. There's probably cases where it's "wrong" (as in, doing something unnecessary/suboptimal with order, not incorrect code).

This seems very similar to what n-ary reassociate wants to do, I'd like to understand why we shouldn't do it there?
(where it seems to fit perfectly)

This seems very similar to what n-ary reassociate wants to do, I'd like to understand why we shouldn't do it there?
(where it seems to fit perfectly)

Yeah, actually, I don't quite understand what N-ary reassociate *is* compared to this. One thing I do know about it is that it uses SCEV, which completely rules it out for floating point, which is the main thing I care about. Does it handle the integer case here?

fhahn added a reviewer: fhahn.Oct 30 2017, 4:13 AM

Yeah, actually, I don't quite understand what N-ary reassociate *is* compared to this. One thing I do know about it is that it uses SCEV, which completely rules it out for floating point, which is the main thing I care about. Does it handle the integer case here?

It looks like NaryReassociate tries to do the same thing as this patch, although using a different approach. I think ideally we would not have code in 2 passes that tries to do the same thing.

If this patch handles all cases handled by NaryReassociate AND floating point operations, it might make sense to get rid of NaryReassociate. Could you try and check if this patch handles all the NaryReassociate test cases?

Hmm, reading through N-ary reassociate, it looks like it has similar goals but is not completely overlapping in action, for better or worse.

N-ary reassociate runs in dominator order and looks to see if any subexpressions of the current computation have already been computed.

But as far as I understand, it does not consider *reassociating expressions* to find those redundancies. In other words, it doesn't consider the N^2 possible sub-combinations of previous expressions; it only considers them in the order they were emitted. Its main goal seems to be optimizing addressing expressions. So for example, if you have (a+b) and a later expression computes (a+(b+2)), it can find that this is redundant. But if you have (a+(b+1)) and (a+(b+2)) but no (a+b), I don't think it handles that. It also only runs in dominator order, so it can't find arbitrary redundancies.
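My reading of that contrast can be sketched in two tiny cases (a hedged illustration of the description above, not verified against NaryReassociate's actual behavior; function names are mine):

```cpp
#include <cassert>

// Case NaryReassociate handles (as described above): (a + b) already
// exists, so a + (b + 2) can be rewritten as (a + b) + 2 and reuse it.
int handled(int a, int b) {
  int t1 = a + b;   // dominating subexpression
  int t2 = t1 + 2;  // reuses t1 instead of recomputing a + (b + 2)
  return t1 + t2;
}

// Case it reportedly misses: only a + (b + 1) and a + (b + 2) exist;
// (a + b) was never emitted, so there is nothing to match against.
int missed(int a, int b) {
  int t1 = a + (b + 1);
  int t2 = a + (b + 2);
  return t1 + t2;
}
```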

Similarly, my change doesn't handle GetElementPtr in the way N-ary reassociate does, since it wasn't targeted at addressing expressions.

So I agree they're redundant, and kind of share similar goals and purposes, but annoyingly I don't see any way to unify them and achieve the goals of both.

Oh and of course because N-ary uses SCEV it cannot possibly handle floating point without a huge extension of SCEV, I think.

fhahn edited edge metadata.Nov 2 2017, 4:10 AM

Hmm, reading through N-ary reassociate, it looks like it has similar goals but is not completely overlapping in action, for better or worse.

N-ary reassociate runs in dominator order and looks to see if any subexpressions of the current computation have already been computed.

But as far as I understand, it does not consider *reassociating expressions* to find those redundancies. In other words, it doesn't consider the N^2 possible sub-combinations of previous expressions; it only considers them in the order they were emitted. Its main goal seems to be optimizing addressing expressions. So for example, if you have (a+b) and a later expression computes (a+(b+2)), it can find that this is redundant. But if you have (a+(b+1)) and (a+(b+2)) but no (a+b), I don't think it handles that. It also only runs in dominator order, so it can't find arbitrary redundancies.

Similarly, my change doesn't handle GetElementPtr in the way N-ary reassociate does, since it wasn't targeted at addressing expressions.

So I agree they're redundant, and kind of share similar goals and purposes, but annoyingly I don't see any way to unify them and achieve the goals of both.

Thanks. I suppose it should be possible to extend Reassociate to support GetElementPtr, and then we would get what NaryReassociate provides with your patch? I am not saying this should be part of this patch; I am just trying to understand how we can build on it.

Your patch uses function-wide counts for pairs, right? Could this lead to cases where this change reassociates operations based on this global count, even though CSE won't eliminate the reassociated expression (for example because control flow makes it too costly), and produces worse results? I assume this is not an issue for shader code, but it could happen for control-flow-heavy general code. I can run some benchmarks on AArch64 and share results in a couple of days.

include/llvm/Transforms/Scalar/Reassociate.h
76

Ok thanks! I see why it is simpler to just split it up into 2 iterations. What I was thinking about was having a loop that just builds the rankMap and then having the main loop update this map as it deletes instructions. It might be worth considering doing that as follow-up work.

lib/Transforms/Scalar/Reassociate.cpp
2212

Could we order the keys in the pairs to avoid a second lookup here, i.e. insert pairs with <a, b> where a < b?

2251

It looks like we are running reassociate 4 times? Is this needed? From the description it sounds like 2 times should be enough?

2293

It's probably easier to understand if this would be done in a separate function.

escha added a comment.Nov 2 2017, 9:44 AM

Hmm, reading through N-ary reassociate, it looks like it has similar goals but is not completely overlapping in action, for better or worse.

N-ary reassociate runs in dominator order and looks to see if any subexpressions of the current computation have already been computed.

But as far as I understand, it does not consider *reassociating expressions* to find those redundancies. In other words, it doesn't consider the N^2 possible sub-combinations of previous expressions; it only considers them in the order they were emitted. Its main goal seems to be optimizing addressing expressions. So for example, if you have (a+b) and a later expression computes (a+(b+2)), it can find that this is redundant. But if you have (a+(b+1)) and (a+(b+2)) but no (a+b), I don't think it handles that. It also only runs in dominator order, so it can't find arbitrary redundancies.

Similarly, my change doesn't handle GetElementPtr in the way N-ary reassociate does, since it wasn't targeted at addressing expressions.

So I agree they're redundant, and kind of share similar goals and purposes, but annoyingly I don't see any way to unify them and achieve the goals of both.

Thanks. I suppose it should be possible to extend Reassociate to support GetElementPtr, and then we would get what NaryReassociate provides with your patch? I am not saying this should be part of this patch; I am just trying to understand how we can build on it.

I'm not sure. GetElementPtr is a fairly complex instruction that isn't easily mapped to other instructions (i.e. it can represent a whole chain instead of just one). I think the reason that Nary is able to do this is because it uses SCEV, which abstracts over that.

Who wrote Nary? Maybe they have a useful opinion on all this?

Your patch uses function range counts for pairs, right? Could this lead to cases were this change reassociates operations based on this global count, even though CSE won't eliminate the reassociated expression (because it would be too costly because of control flow for example) and produces worse results? I assume this is not an issue for shader code, but could happen for control-flow heavy general code. I can run some benchmarks on AArch64 and share results in a couple of days.

Yeah, I was thinking of that. In the most extreme case, you could end up with disjoint instructions in different control flow paths, but on the other hand, the new GVN-Hoist should ideally catch those. You could use something like a scoped data structure to handle this in theory, maybe?

lib/Transforms/Scalar/Reassociate.cpp
2212

hmm, what would be the best way to do this? It's possible for two instructions to have the same Rank, IIRC. Would pointer order be acceptable? I'd be worried about determinism.

2251

oh, right. this was me experimenting to see if running it twice would get better results... and it did! I'm guessing this is because if you heavily reassociate based on global statistics, it opens up new opportunities, and so on. But it might be unnecessary.

this is probably in part caused by the fact that my patch isn't worklist-based (i.e. it runs over everything repeatedly instead of adding things to a worklist to revisit)

2293

good point. agreed

spatel added a subscriber: spatel.Nov 9 2017, 7:56 AM

Initial benchmark results on AArch64 Cortex-A57 for the LLVM test-suite and SPEC2000: -0.64% geomean of runtimes. Notable changes:

  • SingleSource/Benchmarks/Shootout/Shootout-strcat -27.78%
  • SingleSource/Benchmarks/Linpack/linpack-pc -7.23%

No regressions > 1%.

I will run newer SPEC versions too, but that might take a while.

escha abandoned this revision.Nov 14 2017, 1:31 PM

Closing to start a real patch review.