This is an archive of the discontinued LLVM Phabricator instance.


Differential D45734

[reassociate] Fix excessive revisits when processing long chains of reassociatable instructions.
ClosedPublic

Authored by dsanders on Apr 17 2018, 1:20 PM.

Details

Reviewers

javed.absar
• dberlin

Commits

rG8d0d1aa2291e: [reassociate] Fix excessive revisits when processing long chains of…
rL331381: [reassociate] Fix excessive revisits when processing long chains of…

Summary

Some of our internal testing detected a major compile time regression which I've
tracked down to:

r278938 - Revert "Reassociate: Reprocess RedoInsts after each inst".

It appears that processing long chains of reassociatable instructions causes
non-linear (potentially exponential) growth in the number of times an
instruction is revisited. For example, the included test revisits instructions
220 times in a 20-instruction test.

It appears that r278938 reversed the order instructions were visited and that
this is preventing scheduled revisits from being cancelled as a result of
visiting the instructions naturally during normal processing. However, simply
reversing the order also harmed the generated code. Upon closer inspection, it
was discovered that revisits occurred in the opposite order to the first pass
(Thanks to escha for spotting that).

This patch makes the revisit order consistent with the first pass, which allows
more revisits to be cancelled. This does appear to have a small impact on the
generated code in a few cases, but it significantly reduces compile time.
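
The effect of draining the redo set from the front rather than the back can be sketched with a toy model. This is not LLVM's SetVector: `ToyOrderedSet`, `popFront`, `popBack` and `drain` are hypothetical names used only for illustration, and integers stand in for instructions. It only shows that oldest-first removal reproduces insertion order while newest-first removal reverses it.

```cpp
#include <cassert>
#include <deque>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an insertion-ordered set; not LLVM's SetVector.
class ToyOrderedSet {
  std::deque<int> Order;        // Elements in insertion order.
  std::unordered_set<int> Seen; // Membership test used for deduplication.

public:
  // Returns true if the value was newly inserted, false if already present.
  bool insert(int V) {
    if (!Seen.insert(V).second)
      return false;
    Order.push_back(V);
    return true;
  }

  bool empty() const { return Order.empty(); }

  // Oldest-first removal, analogous to front() followed by erase(begin()).
  int popFront() {
    assert(!Order.empty() && "cannot pop from an empty set");
    int V = Order.front();
    Order.pop_front();
    Seen.erase(V);
    return V;
  }

  // Newest-first removal, analogous to pop_back_val().
  int popBack() {
    assert(!Order.empty() && "cannot pop from an empty set");
    int V = Order.back();
    Order.pop_back();
    Seen.erase(V);
    return V;
  }
};

// Drain a copy of the set in the requested order and record the visits.
std::vector<int> drain(ToyOrderedSet S, bool OldestFirst) {
  std::vector<int> Visited;
  while (!S.empty())
    Visited.push_back(OldestFirst ? S.popFront() : S.popBack());
  return Visited;
}
```

A deque-backed container makes removal from the front cheap, which is presumably why the patch pairs the new visiting order with `std::deque` storage.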

After this patch, our internal test that was most affected by the regression
dropped from ~2 million revisits to ~4k resulting in Reassociate having 0.46%
of the runtime it had before (99.54% improvement).

Here are the summaries reported by lnt for the LLVM test-suite with --benchmarking-only:

metric	geomean before patch	geomean after patch	delta
compile time	0.1956	0.1261	-35.54%
execution time	0.3240	0.3237	-
code size	7365.4459	7365.6079	-

The results have a few wins and losses on compile-time, mostly in the +/- 2.5% range. There was one outlier though:

Performance Regressions - compile_time	Δ	Previous	Current
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk	9.82%	2.0473	2.2483
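
The delta columns can be re-derived from the before and after values as a relative change. The helper below is a minimal sketch with a made-up name (`percentDelta`), not something lnt provides. Applied to the rounded geomeans in the table it gives roughly -35.53% for compile time, a hair off the reported -35.54%, presumably because lnt works from unrounded values. For CrystalMk it gives roughly 9.82%.

```cpp
#include <cassert>
#include <cmath>

// Relative change from Before to After, expressed in percent.
// Negative values mean the metric went down.
double percentDelta(double Before, double After) {
  assert(Before != 0.0 && "relative change is undefined for a zero baseline");
  return (After - Before) / Before * 100.0;
}
```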

Diff Detail

Repository

rL LLVM

Build Status

Buildable 17148
Build 17148: arc lint + arc unit

Event Timeline

dsanders created this revision. Apr 17 2018, 1:20 PM

Herald added a subscriber: kristof.beyls. Apr 17 2018, 1:20 PM

I have some performance figures from the LLVM test-suite now using the --benchmarking-only option. Here are the summaries reported by lnt:

metric	geomean before patch	geomean after patch	delta
compile time	0.1956	0.1261	-35.54%
execution time	0.3240	0.3237	-
code size	7365.4459	7365.6079	-

The results have a few wins and losses on compile-time, mostly in the +/- 2.5% range. There was one outlier though:

Performance Regressions - compile_time	Δ	Previous	Current
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk	9.82%	2.0473	2.2483

Herald added a reviewer: javed.absar. May 2 2018, 9:26 AM

dberlin accepted this revision. May 2 2018, 9:30 AM

This revision is now accepted and ready to land. May 2 2018, 9:30 AM

This looks right.
There is an optimal order to visit and revisit, and if you switch them, you will definitely take an extra N iterations per instruction :)

dberlin added inline comments. May 2 2018, 9:42 AM

test/Transforms/Reassociate/long-chains.ll
Line 31: Is it possible to check the number of visits is within a range? (i.e. 100-200, and not 1 million)? I don't remember if lit can do this :(

Thanks

test/Transforms/Reassociate/long-chains.ll
Line 31: I don't think it can do ranges but orders of magnitude are possible by matching a fixed number of digits with something like:

    ; CHECK: {{[1-9][0-9]}} reassociate - Number of insts reassociated

Would you like me to change it to that?

dberlin added inline comments. May 2 2018, 9:50 AM

test/Transforms/Reassociate/long-chains.ll
Line 31: Yeah, why don't we do that just so it doesn't get messed up by random changes. We really care about the order of magnitude, not the exact number. (I think :-P)

dsanders added inline comments. May 2 2018, 10:28 AM

test/Transforms/Reassociate/long-chains.ll
Line 31: Ok. For the `Number of insts reassociated` checks I'll go with `[1-9][0-9]` (effectively >= 10 and <= 99) and for the `Number of multiplies factored` I'll go with `[3-9]` so the number of changes made can't go down without us noticing.
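
The digit-count idea can be tried outside of lit. The sketch below uses `std::regex` rather than FileCheck itself, so it only approximates how the check behaves, and the helper names are made up for illustration. It assumes the pattern is searched for as a substring of the output line. Under that assumption `[1-9][0-9]` also matches the last two digits of a longer count such as 220, unless the pattern rules out a preceding digit.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Substring search for the two-digit pattern anywhere in the line.
bool matchesAnywhere(const std::string &Line) {
  static const std::regex Pattern(
      "[1-9][0-9] reassociate - Number of insts reassociated");
  return std::regex_search(Line, Pattern);
}

// Same search, but the two digits must start the line or follow a
// non-digit, so they cannot be the tail of a longer number.
bool matchesTwoDigitCount(const std::string &Line) {
  static const std::regex Pattern(
      "(^|[^0-9])[1-9][0-9] reassociate - Number of insts reassociated");
  return std::regex_search(Line, Pattern);
}
```

With these helpers, a line reporting 55 passes both checks, while a line reporting 220 passes only the unanchored one.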

dsanders edited the summary of this revision. May 2 2018, 11:02 AM

Closed by commit rL331381: [reassociate] Fix excessive revisits when processing long chains of… (authored by dsanders). May 2 2018, 11:02 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
include/llvm/Transforms/Scalar/Reassociate.h	11 lines
lib/Transforms/Scalar/Reassociate.cpp	15 lines
test/Transforms/Reassociate/long-chains.ll	31 lines

Diff 142820

include/llvm/Transforms/Scalar/Reassociate.h

[23 lines not shown]
 #define LLVM_TRANSFORMS_SCALAR_REASSOCIATE_H

 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/SetVector.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/PassManager.h"
 #include "llvm/IR/ValueHandle.h"
+#include <deque>

 namespace llvm {

 class APInt;
 class BasicBlock;
 class BinaryOperator;
 class Function;
 class Instruction;
[24 lines not shown]
 };

 class XorOpnd;

 } // end namespace reassociate

 /// Reassociate commutative expressions.
 class ReassociatePass : public PassInfoMixin<ReassociatePass> {
+public:
+  using OrderedSet =
+      SetVector<AssertingVH<Instruction>, std::deque<AssertingVH<Instruction>>>;
+
+protected:
   DenseMap<BasicBlock *, unsigned> RankMap;
   DenseMap<AssertingVH<Value>, unsigned> ValueRankMap;
-  SetVector<AssertingVH<Instruction>> RedoInsts;
+  OrderedSet RedoInsts;

   // Arbitrary, but prevents quadratic behavior.
   static const unsigned GlobalReassociateLimit = 10;
   static const unsigned NumBinaryOps =
       Instruction::BinaryOpsEnd - Instruction::BinaryOpsBegin;
   DenseMap<std::pair<Value *, Value *>, unsigned> PairMap[NumBinaryOps];

   bool MadeChange;
[20 lines not shown; context: bool CombineXorOpnd(Instruction *I, reassociate::XorOpnd *Opnd1,]
                       reassociate::XorOpnd *Opnd2, APInt &ConstOpnd,
                       Value *&Res);
   Value *buildMinimalMultiplyDAG(IRBuilder<> &Builder,
                                  SmallVectorImpl<reassociate::Factor> &Factors);
   Value *OptimizeMul(BinaryOperator *I,
                      SmallVectorImpl<reassociate::ValueEntry> &Ops);
   Value *RemoveFactorFromExpression(Value *V, Value *Factor);
   void EraseInst(Instruction *I);
-  void RecursivelyEraseDeadInsts(Instruction *I,
-                                 SetVector<AssertingVH<Instruction>> &Insts);
+  void RecursivelyEraseDeadInsts(Instruction *I, OrderedSet &Insts);
   void OptimizeInst(Instruction *I);
   Instruction *canonicalizeNegConstExpr(Instruction *I);
   void BuildPairMap(ReversePostOrderTraversal<Function *> &RPOT);
 };

 } // end namespace llvm

 #endif // LLVM_TRANSFORMS_SCALAR_REASSOCIATE_H

lib/Transforms/Scalar/Reassociate.cpp

[792 lines not shown]
 /// Insert instructions before the instruction pointed to by BI,
 /// that computes the negative version of the value specified. The negative
 /// version of the value is returned, and BI is left pointing at the instruction
 /// that should be processed next by the reassociation pass.
 /// Also add intermediate instructions to the redo list that are modified while
 /// pushing the negates through adds. These will be revisited to see if
 /// additional opportunities have been exposed.
 static Value *NegateValue(Value *V, Instruction *BI,
-                          SetVector<AssertingVH<Instruction>> &ToRedo) {
+                          ReassociatePass::OrderedSet &ToRedo) {
   if (auto *C = dyn_cast<Constant>(V))
     return C->getType()->isFPOrFPVectorTy() ? ConstantExpr::getFNeg(C) :
                                               ConstantExpr::getNeg(C);

   // We are trying to expose opportunity for reassociation. One of the things
   // that we want to do to achieve this is to push a negation as deep into an
   // expression chain as possible, to expose the add instructions. In practice,
   // this means that we turn this:
[97 lines not shown; context: if (Sub->hasOneUse() &&]
        isReassociableOp(VB, Instruction::Sub, Instruction::FSub)))
     return true;

   return false;
 }

 /// If we have (X-Y), and if either X is an add, or if this is only used by an
 /// add, transform this into (X+(0-Y)) to promote better reassociation.
-static BinaryOperator *
-BreakUpSubtract(Instruction *Sub, SetVector<AssertingVH<Instruction>> &ToRedo) {
+static BinaryOperator *BreakUpSubtract(Instruction *Sub,
+                                       ReassociatePass::OrderedSet &ToRedo) {
   // Convert a subtract into an add and a neg instruction. This allows sub
   // instructions to be commuted with other add instructions.
   //
   // Calculate the negative value of Operand 1 of the sub instruction,
   // and set it as the RHS of the add instruction we just made.
   Value *NegVal = NegateValue(Sub->getOperand(1), Sub, ToRedo);
   BinaryOperator *New = CreateAdd(Sub->getOperand(0), NegVal, "", Sub, Sub);
   Sub->setOperand(0, Constant::getNullValue(Sub->getType())); // Drop use of op.
[929 lines not shown; context: Value *ReassociatePass::OptimizeExpression(BinaryOperator *I,]

   if (Ops.size() != NumOps)
     return OptimizeExpression(I, Ops);
   return nullptr;
 }

 // Remove dead instructions and if any operands are trivially dead add them to
 // Insts so they will be removed as well.
-void ReassociatePass::RecursivelyEraseDeadInsts(
-    Instruction *I, SetVector<AssertingVH<Instruction>> &Insts) {
+void ReassociatePass::RecursivelyEraseDeadInsts(Instruction *I,
+                                                OrderedSet &Insts) {
   assert(isInstructionTriviallyDead(I) && "Trivially dead instructions only!");
   SmallVector<Value *, 4> Ops(I->op_begin(), I->op_end());
   ValueRankMap.erase(I);
   Insts.remove(I);
   RedoInsts.remove(I);
   I->eraseFromParent();
   for (auto Op : Ops)
     if (Instruction *OpInst = dyn_cast<Instruction>(Op))
[444 lines not shown; context: for (BasicBlock::iterator II = BI->begin(), IE = BI->end(); II != IE;)]
       } else {
         OptimizeInst(&*II);
         assert(II->getParent() == &*BI && "Moved to a different block!");
         ++II;
       }

     // Make a copy of all the instructions to be redone so we can remove dead
     // instructions.
-    SetVector<AssertingVH<Instruction>> ToRedo(RedoInsts);
+    OrderedSet ToRedo(RedoInsts);
     // Iterate over all instructions to be reevaluated and remove trivially dead
     // instructions. If any operand of the trivially dead instruction becomes
     // dead mark it for deletion as well. Continue this process until all
     // trivially dead instructions have been removed.
     while (!ToRedo.empty()) {
       Instruction *I = ToRedo.pop_back_val();
       if (isInstructionTriviallyDead(I)) {
         RecursivelyEraseDeadInsts(I, ToRedo);
         MadeChange = true;
       }
     }

     // Now that we have removed dead instructions, we can reoptimize the
     // remaining instructions.
     while (!RedoInsts.empty()) {
-      Instruction *I = RedoInsts.pop_back_val();
+      Instruction *I = RedoInsts.front();
+      RedoInsts.erase(RedoInsts.begin());
       if (isInstructionTriviallyDead(I))
         EraseInst(I);
       else
         OptimizeInst(I);
     }
   }

   // We are done with the rank map and pair map.
[53 lines not shown]

test/Transforms/Reassociate/long-chains.ll

This file was added.

; RUN: opt < %s -reassociate -stats -S 2>&1 | FileCheck %s
; REQUIRES: asserts

define i8 @longchain(i8 %in1, i8 %in2, i8 %in3, i8 %in4, i8 %in5, i8 %in6, i8 %in7, i8 %in8, i8 %in9, i8 %in10, i8 %in11, i8 %in12, i8 %in13, i8 %in14, i8 %in15, i8 %in16, i8 %in17, i8 %in18, i8 %in19, i8 %in20) {
  %tmp1 = add i8 %in1, %in2
  %tmp2 = add i8 %tmp1, %in3
  %tmp3 = add i8 %tmp2, %in4
  %tmp4 = add i8 %tmp3, %in3
  %tmp5 = add i8 %tmp4, %in4
  %tmp6 = add i8 %tmp5, %in5
  %tmp7 = add i8 %tmp6, %in6
  %tmp8 = add i8 %tmp7, %in7
  %tmp9 = add i8 %tmp8, %in8
  %tmp10 = add i8 %tmp9, %in9
  %tmp11 = add i8 %tmp10, %in10
  %tmp12 = add i8 %tmp11, %in11
  %tmp13 = add i8 %tmp12, %in12
  %tmp14 = add i8 %tmp13, %in13
  %tmp15 = add i8 %tmp14, %in14
  %tmp16 = add i8 %tmp15, %in15
  %tmp17 = add i8 %tmp16, %in16
  %tmp18 = add i8 %tmp17, %in17
  %tmp19 = add i8 %tmp18, %in18
  %tmp20 = add i8 %tmp19, %in19
  %tmp21 = add i8 %tmp20, %in20
  ret i8 %tmp20
}

; Was 220 reassociate - Number of insts reassociated
; CHECK: 55 reassociate - Number of insts reassociated
; CHECK: 3 reassociate - Number of multiplies factored
