This is an archive of the discontinued LLVM Phabricator instance.

Compile time decreasing in the case we're dealing with Machine Combiner
ClosedPublic

Authored by avt77 on Feb 7 2017, 3:00 AM.

Details

Reviewers

spatel
RKSimon
guyblank
Gerolf
hfinkel

Summary

This issue arose in https://reviews.llvm.org/D26855: when we tried to use the Machine Combiner instead of the DAG Combiner for fast-math operations (in our case fdiv, as a first step), compile time increased significantly, from about 2s to 18s. I used a very specific, worst-case test for div operations (see attachment), but the result was unpleasant in any case. After profiling the compiler, I found that the slowdown comes from the huge number of MC traces being created, even when they are not used. This patch moves trace creation to the place where it is actually needed.
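The idea in the summary can be illustrated with a toy model that has nothing to do with LLVM's real classes. In this minimal sketch, `TraceProvider`, `Candidate`, `countEager` and `countLazy` are hypothetical names invented for illustration; `getTrace()` only counts how often it is called, standing in for an expensive analysis. The eager loop asks for the trace before the cheap acceptance check, the lazy loop only after that check has failed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an expensive analysis (e.g. building a trace).
// It only counts how many times it was requested.
struct TraceProvider {
  std::size_t Calls = 0;
  int getTrace() {
    ++Calls;
    return 0; // Dummy result; a real analysis would return trace data.
  }
};

// One candidate pattern. CheapAccept models a cheap check that already
// decides to substitute, so no trace is needed for this candidate.
struct Candidate {
  bool CheapAccept;
};

// Eager variant: the trace is requested for every candidate, even when the
// cheap check alone would have been sufficient.
inline std::size_t countEager(const std::vector<Candidate> &Cands) {
  TraceProvider TP;
  for (const Candidate &C : Cands) {
    int Trace = TP.getTrace();
    if (C.CheapAccept)
      continue; // Trace was computed but never used.
    (void)Trace; // The expensive checks would use the trace here.
  }
  return TP.Calls;
}

// Lazy variant: the trace is requested only on the path that needs it.
inline std::size_t countLazy(const std::vector<Candidate> &Cands) {
  TraceProvider TP;
  for (const Candidate &C : Cands) {
    if (C.CheapAccept)
      continue; // Cheap path: no trace requested.
    int Trace = TP.getTrace();
    (void)Trace; // The expensive checks would use the trace here.
  }
  return TP.Calls;
}
```

With four candidates of which three pass the cheap check, the eager variant requests the trace four times and the lazy one once. The actual pass has more conditions than this model, so the sketch shows only the shape of the change, not its measured effect.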

Diff Detail

Event Timeline

spill_fdiv.ll (644 KB)

I uploaded the test for review

RKSimon added a subscriber: llvm-commits. Feb 7 2017, 4:29 AM

Looks alright to me - any other comments?

LGTM.

Please put a perf measurement in the commit message (or add a test to test-suite so we don't regress?) - something like:

Before this patch (testing on N GHz Haswell):
$ time llc -o /dev/null -enable-unsafe-fp-math foo.ll
X secs
After this patch:
$ time llc -o /dev/null -enable-unsafe-fp-math foo.ll
Y secs
foo.ll is attached to D29627

lib/CodeGen/MachineCombiner.cpp
449–450: How about making the comment more direct, so we don't lose track of the reason. Something like: "Calculating the trace metrics may be expensive, so only do this when necessary."

This revision is now accepted and ready to land. Feb 9 2017, 8:44 AM

Thanks for following up on this.

LGTM.

Just for your info: I collected the perf numbers.

DAGCombiner - trunk

time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
real 0m1.685s

DAGCombiner + Speed patch

time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
real 0m1.655s

MachineCombiner w/o Speed patch

time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
real 0m21.614s

MachineCombiner + Speed patch

time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
real 0m1.593s
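As a rough reading of the timings above (single runs, so only indicative), the ratio between two wall-clock times can be computed as below. The helper name `speedup` is hypothetical and only illustrates the arithmetic.

```cpp
// Ratio of the slower wall-clock time to the faster one, both in seconds.
inline double speedup(double BeforeSecs, double AfterSecs) {
  return BeforeSecs / AfterSecs;
}
```

For the MachineCombiner runs, speedup(21.614, 1.593) is about 13.6, while for the DAGCombiner runs speedup(1.685, 1.655) is about 1.02, i.e. essentially unchanged.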

avt77 mentioned this in rL294936: Compile time decreasing in the case we're dealing with Machine Combiner. Feb 13 2017, 1:55 AM

Committed revision 294936

rL294936

RKSimon mentioned this in D26855: New unsafe-fp-math implementation for X86 target. Mar 9 2017, 11:27 AM

Revision Contents

Path: lib/CodeGen/MachineCombiner.cpp
Size: 41 lines

Diff 87384

lib/CodeGen/MachineCombiner.cpp

 [348 lines not shown]
 bool MachineCombiner::doSubstitute(unsigned NewSize, unsigned OldSize) {
   if (OptSize && (NewSize < OldSize))
     return true;
   if (!TSchedModel.hasInstrSchedModelOrItineraries())
     return true;
   return false;
 }

+static void insertDeleteInstructions(MachineBasicBlock *MBB, MachineInstr &MI,
+                                     SmallVector<MachineInstr *, 16> InsInstrs,
+                                     SmallVector<MachineInstr *, 16> DelInstrs,
+                                     MachineTraceMetrics *Traces) {
+  for (auto *InstrPtr : InsInstrs)
+    MBB->insert((MachineBasicBlock::iterator)&MI, InstrPtr);
+  for (auto *InstrPtr : DelInstrs)
+    InstrPtr->eraseFromParentAndMarkDBGValuesForRemoval();
+  ++NumInstCombined;
+  Traces->invalidate(MBB);
+  Traces->verifyAnalysis();
+}
+
 /// Substitute a slow code sequence with a faster one by
 /// evaluating instruction combining pattern.
 /// The prototype of such a pattern is MUl + ADD -> MADD. Performs instruction
 /// combining based on machine trace metrics. Only combine a sequence of
 /// instructions when this neither lengthens the critical path nor increases
 /// resource pressure. When optimizing for codesize always combine when the new
 /// sequence is shorter.
 bool MachineCombiner::combineInstructions(MachineBasicBlock *MBB) {
 [38 lines not shown]
     if (!TII->getMachineCombinerPatterns(MI, Patterns))
       continue;

     for (auto P : Patterns) {
       SmallVector<MachineInstr *, 16> InsInstrs;
       SmallVector<MachineInstr *, 16> DelInstrs;
       DenseMap<unsigned, unsigned> InstrIdxForVirtReg;
       if (!MinInstr)
         MinInstr = Traces->getEnsemble(MachineTraceMetrics::TS_MinInstrCount);
-      MachineTraceMetrics::Trace BlockTrace = MinInstr->getTrace(MBB);
       Traces->verifyAnalysis();
       TII->genAlternativeCodeSequence(MI, P, InsInstrs, DelInstrs,
                                       InstrIdxForVirtReg);
       unsigned NewInstCount = InsInstrs.size();
       unsigned OldInstCount = DelInstrs.size();
       // Found pattern, but did not generate alternative sequence.
       // This can happen e.g. when an immediate could not be materialized
       // in a single instruction.
       if (!NewInstCount)
         continue;

       bool SubstituteAlways = false;
       if (ML && TII->isThroughputPattern(P))
         SubstituteAlways = true;

       // Substitute when we optimize for codesize and the new sequence has
       // fewer instructions OR
       // the new sequence neither lengthens the critical path nor increases
       // resource pressure.
-      if (SubstituteAlways || doSubstitute(NewInstCount, OldInstCount) ||
-          (improvesCriticalPathLen(MBB, &MI, BlockTrace, InsInstrs,
-                                   DelInstrs, InstrIdxForVirtReg, P) &&
-           preservesResourceLen(MBB, BlockTrace, InsInstrs, DelInstrs))) {
-        for (auto *InstrPtr : InsInstrs)
-          MBB->insert((MachineBasicBlock::iterator) &MI, InstrPtr);
-        for (auto *InstrPtr : DelInstrs)
-          InstrPtr->eraseFromParentAndMarkDBGValuesForRemoval();
-
-        Changed = true;
-        ++NumInstCombined;
-
-        Traces->invalidate(MBB);
-        Traces->verifyAnalysis();
+      if (SubstituteAlways || doSubstitute(NewInstCount, OldInstCount)) {
+        insertDeleteInstructions(MBB, MI, InsInstrs, DelInstrs, Traces);
         // Eagerly stop after the first pattern fires.
+        Changed = true;
         break;
       } else {
+        // We're getting the trace when we really need it for computations
+        MachineTraceMetrics::Trace BlockTrace = MinInstr->getTrace(MBB);
+        if (improvesCriticalPathLen(MBB, &MI, BlockTrace, InsInstrs, DelInstrs,
+                                    InstrIdxForVirtReg, P) &&
+            preservesResourceLen(MBB, BlockTrace, InsInstrs, DelInstrs)) {
+          insertDeleteInstructions(MBB, MI, InsInstrs, DelInstrs, Traces);
+          // Eagerly stop after the first pattern fires.
+          Changed = true;
+          break;
+        }
         // Cleanup instructions of the alternative code sequence. There is no
         // use for them.
         MachineFunction *MF = MBB->getParent();
         for (auto *InstrPtr : InsInstrs)
           MF->DeleteMachineInstr(InstrPtr);
       }
       InstrIdxForVirtReg.clear();
     }
 [31 lines not shown]