This is an archive of the discontinued LLVM Phabricator instance.

AArch64: Disable the latency heuristic
Closed · Public

Authored by MatzeB on Oct 13 2015, 2:59 PM.

Details

Reviewers

rengolin
aadg
jmolloy

Commits

rGd276de6db1c2: AArch64: Disable the latency heuristic
rL251038: AArch64: Disable the latency heuristic

Summary

This patch disables the machine scheduler heuristic that attempts to balance scheduling between multiple long latency chains. In our benchmarks this heuristic tended to increase register pressure and occasionally led to spilling, but didn't appear to have any positive effect on any benchmark (it seems long latency chains are scarce in practice and out-of-order cores tend to handle them well).
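The trade-off described above can be illustrated with a toy model. This is only a sketch and not the machine scheduler's actual algorithm: it assumes an in-order, single-issue machine, a made-up latency of 3 cycles per value, and two independent load/op/store chains. The names `Instr`, `issueCycles`, `maxLive`, `sequentialOrder` and `interleavedOrder` are hypothetical and do not exist in LLVM.

```cpp
#include <algorithm>
#include <vector>

// Toy instruction (hypothetical, not an LLVM type).
struct Instr {
  int def;      // id of the value defined, or -1 (e.g. a store)
  int use;      // id of the value consumed, or -1 (e.g. a load)
  int latency;  // cycles until the defined value is ready
};

// Cycles needed to issue the sequence on an in-order, single-issue machine
// where an instruction waits until its operand is ready.
int issueCycles(const std::vector<Instr> &seq, int numValues) {
  std::vector<int> ready(numValues, 0);
  int cycle = 0;
  for (const Instr &in : seq) {
    int start = cycle;
    if (in.use >= 0)
      start = std::max(start, ready[in.use]);
    if (in.def >= 0)
      ready[in.def] = start + in.latency;
    cycle = start + 1;
  }
  return cycle;
}

// Peak number of simultaneously live values: a crude stand-in for
// register pressure.
int maxLive(const std::vector<Instr> &seq, int numValues) {
  std::vector<int> lastUse(numValues, -1);
  for (int p = 0; p < static_cast<int>(seq.size()); ++p)
    if (seq[p].use >= 0)
      lastUse[seq[p].use] = p;
  int live = 0, peak = 0;
  for (int p = 0; p < static_cast<int>(seq.size()); ++p) {
    const Instr &in = seq[p];
    if (in.use >= 0 && lastUse[in.use] == p)
      --live;  // the operand dies at its last use
    if (in.def >= 0)
      ++live;
    peak = std::max(peak, live);
  }
  return peak;
}

// Chain A uses values 0 and 1, chain B uses values 2 and 3.
// Chain A runs to completion before chain B starts.
std::vector<Instr> sequentialOrder() {
  return {{0, -1, 3}, {1, 0, 3}, {-1, 1, 3},
          {2, -1, 3}, {3, 2, 3}, {-1, 3, 3}};
}

// The two chains are interleaved to hide latency.
std::vector<Instr> interleavedOrder() {
  return {{0, -1, 3}, {2, -1, 3}, {1, 0, 3},
          {3, 2, 3}, {-1, 1, 3}, {-1, 3, 3}};
}
```

Under these assumed latencies the sequential order takes 14 cycles with at most one live value, while the interleaved order takes 8 cycles with two live values. On an out-of-order core the hardware can recover much of that cycle difference by itself, which leaves mainly the extra register pressure.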

The main question for review is whether I should guard the changes to AArch64Subtarget.cpp with an "if (isCyclone())" or whether the changes are fine on other AArch64 cores as well.
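As a minimal sketch of the guarded variant in question: the types below are simplified, hypothetical stand-ins (`SchedPolicySketch`, `CpuKind`, `overrideSchedPolicySketch`), not the real LLVM classes. Only the field names `OnlyTopDown`, `OnlyBottomUp` and `DisableLatencyHeuristic` are taken from the patch.

```cpp
// Hypothetical stand-in for the scheduler policy object.
struct SchedPolicySketch {
  bool OnlyTopDown = true;
  bool OnlyBottomUp = true;
  bool DisableLatencyHeuristic = false;
};

// Hypothetical CPU selector; the real subtarget exposes predicates
// such as isCyclone() instead.
enum class CpuKind { Cyclone, CortexA53, CortexA57 };

void overrideSchedPolicySketch(SchedPolicySketch &Policy, CpuKind Cpu) {
  // Bi-directional scheduling stays enabled for every core.
  Policy.OnlyTopDown = false;
  Policy.OnlyBottomUp = false;
  // Guarded variant: only Cyclone opts out of the latency heuristic.
  if (Cpu == CpuKind::Cyclone)
    Policy.DisableLatencyHeuristic = true;
}
```

Dropping the `if` would give the unguarded variant, which changes the policy for every AArch64 core.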

Diff Detail

Repository: rL LLVM

Event Timeline

MatzeB updated this revision to Diff 37289. Oct 13 2015, 2:59 PM

MatzeB retitled this revision to AArch64: Disable the latency heuristic.

MatzeB updated this object.

MatzeB added reviewers: jmolloy, rengolin, aadg.

MatzeB set the repository for this revision to rL LLVM.

MatzeB added a subscriber: llvm-commits.

Herald added subscribers: rengolin, aemerson. Oct 13 2015, 2:59 PM

mcrosier added a subscriber: mssimpso. Oct 14 2015, 4:39 AM

Hi Matthias,

Sorry for taking so long to spot this on my backlog.

I've just run a bunch of benchmarking, and I can fairly conclusively say that your patch reduces performance overall on both Cortex-A57 and Cortex-A53 (although with a mutually exclusive set of benchmarks, which is abnormal).

I see a 43% regression in lnt.MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1 on Cortex-A57, and a 15% regression on lnt.SingleSource/Benchmarks/Misc/salsa20 on Cortex-A53 (there are many more regressions and some improvements - these are just the top).

So I think this should indeed be gated on Cyclone.

James

Thanks for benchmarking. TL;DR: I will change the policy only for Cyclone CPUs.

For the record: I re-checked my llvm-testsuite results, though I tend to consider them less critical than the "big benchmarks" (spec*, geekbench*), for which the change is neutral for all but two benchmarks, which improve by ~10%.

Over the whole llvm-testsuite I have a bunch of ups and downs. Most of the ones in my top 20 are just noisy benchmarks. enc-pc1 regressed only 1% for me, which is consistent with me not seeing any important changes in the assembly. salsa20 shows a 7% regression, and it appears to be the first genuine testcase I have seen where the latency heuristic makes sense: there is indeed a very long loop containing just arithmetic instructions, where latency hiding has an effect even on out-of-order CPUs. (It's somewhat unfortunate, because the source code was already scheduled in a nice way; llvm performs some impressive load/store optimisations on the benchmark but also reorders the instructions somewhere, making the scheduling heuristic necessary.) On the other hand I see an improvement of 15% in matmul_f64_4x4.

Matthias

Closed by commit rL251038: AArch64: Disable the latency heuristic (authored by matze). Oct 22 2015, 11:09 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path: llvm/trunk/lib/Target/AArch64/AArch64Subtarget.cpp

Size: 5 lines

Diff 38147

llvm/trunk/lib/Target/AArch64/AArch64Subtarget.cpp

  void AArch64Subtarget::overrideSchedPolicy(MachineSchedPolicy &Policy,
                                             MachineInstr *begin, MachineInstr *end,
                                             unsigned NumRegionInstrs) const {
    // LNT run (at least on Cyclone) showed reasonably significant gains for
    // bi-directional scheduling. 253.perlbmk.
    Policy.OnlyTopDown = false;
    Policy.OnlyBottomUp = false;
+   // Enabling or Disabling the latency heuristic is a close call: It seems to
+   // help nearly no benchmark on out-of-order architectures, on the other hand
+   // it regresses register pressure on a few benchmarking.
+   if (isCyclone())
+     Policy.DisableLatencyHeuristic = true;
  }

  bool AArch64Subtarget::enableEarlyIfConversion() const {
    return EnableEarlyIfConvert;
  }

  std::unique_ptr<PBQPRAConstraint>
  AArch64Subtarget::getCustomPBQPConstraints() const {
    if (!isCortexA57())
      return nullptr;

    return llvm::make_unique<A57ChainingConstraint>();
  }