This is an archive of the discontinued LLVM Phabricator instance.

Codegen: Decrease minimum jump table density
ClosedPublic

Authored by iteratee on Mar 16 2016, 1:26 PM.


Details

Reviewers

Summary

The minimum jump table density is now configurable for both optsize and
non-optsize functions:
-sparse-jump-table-density (default 10) for non-optsize functions
-dense-jump-table-density (default 40) for optsize functions, which
matches the current default. This improves several benchmarks at Google
at the cost of a small code size increase. For code compiled with -Os,
the old behavior continues.
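
The density threshold is a percentage: the share of values in the switch's case range that must be covered by actual cases before a jump table is considered. A minimal sketch of that comparison follows, using the same integer form as the `isDense` change in this diff (`NumCases * 100 >= Range * Density`). The helper name `meetsDensity` and the example values are illustrative assumptions, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the patch): returns true when NumCases
// cases spread over Range consecutive values reach at least
// MinDensityPercent percent occupancy. Integer math avoids rounding.
bool meetsDensity(uint64_t NumCases, uint64_t Range,
                  unsigned MinDensityPercent) {
  assert(Range >= NumCases && "a range cannot hold more cases than values");
  return NumCases * 100 >= Range * MinDensityPercent;
}
```

For example, six cases at 0, 10, ..., 50 span 51 values (roughly 11.8% density), so `meetsDensity(6, 51, 10)` holds while `meetsDensity(6, 51, 40)` does not.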

Diff Detail

Event Timeline

iteratee updated this revision to Diff 50858. Mar 16 2016, 1:26 PM

iteratee retitled this revision to Codegen: Decrease minimum jump table density.

iteratee updated this object.

iteratee set the repository for this revision to rL LLVM.

iteratee added a reviewer: hans.

iteratee added subscribers: echristo, llvm-commits, timshen.

Some inline comments, I stopped putting ditto down because my hands got tired. :)

-eric

test/CodeGen/ARM/2011-08-25-ldmia_ret.ll
17	Only one function could probably pass your command line option alternately? No preference other than it'll isolate the testcase from any other optsize changes that happen.
test/CodeGen/X86/switch-bt.ll
17	Ditto.
108	Ditto.

For all the optsize cases, I have now either changed the switch values or passed a density as a flag.

Patch looks good at this point, would be good to get size/performance numbers on a run of the testsuite (or something, e.g. SPEC, etc).

Thanks!

-eric

Thanks! I think this basically looks good.

Now that you're passing flags to llc instead, do we have any tests checking that the "optsize" and "minsize" function attributes have the desired effect?

We should probably have a test with functions with no attribute, optsize, and minsize that verifies the thresholds.

And as Eric said, some numbers showing how binary size (e.g. a self-hosted clang build) and perf are affected would be great.
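
The threshold selection such a test would exercise can be sketched as below. This is a simplified model, not the patch's code: `FnAttrs` and `pickMinDensity` are hypothetical names, and treating minsize the same as optsize is an assumption based on the diff's "-Os or -Oz functions" comment. The defaults 10 and 40 are the ones stated in the summary.

```cpp
#include <cassert>

// Hypothetical stand-in for the function attributes of interest.
struct FnAttrs {
  bool OptSize = false; // function carries "optsize" (-Os)
  bool MinSize = false; // function carries "minsize" (-Oz)
};

// Pick the minimum jump table density (in percent) for a function.
// Assumption: minsize functions use the optsize threshold as well.
unsigned pickMinDensity(const FnAttrs &F, unsigned Normal = 10,
                        unsigned OptSize = 40) {
  return (F.OptSize || F.MinSize) ? OptSize : Normal;
}
```

A test along the lines requested would then expect a function with no attribute to get 10, and optsize or minsize functions to get 40, unless the flags override those defaults.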

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
89	for normal what?
91	For a user who wants to tweak these flags, I'm not sure if the "sparse-" and "dense-" names are the most friendly. What would you think of calling them "-jump-table-density" and "-optsize-jump-table-density"?
95	ultra nit: period at end of comment.
8042	Dense jump table density is dense? :-) I think this variable name would come out better if the flag was renamed as suggested above.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
308	How about calling the new parameter MinDensity, RequiredDensity, or something like that to indicate that it's a threshold that the actual density gets compared against?

I'll get to the suggestions and benchmarking, but wanted to report on size:
clang compiled without the change: 176406957
clang compiled with the change: 176431533

net change: 24.0 KiB, a change of 0.014%
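
The arithmetic behind these figures can be checked with a short sketch. The helper names are illustrative only; the byte counts are the two clang sizes reported above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for checking the size report.
uint64_t sizeDeltaBytes(uint64_t Before, uint64_t After) {
  assert(After >= Before && "this sketch only handles growth");
  return After - Before;
}

// Delta expressed as a percentage of the baseline size.
double percentOfBaseline(uint64_t Delta, uint64_t Baseline) {
  return 100.0 * static_cast<double>(Delta) / static_cast<double>(Baseline);
}
```

With 176406957 and 176431533 bytes, the delta is 24576 bytes, which is exactly 24 KiB, or about 0.014% of the baseline.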

Nice size results. Seems like it's only going to matter in times when we want the performance.

Can't wait for the numbers.

Thanks!

Key: for each run avg, median, stddev (stddev as a percent), 10th percentile.
%Change: avg, median, 10th percentile. %Change is change in runtime, so negative percent is an improvement.

Unchanged is listed first.
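
As a reading aid for the %Change rows, here is a minimal sketch of a relative runtime change, assuming the usual definition (patched minus baseline, divided by baseline). The function name is hypothetical, and the script that produced the figures in this report is not shown in the review, so its exact computation may differ.

```cpp
#include <cassert>

// Hypothetical helper: relative runtime change in percent.
// Negative means the patched build ran faster than the baseline.
double percentChange(double BaselineSeconds, double PatchedSeconds) {
  assert(BaselineSeconds > 0.0 && "baseline runtime must be positive");
  return (PatchedSeconds - BaselineSeconds) / BaselineSeconds * 100.0;
}
```

For instance, an illustrative run going from 2.000 s to 1.900 s is a change of -5%, an improvement under the convention above.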

I'm only showing the tests with a difference of more than 2 percent.
I ran each of these benchmarks 50 times.

The only significant results are in TSVC. I'll see how many benchmarks that is and maybe run them more times.

test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 7.014 7.012 0.290 (4.130%) 6.672
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 7.021 7.011 0.199 (2.835%) 6.807
% Change: -0.016 % -0.016 % 2.013 %
test-suite :: MultiSource/Benchmarks/MallocBench/cfrac/cfrac.test 0.924 0.926 0.054 (5.828%) 0.857
test-suite :: MultiSource/Benchmarks/MallocBench/cfrac/cfrac.test 0.914 0.907 0.042 (4.616%) 0.858
% Change: -1.993 % -1.993 % 0.163 %
test-suite :: MultiSource/Benchmarks/TSVC/Equivalencing-dbl/Equivalencing-dbl.test 1.976 1.964 0.109 (5.525%) 1.840
test-suite :: MultiSource/Benchmarks/TSVC/Equivalencing-dbl/Equivalencing-dbl.test 2.000 1.959 0.122 (6.075%) 1.882
% Change: -0.242 % -0.242 % 2.288 %
test-suite :: MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt.test 1.246 1.233 0.091 (7.321%) 1.144
test-suite :: MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt.test 1.239 1.219 0.066 (5.311%) 1.169
% Change: -1.172 % -1.172 % 2.211 %
test-suite :: MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt.test 1.745 1.751 0.096 (5.491%) 1.628
test-suite :: MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt.test 1.732 1.706 0.095 (5.460%) 1.630
% Change: -2.601 % -2.601 % 0.086 %
test-suite :: MultiSource/Benchmarks/TSVC/GlobalDataFlow-dbl/GlobalDataFlow-dbl.test 3.115 3.097 0.110 (3.546%) 2.982
test-suite :: MultiSource/Benchmarks/TSVC/GlobalDataFlow-dbl/GlobalDataFlow-dbl.test 3.190 3.171 0.124 (3.872%) 3.048
% Change: 2.392 % 2.392 % 2.220 %
test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-dbl/InductionVariable-dbl.test 3.691 3.673 0.177 (4.787%) 3.512
test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-dbl/InductionVariable-dbl.test 3.685 3.667 0.181 (4.925%) 3.442
% Change: -0.162 % -0.162 % -1.988 %
test-suite :: MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl.test 3.191 3.140 0.193 (6.044%) 2.978
test-suite :: MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl.test 3.233 3.236 0.159 (4.908%) 3.051
% Change: 3.070 % 3.070 % 2.478 %
test-suite :: MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl.test 3.557 3.524 0.166 (4.669%) 3.378
test-suite :: MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl.test 3.577 3.556 0.118 (3.286%) 3.458
% Change: 0.899 % 0.899 % 2.374 %
test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt.test 3.480 3.457 0.161 (4.639%) 3.292
test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt.test 3.520 3.493 0.128 (3.642%) 3.382
% Change: 1.053 % 1.053 % 2.721 %
test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl.test 3.411 3.379 0.176 (5.147%) 3.211
test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl.test 3.469 3.459 0.168 (4.856%) 3.275
% Change: 2.372 % 2.372 % 2.009 %
test-suite :: MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt.test 2.399 2.354 0.156 (6.496%) 2.228
test-suite :: MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt.test 2.415 2.415 0.099 (4.110%) 2.266
% Change: 2.594 % 2.594 % 1.705 %
test-suite :: MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt.test 3.480 3.449 0.142 (4.087%) 3.349
test-suite :: MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt.test 3.409 3.412 0.110 (3.237%) 3.266
% Change: -1.089 % -1.089 % -2.467 %
test-suite :: MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl.test 2.778 2.749 0.137 (4.941%) 2.632
test-suite :: MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl.test 2.825 2.824 0.135 (4.784%) 2.648
% Change: 2.715 % 2.715 % 0.581 %
test-suite :: MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt.test 1.002 1.002 0.077 (7.686%) 0.919
test-suite :: MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt.test 1.041 1.043 0.069 (6.591%) 0.962
% Change: 4.099 % 4.099 % 4.579 %

When I re-ran the TSVC tests 100 times on a quieter machine, all the
differences were less than 2%.

Tidied up names according to comments.

In D18223#380658, @iteratee wrote:

Tidied up names according to comments.

Much better, thanks!

I think a test that covers the different thresholds for optsize and regular functions is still needed.

Add a test that verifies the density switch does what it says.

lgtm

test/CodeGen/X86/switch-density.ll
75	This one's always a jump table right, so the comment is slightly wrong?
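
The point about the `dense` function can be checked by computing the whole-range density of the three switches in switch-density.ll against the two thresholds the test uses (25 and 10). This is a simplified sketch: `wholeRangeIsDense` is a hypothetical helper that assumes distinct case values and ignores the minimum table size and the partitioning into sub-ranges that the real lowering also performs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: whole-range density check over distinct case values.
bool wholeRangeIsDense(const std::vector<int64_t> &Cases,
                       unsigned MinDensityPercent) {
  assert(!Cases.empty() && "need at least one case value");
  const auto MinMax = std::minmax_element(Cases.begin(), Cases.end());
  // Number of values between the lowest and highest case, inclusive.
  const uint64_t Range =
      static_cast<uint64_t>(*MinMax.second - *MinMax.first) + 1;
  const uint64_t NumCases = Cases.size();
  return NumCases * 100 >= Range * MinDensityPercent;
}
```

Under this model, `sparse` ({100, 300, 400, 500}, range 401) fails at both thresholds, `med` ({10, ..., 50}, range 41) passes at 10 but fails at 25, and `dense` ({4, 8, 12, 16, 20}, range 17) passes at both, which is consistent with the observation that it is always lowered as a jump table.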

This revision is now accepted and ready to land. Mar 28 2016, 4:12 PM

iteratee closed this revision. Apr 7 2016, 9:17 PM

Revision Contents

Path                                               Size
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h     5 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp   26 lines
test/CodeGen/ARM/2011-08-25-ldmia_ret.ll           2 lines
test/CodeGen/Generic/MachineBranchProb.ll          14 lines
test/CodeGen/PowerPC/pr26690.ll                    6 lines
test/CodeGen/Thumb2/ldr-str-imm12.ll               20 lines
test/CodeGen/X86/switch-bt.ll                      2 lines
test/CodeGen/X86/switch-density.ll                 81 lines
test/CodeGen/X86/switch-edge-weight.ll             12 lines
test/CodeGen/X86/switch.ll                         4 lines

Diff 51850

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h

[... 297 lines not shown ...]	struct BitTestBlock {
bool ContiguousRange;		bool ContiguousRange;
MachineBasicBlock *Parent;		MachineBasicBlock *Parent;
MachineBasicBlock *Default;		MachineBasicBlock *Default;
BitTestInfo Cases;		BitTestInfo Cases;
BranchProbability Prob;		BranchProbability Prob;
BranchProbability DefaultProb;		BranchProbability DefaultProb;
};		};

/// Minimum jump table density, in percent.
enum { MinJumpTableDensity = 40 };

/// Check whether a range of clusters is dense enough for a jump table.		/// Check whether a range of clusters is dense enough for a jump table.
bool isDense(const CaseClusterVector &Clusters, unsigned *TotalCases,		bool isDense(const CaseClusterVector &Clusters, unsigned *TotalCases,
unsigned First, unsigned Last);		unsigned First, unsigned Last, unsigned MinDensity);
		hans (inline comment): How about calling the new parameter MinDensity, RequiredDensity, or something like that to indicate that it's a threshold that the actual density gets compared against?

/// Build a jump table cluster from Clusters[First..Last]. Returns false if it		/// Build a jump table cluster from Clusters[First..Last]. Returns false if it
/// decides it's not a good idea.		/// decides it's not a good idea.
bool buildJumpTable(CaseClusterVector &Clusters, unsigned First,		bool buildJumpTable(CaseClusterVector &Clusters, unsigned First,
unsigned Last, const SwitchInst *SI,		unsigned Last, const SwitchInst *SI,
MachineBasicBlock *DefaultMBB, CaseCluster &JTCluster);		MachineBasicBlock *DefaultMBB, CaseCluster &JTCluster);

/// Find clusters of cases suitable for jump table lowering.		/// Find clusters of cases suitable for jump table lowering.
[... 697 lines not shown ...]

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp


[... 80 lines not shown ...]	LimitFPPrecision("limit-float-precision",
"for some float libcalls"),		"for some float libcalls"),
cl::location(LimitFloatPrecision),		cl::location(LimitFloatPrecision),
cl::init(0));		cl::init(0));

static cl::opt<bool>		static cl::opt<bool>
EnableFMFInDAG("enable-fmf-dag", cl::init(true), cl::Hidden,		EnableFMFInDAG("enable-fmf-dag", cl::init(true), cl::Hidden,
cl::desc("Enable fast-math-flags for DAG nodes"));		cl::desc("Enable fast-math-flags for DAG nodes"));

		/// Minimum jump table density for normal functions.
		hans (inline comment): for normal what?
		static cl::opt<unsigned>
		JumpTableDensity("jump-table-density", cl::init(10), cl::Hidden,
		hans (inline comment): For a user who wants to tweak these flags, I'm not sure if the "sparse-" and "dense-" names are the most friendly. What would you think of calling them "-jump-table-density" and "-optsize-jump-table-density"?
		cl::desc("Minimum density for building a jump table in "
		"a normal function"));

		/// Minimum jump table density for -Os or -Oz functions.
		hans (inline comment): ultra nit: period at end of comment.
		static cl::opt<unsigned>
		OptsizeJumpTableDensity("optsize-jump-table-density", cl::init(40), cl::Hidden,
		cl::desc("Minimum density for building a jump table in "
		"an optsize function"));


// Limit the width of DAG chains. This is important in general to prevent		// Limit the width of DAG chains. This is important in general to prevent
// DAG-based analysis from blowing up. For example, alias analysis and		// DAG-based analysis from blowing up. For example, alias analysis and
// load clustering may not complete in reasonable time. It is difficult to		// load clustering may not complete in reasonable time. It is difficult to
// recognize and avoid this situation within each individual analysis, and		// recognize and avoid this situation within each individual analysis, and
// future analyses are likely to have the same behavior. Limiting DAG width is		// future analyses are likely to have the same behavior. Limiting DAG width is
// the safe approach and will be especially important with global DAGs.		// the safe approach and will be especially important with global DAGs.
//		//
// MaxParallelChains default is arbitrarily high to avoid affecting		// MaxParallelChains default is arbitrarily high to avoid affecting
[... 7,812 lines not shown ...]	void SelectionDAGBuilder::updateDAGForMaybeTailCall(SDValue MaybeTC) {
if (MaybeTC.getNode() != nullptr)		if (MaybeTC.getNode() != nullptr)
DAG.setRoot(MaybeTC);		DAG.setRoot(MaybeTC);
else		else
HasTailCall = true;		HasTailCall = true;
}		}

bool SelectionDAGBuilder::isDense(const CaseClusterVector &Clusters,		bool SelectionDAGBuilder::isDense(const CaseClusterVector &Clusters,
unsigned *TotalCases, unsigned First,		unsigned *TotalCases, unsigned First,
unsigned Last) {		unsigned Last,
		unsigned Density) {
assert(Last >= First);		assert(Last >= First);
assert(TotalCases[Last] >= TotalCases[First]);		assert(TotalCases[Last] >= TotalCases[First]);

APInt LowCase = Clusters[First].Low->getValue();		APInt LowCase = Clusters[First].Low->getValue();
APInt HighCase = Clusters[Last].High->getValue();		APInt HighCase = Clusters[Last].High->getValue();
assert(LowCase.getBitWidth() == HighCase.getBitWidth());		assert(LowCase.getBitWidth() == HighCase.getBitWidth());

// FIXME: A range of consecutive cases has 100% density, but only requires one		// FIXME: A range of consecutive cases has 100% density, but only requires one
// comparison to lower. We should discriminate against such consecutive ranges		// comparison to lower. We should discriminate against such consecutive ranges
// in jump tables.		// in jump tables.

uint64_t Diff = (HighCase - LowCase).getLimitedValue((UINT64_MAX - 1) / 100);		uint64_t Diff = (HighCase - LowCase).getLimitedValue((UINT64_MAX - 1) / 100);
uint64_t Range = Diff + 1;		uint64_t Range = Diff + 1;

uint64_t NumCases =		uint64_t NumCases =
TotalCases[Last] - (First == 0 ? 0 : TotalCases[First - 1]);		TotalCases[Last] - (First == 0 ? 0 : TotalCases[First - 1]);

assert(NumCases < UINT64_MAX / 100);		assert(NumCases < UINT64_MAX / 100);
assert(Range >= NumCases);		assert(Range >= NumCases);

return NumCases * 100 >= Range * MinJumpTableDensity;		return NumCases * 100 >= Range * Density;
}		}

static inline bool areJTsAllowed(const TargetLowering &TLI) {		static inline bool areJTsAllowed(const TargetLowering &TLI) {
return TLI.isOperationLegalOrCustom(ISD::BR_JT, MVT::Other) \|\|		return TLI.isOperationLegalOrCustom(ISD::BR_JT, MVT::Other) \|\|
TLI.isOperationLegalOrCustom(ISD::BRIND, MVT::Other);		TLI.isOperationLegalOrCustom(ISD::BRIND, MVT::Other);
}		}

bool SelectionDAGBuilder::buildJumpTable(CaseClusterVector &Clusters,		bool SelectionDAGBuilder::buildJumpTable(CaseClusterVector &Clusters,
[... 73 lines not shown ...]
}		}

void SelectionDAGBuilder::findJumpTables(CaseClusterVector &Clusters,		void SelectionDAGBuilder::findJumpTables(CaseClusterVector &Clusters,
const SwitchInst *SI,		const SwitchInst *SI,
MachineBasicBlock *DefaultMBB) {		MachineBasicBlock *DefaultMBB) {
#ifndef NDEBUG		#ifndef NDEBUG
// Clusters must be non-empty, sorted, and only contain Range clusters.		// Clusters must be non-empty, sorted, and only contain Range clusters.
assert(!Clusters.empty());		assert(!Clusters.empty());
for (CaseCluster &C : Clusters)		for (CaseCluster &C : Clusters)
		hans (inline comment): Dense jump table density is dense? :-) I think this variable name would come out better if the flag was renamed as suggested above.
assert(C.Kind == CC_Range);		assert(C.Kind == CC_Range);
for (unsigned i = 1, e = Clusters.size(); i < e; ++i)		for (unsigned i = 1, e = Clusters.size(); i < e; ++i)
assert(Clusters[i - 1].High->getValue().slt(Clusters[i].Low->getValue()));		assert(Clusters[i - 1].High->getValue().slt(Clusters[i].Low->getValue()));
#endif		#endif

const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
if (!areJTsAllowed(TLI))		if (!areJTsAllowed(TLI))
return;		return;

const int64_t N = Clusters.size();		const int64_t N = Clusters.size();
const unsigned MinJumpTableSize = TLI.getMinimumJumpTableEntries();		const unsigned MinJumpTableSize = TLI.getMinimumJumpTableEntries();

// TotalCases[i]: Total nbr of cases in Clusters[0..i].		// TotalCases[i]: Total nbr of cases in Clusters[0..i].
SmallVector<unsigned, 8> TotalCases(N);		SmallVector<unsigned, 8> TotalCases(N);

for (unsigned i = 0; i < N; ++i) {		for (unsigned i = 0; i < N; ++i) {
APInt Hi = Clusters[i].High->getValue();		APInt Hi = Clusters[i].High->getValue();
APInt Lo = Clusters[i].Low->getValue();		APInt Lo = Clusters[i].Low->getValue();
TotalCases[i] = (Hi - Lo).getLimitedValue() + 1;		TotalCases[i] = (Hi - Lo).getLimitedValue() + 1;
if (i != 0)		if (i != 0)
TotalCases[i] += TotalCases[i - 1];		TotalCases[i] += TotalCases[i - 1];
}		}

if (N >= MinJumpTableSize && isDense(Clusters, &TotalCases[0], 0, N - 1)) {		unsigned MinDensity = JumpTableDensity;
		if (DefaultMBB->getParent()->getFunction()->optForSize())
		MinDensity = OptsizeJumpTableDensity;
		if (N >= MinJumpTableSize
		&& isDense(Clusters, &TotalCases[0], 0, N - 1, MinDensity)) {
// Cheap case: the whole range might be suitable for jump table.		// Cheap case: the whole range might be suitable for jump table.
CaseCluster JTCluster;		CaseCluster JTCluster;
if (buildJumpTable(Clusters, 0, N - 1, SI, DefaultMBB, JTCluster)) {		if (buildJumpTable(Clusters, 0, N - 1, SI, DefaultMBB, JTCluster)) {
Clusters[0] = JTCluster;		Clusters[0] = JTCluster;
Clusters.resize(1);		Clusters.resize(1);
return;		return;
}		}
}		}
[... 28 lines not shown ...]	for (int64_t i = N - 2; i >= 0; i--) {
// Baseline: Put Clusters[i] into a partition on its own.		// Baseline: Put Clusters[i] into a partition on its own.
MinPartitions[i] = MinPartitions[i + 1] + 1;		MinPartitions[i] = MinPartitions[i + 1] + 1;
LastElement[i] = i;		LastElement[i] = i;
NumTables[i] = NumTables[i + 1];		NumTables[i] = NumTables[i + 1];

// Search for a solution that results in fewer partitions.		// Search for a solution that results in fewer partitions.
for (int64_t j = N - 1; j > i; j--) {		for (int64_t j = N - 1; j > i; j--) {
// Try building a partition from Clusters[i..j].		// Try building a partition from Clusters[i..j].
if (isDense(Clusters, &TotalCases[0], i, j)) {		if (isDense(Clusters, &TotalCases[0], i, j, MinDensity)) {
unsigned NumPartitions = 1 + (j == N - 1 ? 0 : MinPartitions[j + 1]);		unsigned NumPartitions = 1 + (j == N - 1 ? 0 : MinPartitions[j + 1]);
bool IsTable = j - i + 1 >= MinJumpTableSize;		bool IsTable = j - i + 1 >= MinJumpTableSize;
unsigned Tables = IsTable + (j == N - 1 ? 0 : NumTables[j + 1]);		unsigned Tables = IsTable + (j == N - 1 ? 0 : NumTables[j + 1]);

// If this j leads to fewer partitions, or same number of partitions		// If this j leads to fewer partitions, or same number of partitions
// with more lookup tables, it is a better partitioning.		// with more lookup tables, it is a better partitioning.
if (NumPartitions < MinPartitions[i] \|\|		if (NumPartitions < MinPartitions[i] \|\|
(NumPartitions == MinPartitions[i] && Tables > NumTables[i])) {		(NumPartitions == MinPartitions[i] && Tables > NumTables[i])) {
[... 719 lines not shown ...]

test/CodeGen/ARM/2011-08-25-ldmia_ret.ll

	; RUN: llc < %s -mtriple=thumbv7-apple-ios -mcpu=cortex-a9 \| FileCheck %s			; RUN: llc < %s -mtriple=thumbv7-apple-ios -mcpu=cortex-a9 -jump-table-density=40 \| FileCheck %s
	; Test that ldmia_ret preserves implicit operands for return values.			; Test that ldmia_ret preserves implicit operands for return values.
	;			;
	; This CFG is reduced from a benchmark miscompile. With current			; This CFG is reduced from a benchmark miscompile. With current
	; if-conversion heuristics, one of the return paths is if-converted			; if-conversion heuristics, one of the return paths is if-converted
	; into sw.bb18 resulting in an ldmia_ret in the middle of the			; into sw.bb18 resulting in an ldmia_ret in the middle of the
	; block. The postra scheduler needs to know that the return implicitly			; block. The postra scheduler needs to know that the return implicitly
	; uses the return register, otherwise its antidep breaker scavenges			; uses the return register, otherwise its antidep breaker scavenges
	; the register in order to hoist the constant load required to test			; the register in order to hoist the constant load required to test
	; the switch.			; the switch.

	declare i32 @getint()			declare i32 @getint()
	declare i1 @getbool()			declare i1 @getbool()
	declare void @foo(i32)			declare void @foo(i32)
	declare i32 @bar(i32)			declare i32 @bar(i32)

	define i32 @test(i32 %in1, i32 %in2) nounwind {			define i32 @test(i32 %in1, i32 %in2) nounwind {
				echristo (inline comment): Only one function could probably pass your command line option alternately? No preference other than it'll isolate the testcase from any other optsize changes that happen.
	entry:			entry:
	%call = tail call zeroext i1 @getbool() nounwind			%call = tail call zeroext i1 @getbool() nounwind
	br i1 %call, label %sw.bb18, label %sw.bb2			br i1 %call, label %sw.bb18, label %sw.bb2

	sw.bb2: ; preds = %entry			sw.bb2: ; preds = %entry
	%cmp = tail call zeroext i1 @getbool() nounwind			%cmp = tail call zeroext i1 @getbool() nounwind
	br i1 %cmp, label %sw.epilog58, label %land.lhs.true			br i1 %cmp, label %sw.epilog58, label %land.lhs.true

	[... 75 lines not shown ...]

test/CodeGen/Generic/MachineBranchProb.ll

	[... 35 lines not shown ...]
	!0 = !{!"branch_weights", i32 7, i32 6, i32 4, i32 4, i32 64}			!0 = !{!"branch_weights", i32 7, i32 6, i32 4, i32 4, i32 64}


	declare void @g(i32)			declare void @g(i32)
	define void @left_leaning_weight_balanced_tree(i32 %x) {			define void @left_leaning_weight_balanced_tree(i32 %x) {
	entry:			entry:
	switch i32 %x, label %return [			switch i32 %x, label %return [
	i32 0, label %bb0			i32 0, label %bb0
	i32 10, label %bb1			i32 100, label %bb1
	i32 20, label %bb2			i32 200, label %bb2
	i32 30, label %bb3			i32 300, label %bb3
	i32 40, label %bb4			i32 400, label %bb4
	i32 50, label %bb5			i32 500, label %bb5
	], !prof !1			], !prof !1
	bb0: tail call void @g(i32 0) br label %return			bb0: tail call void @g(i32 0) br label %return
	bb1: tail call void @g(i32 1) br label %return			bb1: tail call void @g(i32 1) br label %return
	bb2: tail call void @g(i32 2) br label %return			bb2: tail call void @g(i32 2) br label %return
	bb3: tail call void @g(i32 3) br label %return			bb3: tail call void @g(i32 3) br label %return
	bb4: tail call void @g(i32 4) br label %return			bb4: tail call void @g(i32 4) br label %return
	bb5: tail call void @g(i32 5) br label %return			bb5: tail call void @g(i32 5) br label %return
	return: ret void			return: ret void

	; Check that we set branch weights on the pivot cmp instruction correctly.			; Check that we set branch weights on the pivot cmp instruction correctly.
	; Cases {0,10,20,30} go on the left with weight 13; cases {40,50} go on the			; Cases {0,10,20,30} go on the left with weight 13; cases {40,50} go on the
	; right with weight 20.			; right with weight 20.
	;			;
	; CHECK-LABEL: Machine code for function left_leaning_weight_balanced_tree:			; CHECK-LABEL: Machine code for function left_leaning_weight_balanced_tree:
	; CHECK: BB#0: derived from LLVM BB %entry			; CHECK: BB#0: derived from LLVM BB %entry
	; CHECK-NOT: Successors			; CHECK-NOT: Successors
	; CHECK: Successors according to CFG: BB#8({{[0-9a-fx/= ]+}}39.71%) BB#9({{[0-9a-fx/= ]+}}60.29%)			; CHECK: Successors according to CFG: BB#8({{[0-9a-fx/= ]+}}39.71%) BB#9({{[0-9a-fx/= ]+}}60.29%)
	}			}

	!1 = !{!"branch_weights",			!1 = !{!"branch_weights",
	; Default:			; Default:
	i32 1,			i32 1,
	; Case 0, 10, 20:			; Case 0, 100, 200:
	i32 10, i32 1, i32 1,			i32 10, i32 1, i32 1,
	; Case 30, 40, 50:			; Case 300, 400, 500:
	i32 1, i32 10, i32 10}			i32 1, i32 10, i32 10}

test/CodeGen/PowerPC/pr26690.ll

	[... 29 lines not shown ...]

	while.body.lr.ph: ; preds = %while.cond.preheader			while.body.lr.ph: ; preds = %while.cond.preheader
	%.pre = load i32, i32* @c, align 4, !tbaa !1			%.pre = load i32, i32* @c, align 4, !tbaa !1
	br label %while.body			br label %while.body

	while.body: ; preds = %while.body.backedge, %while.body.lr.ph			while.body: ; preds = %while.body.backedge, %while.body.lr.ph
	switch i32 %.pre, label %while.body.backedge [			switch i32 %.pre, label %while.body.backedge [
	i32 0, label %sw.bb1			i32 0, label %sw.bb1
	i32 8, label %sw.bb1			i32 80, label %sw.bb1
	i32 6, label %sw.bb1			i32 60, label %sw.bb1
	i32 24, label %while.cond.backedge			i32 240, label %while.cond.backedge
	]			]

	while.body.backedge: ; preds = %while.body, %while.cond.backedge			while.body.backedge: ; preds = %while.body, %while.cond.backedge
	br label %while.body			br label %while.body

	sw.bb1: ; preds = %while.body, %while.body, %while.body			sw.bb1: ; preds = %while.body, %while.body, %while.body
	store i32 2, i32* @a, align 4, !tbaa !1			store i32 2, i32* @a, align 4, !tbaa !1
	br label %while.cond.backedge			br label %while.cond.backedge
	[... 70 lines not shown ...]

test/CodeGen/Thumb2/ldr-str-imm12.ll

	[... 23 lines not shown ...]
	entry:			entry:
	; CHECK: ldr{{(.w)?}} {{(r[0-9]+)\|(lr)}}, [r7, #28]			; CHECK: ldr{{(.w)?}} {{(r[0-9]+)\|(lr)}}, [r7, #28]
	%xgaps.i = alloca [32 x %union.rec], align 4 ; <[32 x %union.rec]*> [#uses=0]			%xgaps.i = alloca [32 x %union.rec], align 4 ; <[32 x %union.rec]*> [#uses=0]
	%ycomp.i = alloca [32 x %union.rec], align 4 ; <[32 x %union.rec]*> [#uses=0]			%ycomp.i = alloca [32 x %union.rec], align 4 ; <[32 x %union.rec]*> [#uses=0]
	br label %bb20			br label %bb20

	bb20: ; preds = %entry			bb20: ; preds = %entry
	switch i32 undef, label %bb1287 [			switch i32 undef, label %bb1287 [
	i32 11, label %bb119			i32 110, label %bb119
	i32 12, label %bb119			i32 120, label %bb119
	i32 21, label %bb420			i32 210, label %bb420
	i32 23, label %bb420			i32 230, label %bb420
	i32 45, label %bb438			i32 450, label %bb438
	i32 46, label %bb438			i32 460, label %bb438
	i32 55, label %bb533			i32 550, label %bb533
	i32 56, label %bb569			i32 560, label %bb569
	i32 64, label %bb745			i32 640, label %bb745
	i32 78, label %bb1098			i32 780, label %bb1098
	]			]

	bb119: ; preds = %bb20, %bb20			bb119: ; preds = %bb20, %bb20
	unreachable			unreachable

	bb420: ; preds = %bb20, %bb20			bb420: ; preds = %bb20, %bb20
	; CHECK: bb420			; CHECK: bb420
	; CHECK: str{{(.w)?}} r{{[0-9]+}}, [sp			; CHECK: str{{(.w)?}} r{{[0-9]+}}, [sp
	[... 27 lines not shown ...]

test/CodeGen/X86/switch-bt.ll

; RUN: llc -march=x86-64 -asm-verbose=false < %s \| FileCheck %s		; RUN: llc -march=x86-64 -asm-verbose=false < %s -jump-table-density=40 \| FileCheck %s

; This switch should use bit tests, and the third bit test case is just		; This switch should use bit tests, and the third bit test case is just
; testing for one possible value, so it doesn't need a bt.		; testing for one possible value, so it doesn't need a bt.

; CHECK: movabsq $2305843009482129440, %r		; CHECK: movabsq $2305843009482129440, %r
; CHECK-NEXT: btq %rax, %r		; CHECK-NEXT: btq %rax, %r
; CHECK-NEXT: jb		; CHECK-NEXT: jb
; CHECK: movl $671088640, %e		; CHECK: movl $671088640, %e
; CHECK-NEXT: btq %rax, %r		; CHECK-NEXT: btq %rax, %r
; CHECK-NEXT: jae		; CHECK-NEXT: jae
; CHECK: testq %rax, %r		; CHECK: testq %rax, %r
; CHECK-NEXT: j		; CHECK-NEXT: j

define void @test(i8* %l) nounwind {		define void @test(i8* %l) nounwind {
entry:		entry:
%l.addr = alloca i8, align 8 ; <i8*> [#uses=2]		%l.addr = alloca i8, align 8 ; <i8*> [#uses=2]
		echristo (inline comment): Ditto.
store i8* %l, i8** %l.addr		store i8* %l, i8** %l.addr
%tmp = load i8, i8* %l.addr ; <i8*> [#uses=1]		%tmp = load i8, i8* %l.addr ; <i8*> [#uses=1]
%tmp1 = load i8, i8* %tmp ; <i8> [#uses=1]		%tmp1 = load i8, i8* %tmp ; <i8> [#uses=1]
%conv = sext i8 %tmp1 to i32 ; <i32> [#uses=1]		%conv = sext i8 %tmp1 to i32 ; <i32> [#uses=1]
switch i32 %conv, label %sw.default [		switch i32 %conv, label %sw.default [
i32 62, label %sw.bb		i32 62, label %sw.bb
i32 60, label %sw.bb		i32 60, label %sw.bb
i32 38, label %sw.bb2		i32 38, label %sw.bb2
[... 74 lines not shown ...]	if.end:
ret void		ret void
}		}

; Ensure that optimizing for jump tables doesn't needlessly deteriorate the		; Ensure that optimizing for jump tables doesn't needlessly deteriorate the
; created binary tree search. See PR22262.		; created binary tree search. See PR22262.
define void @test4(i32 %x, i32* %y) {		define void @test4(i32 %x, i32* %y) {
; CHECK-LABEL: test4:		; CHECK-LABEL: test4:

entry:		entry:
		echristo (inline comment): Ditto.
switch i32 %x, label %sw.default [		switch i32 %x, label %sw.default [
i32 10, label %sw.bb		i32 10, label %sw.bb
i32 20, label %sw.bb1		i32 20, label %sw.bb1
i32 30, label %sw.bb2		i32 30, label %sw.bb2
i32 40, label %sw.bb3		i32 40, label %sw.bb3
i32 50, label %sw.bb4		i32 50, label %sw.bb4
i32 60, label %sw.bb5		i32 60, label %sw.bb5
]		]
[... 41 lines not shown ...]

test/CodeGen/X86/switch-density.ll

This file was added.

				; RUN: llc -mtriple=x86_64-linux-gnu %s -o - -jump-table-density=25 \| FileCheck %s --check-prefix=DENSE --check-prefix=CHECK
				; RUN: llc -mtriple=x86_64-linux-gnu %s -o - -jump-table-density=10 \| FileCheck %s --check-prefix=SPARSE --check-prefix=CHECK

				declare void @g(i32)

				define void @sparse(i32 %x) {
				entry:
				switch i32 %x, label %return [
				i32 300, label %bb0
				i32 100, label %bb1
				i32 400, label %bb1
				i32 500, label %bb2
				]
				bb0: tail call void @g(i32 0) br label %return
				bb1: tail call void @g(i32 1) br label %return
				bb2: tail call void @g(i32 1) br label %return
				return: ret void

				; Should pivot around 400 for two subtrees with two jump tables each.
				; CHECK-LABEL: sparse
				; CHECK-NOT: cmpl
				; CHECK: cmpl $399
				; CHECK: cmpl $100
				; CHECK: cmpl $300
				; CHECK: cmpl $400
				; CHECK: cmpl $500
				}

				define void @med(i32 %x) {
				entry:
				switch i32 %x, label %return [
				i32 30, label %bb0
				i32 10, label %bb1
				i32 40, label %bb1
				i32 50, label %bb2
				i32 20, label %bb3
				]
				bb0: tail call void @g(i32 0) br label %return
				bb1: tail call void @g(i32 1) br label %return
				bb2: tail call void @g(i32 1) br label %return
				bb3: tail call void @g(i32 2) br label %return
				return: ret void

				; Lowered as a jump table when sparse, and branches when dense.
				; CHECK-LABEL: med
				; SPARSE: addl $-10
				; SPARSE: cmpl $40
				; SPARSE: ja
				; SPARSE: jmpq *.LJTI
				; DENSE-NOT: cmpl
				; DENSE: cmpl $29
				; DENSE-DAG: cmpl $10
				; DENSE-DAG: cmpl $20
				; DENSE-DAG: cmpl $30
				; DENSE-DAG: cmpl $40
				; DENSE-DAG: cmpl $50
				; DENSE: retq
				}

				define void @dense(i32 %x) {
				entry:
				switch i32 %x, label %return [
				i32 12, label %bb0
				i32 4, label %bb1
				i32 16, label %bb1
				i32 20, label %bb2
				i32 8, label %bb3
				]
				bb0: tail call void @g(i32 0) br label %return
				bb1: tail call void @g(i32 1) br label %return
				bb2: tail call void @g(i32 1) br label %return
				bb3: tail call void @g(i32 2) br label %return
				return: ret void

				; Lowered as a jump table when sparse, and branches when dense.
				hans (inline comment): This one's always a jump table right, so the comment is slightly wrong?
				; CHECK-LABEL: dense
				; CHECK: addl $-4
				; CHECK: cmpl $16
				; CHECK: ja
				; CHECK: jmpq *.LJTI
				}

test/CodeGen/X86/switch-edge-weight.ll

	[... 227 lines not shown ...]

	define void @test5(i32 %x) nounwind {			define void @test5(i32 %x) nounwind {
	entry:			entry:

	; In this switch statement, there is an edge from jump table to default basic			; In this switch statement, there is an edge from jump table to default basic
	; block.			; block.

	switch i32 %x, label %sw.default [			switch i32 %x, label %sw.default [
	i32 1, label %sw.bb			i32 4, label %sw.bb
	i32 5, label %sw.bb2			i32 20, label %sw.bb2
	i32 7, label %sw.bb3			i32 28, label %sw.bb3
	i32 9, label %sw.bb4			i32 36, label %sw.bb4
	i32 31, label %sw.bb5			i32 124, label %sw.bb5
	], !prof !2			], !prof !2

	sw.bb:			sw.bb:
	call void @foo(i32 0)			call void @foo(i32 0)
	br label %sw.epilog			br label %sw.epilog

	sw.bb2:			sw.bb2:
	call void @foo(i32 1)			call void @foo(i32 1)
	[... 18 lines not shown ...]
	sw.epilog:			sw.epilog:
	ret void			ret void

	; Check if weights are correctly assigned to edges generated from switch			; Check if weights are correctly assigned to edges generated from switch
	; statement.			; statement.
	;			;
	; CHECK: BB#0:			; CHECK: BB#0:
	; BB#0 to BB#6: [10, UINT32_MAX] (15)			; BB#0 to BB#6: [10, UINT32_MAX] (15)
	; BB#0 to BB#8: [1, 5, 7, 9] (jump table) (45)			; BB#0 to BB#8: [4, 20, 28, 36] (jump table) (45)
	; CHECK: Successors according to CFG: BB#8({{[0-9a-fx/= ]+}}25.00%) BB#9({{[0-9a-fx/= ]+}}75.00%)			; CHECK: Successors according to CFG: BB#8({{[0-9a-fx/= ]+}}25.00%) BB#9({{[0-9a-fx/= ]+}}75.00%)
	}			}

	!1 = !{!"branch_weights", i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10}			!1 = !{!"branch_weights", i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10}
	!2 = !{!"branch_weights", i32 10, i32 10, i32 10, i32 10, i32 10, i32 10}			!2 = !{!"branch_weights", i32 10, i32 10, i32 10, i32 10, i32 10, i32 10}
	!3 = !{!"branch_weights", i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10}			!3 = !{!"branch_weights", i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10}

test/CodeGen/X86/switch.ll

	; RUN: llc -mtriple=x86_64-linux-gnu %s -o - \| FileCheck %s			; RUN: llc -mtriple=x86_64-linux-gnu %s -o - -jump-table-density=40 \| FileCheck %s
	; RUN: llc -mtriple=x86_64-linux-gnu %s -o - -O0 \| FileCheck --check-prefix=NOOPT %s			; RUN: llc -mtriple=x86_64-linux-gnu %s -o - -O0 -jump-table-density=40 \| FileCheck --check-prefix=NOOPT %s

	declare void @g(i32)			declare void @g(i32)

	define void @basic(i32 %x) {			define void @basic(i32 %x) {
	entry:			entry:
	switch i32 %x, label %return [			switch i32 %x, label %return [
	i32 3, label %bb0			i32 3, label %bb0
	i32 1, label %bb1			i32 1, label %bb1
	[... 697 lines not shown ...]