This is an archive of the discontinued LLVM Phabricator instance.

Codegen: Decrease minimum jump table density
ClosedPublic

Authored by iteratee on Mar 16 2016, 1:26 PM.

Details

Reviewers

Summary

The minimum density is now an option for both optsize and non-optsize functions:
-sparse-jump-table-density (default 10) for non-optsize functions
-dense-jump-table-density (default 40) for optsize functions, which
matches the current default. This improves several benchmarks at Google
at the cost of a small code size increase. For code compiled with -Os,
the old behavior continues.
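
The threshold described in the summary can be sketched in a few lines. This is a minimal standalone illustration, not the LLVM code itself: `minDensityFor` and `isDenseEnough` are hypothetical names, and the defaults of 10 and 40 are the ones quoted in the summary. The comparison uses the same integer form as `SelectionDAGBuilder::isDense` in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Defaults quoted in the summary, in percent.
constexpr unsigned kDefaultDensity = 10;  // non-optsize functions
constexpr unsigned kOptsizeDensity = 40;  // optsize functions

// Hypothetical helper: pick the minimum density for a function.
constexpr unsigned minDensityFor(bool optForSize) {
  return optForSize ? kOptsizeDensity : kDefaultDensity;
}

// A range of `range` consecutive values holding `numCases` switch cases is
// dense enough when numCases / range >= minDensity / 100. The check is done
// in integers to avoid floating point.
inline bool isDenseEnough(uint64_t numCases, uint64_t range,
                          unsigned minDensity) {
  assert(range >= numCases);
  assert(numCases < UINT64_MAX / 100);
  return numCases * 100 >= range * minDensity;
}
```

For example, 5 cases spread over a range of 40 values (12.5% dense) would pass with `minDensityFor(false)` but not with `minDensityFor(true)`.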

Diff Detail

Event Timeline

iteratee updated this revision to Diff 50858. Mar 16 2016, 1:26 PM

iteratee retitled this revision from to Codegen: Decrease minimum jump table density.

iteratee updated this object.

iteratee set the repository for this revision to rL LLVM.

iteratee added a reviewer: hans.

iteratee added subscribers: echristo, llvm-commits, timshen.

Some inline comments, I stopped putting ditto down because my hands got tired. :)

-eric

test/CodeGen/ARM/2011-08-25-ldmia_ret.ll
17	Only one function could probably pass your command line option alternately? No preference other than it'll isolate the testcase from any other optsize changes that happen.
test/CodeGen/X86/switch-bt.ll
17	Ditto.
108	Ditto.

For all the optsize cases, I have now either changed the switch values or passed a density as a flag.
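
To see why changing the switch values keeps a test on its old code path, here is a rough sketch using the case values from test/CodeGen/Generic/MachineBranchProb.ll in this diff. It models only the density check over the whole range of cases. The real lowering also considers minimum table size, partitioning, and bit tests, so treat it as an approximation. `denseEnough`, `oldCases`, and `newCases` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Density check in integer percent over the full span of the case values.
// Assumes the case values are distinct.
inline bool denseEnough(const std::vector<int64_t> &cases,
                        unsigned minDensity) {
  assert(!cases.empty());
  auto mm = std::minmax_element(cases.begin(), cases.end());
  uint64_t range = static_cast<uint64_t>(*mm.second - *mm.first) + 1;
  uint64_t numCases = cases.size();
  return numCases * 100 >= range * minDensity;
}

// Case values before and after the change to MachineBranchProb.ll.
inline std::vector<int64_t> oldCases() { return {0, 10, 20, 30, 40, 50}; }
inline std::vector<int64_t> newCases() {
  return {0, 100, 200, 300, 400, 500};
}
```

The old values are 6 cases over a range of 51 (about 11.8%), which is below 40 but above 10, so the lower default would change the outcome. The new values are 6 cases over a range of 501 (about 1.2%), which stays below both thresholds.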

Patch looks good at this point, would be good to get size/performance numbers on a run of the testsuite (or something, e.g. SPEC, etc).

Thanks!

-eric

Thanks! I think this basically looks good.

Now that you're passing flags to llc instead, do we have any tests checking that the "optsize" and "minsize" function attributes have the desired effect?

We should probably have a test with functions with no attribute, optsize, and minsize that verifies the thresholds.

And as Eric said, some numbers showing how binary size (e.g. a self-hosted clang build) and perf are affected would be great.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
89	for normal what?
91	For a user who wants to tweak these flags, I'm not sure if the "sparse-" and "dense-" names are the most friendly. What would you think of calling them "-jump-table-density" and "-optsize-jump-table-density"?
95	ultra nit: period at end of comment.
8042	Dense jump table density is dense? :-) I think this variable name would come out better if the flag was renamed as suggested above.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
308	How about calling the new parameter MinDensity, RequiredDensity, or something like that to indicate that it's a threshold that the actual density gets compared against?

I'll get to the suggestions and benchmarking, but wanted to report on size:
clang compiled without the change: 176406957
clang compiled with the change: 176431533

net change: 24576 bytes (24.0 KiB), a change of about 0.014%
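
The size delta can be rechecked from the two figures above. This is a small sketch that assumes both sizes are in bytes; the constant and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kSizeBefore = 176406957;  // clang without the change
constexpr uint64_t kSizeAfter = 176431533;   // clang with the change

// Absolute growth in bytes.
constexpr uint64_t sizeDeltaBytes() { return kSizeAfter - kSizeBefore; }

// Growth in KiB (1 KiB = 1024 bytes).
inline double sizeDeltaKiB() { return sizeDeltaBytes() / 1024.0; }

// Growth relative to the original size, in percent.
inline double sizeDeltaPercent() {
  return 100.0 * sizeDeltaBytes() / kSizeBefore;
}
```

This works out to 24576 bytes, which is exactly 24.0 KiB, or roughly 0.014% of the original size.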

Nice size results. Seems like it's only going to matter in times when we want the performance.

Can't wait for the numbers.

Thanks!

Key: for each run, the columns are avg, median, stddev (with stddev as a percent of avg in parentheses), and 10th percentile.
% Change: avg, median, 10th percentile. % Change is the change in runtime, so a negative percent is an improvement.

For each benchmark, the unchanged build is listed first.

I'm only showing the tests with a difference of more than 2 percent.
I ran each of these benchmarks 50 times.

The only significant results are in TSVC. I'll see how many benchmarks that is and maybe run them more times.

test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 7.014 7.012 0.290 (4.130%) 6.672
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 7.021 7.011 0.199 (2.835%) 6.807
% Change: -0.016 % -0.016 % 2.013 %
test-suite :: MultiSource/Benchmarks/MallocBench/cfrac/cfrac.test 0.924 0.926 0.054 (5.828%) 0.857
test-suite :: MultiSource/Benchmarks/MallocBench/cfrac/cfrac.test 0.914 0.907 0.042 (4.616%) 0.858
% Change: -1.993 % -1.993 % 0.163 %
test-suite :: MultiSource/Benchmarks/TSVC/Equivalencing-dbl/Equivalencing-dbl.test 1.976 1.964 0.109 (5.525%) 1.840
test-suite :: MultiSource/Benchmarks/TSVC/Equivalencing-dbl/Equivalencing-dbl.test 2.000 1.959 0.122 (6.075%) 1.882
% Change: -0.242 % -0.242 % 2.288 %
test-suite :: MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt.test 1.246 1.233 0.091 (7.321%) 1.144
test-suite :: MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt.test 1.239 1.219 0.066 (5.311%) 1.169
% Change: -1.172 % -1.172 % 2.211 %
test-suite :: MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt.test 1.745 1.751 0.096 (5.491%) 1.628
test-suite :: MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt.test 1.732 1.706 0.095 (5.460%) 1.630
% Change: -2.601 % -2.601 % 0.086 %
test-suite :: MultiSource/Benchmarks/TSVC/GlobalDataFlow-dbl/GlobalDataFlow-dbl.test 3.115 3.097 0.110 (3.546%) 2.982
test-suite :: MultiSource/Benchmarks/TSVC/GlobalDataFlow-dbl/GlobalDataFlow-dbl.test 3.190 3.171 0.124 (3.872%) 3.048
% Change: 2.392 % 2.392 % 2.220 %
test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-dbl/InductionVariable-dbl.test 3.691 3.673 0.177 (4.787%) 3.512
test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-dbl/InductionVariable-dbl.test 3.685 3.667 0.181 (4.925%) 3.442
% Change: -0.162 % -0.162 % -1.988 %
test-suite :: MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl.test 3.191 3.140 0.193 (6.044%) 2.978
test-suite :: MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl.test 3.233 3.236 0.159 (4.908%) 3.051
% Change: 3.070 % 3.070 % 2.478 %
test-suite :: MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl.test 3.557 3.524 0.166 (4.669%) 3.378
test-suite :: MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl.test 3.577 3.556 0.118 (3.286%) 3.458
% Change: 0.899 % 0.899 % 2.374 %
test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt.test 3.480 3.457 0.161 (4.639%) 3.292
test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt.test 3.520 3.493 0.128 (3.642%) 3.382
% Change: 1.053 % 1.053 % 2.721 %
test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl.test 3.411 3.379 0.176 (5.147%) 3.211
test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl.test 3.469 3.459 0.168 (4.856%) 3.275
% Change: 2.372 % 2.372 % 2.009 %
test-suite :: MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt.test 2.399 2.354 0.156 (6.496%) 2.228
test-suite :: MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt.test 2.415 2.415 0.099 (4.110%) 2.266
% Change: 2.594 % 2.594 % 1.705 %
test-suite :: MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt.test 3.480 3.449 0.142 (4.087%) 3.349
test-suite :: MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt.test 3.409 3.412 0.110 (3.237%) 3.266
% Change: -1.089 % -1.089 % -2.467 %
test-suite :: MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl.test 2.778 2.749 0.137 (4.941%) 2.632
test-suite :: MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl.test 2.825 2.824 0.135 (4.784%) 2.648
% Change: 2.715 % 2.715 % 0.581 %
test-suite :: MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt.test 1.002 1.002 0.077 (7.686%) 0.919
test-suite :: MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt.test 1.041 1.043 0.069 (6.591%) 0.962
% Change: 4.099 % 4.099 % 4.579 %

When I re-ran the TSVC tests 100 times on a quieter machine, all the
differences were less than 2%.
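
The script that produced these numbers is not part of the review, so the following is only a sketch of how the per-benchmark columns (avg, median, stddev, stddev as a percent of avg, 10th percentile) and the % Change rows could be computed. The population standard deviation and the nearest-rank percentile are assumptions, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct RunStats {
  double avg;
  double median;
  double stddev;
  double stddevPercent;  // stddev as a percent of avg
  double p10;            // 10th percentile
};

// Nearest-rank percentile on a sorted, non-empty sample. Other definitions
// interpolate between neighbours; this choice is an assumption.
inline double percentile(const std::vector<double> &sorted, unsigned pct) {
  assert(!sorted.empty() && pct <= 100);
  std::size_t rank = (pct * sorted.size() + 99) / 100;  // ceil(pct * n / 100)
  if (rank == 0) rank = 1;
  return sorted[rank - 1];
}

inline RunStats summarize(std::vector<double> times) {
  assert(!times.empty());
  std::sort(times.begin(), times.end());
  const double n = static_cast<double>(times.size());
  double sum = 0.0;
  for (double t : times) sum += t;
  const double avg = sum / n;
  double sq = 0.0;
  for (double t : times) sq += (t - avg) * (t - avg);
  const double stddev = std::sqrt(sq / n);  // population standard deviation
  const std::size_t mid = times.size() / 2;
  const double median = times.size() % 2
                            ? times[mid]
                            : (times[mid - 1] + times[mid]) / 2.0;
  return {avg, median, stddev, 100.0 * stddev / avg, percentile(times, 10)};
}

// Relative change in runtime; negative means the patched build is faster.
inline double percentChange(double before, double after) {
  return 100.0 * (after - before) / before;
}
```

With run-to-run standard deviations of 3 to 7 percent, as in the table above, differences of 2 to 4 percent between two builds are within noise, which is consistent with the re-run on a quieter machine.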

Tidied up names according to comments.

In D18223#380658, @iteratee wrote:

Tidied up names according to comments.

Much better, thanks!

I think a test that covers the different thresholds for optsize and regular functions is still needed.

Add a test that verifies the density switch does what it says.

lgtm

test/CodeGen/X86/switch-density.ll
75 ↗	(On Diff #51850)	This one's always a jump table right, so the comment is slightly wrong?

This revision is now accepted and ready to land. Mar 28 2016, 4:12 PM

iteratee closed this revision. Apr 7 2016, 9:17 PM

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h	5 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	26 lines
test/CodeGen/ARM/2011-08-25-ldmia_ret.ll	2 lines
test/CodeGen/Generic/MachineBranchProb.ll	14 lines
test/CodeGen/PowerPC/pr26690.ll	6 lines
test/CodeGen/Thumb2/ldr-str-imm12.ll	20 lines
test/CodeGen/X86/switch-bt.ll	2 lines
test/CodeGen/X86/switch-edge-weight.ll	12 lines
test/CodeGen/X86/switch.ll	4 lines

Diff 50897

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h

Context not available.
	BranchProbability DefaultProb;	BranchProbability DefaultProb;
	};	};

	/// Minimum jump table density, in percent.
	enum { MinJumpTableDensity = 40 };

	/// Check whether a range of clusters is dense enough for a jump table.	/// Check whether a range of clusters is dense enough for a jump table.
	bool isDense(const CaseClusterVector &Clusters, unsigned *TotalCases,	bool isDense(const CaseClusterVector &Clusters, unsigned *TotalCases,
	unsigned First, unsigned Last);	unsigned First, unsigned Last, unsigned Density);
		hans: How about calling the new parameter MinDensity, RequiredDensity, or something like that to indicate that it's a threshold that the actual density gets compared against?

	/// Build a jump table cluster from Clusters[First..Last]. Returns false if it	/// Build a jump table cluster from Clusters[First..Last]. Returns false if it
	/// decides it's not a good idea.	/// decides it's not a good idea.
Context not available.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

Context not available.
	EnableFMFInDAG("enable-fmf-dag", cl::init(true), cl::Hidden,	EnableFMFInDAG("enable-fmf-dag", cl::init(true), cl::Hidden,
	cl::desc("Enable fast-math-flags for DAG nodes"));	cl::desc("Enable fast-math-flags for DAG nodes"));

		/// Minimum jump table density for normal
		hans: for normal what?
		static cl::opt<unsigned>
		SparseJumpTableDensity("sparse-jump-table-density", cl::init(10), cl::Hidden,
		hans: For a user who wants to tweak these flags, I'm not sure if the "sparse-" and "dense-" names are the most friendly. What would you think of calling them "-jump-table-density" and "-optsize-jump-table-density"?
		cl::desc("Minimum density for building a jump table in "
		"a normal function"));

		/// Minimum jump table density for -Os or -Oz functions
		hans: ultra nit: period at end of comment.
		static cl::opt<unsigned>
		DenseJumpTableDensity("dense-jump-table-density", cl::init(40), cl::Hidden,
		cl::desc("Minimum density for building a jump table in "
		"an optsize function"));


	// Limit the width of DAG chains. This is important in general to prevent	// Limit the width of DAG chains. This is important in general to prevent
	// DAG-based analysis from blowing up. For example, alias analysis and	// DAG-based analysis from blowing up. For example, alias analysis and
	// load clustering may not complete in reasonable time. It is difficult to	// load clustering may not complete in reasonable time. It is difficult to
Context not available.

	bool SelectionDAGBuilder::isDense(const CaseClusterVector &Clusters,	bool SelectionDAGBuilder::isDense(const CaseClusterVector &Clusters,
	unsigned *TotalCases, unsigned First,	unsigned *TotalCases, unsigned First,
	unsigned Last) {	unsigned Last,
		unsigned Density) {
	assert(Last >= First);	assert(Last >= First);
	assert(TotalCases[Last] >= TotalCases[First]);	assert(TotalCases[Last] >= TotalCases[First]);

Context not available.
	assert(NumCases < UINT64_MAX / 100);	assert(NumCases < UINT64_MAX / 100);
	assert(Range >= NumCases);	assert(Range >= NumCases);

	return NumCases * 100 >= Range * MinJumpTableDensity;	return NumCases * 100 >= Range * Density;
	}	}

	static inline bool areJTsAllowed(const TargetLowering &TLI) {	static inline bool areJTsAllowed(const TargetLowering &TLI) {
Context not available.
	TotalCases[i] += TotalCases[i - 1];	TotalCases[i] += TotalCases[i - 1];
	}	}

	if (N >= MinJumpTableSize && isDense(Clusters, &TotalCases[0], 0, N - 1)) {	unsigned Density = SparseJumpTableDensity;
		if (DefaultMBB->getParent()->getFunction()->optForSize())
		Density = DenseJumpTableDensity;
		hans: Dense jump table density is dense? :-) I think this variable name would come out better if the flag was renamed as suggested above.
		if (N >= MinJumpTableSize
		&& isDense(Clusters, &TotalCases[0], 0, N - 1, Density)) {
	// Cheap case: the whole range might be suitable for jump table.	// Cheap case: the whole range might be suitable for jump table.
	CaseCluster JTCluster;	CaseCluster JTCluster;
	if (buildJumpTable(Clusters, 0, N - 1, SI, DefaultMBB, JTCluster)) {	if (buildJumpTable(Clusters, 0, N - 1, SI, DefaultMBB, JTCluster)) {
Context not available.
	// Search for a solution that results in fewer partitions.	// Search for a solution that results in fewer partitions.
	for (int64_t j = N - 1; j > i; j--) {	for (int64_t j = N - 1; j > i; j--) {
	// Try building a partition from Clusters[i..j].	// Try building a partition from Clusters[i..j].
	if (isDense(Clusters, &TotalCases[0], i, j)) {	if (isDense(Clusters, &TotalCases[0], i, j, Density)) {
	unsigned NumPartitions = 1 + (j == N - 1 ? 0 : MinPartitions[j + 1]);	unsigned NumPartitions = 1 + (j == N - 1 ? 0 : MinPartitions[j + 1]);
	bool IsTable = j - i + 1 >= MinJumpTableSize;	bool IsTable = j - i + 1 >= MinJumpTableSize;
	unsigned Tables = IsTable + (j == N - 1 ? 0 : NumTables[j + 1]);	unsigned Tables = IsTable + (j == N - 1 ? 0 : NumTables[j + 1]);
Context not available.

test/CodeGen/ARM/2011-08-25-ldmia_ret.ll

	; RUN: llc < %s -mtriple=thumbv7-apple-ios -mcpu=cortex-a9 | FileCheck %s			; RUN: llc < %s -mtriple=thumbv7-apple-ios -mcpu=cortex-a9 -sparse-jump-table-density=40 | FileCheck %s
	; Test that ldmia_ret preserves implicit operands for return values.			; Test that ldmia_ret preserves implicit operands for return values.
	;			;
	; This CFG is reduced from a benchmark miscompile. With current			; This CFG is reduced from a benchmark miscompile. With current
				echristo: Only one function could probably pass your command line option alternately? No preference other than it'll isolate the testcase from any other optsize changes that happen.

test/CodeGen/Generic/MachineBranchProb.ll

Context not available.
	entry:	entry:
	switch i32 %x, label %return [	switch i32 %x, label %return [
	i32 0, label %bb0	i32 0, label %bb0
	i32 10, label %bb1	i32 100, label %bb1
	i32 20, label %bb2	i32 200, label %bb2
	i32 30, label %bb3	i32 300, label %bb3
	i32 40, label %bb4	i32 400, label %bb4
	i32 50, label %bb5	i32 500, label %bb5
	], !prof !1	], !prof !1
	bb0: tail call void @g(i32 0) br label %return	bb0: tail call void @g(i32 0) br label %return
	bb1: tail call void @g(i32 1) br label %return	bb1: tail call void @g(i32 1) br label %return
Context not available.
	!1 = !{!"branch_weights",	!1 = !{!"branch_weights",
	; Default:	; Default:
	i32 1,	i32 1,
	; Case 0, 10, 20:	; Case 0, 100, 200:
	i32 10, i32 1, i32 1,	i32 10, i32 1, i32 1,
	; Case 30, 40, 50:	; Case 300, 400, 500:
	i32 1, i32 10, i32 10}	i32 1, i32 10, i32 10}
Context not available.

test/CodeGen/PowerPC/pr26690.ll

Context not available.
	while.body: ; preds = %while.body.backedge, %while.body.lr.ph	while.body: ; preds = %while.body.backedge, %while.body.lr.ph
	switch i32 %.pre, label %while.body.backedge [	switch i32 %.pre, label %while.body.backedge [
	i32 0, label %sw.bb1	i32 0, label %sw.bb1
	i32 8, label %sw.bb1	i32 80, label %sw.bb1
	i32 6, label %sw.bb1	i32 60, label %sw.bb1
	i32 24, label %while.cond.backedge	i32 240, label %while.cond.backedge
	]	]

	while.body.backedge: ; preds = %while.body, %while.cond.backedge	while.body.backedge: ; preds = %while.body, %while.cond.backedge
Context not available.

test/CodeGen/Thumb2/ldr-str-imm12.ll

Context not available.

	bb20: ; preds = %entry	bb20: ; preds = %entry
	switch i32 undef, label %bb1287 [	switch i32 undef, label %bb1287 [
	i32 11, label %bb119	i32 110, label %bb119
	i32 12, label %bb119	i32 120, label %bb119
	i32 21, label %bb420	i32 210, label %bb420
	i32 23, label %bb420	i32 230, label %bb420
	i32 45, label %bb438	i32 450, label %bb438
	i32 46, label %bb438	i32 460, label %bb438
	i32 55, label %bb533	i32 550, label %bb533
	i32 56, label %bb569	i32 560, label %bb569
	i32 64, label %bb745	i32 640, label %bb745
	i32 78, label %bb1098	i32 780, label %bb1098
	]	]

	bb119: ; preds = %bb20, %bb20	bb119: ; preds = %bb20, %bb20
Context not available.

test/CodeGen/X86/switch-bt.ll

	; RUN: llc -march=x86-64 -asm-verbose=false < %s | FileCheck %s			; RUN: llc -march=x86-64 -asm-verbose=false < %s -sparse-jump-table-density=40 | FileCheck %s

	; This switch should use bit tests, and the third bit test case is just			; This switch should use bit tests, and the third bit test case is just
	; testing for one possible value, so it doesn't need a bt.			; testing for one possible value, so it doesn't need a bt.
				echristo: Ditto.
				echristo: Ditto.

test/CodeGen/X86/switch-edge-weight.ll

Context not available.
	; block.	; block.

	switch i32 %x, label %sw.default [	switch i32 %x, label %sw.default [
	i32 1, label %sw.bb	i32 4, label %sw.bb
	i32 5, label %sw.bb2	i32 20, label %sw.bb2
	i32 7, label %sw.bb3	i32 28, label %sw.bb3
	i32 9, label %sw.bb4	i32 36, label %sw.bb4
	i32 31, label %sw.bb5	i32 124, label %sw.bb5
	], !prof !2	], !prof !2

	sw.bb:	sw.bb:
Context not available.
	;	;
	; CHECK: BB#0:	; CHECK: BB#0:
	; BB#0 to BB#6: [10, UINT32_MAX] (15)	; BB#0 to BB#6: [10, UINT32_MAX] (15)
	; BB#0 to BB#8: [1, 5, 7, 9] (jump table) (45)	; BB#0 to BB#8: [4, 20, 28, 36] (jump table) (45)
	; CHECK: Successors according to CFG: BB#8({{[0-9a-fx/= ]+}}25.00%) BB#9({{[0-9a-fx/= ]+}}75.00%)	; CHECK: Successors according to CFG: BB#8({{[0-9a-fx/= ]+}}25.00%) BB#9({{[0-9a-fx/= ]+}}75.00%)
	}	}

Context not available.

test/CodeGen/X86/switch.ll

	; RUN: llc -mtriple=x86_64-linux-gnu %s -o - | FileCheck %s			; RUN: llc -mtriple=x86_64-linux-gnu %s -o - -sparse-jump-table-density=40 | FileCheck %s
	; RUN: llc -mtriple=x86_64-linux-gnu %s -o - -O0 | FileCheck --check-prefix=NOOPT %s			; RUN: llc -mtriple=x86_64-linux-gnu %s -o - -O0 -sparse-jump-table-density=40 | FileCheck --check-prefix=NOOPT %s

	declare void @g(i32)			declare void @g(i32)