This is an archive of the discontinued LLVM Phabricator instance.

Switch lowering: use profile info to build weight-balanced binary search trees
Closed, Public

Authored by hans on Apr 27 2015, 7:17 PM.

Details

Reviewers

Commits

rG4b828d35fdf8: Switch lowering: use profile info to build weight-balanced binary search trees
rL236192: Switch lowering: use profile info to build weight-balanced binary search trees

Summary

This patch uses profile info to balance the binary search trees used for switch lowering by weight instead of node count, causing hot nodes to appear closer to the root.

I've been using this benchmark: switch_with_profile.c (19 KB attachment). It's a 512-case switch. A loop exercises the cases in a random sequence, hitting the first case once, the second twice, and so on, making the 512th case the hottest. I ran it like this:

$ bin/clang -O3 switch_with_profile.c 
$ perf stat -r5 ./a.out                                                                                                                                                                                                               

 Performance counter stats for './a.out' (5 runs):

       2456.382229 task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.63% )
               403 context-switches          #    0.164 K/sec                    ( +-  3.93% )
                12 cpu-migrations            #    0.005 K/sec                    ( +- 32.02% )
               131 page-faults               #    0.053 K/sec                  
     7,961,322,884 cycles                    #    3.241 GHz                      ( +-  0.07% ) [66.68%]
     2,511,064,458 stalled-cycles-frontend   #   31.54% frontend cycles idle     ( +-  0.06% ) [50.05%]
     1,874,963,880 stalled-cycles-backend    #   23.55% backend  cycles idle     ( +-  0.05% ) [50.14%]
    10,579,662,099 instructions              #    1.33  insns per cycle        
                                             #    0.24  stalled cycles per insn  ( +-  0.08% ) [66.80%]
     4,910,929,143 branches                  # 1999.253 M/sec                    ( +-  0.05% ) [66.63%]
        14,888,827 branch-misses             #    0.30% of all branches          ( +-  0.07% ) [66.58%]

       2.460022963 seconds time elapsed                                          ( +-  0.64% )

$ bin/clang -O3 switch_with_profile.c -fprofile-instr-generate
$ LLVM_PROFILE_FILE=profile.raw ./a.out
$ bin/llvm-profdata merge -output=profile.profdata profile.raw
$ bin/clang -O3 switch_with_profile.c -fprofile-instr-use=profile.profdata
$ perf stat -r5 ./a.out

 Performance counter stats for './a.out' (5 runs):

       2156.926282 task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.84% )
               348 context-switches          #    0.161 K/sec                    ( +-  3.07% )
                 7 cpu-migrations            #    0.003 K/sec                    ( +- 39.10% )
               131 page-faults               #    0.061 K/sec                    ( +-  0.15% )
     7,269,065,426 cycles                    #    3.370 GHz                      ( +-  0.05% ) [66.56%]
     2,183,070,540 stalled-cycles-frontend   #   30.03% frontend cycles idle     ( +-  0.27% ) [50.00%]
     1,601,430,175 stalled-cycles-backend    #   22.03% backend  cycles idle     ( +-  0.37% ) [50.21%]
    10,328,446,596 instructions              #    1.42  insns per cycle        
                                             #    0.21  stalled cycles per insn  ( +-  0.08% ) [66.91%]
     4,777,439,719 branches                  # 2214.930 M/sec                    ( +-  0.05% ) [66.75%]
        14,804,172 branch-misses             #    0.31% of all branches          ( +-  0.06% ) [66.59%]

       2.160195042 seconds time elapsed                                          ( +-  0.84% )

That's a 12% speed-up (2.46 s down to 2.16 s elapsed), which I think is good given that the switch's profile is tricky and not completely dominated by one or a few cases.
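To make "not completely dominated" concrete, here is a minimal sketch of the hit distribution, assuming exactly the pattern described above (case i of 512 is hit i times per pass). The helper names `totalHits` and `shareOfHottest` are made up for this illustration and are not part of the benchmark.

```cpp
#include <cassert>
#include <cstdint>

// Total switch executions per pass when case i (1-based) runs i times.
constexpr uint64_t totalHits(uint64_t NumCases) {
  return NumCases * (NumCases + 1) / 2;
}

// Fraction of all executions that land on the `Top` hottest cases.
double shareOfHottest(uint64_t NumCases, uint64_t Top) {
  assert(Top <= NumCases);
  // Hits on the coldest (NumCases - Top) cases form the same arithmetic series.
  uint64_t ColdHits = totalHits(NumCases - Top);
  return static_cast<double>(totalHits(NumCases) - ColdHits) /
         static_cast<double>(totalHits(NumCases));
}
```

Under that assumption, `shareOfHottest(512, 1)` is below 0.4%, and even the 51 hottest cases together receive under a fifth of the executions, so no small group of cases carries the profile.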

Note that this also affects non-profile-guided builds. When there is no profile info, each case has the same weight. When cases are clustered together, e.g. in a jump table, that cluster is heavier than a single-case cluster, which affects the tree layout.
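The effect on builds without profile info can be illustrated with a small standalone sketch of the pivot search. This is not the SelectionDAGBuilder code: `pickPivot`, `pickPivotByCount`, and the plain vector of cluster weights are hypothetical stand-ins, and the sketch assumes a jump-table cluster carries the sum of its cases' weights.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the index of the pivot cluster, i.e. the first
// cluster of the right-hand subtree. Expects at least two non-zero weights.
std::size_t pickPivot(const std::vector<uint32_t> &Weights) {
  assert(Weights.size() >= 2 && "Too small to split!");
  std::size_t LastLeft = 0;
  std::size_t FirstRight = Weights.size() - 1;
  uint64_t LeftWeight = Weights[LastLeft];
  uint64_t RightWeight = Weights[FirstRight];
  // Grow whichever side is currently lighter until the two cursors meet.
  while (LastLeft + 1 < FirstRight) {
    if (LeftWeight < RightWeight)
      LeftWeight += Weights[++LastLeft];
    else
      RightWeight += Weights[--FirstRight];
  }
  return FirstRight;
}

// Count-based split, as done before this patch, for comparison.
std::size_t pickPivotByCount(std::size_t NumClusters) {
  return NumClusters / 2;
}
```

For clusters {0-3 as a jump table, 100, 200, 300} with one unit of weight per case, the weights are {4, 1, 1, 1}. `pickPivot` returns index 1 (the cluster for 100), whereas `pickPivotByCount(4)` returns index 2 (the cluster for 200).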

Diff Detail

Repository: rL LLVM

Event Timeline

hans updated this revision to Diff 24527. Apr 27 2015, 7:17 PM

hans retitled this revision to "Switch lowering: use profile info to build weight-balanced binary search trees".

hans updated this object.

hans edited the test plan for this revision.

hans added a reviewer: djasper.

hans added subscribers: Unknown Object (MLST), hansw, dnovillo and 2 others.

Looks reasonable to me.

djasper added inline comments. Apr 29 2015, 9:34 AM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7814 ↗	(On Diff #24527)	No braces.
7819 ↗	(On Diff #24527)	At this point Left and Right are identical and the element they are pointing at belongs to the larger/heavier weight, right?
test/CodeGen/Generic/MachineBranchProb.ll
61 ↗	(On Diff #24527)	Maybe add a comment that this division is because the left half is 1+10+1+1 = 13, the right half is 10+10=20 and we are directly checking against the pivot element so that that one weight doesn't contribute to either side. (If I am correct).
test/CodeGen/X86/switch.ll
471 ↗	(On Diff #24527)	Could you describe somewhere (patch description, comment) what weight=0 means and how it affects the balancing? From this test, I would assume that we just do balanced trees in this case. However, the test below suggests otherwise.

Addressing djasper's comments.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7814 ↗	(On Diff #24527)	Done.
7819 ↗	(On Diff #24527)	No, Left == Right - 1 at this point, and LeftWeight and RightWeight are as equal as possible. For example, if we have cases with weights {3,4,3,5}, the first two belong on the left side (LeftWeight=7, Left=1) and the other two on the right (RightWeight=8, Right=2). We use Right as the pivot element since we're doing a < comparison. I'll try to comment this better, and use better names for the variables.
test/CodeGen/Generic/MachineBranchProb.ll
61 ↗	(On Diff #24527)	The first weight in the metadata is for the default case. The weights on the left and right are: 10+1+1+1=13 vs 10+10=20. I'll add comments making this easier to read. The pivot element is part of the right side and does contribute to that weight. (This is different from gcc which I think always does both 'je' and 'jg' on pivot elements, re-using the condition code from the 'cmp'.)
test/CodeGen/X86/switch.ll
471 ↗	(On Diff #24527)	The problem is that the pivot finding code cannot handle 0-weight nodes as they don't affect the weight-balance. E.g. a tree with weights {10,0,0,0,0,0,0} to the left and {10} on the right would be considered balanced because the sub-trees have equal weight. Luckily, BranchProbability::getEdgeWeight() will never return 0. I just figured this was important enough to have a test for. I'll add a comment in the code.
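The {3,4,3,5} example and the zero-weight problem can both be reproduced with a standalone sketch of the two-cursor walk. The `Split` struct and `splitByWeight` are illustrative names only, not LLVM's; unlike the patch, this version deliberately does not reject zero weights so that their effect is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of one split; all names here are illustrative.
struct Split {
  std::size_t LastLeft;   // index of the last cluster on the left side
  std::size_t FirstRight; // index of the pivot (first cluster on the right)
  uint64_t LeftWeight;    // total weight of clusters [0, LastLeft]
  uint64_t RightWeight;   // total weight of clusters [FirstRight, end)
};

// Walk two cursors towards each other, always growing the lighter side.
Split splitByWeight(const std::vector<uint32_t> &W) {
  assert(W.size() >= 2 && "Too small to split!");
  Split S{0, W.size() - 1, W.front(), W.back()};
  while (S.LastLeft + 1 < S.FirstRight) {
    if (S.LeftWeight < S.RightWeight)
      S.LeftWeight += W[++S.LastLeft];
    else
      S.RightWeight += W[--S.FirstRight];
  }
  return S;
}
```

With weights {3, 4, 3, 5} this yields LastLeft = 1, FirstRight = 2, LeftWeight = 7 and RightWeight = 8, matching the numbers in the reply. With {10, 0, 0, 0, 0, 0, 0, 10} both sides end up with weight 10, yet one side holds a single cluster and the other holds seven, which is the skew the asserts in the patch guard against.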

djasper added inline comments. Apr 29 2015, 3:47 PM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7997 ↗	(On Diff #24658)	"each other"?
test/CodeGen/Generic/MachineBranchProb.ll
58 ↗	(On Diff #24658)	This is actually an interesting case. As I understand it, LLVM will now do:

    this_case = (x < 40 ? x < 20 ? x == 0 ? A : x == 10 ? B : DEFAULT
                                 : x == 20 ? C : x == 30 ? D : DEFAULT
                        : x == 40 ? E : x == 50 ? F : DEFAULT);

(I hope it is somewhat readable what I mean: case A represents 0, case B 10, and so on. I just didn't want to re-use the numbers as that would be even more confusing.) Now, I hope I counted correctly, this gives me the following numbers:

    Case  Comparisons  Weight  Weighted
    A     3            10      30
    B     4            1       4
    C     3            1       3
    D     4            1       4
    E     2            10      20
    F     3            10      30
    SUM   19                   91

However, if we split this equally, we get to do a linear scan on both sides:

    this_case = (x < 30 ? x == 0 ? A : x == 10 ? B : x == 20 ? C : DEFAULT
                        : x == 40 ? E : x == 50 ? F : x == 30 ? D : DEFAULT);

    Case  Comparisons  Weight  Weighted
    A     2            10      20
    B     3            1       3
    C     4            1       4
    D     4            1       4
    E     2            10      20
    F     3            10      30
    SUM   18                   81

Which seems beneficial in total. I think the rule might be something like: never create a split with fewer than three elements on one side unless the smallest weight on that side is larger than all the weights on the other side. But that might not be sufficient.
70 ↗	(On Diff #24658)	I think this is Case 0, 10, 20
61 ↗	(On Diff #24527)	Right. Why doesn't LLVM do that? Intuitively, it sounds like a good idea. (But that's for a follow-up patch if that).
test/CodeGen/X86/switch.ll
471 ↗	(On Diff #24527)	But what's the added benefit over the test below?
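The weighted totals in the comparison above can be re-checked with a few lines of C++. This is only arithmetic over the counts as given in the comment; `weightedCost` and the vector names are invented for this sketch, and the per-case comparison counts are taken from the comment rather than derived from generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum over all cases of (comparisons to reach the case) * (case weight).
uint64_t weightedCost(const std::vector<uint32_t> &Comparisons,
                      const std::vector<uint32_t> &Weights) {
  assert(Comparisons.size() == Weights.size());
  uint64_t Cost = 0;
  for (std::size_t I = 0; I < Weights.size(); ++I)
    Cost += static_cast<uint64_t>(Comparisons[I]) * Weights[I];
  return Cost;
}

// Cases A..F with the weights from the test's branch_weights metadata.
const std::vector<uint32_t> CaseWeights = {10, 1, 1, 1, 10, 10};
// Comparison counts per case, as tallied in the review comment.
const std::vector<uint32_t> WeightBalanced = {3, 4, 3, 4, 2, 3};
const std::vector<uint32_t> EvenSplit = {2, 3, 4, 4, 2, 3};
```

`weightedCost(WeightBalanced, CaseWeights)` gives 91 and `weightedCost(EvenSplit, CaseWeights)` gives 81, so for this input the even split needs fewer comparisons on average even though its subtrees are less balanced by weight.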

Just to clarify. I think this is definitely an excellent step and good to go in independent of whether my analysis is correct or not :-).

So, ship it :-).

This revision is now accepted and ready to land. Apr 29 2015, 3:58 PM

In D9318#163566, @djasper wrote:

Just to clarify. I think this is definitely an excellent step and good to go in independent of whether my analysis is correct or not :-).

So, ship it :-).

Thanks very much for the review!

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7997 ↗	(On Diff #24658)	Done.
test/CodeGen/Generic/MachineBranchProb.ll
58 ↗	(On Diff #24658)	It's going to be

    this_case = (x < 40 ? x < 10 ? x == 0 ? A : DEFAULT
                                 : x == 10 ? B : x == 20 ? C : x == 30 ? D : DEFAULT
                        : x == 40 ? E : x == 50 ? F : DEFAULT);

But the numbers add up the same way.

This is interesting. It's something about our leaves being different than in a typical binary tree in that we can do up to three comparisons in them. Like you, I'm also thinking we should probably be careful creating a split with fewer than three on one side + some conditions. But this will require some thinking.

As discussed on the IRC channel, for switches where some cases are completely dominating, hoisting them out is probably a good idea. What I've been doing when playing with benchmarks is turning:

    switch (x) { cases ... }

into

    switch (x) { hot_case 1: hot_case 2: ... }
    switch (x) { cold_cases here }

And that's been really beneficial for switches where a few cases dominate. I like the current approach for handling switches in general though, and I think the hoisting would be a separate thing that takes place before.
70 ↗	(On Diff #24658)	Oops. Fixed.
61 ↗	(On Diff #24527)	It's not a given win. It reduces the number of cmps, but introduces more branches. It probably varies by architecture how fast this "three-way branch" is. For the benchmark attached in this review, Clang's -O3 (without profile info) was faster than gcc's -O3, maybe because of this.
test/CodeGen/X86/switch.ll
471 ↗	(On Diff #24527)	Oh, the test below shouldn't have 0-weights. I'll remove that.
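The manual hoisting described in the reply might look like the following sketch. The case values and the `handleHot`, `handleCold`, and `handleDefault` functions are placeholders chosen for illustration; which cases count as hot is an assumption here, and in practice would come from profile data.

```cpp
#include <cassert>

// Placeholder handlers; only their distinct results matter for the sketch.
int handleHot(int X) { return X * 2; }
int handleCold(int X) { return X + 1000; }
int handleDefault() { return -1; }

// Original form: hot and cold cases share one switch.
int dispatchSingle(int X) {
  switch (X) {
  case 1: case 2:
    return handleHot(X);
  case 10: case 20: case 30: case 40:
    return handleCold(X);
  default:
    return handleDefault();
  }
}

// Hand-hoisted form: the hot cases are tested first in their own switch, and
// everything else falls through to a second switch holding the cold cases.
int dispatchHoisted(int X) {
  switch (X) {
  case 1: case 2:
    return handleHot(X);
  default:
    break;
  }
  switch (X) {
  case 10: case 20: case 30: case 40:
    return handleCold(X);
  default:
    return handleDefault();
  }
}

// Both forms must agree on every input in [Lo, Hi].
void checkEquivalent(int Lo, int Hi) {
  for (int X = Lo; X <= Hi; ++X)
    assert(dispatchSingle(X) == dispatchHoisted(X));
}
```

The two functions compute the same result for every input; the hoisted form only changes the order in which values are tested, so whether it is faster depends on how strongly the hot cases dominate.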

Closed by commit rL236192: Switch lowering: use profile info to build weight-balanced binary search trees (authored by hans). Apr 29 2015, 6:01 PM

This revision was automatically updated to reflect the committed changes.

hans added inline comments. Jun 20 2015, 10:23 AM

test/CodeGen/Generic/MachineBranchProb.ll
58 ↗	(On Diff #24658)	Coming back to this example: weight-balancing the tree actually makes it worse, since it would have been better to have 3 elements in the leaf node instead of just one. I committed r240224 to address this. It's not perfect, but I think it helps a lot.

Revision Contents

Path                                                           Size
llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp    42 lines
llvm/trunk/test/CodeGen/Generic/MachineBranchProb.ll           40 lines
llvm/trunk/test/CodeGen/X86/switch.ll                          83 lines

Diff 24672

llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

@@ 7,980 lines not shown @@ void SelectionDAGBuilder::splitWorkItem(SwitchWorkList &WorkList,
                                         Value *Cond,
                                         MachineBasicBlock *SwitchMBB) {
   assert(W.FirstCluster->Low->getValue().slt(W.LastCluster->Low->getValue()) &&
          "Clusters not sorted?");

   unsigned NumClusters = W.LastCluster - W.FirstCluster + 1;
   assert(NumClusters >= 2 && "Too small to split!");

-  // FIXME: When we have profile info, we might want to balance the tree based
-  // on weights instead of node count.
+  // Balance the tree based on branch weights to create a near-optimal (in terms
+  // of search time given key frequency) binary search tree. See e.g. Kurt
+  // Mehlhorn "Nearly Optimal Binary Search Trees" (1975).
+  CaseClusterIt LastLeft = W.FirstCluster;
+  CaseClusterIt FirstRight = W.LastCluster;
+  uint32_t LeftWeight = LastLeft->Weight;
+  uint32_t RightWeight = FirstRight->Weight;
+
+  // Move LastLeft and FirstRight towards each other from opposite directions to
+  // find a partitioning of the clusters which balances the weight on both
+  // sides.
+  while (LastLeft + 1 < FirstRight) {
+    // Zero-weight nodes would cause skewed trees since they don't affect
+    // LeftWeight or RightWeight.
+    assert(LastLeft->Weight != 0);
+    assert(FirstRight->Weight != 0);
+
+    if (LeftWeight < RightWeight)
+      LeftWeight += (++LastLeft)->Weight;
+    else
+      RightWeight += (--FirstRight)->Weight;
+  }
+  assert(LastLeft + 1 == FirstRight);
+  assert(LastLeft >= W.FirstCluster);
+  assert(FirstRight <= W.LastCluster);
+
+  // Use the first element on the right as pivot since we will make less-than
+  // comparisons against it.
+  CaseClusterIt PivotCluster = FirstRight;
+  assert(PivotCluster > W.FirstCluster);
+  assert(PivotCluster <= W.LastCluster);

-  CaseClusterIt PivotCluster = W.FirstCluster + NumClusters / 2;
   CaseClusterIt FirstLeft = W.FirstCluster;
-  CaseClusterIt LastLeft = PivotCluster - 1;
-  CaseClusterIt FirstRight = PivotCluster;
   CaseClusterIt LastRight = W.LastCluster;

   const ConstantInt *Pivot = PivotCluster->Low;

   // New blocks will be inserted immediately after the current one.
   MachineFunction::iterator BBI = W.MBB;
   ++BBI;

   // We will branch to the LHS if Value < Pivot. If LHS is a single cluster,
   // we can branch to its destination directly if it's squeezed exactly in
@@ 22 lines not shown @@ if (FirstRight == LastRight && FirstRight->Kind == CC_Range &&
     RightMBB = FuncInfo.MF->CreateMachineBasicBlock(W.MBB->getBasicBlock());
     FuncInfo.MF->insert(BBI, RightMBB);
     WorkList.push_back({RightMBB, FirstRight, LastRight, Pivot, W.LT});
     // Put Cond in a virtual register to make it available from the new blocks.
     ExportFromCurrentBlock(Cond);
   }

   // Create the CaseBlock record that will be used to lower the branch.
-  CaseBlock CB(ISD::SETLT, Cond, Pivot, nullptr, LeftMBB, RightMBB, W.MBB);
+  CaseBlock CB(ISD::SETLT, Cond, Pivot, nullptr, LeftMBB, RightMBB, W.MBB,
+               LeftWeight, RightWeight);

   if (W.MBB == SwitchMBB)
     visitSwitchCase(CB, SwitchMBB);
   else
     SwitchCases.push_back(CB);
 }

 void SelectionDAGBuilder::visitSwitch(const SwitchInst &SI) {
   // Extract cases from the switch.
   BranchProbabilityInfo *BPI = FuncInfo.BPI;
   CaseClusterVector Clusters;
   Clusters.reserve(SI.getNumCases());
   for (auto I : SI.cases()) {
     MachineBasicBlock *Succ = FuncInfo.MBBMap[I.getCaseSuccessor()];
     const ConstantInt *CaseVal = I.getCaseValue();
-    uint32_t Weight = 0; // FIXME: Use 1 instead?
+    uint32_t Weight = 1;
     if (BPI) {
       Weight = BPI->getEdgeWeight(SI.getParent(), I.getSuccessorIndex());
       assert(Weight <= UINT32_MAX / SI.getNumSuccessors());
     }
     Clusters.push_back(CaseCluster::range(CaseVal, CaseVal, Succ, Weight));
   }

   MachineBasicBlock *DefaultMBB = FuncInfo.MBBMap[SI.getDefaultDest()];
@@ 89 lines not shown @@

llvm/trunk/test/CodeGen/Generic/MachineBranchProb.ll

 ; RUN: llc < %s -print-machineinstrs=expand-isel-pseudos -o /dev/null 2>&1 | FileCheck %s

 ; ARM & AArch64 run an extra SimplifyCFG which disrupts this test.
 ; XFAIL: arm,aarch64

 ; Make sure we have the correct weight attached to each successor.
 define i32 @test2(i32 %x) nounwind uwtable readnone ssp {
-; CHECK: Machine code for function test2:
+; CHECK-LABEL: Machine code for function test2:
 entry:
   %conv = sext i32 %x to i64
   switch i64 %conv, label %return [
     i64 0, label %sw.bb
     i64 1, label %sw.bb
     i64 4, label %sw.bb
     i64 5, label %sw.bb1
   ], !prof !0
@@ 11 lines not shown @@ sw.bb1:
   br label %return

 return:
   %retval.0 = phi i32 [ 5, %sw.bb1 ], [ 1, %sw.bb ], [ 0, %entry ]
   ret i32 %retval.0
 }

 !0 = !{!"branch_weights", i32 7, i32 6, i32 4, i32 4, i32 64}

+
+declare void @g(i32)
+define void @left_leaning_weight_balanced_tree(i32 %x) {
+entry:
+  switch i32 %x, label %return [
+    i32 0, label %bb0
+    i32 10, label %bb1
+    i32 20, label %bb2
+    i32 30, label %bb3
+    i32 40, label %bb4
+    i32 50, label %bb5
+  ], !prof !1
+bb0: tail call void @g(i32 0) br label %return
+bb1: tail call void @g(i32 1) br label %return
+bb2: tail call void @g(i32 2) br label %return
+bb3: tail call void @g(i32 3) br label %return
+bb4: tail call void @g(i32 4) br label %return
+bb5: tail call void @g(i32 5) br label %return
+return: ret void
+
+; Check that we set branch weights on the pivot cmp instruction correctly.
+; Cases {0,10,20,30} go on the left with weight 13; cases {40,50} go on the
+; right with weight 20.
+;
+; CHECK-LABEL: Machine code for function left_leaning_weight_balanced_tree:
+; CHECK: BB#0: derived from LLVM BB %entry
+; CHECK-NOT: Successors
+; CHECK: Successors according to CFG: BB#8(13) BB#9(20)
+}
+
+!1 = !{!"branch_weights",
+  ; Default:
+  i32 1,
+  ; Case 0, 10, 20:
+  i32 10, i32 1, i32 1,
+  ; Case 30, 40, 50:
+  i32 1, i32 10, i32 10}

llvm/trunk/test/CodeGen/X86/switch.ll

@@ 436 lines not shown @@ !2 = !{!"branch_weights",
   ; Default:
   i32 1,
   ; Case 100:
   i32 10,
   ; Case 200:
   i32 1000,
   ; Case 300:
   i32 10}

+
+define void @zero_weight_tree(i32 %x) {
+entry:
+  switch i32 %x, label %return [
+    i32 0, label %bb0
+    i32 10, label %bb1
+    i32 20, label %bb2
+    i32 30, label %bb3
+    i32 40, label %bb4
+    i32 50, label %bb5
+  ], !prof !3
+bb0: tail call void @g(i32 0) br label %return
+bb1: tail call void @g(i32 1) br label %return
+bb2: tail call void @g(i32 2) br label %return
+bb3: tail call void @g(i32 3) br label %return
+bb4: tail call void @g(i32 4) br label %return
+bb5: tail call void @g(i32 5) br label %return
+return: ret void
+
+; Make sure to pick a pivot in the middle also with zero-weight cases.
+; CHECK-LABEL: zero_weight_tree
+; CHECK-NOT: cmpl
+; CHECK: cmpl $29
+}
+
+!3 = !{!"branch_weights", i32 1, i32 10, i32 0, i32 0, i32 0, i32 0, i32 10}
+
+
+define void @left_leaning_weight_balanced_tree(i32 %x) {
+entry:
+  switch i32 %x, label %return [
+    i32 0, label %bb0
+    i32 10, label %bb1
+    i32 20, label %bb2
+    i32 30, label %bb3
+    i32 40, label %bb4
+    i32 50, label %bb5
+  ], !prof !4
+bb0: tail call void @g(i32 0) br label %return
+bb1: tail call void @g(i32 1) br label %return
+bb2: tail call void @g(i32 2) br label %return
+bb3: tail call void @g(i32 3) br label %return
+bb4: tail call void @g(i32 4) br label %return
+bb5: tail call void @g(i32 5) br label %return
+return: ret void
+
+; To balance the tree by weight, the pivot is shifted to the right, moving hot
+; cases closer to the root.
+; CHECK-LABEL: left_leaning_weight_balanced_tree
+; CHECK-NOT: cmpl
+; CHECK: cmpl $39
+}
+
+!4 = !{!"branch_weights", i32 1, i32 10, i32 1, i32 1, i32 1, i32 10, i32 10}
+
+
+define void @jump_table_affects_balance(i32 %x) {
+entry:
+  switch i32 %x, label %return [
+    ; Jump table:
+    i32 0, label %bb0
+    i32 1, label %bb1
+    i32 2, label %bb2
+    i32 3, label %bb3
+
+    i32 100, label %bb0
+    i32 200, label %bb1
+    i32 300, label %bb2
+  ]
+bb0: tail call void @g(i32 0) br label %return
+bb1: tail call void @g(i32 1) br label %return
+bb2: tail call void @g(i32 2) br label %return
+bb3: tail call void @g(i32 3) br label %return
+return: ret void
+
+; CHECK-LABEL: jump_table_affects_balance
+; If the tree were balanced based on number of clusters, {0-3,100} would go on
+; the left and {200,300} on the right. However, the jump table weights as much
+; as its components, so 100 is selected as the pivot.
+; CHECK-NOT: cmpl
+; CHECK: cmpl $99
+}