This is an archive of the discontinued LLVM Phabricator instance.

Add support to optionally limit the size of jump tables
ClosedPublic

Authored by evandro on Jul 1 2016, 2:19 PM.

Details

Summary

Many high-performance processors have a dedicated branch predictor for indirect branches, commonly used with jump tables. As sophisticated as such branch predictors are, they tend to have well-defined limits beyond which their effectiveness is hampered or even nullified. One such limit is the number of possible destinations for a given indirect branch that such branch predictors can handle.

This patch adds a limit, which a target may set, on the number of destination addresses in a jump table.

Diff Detail

Repository
rL LLVM

Event Timeline

evandro updated this revision to Diff 62542.Jul 1 2016, 2:19 PM
evandro retitled this revision from to Add support to optionally limit the size of jump tables.
evandro updated this object.
evandro added a reviewer: spop.
evandro added reviewers: rengolin, hans.
rengolin edited edge metadata.Jul 1 2016, 2:59 PM

How hard would it be to write tests for these?

@rengolin, it depends if it's on a Friday before a holiday...

Seriously, though. I'm working on them.

hans edited edge metadata.Jul 1 2016, 3:31 PM

I'm not super familiar with branch predictors, but this does raise some questions.

The first is how you're planning to use this. It seems your patch only exposes this as two command-line options, something which is typically only used for debugging. Should the various targets have default values for this?

This also seems to be a pretty tricky trade-off between having a single indirect branch and having multiple direct and indirect branches. The latter puts more pressure on the CPU in terms of how many branches it has to track.

I assume you have some benchmark in mind where this is profitable?

I guess what I'm saying is that if the set of destinations is dense, building a jump table seems like a good choice in general, and breaking it up seems like a pretty CPU- and workload-specific tuning that I'm not sure the compiler should be doing.

@hans, what happens is that if the jump table is too large, many entries cannot be predicted, leading to thrashing of the indirect branch predictor. Increasing the mix of direct and indirect branches, as this patch promotes, allows the direct and indirect branch predictors together to yield an overall lower branch mispredict ratio. And, given that the cost of a mispredict may be vastly larger than the cost of computing, I observed an overall gain, notably in some benchmarks involving parsing.

hans added a comment.Jul 1 2016, 4:02 PM

Sure, I can see the benefit of this transformation, but I'm worried that it will be hard for the compiler to get right, as whether it's profitable or not depends on the micro-architecture and the run-time dynamics of the switch.

@hans, indeed, this change leaves it up to the targets to tune the limit for jump tables.

evandro updated this revision to Diff 62799.Jul 5 2016, 4:08 PM
evandro edited edge metadata.
evandro set the repository for this revision to rL LLVM.
  • Remove the option to specify the lower limit for jump tables.
  • Make the option to specify the upper limit for jump tables generic.
  • Add test case.

Sorry for the slow reply; I was on vacation.

I can certainly see that this will be beneficial to some codes on certain architectures.

My concern is whether the added complexity is worth it, as the code for building jump tables is already complicated. I now see that you've set a default value for a single AArch64 micro-architecture, ExynosM1. Does that single target justify the added complexity for all other targets that don't use this?

+Tim since he owns AArch64. It'd be interesting to hear what he thinks about the general value of this.

In D21940#487082, @hans wrote:

My concern is whether the added complexity is worth it, as the code for building jump tables is already complicated. I now see that you've set a default value for a single AArch64 micro-architecture, ExynosM1. Does that single target justify the added complexity for all other targets that don't use this?

Yes. On M1, this patch improves 253.perlbmk or 400.perlbench by up to 15%, and private benchmarks by up to 10%. However, I'm rather positive that other microarchitectures may benefit from this change as well, since indirect branch prediction is typically more limited and thus less accurate than direct branch prediction.

FWIW, if the subtarget doesn't override the default limit, the algorithm iterates as many times as before.

t.p.northover edited edge metadata.Jul 19 2016, 1:13 PM

Tim since he owns AArch64. It'd be interesting to hear what he thinks about the general value of this.

I can't really comment on the performance changes, but it doesn't really seem to add much complexity. It seems pretty natural that if the minimum size is tuneable, the maximum ought to be.

hans added a comment.Jul 25 2016, 4:37 PM

Tim since he owns AArch64. It'd be interesting to hear what he thinks about the general value of this.

I can't really comment on the performance changes, but it doesn't really seem to add much complexity. It seems pretty natural that if the minimum size is tuneable, the maximum ought to be.

I hate to disagree, but I do think this adds complexity. In fact, I'm not convinced that the algorithm as modified by the patch actually works (see comment). If it does, I'd like to see some test cases demonstrating it.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8383 ↗(On Diff #62799)

why the variable renaming?

8386 ↗(On Diff #62799)

i suppose this is now Clusters[k]?

8401 ↗(On Diff #62799)

This splits Clusters[i..j] into MaxJumpTableSize-sized tables. But there's no guarantee that each such table would be dense enough for a jump table.

It could be that a range of 2 * MaxJumpTableSize clusters is only dense enough as a whole, and if it's split up as this code would do, the parts would not be dense enough for jump tables.

With this change, I don't think the algorithm finds the optimal partitioning given the density, min- and max-size criteria.

evandro added inline comments.Jul 25 2016, 4:44 PM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8383 ↗(On Diff #62799)

I believe that it improves the legibility of the code.

8386 ↗(On Diff #62799)

Aye.

8401 ↗(On Diff #62799)

Allow me to investigate this.

hans added inline comments.Jul 25 2016, 4:59 PM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8396 ↗(On Diff #62799)

Actually, probably all that's needed would be to add a '&& l - k <= MaxJumpTableSize' here. I think that would allow the algorithm to find optimal partitionings given minsize, maxsize and density, and it seems like a straight-forward change so not adding too much complexity.

This also needs better testing, including a check for optsize and some non-straightforward case. I don't think the current test works, as the default target may or may not like jump tables; it's probably better to use test/CodeGen/X86/switch.ll.

evandro added inline comments.Aug 2 2016, 3:08 PM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8396 ↗(On Diff #62799)

That condition limits the splitting unnecessarily. Perhaps isDense(Clusters, &TotalCases[0], i, j, MinDensity) would suffice to guarantee density, even if not without side effects.

hans added inline comments.Aug 2 2016, 3:14 PM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8396 ↗(On Diff #62799)

Can you elaborate on "That condition limits the splitting unnecessarily"?

Isn't limiting the partitioning so the jump tables are <= the max size what you want here?

evandro added inline comments.Aug 2 2016, 4:17 PM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8396 ↗(On Diff #62799)

I should have said that it increases the splitting unnecessarily, at least in simple, perfectly dense tests. I think that this happens by limiting the clusters that the loop visits, thus limiting the number of optimal solutions.

evandro added inline comments.Aug 4 2016, 2:09 PM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8396 ↗(On Diff #62799)

After giving this more thought, I think that the proposed patch does guarantee that the smaller tables are dense.

Given that the set [k, j] is dense, it follows that the maximum table size is j + 1 - k. Since j + 1 is k, the test l - i is correct.

hans requested changes to this revision.Aug 4 2016, 5:11 PM
hans edited edge metadata.

Here's an example where the proposed patch doesn't work:

declare void @g(i32)

define void @test(i32 %x) {
entry:
  switch i32 %x, label %return [
    i32 1, label %bb1
    i32 2, label %bb2
    i32 3, label %bb3
    i32 4, label %bb4

    i32 14, label %bb5
    i32 15, label %bb6
  ]
bb1:
  tail call void @g(i32 1)
  br label %return
bb2:
  tail call void @g(i32 2)
  br label %return
bb3:
  tail call void @g(i32 3)
  br label %return
bb4:
  tail call void @g(i32 4)
  br label %return
bb5:
  tail call void @g(i32 5)
  br label %return
bb6:
  tail call void @g(i32 6)
  br label %return
return:
  ret void
}

When run with

$ llc -mtriple=x86_64-linux-gnu -jump-table-density=40 -max-jump-table=4 a.ll -o -

it will decide the 1-15 jump table is too big, but fail to create a jump table for cases 1-4.

I believe the correct way to do this is something like https://reviews.llvm.org/differential/diff/66883/ and I'm not sure I understand where it would split things unnecessarily.

This revision now requires changes to proceed.Aug 4 2016, 5:11 PM

I'm confused about the change to the table density in the command line. Why did you deem it necessary?

Thank you.

hans added a comment.Aug 8 2016, 9:00 AM

I'm confused about the change to the table density in the command line. Why did you deem it necessary?

To make it more explicit which minimum jump table density my example was tailored for (40% is default for optsize, 10% otherwise). The switch in my example has 40% density, but your code will split off cases 3, 4, 14, 15 from that which falls below the threshold.

The example can of course be adjusted to show the problem at any density threshold.

In D21940#508684, @hans wrote:

The example can of course be adjusted to show the problem at any density threshold.

I think that I understand it better. I'll devise a better solution.

On a side note, I suspect that I should also examine findBitTestClusters() to enforce both MinimumJumpTableEntries and MaximumJumpTableEntries, yes?

hans added a comment.Aug 8 2016, 9:21 AM

On a side note, I suspect that I should also examine findBitTestClusters() to enforce both MinimumJumpTableEntries and MaximumJumpTableEntries, yes?

Why? findBitTestClusters() doesn't build jump tables.

In D21940#508705, @hans wrote:

Why? findBitTestClusters() doesn't build jump tables.

Of course not. It never calls buildJumpTable(). :-}

evandro updated this revision to Diff 71701.Sep 16 2016, 2:29 PM
evandro updated this object.
evandro edited edge metadata.
hans added a comment.Sep 16 2016, 3:05 PM

It looks like this is essentially what I proposed in https://reviews.llvm.org/differential/diff/66883/, except you're taking into account the "gaps" in the jump table when enforcing the maximum size.

It seems unfortunate that getMinimumJumpTableEntries() and getMaximumJumpTableSize() are now using different units of measurement.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8390 ↗(On Diff #71701)

All these unrelated changes are making the patch much harder to review. I also don't think they're improving anything.

I assume you changed this because you didn't like i being a signed type? And then "i >= 0" doesn't work anymore, so now you check "i > 0" and use "i - 1" everywhere, which is hard to read, in my opinion.

8428 ↗(On Diff #71701)

What was wrong with this one?

In D21940#545345, @hans wrote:

It seems unfortunate that getMinimumJumpTableEntries() and getMaximumJumpTableSize() are now using different units of measurement.

Indeed, but I'm afraid that it's necessary.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8390 ↗(On Diff #71701)

I meant to make sure to use data types consistently.

8428 ↗(On Diff #71701)

Err... purely stylistic. :-}

hans added a comment.Sep 16 2016, 3:17 PM
In D21940#545345, @hans wrote:

It seems unfortunate that getMinimumJumpTableEntries() and getMaximumJumpTableSize() are now using different units of measurement.

Indeed, but I'm afraid that it's necessary.

Can you elaborate on why it's necessary?

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8390 ↗(On Diff #71701)

Oh, you mean because N is unsigned? That makes some sense I suppose, but here I think it hurts readability. int64_t is wider than unsigned, so there is no risk of losing precision, and it allows us to iterate 'i' all the way down to 0.

evandro marked 6 inline comments as done.Sep 16 2016, 3:25 PM
In D21940#545362, @hans wrote:
In D21940#545345, @hans wrote:

It seems unfortunate that getMinimumJumpTableEntries() and getMaximumJumpTableSize() are now using different units of measurement.

Indeed, but I'm afraid that it's necessary.

Can you elaborate on why it's necessary?

getMinimumJumpTableEntries() is actually the lower limit for the number of cases to be considered for a jump table. getMaximumJumpTableSize() is the upper limit for the number of rows in a jump table.

In D21940#545362, @hans wrote:
In D21940#545345, @hans wrote:

It seems unfortunate that getMinimumJumpTableEntries() and getMaximumJumpTableSize() are now using different units of measurement.

Indeed, but I'm afraid that it's necessary.

Can you elaborate on why it's necessary?

getMinimumJumpTableEntries() is actually the lower limit for the number of cases to be considered for a jump table. getMaximumJumpTableSize() is the upper limit for the number of rows in a jump table.

Here's another thought: the difference in what the minimum limit specifies and what the maximum limit specifies is because the former applies to the input conditions and the latter, to the output. Then, perhaps the name of the methods might be changed to getMinimumJumpTableInputEntries() and getMaximumJumpTableOutputEntries(), or something along this line.

hans added a comment.Sep 21 2016, 9:53 AM

getMinimumJumpTableEntries() is actually the lower limit for the number of cases to be considered for a jump table. getMaximumJumpTableSize() is the upper limit for the number of rows in a jump table.

Here's another thought: the difference in what the minimum limit specifies and what the maximum limit specifies is because the former applies to the input conditions and the latter, to the output. Then, perhaps the name of the methods might be changed to getMinimumJumpTableInputEntries() and getMaximumJumpTableOutputEntries(), or something along this line.

Hmm, no I think that's more confusing. getMinimumJumpTableEntries() and getMaximumJumpTableSize() seem fine the more I think about it.

I feel like this code review is dragging out. Let me summarize my position here so we can get it committed:

  • I'm OK with the idea of enforcing a maximum jump table size, even if it's just used by default on some specific target.
  • I think the correct way to do it is just adding a "foo <= MaxSize" check in addition to the isDense() check, as you've done in the latest version.
  • The new TotalGaps vector seems unnecessary. It should be possible to compute the JumpTableSize in constant time by looking at Low and High of the first and last cluster. Potentially this could be done with a small helper function similar to isDense.
  • Please remove all the unrelated changes; they make this patch much more intrusive than it should be.
evandro marked 2 inline comments as done.Sep 21 2016, 12:28 PM
In D21940#548881, @hans wrote:
  • The new TotalGaps vector seems unnecessary. It should be possible to compute the JumpTableSize in constant time by looking at Low and High of the first and last cluster. Potentially this could be done with a small helper function similar to isDense.

I'm not sure that it's equivalent, since each element in TotalGaps[] is the sum of the gaps up to that element. Or am I missing something?

  • Please remove all the unrelated changes; they make this patch much more intrusive than it should be.

Will do.

evandro updated this revision to Diff 72094.Sep 21 2016, 12:30 PM
evandro edited edge metadata.

Tried to minimize ancillary changes. Please, let me know if I missed any.

evandro updated this revision to Diff 72096.Sep 21 2016, 12:32 PM

Really updating the patch this time... :-}

hans added a comment.Sep 21 2016, 1:05 PM
In D21940#548881, @hans wrote:
  • The new TotalGaps vector seems unnecessary. It should be possible to compute the JumpTableSize in constant time by looking at Low and High of the first and last cluster. Potentially this could be done with a small helper function similar to isDense.

I'm not sure that it's equivalent, since each element in TotalGaps[] is the sum of the gaps up to that element. Or am I missing something?

Sure, but you use TotalGaps[] together with TotalCases[] to compute the size of the jump table:

unsigned JumpTableSize = TotalCases[j] - TotalCases[i - 1] +
                         TotalGaps[j] - TotalGaps[i - 1];

I'm saying you can compute the equivalent just by doing Clusters[j].High - Clusters[i].Low + 1.

Tried to minimize ancillary changes. Please, let me know if I missed any.

I still see lots, just look at the diff.

You're still changing N and the loop indices to unsigned. The comment for TotalCases, and the code to initialize it has changed, you're switching to pre-increment in the loops, ...

In D21940#549096, @hans wrote:

I'm not sure that it's equivalent, since each element in TotalGaps[] is the sum of the gaps up to that element. Or am I missing something?

Sure, but you use TotalGaps[] together with TotalCases[] to compute the size of the jump table:

unsigned JumpTableSize = TotalCases[j] - TotalCases[i - 1] +
                         TotalGaps[j] - TotalGaps[i - 1];

I'm saying you can compute the equivalent just by doing Clusters[j].High - Clusters[i].Low + 1.

Clusters[i] or Clusters[i - 1] in this patch?

Besides, TotalGaps[] is also used at TotalCases[N - 1] + TotalGaps[N - 1] <= MaxJumpTableSize.

You're still changing N and the loop indices to unsigned.

I think that it's better practice to use data types consistently.

The comment for TotalCases, and the code to initialize it has changed, you're switching to pre-increment in the loops, ...

OK.

hans added a comment.Sep 21 2016, 1:30 PM
In D21940#549096, @hans wrote:

I'm not sure that it's equivalent, since each element in TotalGaps[] is the sum of the gaps up to that element. Or am I missing something?

Sure, but you use TotalGaps[] together with TotalCases[] to compute the size of the jump table:

unsigned JumpTableSize = TotalCases[j] - TotalCases[i - 1] +
                         TotalGaps[j] - TotalGaps[i - 1];

I'm saying you can compute the equivalent just by doing Clusters[j].High - Clusters[i].Low + 1.

Clusters[i] or Clusters[i - 1] in this patch?

Clusters[i] as the loop is formulated in trunk.

Besides, TotalGaps[] is also used at TotalCases[N - 1] + TotalGaps[N - 1] <= MaxJumpTableSize.

The same computation can be used there. Maybe break it out to a helper function similar to isDense.

You're still changing N and the loop indices to unsigned.

I think that it's better practice to use data types consistently.

As I already pointed out, I think this change reduces readability and is unrelated to enforcing a maximum size on the jump tables.

In D21940#549128, @hans wrote:
In D21940#549096, @hans wrote:

I'm not sure that it's equivalent, since each element in TotalGaps[] is the sum of the gaps up to that element. Or am I missing something?

Sure, but you use TotalGaps[] together with TotalCases[] to compute the size of the jump table:

unsigned JumpTableSize = TotalCases[j] - TotalCases[i - 1] +
                         TotalGaps[j] - TotalGaps[i - 1];

I'm saying you can compute the equivalent just by doing Clusters[j].High - Clusters[i].Low + 1.

Clusters[i] or Clusters[i - 1] in this patch?

Clusters[i] as the loop is formulated in trunk.

Besides, TotalGaps[] is also used at TotalCases[N - 1] + TotalGaps[N - 1] <= MaxJumpTableSize.

The same computation can be used there. Maybe break it out to a helper function similar to isDense.

I tried changing it but, IIUC, the test cases below fail to pass.

You're still changing N and the loop indices to unsigned.

I think that it's better practice to use data types consistently.

As I already pointed out, I think this change reduces readability and is unrelated to enforcing a maximum size on the jump tables.

Out of curiosity: would such changes be suitable for a follow up NFC patch?

hans added a comment.Sep 21 2016, 1:57 PM
In D21940#549128, @hans wrote:
In D21940#549096, @hans wrote:

I'm not sure that it's equivalent, since each element in TotalGaps[] is the sum of the gaps up to that element. Or am I missing something?

Sure, but you use TotalGaps[] together with TotalCases[] to compute the size of the jump table:

unsigned JumpTableSize = TotalCases[j] - TotalCases[i - 1] +
                         TotalGaps[j] - TotalGaps[i - 1];

I'm saying you can compute the equivalent just by doing Clusters[j].High - Clusters[i].Low + 1.

Clusters[i] or Clusters[i - 1] in this patch?

Clusters[i] as the loop is formulated in trunk.

Besides, TotalGaps[] is also used at TotalCases[N - 1] + TotalGaps[N - 1] <= MaxJumpTableSize.

The same computation can be used there. Maybe break it out to a helper function similar to isDense.

I tried changing it but, IIUC, the test cases below fail to pass.

Then there must be a bug somewhere.

You're still changing N and the loop indices to unsigned.

I think that it's better practice to use data types consistently.

As I already pointed out, I think this change reduces readability and is unrelated to enforcing a maximum size on the jump tables.

Out of curiosity: would such changes be suitable for a follow up NFC patch?

That would be the way to propose such a change, yes, but I'm still not keen on the idea.

evandro updated this revision to Diff 72105.Sep 21 2016, 2:15 PM
evandro updated this object.

Fewer ancillary changes.

evandro updated this revision to Diff 72107.Sep 21 2016, 2:19 PM

Sigh... it's been a long day.

In D21940#549148, @hans wrote:

Then there must be a bug somewhere.

As usual, the issue of being off by one.

evandro updated this revision to Diff 72113.Sep 21 2016, 3:04 PM

Eliminated TotalGaps[].

hans added a comment.Sep 21 2016, 3:33 PM

Thanks for the update! This is starting to look much better now.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8338 ↗(On Diff #72113)

I prefer the old comment as it explains specifically what the element at index i represents.

8342 ↗(On Diff #72113)

No need to change the contents of this for-loop.

8352 ↗(On Diff #72113)

What do you think about introducing a helper function that we could call like 'jumpTableSize(Clusters, 0, N-1)'? I think that would make this a little easier to read here and below, and in the function we could take care not to let the value overflow for example.

8398 ↗(On Diff #72113)

This could overflow. It would be benign because a jump table of size 2^32 or 2^64 is not likely to be dense (if it is, we have other problems), but maybe we should be doing '.getLimitedValue(UINT_MAX - 1)' to make it saturate instead of overflow.

8401 ↗(On Diff #72113)

This line looks like an unrelated change.

8403 ↗(On Diff #72113)

These two lines look like an unrelated change.

8408 ↗(On Diff #72113)

Why do we need this extra check? We're already guaranteed by the above that there won't be any tables greater than the max size.

llvm/test/CodeGen/Generic/max-jump-table.ll
1 ↗(On Diff #72113)

Running llc without a target triple might end up running for a target that doesn't do jump tables for switches.

I think this test needs to be under a target that we know uses jump tables. If you target ARM you can also test that ExynosM1 does get its desired max jump table size.

I just realized the llvm-commits list wasn't cc'd.

evandro marked 6 inline comments as done.Sep 22 2016, 9:05 AM
evandro added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8342 ↗(On Diff #72113)

Like JumpTableSize, can't this overflow too?

8352 ↗(On Diff #72113)

The operation seemed too short for a function, but saturating its return value might justify wrapping it in one.

8408 ↗(On Diff #72113)

This extra check is for backwards compatibility. Though no table is larger than MaxJumpTableSize, we may end up with two tables rather than one. It might make sense to grow tables slightly to shorten the if-then-else chain at the beginning, but when the size of the tables is limited, we might end up with a longer if-then-else chain, because tables smaller than MinJumpTableEntries become more likely. As a matter of fact, except when optimizing for code size, it'd be better to create fewer jump tables even if that adds links to the if-then-else chain, as most processors can predict the outcome of each link better than that of an indirect branch. Without this condition, max-jump-table.ll below fails at the jt1 function, and the resulting code is 3 jump tables with 3 comparisons instead of 2 jump tables with 3 comparisons.

llvm/test/CodeGen/Generic/max-jump-table.ll
1 ↗(On Diff #72113)

True that.

evandro marked 3 inline comments as done.Sep 22 2016, 9:24 AM
evandro added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8352 ↗(On Diff #72113)

It turns out that CaseClusterVector is a private type of SelectionDAGBuilder. In order to minimize the changes by this patch, I'll repeat the code as needed.

evandro updated this revision to Diff 72184.Sep 22 2016, 9:40 AM
evandro marked an inline comment as done.
evandro updated this revision to Diff 72197.Sep 22 2016, 11:15 AM
evandro marked an inline comment as done.Sep 22 2016, 2:26 PM
hans added inline comments.Sep 22 2016, 3:45 PM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8344 ↗(On Diff #72197)

I don't think it can in practice, at least not currently. Each Cluster is created by merging consecutive case statements with the same destination. So there would have to be an impractically large number of case statements to end up with a Cluster where High - Low would overflow.

For the table size however, it's enough to have one "case 0" and one "case UINT64_MAX" to generate a large potential table size. Now, we would obviously never build such a large table, but we would do the size computation, for example in isDense().

8353 ↗(On Diff #72197)

No problem, I think this turned out pretty well too.

8411–8412 ↗(On Diff #72197)

Hmm. I don't like this.

The point of this algorithm is to find jump tables given the size and density criteria. It sounds like your code wants to go the other way.

Also, your code doesn't make us choose the partitioning with the least number of jump tables; it only removes the code that makes us choose the one with the most jump tables, which means we'll end up choosing whichever partitioning comes first.

So it seems the jt1 function is just getting lucky here.

And in general, picking the partitioning with more jump tables is important here. It's possible to construct a test case with two possible partitionings of equal size, where one results in no jump tables and a very bad lowering.

llvm/test/CodeGen/Generic/max-jump-table.ll
2 ↗(On Diff #72197)

I think the test also needs to move from CodeGen/Generic to CodeGen/AArch64.

The tests under Generic/ will be run in builds which may not have the AArch64 target.

evandro marked 4 inline comments as done.Sep 23 2016, 7:55 AM
evandro added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8344 ↗(On Diff #72197)

I see.

8411–8412 ↗(On Diff #72197)

I don't feel strongly about it. I'll revert it.

llvm/test/CodeGen/Generic/max-jump-table.ll
2 ↗(On Diff #72197)

True that.

evandro updated this revision to Diff 72283.Sep 23 2016, 7:56 AM
evandro marked 3 inline comments as done.
hans accepted this revision.Sep 23 2016, 9:00 AM
hans edited edge metadata.

lgtm

This revision is now accepted and ready to land.Sep 23 2016, 9:00 AM
This revision was automatically updated to reflect the committed changes.