This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] Range reduce switches
AbandonedPublic

Authored by jmolloy on Jun 13 2016, 7:53 AM.

Details

Summary

If a switch is sparse and all the cases (once sorted) are in arithmetic progression, we can extract the common factor out of the switch and create a dense switch. For example:

switch (i) {
case 5: ...
case 9: ...
case 13: ...
case 17: ...
}

can become:

if ( (i - 5) % 4 ) goto default;
switch ((i - 5) / 4) {
case 0: ...
case 1: ...
case 2: ...
case 3: ...
}

The division and remainder operations could be costly so we only do this if the factor is a power of two. Dense switches can be lowered significantly better than sparse switches and can even be transformed into lookup tables.

Diff Detail

Repository
rL LLVM

Event Timeline

jmolloy updated this revision to Diff 60524.Jun 13 2016, 7:53 AM
jmolloy retitled this revision from to [SimplifyCFG] Range reduce switches.
jmolloy updated this object.
jmolloy added reviewers: mcrosier, sbaranga, sanjoy.
jmolloy set the repository for this revision to rL LLVM.
jmolloy added a subscriber: llvm-commits.
sbaranga added inline comments.Jun 13 2016, 8:51 AM
lib/Transforms/Utils/SimplifyCFG.cpp
4993

Could you also add a test for this case?

5009

If it doesn't have holes then the GCD that you are looking for should be Values[1].

jmolloy updated this revision to Diff 60542.Jun 13 2016, 9:16 AM

Hi Silviu,

Thanks for the review!

James

lib/Transforms/Utils/SimplifyCFG.cpp
4993

This should be @test3

5009

Uhhhhh, yes. Yes it should. Thanks for spotting this!

I've rewritten the code entirely.

sanjoy requested changes to this revision.Jun 13 2016, 9:49 AM
sanjoy edited edge metadata.
sanjoy added inline comments.
lib/Transforms/Utils/SimplifyCFG.cpp
4989

I'd prefer a getZExtValue or getSExtValue here. That way the code will assert if someone later accidentally removed the check on the bit width.

4992

How about {-2, 0, 2, 4}? That's sparse and can benefit from this pass, but here you'll conclude that it has no holes.

5006

Don't you need to check that V is divisible by Divisor here?

5007

Might be worth calling out the signedness of the division here explicitly. I'd mildly prefer bitwise ops instead of division.

5017

I'd be tempted to emit the bitwise ops here directly. That way you don't do extra work later and are less sensitive to pass ordering.

This revision now requires changes to proceed.Jun 13 2016, 9:49 AM
sanjoy added inline comments.Jun 13 2016, 9:51 AM
lib/Transforms/Utils/SimplifyCFG.cpp
4992

Bad example, I meant cases like {-3, -1, 1, 3}.

Maybe it makes sense to perform this sort of transformation in SelectionDAGBuilder::visitSwitch instead? Most of the relevant logic is there.

This patch only handles situations where all the values form a linear series, but it seems important to also catch cases where *most* of the values form a linear series (for example, 97, 105, 109, 113, 117, 121).

I think your profitability model is a bit off: we never generate a jump table for less than four cases, so trying to improve the density of a nonexistent jump table seems like a bad idea. And for cases where the divisor is small, the tradeoff between extra jump table entries and an extra compare+branch isn't obvious.

jmolloy updated this revision to Diff 60817.Jun 15 2016, 5:08 AM
jmolloy edited edge metadata.

Hi Sanjoy and Eli,

Thanks for your reviews. I agree with all of your comments. This new version has a real density function and will perform the optimization if the switch is not dense to begin with and would be made dense. The density function is adapted from SelectionDAGBuilder, and I'm not too happy on replicating the heuristic here but I couldn't think of a better way.

Similarly the hardcoded bailout for < 4 cases - ideally I'd use TargetLowering here which has a hook, but we only have TTI. It might be worth adding a hook, but perhaps it's not so urgent so I've added a FIXME.

Eli, the reason I'm doing this in SimplifyCFG is because it's an enabler for switch->lookup table lowering (which is also in SimplifyCFG).

The testing has also been improved to cover more negative and unsigned large values and wraparound cases.

Cheers,

James

eli.friedman added inline comments.Jun 15 2016, 1:59 PM
lib/Transforms/Utils/SimplifyCFG.cpp
5022

Nit: please make the implicit casting here explicit.

5031

If the GCD is some odd number N, you can multiply by the inverse of N and switch on that. Not sure if that's actually useful in practice.

5056

If we were doing this in SelectionDAG, we would be able to combine this branch with the jump table's bounds check: "ROTATE(X - Base, Shift) > Limit".

Haven't done a thorough review yet, but some minor nitpicky comments inline.

lib/Transforms/Utils/SimplifyCFG.cpp
5029

This gcd computation looks like extra work -- why not:

uint64_t PotentialGCD = Values[1];
if (!isPowerOf2(PotentialGCD)) return false;
if (!all_of(Values, [](uint64_t V) { return V & (PotentialGCD - 1) == 0; }) return false;
5072

This linear search (via replaceUsesOfWith) seems wasteful. Why not change getCaseValue to return a Use & and do C.getCaseValue().set(New) or something similar?

jmolloy added inline comments.Jun 15 2016, 2:31 PM
lib/Transforms/Utils/SimplifyCFG.cpp
5056

Hmm, you're close to convincing me. I'm loath to give up the lookup table optimization though.

How gross (/ acceptable) would it be to implement this in SelectionDAG and then also teach lookup table lowering in SimplifyCFG this trick too?

eli.friedman added inline comments.Jun 15 2016, 4:39 PM
lib/Transforms/Utils/SimplifyCFG.cpp
5029

That isn't equivalent for something like "0, 8, 12, 16, 20".

5056

It wouldn't be too terrible as long as the actual algorithm for finding a reducible switch is factored out into a utility, I think.

sanjoy added inline comments.Jun 15 2016, 4:48 PM
lib/Transforms/Utils/SimplifyCFG.cpp
5029

Yeah, you're right. I was still thinking in terms of the older "100% dense" scheme.

jmolloy updated this revision to Diff 60973.Jun 16 2016, 7:34 AM
jmolloy edited edge metadata.

Hi Eli,

That rotate trick is, quite simply, brilliant.

New version uses SelectionDAG - I'll factor some of the machinery out at a later date when I implement this for lookup table formation too.

All comments should now be addressed.

Cheers,

James

eli.friedman added inline comments.Jun 19 2016, 2:58 PM
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8332 ↗(On Diff #60973)

Do we need to merge clusters in some cases, so we don't end up with adjacent clusters with the same destination? (This doesn't affect building a jump table, but it affects isDense.)

Hmm... actually, more generally, whether we transform a switch condition is to some extent independent of whether we form a jump table; for example, if we end up with one cluster here, you don't need a jump table at all. Maybe we can leave that for a followup, though.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
267 ↗(On Diff #60973)

It's unintuitive to hide this in the constructor... it seems like a good idea to push the Shift == 0 special-case out.

hans added a subscriber: hans.Jun 20 2016, 9:56 AM
hans added inline comments.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
1971 ↗(On Diff #60973)

I was going to comment that the code in the patch summary could use rotation instead, but it seems you're already on it :-)

8434 ↗(On Diff #60973)

This code doesn't take into account that a range of cases could be transformed to become dense, so I'm not sure your patch will actually find any more jump table lowerings that don't cover the whole switch?

hans added inline comments.Jun 20 2016, 10:30 AM
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8434 ↗(On Diff #60973)

I guess what I was saying with this comment is that since you're not going to get the benefit of transforming parts of the switch, it there a point in doing this in the DAG instead of in the IR? I think the rotation trick should work in the IR too..

eli.friedman added inline comments.Jun 20 2016, 10:50 AM
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8434 ↗(On Diff #60973)

Err, I guess the rotation trick works in IR? You can let switch lowering do the compare+branch. Not sure what I was thinking before.

That said, we don't really want to do switch lowering in IR. I mean, in theory, we can lower switches in IR, but we want to do all of switch lowering at the same time. Otherwise, you end up with weird guessing games to predict what SelectionDAG will actually do.

Hi guys,

Thanks for the comments. I was away on vacation for 2 weeks which is why this has been sitting around.

Hans, if you're OK with it being done in DAG for the reasons mentioned, is this good to commit?

Cheers,

James

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8434 ↗(On Diff #60973)

I agree with Eli here. It'd be nice to densify individual switch case ranges, and perhaps we could do that in the future?

The main advantage I got from moving from IR to SDAG for this is the heuristics don't need second-guessing, which was a major improvement. Doing this in IR would give one large advantage - it'd allow the switch table lookup lowering to work too. I intend to implement this directly in the switch table lookup lowering code though, which should be fairly simple.

Hi Hans, Sanjoy,

Ping! :) Are you now happy with this?

Cheers,

James

hans added a comment.Jul 27 2016, 11:42 AM

The code seems fine to me. I'm OK with this landing.

I still wonder if we shouldn't just do this in SimplifyCFG though. How big is the pay-off here? I've seen switches with e.g. a common factor of 10 in the cases, but are powers of 2 common enough to justify the complexity, especially if we start looking at densifying only parts of a switch?

I just imagine that doing this in SimplifyCFG instead would be so much simpler: we'd look for a common power-of-two factor and slap a rotate-right operation in front of the switch. This could potentially allow more lookup tables, jump tables, bit-test lowerings, new adjacent cases, who knows -- one could argue that densifying the switch is just general goodness, and if it doesn't pay off it's just an extra rotate which is cheap.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8389 ↗(On Diff #60973)

This comment refers to the algorithm that finds dense partitions in Clusters, which is O(n^2). Now that there's more stuff happening below, maybe update it to "don't try any harder at -O0", or you could jsut move your code before it, since I don't think it would be a problem running it a -O0.

jmolloy updated this revision to Diff 65900.Jul 28 2016, 3:27 AM

Hi Hans,

I do agree with you. I think that the major reason I moved from SimplifyCFG to SDAGBuilder was that, because we were inserting new CFG edges, we didn't want to do this unless we could be *sure* that a dense switch lowering was used.

However now we've used the rotate trick, the inserted code is so tiny that we don't have to be so conservative. So I think it's OK to replicate the density heuristic in SimplifyCFG and that's what I've done.

I hope this is now more acceptable to you? :)

Cheers,

James

hans added a comment.Jul 28 2016, 9:37 AM

Thanks, I like this :-) Just some nits.

lib/Transforms/Utils/SimplifyCFG.cpp
4974

Aren't Values sorted (from line 5081), so this should be unnecessary?

4997

Maybe we should check DL.fitsInLegalInteger() instead of hard-coding 64 here.

Oh I see, 64 is because we use [u]int64_t below. It might still be a good idea to check fitsInLegalInteger() so we're not creating a non-legal rotate operation.

5018

why isn't Base int64_t already? Also, could the subtraction overflow?

5036

Any bijective transformation of the case values will work. The problem is just deciding when they're profitable, I suppose. But I'm all for more creative switch lowerings :-)

jmolloy updated this revision to Diff 66226.Jul 30 2016, 9:37 AM

Thanks Hans!

New version - hopefully this one should be ready to go?

Cheers,

James

lib/Transforms/Utils/SimplifyCFG.cpp
5018

I don't think overflow is possible. The subtraction moves values towards zero; the only way I can consider it overflowing is if the difference between the smallest value and the largest value is INT64_MAX or greater which is impossible.

5036

:)

hans accepted this revision.Jul 30 2016, 2:18 PM
hans added a reviewer: hans.

lgtm

jmolloy abandoned this revision.Aug 4 2016, 4:29 AM

This was committed with Hans' LGTM.