This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] Improove and speed up ReduceSwitchRange
AbandonedPublic

Authored by shawnl on Apr 14 2019, 3:03 PM.

Details

Summary

I am trying to move towards much more space-efficient wwitch statements, using popcnt, as described in PR39013. This is the first patch towards that goal.

ReduceSwitchRange only ran if the switch statement wasn't "dense", resulting in unnecessary sparse switch statements. This is a very cheap transform, if it can make improvements should be run, so run it all the time. (that is all IsSwitchDense did, it did not determine if a jump table or lookup table should be generated) Doing this even when not generating a Lookup/Jump Table actually improves even LowerSwitch, as LowerSwitch wants the range to be constrained. Also:

  • Do not do an expensive general GCD calculation when only countTrailingZeros() information is used from it.
  • Re-order values to ideal with a subtraction, instead of only covering some cases by treating the values as signed.
  • Implement the ctlz/cttz transform that was suggested in the comments. This transform is still not ideal, as SwitchToLookupTable doesn't know the potential range of values has been reduced, and there is no good plumbing in LLVM (that I know of) to get that information, and KnownBits is not sufficient.
  • Do not reduce the switch range in SwitchToLookupTable

Gotcha: There a few cases where the old NeedMask would have been better than the sparse map, which patch I have yet to submit. However with only a few holes (8 bytes of holes) the old NeedMask implementation would use more space, as it doesn't use popcnt. Also, removing that feature means this will not be suitable for merging until it's replacement is approved.

Diff Detail

Repository
rL LLVM

Event Timeline

shawnl created this revision.Apr 14 2019, 3:03 PM

I realized that CommonRightDigits will always be 0 because of the Base subtraction. I will submit a patch without that confusing complexity.

shawnl updated this revision to Diff 195087.Apr 14 2019, 3:07 PM
lebedev.ri resigned from this revision.Apr 14 2019, 11:29 PM

I'd recommend git diff --name-only --cached | xargs -n 1 git blame --porcelain | grep "^author " | sort | uniq -c | sort -nr | head -10 to find reviewers.

nikic added inline comments.Apr 15 2019, 6:36 AM
lib/Transforms/Utils/SimplifyCFG.cpp
5578

Just looking at this, shouldn't there be some changes to existing tests?

lebedev.ri added inline comments.
lib/Transforms/Utils/SimplifyCFG.cpp
5498

So what about -Os/-Oz?

5630–5634

Did you see this comment?

shawnl abandoned this revision.Apr 16 2019, 2:28 PM
shawnl marked 2 inline comments as done.

I think this patch will make more sense together with the other changes I want to make. I will withdraw it and submit the changes together.

lib/Transforms/Utils/SimplifyCFG.cpp
5498

This is not making the table, that calculation is elsewhere. This optimization is cheap and should always be applied.

5630–5634

This change does not change whether a lookup table gets created (although I am working on such patches). It only simplifies the switch.

shawnl updated this revision to Diff 195594.Apr 17 2019, 10:06 AM

Add a second patch that changes SwitchToLookupTable

There were a number of cases where a table was smaller and we still didn't use it, and
also i8 tables are small enough that we should use them. Also, remove xor and and when producing a covered table.

shawnl marked an inline comment as done.Apr 17 2019, 10:07 AM

This is a combination of two patches. To see them separately: https://github.com/shawnl/llvm/commits/test

This patch now depends on D60823

lib/Transforms/Utils/SimplifyCFG.cpp
5578

There were no tests for the old behavior, surprisingly.

shawnl edited the summary of this revision. (Show Details)Apr 17 2019, 10:11 AM
shawnl added reviewers: hans, spatel, jmolloy.
shawnl edited the summary of this revision. (Show Details)
shawnl planned changes to this revision.Apr 17 2019, 10:17 AM

get the clz transform working

shawnl updated this revision to Diff 196002.Apr 21 2019, 6:40 AM
shawnl updated this revision to Diff 196007.Apr 21 2019, 9:16 AM

include context

Can you please write a human-readable description for the differential, in layman terms, that makes sense?
Rough bullet point list could be:
"Motivation. Issues with current implementations. The possible solutions. The implemented solution. Drawbacks, gotchas of the change.", etc.

This change is pretty big. It would be really best to do this in smaller steps.
Test coverage in particular is bothering me - this only has basic positive tests.
No negative tests. No tests for boundary conditions

lib/Transforms/Utils/SimplifyCFG.cpp
5130

What is this 24, 3*8? Make it variable with a name.

5170

This sounds like a hardcoded value for some specific target.

5295–5298

Why is this being changed? Can this be a separate patch?

5364–5365

Constants are always on RHS anyway

test/Transforms/SimplifyCFG/switch-simplify-range.ll
3

Please don't use -O? (or for that matter, don't passes other than the pass from directory name)
in tests outside of PassOrdering directory.

shawnl updated this revision to Diff 196089.Apr 22 2019, 10:15 AM

add comment that range analysis would be useful for the clz/ctz transform

shawnl updated this revision to Diff 196090.Apr 22 2019, 10:33 AM
shawnl marked 2 inline comments as done.
shawnl edited the summary of this revision. (Show Details)

just the first patch

shawnl updated this revision to Diff 196092.Apr 22 2019, 10:37 AM
shawnl added inline comments.
lib/Transforms/Utils/SimplifyCFG.cpp
5295–5298

ReduceSwitchRange now re-orders values so that they start at zero, so is superior to this simple "sort by signed" trick.

shawnl updated this revision to Diff 196093.Apr 22 2019, 10:53 AM

Add a comment clarifying that what looks like a edge condition is not.

shawnl edited the summary of this revision. (Show Details)Apr 22 2019, 10:57 AM
shawnl edited the summary of this revision. (Show Details)Apr 22 2019, 11:02 AM
shawnl updated this revision to Diff 196099.Apr 22 2019, 11:08 AM

add another comment about avoiding an edge condition.

shawnl updated this revision to Diff 196100.Apr 22 2019, 11:09 AM
shawnl updated this revision to Diff 196102.
shawnl updated this revision to Diff 196111.Apr 22 2019, 12:04 PM

no need to change this

shawnl updated this revision to Diff 196119.Apr 22 2019, 12:23 PM
shawnl edited the summary of this revision. (Show Details)
shawnl planned changes to this revision.Apr 22 2019, 1:51 PM
shawnl updated this revision to Diff 196133.Apr 22 2019, 2:16 PM
shawnl edited the summary of this revision. (Show Details)

Remove some useless code now that we actually use ReduceSwitchRange(). However I did not bother to update NeedMask, because the whole plan is to implement something much more general than it.

lebedev.ri set the repository for this revision to rL LLVM.
lebedev.ri removed a subscriber: lebedev.ri.
shawnl updated this revision to Diff 196630.Apr 25 2019, 7:18 AM

remove use of -O2 in tests

jmolloy requested changes to this revision.Apr 25 2019, 8:38 AM

Hi Shawn,

Thanks for being patient waiting for me to review this. You mention that there are no tests for this behaviour; that's very strange as I remember committing this change with tests . In fact, it was test/Transforms/SimplifyCFG/rangereduce.ll (https://github.com/llvm/llvm-project/commit/b2e436de423aac1de4d764166317dd7ab9cbf751). I'm surprised this test didn't need updating.

Please reupload with full context.

My general comment on this patch is that it's too busy with too few comments to give a good review on. For changes like this I'd really appreciate splitting it into functionally distinct parts. For example here we have:

  1. SwitchLookupTable no longer takes an offset and assumes things start from zero and can use an unsigned comparison. There is clearly some change in ReduceSwitchRange that makes this true but I can't unpick the relevant hunk from the rest of this patch. Note, I'm not yet convinced this is correct but the review is too cluttered for me to really understand the logic change.
  2. The removal of GCD and use of clz (on APInt) for GCD calculation.
  3. The use of clz/cttz
  4. Change shift/shift/or to fshr, which I'm certain is going to screw up some backend.

These should really be in separate patches.

Cheers,

James

lib/Transforms/Utils/SimplifyCFG.cpp
5295

I'm having trouble groking at a glance why this is correct. Could you comment? Previously we had a slt/sgt pair, now just a ugt?

Note that if you'd have uploaded with full context I could go look closer to understand more ;)

5389

This can't be committed as-is. If you can't modify the existing code to work in the interim between your two patches, the other patch should be merged into this one.

Note this will make it more of a pain to review, so I'd suggest trying hard to modify the existing code :)

5466

I'm having trouble working out how this heuristic relates to the existing code. Have you changed the heuristic as well as the generated code?

Note, a NFC change is much easier to review and get landed.

5482

You're describing a tricky gotcha here. Please be more specific and go into more detail as to why it's there in the first place.

5486

This whole calculation should be outlined into a helper function. Then for example you wouldn't need to nest the whole thing inside if (BitWidth > 8).

5495

use std::min/std::max here?

5500

Please change the phrasing. Instead of describing what we might do *without* the check, please describe what the goal of the check is (negative vs. positive sense makes it easier to grok what the condition is and whether it matches the descriptive comment).

5501

might be worse

... but why?

5502

Why > 64?

5504

This whole comment has me worried. Target-specific guessing games don't belong in SimplifyCFG :)

A heuristic that isn't obviously target-agnostic should get a TTI hook. They're cheap, go for it! :)

5518

You're not only mutating but rearranging Values while iterating? this is an antipattern. Please remove.

5584

Why not use Builder.CreateCall here?

5594

How well supported is codegen for fshr? shift/shift/or for rotate will go through pretty much any backend; I've been out of the loop for a little while - how new is fshr?

This revision now requires changes to proceed.Apr 25 2019, 8:38 AM
shawnl abandoned this revision.Apr 25 2019, 4:26 PM

Split into multiple patches, starting at D61150