Page MenuHomePhabricator

[InlineCost] Enable the new switch cost heuristic
ClosedPublic

Authored by junbuml on Apr 28 2017, 12:07 PM.

Details

Summary

This is to enable the new switch inline cost heuristic (r301649) by removing the
old heuristic as well as the flag itself.
In my experiment for LLVM test suite and spec2000/2006, +17.82% performance and
8% code size reduce was observed in spec2000/vertex with O3 LTO in AArch64.
No significant code size / performance regression was found in O3/O2/Os. No
significant complain was reported from the llvm-dev thread.

Diff Detail

Repository
rL LLVM

Event Timeline

junbuml created this revision.Apr 28 2017, 12:07 PM
haicheng added inline comments.Apr 28 2017, 12:10 PM
lib/Analysis/InlineCost.cpp
1067–1084 ↗(On Diff #97137)

Do we still need this part after enabling your flag?

junbuml added inline comments.Apr 28 2017, 12:14 PM
lib/Analysis/InlineCost.cpp
1067–1084 ↗(On Diff #97137)

I don't think this is needed if we enable it. However, I keep this in this patch to let other people compare with/without this heuristic.

hans edited edge metadata.Apr 28 2017, 12:55 PM

Having a flag is good for people that want to experiment with turning it on. It's only been a day or so with the heuristic available, maybe we should allow a bit more for those who want to experiment before turning it on by default?

Since this patch enables the new heuristic by default, I don't think having the flag and old code-path makes sense anymore.

I will keep the flag for a while in this patch to allow people to turn it on/off, but the flag and old code will be removed when committing it if it's generally acceptable.

Is there anything going on here? Should we do this?

I'm planning on enabling this heuristic by default if no significant regression was reported. Let me ping one more time to llvm.dev. Can any reviewer here please take a chance to test this? Or let other people take a chance to test it?

  • Through llvm.dev, Dibyendu reported that there were no clear positive or negative impact on his environment (x86).
  • I also didn't see any obvious regression in my experiment in aarch64 for llvm-test suite, spec2000 and spec2006 with O2, Os, and O3, but significant performance and size improvement in spec2000/vortex with lto; please see the detail in https://reviews.llvm.org/D31085.
  • We also expect to hear from Evgeny Astigeevich; I guess several different configurations in ARM.

Please let me know if there is any further test/experiment we need to do to enable this heuristic.

FWIW I think the flag "inline-generic-switch-cost" is both misleading (it takes true and not a number) and not helpful (the name of the flag doesn't tell me anything). Changing that would be nice. I'll let others determine whether or not we should turn it on now or not, the patch itself should be fine though.

FWIW I think the flag "inline-generic-switch-cost" is both misleading (it takes true and not a number) and not helpful (the name of the flag doesn't tell me anything). Changing that would be nice. I'll let others determine whether or not we should turn it on now or not, the patch itself should be fine though.

Initially, the flag (-inline-generic-switch-cost) was intended to allow others to turn it on/off temporarily. I will remove both the flag and the old heuristic code when we enable this heuristic by default.

Kindly ping. Please let us know if any of you have any experiment result.

Hans/Chandler,
Is there any experiment result you expect to see to enable this new heuristic ?

hans added a comment.Jun 2 2017, 9:21 AM

Kindly ping. Please let us know if any of you have any experiment result.

Hans/Chandler,
Is there any experiment result you expect to see to enable this new heuristic ?

On May 9, you wrote:

I'm planning on enabling this heuristic by default if no significant regression was reported. Let me ping one more time to llvm.dev. Can any reviewer here please take a chance to test this? Or let other people take a chance to test it?

As there were no complaints on the llvm-dev thread, from my point of view, I think you should go ahead and enable this. If it causes regressions for anyone, they will let you know and we can potentially revert it like any other change.

junbuml updated this revision to Diff 101233.Jun 2 2017, 10:04 AM

Thanks Hans for the comments. Now, I removed the old heuristic and the flag in this change.

junbuml edited the summary of this revision. (Show Details)Jun 2 2017, 10:05 AM
junbuml edited the summary of this revision. (Show Details)
echristo accepted this revision.Jun 2 2017, 10:13 AM
This revision is now accepted and ready to land.Jun 2 2017, 10:13 AM
mcrosier edited edge metadata.Jun 2 2017, 11:20 AM

! In D32653#771489, @hans wrote:
As there were no complaints on the llvm-dev thread, from my point of view, I think you should go ahead and enable this. If it causes regressions for anyone, they will let you know and we can potentially revert it like any other change.

+1

This revision was automatically updated to reflect the committed changes.

Hello. Sorry to bring up ancient history.

llvm/trunk/lib/Analysis/InlineCost.cpp
1072–1074

Should these be uint64's? If Cost is negative (perhaps because of LastCallToStaticBonus), it can take the INT_MAX option.

junbuml added inline comments.Jun 15 2017, 8:44 AM
llvm/trunk/lib/Analysis/InlineCost.cpp
1072–1074

Yes, looks like you are correct. If Cost is some big negative and Cost+SwitchCost is still negative, Cost could be INT_MAX, which is wrong. We should use int64_t instead of uint64_t :

int64_t ExpectedNumberOfCompare = 3 * (int64_t)NumCaseCluster / 2 - 1;
int64_t SwitchCost =
    ExpectedNumberOfCompare * 2 * InlineConstants::InstrCost;
Cost = std::min((int64_t)INT_MAX, SwitchCost + Cost);

Do you by chance have any case which incorrectly take INT_MAX ?

The case I was looking at was a bit of an LTO mess I'm afraid. And from what I can tell, it tends not to inline any less, it just does it in a different order. Some code like the following will print out a bad cost, and choose not to inline test1 into test2, but then does inline it into test3 as it inlines test2.

static int test1(int i)
{
    return i;
}

static int test2(int t)
{
    int s = test1(t);
    switch(s) {
        case 1: return 0;
        case 0: return 0;
        case 42: return 2;
        case 43: return 3;
        default: return 1;
    }
}

int test3(int t)
{
    return test2(t);
}

Debug:

Inliner visiting SCC: test2: 1 call sites.
...
    NOT Inlining:   %call = call fastcc i32 @test1(i32 %t) Cost = -15035, outer Cost = -2147483644
...
Inliner visiting SCC: test3: 1 call sites.
...
    Inlining: cost=-2147483644, thres=250, Call:   %call = call fastcc i32 @test2(i32 %t)
    -> Deleting dead function: test2
...
    Inlining: cost=-15035, thres=375, Call:   %call.i = call fastcc i32 @test1(i32 %t) #2
    -> Deleting dead function: test1
...