Page MenuHomePhabricator

[IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
ClosedPublic

Authored by fhahn on Apr 5 2018, 10:57 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

fhahn created this revision.Apr 5 2018, 10:57 AM

This is interesting.

One of the weaknesses of the SCCP algorithm is that it's not path-sensitive; a value is only "constant" if it has the same value on all possible paths. The use of ssa_copy provides a way around this restriction, to some extent: the result of an ssa_copy can be a constant even if the operand is overdefined. That said, this is adding substantial complexity to IPSCCP, and I'm not sure how much benefit you'll get in practice; the example @test2 isn't really compelling.

Building the PredicateInfo up-front is probably fine; I would guess it's not that expensive compared to the other work IPSCCP is doing (but you should measure to quantify that).

We should be able to extend it to also propagate additional facts about floats

We need to be very cautious about propagating facts about floats... things which are "obviously" true might not actually be true due to fast-math.

dmgreen added a subscriber: dmgreen.Apr 6 2018, 4:20 AM
fhahn added a comment.Apr 9 2018, 1:47 PM

Thanks for having a look!

This is interesting.

One of the weaknesses of the SCCP algorithm is that it's not path-sensitive; a value is only "constant" if it has the same value on all possible paths. The use of ssa_copy provides a way around this restriction, to some extent: the result of an ssa_copy can be a constant even if the operand is overdefined. That said, this is adding substantial complexity to IPSCCP, and I'm not sure how much benefit you'll get in practice; the example @test2 isn't really compelling.

The test cases at the moment are just very simple to briefly illustrate what is going on. One case in particular I think this could be helpful with is propagating null/nonnull to callsites. I will make the patch slightly more powerful and try to gather some numbers in the next few days.

So, predicateinfo is one of the fastest passes (or was) that we had. I'm not horribly worried about speed (i could make it faster, too).
It was faster than anything else we do that isn't lazy (IE it's faster than computing dominators, etc).
It wasn't even really measurable except on hundreds of thousands or millions of blocks.

I did have plans to try to turn it back into a virtual form at some point (which would make it faster), i think i found an not horrible way of doing it.

The one overall comment i'd have here is that SCCP is fairly limited in how it can use this, because of how limited the lattice is.
IPVRP would be a better bet, IMHO.

That said,

fhahn added a comment.Apr 9 2018, 3:00 PM

So, predicateinfo is one of the fastest passes (or was) that we had. I'm not horribly worried about speed (i could make it faster, too).
It was faster than anything else we do that isn't lazy (IE it's faster than computing dominators, etc).
It wasn't even really measurable except on hundreds of thousands or millions of blocks.

I did have plans to try to turn it back into a virtual form at some point (which would make it faster), i think i found an not horrible way of doing it.

The one overall comment i'd have here is that SCCP is fairly limited in how it can use this, because of how limited the lattice is.
IPVRP would be a better bet, IMHO.

Yep. One of the slightly longer-term things I want to do is either make IPSCCP use a lattice including (integer) ranges or have a separate pass doing value range propagation. Do you think having it as a separate pass would be beneficial?

That said,

Looks like the comment got cut off here?

whoops, hit send early.
That said, if you find this actually generates gains (papers on it basically say it should be a few percent improvement in found constants), i don't have any intrinsic reason not to do it just make sure we get the cost vs gain tradeoff right.

fhahn updated this revision to Diff 142193.Apr 12 2018, 10:36 AM

Fix some problems, SingleSource, MultiSource, SPEC2006 now pass with this patch and LTO. Still needs some cleanup and a few more tests.

This is interesting.

One of the weaknesses of the SCCP algorithm is that it's not path-sensitive; a value is only "constant" if it has the same value on all possible paths. The use of ssa_copy provides a way around this restriction, to some extent: the result of an ssa_copy can be a constant even if the operand is overdefined. That said, this is adding substantial complexity to IPSCCP, and I'm not sure how much benefit you'll get in practice; the example @test2 isn't really compelling.

I collected some stats for SPEC2006, MultiSource and SingleSource with -O3 -flto.

For sccp.IPNumInstRemoved I get 88490 with this patch instead of 81585 without it (~ +8% eliminated instructions).
For sccp.IPNumArgsElimed, I get 5084 with this patch instead of 5081 without it (hardly any change).

I think number of instructions eliminated looks quite promising. What do you think?

I think number of instructions eliminated looks quite promising

That's much better than I expected. Although, I'm not sure how much I would trust that number; IPSCCP runs before CorrelatedValuePropagation, so you might be optimizing sequences which would be optimized later anyway.

fhahn added a comment.Apr 12 2018, 2:59 PM

I think number of instructions eliminated looks quite promising

That's much better than I expected. Although, I'm not sure how much I would trust that number; IPSCCP runs before CorrelatedValuePropagation, so you might be optimizing sequences which would be optimized later anyway.

I also aggregated all stats by CorrelatedValuePropagation for the same set of benchmarks: With this patch, CVP triggered in 23754 cases, without this patch it triggered in 24904 cases. I guess this suggests some of the cases eliminated with this patch would also be handled by CVP. But the majority would not be.

One additional thing we could do by using PredicateInfo is propagate non-null constraints to call sites, which might help the inliner.

fhahn planned changes to this revision.Apr 25 2018, 1:40 AM

I still need to look into a compilation issue with CINT2006/403.gcc.

fhahn updated this revision to Diff 145176.May 4 2018, 5:30 AM
fhahn retitled this revision from [WIP][IPSCCP] Use PredicateInfo to propagate facts from cmp instructions. to [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions..
fhahn edited the summary of this revision. (Show Details)

Rebased and fixed a few problems. This now passes the test-suite, SPEC2000 & SPEC2006 with LTO. I've now also done a benchmarking run with the suites mentioned earlier on a cortex-a57 with LTO. Overall the geomean runtime improvement is 0.65 percent. The biggest improvements were

         SingleSource/Benchmarks/BenchmarkGame/recursive    -13.16%
	SingleSource/Benchmarks/Stanford/RealMM	         -9.89%
	MultiSource/Benchmarks/McCat/08-main/main      -7.90%
	MultiSource/Benchmarks/MiBench/telecomm-CRC32/telecomm-CRC32 -6.64%
	MultiSource/Benchmarks/Prolangs-C++/ocean/ocean        -6.26%
	MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk       -6.13%
	MultiSource/Benchmarks/Olden/mst/mst                            -6.08%
	MultiSource/Benchmarks/mediabench/adpcm/rawcaudio/rawcaudio      -5.21%
	SingleSource/Benchmarks/Shootout-C++/Shootout-C++-objinst             -5.19%
	SingleSource/Benchmarks/Polybench/stencils/jacobi-1d-imper/jacobi-1d-imper     -4.48%
	MultiSource/Benchmarks/Prolangs-C++/simul/simul               -4.26%
	SingleSource/Benchmarks/Shootout/Shootout-objinst            -3.72%

There are a few regression ranging from 0 to up 4% worse execution time

The sum of all SCCP stats for all benchmarks is 401732 over 374020 without this patch (+7%).

fhahn added a comment.May 14 2018, 9:23 AM

Ping. Do you think that the results for this patch are convincing enough on it's own?

dberlin accepted this revision.May 14 2018, 1:08 PM

I think the numbers are good enough to be worth it perf wise.
Did you post compile time numbers?

If they look good, i'd say let's commit it.
(if not, please take a look at them before committing)

This revision is now accepted and ready to land.May 14 2018, 1:08 PM
fhahn added a comment.May 15 2018, 9:47 AM

I think the numbers are good enough to be worth it perf wise.
Did you post compile time numbers?

Thanks for having a look!

I did not post them, but for the tests I run (test-suite, spec2000, spec2006) on AArch64, there were no compile time regressions. One change is that we now compute the dominator trees before IPSCCP. I suppose we should try and preserve them in IPSCCP. I'll try to put a patch together before I submit this change.

davide accepted this revision.May 21 2018, 2:06 PM

Sorry Florian, I completely missed this one. LGTM as well.

fhahn updated this revision to Diff 148047.May 22 2018, 10:30 AM

Thanks for having a look. Updated diff to include changes to pipeline tests. I plan to commit this tomorrow, unless there are any more concerns.

fhahn added inline comments.May 23 2018, 7:30 AM
test/Other/opt-O2-pipeline.ll
280 ↗(On Diff #148047)

Arg, the order, in which the additional function pass managers for the analysis required by IPSCCP and GlobalOpt are added, is not deterministic.

A quick look at the legacy pass manager suggests that it uses a bunch of maps and sets to register discovered new passes. So those legacy pass manager tests sometimes fail on some systems.... I am not sure how to best sort out that out. Any ideas?

efriedma added inline comments.May 23 2018, 10:59 AM
test/Other/opt-O2-pipeline.ll
280 ↗(On Diff #148047)

This is specifically the AvailableAnalysis map? You can switch to http://llvm.org/docs/ProgrammersManual.html#llvm-adt-mapvector-h, I guess...

fhahn added inline comments.
test/Other/opt-O2-pipeline.ll
280 ↗(On Diff #148047)

Thank you very much, it's enough to use MapVector for OnTheFlyManagers : D47317

This revision was automatically updated to reflect the committed changes.
fhahn added a comment.May 29 2018, 6:02 AM

I had to revert this patch in rL333323 because it caused a mismatch in the stage3 and stage4 binaries built by the clang-with-thin-lto-ubuntu bot. I will investigate.

So it looks like the codegen differences come from PredicateInfo, rather than from the propagation. With only the parts of this patch to build and remove PredicateInfo, I get a small difference in the strtab section between a stage3 and stage4 build of clang with thin LTO, there's no difference in the .text section.

I also had a look at the IR after IPSCCP for both stages, the only differences are some variable names/labels. I am not entirely sure yet, but I think that might be caused by small differences in the order the ssa_copy intrinsics for the predicates are emitted. Any ideas or pointers would be greatly appreciated :)

My first guess would be some sort of non-determinism.

If you're trying to determine whether to IR modules in memory are identical, make sure to include the use-list order in the dump.

fhahn added a comment.Jun 19 2018, 7:11 AM

Thanks for you suggestions. With D48230, there is no difference in the stage3 and 4 clang binaries with thinlto for me locally.

fhahn added a comment.Jun 21 2018, 8:24 AM

This is now committed as rL335206 and the LTO/Thin LTO bots are happy now. I will follow this up with a change to propagate non-null info.

fhahn added a subscriber: thegameg.Jun 25 2018, 4:04 AM

@thegameg reported another problem with PredicateInfo from a internal test related to mangling. I've put up D48541 to address it and will recommit this again once this is resolved.

Ping, still some issues? Since it was reverted again recently :(

I am especially interested in the follow up - nonnull info propagation.

llvm/trunk/test/Other/opt-Os-pipeline.ll
31

Set name?

fhahn added a comment.Aug 6 2018, 8:08 AM

Ping, still some issues? Since it was reverted again recently :(

Yep, there was one bot left that was failing because of this. I still have to reproduce the problem. Hope to have some time for that soon.

Ping, still some issues? Since it was reverted again recently :(

Yep, there was one bot left that was failing because of this. I still have to reproduce the problem. Hope to have some time for that soon.

Thanks for info, Florian.

fhahn added a comment.Aug 29 2018, 8:53 AM

I finally had time to track down the last buildbot failure and it is committed now.