This is an archive of the discontinued LLVM Phabricator instance.

[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE
ClosedPublic

Authored by bmakam on Nov 3 2014, 3:25 PM.

Details

Summary

All,

This patch addresses the missing PRE opportunities initially reported by James Molloy in 450.soplex

This patch re-factors James' patch to "Make GVN more iterative" based on the comments/suggestions from Daniel that iterating all of GVN over again is pretty big hammer.
Instead of iterating GVN all over, this patch does a ScalarPRE of any scalar instructions that a load is dependent on, before performing LoadPRE on that load.

When tested on a Cortex-A57, James' initial patch to make GVN more iterative improved 450.soplex by 3%. This patch improved 450.soplex by 7% without iterating GVN all over again.
In order to achieve this I had to enable the reverse post order traversal for iterateOnFunction because we would have to value number dependent scalar instructions before performing ScalarPRE on them. Although, traversing in reverse post order is costly in terms of compile time but this may be cheaper than iterating GVN all over again and also results in better performance. What do you guys think?

Diff Detail

Event Timeline

bmakam updated this revision to Diff 15736.Nov 3 2014, 3:25 PM
bmakam retitled this revision from to [GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE.
bmakam updated this object.
bmakam edited the test plan for this revision. (Show Details)
bmakam added a subscriber: Unknown Object (MLST).
hfinkel edited edge metadata.Nov 3 2014, 7:30 PM

Thanks for continuing to work on this!

Generally, please post patches with full context. For instructions on how to do this, see: http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface

I think that a 7% speedup sounds nice, does anything else improve? But please do provide some compile-time slowdown numbers, so that we can get a better handle on the cost/benefit analysis.

lib/Transforms/Scalar/GVN.cpp
2655

If the patch turns this on, please just remove the #if.

bmakam updated this revision to Diff 15763.EditedNov 4 2014, 6:06 AM
bmakam edited edge metadata.

I turned on the slow path and commented out the fast path. If we can decide we no longer need to keep the fast path around I will clean it up.

I am running a perf run to gather compile times and will update the comment once I get back the results.

[Update1]
While I am still waiting for perf data on other benchmarks in Spec2k/2k6, here is the data I got so far:

a) Compilation times:

  1. compiling clang.bc (Thanks to Jiangning for running the tests)

With the patch,

real 19m56.978s
user 141m16.602s
sys 2m59.942s

Without the patch, (original)

real 19m58.099s
user 141m21.219s
sys 2m58.493

which is 2s(0.85%) slower, so the slowdown is in noise range.

  1. On Spec, only 433.milc slowed down the most by 5%. Other slowdowns were in

179.art - 2%
164.gzip - 2%
445.gobmk - 2%

b) Runtime Performance:
Other benchmarks whose performance improved:
447.dealII - 6%
403.gcc - 14%

I will update data on other benchmarks later.

[Update2]
401.bzip2 - 4%
464.h264ref - 4%
186.crafty - 7%

and only regression above noise range was in 181.mcf with at -3%

bmakam added inline comments.Nov 4 2014, 6:09 AM
lib/Transforms/Scalar/GVN.cpp
2655

I will turn on the slow path and turn off the fast path in my next patch that I will upload with full context. I am not sure if we still want to keep the fast path commented out in the code or clean it up.

hfinkel added inline comments.Nov 4 2014, 6:39 AM
lib/Transforms/Scalar/GVN.cpp
2655

If you want to still keep it, add a command-line flag to enable it. We don't generally keep commented-out code, as a policy.

bmakam added a comment.Nov 6 2014, 2:02 PM

All,

I updated my comment with slowdowns in compilation times and other benchmarks that improve with this patch. Slowdowns in compile times are in noise range I do not see any performance regressions greater than 3% in Spec. Based on this data, I would like to get rid of the fast path and will upload a patch removing out the commented-out code if you agree. Thanks for reviewing.

dberlin edited edge metadata.Nov 6 2014, 2:23 PM

Sounds good to me.
I'll review in detail later today

bmakam updated this revision to Diff 15984.Nov 10 2014, 8:13 AM
bmakam edited edge metadata.

Cleaned up dead code.

LGTM modulo one comment

lib/Transforms/Scalar/GVN.cpp
2447–2452

If you are going to add LoadInst, you might as well add all the memory ops (anything where getOpCode() > MemoryOpsBegin && getOpCode < MemoryOpsEnd. If you do this, i'd add isMemoryOp to instruction.h alongside isBinaryOp, etc)

bmakam added inline comments.Nov 13 2014, 1:05 PM
lib/Transforms/Scalar/GVN.cpp
2447–2452

Thanks for catching this Daniel. The LoadInst was unintentional, it is already covered by CurInst->mayReadFromMemory(). I will prepare a patch removing the LoadInst. If LGTM, please feel free to +2 it since I do not have commit rights.

bmakam updated this revision to Diff 16174.Nov 13 2014, 1:10 PM

Addressed Daniel's comment and also rebased.

mcrosier accepted this revision.Nov 13 2014, 1:16 PM
mcrosier edited edge metadata.

Approved based on Daniel's review.

This revision is now accepted and ready to land.Nov 13 2014, 1:16 PM
mcrosier closed this revision.Nov 13 2014, 1:18 PM

Committed r221924.