This is an archive of the discontinued LLVM Phabricator instance.

Global DCE performance improvement
ClosedPublic

Authored by serge-sans-paille on Jan 11 2017, 2:14 AM.

Details

Summary

Change the original algorithm so that it scales better when meeting very large bitcode where every instruction does not implies a global.

Diff Detail

Repository
rL LLVM

Event Timeline

serge-sans-paille retitled this revision from to Global DCE performance improvement.
serge-sans-paille updated this object.
serge-sans-paille added a reviewer: mehdi_amini.
serge-sans-paille set the repository for this revision to rL LLVM.
serge-sans-paille added a subscriber: llvm-commits.
mehdi_amini edited edge metadata.Jan 11 2017, 8:26 AM

Can you add a description to the revision summarizing what you're doing before I try to infer it from the diff?
Thanks!

serge-sans-paille edited edge metadata.
davide requested changes to this revision.Jan 11 2017, 11:03 AM
davide added a reviewer: davide.
davide added a subscriber: davide.

Do you have numbers showing up the improvement?
What's the effect of memoization on memory usage?

This revision now requires changes to proceed.Jan 11 2017, 11:03 AM
mehdi_amini added inline comments.Jan 11 2017, 11:53 AM
include/llvm/Transforms/IPO/GlobalDCE.h
35 ↗(On Diff #83934)

Why did you rename this? Some part of this diff could disappear without this change.

36 ↗(On Diff #83934)

DenseMap is intended to contains small value, SmallPtrSet seems a bit fat here?

Also, after reading the algorithm, this could be a vector, you should not need dedup by construction.

38 ↗(On Diff #83934)

We don't suffix member names.
Also it'd be nice to document these.

lib/Transforms/IPO/GlobalDCE.cpp
101 ↗(On Diff #83934)

The recursion is bound by the fact that we only recurse on constant and stop at the first GV reached. I'd rather have this as a comment in the code.

103 ↗(On Diff #83934)

So the memorization is only used for constants (GV initializers, GEP, what else?), I'd rename it to make it explicit that this cache only applies to constants.

Also can such constant have multiple uses? I'm failing to remember in which case right now. If they only have a single use we could just traverse and remove the memorization entirely.

113 ↗(On Diff #83934)

Why not passing GV to ComputeDependencies instead of Deps and populate directly GVDependencies_ ?

114 ↗(On Diff #83934)

No braces

118 ↗(On Diff #83934)

GlobalValue & GV does not seems formatted, I'd suggest running git clang-format on this patch.

120 ↗(On Diff #83934)

Early return.

121 ↗(On Diff #83934)

What does this do? Isn't it redundant with the one two lines above?

126 ↗(On Diff #83934)

What's the bound on the recursion? How do we guarantee that we won't run out of stack? Worth a comment.

187 ↗(On Diff #83934)

pop_back_val()

190 ↗(On Diff #83934)

No brace

serge-sans-paille marked 9 inline comments as done.Jan 12 2017, 3:07 AM

Do you have numbers showing up the improvement?

The complexity of the original algorithm is roughly in O(|live function instruction|), and the complexity of this algorithm is roughly in O(|globals users|).

So if there's a big global function, previous algorithm is more costly (and that's the original motivating example).
If there's a big dead function, this algorithm is more costly.

On average test cases, it does not make much difference, It's only noticeable on very large functions (> 1000000 instructions) which happens to be my use case.

What's the effect of memoization on memory usage?

memoization of constants was already implemented in the previous algorithm, so not much difference here.
This algorithm contains an additional map that holds the global dependencies, so it should have a bigger memory consumption.

include/llvm/Transforms/IPO/GlobalDCE.h
36 ↗(On Diff #83934)

According to my tests, DenseMap performs better on large test case than std::unorderd_map (by a factor of two)

Correct for Set->Vector

lib/Transforms/IPO/GlobalDCE.cpp
103 ↗(On Diff #83934)

See r179458

113 ↗(On Diff #83934)

I think it's clearer like this, as it separates the dependencies build from the GVDependencies update.

126 ↗(On Diff #83934)

There's a finite number of globals and they can only be visited once because of the set check.

serge-sans-paille edited edge metadata.
serge-sans-paille marked 3 inline comments as done.
mehdi_amini added inline comments.Jan 12 2017, 8:42 AM
include/llvm/Transforms/IPO/GlobalDCE.h
36 ↗(On Diff #83934)

I'm not sure what you mean by "large test case"? Do you mean lots of entries? My point was about sizeof(Map[key]) and memory consumption.

lib/Transforms/IPO/GlobalDCE.cpp
103 ↗(On Diff #83934)

Thanks for the pointer!

126 ↗(On Diff #83934)

I'm not worried about *infinite* recursion, but *too large* recursion for the limit stack size in some environment. If the recursion is in O(global) that's a potential stack overflow.

Can you please add a comment in the code explaining how the new algorithm works? I'd like to see figures on a case where this actually matters (e.g. take the bitcode, run through opt -globaldce -time-passes with the old/new algorithm and see what's the performance difference)

lib/Transforms/IPO/GlobalDCE.cpp
173 ↗(On Diff #84096)

Capital letter, ending with a dot (here and everywhere else).

185 ↗(On Diff #84096)

This whole patch needs to be clang-formatted, this should look like
while (blah).

234 ↗(On Diff #84096)

stray newline.

include/llvm/Transforms/IPO/GlobalDCE.h
36 ↗(On Diff #83934)

Yeah I meant lots of entries. I've done the tests with an std::vector instead and it does not impact the execution time, and it's indeeded better on the memory usage, switching to this.

lib/Transforms/IPO/GlobalDCE.cpp
126 ↗(On Diff #83934)

Previous algorithm worst case recursion depth was in `O(|global|)` and that happened, for instance, when a global function references a static function that references a static function that ... etc.

I'm not 100% sure but this function has a worst case recursion depth of two, has it only propagates the liveness to the whole comdat section. If you agree with my analyse, I'll update the comment.

serge-sans-paille edited edge metadata.

I've updated the diff, formated with git clang-format as suggested. However there's no -U option to that command so the diff does not appear with the whole contxt, and sipmly running clang-format on the files produces too much noise.

I've updated the diff, formated with git clang-format as suggested. However there's no -U option to that command so the diff does not appear with the whole contxt, and sipmly running clang-format on the files produces too much noise.

FYI, My workflow is:

  • git clang-format HEAD~ (will format only the last commit).
  • git diff (just to check the formatting changes)
  • git commit -a --amend
  • generate the patch for phab, I'm using arc diff HEAD~ which handles everything, otherwise manually with git diff -U9999 HEAD~ you get the full context patched to be updated.

I wonder if your diff is correct, it seems that you updated the diff compare to the previous diff state (like if you didn't squash commits).

include/llvm/Transforms/IPO/GlobalDCE.h
42 ↗(On Diff #84331)

Put the comments on the line before the variables, and use /// (doxygen-like prefix).

lib/Transforms/IPO/GlobalDCE.cpp
126 ↗(On Diff #83934)

Yes I guess a GV can belong only to a single comdat so it should be fine!

Patch correctly formatted, thanks @mehdi_amini for sharing your workflow.
Also updated the comments.

davide added inline comments.Jan 17 2017, 12:39 AM
include/llvm/Transforms/IPO/GlobalDCE.h
22 ↗(On Diff #84641)

Are you still using SmallSet or is this a dead include?

lib/Transforms/IPO/GlobalDCE.cpp
132 ↗(On Diff #84641)

stray newline?

187 ↗(On Diff #84641)

Please use an explicit type here

188 ↗(On Diff #84641)

and here, and I guess everywhere else the type is not really obvious (for consistency with what's already in the pass).

189 ↗(On Diff #84641)

Why did you remove this comment?

serge-sans-paille marked an inline comment as done.
serge-sans-paille added inline comments.
include/llvm/Transforms/IPO/GlobalDCE.h
22 ↗(On Diff #84641)

It's still used for `ConstantDependeciesCache`.

serge-sans-paille marked an inline comment as done.Jan 17 2017, 1:58 AM

This looks much better now. Can you please add a comment in the code explaining the algorithm (if I missed it, please point out) while I take a last look? Thanks.

majnemer added inline comments.
include/llvm/Transforms/IPO/GlobalDCE.h
41 ↗(On Diff #84642)

DenseMap?

mehdi_amini added inline comments.Jan 17 2017, 10:55 AM
include/llvm/Transforms/IPO/GlobalDCE.h
41 ↗(On Diff #84642)

DenseMap with a SmallPtrSet?

44 ↗(On Diff #84642)

Doxygen like-prefix.

lib/Transforms/IPO/GlobalDCE.cpp
81 ↗(On Diff #84642)

End sentence with a dot.

82 ↗(On Diff #84642)

Doxygen-like prefix.

This looks much better now. Can you please add a comment in the code explaining the algorithm (if I missed it, please point out) while I take a last look? Thanks.

To be fair: the algorithm isn't changed, it is just the "uses" graph edges that are reversed as a pre-step.
I.e. the question is "how to you get all the globals referenced by another global"?

Before this patch, it was doing this by walking the body (or the initializer) and collecting the references. What this patch is doing, it precomputing the answer to this query for the whole module by walking the use-list of every global instead.
I believe it is the only change.

This looks much better now. Can you please add a comment in the code explaining the algorithm (if I missed it, please point out) while I take a last look? Thanks.

To be fair: the algorithm isn't changed, it is just the "uses" graph edges that are reversed as a pre-step.
I.e. the question is "how to you get all the globals referenced by another global"?

Before this patch, it was doing this by walking the body (or the initializer) and collecting the references. What this patch is doing, it precomputing the answer to this query for the whole module by walking the use-list of every global instead.
I believe it is the only change.

That should go in a comment, this is my point.

serge-sans-paille marked 2 inline comments as done.
serge-sans-paille added inline comments.
include/llvm/Transforms/IPO/GlobalDCE.h
41 ↗(On Diff #84642)

I've kept the original Moving to a std::unordered_multimap for GVDEpendencies, not sure about what you expect for the ConstantDependenciesCache. What's the pro/con of using a DenseMap?

mehdi_amini added inline comments.Jan 18 2017, 2:40 AM
include/llvm/Transforms/IPO/GlobalDCE.h
41 ↗(On Diff #84642)

DenseMap is an open-addressing hashmap where the bucket are allocated with the key, hence "dense". This is very performance but for the memory consumption: empty buckets are taking the same amount of space as occupied ones.
So we usually avoid DenseMap containing "large" elements.

include/llvm/Transforms/IPO/GlobalDCE.h
41 ↗(On Diff #84642)

ok, thanks for the extra info. Everything looks much nicer now.

Hi all,
the review seems ok now, do you have any other suggestion/advices/flames?

As a side note : I don't have the commit right in llvm trunk, so if one of you could take care of that step?

mehdi_amini accepted this revision.Jan 25 2017, 11:13 AM

Davide, any other comment?

davide accepted this revision.Jan 25 2017, 11:37 AM

LGTM. Serge, one of us can commit that for you once llvm.org decides to go back online.

This revision is now accepted and ready to land.Jan 25 2017, 11:37 AM

I can't apply on trunk, mind rebasing?

This revision was automatically updated to reflect the committed changes.