This is an archive of the discontinued LLVM Phabricator instance.

IR: Eliminate non-determinism in the module summary analysis.
ClosedPublic

Authored by pcc on Dec 16 2016, 8:18 PM.

Details

Summary

Also make the summary ref and call graph vectors immutable. This means
a smaller API surface and fewer places to audit for non-determinism.

Diff Detail

Repository
rL LLVM

Event Timeline

pcc updated this revision to Diff 81833.Dec 16 2016, 8:18 PM
pcc retitled this revision from to IR: Eliminate non-determinism in the module summary analysis..
pcc updated this object.
pcc added reviewers: tejohnson, mehdi_amini.
pcc added a subscriber: llvm-commits.
mehdi_amini added inline comments.Dec 19 2016, 7:27 PM
llvm/include/llvm/IR/ModuleSummaryIndex.h
45 ↗(On Diff #81833)

The comment is outdated: IndirectCalls in per-module index are GUID-based.

175 ↗(On Diff #81833)

I didn't know about OwningArrayRef, interesting.
What is the advantage over a const std::vector ?

180 ↗(On Diff #81833)

It seems to me that this is causing memory allocation and copy that could be avoided: because Refs is an ArrayRef we're allocating and copying into RefEdgeList. However looking at call site, if Refs comes from makeRefList we just created a vector there. If it comes from findRefEdges we just created a SetVector.

In both cases it looks like we could take and store a std::vector that would be moved in.

llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
92 ↗(On Diff #81833)

Fusing these looks much more sane!

llvm/lib/Bitcode/Reader/BitcodeReader.cpp
4798 ↗(On Diff #81833)

Ret.reserve( Record.size());

4806 ↗(On Diff #81833)

Ret.reserve( Record.size());

tejohnson edited edge metadata.Dec 19 2016, 7:30 PM

This looks like a good cleanup. One minor suggestion below. I see Mehdi just responded with a couple of suggestions, so will hold off on accepting for now.

llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
138 ↗(On Diff #81833)

CalleeId doesn't seem like the best name anymore. Just "Callee"?

pcc updated this revision to Diff 82061.Dec 19 2016, 8:10 PM
pcc marked 4 inline comments as done.
pcc edited edge metadata.
  • Address review comments
llvm/include/llvm/IR/ModuleSummaryIndex.h
175 ↗(On Diff #81833)

I guess it saves us a pointer as well as the additional space reserved by the std::vector, since the size is fixed. I haven't measured how important this is in practice, though.

180 ↗(On Diff #81833)

I guess there's one way that saves copies and another that saves memory. In the absence of evidence pointing one way or the other, I'd be inclined to go with the slightly simpler approach, i.e. the one I took.

llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
138 ↗(On Diff #81833)

I folded it into the only use.

mehdi_amini added inline comments.Dec 20 2016, 11:46 AM
llvm/include/llvm/IR/ModuleSummaryIndex.h
175 ↗(On Diff #81833)

Oh makes sense! There is no "capacity" so you save a pointer (2 instead of 3).

180 ↗(On Diff #81833)

It is not clear to me why you think that "not simply using std::vector everywhere" is simpler though?

This means hitting the heap a lot without good reason (that I perceive right now), which we traditionally avoid as much as possible I believe (reusing SmallVector across records in the BitcodeReader for example). Because of that it does not seem "one way or another to me": adding this extra malloc/free/copy sequence for saving a pointer should be motivated.

(And I just saved 8B for this structure in D27970, if memory was a concern we would have started with this)

pcc added inline comments.Dec 20 2016, 12:22 PM
llvm/include/llvm/IR/ModuleSummaryIndex.h
180 ↗(On Diff #81833)

It is not clear to me why you think that "not simply using std::vector everywhere" is simpler though?

To avoid the copy we'd need to add a way to move the std::vector out of MapVector/SetVector. Not a big deal but it would make the interface for those classes a little more complicated.

This means hitting the heap a lot without good reason (that I perceive right now), which we traditionally avoid as much as possible I believe (reusing SmallVector across records in the BitcodeReader for example). Because of that it does not seem "one way or another to me": adding this extra malloc/free/copy sequence for saving a pointer should be motivated.

I'd say that copying the vector is the "default position" as it wouldn't require more API surface in MapVector/SetVector. It's also the status quo as we were effectively copying the data structure in using add*Edges before.

And while we're copying the vector we might as well use a slightly better data structure here.

But I can see how using OwningArrayRef here could be seen as taking a position on whether the memory savings are worth it, which I don't (see also other thread), so I guess I don't have a problem with moving the vector.

pcc updated this revision to Diff 82132.Dec 20 2016, 12:42 PM
  • Micro optimize our memory allocations
mehdi_amini accepted this revision.Dec 20 2016, 12:57 PM
mehdi_amini edited edge metadata.

LGTM. Thanks.

llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
176 ↗(On Diff #82132)

takeVector()?

This revision is now accepted and ready to land.Dec 20 2016, 12:57 PM
This revision was automatically updated to reflect the committed changes.