This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Make cloneUsedGlobalVariables deterministic
ClosedPublic

Authored by MaskRay on Feb 20 2021, 3:28 PM.

Details

Summary

Iterating on SmallPtrSet<GlobalValue *, 8> with more than 8 elements
is not deterministic. Use a SmallVector instead because Used is guaranteed to contain unique elements.

While here, decrease inline element counts from 8 to 4. The number of
llvm.used/llvm.compiler.used elements is usually 0 or 1. For full
LTO/hybrid LTO, the number may be large, so we need to be careful.

According to tejohnson's analysis https://reviews.llvm.org/D97128#2582399 , 4 is
good for a large project with WholeProgramDevirt, when available_externally
vtables are placed in the llvm.compiler.used set.

Diff Detail

Event Timeline

MaskRay created this revision.Feb 20 2021, 3:28 PM
MaskRay requested review of this revision.Feb 20 2021, 3:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 20 2021, 3:28 PM
tejohnson added inline comments.Feb 20 2021, 4:04 PM
llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
206

Since Used is still a SmallPtrSet, won't we still have non-determinism due to this iteration?

MaskRay updated this revision to Diff 325260.Feb 20 2021, 4:48 PM

Add a SmallVectorImpl overload of collectUsedGlobalVariables
Use that

tejohnson added inline comments.Feb 20 2021, 9:37 PM
llvm/lib/IR/Module.cpp
662

Instead of having a second very similar interface, which could get confusing, how about just change it to use a set with a deterministic iterator, like std::set? Or if it is just that one callsite that needs the deterministic ordering, perhaps just copy the SmallPtrSet into a std::set there. Otherwise, the interfaces need to be carefully documented to avoid confusion. I'd lean towards just coping into a std::set in this instance for simplicity. WDYT?

MaskRay added inline comments.Feb 21 2021, 12:14 AM
llvm/lib/IR/Module.cpp
662

We probably should delete the SmallPtrSet overload. I've checked several other call sites - The call sites don't seem to benefit from a SmallPtrSet as well. Some call sites actually have similar non-deterministic issues. (I have confirmed that if I add a std::reverse I can get different bitcode files.)

Using std::set will require a comparator on name. Let's use SmallVectorImpl here and migrate SmallPtrSet separately?

MaskRay added inline comments.Feb 21 2021, 12:18 AM
llvm/lib/IR/Module.cpp
662

Hmm. Some call sites benefit from a SmallPtrSet (count), while some are better off with SmallVectorImpl. It looks to me that both overloads are useful.

tejohnson added inline comments.Feb 22 2021, 9:50 AM
llvm/lib/IR/Module.cpp
662

That's true about needing to ensure std::set sorted by name not pointer value. I saw you sent a follow on patch to fix LowerTypeTests. I think there are other places that would have the same issue (e.g. in the BitcodeWriter, in EmbedBitcodeInModule - and that was just the first one I looked at).

My concern is that with 2 similar interfaces it will be confusing and error prone which one gets used.

How about using a SetVector? From https://llvm.org/docs/ProgrammersManual.html#llvm-adt-setvector-h:
"The difference between SetVector and other sets is that the order of iteration is guaranteed to match the order of insertion into the SetVector. This property is really important for things like sets of pointers."

MaskRay added inline comments.Feb 22 2021, 10:07 AM
llvm/lib/IR/Module.cpp
662

A SetVector is by default a composition of a SmallVector and a SmallDenseSet. Its space consumption is the two data structures combined.

Another idea is collectUsedGlobalVariables(const Module &M, function_ref<void(GlobalValue*)>, bool CompilerUsed): let the call site pass in a callback. In every call site we'll do [&](GlobalValue *X) { Set.insert(X); } or push_back. That would make every call site use a bit more code.

How about adding collectUsedGlobalVariablesImpl for function_ref and still keep the SmallVectorImpl+SmallPtrSet interfaces. The internal implementation is not duplicated.

In terms of interface confusion... I think this scheme is no more confusing than the template solution (moving the Module.cpp implementation to Module.h and making the function a template).

tejohnson added inline comments.Feb 22 2021, 10:19 AM
llvm/lib/IR/Module.cpp
662

My sense from experience is that these llvm*.used typically don't hold that many values and aren't terribly common, so I'm not sure that the overhead of the SetVector should be that much of a concern. This seems to be the case that data structure was designed for, and would keep the interface simple. I also see that in most cases the results of collectUsedGlobalVariables are usually consumed and then discarded pretty quickly.

Before adding new interfaces with more complexity, I'd like to understand whether the SetVector overhead is really an issue in practice. Are you seeing long lived and large sets of references from the llvm*.used globals?

MaskRay added inline comments.Feb 22 2021, 10:37 AM
llvm/lib/IR/Module.cpp
662

I think the size overhead of SetVector is minor, but the additional ability it provides can confuse users at all call sites.

The user needs to instantiate a SetVector in a call site, where they just use the Set functionality or the Vector functionality. The powerful SetVector makes readers wonder whether the Set/the Vector features are both used.

tejohnson added inline comments.Feb 22 2021, 10:49 AM
llvm/lib/IR/Module.cpp
662

This seems like a lower level of confusion than figuring out which interface is used (or even realizing that a different interface may be appropriate for their use than one they might have copied from other code). We can just document the interface at the header, as to why a SetVector is used (makes both set based interaction as well as those clients relying on deterministic iteration work equally well).

MaskRay added inline comments.Feb 22 2021, 11:41 AM
llvm/lib/IR/Module.cpp
662

If the overhead is low, how about removing the SmallPtrSet overload in a follow-up? (That requires to refactor most call sites to use SmallVectorImpl instead and the one or two SmallPtrSet use cases to construct a SmallPtrSet from SmallVector after the collect* call.

If we do that, I think this is resolved:)

tejohnson added inline comments.Feb 22 2021, 12:48 PM
llvm/lib/IR/Module.cpp
662

Still have a preference for using SetVector for its simplicity in both use cases. If you are strongly opposed, then migrating to SmallVector is ok with me, if the set use case is rare.

MaskRay added inline comments.Feb 22 2021, 10:35 PM
llvm/lib/IR/Module.cpp
662

I have been thinking about this.

To have inline elements, we have to use SmallSetVector. SmallSetVector doubles the stack consumption. I guess I'll follow
https://eel.is/c++draft/sequence.reqmts "When choosing a container, remember vector is best; leave a comment to explain if you choose from the rest!" and drop the SmallPtrSetImpl overload in D97257.

tejohnson added inline comments.Feb 23 2021, 9:59 AM
llvm/lib/IR/Module.cpp
662

Looking at the use sites in D97257, it looks like 50% want to use the result as a set and 50% as a vector. To me this, combined with the expected small size of elements, argues strongly in favor of SetVector for simplicity - it seems to be the case SetVector was designed for.

To get a sense of how large these sets get in practice, I built a large internal project with >30k input files, and printed out the size of the llvm.used and llvm.compiler.used sets in each when we go to build the summary index on a ThinLTO build. All llvm.compiler.used were of size 0 and almost all of the llvm.used were also 0, with 9 having a set of size 1. Here's the histogram:

    9 Used size: 1
31198 Used size: 0
31207 CompilerUsed size: 0

I then rebuilt with WPD which as of my recent patch puts available_externally vtables in the llvm.compiler.used set. Here's a histogram of the resulting sizes:

    1 CompilerUsed size: 40
    1 CompilerUsed size: 15
    4 CompilerUsed size: 9
   11 CompilerUsed size: 8
   26 CompilerUsed size: 7
   57 CompilerUsed size: 6
  224 CompilerUsed size: 5
  403 CompilerUsed size: 4
  471 CompilerUsed size: 3
 1306 CompilerUsed size: 2
    9 Used size: 1
 2275 CompilerUsed size: 1
29768 Used size: 0
24998 CompilerUsed size: 0

Many more are non-zero, but all but 2 are <10, the vast majority are <5 and still mostly 0.

My concern is that making the interface a vector rather than SetVector adds a more complicated usage model to solve an overhead that is not a problem in practice.

tejohnson added inline comments.Feb 23 2021, 11:32 AM
llvm/lib/IR/Module.cpp
662

Actually, second guessing my analysis. In the case of regular or hybrid LTO, these will all get merged and we'll have one big llvm.compiler.used, so with WPD the size will not be very small. It may still not be a huge issue, but I'm less convinced now. So we can go ahead with your approach. Will take a look at the patches shortly in more detail.

MaskRay updated this revision to Diff 325876.Feb 23 2021, 12:33 PM
MaskRay edited the summary of this revision. (Show Details)

Explain that we decrease the inline element count to 4.
(This can make clang smaller due to fewer instantiations.)

tejohnson accepted this revision.Feb 23 2021, 2:34 PM

lgtm, thanks for fixing this!

llvm/include/llvm/IR/Module.h
896

Can you add a comment for the new interface? The one above is stale as it talks about using a Set. It would also be good to add the rationale for using a vector, so no one decides to do a "cleanup" and switch back to a set later on.

This revision is now accepted and ready to land.Feb 23 2021, 2:34 PM
steven_wu added inline comments.Feb 23 2021, 3:16 PM
llvm/lib/IR/Module.cpp
662

Those can be reasonably sized when you use ObjC since the runtime data are all in used, but the size is definitely growing less than n where n is the size of the program.

Maybe a sorted vector is the option here but I am fine with current implementation.

MaskRay updated this revision to Diff 325922.Feb 23 2021, 3:52 PM
MaskRay marked an inline comment as done.

Change variable name
Add a comment to the new overload

This revision was landed with ongoing or failed builds.Feb 23 2021, 4:09 PM
This revision was automatically updated to reflect the committed changes.