This is an archive of the discontinued LLVM Phabricator instance.

[MemProf] Context disambiguation cloning pass [patch 3/4]
ClosedPublic

Authored by tejohnson on Jan 5 2023, 10:44 AM.

Details

Summary

Applies cloning decisions to the IR, cloning functions and updating
calls. For Regular LTO, the IR is updated directly during function
assignment, whereas for ThinLTO it is recorded in the summary index
(a subsequent patch will apply to the IR via the index during the
ThinLTO backend.

The function assignment and cloning proceeds greedily, and we create new
clones as needed when we find an incompatible assignment of function
clones to callsite clones (i.e. when different callers need to invoke
different combinations of callsite clones).

Depends on D140949.

Diff Detail

Event Timeline

tejohnson created this revision.Jan 5 2023, 10:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 5 2023, 10:44 AM
tejohnson requested review of this revision.Jan 5 2023, 10:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 5 2023, 10:44 AM
Enna1 added a subscriber: Enna1.Jan 8 2023, 6:59 PM
tejohnson updated this revision to Diff 487961.Jan 10 2023, 1:03 PM

Rebase on top of patch 1 and 2 changes.

tejohnson updated this revision to Diff 487980.Jan 10 2023, 1:55 PM

Misc minor fixes from building with a large application

tejohnson updated this revision to Diff 510922.Apr 4 2023, 1:59 PM

Rebase on top of committed patches 1a/1b, and rebased patch 2. Reduce new test cases.

Still need to look into assignFunctions but before I do I'm wondering if this patch can be split into two with applyImport and assignFunctions in separate patches. What do you think?

llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
135 ↗(On Diff #510922)

Add a function comment?

2336 ↗(On Diff #510922)

auto& Call to avoid a copy?

2764 ↗(On Diff #510922)

Is IsMemProfClone more accurate?

2765 ↗(On Diff #510922)

Can we move this to a common const string somewhere? It would be good to use the same constant when assigning the name for the clone in getMemProfFuncName.

2788 ↗(On Diff #510922)

Can we move this functor to a static (or private) helper method?

I think we can return the ValueToValueMap vector as a return value and the number of clones created is implicit from the size of the vector.

2808 ↗(On Diff #510922)

Can we use make_unique instead of new here? Looking at new made me wonder if there was a leak.

2855 ↗(On Diff #510922)

Can we move L2855-L2904 to a helper which returns a function summary (or nullptr)?

2937 ↗(On Diff #510922)

Can we skip the rest of the loop body here since MemProfMD will be false below? I think we just need to duplicate the metadata stripping code.

2949 ↗(On Diff #510922)

Just increment MIBIter here and below we can use MIBIter->StackIdIndices (since I don't see other uses of MIB). We already use this spelling in the assert below.

2973 ↗(On Diff #510922)

Add a comment for the param /*NumClones=*/

3016 ↗(On Diff #510922)

IIUC we index with J-1 since the loop which populates the VMap starts from I=1 on L2807. This seems a little brittle. Perhaps add a comment?

3152 ↗(On Diff #510922)

I think this will silently skip the if condition on error since the bool operation for the condition will set the error as checked [1]. Wondering if this is intended since the previous line exits on error.

[1] https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Support/Error.h#L234

Still need to look into assignFunctions but before I do I'm wondering if this patch can be split into two with applyImport and assignFunctions in separate patches. What do you think?

Yes, looks like it will be straightforward to remove that part (which also means removing a few tests and a few changes to other files). What I will do is update this patch with the rebased version (on top of the committed patch 2). Then I will address the 1 comment that I see that is in the ThinLTO backend specific handling, so we don't lose that. Then I will strip out the ThinLTO backend code from here, in preparation for moving it to another patch, and finally, I will address the rest of your comments. I'll let you know when it is ready to re-review.

tejohnson marked an inline comment as done.Apr 24 2023, 3:33 PM
tejohnson added inline comments.
llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
3152 ↗(On Diff #510922)

Yes this is a good point, we should also exit if it isn't parseable as a ModuleSummaryIndex. Modified this to ExitOnErr. Will update the patch with this change now, but then will be moving this code out as it is ThinLTO backend specific.

tejohnson updated this revision to Diff 516554.Apr 24 2023, 3:33 PM
tejohnson marked an inline comment as done.

Address comment in ThinLTO backend handling.

I just realized that there are in fact comments in applyImports. Will address those as well before I move that code out...

tejohnson marked 11 inline comments as done.Apr 24 2023, 6:39 PM

Went ahead and addressed the rest of the comments.

llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
2788 ↗(On Diff #510922)

I moved the large chunk of this that does the actual cloning into a static helper that returns the VMaps array. But I kept the outer functor since it does some checking to see if we even need to do any cloning (since we call it multiple times per function, called from a couple different places) - this avoids the needs for duplicating the initial checking at each site.

tejohnson updated this revision to Diff 516593.Apr 24 2023, 6:40 PM
tejohnson marked an inline comment as done.

Address remaining comments

tejohnson updated this revision to Diff 516603.Apr 24 2023, 7:55 PM

Remove code that applies the cloning to the ThinLTO backends, it will be added in a new patch.

tejohnson retitled this revision from [MemProf] Context disambiguation cloning pass [patch 3/3] to [MemProf] Context disambiguation cloning pass [patch 3/4].Apr 24 2023, 7:56 PM
tejohnson edited the summary of this revision. (Show Details)

I have updated this to remove applyImports and associated code, along with the parts of the ThinLTO tests that were testing the IR transformations. This one is ready for review again.

Looks good overall, still thinking if we can make assignFunctions less monolithic and easier to follow.

llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
1347 ↗(On Diff #516603)

Add a comment why? I believe we use CloneNo == 0 (as convention) to identify the original version thus returning the base name instead of adding the suffix.

2212 ↗(On Diff #516603)

I had trouble finding what NV is. I think it would be clearer if we used ore::NV instead of a using namespace ore; directive above. Many uses within LLVM seem to retain the ore qualifier too [1].

[1] https://github.com/search?q=repo%3Allvm%2Fllvm-project%20ore%3A%3ANV&type=code

2242 ↗(On Diff #516603)

Maybe inline the comment into the assert and add the reason it can't be an allocation? E.g. assert(CI && "Caller cannot be an allocation because ... ");

2309 ↗(On Diff #516603)

Thanks for the detailed inline comments. Can we summarize the heuristic as pseudo-code in a function comment above the implementation too?

2314 ↗(On Diff #516603)

Since we don't need the ordering properties here, can we switch to llvm::DenseMap? It would probably be faster and have lower memory overhead than std::map.

2326 ↗(On Diff #516603)

for (auto &[Func, CallsWithMetadata] : FuncToCallsWithMetadata) to make it clearer in the usage below.

2340 ↗(On Diff #516603)

nit: inline the comment into the assert with &&

2364 ↗(On Diff #516603)

[optional] This sounds like we are implementing a worklist. The loop below uses indices and conditions based on them, I wonder if it would be clearer to restructure as a traditional worklist e.g. while(!ClonesWorklist.empty()) { ...}. I think some of the conditions inside can be rewritten and if the resulting code would be easier to follow. Not a major concern but wanted to get your opinion.

2385 ↗(On Diff #516603)

Can we hoist the contents of this if condition outside the loop? Then we can start the iteration at index 1 and remove this check.

2432 ↗(On Diff #516603)

Inline the comment into the assert with &&?

2492 ↗(On Diff #516603)

I think this condition is simple enough to be merged with the condition above with if (Callee == Clone || !Callee->hasCall()) continue.

2671 ↗(On Diff #516603)

Should we flip this default? It seems like the callsites with true outnumber the ones without.

llvm/test/ThinLTO/X86/memprof-basic.ll
109 ↗(On Diff #516603)

I don't the addition of these attributes is necessary for this patch. Is this left over from the split or just added to make the test more resilient? Perhaps split them out and commit as NFC for the latter.

tejohnson marked 10 inline comments as done.May 1 2023, 1:56 PM
tejohnson added inline comments.
llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
2309 ↗(On Diff #516603)

I wasn't sure how much detail you wanted here. PTAL and let me know what I added is helpful.

2364 ↗(On Diff #516603)

Good idea, switched to a worklist approach. The uses of "I" below are switched to uses of a new variable NodeCloneCount that effectively counts from 1, so the uses are changed slightly.

2385 ↗(On Diff #516603)

No, because we don't always get here for I==0. We would only get here for the I==0 case if the FuncClonesToCallMap is empty, i.e. this callsite's function has not been cloned previously. The function might have already been cloned if we previously processed a different callsite from the same function, in which case we will handle this clone like any other clone in the later code that assigns it to an existing function version (which will be the original copy in practice, but we utilize the same code to do this as for all other clones).

llvm/test/ThinLTO/X86/memprof-basic.ll
109 ↗(On Diff #516603)

Ah, this is leftover from the split. It is needed for the IR checking. I will remove this change from here (and from other tests) and eventually they will show up in patch 4 when I rebase.

tejohnson updated this revision to Diff 518555.May 1 2023, 1:56 PM
tejohnson marked an inline comment as done.

Address comments

snehasish accepted this revision.May 2 2023, 11:48 AM

lgtm, remaining comments are (mostly) cosmetic.

llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
2283 ↗(On Diff #518555)

Could this and the one below just be a static cast since we've already checked the type with "is"?

2294 ↗(On Diff #518555)

I thought that the usage of dyn_cast required the addition of enum kind to the struct [1]?

https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/IR/ModuleSummaryIndex.h#L380

2329 ↗(On Diff #518555)

Missing a word here?

2400 ↗(On Diff #518555)

Prefer std::deque which has the same interface with better performance due to spatial locality of elements (I believe it uses small fixed size preallocated arrays internally).

2484 ↗(On Diff #518555)

nit: auto* here and a few uses below where the type is a pointer.

2751 ↗(On Diff #518555)

Should we enforce a tail call to avoid accidental memory usage increase? Is this a concern for the size of inputs we are working with?

2755 ↗(On Diff #518555)

Add a comment for the work done in this DFS?

2309 ↗(On Diff #516603)

I wasn't sure how much detail you wanted here. PTAL and let me know what I added is helpful.

Looks great!

2385 ↗(On Diff #516603)

No, because we don't always get here for I==0. We would only get here for the I==0 case if the FuncClonesToCallMap is empty, i.e. this callsite's function has not been cloned previously. The function might have already been cloned if we previously processed a different callsite from the same function, in which case we will handle this clone like any other clone in the later code that assigns it to an existing function version (which will be the original copy in practice, but we utilize the same code to do this as for all other clones).

I assumed we could add && FuncClonesToCallMap.empty() to the (new) condition outside the loop. If I'm still missing something then I think we don't have enough test coverage since I was able to move the contents outside (in the prior version of the patch) and all tests pass.

Not a strong opinion since the worklist changes make it a easier to follow.

This revision is now accepted and ready to land.May 2 2023, 11:48 AM
tejohnson marked 3 inline comments as done.May 3 2023, 11:45 AM
tejohnson added inline comments.
llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
2283 ↗(On Diff #518555)

CallTy is an IndexCall struct which derived from PointerUnion, and PointerUnion supports a dyn_cast() method but not a static_cast one.

2294 ↗(On Diff #518555)

This is the dyn_cast() implemented by PointerUnion (see above comment about CallTy here).

2484 ↗(On Diff #518555)

These are actually shared_ptr not raw pointer. Some can be auto & however, I will add that.

2751 ↗(On Diff #518555)

I don't see any uses of the musttail attribute currently in LLVM, so I'm not sure if that is a good idea. I looked at the optimized code, and this is not being marked as a tail call (however interestingly the earlier call to updateAllocationCall is). But this isn't a recursive call, so I'm not sure it matters too much.

2385 ↗(On Diff #516603)

I assumed we could add && FuncClonesToCallMap.empty() to the (new) condition outside the loop. If I'm still missing something then I think we don't have enough test coverage since I was able to move the contents outside (in the prior version of the patch) and all tests pass.

Not a strong opinion since the worklist changes make it a easier to follow.

Oh I see what you meant. I think since the worklist approach simplified the logic, I'd prefer to keep all of the handling inside the worklist loop.

tejohnson updated this revision to Diff 519191.May 3 2023, 11:47 AM

Address comments

tejohnson reopened this revision.May 3 2023, 2:10 PM

Reverted in 6fbf022908c104a380fd1854fb96eafc64509366 due to bot failure: https://lab.llvm.org/buildbot/#/builders/245/builds/7888

Looks like the remarks output is correct but in a different order. Must have some nondeterminism somewhere, will need to investigate.

This revision is now accepted and ready to land.May 3 2023, 2:10 PM
tejohnson added inline comments.May 3 2023, 4:02 PM
llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
2755 ↗(On Diff #518555)

I believe it is this loop that is causing the unstable remarks ordering I'm seeing, which is emitted by the IR updating code invoked during the search. The key into this map is a pointer so we don't have a guaranteed stable ordering across runs. I could use a MapVector instead of a std::map, but rather than carry that extra space overhead throughout, probably it makes the most sense to build and then iterate a vector of all the allocation nodes (the map values) and sort by their OrigStackOrAllocId field, which is assigned in increasing order as they are inserted in the map and will be stable.

tejohnson updated this revision to Diff 519313.May 3 2023, 5:38 PM

Fix for nondeterminism.

snehasish accepted this revision.May 3 2023, 6:26 PM

Fix lgtm.

tejohnson reopened this revision.May 4 2023, 10:33 AM

Reopening so I can eventually upload fix for instability surfaced by expensive checks bot.

This revision is now accepted and ready to land.May 4 2023, 10:33 AM
tejohnson added inline comments.May 4 2023, 6:23 PM
llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
2755 ↗(On Diff #518555)

I had to revert due to a different bot failure, in new test funcassigncloning.ll. This was in the expensive checks bot (which among other things also randomly shuffles elements before sorting). I built with expensive checks on but cannot reproduce the failure, even running thousands of times.

However, looking at the excerpt of the failure the bot provided, I realized that the difference was a different ordering of some newly cloned nodes coming out of node cloning, which was not part of this patch. I realized it is almost certainly due to another iteration over AllocationCallToContextNodeMap, this time in identifyClones(), which would affect the ordering of the cloning. I decided it is probably safest to switch this map (as well as NonAllocationCallToContextNodeMap to avoid any problems there in the future) to MapVector.

Since this affects the prior patch, I will send a patch for just that change along with the part of funcassigncloning.ll that doesn't rely on this patch (including the checking which triggered the failure). I can then remove the change I made here to sort the allocation nodes.

tejohnson added inline comments.May 5 2023, 11:59 AM
llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
2755 ↗(On Diff #518555)

That fix was made in D149924 which went in this morning, and the expensive checks bot succeeded with it. I'm going to rebase this one on top of that, remove the change made here to use a vector, incorporate some another bot fix I made earlier on top of this patch (to require asserts in the tests that use -stats), and will submit after retesting.

tejohnson updated this revision to Diff 519948.May 5 2023, 12:40 PM

Rebase on top of the committed D149924.

This revision was landed with ongoing or failed builds.May 5 2023, 1:27 PM
This revision was automatically updated to reflect the committed changes.