This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Thin link efficiency: skip candidate added later with higher threshold (NFC)
ClosedPublic

Authored by tejohnson on Dec 12 2016, 8:18 PM.

Details

Summary

Thin link efficiency improvement. After adding an importing candidate to
the worklist we might have later added it again with a higher threshold.
Skip it when popped from the worklist if we recorded a higher threshold
than the current worklist entry, it will get processed again at the
higher threshold when that entry is popped.

This required adding the summary's GUID to the worklist, so that it can
be used to query the recorded highest threshold for it when we pop from the
worklist.

Diff Detail

Repository
rL LLVM

Event Timeline

tejohnson updated this revision to Diff 81175.Dec 12 2016, 8:18 PM
tejohnson retitled this revision from to [ThinLTO] Thin link efficiency: skip candidate added later with higher threshold (NFC).
tejohnson updated this object.
tejohnson added a reviewer: mehdi_amini.
tejohnson added a subscriber: llvm-commits.
mehdi_amini edited edge metadata.Dec 12 2016, 8:20 PM

I'm surprised you're working on optimizing this, is this showing up on any profile?
Did you measure the actual benefit here?

I'm surprised you're working on optimizing this, is this showing up on any profile?
Did you measure the actual benefit here?

Yes, see also D27687. I was running with a large app and also with the thresholds cranked up, and it was slowing down too much. I've shaved off at least 35% in that particular case from these improvements.

mehdi_amini added inline comments.Dec 12 2016, 8:50 PM
lib/Transforms/IPO/FunctionImport.cpp
330 ↗(On Diff #81175)

I'm fairly lost in the logic here between GetAdjustedThreshold and GetBonusMultiplier, still trying to make sense of what we're currently doing here.

mehdi_amini added inline comments.Dec 12 2016, 9:49 PM
lib/Transforms/IPO/FunctionImport.cpp
330 ↗(On Diff #81175)

OK I figured what's going on, it is quite obscure to infer from the code though, and I believe the comment above is misleading.

330 ↗(On Diff #81175)

(I take back what I wrote for the comment, was looking at the wrong place... So confusing!)

2

lib/Transforms/IPO/FunctionImport.cpp
330 ↗(On Diff #81175)

This code motion to use AdjThreshold instead of Threshold below seems like a standalone fix?

343 ↗(On Diff #81175)

All this logic to skip inserting in the worklist seems redundant with the new logic you added below while iterating over the work list (the latter seems to be a superset).

tejohnson added inline comments.Dec 14 2016, 6:54 AM
lib/Transforms/IPO/FunctionImport.cpp
330 ↗(On Diff #81175)

I'm fairly lost in the logic here between GetAdjustedThreshold and GetBonusMultiplier, still trying to make sense
of what we're currently doing here.

The difference is that GetBonusMultiplier is only applied to the current callsite, when the call is hot. It isn't recorded or passed down to the next level of calls (otherwise it would be compounded). GetAdjustedThreshold applies the decay factor for the next level of calls.

This code motion to use AdjThreshold instead of Threshold below seems like a standalone fix?

This isn't a fix per se. By itself this part of the change should be a no op.

E.g. before we would record the (non-decayed) Threshold in the ImportLists, which would then be pulled out into ProcessedThreshold and compared again to the (non-decayed) Threshold here the next time we saw this function.
With this change, we instead record the decayed AdjThreshold in the ImportLists, and compare against the new decayed AdjThreshold here the next time we see this function.

The reason for this change is so that we can compare the decayed threshold recorded in the Worklist to the threshold recorded in the ImportLists further down when we iterate through the Worklist. I.e. we need to compare apples to apples there.

343 ↗(On Diff #81175)

All this logic to skip inserting in the worklist seems redundant with the new logic you added
below while iterating over the work list (the latter seems to be a superset).

It's true that with the change I am adding below where we iterate over the work list this is no longer strictly necessary. However, there's no good reason to insert into the work list again with a lower threshold if we know we have already added this function with a higher threshold, it will just make the work list longer for no benefit. And will require a redundant add of this GUID to the ExportList below - all easily avoidable since we have to access the current threshold anyway to update it above.

The handling I added below to skip work list items when we pull them off the work list is to handle the case where we later added the function at a higher threshold (and it was already in the work list at the earlier lower threshold).

mehdi_amini added inline comments.Dec 14 2016, 8:45 AM
lib/Transforms/IPO/FunctionImport.cpp
330 ↗(On Diff #81175)

The difference is that GetBonusMultiplier is only applied to the current callsite, when the call is hot. It isn't recorded or passed down to the next level of calls (otherwise it would be compounded). GetAdjustedThreshold applies the decay factor for the next level of calls.

Yeah I got it in the end, still not straightfoward from the code. I though about improving but can't figure :)
A high level description may be helpful, I'll think about it.

This isn't a fix per se. By itself this part of the change should be a no op.

E.g. before we would record the (non-decayed) Threshold in the ImportLists, which would then be pulled out into > ProcessedThreshold and compared again to the (non-decayed) Threshold here the next time we saw this function.
With this change, we instead record the decayed AdjThreshold in the ImportLists, and compare against the new decayed AdjThreshold here the next time we see this function.

I was considering the case where a function can be reached from a hot path or from a call path, and it didn't seem like a NFC change. For instance,

  1. first visit with a cold edge and a threshold of 100 -> set ProcessedThreshold to 100 and push the decayed threshold, let say 70, on the stack.
  2. visit with a hot edge and a threshold of 99 -> compare to the previous ProcessedThreshold and decide to not push on the stack.

With your change, I believe we would:

  1. first visit with the cold edge with a threshold of 100 -> set ProcessedThreshold to the decayed threshold of 70 and push it on the stack.
  2. visit with the hot edge and a threshold of 99 -> compare to the previous ProcessedThreshold and decide to set ProcessedThreshold to the decayed threshold of 99 and push it on the stack.
tejohnson added inline comments.Dec 14 2016, 8:59 AM
lib/Transforms/IPO/FunctionImport.cpp
330 ↗(On Diff #81175)

Gah - you're right! This change indeed fixes a bug in the logic, where we could have been missing imports along some hot paths. I will come up with a test case that is affected by this change.

Do you want me to split this into a different patch on Phab, or is it enough to commit separately with a test case?

Regarding the confusing handling of different bonuses, let me add a comment where we compute these two different adjustments.

mehdi_amini accepted this revision.Dec 14 2016, 9:13 AM
mehdi_amini edited edge metadata.

LGTM.

(I'm fine with you committing separately without a new revision on Phab)

This revision is now accepted and ready to land.Dec 14 2016, 9:13 AM
tejohnson updated this revision to Diff 81623.Dec 15 2016, 11:12 AM
tejohnson edited edge metadata.

Remove non-NFC change split out and committed in r289843.

tejohnson updated this object.Dec 15 2016, 11:14 AM
This revision was automatically updated to reflect the committed changes.