This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Thin link efficiency improvement: don't re-export globals (NFC)
ClosedPublic

Authored by tejohnson on Dec 12 2016, 2:16 PM.

Details

Summary

We were reinvoking exportGlobalInModule numerous times redundantly.
No need to re-export globals referenced by a global that was already
exported in the module. This resulted in a large speedup in the thin
link for a big application, particularly when importing aggressiveness
was cranked up.

Event Timeline

tejohnson updated this revision to Diff 81140.Dec 12 2016, 2:16 PM
tejohnson retitled this revision from to [ThinLTO] Thin link efficiency improvement: don't re-export globals (NFC).
tejohnson updated this object.
tejohnson added a reviewer: mehdi_amini.
tejohnson added a subscriber: llvm-commits.
mehdi_amini accepted this revision.Dec 12 2016, 8:27 PM
mehdi_amini edited edge metadata.

LGTM.
Do you have numbers on the overall improvement on the ThinLink?

This revision is now accepted and ready to land.Dec 12 2016, 8:27 PM

LGTM.
Do you have numbers on the overall improvement on the ThinLink?

I collected some this morning (see below).

However, I realized last night when I was building past the thin link stage that this patch is not correct. The reason is that the ExportList includes globals that were added because they were imported somewhere (in which case we have already processed their calls/references here and don't need to do it again), as well as those added to the ExportList by exportGlobalInModule (in which case we have not already processed their calls/refs). What we need to do is instead check if this was previously imported from this module, and only then we can skip the marking of its calls/references. I have a new patch I'm about to upload that does just that, and also adds a test that failed with this earlier patch (but not with the new patch).

The numbers were collected on my Xeon, which I tried to keep quiet during the measurements. I measured each 5 times. The baseline was somewhat noisy, but the improved version less so for some reason. With very aggressive importing enabled, this patch improves the thin link time between 3 and 14% (average 8%). With default importing thresholds the average improvement is only 3%.

I looked at this initially because a perf-events profile of the thin link showed that GlobalValueSummary::modulePath() was the hottest routine (it is invoked frequently by exportGloballnModule), and exportGlobalInModule was 7th. With this patch, modulePath() drops to the 2nd hottest and exportGlobalInModule drops to 9th. These were measured with the aggressive importing thresholds.

tejohnson updated this revision to Diff 81272.Dec 13 2016, 11:44 AM
tejohnson edited edge metadata.

Fix patch and add test that would have caught problem with prior version

Mehdi - you lgtm'ed earlier but did you want to take another look since I changed the patch since then?

However, I realized last night when I was building past the thin link stage that this patch is not correct. The reason is that the ExportList includes globals that were added because they were imported somewhere (in which case we have already processed their calls/references here and don't need to do it again), as well as those added to the ExportList by exportGlobalInModule (in which case we have not already processed their calls/refs).

Good catch!

I looked at this initially because a perf-events profile of the thin link showed that GlobalValueSummary::modulePath() was the hottest routine (it is invoked frequently by exportGloballnModule), and exportGlobalInModule was 7th.

This sounds suspicious to me: GlobalValueSummary::modulePath() is a simple accessor. For it to be so hot it means it is calls many times *and we're not doing anything with the returned value*.

However, I realized last night when I was building past the thin link stage that this patch is not correct. The reason is that the ExportList includes globals that were added because they were imported somewhere (in which case we have already processed their calls/references here and don't need to do it again), as well as those added to the ExportList by exportGlobalInModule (in which case we have not already processed their calls/refs).

Good catch!

I looked at this initially because a perf-events profile of the thin link showed that GlobalValueSummary::modulePath() was the hottest routine (it is invoked frequently by exportGloballnModule), and exportGlobalInModule was 7th.

This sounds suspicious to me: GlobalValueSummary::modulePath() is a simple accessor. For it to be so hot it means it is calls many times *and we're not doing anything with the returned value*.

Figured out why - see D27755 (which is complementary to this patch)

This sounds suspicious to me: GlobalValueSummary::modulePath() is a simple accessor. For it to be so hot it means it is calls many times *and we're not doing anything with the returned value*.

Figured out why - see D27755 (which is complementary to this patch)

Great!
I'm still quite surprised that the accessor modulePath() is the limit here:

auto SummaryIter = llvm::find_if(
    SummaryList->second,
    [&](const std::unique_ptr<GlobalValueSummary> &Summary) {
      return Summary->modulePath() == ExportModulePath;
    });

I guess it is inlined and the profiler is likely not separating perfectly the body of modulePath() from the context.

This revision was automatically updated to reflect the committed changes.