We were reinvoking exportGlobalInModule numerous times redundantly.
No need to re-export globals referenced by a global that was already
exported in the module. This resulted in a large speedup in the thin
link for a big application, particularly when importing aggressiveness
was cranked up.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
I collected some this morning (see below).
However, I realized last night when I was building past the thin link stage that this patch is not correct. The reason is that the ExportList includes globals that were added because they were imported somewhere (in which case we have already processed their calls/references here and don't need to do it again), as well as those added to the ExportList by exportGlobalInModule (in which case we have not already processed their calls/refs). What we need to do is instead check if this was previously imported from this module, and only then we can skip the marking of its calls/references. I have a new patch I'm about to upload that does just that, and also adds a test that failed with this earlier patch (but not with the new patch).
The numbers were collected on my Xeon, which I tried to keep quiet during the measurements. I measured each 5 times. The baseline was somewhat noisy, but the improved version less so for some reason. With very aggressive importing enabled, this patch improves the thin link time between 3 and 14% (average 8%). With default importing thresholds the average improvement is only 3%.
I looked at this initially because a perf-events profile of the thin link showed that GlobalValueSummary::modulePath() was the hottest routine (it is invoked frequently by exportGloballnModule), and exportGlobalInModule was 7th. With this patch, modulePath() drops to the 2nd hottest and exportGlobalInModule drops to 9th. These were measured with the aggressive importing thresholds.
Mehdi - you lgtm'ed earlier but did you want to take another look since I changed the patch since then?
Good catch!
I looked at this initially because a perf-events profile of the thin link showed that GlobalValueSummary::modulePath() was the hottest routine (it is invoked frequently by exportGloballnModule), and exportGlobalInModule was 7th.
This sounds suspicious to me: GlobalValueSummary::modulePath() is a simple accessor. For it to be so hot it means it is calls many times *and we're not doing anything with the returned value*.
Great!
I'm still quite surprised that the accessor modulePath() is the limit here:
auto SummaryIter = llvm::find_if( SummaryList->second, [&](const std::unique_ptr<GlobalValueSummary> &Summary) { return Summary->modulePath() == ExportModulePath; });
I guess it is inlined and the profiler is likely not separating perfectly the body of modulePath() from the context.