Disable the always importing of constants introduced in D70404 by
default under a new internal option, since it is causing order of
magnitude compile time regressions during the thin link. Will continue
investigating why the regressions occur.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I spent some time looking at one of the big increases from this change today, and found that there is a fairly straightforward fix for most of the increase. The problem is that in computeImportForReferencedGlobals, whenever a global variable is imported, we walk its set of references and mark them all as exported - every single time the global variable is imported. We only need to mark its refs as exported the first time it is exported into any other module. This patch didn't introduce the problem, but it was made much worse as we are far more likely to import global variables containing refs.
We can easily skip doing the exporting again if the ImportList insert return value indicates that the insert wasn't done because it was already in the list. But this only avoids the problem when we attempt to import the global variable into the same module multiple times. In order to see if it has been exported from anywhere, we need additional info. It isn't sufficient to check whether the ExportList insert of the variable returns that it was not inserted because it was already there. This is because we don't know whether it was previously marked exported because it itself was imported somewhere (in which case we already exported its refs), or because some other value that referenced the variable was imported and marked it exported (in which case we haven't yet marked its own refs exported). In order to do that we can either keep another set or augment the existing ExportList to keep some additional info. I prototyped a fix using a separate set to mark those global variable defs that were exported. The thin link time went down significantly with constant variable importing enabled.
Unfortunately, even with the above fix, with the importing of constant variables we are still taking about 47% more time in the thin link. The export and import lists will be longer, and I'm not sure if it is just a result of processing those longer lists, or whether there is some other issue. I'll need to do some additional measurements.
I found the reason for the rest of the increase. Thankfully it is another easy to fix inefficiency. If you look at where we mark the refs as exported in computeImportForReferencedGlobals, it is marking each *copy* of the ref as exported:
for (const auto &VI : RefSummary->refs())
  for (const auto &RefFn : VI.getSummaryList())
    MarkExported(VI, RefFn.get());For our huge applications, with thousands of files, that means that for some commonly used linkonce_odr refs we are iterating significant amounts.
We actually only need to mark the copy in the same module as RefSummary (the copy being imported) as exported. I.e., see the equivalent code in computeImportForFunction:
for (auto &Ref : ResolvedCalleeSummary->refs()) ExportList.insert(Ref);
The reason is that we only need to know which symbols that were not previously exported (i.e. were local, or otherwise previously referenced outside the defining module) are now exported due to this import (so that we don't internalize, or promote if needed). E.g. if you look at the caller in runThinLTO, we've already identified which globals were live and exported (referenced in another module).
When I change the code in computeImportForReferencedGlobals to only mark the copy in RefSummary's module as exported, the time drops to within a couple seconds (2.7%) of the version with this not enabled.
I'll work on a cleaned up patch with these efficiency fixes and send for review. After that goes in and I do a bunch more testing with some of our large apps, we can probably turn this back on by default.