Page MenuHomePhabricator

Resolution-based LTO API.
ClosedPublic

Authored by tejohnson on May 13 2016, 10:48 PM.

Details

Summary

This introduces a resolution-based LTO API. The main advantage of this API over
existing APIs is that it allows the linker to supply a resolution for each
symbol in each object, rather than the combined object as a whole. This will
become increasingly important for use cases such as ThinLTO which require us
to process symbol resolutions in a more complicated way than just adjusting
linkage.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
pcc added a comment.Jul 14 2016, 5:22 PM
  • Address review comments
  • Add resolution test based on llvm-lto2
include/llvm/LTO/LTO.h
19

Sorry, but I'd prefer not to do this.

solve any multithreading question about parsing input files.

Let's not pretend that we can parse input files in a multithreaded way that allows them to be used later for linking until we can actually do it. As things stand, doing this will probably regress gold plugin performance as we will need to parse the input file an additional time.

allow to easily get rid of this context later.

It doesn't seem like it will be particularly difficult to remove this argument in clients later, it would probably be on the order of a trivial mechanical change.

237

Good idea, done.

lib/LTO/LTO.cpp
280

Because there may be no regular LTO module available as part of the link. For the gold plugin for example if some of the ThinLTO modules define common symbols then we need to make sure that one of the object files contains the resolved common symbols. In the case where there are no regular LTO modules that can be an object file containing just the common symbols.

lib/LTO/LTOBackend.cpp
149 ↗(On Diff #63907)

Sure, but it's a little simpler to create this where needed, especially when using parallel code gen. Eventually we might want to just create this once in backend() and thinBackend(), but that would require TargetMachine to be thread safe.

tools/llvm-lto2/llvm-lto2.cpp
35

This flag will need to be passed multiple times, and should be familiar to any user of this tool, so it seems appropriate to give it a short name.

45

I don't think so either.

mehdi_amini added inline comments.Jul 14 2016, 9:55 PM
include/llvm/LTO/LTO.h
10

I missed how this friend allows access Obj earlier, and it is not great. It is used to create a dependency on having a Module available and on having "GlobalValue *", while we want to be able to make all the resolution without loading IR.
Anytime LTO is accessing an IR construct through an InputFile, there's something that looks like a boundary violation.

As soon as we have a proper symbol table, InputFile should not even expose access to a module or IR at all.

19

Let's not pretend that we can parse input files in a multithreaded way that allows them to be used later for linking until we can actually do it

I'm not sure what you mean: this is *already* what we do with libLTO:

The file is lazily loaded in its own context, this is cheap: the IR is not fully parsed but only the list of symbols. It is then reloaded separately, and non lazily, in the LTO context later for the purpose of LTO merging.
If you have performance concern, we need to time this first. I can try to write a test tomorrow.

Right now this choice is penalizing ThinLTO that won't be able to load symbols in parallel and still have to load two times anyway.

lib/LTO/LTO.cpp
280

Because there may be no regular LTO module available as part of the link.

Client can just create one then...

Module M;
Lto.add(M);
// done

if some of the ThinLTO modules define common symbols then we need to make sure that one of the object files contains the resolved common symbols.

I don't know what you're referring to here?

lib/LTO/LTOBackend.cpp
150 ↗(On Diff #64068)

For ThinLTO at least, we don't support parallel codegen, so we don't need TM to be thread safe to be able to have only one TM created in thinBackend() and reused for both opt and codegen.

I even think that it should be already achievable by having:

  • backend() and thinbackend() always creating the TM.
  • opt() taking the TM as a parameter
  • codegen() taking the TM as a parameter
  • split_codegen() recreating the TM.

This way you only recreate an extra one in splitcodegen() when parallel codegen is enabled. Unless I missed something this is a net win.

pcc updated this revision to Diff 64198.Jul 15 2016, 3:00 PM
  • Avoid creating another TargetMachine
  • Add FIXME
include/llvm/LTO/LTO.h
10

I agree with you, but until we have a symbol table, there needs to be some way for LTO to access IR entities in the input file.

Added a FIXME for the removal of these friends.

19

If you have performance concern, we need to time this first. I can try to write a test tomorrow.

Please let me know what you find. Bear in mind that you are the one advocating for a change here (the status quo in the only current client of this interface, the gold plugin, is a single context) so the burden is on you to justify it. If you wish to make a change, you are welcome to propose it separately from this patch.

lib/LTO/LTO.cpp
280

Client can just create one then...

Not as easily as that. Roughly:

if (/* there are no regular LTO objects, oh, looks like we'll need more API surface */) {
  LLVMContext Ctx;
  Module M(Ctx);
  M.setTargetTriple(??? /* yet more API surface */);
  // add common symbols
  std::string BC;
  raw_str_ostream OS(BC);
  WriteBitcodeToFile(M, OS);
  ErrorOr<unique_ptr<InputFile>> F = InputFile::create(BC);
  if (!F) { /* error handling */ }
  Lto.add(std::move(*F));
}

Seems much simpler to always have a module with index 0.

I don't know what you're referring to here?

As I previously mentioned on this code review:

the special handling for common symbols required in the gold plugin (search for addCommons).

pcc marked an inline comment as done.Jul 15 2016, 3:01 PM
mehdi_amini requested changes to this revision.Jul 15 2016, 3:36 PM
mehdi_amini edited edge metadata.
mehdi_amini added inline comments.
include/llvm/LTO/LTO.h
10

I agree with you, but until we have a symbol table, there needs to be some way for LTO to access IR entities in the input file.

It is not clear to me why LTO needs to access any IR entity? If you need access an *IR entity* now, how could we don't need access to them tomorrow (<- read: with a symbol table)?

It seems to me that anywhere you're accessing an IR entity directly through Obj right now is a place that should have an API on the InputFile or the Symbol class (like the one that exists: InputFile::getSourceFileName()).

Assuming "Symbol" is abstracting the entries in the symbol table, for instance:

  • Input->Obj->getMemoryBufferRef().getBufferIdentifier(); -> Input->getIdentifier()
  • GlobalValue *GV = Obj->getSymbolGV(Sym.I->getRawDataRefImpl()); -> Symbol *S = Obj->getSymbol(...)
  • GV->hasGlobalUnnamedAddr(); -> needs an API on class Symbol
  • GV->getName(); -> needs an API on class Symbol
  • if (Res.VisibleToRegularObj || (GV && Used.count(GV)) || -> Used contains a GlobalVariable *, how do we go to a symbol table?
  • Module &M = Input->Obj->getModule(); -> used for:
for (GlobalVariable &GV : M.globals())
  if (GV.hasAppendingLinkage())
    Keep.push_back(&GV);

which should iterate on Symbols on the InputFile.

  • LTO::addRegularLto does some symbol resolution using Obj. At this point you probably want to operate on real IR construct anyway, and that's the point where you leave the "InputFile" to go to the IR. I think InputFile should just expose something like a factory std::unique_ptr<Module> parseModule(/* options */);. If it was just for full LTO, that would solve many of the above items as addSymbolToGlobalRes() could take a Module & instead of operating on the private IRObjectFile inside InputFile. However it won't help for ThinLTO.
GlobalValue *GV = Input->Obj->getSymbolGV(Sym.I->getRawDataRefImpl());
if (Res.Prevailing && GV)
  ThinLto.PrevailingModuleForGUID[GV->getGUID()] =
      MBRef.getBufferIdentifier();
  • getGUID() should be implemented on the Symbol directly.
19

I'm not sure, it seems easy to turn it around: "you are the one proposing an API that is targeted for performance reason with a specific use case (and client) in mind, so the burden should be on you to justify it if you want this to get it in.", otherwise write a cleaner API.

lib/LTO/LTO.cpp
255

Is is possible for GV to be null here?

280

Seems much simpler to always have a module with index 0.

"Simpler" from the gold point of view, my point of view is that when adding 1000 ThinLTO modules we'll copy 1000 DataLayout around for no reason.
Which reminds me that I don't expect any object being emitted for this module for a "pure" ThinLTO link.

305

(Same question here, can GV be null?)

This revision now requires changes to proceed.Jul 15 2016, 3:36 PM
mehdi_amini added inline comments.Jul 15 2016, 6:06 PM
include/llvm/LTO/LTO.h
19

Running ld64 on the 701 bitcode files that make an LTO build of llc (X86 only), and exiting right before the LTO optimizer starts, i.e. after the LTO merge happens (best of 10 times, on a 4-cores laptop):

  • Parallel resolution (dedicated context, double parsing): 7.362s
  • Sequential resolution (shared context, single parsing): 8.834s
pcc added inline comments.Jul 15 2016, 6:21 PM
include/llvm/LTO/LTO.h
10

If you need access an *IR entity* now, how could we don't need access to them tomorrow (<- read: with a symbol table)?

The way I see this evolving is that any uses of IR constructs here will be replaced with uses of APIs for accessing the bitcode symbol table (except in places such as LTO::addRegularLto which actually need a Module). The functionality exposed via InputFile and Symbol will remain the primary way by which clients will access the object (i.e. they wouldn't access the symbol table directly).

I would prefer not to expose the properties you propose to expose via InputFile/Symbol because they aren't strictly needed by clients for symbol resolution.

19

Thanks. To me it isn't clear how much of that is due to parsing as opposed to parallel resolution, but I can accept that this would simplify clients. So I suppose I can try to make the change you suggested.

lib/LTO/LTO.cpp
255

Yes, if the symbol is defined by inline asm.

280

I suppose we could avoid creating an empty regular LTO object unless a hook is set.

305

Ditto

mehdi_amini added inline comments.Jul 15 2016, 7:10 PM
include/llvm/LTO/LTO.h
19

I chatted with Duncan and thought you may be interested by some history: his recollections of the original motivation for parsing in separate context was not parallelism but the amount of things that leaks to the context, while the modules wouldn't be used if they come from static archives anyway.
Especially at that time global metadata were not lazyloadable.

pcc added inline comments.Jul 15 2016, 7:26 PM
include/llvm/LTO/LTO.h
19

Interesting. LLD originally used multiple contexts, but now uses a single context because of the reversed tradeoff.
http://llvm.org/viewvc/llvm-project?view=revision&revision=267921

the modules wouldn't be used if they come from static archives anyway.

Huh, I would expect the linker to use the archive symbol table to load only the needed modules.

mehdi_amini added inline comments.Jul 15 2016, 8:40 PM
include/llvm/LTO/LTO.h
19

LLD originally used multiple contexts, but now uses a single context because of the reversed tradeoff.

Going from 29.86187751s to 29.814533787s does not justify clobbering any API IMO.

pcc added inline comments.Jul 15 2016, 8:43 PM
include/llvm/LTO/LTO.h
19

Sure. I previously wasn't aware that Rafael had previously measured this.

pcc updated this revision to Diff 64224.Jul 15 2016, 8:46 PM
pcc edited edge metadata.
pcc marked 2 inline comments as done.
  • Remove LTO argument from InputFile::create
  • Only generate code for the regular LTO module when necessary
mehdi_amini added inline comments.Jul 16 2016, 3:47 PM
lib/LTO/LTO.cpp
323

No std::move on return.

tools/llvm-lto2/llvm-lto2.cpp
165

Ditto

pcc updated this revision to Diff 64350.Jul 18 2016, 10:49 AM
pcc marked 2 inline comments as done.
pcc edited edge metadata.
  • No std::move on return

Ping - Mehdi, any more comments? I am relying on the follow-on patch D21545 adding support for weak resolution and internalization in the ThinLTO backends.

mehdi_amini added inline comments.Jul 25 2016, 6:54 PM
include/llvm/LTO/LTO.h
42

What about

return Flags & object::BasicSymbolRef::SF_Global ||
          Flags & object::BasicSymbolRef::SF_FormatSpecific;
lib/LTO/LTO.cpp
281

This is marked done, but doesn't seem to be?

lib/LTO/LTOBackend.cpp
59 ↗(On Diff #64350)

Deriving an output temp file from a user-supplied *output* is fine, what does not make sense to me is to write stuff where the *input* files are. This is unlike what we usually do (the static archives could be in a read-only system directory for instance, this is quite common on our setup).

You can maintain whatever behavior Gold currently has by passing an option to addSaveTemps maybe. Right now the API being addSaveTemps(std::string OutputFileName) I don't expect files to be written anywhere unexpected.

Also the doxygen is pretty clear about what should be expected from this API:

/// This is a convenience function that configures this Config object to write
/// temporary files named after the given OutputFileName for each of the LTO
/// phases to disk. A client can use this function to implement -save-temps.
pcc updated this revision to Diff 65465.Jul 25 2016, 8:11 PM
  • Refresh
  • Address review comments
  • Implement linked objects file feature
lib/LTO/LTO.cpp
281

See the first three lines of LTO::run.

lib/LTO/LTOBackend.cpp
59 ↗(On Diff #64350)

I agree with you that we should move to a temp file naming scheme based on output files, however like Teresa I think that should be discussed separately from this patch. I don't think we need a flag either, whatever we decide to do here should apply to all linkers.

Added a FIXME here to change the naming scheme.

pcc marked an inline comment as done.Jul 25 2016, 8:11 PM
mehdi_amini accepted this revision.Jul 31 2016, 3:03 PM
mehdi_amini edited edge metadata.

Last comments, we should be able to iterate in-tree after that.

include/llvm/LTO/LTO.h
226

Should be RegularLTO I think.

235

Similarly, should be ThinLTO

284

addRegularLTO

286

addThinLTO

289

runRegularLTO

290

runThinLTO

lib/LTO/LTO.cpp
243

HasThinLTOSummary

281

I see, I was looking for an empty LTO module (RegularLto.CombinedModule) instead of object (as in produced object).

316

Is this test what you intend? It seems reversed (if so, then it's probably not tested).

318

This is still gonna copy a few SmallVector + a couple of string + a few other fields. And it's gonna happens for each and every ThinLTO module. This is just not the right place for that. Adding a triple/datalayout for LTO should not be handled here, it is just not the right place. LTO::run could handle it by poking at ThinLto.ModuleMap if needed.

Similarly, RegularLto.CombinedModule should not be created unless explicitly added/requested by the linker (i.e. not call to make_unique<Module>("ld-temp.o", Ctx); in the ctor).

358

Doc the above tests.

lib/LTO/LTOBackend.cpp
60 ↗(On Diff #65465)

It seems we are not looking at this from the same angle: I was looking at it as a *new* API, not a refactoring of whatever the gold plugin was doing. So I can't see "this is gold's behavior" as a valid motivation here (if you want this to be NFC from the gold point of view, the gold-plugin can be patched in the first place).

This revision is now accepted and ready to land.Jul 31 2016, 3:03 PM

pcc asked me to take over the patch since he is out on vacation this month. Responses below. I have made most of the suggested changes, but not a couple for the reasons stated. Let me see if I can upload the new patch even though pcc created this one.

include/llvm/LTO/LTO.h
226

Done

235

Done

284

Done

286

Done

289

Done

290

Done

lib/LTO/LTO.cpp
243

Done

316

I think what was meant here was to check if the RegularLTO.CombinedModule's target triple was still empty, and if so set it. I have changed it to that.

318

It only does the copy once with the change I made, so there isn't any duplication in the copying anymore. This is the simplest place to set it based on an added ThinLTO module.

I looked at delaying the creation of the CombinedModule, however, it turns out we frequently need it since even if we don't add any regular LTO files, we typically have callback hooks into the linker defined where the linker could add its own resolved symbols to the combined module, in which case it needs to be valid. I don't think we gain much by lazily creating the empty Module.

358

Done

lib/LTO/LTOBackend.cpp
60 ↗(On Diff #65465)

I removed this path so that the OutputFileName is always used, so it now matches the doxygen comment for the API.

tejohnson commandeered this revision.Aug 10 2016, 5:42 PM
tejohnson edited reviewers, added: pcc; removed: tejohnson.

Taking this over so I can upload new patch.

tejohnson updated this revision to Diff 67642.Aug 10 2016, 5:43 PM
tejohnson edited edge metadata.
  • Address review comments
mehdi_amini added inline comments.Aug 10 2016, 5:45 PM
lib/LTO/LTO.cpp
318

The ", we typically have callback hooks into the linker defined where the linker could add its own resolved symbols to the combined module" is not clear to me at all (LTOCodeGenerator will never do that AFAIK for example).
(I understand that Gold does something with "commons" at that time, but haven't add time to figure why it is needed)

(Note: I already approved this patch as good enough to be iterated on in-tree, so feel free to commit)

mehdi_amini added inline comments.Aug 10 2016, 5:50 PM
lib/LTO/LTO.cpp
318

To be more explicit: I don't like this behavior because it creates a relatively strong coupling between the linker and the plugin. The linker changes the module in an unpredictable way that can conflict with assumption that the plugin implementation could make. It makes it harder to follow invariant in the plugin implementation, and makes it easy to break the client of the API (the linker).

tejohnson added inline comments.Aug 11 2016, 8:05 AM
lib/LTO/LTO.cpp
318

True, this is the case now because gold is the only user. I went ahead and committed (temporarily, as it turned out, will recommit after fixing some bot failures) so that I can also get pcc's follow-on patch in that I need. But will also work up a patch to make the creation of the combined module lazy. I'm not sure how to address your other concern about the linker changing the module in unpredictable ways, but the API does document that the linker may add symbols to the combined module via the callback hooks.

lib/LTO/LTOBackend.cpp
61 ↗(On Diff #67642)

I didn't notice that this change was causing an issue since I had old output files around that caused tests to pass, but this caused a bot failure due to missing output files. The problem is not just that the names are different, but that they become difficult to correlate to the corresponding input source name. E.g. in the gold/X86/thinlto_linkonceresolution.ll test, where we have "%gold ... -o %t3.o %t2.o %t.o", the old .opt.bc temp files were named %t.o.4.opt.bc and %t2.o.4.opt.bc, but with the change I made become %t3.o.1.4.opt.bc and %t3.o.2.4.opt.bc. It isn't at all obvious which output file corresponds to which input file (numbering will depend on the ModuleMap iteration order).

What I did was to revert the change I had made, but add a new bool flag parameter on addSaveTemps (UseInputModulePath) that is set to true by gold and will provoke the old behavior. We can iterate on a better solution. Probably pass in a (temp) directory name and output all the files in a tree rooted there (note that you could have same named module identifiers at different paths, so you can't just disambiguate by appending the basename of the input module).

This revision was automatically updated to reflect the committed changes.
rogfer01 added inline comments.
llvm/trunk/tools/gold/gold-plugin.cpp
95 ↗(On Diff #67690)

I think this one should be size_t as lto::InputFile::Symbol::getCommonSize returns size_t and then the std::max in line 629 fails to build in targets where size_t is unsigned int.

Can you confirm if changing to size_t makes sense here? It does fix the build indeed.

If this is OK I can commit the change.