This is an archive of the discontinued LLVM Phabricator instance.

Resolution-based LTO API.
ClosedPublic

Authored by tejohnson on May 13 2016, 10:48 PM.

Details

Summary

This introduces a resolution-based LTO API. The main advantage of this API over
existing APIs is that it allows the linker to supply a resolution for each
symbol in each object, rather than the combined object as a whole. This will
become increasingly important for use cases such as ThinLTO which require us
to process symbol resolutions in a more complicated way than just adjusting
linkage.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
lhames added inline comments.Jun 9 2016, 1:35 PM
lib/CodeGen/LTOBackend.cpp
142–149 ↗(On Diff #60111)

Apologies if this is a naive question - I'm still wrapping my head around this patch, but could opt/codegen/splitCodeGen be std::functions on Config rather than being plain functions that introspect the Config object to build the pass pipelines? This might allow you to decouple some of the pipeline setup logic from the LTO interface. It may also allow you to remove some of the hooks: E.g. If the user is supplying 'codegen', they can just add any module inspection code to their custom implementation, rather than having to set up 'PreCodeGenModuleHook'.

I like Lang approach to decouple a bit more the configuration.

include/llvm/CodeGen/LTOBackend.h
59 ↗(On Diff #60111)

"to add its own resolved symbols": so the linker will modify the module? (I notice the Module is not const in the hook prototype)
What is the use-case?

62 ↗(On Diff #60111)

What level of "thread-safety" are we talking about? Is it just about the linker internal structure? Can the linker safely assume that only one thread will call one of these callbacks for a given Module? For a given LLVMContext?

67 ↗(On Diff #60111)

What is the Task "ID" at this point? This would be a task that will not codegen in the case of a parallel backend?

94 ↗(On Diff #60111)

If the intent is just -save-temps, I'd rather have the Index passed as a const ref.

107 ↗(On Diff #60111)

Why the Triple string and Target here?

120 ↗(On Diff #60111)

This is a strange API at this level.
And the naming is very generic for what the description says it does.

136 ↗(On Diff #60111)

Why this LTOLLVMContext instead of a factory method in the Config class?

include/llvm/LTO/LTO.h
90 ↗(On Diff #60111)

I think some doc may be nice for the enum values (I have no idea what is comdat_alias right now).

93 ↗(On Diff #60111)

doc (public API in public header)

292 ↗(On Diff #60111)

Not clear why ParallelCodeGenParallelismLevel is not part of Config.
Also it does not apply to ThinLTO, which is not clear.

323 ↗(On Diff #60111)

There is too much state here.
I could see getting rid of almost *all* of these.
Until you call run() on this class, I don't expect that we need any but the Config and ModuleMap.

385 ↗(On Diff #60111)

Why do we need this here?

pcc added inline comments.Jun 14 2016, 6:25 PM
include/llvm/CodeGen/LTOBackend.h
59 ↗(On Diff #60111)

Yes, it would modify the module.

The use case is to work around API deficiencies; the specific issue I had in mind was the special handling for common symbols required in the gold plugin (search for addCommons).

I wouldn't be surprised if we had to implement a similar workaround in libLTO as well, but I would not expect lld for example to need to modify the module, as we control both sides of the interface.

That said, the gold plugin only requires a mutable hook at one point (which happens to be post-internalize at the moment, but I think also pre-opt would work); maybe we should only allow a mutable hook at one of those points.

62 ↗(On Diff #60111)

Yes, this is with regard to the linker's internal structure.

On the LLVM side I think the right guarantee is a per-module guarantee (so the linker would be able to freely inspect and modify the provided module without having to worry about thread safety). Although of course at the moment this would be a per-context guarantee unless/until we made LLVMContext thread-safe.

I will change this comment to mention this.

67 ↗(On Diff #60111)

What is the Task "ID" at this point?

I'm not sure what you are asking here. The task ID generally stays consistent across the whole pipeline.

This would be a task that will not codegen in the case of a parallel backend?

Do you mean a distributed backend? In that case, none of the hooks will be called for ThinLTO tasks other than the combined module hook. I will document that.

94 ↗(On Diff #60111)

Makes sense, will do.

107 ↗(On Diff #60111)

You're right, we don't need these. I'll change the implementation to use Module::getTargetTriple().

120 ↗(On Diff #60111)

Yes, agree. After I finish the reorganization I mentioned earlier, I want to move this API to a library-internal header.

136 ↗(On Diff #60111)

Because the LTOLLVMContext needs to own the DiagHandler.

include/llvm/LTO/LTO.h
90 ↗(On Diff #60111)

Will do.

93 ↗(On Diff #60111)

Will do

292 ↗(On Diff #60111)

The idea was that Config controls everything except how code generation is "orchestrated" (that's why I'm passing in a ThinBackend here for example).

Also it does not apply to ThinLTO, which is not clear.

I will document that more clearly.

323 ↗(On Diff #60111)

I think the code is made more straightforward and easier to understand by incrementally building the combined module (for regular LTO) or the combined index (for ThinLTO) from add(), as you can see from the code exactly what happens for each module. That requires us to keep this state here.

Anyway, this state is an implementation detail, so it is not as important as the public API.

385 ↗(On Diff #60111)

Okay, since it looks like we're only using memory buffers here, we can probably just have an array of MemoryBuffers here (or maintain ModuleMap as a MapVector, or something).

lib/CodeGen/LTOBackend.cpp
142–149 ↗(On Diff #60111)

That's an interesting suggestion. My initial concern with that was that it bakes the required state for each pipeline phase into the API. For example, codegen requires a Triple and a Target, and many of the ThinLTO phases would require a combined module index, and those would need to be passed into the pipeline function via the "hook" in Config. If we ever change the required state, that would mean a lot of churn for each user of the API.

Although it occurred to me that we could avoid baking in the state like this:

// In LTOBackend.cpp
struct PromoteState {
  ModuleSummaryIndex &CombinedIndex;
};
struct CodeGenState {
  Config &C;
  StringRef TheTriple;
  const Target *TheTarget;
  AddStreamFn AddStream;
};

// In LTOBackend.h
struct PromoteState;
bool promote(size_t Task, Module &M, PromoteState &State);

struct CodeGenState;
bool codegen(size_t Task, Module &M, CodeGenState &State);

struct Config {
  // ...
  std::function<bool(size_t Task, Module &M, PromoteState &State)> Promote;
  std::function<bool(size_t Task, Module &M, CodeGenState &State)> CodeGen;
  // ...
};

that would still make the signatures of the Config "hooks" non-uniform, which would also make it more complicated to implement save-temps, which is about 90% of the point of these hooks. (Suppose I want to add save-temps to each pipeline phase, including "pre-optimization". I would not only need to write an individual wrapper function for each phase, but I would also need to know which phase comes "first" in order to run my pre-opt save-temps processing before it.) I would prefer to keep the hooks uniform unless there is a good reason not to do so.

pcc updated this revision to Diff 61155.Jun 17 2016, 7:45 PM
pcc marked 3 inline comments as done.
  • Address review comments
  • Remove TheTarget and TheTriple arguments and unneeded error_category (now that we have StringError)
  • Pass index as const ref to CombinedIndexHook
  • Remove ThinObjs, and change ModuleMap's type to MapVector
  • Add Config::addSaveTemps; add initial llvm-lto2 test
  • Wire the gold plugin up to addSaveTemps, and fix several tests that were failing before (didn't notice due to stale outputs in the test directory)
  • Add a .resolution.txt to save-temps, and port comdat.ll
  • Port emit-llvm.ll test to resolution.txt
  • Remove api file feature
  • Move LTO.cpp to Resolution subdirectory
  • Move LTOBackend to LTO/Resolution; inline upgradeLinkage
  • Remove unnecessary #includes from gold-plugin.cpp
  • Add comments
  • Remove unnecessary dep
pcc added a comment.Jun 17 2016, 7:45 PM

In this version of the patch I started exercising the test harness so you can see what that looks like. As previously mentioned I also moved the code to a separate library (lib/LTO/Resolution) so we can depend on it from clang.

I started working on the clang side backend, but there's a hairy output stream yak to shave there, so it'll take a little more time.

include/llvm/CodeGen/LTOBackend.h
120 ↗(On Diff #60111)

I ended up folding this into the caller on the regular LTO side, as ThinLTO linkage upgrades now go via the module summary index.

include/llvm/LTO/LTO.h
90 ↗(On Diff #60111)

I ended up removing this error category and just using StringError for errors produced by this module.

93 ↗(On Diff #60111)

See above

mehdi_amini added inline comments.Jun 20 2016, 8:55 PM
include/llvm/CodeGen/LTOBackend.h
67 ↗(On Diff #60111)

I meant the LTO parallel codegen: after merging the IR you will call this hook with a tuple <task-ID, module> that will never go through the codegen.

include/llvm/LTO/Resolution/LTOBackend.h
71 ↗(On Diff #61155)

The thread safety in LLVM is at the LLVMContext level, not the module level.

pcc added inline comments.Jun 21 2016, 1:20 PM
include/llvm/CodeGen/LTOBackend.h
67 ↗(On Diff #60111)

In this case the task id would be 0, and it would be split into tasks 0..N-1.

Although task 0 will end up using a different module for codegen (i.e. whichever partition happened to be assigned task id 0), the module provided to the hook at this point is still significant, as any changes made to the module (e.g. adding common symbols for gold) will be reflected in the partitions.

include/llvm/LTO/Resolution/LTOBackend.h
71 ↗(On Diff #61155)

Yes, but I wanted to describe the intention here should we ever relax the thread safety in LLVM. (For example, if we ever made it safe for multiple code generation threads to operate on a single module, the guarantee here would be stronger than the default guarantee.)

Looks like it is getting very close. A few comments/questions below.

Also, it is a huge patch - suggest committing in a few different pieces:

  1. Move existing LTO.{cpp,h} to Resolution subdirectory
  2. Add LTOBackend.cpp, changes to LTO.{cpp,h}, and llvm-lto2 + tests
  3. Changes to gold-plugin.cpp to use new interfaces
include/llvm/LTO/Resolution/LTO.h
321 ↗(On Diff #61155)

Is the ThinLTO support still TODO? It looks like this can be removed as I see it here now.

include/llvm/LTO/Resolution/LTOBackend.h
106 ↗(On Diff #61155)

Although it is not called for the backend process (which also has a Config object per D21545). I do know what you are trying to say here - I think the distinction is that this is always invoked via LTO::run regardless of whether the backend is in-process.

Whereas the earlier hooks are only invoked from the backends themselves (regarless of whether invoked in process or in a separate backend process)

111 ↗(On Diff #61155)

Add doxygen comment

lib/LTO/Resolution/LTO.cpp
167 ↗(On Diff #61155)

Can we avoid creating this if we are not going to perform ThinLTO? E.g. Create lazily in runThinLto if it is still null?

370 ↗(On Diff #61155)

Could move this to parent class since it is in both derived classes

561 ↗(On Diff #61155)

Can you document why we are initializing Task to ParallelCodeGenParallelismLevel? I thought earlier it said somewhere that regular LTO (parallel or not) was TaskId 0 and ThinLTO started at TaskId 1.

lib/LTO/Resolution/LTOBackend.cpp
32 ↗(On Diff #61155)

I'm confused by what is going on here with OldHook.

test/LTO/Resolution/X86/comdat.ll
14 ↗(On Diff #61155)

Does this need to be specified here? It looks like a suitable default (with none of the flags) will be constructed by default if not.

73 ↗(On Diff #61155)

What causes the difference w.r.t. gold here (protected vs not)?

test/tools/gold/X86/coff.ll
19 ↗(On Diff #61155)

Why the loss of local_unnamed_addr?

test/tools/gold/X86/thinlto_alias.ll
24 ↗(On Diff #61155)

Since thinBackend() now invokes thinLTOInternalizeModule, why is this failing?

tools/gold/gold-plugin.cpp
730 ↗(On Diff #61155)

Can we skip this if options::thinlto_index_only? Otherwise Backend is simply overwritten just below.

tools/llvm-lto2/llvm-lto2.cpp
2 ↗(On Diff #61155)

Should probably add a comment at the top of the file about why this exists then.

20 ↗(On Diff #61155)

Need to describe format of "resolution" (e.g. 'p','l','x' and meanings). Also, it would be good to mention the default resolution for anything not specified here (which from the code it appears to be non-p, non-l, and non-x).

pcc updated this revision to Diff 62596.Jul 1 2016, 7:52 PM
pcc marked 6 inline comments as done.
  • Refresh
  • Address review comments
pcc added a comment.Jul 1 2016, 7:52 PM

Also, it is a huge patch - suggest committing in a few different pieces:

Makes sense. Could also split it up into differential revisions if it makes it easier to review.

include/llvm/LTO/Resolution/LTOBackend.h
106 ↗(On Diff #61155)

Clarified.

lib/LTO/Resolution/LTO.cpp
167 ↗(On Diff #61155)

Maybe, but that sounds like a premature optimization to me.

lib/LTO/Resolution/LTOBackend.cpp
32 ↗(On Diff #61155)

This is calling the hook provided by the linker, which still needs to run (e.g. in order to add common symbols). If the linker's hook returned false, we need to pass that result through.

test/LTO/Resolution/X86/comdat.ll
14 ↗(On Diff #61155)

After thinking about it some more, I reckon that in order to reduce the possibility of mistakes, we probably don't want to implement a default resolution in llvm-lto2, nor should we accept resolutions for non-existent symbols. I have made that change to the test harness.

73 ↗(On Diff #61155)

Unlike the existing gold plugin, we do not resolve the visibility of each symbol and apply it to the combined module. It is unnecessary to do so because gold has already resolved the symbol's visibility using information provided by the plugin (i.e. LDPV_*) and will apply it to the final output file. The corresponding gold test (test/LTO/Resolution/X86/comdat.ll) shows that the symbol receives the correct visibility.

test/tools/gold/X86/coff.ll
19 ↗(On Diff #61155)

LTO only tracks whether the symbol is global unnamed_addr. We do not track local_unnamed_addr for similar reasons as for visibility: it is the linker's job to keep track of whether the presence of that attribute can permit internalization (or, for Mach-O, auto-hide), and its presence in the IR shouldn't affect the generated code.

test/tools/gold/X86/thinlto_alias.ll
24 ↗(On Diff #61155)

The internalization done in thinLTOInternalizeModule for symbols that are not externally visible is separate from the internalization done for preempted alias targets (search for InternalLinkage in IRMover.cpp), which is not yet implemented for ThinLTO. I believe the practical effect here is that the linker will end up re-resolving the weak symbol from the resulting native object files. (This is similar to how we currently handle (non-ODR) weak or linkonce symbols, as we currently don't have a summary resolution that means "discard this symbol".)

tools/gold/gold-plugin.cpp
730 ↗(On Diff #61155)

It seems more straightforward to just let it be overwritten.

pcc updated this revision to Diff 63907.Jul 13 2016, 8:10 PM

Refresh, rebase on top of D22173, move back to lib/LTO

tejohnson accepted this revision.Jul 14 2016, 8:27 AM
tejohnson edited edge metadata.

LGTM, with one nit below.

However, I'd like to wait to commit this until:

  1. Mehdi signs off
  2. D22302 goes in, enabling weak/linkonce resolution for gold. Your patch has the side effect of doing this enabling, and I think it is cleaner to have that functionality added separately from this patch adding the new API.

Mehdi, can you take a look at this patch and D22302 asap - this is getting on the critical path of some fixes and follow on patches I need to add for the distributed backend case (will send a separate update on that).

lib/LTO/Resolution/LTOBackend.cpp
32 ↗(On Diff #61155)

Ok, maybe rename to LinkerHook and/or add a comment.

This revision is now accepted and ready to land.Jul 14 2016, 8:27 AM

Can you update description / title (it is still marked as "wip" which make me think the revision is not ready).

pcc retitled this revision from [wip] Resolution-based LTO API. to Resolution-based LTO API..Jul 14 2016, 1:31 PM
pcc updated this object.
pcc edited edge metadata.

Can you update description / title (it is still marked as "wip" which make me think the revision is not ready).

Done.

Can you add one test for the .resolution.txt file with llvm-lto2?

include/llvm/LTO/Config.h
82 ↗(On Diff #63907)

"will never be called *concurrently* from multiple threads" -> I don't think we want to promise to pin a Module or a task to a given thread.

146 ↗(On Diff #63907)

You may add that the real reason to have a class here is that you don't want to tie the lifetime of the context to the configuration object and you need to keep a copy of the diagnostic handler alive.
Otherwise it may be easy for someone to nuke this anytime.

include/llvm/LTO/LTO.h
103 ↗(On Diff #63907)

IIUC the reason we need a context is because we can't get the information we want on an InputFile without a context for now.
Since we have plan to fix this (and this is acknowledged in the FIXME), I'd rather remove the LTO param now and add a Context inside the InputFile itself. It'll:

  1. solve any multithreading question about parsing input files.
  2. allow to easily get rid of this context later.
321 ↗(On Diff #63907)

At least, can we wrap them in struct to separate this clearly:

struct {
  unsigned ParallelCodeGenParallelismLevel;
  LTOLLVMContext Ctx;
  std::unique_ptr<Module> CombinedModule;
  IRMover Mover;
} LTOState;

  // These fields are for ThinLTO.
struct {
  ThinBackend Backend;
  ModuleSummaryIndex CombinedIndex;
  MapVector<StringRef, MemoryBufferRef> ModuleMap;
  DenseMap<GlobalValue::GUID, StringRef> PrevailingModuleForGUID;
} ThinLTOState;

It'll make the code more clear on violation (ThinLTO code accessing LTO state or vice-versa)

lib/LTO/LTO.cpp
280 ↗(On Diff #63907)

This isn't clear to me "the client may want to add symbol definitions to it". Why doesn't the client add a regular LTO module if he needs to add its stuff?

lib/LTO/LTOBackend.cpp
58 ↗(On Diff #63907)

How is this supposed to work with static archive? It'll write a bunch of file next to the input shared library?
Could we have a separate directory to stuff all these files with ThinLTO?

149 ↗(On Diff #63907)

Nothing critical but we just already created a TM for the optimization pipeline.

tools/llvm-lto2/llvm-lto2.cpp
2 ↗(On Diff #63907)

Agree

34 ↗(On Diff #63907)

"r" is really not very explicit for an option, I think we usually have more self-explaining names.

44 ↗(On Diff #63907)

Are there incompatible combination? (I don't think so, but just checking).

tejohnson added inline comments.Jul 14 2016, 2:18 PM
lib/LTO/LTOBackend.cpp
58 ↗(On Diff #63907)

Yes, it will write the temp files next to the archive library - each constituent bitcode in the archive is given a unique module ID by the gold-plugin (see D20559 claim_file_hook()).

I find it more intuitive to put the saved temp files next to the input object or archive, but open to other suggestions - that should probably change in a separate patch though as this is maintaining the current gold-plugin behavior.

pcc updated this revision to Diff 64068.Jul 14 2016, 5:22 PM
pcc marked 4 inline comments as done.
pcc added a comment.Jul 14 2016, 5:22 PM
  • Address review comments
  • Add resolution test based on llvm-lto2
include/llvm/LTO/LTO.h
103 ↗(On Diff #63907)

Sorry, but I'd prefer not to do this.

solve any multithreading question about parsing input files.

Let's not pretend that we can parse input files in a multithreaded way that allows them to be used later for linking until we can actually do it. As things stand, doing this will probably regress gold plugin performance as we will need to parse the input file an additional time.

allow to easily get rid of this context later.

It doesn't seem like it will be particularly difficult to remove this argument in clients later, it would probably be on the order of a trivial mechanical change.

321 ↗(On Diff #63907)

Good idea, done.

lib/LTO/LTO.cpp
280 ↗(On Diff #63907)

Because there may be no regular LTO module available as part of the link. For the gold plugin for example if some of the ThinLTO modules define common symbols then we need to make sure that one of the object files contains the resolved common symbols. In the case where there are no regular LTO modules that can be an object file containing just the common symbols.

lib/LTO/LTOBackend.cpp
149 ↗(On Diff #63907)

Sure, but it's a little simpler to create this where needed, especially when using parallel code gen. Eventually we might want to just create this once in backend() and thinBackend(), but that would require TargetMachine to be thread safe.

tools/llvm-lto2/llvm-lto2.cpp
34 ↗(On Diff #63907)

This flag will need to be passed multiple times, and should be familiar to any user of this tool, so it seems appropriate to give it a short name.

44 ↗(On Diff #63907)

I don't think so either.

mehdi_amini added inline comments.Jul 14 2016, 9:55 PM
include/llvm/LTO/LTO.h
94 ↗(On Diff #64068)

I missed how this friend allows access Obj earlier, and it is not great. It is used to create a dependency on having a Module available and on having "GlobalValue *", while we want to be able to make all the resolution without loading IR.
Anytime LTO is accessing an IR construct through an InputFile, there's something that looks like a boundary violation.

As soon as we have a proper symbol table, InputFile should not even expose access to a module or IR at all.

103 ↗(On Diff #64068)

Let's not pretend that we can parse input files in a multithreaded way that allows them to be used later for linking until we can actually do it

I'm not sure what you mean: this is *already* what we do with libLTO:

The file is lazily loaded in its own context, this is cheap: the IR is not fully parsed but only the list of symbols. It is then reloaded separately, and non lazily, in the LTO context later for the purpose of LTO merging.
If you have performance concern, we need to time this first. I can try to write a test tomorrow.

Right now this choice is penalizing ThinLTO that won't be able to load symbols in parallel and still have to load two times anyway.

lib/LTO/LTO.cpp
280 ↗(On Diff #64068)

Because there may be no regular LTO module available as part of the link.

Client can just create one then...

Module M;
Lto.add(M);
// done

if some of the ThinLTO modules define common symbols then we need to make sure that one of the object files contains the resolved common symbols.

I don't know what you're referring to here?

lib/LTO/LTOBackend.cpp
150 ↗(On Diff #64068)

For ThinLTO at least, we don't support parallel codegen, so we don't need TM to be thread safe to be able to have only one TM created in thinBackend() and reused for both opt and codegen.

I even think that it should be already achievable by having:

  • backend() and thinbackend() always creating the TM.
  • opt() taking the TM as a parameter
  • codegen() taking the TM as a parameter
  • split_codegen() recreating the TM.

This way you only recreate an extra one in splitcodegen() when parallel codegen is enabled. Unless I missed something this is a net win.

pcc updated this revision to Diff 64198.Jul 15 2016, 3:00 PM
  • Avoid creating another TargetMachine
  • Add FIXME
include/llvm/LTO/LTO.h
94 ↗(On Diff #64068)

I agree with you, but until we have a symbol table, there needs to be some way for LTO to access IR entities in the input file.

Added a FIXME for the removal of these friends.

103 ↗(On Diff #64068)

If you have performance concern, we need to time this first. I can try to write a test tomorrow.

Please let me know what you find. Bear in mind that you are the one advocating for a change here (the status quo in the only current client of this interface, the gold plugin, is a single context) so the burden is on you to justify it. If you wish to make a change, you are welcome to propose it separately from this patch.

lib/LTO/LTO.cpp
280 ↗(On Diff #64068)

Client can just create one then...

Not as easily as that. Roughly:

if (/* there are no regular LTO objects, oh, looks like we'll need more API surface */) {
  LLVMContext Ctx;
  Module M(Ctx);
  M.setTargetTriple(??? /* yet more API surface */);
  // add common symbols
  std::string BC;
  raw_str_ostream OS(BC);
  WriteBitcodeToFile(M, OS);
  ErrorOr<unique_ptr<InputFile>> F = InputFile::create(BC);
  if (!F) { /* error handling */ }
  Lto.add(std::move(*F));
}

Seems much simpler to always have a module with index 0.

I don't know what you're referring to here?

As I previously mentioned on this code review:

the special handling for common symbols required in the gold plugin (search for addCommons).

pcc marked an inline comment as done.Jul 15 2016, 3:01 PM
mehdi_amini requested changes to this revision.Jul 15 2016, 3:36 PM
mehdi_amini edited edge metadata.
mehdi_amini added inline comments.
include/llvm/LTO/LTO.h
94 ↗(On Diff #64198)

I agree with you, but until we have a symbol table, there needs to be some way for LTO to access IR entities in the input file.

It is not clear to me why LTO needs to access any IR entity? If you need access an *IR entity* now, how could we don't need access to them tomorrow (<- read: with a symbol table)?

It seems to me that anywhere you're accessing an IR entity directly through Obj right now is a place that should have an API on the InputFile or the Symbol class (like the one that exists: InputFile::getSourceFileName()).

Assuming "Symbol" is abstracting the entries in the symbol table, for instance:

  • Input->Obj->getMemoryBufferRef().getBufferIdentifier(); -> Input->getIdentifier()
  • GlobalValue *GV = Obj->getSymbolGV(Sym.I->getRawDataRefImpl()); -> Symbol *S = Obj->getSymbol(...)
  • GV->hasGlobalUnnamedAddr(); -> needs an API on class Symbol
  • GV->getName(); -> needs an API on class Symbol
  • if (Res.VisibleToRegularObj || (GV && Used.count(GV)) || -> Used contains a GlobalVariable *, how do we go to a symbol table?
  • Module &M = Input->Obj->getModule(); -> used for:
for (GlobalVariable &GV : M.globals())
  if (GV.hasAppendingLinkage())
    Keep.push_back(&GV);

which should iterate on Symbols on the InputFile.

  • LTO::addRegularLto does some symbol resolution using Obj. At this point you probably want to operate on real IR construct anyway, and that's the point where you leave the "InputFile" to go to the IR. I think InputFile should just expose something like a factory std::unique_ptr<Module> parseModule(/* options */);. If it was just for full LTO, that would solve many of the above items as addSymbolToGlobalRes() could take a Module & instead of operating on the private IRObjectFile inside InputFile. However it won't help for ThinLTO.
GlobalValue *GV = Input->Obj->getSymbolGV(Sym.I->getRawDataRefImpl());
if (Res.Prevailing && GV)
  ThinLto.PrevailingModuleForGUID[GV->getGUID()] =
      MBRef.getBufferIdentifier();
  • getGUID() should be implemented on the Symbol directly.
103 ↗(On Diff #64198)

I'm not sure, it seems easy to turn it around: "you are the one proposing an API that is targeted for performance reason with a specific use case (and client) in mind, so the burden should be on you to justify it if you want this to get it in.", otherwise write a cleaner API.

lib/LTO/LTO.cpp
255 ↗(On Diff #64198)

Is is possible for GV to be null here?

280 ↗(On Diff #64198)

Seems much simpler to always have a module with index 0.

"Simpler" from the gold point of view, my point of view is that when adding 1000 ThinLTO modules we'll copy 1000 DataLayout around for no reason.
Which reminds me that I don't expect any object being emitted for this module for a "pure" ThinLTO link.

305 ↗(On Diff #64198)

(Same question here, can GV be null?)

This revision now requires changes to proceed.Jul 15 2016, 3:36 PM
mehdi_amini added inline comments.Jul 15 2016, 6:06 PM
include/llvm/LTO/LTO.h
103 ↗(On Diff #64198)

Running ld64 on the 701 bitcode files that make an LTO build of llc (X86 only), and exiting right before the LTO optimizer starts, i.e. after the LTO merge happens (best of 10 times, on a 4-cores laptop):

  • Parallel resolution (dedicated context, double parsing): 7.362s
  • Sequential resolution (shared context, single parsing): 8.834s
pcc added inline comments.Jul 15 2016, 6:21 PM
include/llvm/LTO/LTO.h
94 ↗(On Diff #64198)

If you need access an *IR entity* now, how could we don't need access to them tomorrow (<- read: with a symbol table)?

The way I see this evolving is that any uses of IR constructs here will be replaced with uses of APIs for accessing the bitcode symbol table (except in places such as LTO::addRegularLto which actually need a Module). The functionality exposed via InputFile and Symbol will remain the primary way by which clients will access the object (i.e. they wouldn't access the symbol table directly).

I would prefer not to expose the properties you propose to expose via InputFile/Symbol because they aren't strictly needed by clients for symbol resolution.

103 ↗(On Diff #64198)

Thanks. To me it isn't clear how much of that is due to parsing as opposed to parallel resolution, but I can accept that this would simplify clients. So I suppose I can try to make the change you suggested.

lib/LTO/LTO.cpp
255 ↗(On Diff #64198)

Yes, if the symbol is defined by inline asm.

280 ↗(On Diff #64198)

I suppose we could avoid creating an empty regular LTO object unless a hook is set.

305 ↗(On Diff #64198)

Ditto

mehdi_amini added inline comments.Jul 15 2016, 7:10 PM
include/llvm/LTO/LTO.h
103 ↗(On Diff #64198)

I chatted with Duncan and thought you may be interested by some history: his recollections of the original motivation for parsing in separate context was not parallelism but the amount of things that leaks to the context, while the modules wouldn't be used if they come from static archives anyway.
Especially at that time global metadata were not lazyloadable.

pcc added inline comments.Jul 15 2016, 7:26 PM
include/llvm/LTO/LTO.h
103 ↗(On Diff #64198)

Interesting. LLD originally used multiple contexts, but now uses a single context because of the reversed tradeoff.
http://llvm.org/viewvc/llvm-project?view=revision&revision=267921

the modules wouldn't be used if they come from static archives anyway.

Huh, I would expect the linker to use the archive symbol table to load only the needed modules.

mehdi_amini added inline comments.Jul 15 2016, 8:40 PM
include/llvm/LTO/LTO.h
103 ↗(On Diff #64198)

LLD originally used multiple contexts, but now uses a single context because of the reversed tradeoff.

Going from 29.86187751s to 29.814533787s does not justify clobbering any API IMO.

pcc added inline comments.Jul 15 2016, 8:43 PM
include/llvm/LTO/LTO.h
103 ↗(On Diff #64198)

Sure. I previously wasn't aware that Rafael had previously measured this.

pcc updated this revision to Diff 64224.Jul 15 2016, 8:46 PM
pcc edited edge metadata.
pcc marked 2 inline comments as done.
  • Remove LTO argument from InputFile::create
  • Only generate code for the regular LTO module when necessary
mehdi_amini added inline comments.Jul 16 2016, 3:47 PM
lib/LTO/LTO.cpp
323 ↗(On Diff #64068)

No std::move on return.

tools/llvm-lto2/llvm-lto2.cpp
164 ↗(On Diff #64068)

Ditto

pcc updated this revision to Diff 64350.Jul 18 2016, 10:49 AM
pcc marked 2 inline comments as done.
pcc edited edge metadata.
  • No std::move on return

Ping - Mehdi, any more comments? I am relying on the follow-on patch D21545 adding support for weak resolution and internalization in the ThinLTO backends.

mehdi_amini added inline comments.Jul 25 2016, 6:54 PM
include/llvm/LTO/LTO.h
126 ↗(On Diff #64350)

What about

return Flags & object::BasicSymbolRef::SF_Global ||
          Flags & object::BasicSymbolRef::SF_FormatSpecific;
lib/LTO/LTO.cpp
281 ↗(On Diff #64350)

This is marked done, but doesn't seem to be?

lib/LTO/LTOBackend.cpp
59 ↗(On Diff #64350)

Deriving an output temp file from a user-supplied *output* is fine, what does not make sense to me is to write stuff where the *input* files are. This is unlike what we usually do (the static archives could be in a read-only system directory for instance, this is quite common on our setup).

You can maintain whatever behavior Gold currently has by passing an option to addSaveTemps maybe. Right now the API being addSaveTemps(std::string OutputFileName) I don't expect files to be written anywhere unexpected.

Also the doxygen is pretty clear about what should be expected from this API:

/// This is a convenience function that configures this Config object to write
/// temporary files named after the given OutputFileName for each of the LTO
/// phases to disk. A client can use this function to implement -save-temps.
pcc updated this revision to Diff 65465.Jul 25 2016, 8:11 PM
  • Refresh
  • Address review comments
  • Implement linked objects file feature
lib/LTO/LTO.cpp
281 ↗(On Diff #64350)

See the first three lines of LTO::run.

lib/LTO/LTOBackend.cpp
59 ↗(On Diff #64350)

I agree with you that we should move to a temp file naming scheme based on output files, however like Teresa I think that should be discussed separately from this patch. I don't think we need a flag either, whatever we decide to do here should apply to all linkers.

Added a FIXME here to change the naming scheme.

pcc marked an inline comment as done.Jul 25 2016, 8:11 PM
mehdi_amini accepted this revision.Jul 31 2016, 3:03 PM
mehdi_amini edited edge metadata.

Last comments, we should be able to iterate in-tree after that.

include/llvm/LTO/LTO.h
310 ↗(On Diff #65465)

Should be RegularLTO I think.

319 ↗(On Diff #65465)

Similarly, should be ThinLTO

368 ↗(On Diff #65465)

addRegularLTO

370 ↗(On Diff #65465)

addThinLTO

373 ↗(On Diff #65465)

runRegularLTO

374 ↗(On Diff #65465)

runThinLTO

lib/LTO/LTO.cpp
243 ↗(On Diff #65465)

HasThinLTOSummary

281 ↗(On Diff #65465)

I see, I was looking for an empty LTO module (RegularLto.CombinedModule) instead of object (as in produced object).

316 ↗(On Diff #65465)

Is this test what you intend? It seems reversed (if so, then it's probably not tested).

318 ↗(On Diff #65465)

This is still gonna copy a few SmallVector + a couple of string + a few other fields. And it's gonna happens for each and every ThinLTO module. This is just not the right place for that. Adding a triple/datalayout for LTO should not be handled here, it is just not the right place. LTO::run could handle it by poking at ThinLto.ModuleMap if needed.

Similarly, RegularLto.CombinedModule should not be created unless explicitly added/requested by the linker (i.e. not call to make_unique<Module>("ld-temp.o", Ctx); in the ctor).

358 ↗(On Diff #65465)

Doc the above tests.

lib/LTO/LTOBackend.cpp
60 ↗(On Diff #65465)

It seems we are not looking at this from the same angle: I was looking at it as a *new* API, not a refactoring of whatever the gold plugin was doing. So I can't see "this is gold's behavior" as a valid motivation here (if you want this to be NFC from the gold point of view, the gold-plugin can be patched in the first place).

This revision is now accepted and ready to land.Jul 31 2016, 3:03 PM

pcc asked me to take over the patch since he is out on vacation this month. Responses below. I have made most of the suggested changes, but not a couple for the reasons stated. Let me see if I can upload the new patch even though pcc created this one.

include/llvm/LTO/LTO.h
310 ↗(On Diff #65465)

Done

319 ↗(On Diff #65465)

Done

368 ↗(On Diff #65465)

Done

370 ↗(On Diff #65465)

Done

373 ↗(On Diff #65465)

Done

374 ↗(On Diff #65465)

Done

lib/LTO/LTO.cpp
243 ↗(On Diff #65465)

Done

316 ↗(On Diff #65465)

I think what was meant here was to check if the RegularLTO.CombinedModule's target triple was still empty, and if so set it. I have changed it to that.

318 ↗(On Diff #65465)

It only does the copy once with the change I made, so there isn't any duplication in the copying anymore. This is the simplest place to set it based on an added ThinLTO module.

I looked at delaying the creation of the CombinedModule, however, it turns out we frequently need it since even if we don't add any regular LTO files, we typically have callback hooks into the linker defined where the linker could add its own resolved symbols to the combined module, in which case it needs to be valid. I don't think we gain much by lazily creating the empty Module.

358 ↗(On Diff #65465)

Done

lib/LTO/LTOBackend.cpp
60 ↗(On Diff #65465)

I removed this path so that the OutputFileName is always used, so it now matches the doxygen comment for the API.

tejohnson commandeered this revision.Aug 10 2016, 5:42 PM
tejohnson edited reviewers, added: pcc; removed: tejohnson.

Taking this over so I can upload new patch.

tejohnson updated this revision to Diff 67642.Aug 10 2016, 5:43 PM
tejohnson edited edge metadata.
  • Address review comments
mehdi_amini added inline comments.Aug 10 2016, 5:45 PM
lib/LTO/LTO.cpp
318 ↗(On Diff #67642)

The ", we typically have callback hooks into the linker defined where the linker could add its own resolved symbols to the combined module" is not clear to me at all (LTOCodeGenerator will never do that AFAIK for example).
(I understand that Gold does something with "commons" at that time, but haven't add time to figure why it is needed)

(Note: I already approved this patch as good enough to be iterated on in-tree, so feel free to commit)

mehdi_amini added inline comments.Aug 10 2016, 5:50 PM
lib/LTO/LTO.cpp
318 ↗(On Diff #67642)

To be more explicit: I don't like this behavior because it creates a relatively strong coupling between the linker and the plugin. The linker changes the module in an unpredictable way that can conflict with assumption that the plugin implementation could make. It makes it harder to follow invariant in the plugin implementation, and makes it easy to break the client of the API (the linker).

tejohnson added inline comments.Aug 11 2016, 8:05 AM
lib/LTO/LTO.cpp
318 ↗(On Diff #67642)

True, this is the case now because gold is the only user. I went ahead and committed (temporarily, as it turned out, will recommit after fixing some bot failures) so that I can also get pcc's follow-on patch in that I need. But will also work up a patch to make the creation of the combined module lazy. I'm not sure how to address your other concern about the linker changing the module in unpredictable ways, but the API does document that the linker may add symbols to the combined module via the callback hooks.

lib/LTO/LTOBackend.cpp
61 ↗(On Diff #67642)

I didn't notice that this change was causing an issue since I had old output files around that caused tests to pass, but this caused a bot failure due to missing output files. The problem is not just that the names are different, but that they become difficult to correlate to the corresponding input source name. E.g. in the gold/X86/thinlto_linkonceresolution.ll test, where we have "%gold ... -o %t3.o %t2.o %t.o", the old .opt.bc temp files were named %t.o.4.opt.bc and %t2.o.4.opt.bc, but with the change I made become %t3.o.1.4.opt.bc and %t3.o.2.4.opt.bc. It isn't at all obvious which output file corresponds to which input file (numbering will depend on the ModuleMap iteration order).

What I did was to revert the change I had made, but add a new bool flag parameter on addSaveTemps (UseInputModulePath) that is set to true by gold and will provoke the old behavior. We can iterate on a better solution. Probably pass in a (temp) directory name and output all the files in a tree rooted there (note that you could have same named module identifiers at different paths, so you can't just disambiguate by appending the basename of the input module).

This revision was automatically updated to reflect the committed changes.
rogfer01 added inline comments.
llvm/trunk/tools/gold/gold-plugin.cpp
95

I think this one should be size_t as lto::InputFile::Symbol::getCommonSize returns size_t and then the std::max in line 629 fails to build in targets where size_t is unsigned int.

Can you confirm if changing to size_t makes sense here? It does fix the build indeed.

If this is OK I can commit the change.