Depends on D26951
The error message could be a bit more specific here.
We load the first BitcodeModule, should we assert that this is the only one? How do we know this is the right one?
Since we have the two, it seems worth documenting these.
This is a pretty annoying API with ResI as an InOut parameter: at least document it.
Add a comment to explain why it is out-of-line.
Some high level one or two sentences commenting what is going to happen below in the loop would be welcome I think.
Is there an expectation that no symbol can be duplicated in the various modules?
You can create an InputFile where we wouldn't find any module, if you don't want to support that we should detect it in the create() and return an error there. Otherwise this is UB to be called on a valid created InputFile
Add a comment to explain why it is out-of-line.
We'll arrive here from for (InputFile::InputModule &IM : Input->Mods) in LTO::add(), how are we distinguishing BM.getModuleIdentifier() when there are multiple modules per file?
I added some code that makes sure that we load the right module.
If you mean that the modules may not both contain a definition of the symbol, then yes. The entity creating these bitcode files is expected to adhere to this policy. (The same applies to definitions of comdats, so the code on lines 215-221 below remains the same.)
It's possible to have a defined symbol in one module and a duplicate undefined symbol with the same name in another module; the situation is similar to module inline asm right now (i.e. PR30396). Whatever solution we come up with for that problem should also apply to this one.
Added a check to create().
BM.getModuleIdentifier() will be the same for each BitcodeModule in the same input file by construction, so we can just test the return value of insert here.
(I feel obliged to point out that this is yet another scenario where (path, byte slice) pairs would work better as a representation of ThinLTO modules.)
What if I.first was already present in the map? It seems the same case as below for ThinLTO where we should check that the returned value of insert().second .
What is the other scenario? I don't remember what was specific to ThinLTO for this?
Also a pair path/byte slice doesn't fit the API in general, does it? We don't necessarily have a "file".
That doesn't seem like it can happen. The range for loop starting on line 758 enumerates elements of a map, so each I.first() will be unique. We also break out of this loop as soon as we see a module with a summary.
The main other scenario is (non-thin) archive files, which is already a problem as we don't handle that correctly in the distributed case, and we have workarounds in both the gold plugin and lld to give -save-temps temporaries appropriate names. This is ThinLTO-specific because we don't create temporaries for each input file under regular LTO.
True, but it's also the case that we don't necessarily have a "file" now and we're already passing paths around. I imagine that clients which don't use files could just make something up for the offsets as they would make something up for file paths.
I think that could work as well.
In the C API (and thus in ld64), the expectation is that every input buffer has to be provided with a unique id. It happens that this ID is usually the path on file (with libFoo.a(member.o) when a static archive is involved).
Having multiple modules per file is new though and would break this "unique ID" provided by the linker, we could suffix it when loading each individual bitcode/module though.
The --save-temps is always using integer right now I believe (files are 1.thin.o, 2.thin.o, etc.).
That's just us being inconsistent, I tried to express "ModuleID" instead of path as much as possible.
I think @davide had a case where an archive had two members with the same name, and they were only distinguishable by offset.
Not always: http://llvm-cs.pcc.me.uk/lib/LTO/LTOBackend.cpp#77
We probably want to overhaul how the filenames look for save-temps, but I think in the end they should contain the module ID in some form.
Yes, thin-archivecollision.ll in lld/test/lto is an example, but it could be even worse because the two members can be part of the same archive. I find it weird that's possible to create an archive with two distinct members and the same name, but apparently ar consider this a legitimate operation.
I remember this, I wouldn't be too much concerned *here* with this and push to the client the responsibility to provide unique IDs. (Nothing prevent the linker to build an id that always include the offset for instance)
Well, in the distributed case some component needs to understand the module IDs in order to distribute the work properly. Part of that may involve translating the module IDs into "file paths" (which may be actual file paths or conceptual ones). If we can arrange to use the same ("file path", offset) scheme in all linkers, that component can be shared between linkers.
But to a certain extent this is all hypothetical, I think I'd want to prototype before being sure that this is the right design.