Passing llvm::Linker::Flags::InternalizeLinkedSymbols to the IR linker causes the linked symbols to be internalized, including the global variables llvm.global_ctors, llvm.global_dtors, llvm.used and llvm.compiler.used. This results in the module not being valid anymore. In particular it triggers the assertion "invalid linkage for intrinsic global variable" which checks that these GVs either don't have an initializer or have appending linkage.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Test case?
I'm not familiar with when the InternalizeLinkedSymbols is passed to the module linker. A quick search shows that it is typically passed for GPU code. What is the use case?
Can the users of this instead run llvm::internalizeModule on the linked module? Note that this will invoke InternalizePass::internalizeModule, which in turn already has handling for these special values (see the AlwaysPreserved inserts. This routine also adds things in the llvm.used set to the AlwaysPreserved set, which is the correct thing to do, and is missing from the code in the ModuleLinker that is force internalizing everything currently.
Won't the pass internalize the whole module rather than only the linked values? Would it make sense to extend the InternalizePass::internalizeModule API to take a list of used global variables and call it from the module linker to take care of the internalization?
I'll add a test if we move forward with the change. My use case is not GPU related, rather something LTO like where we have bitcode modules with clashing function/symbol names but different implementations. The problem is that d we don't know which those will be before linking, so we want to internalize them.
llvm::internalizeModule takes a callback that can be used to indicate that a GV should not be internalized (MustPreserveGV), so presumably you could use that. I think the right way to do this is from the ModuleLinker clients, not from in the ModuleLinker itself (i.e. remove the InternalizeLinkedSymbols flag and related handling). In fact, you would end up with a circular dependence calling this from the ModuleLinker, since internalizeModule is in Transforms/IPO which has a dependence on Linker.
I'll add a test if we move forward with the change. My use case is not GPU related, rather something LTO like where we have bitcode modules with clashing function/symbol names but different implementations. The problem is that d we don't know which those will be before linking, so we want to internalize them.
So once they are internalized, the clashing symbols are renamed automatically as they are linked in, I assume? If they are external to start with, how would the linking work in non-LTO mode?
I can see how this feature can be useful, but as Teresa mentioned it seems easier from a client point of view to:
- Collect the list of symbol in the current module in a Set
- Call the IRLinker
- Invoke internalizeModule using the Set from 1) as "preservedGlobals".
That said the IRLinker could do this itself, so that llvm::Linker::Flags::InternalizeLinkedSymbols would still be exposed, if is wasn't for the circular dependency mentioned above. We could solve the dependency "easily" by injection: changing the IRLinker to take a callback to perform internalization for stage3. That would allow the client to pass in a pointer to internalizeModule that the IRlinker could use during 3) above.
@tejohnson The symbols are local, so it doesn't cause an issue with regular linking. It only happens when we throw all the modules together.
@mehdi_amini: Is this what you had in mind?
I've run the tests and this implementation doesn't break any existing tests regarding linking with the internalize flag.
I'm missing something - if the symbols are already local, why is internalization needed? A test case would help.
But this approach looks much better, thanks!
@mehdi_amini: Is this what you had in mind?
I've run the tests and this implementation doesn't break any existing tests regarding linking with the internalize flag.
There's a user in clang (under OPT_mlink_cuda_bitcode), I would think it would lose the internalization without providing a callback. Maybe there is no test for this case?
lib/Linker/LinkModules.cpp | ||
---|---|---|
110 ↗ | (On Diff #91192) | Think the default param should be "{}" for consistency with linkInModules and with check below for a non-null InternalizeCallback. |
554 ↗ | (On Diff #91192) | Presumably the flag can go away, it is subsumed by the presence of the callback. There are a couple of uses in clang itself that will need to change to pass in a callback. |
tools/llvm-link/llvm-link.cpp | ||
44 ↗ | (On Diff #91192) | Don't clang format the whole file with your patch. You can commit a separate patch with just the clang formating, or just clang format your changes. |
Thanks for the feedback!
I have a test case on my work computer, I'll add it tomorrow.
But this approach looks much better, thanks!
@mehdi_amini: Is this what you had in mind?
I've run the tests and this implementation doesn't break any existing tests regarding linking with the internalize flag.
There's a user in clang (under OPT_mlink_cuda_bitcode), I would think it would lose the internalization without providing a callback. Maybe there is no test for this case?
That's my bad, I only ran the LLVM tests. I've removed the flag and updated the code in clang (D30792), which is now also passing all tests. I made sure there was an actual test for this by having the callback always return false, which caused some to fail.
Thanks, a couple more questions.
tools/llvm-link/llvm-link.cpp | ||
---|---|---|
326 ↗ | (On Diff #91218) | Why is this needed since linkInModule has a default for the InternalizeCallback parameter? |
329 ↗ | (On Diff #91218) | Previously the internalize flag was set before invoking this function from main(). With this being set after the first linkInModule it seems like a behavior change, or am I missing something? |
tools/llvm-link/llvm-link.cpp | ||
---|---|---|
326 ↗ | (On Diff #91218) | I'm sorry, I don't understand the question. The function is called with only two parameters so the default argument is used for the callback. No callback means no internalization, so thats what we want in the else branch? Am I overlooking something? |
329 ↗ | (On Diff #91218) | Indeed, however on line 275 it is cleared on the first iteration by AND'ing it with overrideFromSrc flag. |
tools/llvm-link/llvm-link.cpp | ||
---|---|---|
99 ↗ | (On Diff #91192) | Please don't mix these formatting change. Did you run clang-format on the full file to have these? Usually git clang-format will format only the part of the code you change. |
319 ↗ | (On Diff #91192) | Usually we avoid using a Pass outside of the PassManager, can you just call internalizeModule with your callback? Also any reason you're taking the StringSet by value instead of const ref? |
tools/llvm-link/llvm-link.cpp | ||
---|---|---|
326 ↗ | (On Diff #91218) | You are right, nevermind this comment! Looked at it too fast. |
tools/llvm-link/llvm-link.cpp | ||
---|---|---|
329 ↗ | (On Diff #91218) | Ah, that's what I was missing. Please add a comment about why this isn't being applied to the first iteration. |
Is the callback really necessary? I don't see anything fundamentally wrong with the first version of the patch.
The fix can be simplified to
if (P.first().startswith("llvm.")) continue;
That is two lines of code. Even if we simplified the internalize pass in the same way, we should not add this much complexity just to avoid duplicating two lines.
I wouldn't want to see this, this is still duplicating logic. I'd be OK with a helper function`legalToInternalize(GlobalValue &)` that would factor out the logic to decide this.
Sure that is simple from a # lines standpoint, but the internalizer already had all the right logic, so we wouldn't have had this issue if we used internalizeModule to start with. My original suggestion was to move it out of here completely and have the clients invoke internalizeModule directly if they want to internalize, since I'm not sure the module linker should be in the business of doing internalization.
Works for me, I guess.
I'm not sure about that either. We may want to do something like what you propose, but it seems orthogonal to fixing the bug.
- Feedback from Teresa and Mehdi
Regarding Peter's comment: While I like the idea of the legalToInternalize, I'm not convinced that it's really better than the callback. Most of the code in InternalizePass is concerned with deciding which GVs are legal to be internalized. Performing the actual internalization is only a small part, so why not keep a unified interface like it is today? Either way you'll need info from the linker about what symbols to internalize, and the callback nicely combines (1) indicating whether it is necessary and (2) performing it.
PS: I'm working on a test case.
I remembered the other reason I wanted the internalizer to do this - if we have a llvm.used, then likely there is a GV in its aggregate that shouldn't be internalized. The internalizer will handle this appropriately. It also can't be (efficiently) encapsulated in a per-GV "legalToInternalize" helper.
I was thinking more last night about just having the clients call internalizeModule() after module linking (directly, not from a callback) , but the issue becomes that the symbols that were previously in the combined module shouldn't be internalized. So you either need a mechanism for extracting the final value of ValuesToLink from the module linker, and only internalize those, or you need to build a set of defined values in the combined module before calling linkInModule, and use it to disallow internalization of those symbols. I think the callback approach is better, since the module linker has the best knowledge of *which* GVs should be candidates for internalization (aside from the legality of internalization, which is what is being fixed by using internalizeModule).
PS: I'm working on a test case.
Please include one that has a GV listed in a llvm.used or llvm.compiler.used, and make sure it isn't internalized.
Will take a look at the new version of the patch in a little bit. Thanks!
clang/include/clang/CodeGen/CodeGenAction.h | ||
---|---|---|
40 ↗ | (On Diff #91525) | Something I completely missed while reviewing D30792 - I don't see how this is ever getting set. It looks like this line would need to be modified: (From http://llvm-cs.pcc.me.uk/tools/clang/lib/CodeGen/CodeGenAction.cpp#819) LinkModules.push_back( {std::move(ModuleOrErr.get()), F.PropagateAttrs, F.LinkFlags}); That's where the values from BitcodeFileToLink, which has the new Internalize flag being set CompilerInvocation.cpp, should be used to initialize the new LinkModule object. It looks like there are tests in tools/clang/CodeGenCUDA/ that are checking to see if internalization happened properly and should fail. Ah - because of the field ordering, I think the LinkFlags value is essentially being applied to the Internalize flag, so it may be getting lucky on that front. But then I would think the LinkFlags field would be uninitialized, and the desired flag of llvm::Linker::Flags::LinkOnlyNeeded not being set in the CUDA case. Can you look at why test/CodeGenCUDA/link-device-bitcode.cu, which appears to test for both internalization and the only needed linking, isn't failing? If the test is not sufficient, please augment it so that it is failing due to this problem. |
llvm/test/Linker/link-flags.ll | ||
25 ↗ | (On Diff #91525) | Suggest making the type of the new @foo different, so it is clear which one is internalized vs not. |
Addressed code review comments from @tejohnson. Thanks again for reviewing!
clang/include/clang/CodeGen/CodeGenAction.h | ||
---|---|---|
40 ↗ | (On Diff #91525) | Thanks! I remember going over this line and telling myself "don't forget to add the new field here" but apparently I still forgot. The test wasn't triggered because I didn't have the NVPTX target enabled. With all target enabled it failed, as expected. |