This is an archive of the discontinued LLVM Phabricator instance.

[MergeFunctions] Preserve symbols used llvm.used/llvm.compiler.used
ClosedPublic

Authored by Amanieu on Jun 14 2022, 8:13 AM.

Details

Summary

llvm.used and llvm.compiler.used are often used with inline assembly that refers to a specific symbol so that the symbol is kept through to the linker even though there are no references to it from LLVM IR. This fixes the MergeFunctions pass to preserve references to these symbols in llvm.used/llvm.compiler.used so they are not deleted from the IR. This doesn't prevent these functions from being merged, but guarantees that an alias or thunk with the expected symbol name is kept in the IR.

Diff Detail

Event Timeline

Amanieu created this revision.Jun 14 2022, 8:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2022, 8:13 AM
Amanieu requested review of this revision.Jun 14 2022, 8:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2022, 8:13 AM
nikic added inline comments.Jun 14 2022, 8:26 AM
llvm/lib/Transforms/IPO/MergeFunctions.cpp
860

So does this just leave the old function behind, effectively not doing a merge? I think what you actually want is to go into the alias/thunk code path in this case.

llvm/test/Transforms/MergeFunc/merge-used.ll
7

Please include full check lines (update_test_checks.py with --check-globals flag), it's unclear what the result here is.

Amanieu updated this revision to Diff 436814.Jun 14 2022, 9:21 AM

Updated test.

llvm/lib/Transforms/IPO/MergeFunctions.cpp
860

The writeThunkOrAlias call below will replace the old function with a thunk or alias. Adding the old function back to llvm.used will cause the G->use_empty() condition below to fail which prevents the function from being removed from the IR.

nikic added inline comments.Jun 14 2022, 9:58 AM
llvm/lib/Transforms/IPO/MergeFunctions.cpp
860

I see. In that case I'd suggest to move a condition (on a single combined set) next to G->hasGlobalUnnamedAddr(), so we only do replaceDirectCallers() replacement. This might replace the function in slightly less cases, but I'd prefer that over the explicit manipulation of the used globals list here.

Amanieu updated this revision to Diff 436888.Jun 14 2022, 12:27 PM
Amanieu marked an inline comment as done.
Amanieu marked 2 inline comments as done.
nikic accepted this revision.Jun 14 2022, 12:39 PM
nikic added reviewers: MaskRay, efriedma.
nikic added subscribers: MaskRay, efriedma.

This looks good to me. Maybe @MaskRay or @efriedma want to take a look: I'd mainly like confirmation that the current transform (merging an unnamed addr, internal linkage function that is part of llvm.used by replacing all uses) is indeed illegal, because I'm a bit hazy on the precise guarantees in this area.

llvm/test/Transforms/MergeFunc/merge-used.ll
15

Might make sense to including another function that calls @a, just to show that that call does get replaced with @b and doesn't call into the thunk.

This revision is now accepted and ready to land.Jun 14 2022, 12:39 PM

We can't rename or erase a function used by @llvm.used; the symbol has to exist for use by inline asm etc. So this is correct in that respect.

llvm/lib/Transforms/IPO/MergeFunctions.cpp
841

I'm a bit suspicious of the hasGlobalUnnamedAddr() check here. hasGlobalUnnamedAddr() generally implies it's legal to merge two constants/functions, but we don't actually figure out if we're going to perform an actual merge at this point; that doesn't happen until we try to discard it, or call writeThunkOrAlias. (unnamed_addr only allows rewriting all the uses of a variable/function at once, not just some of them. Otherwise you end up with weird inconsistencies where a value isn't equal to itself.)

Amanieu updated this revision to Diff 436964.Jun 14 2022, 3:44 PM
Amanieu added inline comments.
llvm/lib/Transforms/IPO/MergeFunctions.cpp
841

I think the code is valid as it is:

  • If hasGlobalUnnamedAddr() == true then we replace all uses of the function with replaceAllUsesWith.
  • Otherwise we only replace direct caller, which is equivalent to inlining the thunk/alias.
efriedma added inline comments.Jun 14 2022, 4:18 PM
llvm/lib/Transforms/IPO/MergeFunctions.cpp
841

replaceAllUsesWith() only replaces uses in the current module, not uses in other modules. So despite the name, it's not replacing "all" the uses in the sense we need here.

If the function has internal linkage, we're fine because we're actually rewriting all the uses. If we replace the function with an alias, we're fine because every use refers to the same address. Otherwise, there's a problem: we're replacing the uses in the current module, but not the uses in other modules.

(Making sure unnamed_addr works correctly in this respect is more likely to be a practical issue with constant variables, as opposed to functions, but I'd rather keep this consistent if possible.)

MaskRay added inline comments.Jun 14 2022, 9:54 PM
llvm/test/Transforms/MergeFunc/merge-used.ll
34

Add a comment that this is preserved due to llvm.compiler.used

MaskRay added inline comments.Jun 14 2022, 10:25 PM
llvm/lib/Transforms/IPO/MergeFunctions.cpp
841

hasGlobalUnnamedAddr was from D34805. So what's the recommendation here? Further restrict the condition to local linkage functions?

llvm/test/Transforms/MergeFunc/merge-used.ll
2

Use -passes=mergefunc

nikic added inline comments.Jun 15 2022, 1:34 AM
llvm/lib/Transforms/IPO/MergeFunctions.cpp
841

Trying to understand what the concern here is based on an example. Would it be something like this https://llvm.godbolt.org/z/e3WYWr69P? Where in this module we observe that @a == @b, and then we have another module with linkonce_odr a,b (and lets say those definitions get picked by the linker) and we might observe @a != @b in that module?

efriedma added inline comments.Jun 15 2022, 2:20 PM
llvm/lib/Transforms/IPO/MergeFunctions.cpp
841

Trying to understand what the concern here is based on an example.

The fundamental invariant we want to ensure here is basically @a == @a. Or more generally, "%a == %a" for any pointer that isn't poison/undef. This should remain true even if one side of the comparison comes from a different module.

That also implies @a == @b is the same in all modules.

For a real-world example of how things can explode if unnamed_addr is mishandled in this respect, see https://github.com/llvm/llvm-project/issues/32127 .

hasGlobalUnnamedAddr was from D34805. So what's the recommendation here? Further restrict the condition to local linkage functions?

Local linkage should be safe to optimize like this. Situations where canCreateAliasFor() is true should also be safe. (Assuming the global isn't referred to by llvm.used.)

Amanieu marked an inline comment as done.Jun 15 2022, 6:26 PM
Amanieu added inline comments.
llvm/lib/Transforms/IPO/MergeFunctions.cpp
841

I'm not sure what solution you are suggesting here exactly, but it seems to me that this issue was already present beforehand even without this change. Should this be handled separately so this bugfix can land?

nikic added inline comments.Jun 16 2022, 1:11 AM
llvm/lib/Transforms/IPO/MergeFunctions.cpp
841

Yes, I think this is fine to land, these problems aren't really related.

I don't really see a way to resolve @efriedma's issue short of restricting to local linkage. Even if canCreateAliasFor() return true that alias will simply get dropped later, if the function has discardable linkage.

efriedma added inline comments.Jun 16 2022, 10:56 AM
llvm/lib/Transforms/IPO/MergeFunctions.cpp
841

Sure, I won't object to landing this as-is.

This revision was landed with ongoing or failed builds.Jun 16 2022, 1:36 PM
This revision was automatically updated to reflect the committed changes.