Page MenuHomePhabricator

[ThinLTO] Make weak data symbols prevailing when they're visible to regular object
ClosedPublic

Authored by evgeny777 on Apr 29 2019, 2:15 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

evgeny777 created this revision.Apr 29 2019, 2:15 AM

I'm not convinced this is the right approach. Why isn't r323633 working here ("[ThinLTO] - Stop internalizing and drop non-prevailing symbols")?

It isn't working, because llvm::computeDeadSymbols keeps non-prevailing linkonce_odr and weak_odr symbols live.
It is possible to check for function summary when doing this to fix the issue:

if (S->linkage() == GlobalValue::AvailableExternallyLinkage ||
    ((S->linkage() == GlobalValue::WeakODRLinkage ||
    S->linkage() == GlobalValue::LinkOnceODRLinkage) && isa<FunctionSummary>(S->getBaseObject())))
  KeepAliveLinkage = true;

however this will prevent optimizing out linkonce_odr and weak_odr constants

Looks like I have another idea how this can be fixed. In addition to 'ReadOnly' we can introduce 'CompileTimeConstant' flag in GlobalVarSummary.
Not marking linkonce_odr and weak_odr data symbols which are preserved and are not 'CompileTimeConstant' live will fix PR41645.
Also such flag allows exploring some new optimization opportunities:

  • Compile time constants are read only by definition, so we can always import them, even if they're referenced by some GV initializer.
  • We can import readonly GV with initializer if all references from that initializer are to compile time constants

I don't see this happening in legacy thinLTO code generator but I need to do more digging to see what is the difference. For legacy API, the static variable is not internalized or turned into available_externally.

I agree this doesn't look like the correct fix, especially the check !Sym.isExecutable() which doesn't look correct to me. CompileTimeConstant also sounds weird because it is not something used in the linker semantics. Maybe using ReadOnly bits in summary or UnnamedAddr bits in resolution is a better approach but I am not sure if that is correct in all places. There is also a bit in GlobalResolution VisibleOutsideSummary which sounds like is designed to solve this problem?

I agree this doesn't look like the correct fix, especially the check !Sym.isExecutable() which doesn't look correct to me

The problem is that linkonce_odr functions and data should be IMO treated differently. The linkonce_odr linkage implies linker leaves only one copy of the symbol in the final image.
However it's not really a problem having multiple internal copies of linkonce_odr function because all copies follow one definition rule. This fact allows ThinLTO to efficiently optimize such function calls.
Situation however is different with linkonce_odr objects, because it's wrong to have multiple copies of linkonce_odr object because reads and writes supposed to access same memory will actually become
inconsistent. However there is still a nuance: if object is a compile time constant there can't be any write access to it, so we can safely internalize it.

The change is probably not correct anyway just because making prevailing something which isn't originally marked as prevailing seems like a bad idea.

Maybe using ReadOnly bits in summary

ReadOnly can be used only after computeDeadSymbols is executed, so IMO not gonna work

or UnnamedAddr bits in resolution is a better approach

How this can help?

There is also a bit in GlobalResolution VisibleOutsideSummary which sounds like is designed to solve this problem?

Like I said, the problem, as I understand it, is that linkonce_odr function visible outside summary can be internalized, but non-constant object can't.

Maybe using ReadOnly bits in summary

ReadOnly can be used only after computeDeadSymbols is executed, so IMO not gonna work

We decide what to internalize after computeDeadSymbols, so why wouldn't we be able to use this to block internalization of linkonce_odr global variables (when !ReadOnly)?

I agree this doesn't look like the correct fix, especially the check !Sym.isExecutable() which doesn't look correct to me

The problem is that linkonce_odr functions and data should be IMO treated differently. The linkonce_odr linkage implies linker leaves only one copy of the symbol in the final image.
However it's not really a problem having multiple internal copies of linkonce_odr function because all copies follow one definition rule. This fact allows ThinLTO to efficiently optimize such function calls.
Situation however is different with linkonce_odr objects, because it's wrong to have multiple copies of linkonce_odr object because reads and writes supposed to access same memory will actually become
inconsistent. However there is still a nuance: if object is a compile time constant there can't be any write access to it, so we can safely internalize it.

I think you are making the assumption of the input is C++ and using the one definition rule of C++ to guide the optimization which I don't think it is appropriate here.
If there is some special semantics behind the symbol, it should be expressed in the IR. I could be missing something but I don't think LLVM IR has such semantics behind functions vs. global variables. Maybe you are trying to use function type to infer functions are constants?

The change is probably not correct anyway just because making prevailing something which isn't originally marked as prevailing seems like a bad idea.

Maybe using ReadOnly bits in summary

ReadOnly can be used only after computeDeadSymbols is executed, so IMO not gonna work

or UnnamedAddr bits in resolution is a better approach

How this can help?

unnamed_addr means the address doesn't matter and it should be one of the important reason behind whether something can be internalized or not. If the address of a linkonce_odr function is taken and later used to compare in a different module, you can't internalize the function at all.
I think if it is unnamed_addr constants, it is always safe to internalize but I am not sure just the unnamed_adder is strong enough to indicate that or if we will miss too much assume everything if we skip everything doesn't have unnamed_addr because it might affect linker resolution.

There is also a bit in GlobalResolution VisibleOutsideSummary which sounds like is designed to solve this problem?

Like I said, the problem, as I understand it, is that linkonce_odr function visible outside summary can be internalized, but non-constant object can't.

evgeny777 updated this revision to Diff 198285.May 6 2019, 8:20 AM

We decide what to internalize after computeDeadSymbols, so why wouldn't we be able to use this to block internalization of linkonce_odr global variables (when !ReadOnly)?

Sounds like a good idea. I've updated the review.

tejohnson accepted this revision.May 9 2019, 9:40 PM

Fix LGTM. A couple of requests for comments below. Also, please update title of patch, and add a description to the patch summary.

lib/LTO/LTO.cpp
383 ↗(On Diff #198285)

add comment

test/ThinLTO/X86/weak_externals.ll
9 ↗(On Diff #198285)

Add comment about what this test is checking for.

This revision is now accepted and ready to land.May 9 2019, 9:40 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMay 13 2019, 4:50 AM