This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Internalize read only globals
ClosedPublic

Authored by evgeny777 on Jul 16 2018, 12:09 AM.

Details

Summary

One of annoying problems in ThinLTO is that variable promotion can prevent optimiser from constant folding/propagation. Consider
following C program

main.c

int main() {
  int foo();
  return foo();
}

foo.c

#include <stdlib.h>

static int gFoo = 1;

int foo() {
  return gFoo;
}

void bar() {
  gFoo = rand();
}

Note that bar() is dead, so variable gFoo is never really modified. However promotion of gFoo to hidden global makes it impossible for optimizer to mark it constant and fold it afterwards.

To overcome this problem I suggest to introduce concept of "constant reference" which implies all accesses to a given global from a given function are non-volatile loads. With this we can determine which variables will become read-only after DCE and convert them to constants with a special LLVM pass.

This patch implements first part of this approach. It doesn't have good test cases yet - for now I'm just interested of your opinion.

Diff Detail

Event Timeline

evgeny777 created this revision.Jul 16 2018, 12:09 AM

Awesome, thanks! This was future work that we didn't have the bandwidth to address. A few comments/suggestions:

I think it would be cleaner/clearer to encode this info along with the Ref edges. Similar to how there is callee info for each call edge, we should have a ref info that for now can just be this bit.

When importing a variable definition when the ref has this bit set, it can be imported as a local copy, without promoting the name/linkage type. That will avoid the need to do any modification to the optimization passes. On the exporting side, we could do prevent the promotion if we knew that all modules importing a reference also imported the referenced variable definition. It does however look like we should be importing all non-interposable linkage constant variables that are referenced, which should be all we care about for this anyway.

Will we miss some cases if the reference is on something like a bitcast that feeds a non-volatile load?

When importing a variable definition when the ref has this bit set, it can be imported as a local copy, without promoting the name/linkage type. That will avoid the need to do any modification to the optimization passes. On the exporting side, we could do prevent the promotion if we knew that all modules importing a reference also imported the referenced variable definition. It does however look like we should be importing all non-interposable linkage constant variables that are referenced, which should be all we care about for this anyway.

What if the constant is really big? Looks like the importing decision has to be made by the thin linker directly. You can not import the "constant" global as something like "available_externally constant" because the original copy might not be visible from the original module (just like the example), unless you want to modify the original copy to be "external hidden".

When importing a variable definition when the ref has this bit set, it can be imported as a local copy, without promoting the name/linkage type. That will avoid the need to do any modification to the optimization passes. On the exporting side, we could do prevent the promotion if we knew that all modules importing a reference also imported the referenced variable definition. It does however look like we should be importing all non-interposable linkage constant variables that are referenced, which should be all we care about for this anyway.

What if the constant is really big? Looks like the importing decision has to be made by the thin linker directly. You can not import the "constant" global as something like "available_externally constant" because the original copy might not be visible from the original module (just like the example), unless you want to modify the original copy to be "external hidden".

Not sure I understand your question, can you clarify the concern? Right now, computeImportForReferencedGlobals during the thin link we import any referenced variable, unless it has interposable linkage or references other summaries (in which case it is not a constant). The imported variables are imported as available externally definitions. The original definition is currently promoted to external hidden.

When importing a variable definition when the ref has this bit set, it can be imported as a local copy, without promoting the name/linkage type. That will avoid the need to do any modification to the optimization passes. On the exporting side, we could do prevent the promotion if we knew that all modules importing a reference also imported the referenced variable definition. It does however look like we should be importing all non-interposable linkage constant variables that are referenced, which should be all we care about for this anyway.

What if the constant is really big? Looks like the importing decision has to be made by the thin linker directly. You can not import the "constant" global as something like "available_externally constant" because the original copy might not be visible from the original module (just like the example), unless you want to modify the original copy to be "external hidden".

Not sure I understand your question, can you clarify the concern? Right now, computeImportForReferencedGlobals during the thin link we import any referenced variable, unless it has interposable linkage or references other summaries (in which case it is not a constant). The imported variables are imported as available externally definitions. The original definition is currently promoted to external hidden.

Nevermind, I didn't realize we are already doing that. I guess we are all fine and this is less complicated than I think it is. Sorry for the noise.

I think it would be cleaner/clearer to encode this info along with the Ref edges

This would be a bit of overhead for global to global refs, wouldn't it? We can't make any assumption about them until we traverse an entire index.

When importing a variable definition when the ref has this bit set, it can be imported as a local copy, without promoting the name/linkage type

What if we have non-static variable? We can't make a local copy of it, but opt pass can still handle this case if there are no external references

Will we miss some cases if the reference is on something like a bitcast that feeds a non-volatile load?

I'll check

I think it would be cleaner/clearer to encode this info along with the Ref edges

This would be a bit of overhead for global to global refs, wouldn't it? We can't make any assumption about them until we traverse an entire index.

My understanding of your patch is that there would be a 1-1 association between each bit in the RefAccessBits BitVector and ref edges on that same function. I think my suggestion just reorganizes that slightly by moving the bit onto the ref edge. Does that make sense or am I missing something?

When importing a variable definition when the ref has this bit set, it can be imported as a local copy, without promoting the name/linkage type

What if we have non-static variable? We can't make a local copy of it, but opt pass can still handle this case if there are no external references

If it is non-static to start with, but is globally read-only, then it can be imported as a local copy. In order to do something with an opt pass, don't you also need global analysis during the thin link to ensure there are no writes anywhere?

Will we miss some cases if the reference is on something like a bitcast that feeds a non-volatile load?

I'll check

My understanding of your patch is that there would be a 1-1 association between each bit in the RefAccessBits BitVector and ref edges on that same function

Correct. But we have RefEdgeList in a base class (GlobalValueSummary) and there could be references between 2 globals:

static int Foo;
static int *Bar = &Foo;

It doesn't make sense to have const attribute for such kind of references, because we need to know how Bar is accessed before we can say anything about Foo.
That's why I put access bit list to derived class (FunctionSummary). IMO converting std::vector<ValueInfo> refs; to std::vector<EdgeTy> would be a waste of memory
(though not too big).

If it is non-static to start with, but is globally read-only, then it can be imported as a local copy

This also implies internalizing global in the source module, correct?

In order to do something with an opt pass, don't you also need global analysis during the thin link to ensure there are no writes anywhere?

True, but for some reason I thought that global var fixups to BC module would be easier to implement. I'll give "local copy" approach a try and see how it goes.

My understanding of your patch is that there would be a 1-1 association between each bit in the RefAccessBits BitVector and ref edges on that same function

Correct. But we have RefEdgeList in a base class (GlobalValueSummary) and there could be references between 2 globals:

static int Foo;
static int *Bar = &Foo;

It doesn't make sense to have const attribute for such kind of references, because we need to know how Bar is accessed before we can say anything about Foo.
That's why I put access bit list to derived class (FunctionSummary).

That's true. Actually, thinking more about the above example - presumably your global analysis on these new const bits would have to look at the references on global variables, and treat that as a non-const ref and basically mark the referenced GUID as non-const, to represent the address as escaping, right? And you will need a way to communicate this info to the backends - will you just clear the const bit on all the referring function summary RefAccessBits?

IMO converting std::vector<ValueInfo> refs; to std::vector<EdgeTy> would be a waste of memory
(though not too big).

Note it would be a different type than EdgeTy which is for calls, but yeah.

If it is non-static to start with, but is globally read-only, then it can be imported as a local copy

This also implies internalizing global in the source module, correct?

To get the optimization in the original source module, yes, but that's only correct if all referencing modules import the def. Which I think is already the case for non-interposable constants.

In order to do something with an opt pass, don't you also need global analysis during the thin link to ensure there are no writes anywhere?

True, but for some reason I thought that global var fixups to BC module would be easier to implement. I'll give "local copy" approach a try and see how it goes.

That's true. Actually, thinking more about the above example - presumably your global analysis on these new const bits would have to look at the references on global variables, and treat that as a non-const ref and basically mark the referenced GUID as non-const, to represent the address as escaping, right?

For now I implemented constant propagation in ModuleSummaryIndex which works in three steps:

  1. Set Constant = true in all GlobalVarKind GVS and to false in all others
  2. Iterate through all live function summaries and propagate Constant = false over each non-constant reference
  3. Iterate through all live GlobalVarKind summaries with Constant = false propagate it over all references to other global variables using a work queue.

Also I added few small tweaks to internalize global w/o external usage when Constant is true and produce a local copy during import phase.
I didn't have chance to test it on something real though (just small toy example).

asl added a subscriber: asl.Jul 19 2018, 6:09 AM

I've updated the review with the full version. I've tested on few large applications and results are quite promising in terms of both performance and image size

Will we miss some cases if the reference is on something like a bitcast that feeds a non-volatile load?

I don't think so. Actually this logic conforms to GlobalStatus computation in analyzeGlobalAux, which is the core of globalopt

evgeny777 updated this revision to Diff 158785.Aug 2 2018, 9:55 AM
  1. Now importing globals list from DICompileUnit in IRMover. This is needed because we may internalise GV in destination module, so it can reach final link.
  1. Fixed bug which caused llvm-lto internalising exported symbol with -thinlto-action=import
  1. Overall approach was reimplemented in much simpler way
  1. Added extra tests.
evgeny777 planned changes to this revision.Aug 16 2018, 4:11 AM

Found few issues with patch

evgeny777 requested review of this revision.Oct 16 2018, 10:23 AM
evgeny777 retitled this revision from [ThinLTO] Compute constant references to [ThinLTO] Internalize read only globals.Oct 16 2018, 10:24 AM

Finally I have some time to work on this again
I've simplified a patch to a large extent and tested it on a number of real applications (see below)

Algorithm description

Basically this patch does the following:

  • Computes constant references from all functions seen on analysis phase
  • Implements constant propagation in combined summary, which allows calculating variables which are never changed in the program
  • On the beginning promote phase marks such variables with a special attribute ("thinlto-immutable") in IR if it is possible to internalize them.
  • After the end of the import phase internalizes all variables with thinlto-immutable attribute in IR

Key features:

  • While computing function references patch postpones processing non-volatile load instructions till all other instructions have been processed. This allows grouping constant refs in the end of RefEdges array and we need only 32 bit unsigned integer value per each function summary to represent all constant references in it.
  • Patch can internalize both promoted static variables and variables with external or linkonce linkage. Comdats are not currently supported, but can be done as well.
  • BC files after each of the stages (promote/internalize/import e.t.c) are consistent and can be sent to legacy builder for further processing.

Testing

I've tested it on some of real world applications:

LLVM testsuite

Tests pass with following code size improvements/regressions:


Size regressions are caused by more agressive inlining caused by lower inlining costs
due to constant folding. Can provide size analysis data on request

LLVM/clang

LIT tests pass on newly built toolchain. Unfortunately only few variables were internalized,
so no noticable impact was observed

Mozilla Firefox

Produced working image and make check shows all tests passed.

Chromium

Produced working image of both chrome and unit_tests. Some tests fail, but I see no difference
with an image obtained from "regular" build.

Other

I observed 1-2% raise of performance on some of our in-house projects

Missing features

  • ThinLTO incremental build *may* fail with link error. To avoid this HasComdat flag has to be added to GlobalValueSummary and used in module hash computation.
  • Some Bitcode tests fail due to version bump and extra field in FS_PERMODULE and FS_COMBINED
  • Patch lacks tests for various corner cases

This won't take long to fix, but I'd appreciate some feedback for overall approach, while I'm doing this

Great! I only had a chance to take a cursory look so far (attending the llvm dev meeting), but a few questions/comments based on a very quick look this morning.

One high level comment is that it would be better to mark the linkage type on the index during the thin link when we can internalize, rather than have that logic in the back end as in this patch:

You'd need to either add the comdat flag to the summary (like you mentioned is needed anyway for incremental builds), or better yet could you just conservatively mark the GlobalVarSummary as non-constant during the compile step module summary analysis (i.e. instead of initializing the Constant bit to false, init it to true for all global variables except those with comdats - then your thin link Step 1 would just set it to false if it's in the GUIDPreservedSymbols set and otherwise not change the flag). This should fix your incremental build issue too.

Then if the thin link step marks those that are constant with internal linkage in the index and applies that in the backend, it would be consistent with the way we do other summary based internalization.

The main thing that I think would need to be changed is in shouldPromoteLocalToGlobal for the importing side we would need to look at the linkage type in the index rather than assuming that anything that is imported must be promoted. (In fact I think that making this change to shouldPromoteLocalToGlobal on the importing side to get the linkage type from the index would work today, since presumably any imported locals are already marked as ExternalLinkage in the index since we currently promote them all.) For the exporting side we already set the linkage type based on what is set in the index, so that would presumably stay the same.

I may be missing something that makes this problematic though, let me know what you think.

A couple other comments below.

include/llvm/IR/ModuleSummaryIndex.h
266

Maybe have this flag just on the GlobalVarSummary? Not needed on the function summary, and presumably the alias summary could just look through to the base object summary.

513

Think it would be clearer to have a ref edge type and include a flag there.

lib/IR/ModuleSummaryIndex.cpp
140

Why? The refs on a global variable are those referenced where it is defined/initialized. Presumably if it is written later it doesn't affect those references.

Hmm, what if we store the address of another static global var into a pointer global variable, and then write through that pointer. How is that case handled by the patch? The load will be to the pointer, so presumably it will not be mutable here. I.e.:

static int A = 1;
static int *B = &A;
void foo() {

*B = 0;

}

In this case, there will be a load of B in foo, so B is constant and not mutable. I might be missing how this is handled, but I don't see where/how A will be marked mutable. I think you might need to be conservative and treat all refs on other global variables as potentially written?

If the address was taken in a function I think you would be ok since it would be referenced by a non-load.

evgeny777 added a comment.EditedOct 18 2018, 4:42 AM

In fact I think that making this change to shouldPromoteLocalToGlobal on the importing side to get the linkage type from the index would work today

Yep, this seems obvious and the very first version of this patch uses this kind of approach. Unfortunately it didn't work quite well for me. Modifying linkage in the index will work fine on promotion and
internalization phases. However it's the import phase where all bad things happen. When performing import ThinLTO loads source module and does specific modifications in it given list of globals to import
Those modifications include renaming globals to avoid naming conflicts and setting linkage of imported entities to available_externally. After that IRMover performs all the dirty work.

The main problem in this scheme is handling read only external globals. We can't set internal linkage for them in renameModuleForThinLTO, because IRMover will fail to link variable definition to
it's external declaration. These two variables will be considered different:

@g = internal global i32 0
@g = external global i32

IRMover will rename internal definition, leaving external declaration unsatisfied. This will obviously result in link error.

So while it's possible to use proposed approach to deal with promoted static variables, one still needs some processing in the backend to deal with read only external globals. Still backend approach can deal with both cases, so I used it.

include/llvm/IR/ModuleSummaryIndex.h
266

Makes sense, I'll give it a try

513

What about memory usage? Also, like I said, we have all constant references grouped at the end of RefEdgeList, so this looks excessive

lib/IR/ModuleSummaryIndex.cpp
140

Yep, I also stepped on this recently :). By now I'm considering all non-instruction refs to be non-constant

evgeny777 updated this revision to Diff 170611.EditedOct 23 2018, 5:39 AM
  • ReadOnly attribute's been moved to GlobalVarSummary and is calculated on analysis phase
  • Added flag to ValueInfo which indicates "constantness" instead of using constant reference counter in FunctionSummary
  • Fixed incremental build
  • Fixed few issues including GV to GV references
  • Fixed bitcode tests and added extra test cases

Thanks for the update. A bunch of comments below. I am a little concerned on the relationship between the read only flag on the global var and the importing requirements (see comment in computeVariableSummary). Also, it would be good to have a high level description of the algorithm and how this works somewhere (maybe above propagateConstants()).

In fact I think that making this change to shouldPromoteLocalToGlobal on the importing side to get the linkage type from the index would work today

Yep, this seems obvious and the very first version of this patch uses this kind of approach. Unfortunately it didn't work quite well for me. Modifying linkage in the index will work fine on promotion and
internalization phases. However it's the import phase where all bad things happen. When performing import ThinLTO loads source module and does specific modifications in it given list of globals to import
Those modifications include renaming globals to avoid naming conflicts and setting linkage of imported entities to available_externally. After that IRMover performs all the dirty work.

The main problem in this scheme is handling read only external globals. We can't set internal linkage for them in renameModuleForThinLTO, because IRMover will fail to link variable definition to
it's external declaration. These two variables will be considered different:

@g = internal global i32 0
@g = external global i32

IRMover will rename internal definition, leaving external declaration unsatisfied. This will obviously result in link error.

So while it's possible to use proposed approach to deal with promoted static variables, one still needs some processing in the backend to deal with read only external globals. Still backend approach can deal with both cases, so I used it.

I see, yes that explanation makes sense. I suppose on the importing side we could go back and update the linkage for imported variables based on the linkage in the index after the IRMover importing is complete for the module (since the linkage in the index is currently ignored on the importing side). On the exporting side it would get internalized as usual during the index-based internalization. This has the advantage of being more consistent with how we currently internalize based on thin link analysis. Would that not work as expected?

lib/Analysis/ModuleSummaryAnalysis.cpp
381

Comment needed. I suppose this is because we need to be able to import into any module containing a ref?

tejohnson added inline comments.Oct 24 2018, 9:28 AM
include/llvm/IR/ModuleSummaryIndex.h
513

I have mixed feelings on this. With your current scheme you don't add increase the in-memory index size, but it is unfortunate to have the read only flag on all ValueInfo when it only applies to reference edges. Let me take a look at some of our large indexes today and see what the extra overhead would be to make this explicit on just the ref edges.

lib/Analysis/ModuleSummaryAnalysis.cpp
419

The initialization below seems to be a combination of conditions on whether it can be internalized and whether it can be imported. Can you better document this?

Non-empty RefEdges isn't related to the notEligibleToImport flag, but rather they both preclude importing of a global var. Don't we also need to set the read only flag based on notEligibleToImport if we want to prevent it from being set unless its def can be imported?

In general, since you presumably need to ensure that all variables marked read only at the end of the thin link are imported into every referencing module, I think it would be good to make this connection more explicit (in comments and in code). Can you enforce this during the thin link (i.e. during computeImportForReferencedGlobals)? E.g. detect when we can't import a reference and then assert if it is marked read only? Alternatively, rather than trying to detect here when it can't be imported and initializing to not read only, during the thin link importing can you clear the read only flag on any variable with a reference that we aren't able to import? Otherwise I am concerned that this relationship may prove fragile.

lib/IR/ModuleSummaryIndex.cpp
36

Is this comment current? It looks like you will look at all of the refs below so I don't see any advantage being taken from them being at the end. You could presumably take advantage of it by breaking out of the loop when you see the first non-readonly ref since you are walking it backwards. Not sure if it is worth it though, might be better just to look at them all for simplicity (and in case this ordering scheme ever changes).

88

Just invoke S->getBaseObject() for simplicity.

95

Does this need to handle S being an alias like the above method? Looks like not from the current callsite, but I think it should invoke getBaseObject() to be consistent with the above method and also to be safe in case new calls are added that might pass a generic GlobalValueSummary.

109

Is there a need to perform each of the steps below in a separate walk over the whole index? I don't see that we actually propagate the flag from global var to global var, so I don't see any need for ordering. Can you do one walk over the index and perform all steps on each summary exactly once?

117

This first sentence doesn't seem to relate specifically to Step 1, but rather to the whole algorithm.

121

I would expect the size of this set to be << the size of the whole index, so walking the whole index and checking each seems like it might be less efficient overall compared to doing a lookup in the index of each GUID preserved symbol. That being said, I think a better way is to walk the index once and perform all 3 steps (as mentioned above).

132

Suggest adding the explanation of why.

lib/LTO/LTO.cpp
160

I don't think this is necessary because later on (line 221) we walk all the import summaries and invoke AddUsedThings on them.

lib/Transforms/IPO/FunctionImport.cpp
715

Why is this necessary? Since internalizeImmutableGVs is only invoked from importFunctions(), presumably it doesn't have any effect anyway when not importing?

evgeny777 added a comment.EditedOct 25 2018, 12:45 AM

I see, yes that explanation makes sense. I suppose on the importing side we could go back and update the linkage for imported variables based on the linkage in the index after the IRMover importing is complete for the module (since the linkage in the index is currently ignored on the importing side). On the exporting side it would get internalized as usual during the index-based internalization. This has the advantage of being more consistent with how we currently internalize based on thin link analysis. Would that not work as expected?

It seems to me as significantly more complex approach. At the first glance, I'll have to:

  • Prevent promotion on thin link phase in both LTO.cpp and ThinLTOCodeGenerator.cpp for all read-only variables.
  • On promote phase check all read-only variables with local linkage preventing them from becoming available_externally. Non-local vars must not be touched (or IRMover will fail to link them to declarations), so we also need step 3:
  • After import phase we still need to analyze external read only globals, internalizing them when possible. As I'm supposed to check index instead of IR attributes I'm puzzled how to do this properly because of changing linkage to available_externally during import process (so GUID will obviously change also)

So compared to current implementation we:

  • add extra step (prevent promotion in index) and make handling of external globals signifcantly more complex.
  • must have all three steps consistent, which means spreading checks around the code. For instance, if variable is not eligible to import it (a) shouldn't be prevented from promotion, (b) should have available_externally linkage when being imported and (c) shouldn't be internalized if it has non-local linkage.
include/llvm/IR/ModuleSummaryIndex.h
513

I've already changed the way immutable references are stored. Now I'm using an extra flag in ValueInfo which supposedly has zero overhead. I'm still using immutable reference counter when store/load index to/from bitcode file as all read-only references are still grouped in the end of RefEdges array to save disk space.

lib/Analysis/ModuleSummaryAnalysis.cpp
381

We can't internalize a variable if it is referenced by a function in regular LTO module. This is because regular LTO modules are not participating in ThinLTO import, so we can't make a local copy there.

419

Makes sense.

The RefEdges.empty() is the only condition on whether a variable can be imported, all others are related to internalization. I think, I can safely move it (and only it) to propagateConstants if it looks confusing here. My concerns were ThinLTO cache key computation, but it seems such move won't affect it , because import and export lists will be different so will be the cache key.

lib/IR/ModuleSummaryIndex.cpp
36

Yes, the comment is current, but there is a bug in implementation, it should be something like:

for (int I = Refs.size() - 1; I >= 0 && Refs[I].isReadOnly(); --I)
    ImmutableRefCnt ++;
95

This is a helper method used by dot dumper exclusively. I think it should be moved below to a place of actual call.

lib/LTO/LTO.cpp
160

Yep, I missed second call of AddUsedThings

lib/Transforms/IPO/FunctionImport.cpp
715

On -lto-O0 ThinLTO import is entirely disabled, so we have to prevent internalization of read-only stuff or we'll get link errors.

Just a quick comment below since I'm traveling today and out tomorrow and won't have time for a better response until probably Monday. You may be right on the issue of using the index vs just using the read only bit, need to look more closely at your explanation.

lib/Transforms/IPO/FunctionImport.cpp
715

Right but since the internalization is done via internalizeImmutableGVs which is called from importFunctions, if importing is disabled under -lto-O0 wouldn't the internalization not be done either?

evgeny777 added inline comments.Oct 26 2018, 12:19 AM
lib/Transforms/IPO/FunctionImport.cpp
715

Well, strictly speaking, import in the backend is not disabled: -lto-O0 just prevents computation of import/export lists, leaving them empty. Current approach marks read-only GVs with thinlto-internalize attribute, relying on IRMover to copy them to all destination modules, so internalization happens in all modules using read-only GV (or shouldn't happen at all). If IRMover doesn't do anything (because import lists are empty) we'll internalize GV just in the source module, leaving all external declarations unsatisfied.

I see, yes that explanation makes sense. I suppose on the importing side we could go back and update the linkage for imported variables based on the linkage in the index after the IRMover importing is complete for the module (since the linkage in the index is currently ignored on the importing side). On the exporting side it would get internalized as usual during the index-based internalization. This has the advantage of being more consistent with how we currently internalize based on thin link analysis. Would that not work as expected?

It seems to me as significantly more complex approach. At the first glance, I'll have to:

  • Prevent promotion on thin link phase in both LTO.cpp and ThinLTOCodeGenerator.cpp for all read-only variables.
  • On promote phase check all read-only variables with local linkage preventing them from becoming available_externally. Non-local vars must not be touched (or IRMover will fail to link them to declarations), so we also need step 3:
  • After import phase we still need to analyze external read only globals, internalizing them when possible. As I'm supposed to check index instead of IR attributes I'm puzzled how to do this properly because of changing linkage to available_externally during import process (so GUID will obviously change also)

So compared to current implementation we:

  • add extra step (prevent promotion in index) and make handling of external globals signifcantly more complex.
  • must have all three steps consistent, which means spreading checks around the code. For instance, if variable is not eligible to import it (a) shouldn't be prevented from promotion, (b) should have available_externally linkage when being imported and (c) shouldn't be internalized if it has non-local linkage.

Ok, this does seem substantially simpler, so let's go ahead with your current approach. I made a few more comments below, will take another look after you've had a chance to update.

include/llvm/IR/ModuleSummaryIndex.h
513

ok let's go with this approach. If we need to add more info on the ref edges in the future, it can be revisited.

lib/LTO/ThinLTOCodeGenerator.cpp
649

document constant parameter.

lib/Transforms/IPO/FunctionImport.cpp
715

Ah ok, I missed that lto-O0 was just skipping the index based computation of imports and not the backend importFunctions invocation.

lib/Transforms/Utils/FunctionImportUtils.cpp
228 ↗(On Diff #170611)

Please expand on why the internalization of the read only variables needs to be done this way (i.e. in two steps).

230 ↗(On Diff #170611)

This adds another index hash table lookup for GV, which we do just above here too. Can you lookup the ValueInfo for the GV once for both blocks of code?

231 ↗(On Diff #170611)

We should check notEligibleToImport during the thin link, along the lines of my comment in computeVariableSummary. Also, why is the isLive check needed?

evgeny777 added inline comments.Oct 30 2018, 9:00 AM
lib/Transforms/Utils/FunctionImportUtils.cpp
231 ↗(On Diff #170611)

We should check notEligibleToImport during the thin link, along the lines of my comment in computeVariableSummary

This is problematic because notEligibleToImport is calculated after computing per-module summaries. How about doing this in propagateConstants (as we're going to switch to single index pass algorithm anyway)?

Also, why is the isLive check needed?

To prevent internalization of non-prevailing external globals. See LTO/Resolution/X86/not-prevailing-variables.ll

tejohnson added inline comments.Oct 30 2018, 9:09 AM
lib/Transforms/Utils/FunctionImportUtils.cpp
231 ↗(On Diff #170611)

We should check notEligibleToImport during the thin link, along the lines of my comment in computeVariableSummary

This is problematic because notEligibleToImport is calculated after computing per-module summaries. How about doing this in propagateConstants (as we're going to switch to single index pass algorithm anyway)?

Right, note I suggested doing it in the thin link. Similar to my comment in computeVariableSummary where I suggested checking the import eligibility during the thin link: "Alternatively, rather than trying to detect here when it can't be imported and initializing to not read only, during the thin link importing can you clear the read only flag on any variable with a reference that we aren't able to import? "

Also, why is the isLive check needed?

To prevent internalization of non-prevailing external globals. See LTO/Resolution/X86/not-prevailing-variables.ll

Won't we eliminate these non-live variables in any case (regardless of whether we internalize)? Or will internalization mess that up? In any case, it's probably better if this is also checked during the thin link (e.g. propagateConstants), and just clear the read only flag if we shouldn't internalize in the backend.

evgeny777 added inline comments.Oct 30 2018, 9:24 AM
lib/Transforms/Utils/FunctionImportUtils.cpp
231 ↗(On Diff #170611)

it's probably better if this is also checked during the thin link

We should be able to handle case when alias is dead and aliasee is live (and vice versa). I don't see how this is possible to do with just manipulating ReadOnly attribute because we don't have it in AliasSummary.

tejohnson added inline comments.Oct 30 2018, 9:38 AM
lib/Transforms/Utils/FunctionImportUtils.cpp
231 ↗(On Diff #170611)

This case is only handling global variables (getGVarSummary looks only for a summary of that type, and there is a cast below), so I'm not sure I understand how this part of the code relates to the case where the alias is dead and the aliasee is live. Can you clarify?

If the aliasee (the GVar) is dead, it seems like we could clear the read only flag during the thin link and get the same behavior as today.

Sorry if I am missing something. Can you give me a more concrete example of where/how this would go wrong for aliases?

evgeny777 added inline comments.Oct 30 2018, 10:14 AM
lib/Transforms/Utils/FunctionImportUtils.cpp
231 ↗(On Diff #170611)

Consider the following example:

foo.ll

@g = global i32 42, align 4
@g.alias = weak alias i32, i32* @g

main.ll

@g = external global i32

define i32 @main() {
  %v = load i32, i32* @g
  ret i32 %v
}

Assume we link main.ll and foo.ll in ThinLTO mode. We have global variable @g and it's alias @g.alias, the latter is obviously dead. Still it doesn't mean we can't internalize aliasee (@g), does it?

tejohnson added inline comments.Oct 30 2018, 10:24 AM
lib/Transforms/Utils/FunctionImportUtils.cpp
231 ↗(On Diff #170611)

Assume we link main.ll and foo.ll in ThinLTO mode. We have global variable @g and it's alias @g.alias, the latter is obviously dead. Still it doesn't mean we can't internalize aliasee (@g), does it?

I'm confused though at how this relates to my suggestion. Here you are checking whether the global variable, i.e. @g, is live or not. In this example it is live, so presumably you will go ahead and mark it for internalization here since it is also read only. If you move the liveness check into the thin link (propagateConstants), presumably the same thing would happen - it is live, so it can stay read only, and this code would still mark it for internalization.

evgeny777 added inline comments.Oct 30 2018, 10:33 AM
lib/Transforms/Utils/FunctionImportUtils.cpp
231 ↗(On Diff #170611)

Ah, got it now. Ok, I'll try this out

evgeny777 updated this revision to Diff 171940.Oct 31 2018, 9:19 AM
evgeny777 marked 13 inline comments as done.
  • Addressed review comments
  • Added call to propagateConstants to createCombinedModuleSummaryIndex to support -thinlto mode of llvm-lto
tejohnson added inline comments.Nov 5 2018, 9:11 PM
lib/Analysis/ModuleSummaryAnalysis.cpp
382

suggest adding something like "since this would require importing variable as local copy"

lib/IR/ModuleSummaryIndex.cpp
95

Where is this being handled? I don't see any special handling of references from GlobalVars here.

138

Should this and the above block of code (looking for live values) only be done if S is a GlobalVariableSummary?

Also, can the conditions for importing be shared with computeImportForReferencedGlobals so that the conditions don't diverge? I.e. some kind of helper method to do the checking. Also I noticed that computeImportForReferencedGlobals also checks whether it is interposable.

tools/llvm-lto/llvm-lto.cpp
389 ↗(On Diff #171940)

Why this change? IIRC this is just to test the creation of the combined index, with no optimizations done on it. Can we test the constant propagation elsewhere where the various thinlto optimizations are tested (e.g. see the ThinLTOMode handling).

evgeny777 added inline comments.Nov 6 2018, 12:43 AM
lib/IR/ModuleSummaryIndex.cpp
95

In ValueInfo constructor we have:

RefAndFlags.setInt(HaveGVs);

This means that ReadOnly bit is zero, unless setReadOnly is called

138

Should this and the above block of code (looking for live values) only be done if S is a GlobalVariableSummary?

If alias is either not eligible to import or is preserved we can't make a local copy of aliasee, can we? If we make this check only for GlobalVarSummary we can internalize it if its alias is preserved

Also, can the conditions for importing be shared with computeImportForReferencedGlobals so that the conditions don't diverge?

Sounds reasonable.

Also I noticed that computeImportForReferencedGlobals also checks whether it is interposable.

We don't set ReadOnly bit for GVs with interposable linkage. See computeVariableSummary

tools/llvm-lto/llvm-lto.cpp
389 ↗(On Diff #171940)

Unless this is done the patch breaks Linker/funcimport.ll test case due to internalization of dead GV. I'm not quite aware of the purpose of this code and how to fix it properly. Suggestions?

tejohnson added inline comments.Nov 6 2018, 10:43 AM
lib/IR/ModuleSummaryIndex.cpp
95

Can you update the comment here to note that this is done by only marking refs from functions as read only when building the module summary during the analysis phase? Also suggest adding some checking here under NDEBUG that if VI is read only that S is not a global var summary. Will help prevent drift between the comments here and any changes to the module summary analysis phase later.

138

If alias is either not eligible to import or is preserved we can't make a local copy of aliasee, can we? If we make this check only for GlobalVarSummary we can internalize it if its alias is preserved

I'm not sure why we couldn't import and make a local copy of the aliasee when an alias to it is preserved and/or not eligible to import - assuming the reference is to the aliasee not the alias. In fact, we currently will never import a reference to an alias of a global var AFAICT [1].

[1] computeImportForReferencedGlobals only handles refs from functions directly to GlobalVarSummary objects - it doesn't call getBaseObject on the refs. So I don't believe we will even consider importing an alias to a global var. If that is ever supported in the future, we would presumably want to handle it like we do importing of an alias to a function, which is to import as a local copy [2]. In that case, we'd presumably want to ensure that the alias was importable, that the aliasee is importable, and that the aliasee is read only. However I don't think that requires marking the aliasee as not read only if the alias is not importable.

[1] As an aside, looking at function importing, it will look through to the base object (in selectCallee) *before* checking the notEligibleToImport flag. So I don't think having an alias marked as notEligibleToImport will actually prevent it from being imported! I think this works currently since AFAICT the only time we will mark an alias summary as notEligibleToImport is when it is a local that can't be promoted/renamed, and we only import aliases to functions as a local copy, which doesn't require promotion/renaming of the alias in the original module. This seems a little dicey to me, if we ever decide to mark an alias summary as notEligibleToImport for some other reason it might cause failures...not related at all to this patch, just dumping my thoughts here as a note to self...

We don't set ReadOnly bit for GVs with interposable linkage. See computeVariableSummary

Ok. In any case, extracting the importability check into a helper called both here and in the importer will keep these checks consistent (I'd prefer the interposability to be checked redundantly over having the checks for importability be different).

tools/llvm-lto/llvm-lto.cpp
389 ↗(On Diff #171940)

Ah, that test uses some legacy functionality in llvm-link to test importing. We've already had to work around the fact that it isn't doing a true thin link before importing, with a hack to mark all variables as exported (in linkFiles). To fix this you should be able to move this call to propagateConstants() onto the index after we load it in importFunctions (in llvm-link.cpp).

The code here is just building the original combined index which in most cases is used to drive thin link optimizations via llvm-lto. Probably these old tests that use llvm-link to test importing should be migrated to llvm-lto and the import handling in llvm-link should be ripped out. I'll try to get to this today or tomorrow, but in the meantime, you should be able to work around in llvm-link itself as noted above.

evgeny777 added a comment.EditedNov 6 2018, 11:11 AM

I'm not sure why we couldn't import and make a local copy of the aliasee when an alias to it is preserved and/or not eligible to import

My guess is because alias and aliasee are in the fact the same object (both point to the same memory location). Now assume alias is preserved and visible outside of the DSO. One can modify the object via alias (during program initialization for example) which
will take no effect on aliasee if it's been imported to a different module and internalized. The same applies when alias is listed in llvm.used

I'm not sure why we couldn't import and make a local copy of the aliasee when an alias to it is preserved and/or not eligible to import

My guess is because alias and aliasee are in the fact the same object (both point to the same memory location). Now assume alias is preserved and visible outside of the DSO. One can modify the object via alias (during program initialization for example) which
will take no effect on aliasee if it's been imported to a different module and internalized. The same applies when alias is listed in llvm.used

Ok, that makes sense. For normal importing, we don't need to look at the importability of the alias on a reference to the aliasee since we will promote it. The difference here is that these conditions on the alias mean we don't have visibility into possible writes to the aliasee via the alias, and thus can't safely mark it read only. Can you add a comment to this effect in propagateConstants where you are checking those conditions?

I think it would be clearer in propagateConstants to do the checks only on the applicable summary types:

  1. liveness check only on global var summaries
  2. the importability checks on either global var or alias summaries (with note as suggested above about why on alias summaries)
  3. the call to propagateConstantsToRefs only on function summaries

It works without the checks, but I think that guarding with the appropriate checks + a comment as to why would aid in understanding.

the call to propagateConstantsToRefs only on function summaries

This will not work, because we need to drop read only attribute from GVs, referenced by initializer

the call to propagateConstantsToRefs only on function summaries

This will not work, because we need to drop read only attribute from GVs, referenced by initializer

Right, forgot about the fact that while the GlobalVar refs are never marked read only, the info still needs to be propagated. So might as well do that call on all summaries, since you already note in the function that for aliases it will be a no-op.

tejohnson added inline comments.Nov 6 2018, 1:18 PM
tools/llvm-lto/llvm-lto.cpp
389 ↗(On Diff #171940)

I tried migrating the llvm-link tests to llvm-lto but it turns out not to be straightforward for a couple of tests that are testing importing in the presence of old debug info, because we don't have a way other than with llvm-link to force the importing of a given function, and the bitcode being old means the index is old and we don't have importing flags set so importing is conservative. So I suggest going with the fix I suggested above for now.

evgeny777 updated this revision to Diff 172912.Nov 7 2018, 2:36 AM
evgeny777 marked 7 inline comments as done.

Rebased and addressed review comments

Looks really good - just one minor code change request, and a few comment changes, and one question.

lib/IR/ModuleSummaryIndex.cpp
133

Please update the comment to say why they can't be internalized. We drop dead symbols anyway in the backend, but I guess the issue is that once it is internalized the GUID changes and we can't find the summary in dropDeadSymbols?

134

If/when we start importing aliases to global vars, would that affect this code? I still think we wouldn't want to update the readonly bit on the aliasee if alias is dead. (We shouldn't do any importing of dead symbols in any case - only live roots are added to the import worklist.)

138

typo: s/referneces/references/

144

Comment is still a little vague. Specifically, from what I understand:

  • global variable can't be marked read only if it is not eligible to import since we need to ensure that all external references get a local (imported) copy.
  • a global variable can't be marked read only if it or any alias (since alias points to the same memory) are preserved or notEligibleToImport, since either of those means there could be writes that are not visible (because preserved means it could have external to DSO writes, and notEligibleToImport means it could have writes via inline assembly leading it to be in the @llvm.*used).

Can you update the comment to note the above specifics (adjusted if I am missing something)? Will avoid future head scratching when I or someone else looks at this again down the road.

lib/Transforms/IPO/FunctionImport.cpp
263

Please move this one into canImportGlobalVar. I'd rather check it redundantly in the readonly case (i.e. even though you are already checking when building the summary and initializing that bit), to keep the checking centralized and consistent between here and there.

evgeny777 added inline comments.Nov 7 2018, 8:35 AM
lib/IR/ModuleSummaryIndex.cpp
133

This was originally intended to prevent internalization of non-prevailing defs. See test/LTO/Resolution/X86/not-prevailing-variables.ll

134

If/when we start importing aliases to global vars, would that affect this code

I would simply move liveness check to processGlobalsForThinLTO where it originally was. This will also allow eliminating the calls to propagateConstants from llvm-link.cpp. Of course computeImportForReferencedGlobals must be modified to support aliases

144

Can you update the comment to note the above specifics

Can I simply quote you in comments (as your understanding is 100% correct)? :)

tejohnson added inline comments.Nov 7 2018, 8:49 AM
lib/IR/ModuleSummaryIndex.cpp
133

Right but I'm trying to get to the *why* of that no longer working if you didn't do the liveness check here. As noted in that test case, we should drop the definition of that non-prevailing var2. I'm speculating that this would stop working if you marked var2 as readOnly since it would get internalized which would mean that dropDeadSymbols would compute a different GUID for the now-internalized variable which in turn would mean that we can't find the summary to see that it is dead.

134

But I don't understand why that would require a change to any of this handling? I.e. if an alias is dead, we should never even encounter it in computeImportForReferencedGlobals, since we only compute importing starting from live roots, and if the alias is reached from a live root then by definition it can't be dead.

144

Sure =)

evgeny777 added inline comments.Nov 8 2018, 1:51 AM
lib/IR/ModuleSummaryIndex.cpp
133

Well, here is what I find out:

  • dropDeadSymbols is invoked right after promotion and before internalization of immutable GVs, which happens much later (after import is finished). I rechecked not-prevailing-variables.ll test case and it works perfectly, even if liveness check is removed. Small modification to internalizeImmutableGVs is still required: we need to ignore declarations, not assert on them.
  • If we remove liveness check then funcimport.ll test case will fail. This happens because dropDeadSymbols checks for ModuleSummaryIndex::isGlobalValueLive which can return true even if isLive() returns false. In particular this happens when all symbols in the index are dead.

That said I suggest to:

  • remove liveness check from propagateConstants
  • revert changes to llvm-link.cpp
  • update the failed test cases
134

If we remove liveness check (see above) then, I guess, we can simply add required functionality to computeImportForReferencedGlobals

evgeny777 updated this revision to Diff 173149.EditedNov 8 2018, 5:27 AM
  • Addressed review comments
  • Reverted changes in llvm-link.cpp. See inline comments
  • No longer clearing read only attribute on dead variables in propagateConstants. Liveness check still remains because we must ignore dead values during constant propagation. As a result the following tests were updated
test/Linker/funcimport.ll
test/Transforms/FunctionImport/funcimport.ll
  • Changed isLive() to isGlobalValueLive() in propagateConstants
tejohnson accepted this revision.Nov 8 2018, 6:39 AM

LGTM - just a couple of comments in the modified tests that need updates before commit. Thanks!

test/Linker/funcimport.ll
14 ↗(On Diff #173149)

Update comment.

17 ↗(On Diff #173149)

Update this comment - "Eventually" is now! =)

test/Transforms/FunctionImport/funcimport.ll
83 ↗(On Diff #173149)

update comment.

This revision is now accepted and ready to land.Nov 8 2018, 6:39 AM
evgeny777 marked 2 inline comments as done.EditedNov 9 2018, 6:03 AM

Sorry I have to suggest small amendment to this patch before committing after looking test/ThinLTO/X86/distributed_import.ll.
It looks like distributed import is using per-module indexes, so internalization of RO vars will not work correctly. To overcome this
I suggest adding a check for propagateConstants to have been run before marking GV for internalization

See https://reviews.llvm.org/D54306

This change causes modifications to FunctionImport/funcimport.ll to be reverted. The check for withGlobalValueDeadStripping is a bit counter intuitive, so suggestions welcome.

Sorry I have to suggest small amendment to this patch before committing after looking test/ThinLTO/X86/distributed_import.ll.
It looks like distributed import is using per-module indexes, so internalization of RO vars will not work correctly. To overcome this
I suggest adding a check for propagateConstants to have been run before marking GV for internalization

See https://reviews.llvm.org/D54306

This change causes modifications to FunctionImport/funcimport.ll to be reverted. The check for withGlobalValueDeadStripping is a bit counter intuitive, so suggestions welcome.

Why won't it work for distributed builds? The read only gvar flag is being serialized out and back in, so shouldn't it show up in the distributed indexes? If not, the problem should be fixed there so it can work with distributed builds, rather than skipping it (otherwise e.g. here at Google we won't be able to take advantage of this).

Why won't it work for distributed builds? The read only gvar flag is being serialized out and back in, so shouldn't it show up in the distributed indexes?

It does, but we can't be sure about "read-only" unless we analyze function references. And we need propagateConstants for that.

Why won't it work for distributed builds? The read only gvar flag is being serialized out and back in, so shouldn't it show up in the distributed indexes?

It does, but we can't be sure about "read-only" unless we analyze function references. And we need propagateConstants for that.

propagateConstants should run during the thin link, which would happen in a distributed build before writing indexes. Maybe there is something weird about that test where it's not actually running the real thin link, will look...

Why won't it work for distributed builds? The read only gvar flag is being serialized out and back in, so shouldn't it show up in the distributed indexes?

It does, but we can't be sure about "read-only" unless we analyze function references. And we need propagateConstants for that.

propagateConstants should run during the thin link, which would happen in a distributed build before writing indexes. Maybe there is something weird about that test where it's not actually running the real thin link, will look...

Hmm that test is using llvm-lto2 to run the thin link and write distributed indexes, which should do the full thin link. Can you look at why it isn't running propagateConstants during the thin link?

That being said, your change in D54306 seems fine in that we wouldn't have run propagateConstants if the dead stripping bit in the index isn't set. But there again, we should have run dead stripping and therefore propagateConstants during the thin link of llvm-lto2, set the with dead stripping bit in the index, and serialized that through the distributed indexes. Can you see why that isn't being run and set in the distributed indexes (both the with dead stripping flag and the read only flags)?

Oh, my bad. opt -function-import gets index from llvm-lto2, not from opt -thinlto-bc. Sorry about that

However, there is still a small problem:

In FunctionImport/funcimport.ll we use index from llvm-lto -thinlto, for which propagateConstants hasn't been run. This results in internalizing P.llvm.0, which shouldn't happen, because it's writeable (from setfunc)
Can such thing occur in real life (i.e we use index from llvm-lto -thinlto in distrivuted build)?

Oh, my bad. opt -function-import gets index from llvm-lto2, not from opt -thinlto-bc. Sorry about that

However, there is still a small problem:

In FunctionImport/funcimport.ll we use index from llvm-lto -thinlto, for which propagateConstants hasn't been run. This results in internalizing P.llvm.0, which shouldn't happen, because it's writeable (from setfunc)

Ok that makes sense.

Can such thing occur in real life (i.e we use index from llvm-lto -thinlto in distrivuted build)?

Nope, that's just a testing config. So your follow on fix seems fine to fix that.

This revision was automatically updated to reflect the committed changes.

This commit causes our internal bots failed to bootstrap clang. The error we are getting is:

"gCrashRecoveryEnabled (.llvm.1401930837577591816)", referenced from:
    clang::ParseAST(clang::Sema&, bool, bool) in 286.thinlto.o

I have a bit trouble to reproduce the problem but it is 100% reproducible on our bots. I suspect the issue is ThinLTO cache reuse because if I clear the cache then the link will succeed. I will update if I can reproduce and pinpoint the issue.

I reverted this in r346768.

I think this is indeed a caching problem. There are some dylib/binary in clang projects that toggles whether to enable the recovery context but some does not. I can reproduce the issue by thin link libclang.dylib first, then link clang-func-mapping on Darwin.
libclang.dylib thinks the file scope variable gCrashRecoveryEnabled not read-only, so it promotes it.
clang-func-mapping thinks gCrashRecoveryEnabled read-only, so it internalize and constant propagate the variable but ParseAST.o in the cache is still expecting gCrashRecoveryEnabled to be available.

Let me know if you need more information.

I reverted this in r346768.

I think this is indeed a caching problem. There are some dylib/binary in clang projects that toggles whether to enable the recovery context but some does not. I can reproduce the issue by thin link libclang.dylib first, then link clang-func-mapping on Darwin.
libclang.dylib thinks the file scope variable gCrashRecoveryEnabled not read-only, so it promotes it.
clang-func-mapping thinks gCrashRecoveryEnabled read-only, so it internalize and constant propagate the variable but ParseAST.o in the cache is still expecting gCrashRecoveryEnabled to be available.

Let me know if you need more information.

Thinking through this example, I'm not sure what is going on. This patch did change the cache key computation to include the read only bit from the global var summary of anything defined or imported. Since gCrashRecoveryEnabled is a static in CrashRecoveryContext.cpp, it must have been imported into ParseAST.o, which means that the read only bit on the associated gvar summary should have been hashed into ParseAST.o's cache key. So presumably we shouldn't have had a cache hit for ParseAST.o (built when the read only bit is not set on gCrashRecoveryEnabled) when thin linking clang-func-mapping which has this bit set for that variable.

Even if the variable was originally externally visible, in order for it to be marked read only by the thin link it would have had to have been imported at all use sites. Which means that any referencing module had to have it in the import set (and therefore would have hashed the read only bit) in the thin link where it was marked read only. So I am not sure offhand why we could ever share a referencing object between a link where it was read only and another link where it is not... Hopefully Eugene can figure out what is going wrong here.

I was trying this out internally and noticed a couple of minor things that can be fixed when you recommit the patch with the caching fix.

llvm/trunk/lib/Transforms/IPO/FunctionImport.cpp
1048 ↗(On Diff #173495)

This should always be true - globals() returns only GlobalVariables (you shouldn't even need to cast).

llvm/trunk/lib/Transforms/Utils/FunctionImportUtils.cpp
225 ↗(On Diff #173495)

As we discussed earlier in the review thread, there should not be any issue with doing this for a distributed import (I just checked a small test case and confirmed it works fine). Please update the comment (it was only in certain testing contexts that you wouldn't have dead stripping at this point).