This is an archive of the discontinued LLVM Phabricator instance.

ThinLTO: do not promote GlobalVariable that have a specific section.
ClosedPublic

Authored by mehdi_amini on Mar 19 2016, 3:17 PM.

Details

Summary

The linker has some implicit contract on some sections, and
promoting/renaming is breaking it.

Diff Detail

Event Timeline

mehdi_amini retitled this revision from to ThinLTO: do not promote GlobalVariable that have a specific section..
mehdi_amini updated this object.
mehdi_amini added a reviewer: tejohnson.
mehdi_amini added a subscriber: llvm-commits.

I think this is reasonable.

If it is ever a significant limitation we should investigate changing the representation or special casing some sections.

tejohnson added inline comments.Mar 21 2016, 12:10 PM
lib/Transforms/Utils/FunctionImportUtils.cpp
67

What happens when a reference to this GVar is imported to another module? Note that this change will prevent the renaming and promotion in the exporting module, but will still do so when importing the symbol (e.g. see getName() and other callsites below).

I don't think that it can be addressed by simply preventing the renaming/promotion on the importing side as well. For one, isn't it possible that there will be name clashes between same-named local variables imported from different modules when they have assigned sections? Second, while perhaps unlikely, the local could presumably be eliminated in its original module if it is no longer needed there through some type of optimization, unless that same optimization happened on the imported copy.

(As an aside, I think there is a related issue with the preceding handling for local const variables due to the change not to import GVar defs - we expect to use the imported copy but won't have one, so the non-promoted, non-renamed imported ref won't be satisfied.)

Just like with functions that have inline assembly, functions with the opt_none attribute, functions with the opt_size attributes (etc.), we need to represent this information in the summary.
We talked about having a "flags" fields that could be compressed with the existing linkage field. (for section I plan on having a single bit "hasSection" right now).
Thoughts?

I'm hitting a wall: even with adding the "hasSection" flag to the summary: no renaming a function breaks the model.

Module A {
  define internal foo() { ... };
  define bar() { foobar() };
}

Module B {
  define foo() { ... };
  define foobar() { foo() };
}

If we leave internal foo() internal in module A and we don't rename it, we have a problem when we want to import foobar() because of the collision on the call to foo() in module B.
There is nothing in the index that can help to address this situation. We reach a point were the only conservative strategy I can imagine is to detect the collision "late" (i.e. during the FunctionImporter, in the "backend" job) and do not perform the import.
This is annoying because it prevents future optimizations (like "moving" a global for instance).

Something that can be done is an intermediate solution: allow to rename globals that have a section, but not allow to change their linkage. This should work for me.

However it won't help for functions in llvm.used for example, since these can't be renamed right now.

tejohnson edited edge metadata.Apr 22 2016, 9:11 AM
In D18298#408779, @joker.eph wrote:

I'm hitting a wall: even with adding the "hasSection" flag to the summary: no renaming a function breaks the model.

Module A {
  define internal foo() { ... };
  define bar() { foobar() };
}

Module B {
  define foo() { ... };
  define foobar() { foo() };
}

If we leave internal foo() internal in module A and we don't rename it, we have a problem when we want to import foobar() because of the collision on the call to foo() in module B.
There is nothing in the index that can help to address this situation. We reach a point were the only conservative strategy I can imagine is to detect the collision "late" (i.e. during the FunctionImporter, in the "backend" job) and do not perform the import.
This is annoying because it prevents future optimizations (like "moving" a global for instance).

How frequent/important are the modules where specific sections are used? Another conservative possibility is to simply prevent importing into any module that contains a GV with local linkage and the hasSection flag set (which is info available in the index).

Also, I assume the plan is to use the proposed hasSection flag to prevent importing any function containing a reference to a local with that flag set (to avoid introducing a cross-module reference requiring promotion). If renaming is ok, but not the promotion, then a local function with hasSection can itself be imported, if we keep the imported copy local (in which case functions containing references to it can also be imported).

How frequent/important are the modules where specific sections are used?

Very frequent: NSConstantString (objc) is using a specific section all the time.
(I can't bootstrap clang on OS X without the current diff applied for instance, the ThinLTO link does not succeed)

Also, I assume the plan is to use the proposed hasSection flag to prevent importing any function containing a reference to a local with that flag set (to avoid introducing a cross-module reference requiring promotion).

Yes.

If renaming is ok, but not the promotion, then a local function with hasSection can itself be imported, if we keep the imported copy local (in which case functions containing references to it can also be imported).

Probably OK most of the time.

So the only way I could foresee to make it work requires collision detection in names based on the summary. Of course it will require to some changes in the naming scheme (the complication comes from using hashes...).
The hash function is currently hash(symbol_name, linkage, module_id) ; this is the GUID we store in the bitcode *and* we use to index in the global table in the combined index.

We can instead compute another GUID this way: hash(hash(symbol_name), hash(linkage, module_id)) .
During the FE phase, we generate in the summary only this part: hash(symbol_name) ; this is what we store in the bitcode file. We already have module_id and linkage propagated somehow.
In the combined index in memory, we store hash(symbol_name) in the summary itself (GlobalValueSummary), and compute the finale GUID hash(hash(symbol_name), hash(linkage, module_id)) for the key in the global table.
That way we can always lookup the summary of all the symbol in a module and compare hash(symbol_name) to detect the collision.

In D18298#409202, @joker.eph wrote:

So the only way I could foresee to make it work requires collision detection in names based on the summary. Of course it will require to some changes in the naming scheme (the complication comes from using hashes...).
The hash function is currently hash(symbol_name, linkage, module_id) ; this is the GUID we store in the bitcode *and* we use to index in the global table in the combined index.

We can instead compute another GUID this way: hash(hash(symbol_name), hash(linkage, module_id)) .
During the FE phase, we generate in the summary only this part: hash(symbol_name) ; this is what we store in the bitcode file. We already have module_id and linkage propagated somehow.
In the combined index in memory, we store hash(symbol_name) in the summary itself (GlobalValueSummary), and compute the finale GUID hash(hash(symbol_name), hash(linkage, module_id)) for the key in the global table.
That way we can always lookup the summary of all the symbol in a module and compare hash(symbol_name) to detect the collision.

I think a variant of this should work. The current hashing scheme is actually (IsLocalLinkage)?hash(SourceFileName+":"+symbol_name):hash(symbol_name)

We don't currently have the SourceFileName in the combined index. We could however augment the module path string table to include that for use in computing the GUID/hash key into the combined index table for locals. Then use your idea of keeping a hash of just the symbol name in the combined index summary for comparison, although we only really need to do that in the case where IsLocalLinkage==true (if !IsLocalLinkage then its GUID/hash used as key in the index can be used for the comparison).

In D18298#409202, @joker.eph wrote:

So the only way I could foresee to make it work requires collision detection in names based on the summary. Of course it will require to some changes in the naming scheme (the complication comes from using hashes...).
The hash function is currently hash(symbol_name, linkage, module_id) ; this is the GUID we store in the bitcode *and* we use to index in the global table in the combined index.

We can instead compute another GUID this way: hash(hash(symbol_name), hash(linkage, module_id)) .
During the FE phase, we generate in the summary only this part: hash(symbol_name) ; this is what we store in the bitcode file. We already have module_id and linkage propagated somehow.
In the combined index in memory, we store hash(symbol_name) in the summary itself (GlobalValueSummary), and compute the finale GUID hash(hash(symbol_name), hash(linkage, module_id)) for the key in the global table.
That way we can always lookup the summary of all the symbol in a module and compare hash(symbol_name) to detect the collision.

I think a variant of this should work. The current hashing scheme is actually (IsLocalLinkage)?hash(SourceFileName+":"+symbol_name):hash(symbol_name)

We don't currently have the SourceFileName in the combined index. We could however augment the module path string table to include that for use in computing the GUID/hash key into the combined index table for locals. Then use your idea of keeping a hash of just the symbol name in the combined index summary for comparison,

Peter was thinking about changing the naming scheme right? Instead of using the SourceFileName we would hash the names of all the public globals in a module IIRC. With a minor variant, by hashing the hash of each global name we don't need any other information in the bitcode to build the module_hash.

although we only really need to do that in the case where IsLocalLinkage==true (if !IsLocalLinkage then its GUID/hash used as key in the index can be used for the comparison).

Yes but any suggestion on how to achieve having this field optional? Unless having an extra layer in the hierarchy LocalFunctionSummary inheriting from FunctionSummary and LocalAliasSummary inheriting from AliasSummary etc. But this does not seem like the right design (won't scale with other fields).
Another way may be to have a variable size class that we malloc we extra size for the optional fields.

pcc added a subscriber: pcc.Apr 22 2016, 11:07 AM
In D18298#409219, @joker.eph wrote:
In D18298#409202, @joker.eph wrote:

So the only way I could foresee to make it work requires collision detection in names based on the summary. Of course it will require to some changes in the naming scheme (the complication comes from using hashes...).
The hash function is currently hash(symbol_name, linkage, module_id) ; this is the GUID we store in the bitcode *and* we use to index in the global table in the combined index.

We can instead compute another GUID this way: hash(hash(symbol_name), hash(linkage, module_id)) .
During the FE phase, we generate in the summary only this part: hash(symbol_name) ; this is what we store in the bitcode file. We already have module_id and linkage propagated somehow.
In the combined index in memory, we store hash(symbol_name) in the summary itself (GlobalValueSummary), and compute the finale GUID hash(hash(symbol_name), hash(linkage, module_id)) for the key in the global table.
That way we can always lookup the summary of all the symbol in a module and compare hash(symbol_name) to detect the collision.

I think a variant of this should work. The current hashing scheme is actually (IsLocalLinkage)?hash(SourceFileName+":"+symbol_name):hash(symbol_name)

We don't currently have the SourceFileName in the combined index. We could however augment the module path string table to include that for use in computing the GUID/hash key into the combined index table for locals. Then use your idea of keeping a hash of just the symbol name in the combined index summary for comparison,

Peter was thinking about changing the naming scheme right? Instead of using the SourceFileName we would hash the names of all the public globals in a module IIRC. With a minor variant, by hashing the hash of each global name we don't need any other information in the bitcode to build the module_hash.

(Following up from IRC discussion). This is different than the way we do the promotion naming. I don't think this change affects or is related to the promotion naming scheme.

although we only really need to do that in the case where IsLocalLinkage==true (if !IsLocalLinkage then its GUID/hash used as key in the index can be used for the comparison).

Yes but any suggestion on how to achieve having this field optional? Unless having an extra layer in the hierarchy LocalFunctionSummary inheriting from FunctionSummary and LocalAliasSummary inheriting from AliasSummary etc. But this does not seem like the right design (won't scale with other fields).
Another way may be to have a variable size class that we malloc we extra size for the optional fields.

I was thinking more about the combined index format it bitcode, where we would want to emit as an optional record.

mehdi_amini edited edge metadata.

Update: globals with a specific section will prevents import

tejohnson added inline comments.Apr 25 2016, 9:14 AM
lib/Transforms/IPO/FunctionImport.cpp
86

s/globals/global/

99

DefinedGVSummaries only contains entries for the module we are importing into, not from. So I don't think this is the right check. In fact, won't this cause this routine to always return true for anything not already in this module and therefore exactly when we would be introducing a cross-module reference?

Looks like your new test is checking that we don't import a function that has a section, but needs to be extended to check that we don't import a function containing references to something that has a section.

106

Empty comment.

113

s/globals/global/

Can you add a FIXME here that we could import this if we do so as a copy (e.g. don't promote an imported local function). And that if we do so we could also import functions containing references to it?

mehdi_amini added inline comments.Apr 25 2016, 10:19 AM
lib/Transforms/IPO/FunctionImport.cpp
99

You're right that the logic seems wrong.

But I'm not sure your are describing what the test is doing: it is checking that we don't import reference_gv_with_section which is a function that does not have a section attached, but reference an internal variable with a section.

tejohnson added inline comments.Apr 25 2016, 1:22 PM
lib/Transforms/IPO/FunctionImport.cpp
99

You're right, I looked too quickly at the test to see why it didn't fail. I'm wondering why it doesn't import reference_gv_with_section (incorrectly), since I think valueCanBeExternallyReferencedFromModule should return true since var_with_section wouldn't be in the DefinedGVSummaries for the importing module.

Hitting a wall because I was trying to plumb `const StringMap<std::map<GlobalValue::GUID, GlobalValueSummary *>> &

ModuleToDefinedGVSummaries()` to get the definedGV for the module we import from, but could not because of `void llvm::ComputeCrossModuleImportForModule`.

The latter was introduced in D18945, for which we had some discussions.
I think we're moving in a direction where we can remove this anyway right?

mehdi_amini marked 6 inline comments as done.Apr 25 2016, 8:26 PM
lib/Transforms/IPO/FunctionImport.cpp
99

Many tests were in fact broken with this patch as it was before.

In D18298#411471, @joker.eph wrote:

Hitting a wall because I was trying to plumb `const StringMap<std::map<GlobalValue::GUID, GlobalValueSummary *>> &

ModuleToDefinedGVSummaries()` to get the definedGV for the module we import from, but could not because of `void llvm::ComputeCrossModuleImportForModule`.

The latter was introduced in D18945, for which we had some discussions.
I think we're moving in a direction where we can remove this anyway right?

Yes with the changes I am working on to use individual index files for distributed backend compilations, and a change to compute the imports from the gold-plugin and pass them down to the in-process threads, we shouldn't need that anymore. However, I don't think you really need the defined GV info for the module we import from, see below.

lib/Transforms/IPO/FunctionImport.cpp
99

Can't you simply reverse this condition? I.e. if it is in the DefinedGVSummaries for the module doing the importing, you aren't going to be externally referencing it by importing the referring function.

ISTM that it is sufficient to check if var_with_section is not already defined in the importing module, because if not then you are adding a cross-module reference into the importing module. If the symbol in question is not defined in the same module as the function being imported, then presumably it is already externally referenced and canBeExternallyReferenced() should return true anyway. So I think this can just be "if (Summary != DefinedGVSummaries.end()) return true;", and perhaps the name of the routine should change to something like "valueCanBeExternallyReferencedFromImportingModule".

mehdi_amini added inline comments.Apr 26 2016, 9:29 AM
lib/Transforms/IPO/FunctionImport.cpp
99

I'm not sure to understand (but haven't had my coffee this morning yet...).

If you look at the test, we want to import function @reference_gv_with_section which references @var_with_section which is internal (so can't be externally referenced). And this is what I'm checking.

Can you illustrate the logic you're explaining above using the test as an example?

tejohnson added inline comments.Apr 26 2016, 9:39 AM
lib/Transforms/IPO/FunctionImport.cpp
99

Sure.

When we invoke eligibleForImport on @reference_gv_with_section, it will first check whether its own summary canBeExternallyReferenced() which is true (@reference_gv_with_section is not itself a local that has a section), so then it invokes this routine to see whether the referenced @var_with_section can be externally referenced. Because @var_with_section is not defined in the module we are importing into, it is not in the DefinedGVSummaries list. So currently with your patch, valueCanBeExternallyReferencedFromModule will incorrectly return true since it isn't in the list. With my proposed change, which is:

if (Summary != DefinedGVSummaries.end())
   return true;

we will not return true here, since the summary for @var_with_section is not in the list (not defined in the importing module). Then it will fall through to check canBeExternallyReferenced(), which should return false given the that it is local and therefore needsRenaming(), and because it hasSection flag.

mehdi_amini added inline comments.Apr 26 2016, 9:42 AM
lib/Transforms/IPO/FunctionImport.cpp
99

Interestingly it seemed obvious when I reached the first parenthesis of the first sentence now. I don't know why I went with something so complicated...
Thanks.

mehdi_amini added inline comments.Apr 26 2016, 1:16 PM
lib/Transforms/IPO/FunctionImport.cpp
99

Somehow you convinced me above, but I'm confused again now. What you write is almost what I attempted originally until I wrote that I hit a wall and went for the current patch.

It is not clear to me how you check anything on var_with_section without its summary. And you will *not* find it if DefinedGVSummaries is about the importing module and not the exporting one.

tejohnson added inline comments.Apr 26 2016, 1:26 PM
lib/Transforms/IPO/FunctionImport.cpp
99

Oh I see your concern - because that is how you are getting the summary to check. Why don't you just get it the same way selectCallee gets it for a candidate call? I.e. access it from the Index via Index.findGlobalValueInfoList(GUID). If there is more than one then you could look for the one in the same module as the referring summary (which would need to be passed in here). That's where you could return false if none is found in the referring summary's module (since it already must be an external access).

Address Teresa's comment. Thanks it is a lot better now :)

mehdi_amini added inline comments.Apr 26 2016, 2:30 PM
include/llvm/IR/ModuleSummaryIndex.h
377

Note: I don't use this method anymore, but you need it in your patch. I guess you should take it in your patch and I'll remove it from here.

tejohnson accepted this revision.Apr 26 2016, 2:32 PM
tejohnson edited edge metadata.

LGTM!

This revision is now accepted and ready to land.Apr 26 2016, 2:32 PM
tejohnson added inline comments.Apr 26 2016, 2:32 PM
include/llvm/IR/ModuleSummaryIndex.h
377

Will do

mehdi_amini edited edge metadata.

Remove unused "findSummaryInModule" function