This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Fix crash: transitively referenced global shouldn't be promoted
ClosedPublic

Authored by tejohnson on Nov 18 2016, 6:48 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

mehdi_amini retitled this revision from to [ThinLTO] Fix crash: transitively referenced global shouldn't be promoted.
mehdi_amini updated this object.
mehdi_amini added a reviewer: tejohnson.
mehdi_amini added a subscriber: llvm-commits.
tejohnson edited edge metadata.Nov 19 2016, 7:31 AM

Interesting, this is pretty much the opposite of a patch I sent yesterday (D26866) to fix a Chromium build issue (PR31052).

By removing this code, we can have more failures of the type described in that bug, because the backend will decide not to promote constant unnamed_addr constants (doPromoteLocalToGlobal).

I see two fixes to both of these issues (PR31052 and the issue you are trying to fix here):

  1. Disable the constant unnamed_addr promotion optimization in doPromoteLocalToGlobal
  2. Change the importing logic to prevent import of anything that will cause a non-renamable value to be exported via recursive analysis of exported globals (recursively invoke canBeExternallyReferenced when looking at global variable summaries)

The disadvantage of #1, while simpler, is that we would prevent optimizations. I.e. the situation in PR26866 is from constant propagation once we have imported since the parameters of the imported function were constants used to access the imported global constant array.

Whatever solution we adopt, it seems that logic should be in the *import* directly, so this code should go away and the importing could then be taught to walk refs (we don't right now I believe) and import constant.

So my current feeling is:

  • the fix for PR31052 is to disable the "magic" in IRLinker that import definition we didn't ask for.
  • let this patch go
  • work on the importing logic to start considering "refs" to be imported when possible / profitable (i.e. catch the performance gain we hope to have with PR31052)

So my current feeling is:

  • the fix for PR31052 is to disable the "magic" in IRLinker that import definition we didn't ask for.

The IRLinker shouldn't be changed - it would be removing the constant unnamed_addr handling from doPromoteLocalToGlobal. The IRLinker does the right thing based on the linkage.

  • let this patch go

I think you would also need this patch to go in so globalvar refs wouldn't then be marked exported, which presumably is causing an issue when the exporting (defining) module tried to promote those with sections?

  • work on the importing logic to start considering "refs" to be imported when possible / profitable (i.e. catch the performance gain we hope to have with PR31052)

This is the right long term fix. However I am concerned about performance we may give up in the meantime (PR31052 is fixing a case where we already are getting optimized code). I'd prefer to keep the current approach in the meantime, unless benchmarking shows that it doesn't have significant impact. It would mean changing this fix to instead simply do a recursive walk of global var refs in eligibleForImport, along with my current fix for PR31052.

So my current feeling is:

  • the fix for PR31052 is to disable the "magic" in IRLinker that import definition we didn't ask for.

The IRLinker shouldn't be changed - it would be removing the constant unnamed_addr handling from doPromoteLocalToGlobal. The IRLinker does the right thing based on the linkage.

I don't get this part?
I expect the imported to *control* what is imported, and nothing that isn't marked for import should be imported.

I think you would also need this patch to go in so globalvar refs wouldn't then be marked exported, which presumably is causing an issue when the exporting (defining) module tried to promote those with sections?

Yes, this is the crash with the test-case in this patch.

  • work on the importing logic to start considering "refs" to be imported when possible / profitable (i.e. catch the performance gain we hope to have with PR31052)

This is the right long term fix. However I am concerned about performance we may give up in the meantime (PR31052 is fixing a case where we already are getting optimized code). I'd prefer to keep the current approach in the meantime, unless benchmarking shows that it doesn't have significant impact. It would mean changing this fix to instead simply do a recursive walk of global var refs in eligibleForImport, along with my current fix for PR31052.

I have a different approach: 1) this should go in in any case, because correctness first, and 2) it does not seem that much "long term", it is even fairly easy to implement in the importer, but I would prioritize it depending on the benchmarks.

So my current feeling is:

  • the fix for PR31052 is to disable the "magic" in IRLinker that import definition we didn't ask for.

The IRLinker shouldn't be changed - it would be removing the constant unnamed_addr handling from doPromoteLocalToGlobal. The IRLinker does the right thing based on the linkage.

I don't get this part?
I expect the imported to *control* what is imported, and nothing that isn't marked for import should be imported.

The IRLinker necessarily needs to link in the definition of any local referenced from the module being linked. This is necessary for both Full LTO, as well as for ThinLTO if we decided not to promote and want to import a copy. For ThinLTO we control this by promoting before the IRLinker is invoked (by invoking renameModuleForThinLTO) - so if it has been promoted then the IRLinker sees a non-local and does not force the def to be linked, but if it is still local at that point we need to import a local copy.

Note that this is still going to be the same when we change the local const heuristic to be decided in the thin link instead of during the backend's renaming logic, based on a new flag to be added indicating constness. In that case the local would not be marked as a global in the index, and we will not promote in renameModuleForThinLTO (superseding the old heuristic in doPromoteLocalToGlobal), and the IRLinker will import a local copy since it will still be local.

I think you would also need this patch to go in so globalvar refs wouldn't then be marked exported, which presumably is causing an issue when the exporting (defining) module tried to promote those with sections?

Yes, this is the crash with the test-case in this patch.

  • work on the importing logic to start considering "refs" to be imported when possible / profitable (i.e. catch the performance gain we hope to have with PR31052)

This is the right long term fix. However I am concerned about performance we may give up in the meantime (PR31052 is fixing a case where we already are getting optimized code). I'd prefer to keep the current approach in the meantime, unless benchmarking shows that it doesn't have significant impact. It would mean changing this fix to instead simply do a recursive walk of global var refs in eligibleForImport, along with my current fix for PR31052.

I have a different approach: 1) this should go in in any case, because correctness first,

This patch will not be correct on it's own, it can't go in without removing the logic in doPromoteLocalToGlobal.

and 2) it does not seem that much "long term", it is even fairly easy to implement in the importer, but I would prioritize it depending on the benchmarks.

I'm concerned about the performance we lose in the meantime. Note the optimization is working in other cases (when the reference to a non-const local is directly referenced by the const local being imported). To keep it working until the constness is tracked and handled in the thin link, along with my patch for PR31052, I think this would need to be changed to do a similar recursive check in canBeExternallyReferenced, like the following:

  • a/lib/Transforms/IPO/FunctionImport.cpp

+++ b/lib/Transforms/IPO/FunctionImport.cpp
@@ -135,7 +135,23 @@ static bool canBeExternallyReferenced(const ModuleSummaryIndex &Index,

// We don't need to check for the module path, because if it can't be
// externally referenced and we call it, it is necessarilly in the same
// module
  • return canBeExternallyReferenced(**Summaries->second.begin());

+ if (!canBeExternallyReferenced(**Summaries->second.begin()))
+ return false;
+
+ auto GVS = dyn_cast<GlobalVarSummary>(Summaries->second.begin()->get());
+ if (!GVS)
+ return true;
+ FunctionImportGlobalProcessing::doPromoteLocalToGlobal() will always
+
trigger importing the initializer for constant unnamed addr globals that
+ are referenced. We conservatively check all the referenced symbols for
+
every global to workaround this.
+ // FIXME: with a "isConstant" flag in the summary we could be more targetted.
+ for (auto &Ref : GVS->refs()) {
+ auto GUID = Ref.getGUID();
+ if (!canBeExternallyReferenced(Index, GUID))
+ return false;
+ }
+ return true;
}

// Return true if the global described by \p Summary can be imported in another

The IRLinker shouldn't be changed - it would be removing the constant unnamed_addr handling from doPromoteLocalToGlobal. The IRLinker does the right thing based on the linkage.

I don't get this part?
I expect the imported to *control* what is imported, and nothing that isn't marked for import should be imported.

The IRLinker necessarily needs to link in the definition of any local referenced from the module being linked.

Not necessarily: it can just abort and return an error.

This is necessary for both Full LTO, as well as for ThinLTO if we decided not to promote and want to import a copy.

If we decide to import a copy, we should ask the IRLinker to import this local.

For ThinLTO we control this by promoting before the IRLinker is invoked (by invoking renameModuleForThinLTO) - so if it has been promoted then the IRLinker sees a non-local and does not force the def to be linked, but if it is still local at that point we need to import a local copy.

What you're describing is that we *can't* express the copy at this point, so we leave the IRLinker implicitly handling it.
I'm against any form of implicit: it adds complexity and makes it too hard to reason about each components and the invariant of the system by "hiding" them from the APIs.
This is what lead to bugs like the one addressed in this current patch.

I have a different approach: 1) this should go in in any case, because correctness first,

This patch will not be correct on it's own, it can't go in without removing the logic in doPromoteLocalToGlobal.

Then we're missing a test case in the validation, as it is passing with this patch.

and 2) it does not seem that much "long term", it is even fairly easy to implement in the importer, but I would prioritize it depending on the benchmarks.

I'm concerned about the performance we lose in the meantime.

I'm not why it should be a priority? Development on trunk should focus on correctness of the system *before* optimizations concerns.

The IRLinker shouldn't be changed - it would be removing the constant unnamed_addr handling from doPromoteLocalToGlobal. The IRLinker does the right thing based on the linkage.

I don't get this part?
I expect the imported to *control* what is imported, and nothing that isn't marked for import should be imported.

The IRLinker necessarily needs to link in the definition of any local referenced from the module being linked.

Not necessarily: it can just abort and return an error.

This is necessary for both Full LTO, as well as for ThinLTO if we decided not to promote and want to import a copy.

If we decide to import a copy, we should ask the IRLinker to import this local.

It's possible that with the new LTO API the full LTO case would work with that handling removed, but I haven't walked through the usage in detail. not sure about the old libLTO case for full LTO.

For ThinLTO we control this by promoting before the IRLinker is invoked (by invoking renameModuleForThinLTO) - so if it has been promoted then the IRLinker sees a non-local and does not force the def to be linked, but if it is still local at that point we need to import a local copy.

What you're describing is that we *can't* express the copy at this point, so we leave the IRLinker implicitly handling it.
I'm against any form of implicit: it adds complexity and makes it too hard to reason about each components and the invariant of the system by "hiding" them from the APIs.
This is what lead to bugs like the one addressed in this current patch.

I have a different approach: 1) this should go in in any case, because correctness first,

This patch will not be correct on it's own, it can't go in without removing the logic in doPromoteLocalToGlobal.

Then we're missing a test case in the validation, as it is passing with this patch.

I looked at the test case that was added when the code you are removing here was added (D19102). I think if referencedbyglobal was a local it would have exposed the issue (we would have been importing it and promoting it, with this patch we would lose the promotion).

and 2) it does not seem that much "long term", it is even fairly easy to implement in the importer, but I would prioritize it depending on the benchmarks.

I'm concerned about the performance we lose in the meantime.

I'm not why it should be a priority? Development on trunk should focus on correctness of the system *before* optimizations concerns.

I normally would agree, except that this is an optimization that has been in for awhile, and maintaining it has a simple fix (the patch I sent last night should do it), until we have the longer term enhancement in.

I normally would agree, except that this is an optimization that has been in for awhile, and maintaining it has a simple fix (the patch I sent last night should do it), until we have the longer term enhancement in.

Right, but that's two separate discussions IMO: 1) the correctness fix, 2) adding a temporary and well scoped optimization patch waiting for a better solution (and a test-case will be needed).

I normally would agree, except that this is an optimization that has been in for awhile, and maintaining it has a simple fix (the patch I sent last night should do it), until we have the longer term enhancement in.

Right, but that's two separate discussions IMO: 1) the correctness fix, 2) adding a temporary and well scoped optimization patch waiting for a better solution (and a test-case will be needed).

Not sure I understand - the "temporary" optimization patch is already in (and works in many cases, just not when the reference from the const global init to the referenced local that needs promotion is through some levels of indirection) - this change would necessitate removing that.

Correct: this optimization is incorrect as is, and not tested, which seems a good reason to remove it :)
How to recover the optimization is open to discussion though: it seems that we agree on the right thing to do, but you also seem to target a temporary patch along the same line of what I'm removing here. I'm not opposed to the latter, but writing a correct and tested patch to have this optimization seems independent from fixing the correctness issue in the first place.

Correct: this optimization is incorrect as is, and not tested, which seems a good reason to remove it :)

The optimization is only incorrect in the transitively referenced case because the handling here and in exportGlobalInModule needs to be recursive. It's easy enough to fix (as shown in D26866 and my proposed patch to fix this case). My other patch D26866 adds a test case and it should be straightforward to modify the test added with the earlier patch that added the code here (D19102) to have regression testing.

How to recover the optimization is open to discussion though: it seems that we agree on the right thing to do, but you also seem to target a temporary patch along the same line of what I'm removing here. I'm not opposed to the latter, but writing a correct and tested patch to have this optimization seems independent from fixing the correctness issue in the first place.

I think the fix to make this work in the transitive reference case is pretty simple, as is adding tests for it. I am concerned about the cases where we lose performance where this is currently kicking in in the non-transitive case if we remove for some time before a summary-based approach is added.

krasin added a subscriber: krasin.Nov 21 2016, 11:16 PM

How to recover the optimization is open to discussion though: it seems that we agree on the right thing to do, but you also seem to target a temporary patch along the same line of what I'm removing here. I'm not opposed to the latter, but writing a correct and tested patch to have this optimization seems independent from fixing the correctness issue in the first place.

I think the fix to make this work in the transitive reference case is pretty simple, as is adding tests for it. I am concerned about the cases where we lose performance where this is currently kicking in in the non-transitive case if we remove for some time before a summary-based approach is added.

I collected SPEC performance data with the current optimization on (default) and disabled (in doPromoteLocalToGlobal), and there was no consistent performance delta. So I'll drop my objection to removing this optimization for now until we can do it in the thin link via the summary.

However, please update this patch to remove that handling from doPromoteLocalToGlobal and add a test case that would fail with the current patch until the doPromoteLocalToGlobal handling is removed (I suggest start with the test case added in D19102 but make referencedbyglobal a local so that the referencing global var would be imported as a copy instead of promoted - then referencedbyglobal should be referenced but not promoted).

How to recover the optimization is open to discussion though: it seems that we agree on the right thing to do, but you also seem to target a temporary patch along the same line of what I'm removing here. I'm not opposed to the latter, but writing a correct and tested patch to have this optimization seems independent from fixing the correctness issue in the first place.

I think the fix to make this work in the transitive reference case is pretty simple, as is adding tests for it. I am concerned about the cases where we lose performance where this is currently kicking in in the non-transitive case if we remove for some time before a summary-based approach is added.

I collected SPEC performance data with the current optimization on (default) and disabled (in doPromoteLocalToGlobal), and there was no consistent performance delta. So I'll drop my objection to removing this optimization for now until we can do it in the thin link via the summary.

However, please update this patch to remove that handling from doPromoteLocalToGlobal and add a test case that would fail with the current patch until the doPromoteLocalToGlobal handling is removed (I suggest start with the test case added in D19102 but make referencedbyglobal a local so that the referencing global var would be imported as a copy instead of promoted - then referencedbyglobal should be referenced but not promoted).

Ping. Want to get a fix in to unblock the Chromium build. Mehdi, do you want me to take this over or can you get the fix in soon?

tejohnson commandeered this revision.Dec 1 2016, 2:19 PM
tejohnson edited reviewers, added: mehdi_amini; removed: tejohnson.
tejohnson updated this revision to Diff 79983.Dec 1 2016, 2:23 PM

Complete the fix by removing special casing for static consts from
the promotion logic. Update tests.

mehdi_amini accepted this revision.Dec 1 2016, 4:51 PM
mehdi_amini edited edge metadata.

LGTM.
Thanks Teresa!

This revision is now accepted and ready to land.Dec 1 2016, 4:51 PM
This revision was automatically updated to reflect the committed changes.