This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Fix thinLTO crash
AcceptedPublic

Authored by davidxl on Aug 7 2017, 9:02 PM.

Details

Summary

Compiling a simple program with a call to an external function with the same linkage name as a file static variable in another module with thinLTO, the compiler will crash. This patch fixed the problem.

Diff Detail

Event Timeline

davidxl created this revision.Aug 7 2017, 9:02 PM
mehdi_amini added inline comments.Aug 7 2017, 10:04 PM
test/Transforms/FunctionImport/funcimport_var.ll
13

Something seems strange to me: the static variable GUID shouldn't be the same as the one here: the static variable name should be prepended with the module name when computing its GUID, shouldn't it?

davidxl added inline comments.Aug 7 2017, 10:15 PM
test/Transforms/FunctionImport/funcimport_var.ll
13

That is what I thought too, but it is not implemented this way -- see computeVariableSummary and computeFunctionSummary, where summaries are added to the index with the original names.

tejohnson added inline comments.Aug 8 2017, 8:17 AM
test/Transforms/FunctionImport/funcimport_var.ll
13

The GVs get added to the per-module index (when we compute the summaries in the compile step) with the original name. But when the combined index is built during the thin link (actually when we read in the summaries in the bitcode reader) we add with a GUID that prepends the module path for statics.

So for the attached example, I would expect that funcimport_var.ll will end up in the index with a call to link which uses a GUID that is the MD5 hash of "link", whereas funcimport_var2.ll will have a definition for link which has the GUID that is the MD5 hash of something like "funcimport_var2.ll:link". Again in the combined index not the individual summaries in the .bc files

David, can you use the -print-summary-global-ids option to print the GUIDs when we read in the summaries during the thinlink action? And/or -debug to trace the importing to figure out why we pick that GUID which should be different.

davidxl added inline comments.Aug 8 2017, 8:47 AM
test/Transforms/FunctionImport/funcimport_var.ll
13

hers are the global ids printed. There are no related debug output.

GUID 1949444753616948755(1949444753616948755) is _Z4LinkPKcS0_
GUID 14802282251123568682(14802282251123568682) is link
GUID 1134627848233703551(14802282251123568682) is link
GUID 15625676307060645568(15625676307060645568) is get_link

The relevant stack trace of the crash when invoking llvm-lto with -thinlto-action=import:

#15 0x000000000233053e (anonymous namespace)::selectCallee
#16 0x00000000023309a8 (anonymous namespace)::computeImportForFunction
#17 0x00000000023310c0 (anonymous namespace)::ComputeImportForModule(
#18 0x0000000002331392 llvm::ComputeCrossModuleImport
#19 0x00000000017e7505 llvm::ThinLTOCodeGenerator::crossModuleImport
#20 0x0000000000415cbe thinlto::ThinLTOProcessing::import

tejohnson added inline comments.Aug 8 2017, 2:11 PM
test/Transforms/FunctionImport/funcimport_var.ll
13

As expected, the 2 "link" have different GUID. I'm not sure why we end up with the static one from another module in the callee list for the other module.

There is DEBUG output when we perform the thin link and select callees etc, -debug-only=function-import or -debug should presumably print it, not sure why you won't get any.

I will debug this a little more when I find the time.

got time to look at it more. The root cause of why the call resolves to a static variable in another module is from the following code in BitcodeWriter.cpp. The call to ::link was recorded to GUID: 14802282251123568682 which was correct. Since it does not have a value-id, the logic for sample PGO kicks in. 14802282251123568682 was treated as 'original guid' and the from the map, the PGO annotated GUID for the static var was found (note that one original guid maps to multiple GUIDs). After this, the call to :link now resolves to static var 'link's GUID.

3612│ for (auto &EI : FS->calls()) {
3613│ If this GUID doesn't have a value id, it doesn't have a function
3614│
summary and we don't need to record any calls to it.
3615│ GlobalValue::GUID GUID = EI.first.getGUID();
3616├> auto CallValueId = getValueId(GUID);
3617│ if (!CallValueId) {
3618│ For SamplePGO, the indirect call targets for local functions will
3619│
have its original name annotated in profile. We try to find the
3620│ // corresponding PGOFuncName as the GUID.
3621│ GUID = Index.getGUIDFromOriginalID(GUID);

davidxl updated this revision to Diff 110772.Aug 11 2017, 10:50 AM

Update the fix.

tejohnson accepted this revision.Aug 15 2017, 4:34 PM

got time to look at it more. The root cause of why the call resolves to a static variable in another module is from the following code in BitcodeWriter.cpp. The call to ::link was recorded to GUID: 14802282251123568682 which was correct. Since it does not have a value-id, the logic for sample PGO kicks in. 14802282251123568682 was treated as 'original guid' and the from the map, the PGO annotated GUID for the static var was found (note that one original guid maps to multiple GUIDs). After this, the call to :link now resolves to static var 'link's GUID.

I guess for sample PGO it will essentially guess as to which local with a matching original GUID is the intended one.

This fix LGTM, but please add a comment about how this is required because in this case there can be multiple matches with the original GUID, and we may pick wrong.

This revision is now accepted and ready to land.Aug 15 2017, 4:34 PM