This is an archive of the discontinued LLVM Phabricator instance.

[LTO] Fix a crash caused by accessing an empty ValueInfo
ClosedPublic

Authored by tejohnson on Nov 1 2018, 10:45 AM.

Details

Summary

[LTO] Fix a crash caused by accessing an empty ValueInfo

ModuleSummaryIndex::exportToDot crashes when linking the Linux kernel
under ThinLTO using LLVMgold.so. This is due to the exportToDot
function trying to get the GUID of an empty ValueInfo. The root cause
related to the fact that we attempt to get the GUID of an aliasee
via its OriginalGUID recorded in the aliasee summary, and that is not
always possible. Specifically, we cannot do this mapping when the value
is internal linkage and there were other internal linkage symbols with
the same name.

There are 2 fixes for the problem included here.

  1. In all cases where we can currently print the dot file from the

command line (which is only via save-temps), we have a valid AliaseeGUID
in the AliasSummary. Use that when it is available, so that we can get
the correct aliasee GUID whenever possible.

  1. However, if we were to invoke exportToDot from the debugger right

after it is built during the initial analysis step (i.e. the per-module
summary), we won't have the AliaseeGUID field populated. In that case,
we have a fallback fix that will simply print "@"+GUID when we aren't
able to get the GUID from the OriginalGUID. It simply checks if the VI
is valid or not before attempting to get the name. Additionally, since
getAliaseeGUID will assert that the AliaseeGUID is non-zero, guard the
earlier fix #1 by a new function hasAliaseeGUID().

Diff Detail

Repository
rL LLVM

Event Timeline

tmroeder created this revision.Nov 1 2018, 10:45 AM
evgeny777 added a subscriber: evgeny777.

Can you please post the *.index.bc file? It is saved before dot, so it should be there

Can you please post the *.index.bc file? It is saved before dot, so it should be there

Yes, but I'll need to repro on an upstream kernel tree rather than the internal one I was using, since I don't want to post something that might have Google-internal information in it. That might take a little time.

tejohnson added inline comments.Nov 1 2018, 12:04 PM
lib/IR/ModuleSummaryIndex.cpp
229 ↗(On Diff #172177)

Always calling the new version means that we will never print the name, even if there is a ValueInfo for this GUID. Can you call the appropriate version of getNodeVisualName based on whether the VI is empty? I.e. invoke ValueInfo::operator bool() on it.

tmroeder updated this revision to Diff 172217.Nov 1 2018, 1:16 PM

Use the ValueInfo for the name of an external node if the ValueInfo is not empty.

tmroeder marked an inline comment as done.Nov 1 2018, 1:19 PM
tmroeder added inline comments.
lib/IR/ModuleSummaryIndex.cpp
229 ↗(On Diff #172177)

Good point, thanks. I've changed this now to take the ValueInfo as an argument again and do that check before calling getNodeVisualName.

The change looks ok to me now, but we really need a test case for this. I tried unsuccessfully so far to induce the error. Was this produced when using the save-temps plugin-opt during your ThinLTO link? Can you go into the debugger and see what the value of E.Dst is?

I'm also at Google so feel free to reach out to me internally if your reproducer can't be shared upstream.

The change looks ok to me now, but we really need a test case for this. I tried unsuccessfully so far to induce the error. Was this produced when using the save-temps plugin-opt during your ThinLTO link? Can you go into the debugger and see what the value of E.Dst is?

I'm also at Google so feel free to reach out to me internally if your reproducer can't be shared upstream.

Via the test case Tom provided me, I can see what is going on. This is from an aliasee, where the aliasee is internal linkage, and because it has internal linkage and there are multiple internal symbols with the same name and internal linkage we are not able to get the GUID from the original GUID. I have a more robust fix, and can create a test case. Mind if I commandeer this patch?

tmroeder marked an inline comment as done.Nov 2 2018, 12:00 PM

The change looks ok to me now, but we really need a test case for this. I tried unsuccessfully so far to induce the error. Was this produced when using the save-temps plugin-opt during your ThinLTO link? Can you go into the debugger and see what the value of E.Dst is?

I'm also at Google so feel free to reach out to me internally if your reproducer can't be shared upstream.

Via the test case Tom provided me, I can see what is going on. This is from an aliasee, where the aliasee is internal linkage, and because it has internal linkage and there are multiple internal symbols with the same name and internal linkage we are not able to get the GUID from the original GUID. I have a more robust fix, and can create a test case. Mind if I commandeer this patch?

Sure, no problem. I don't know Phabricator well, so please let me know if there's something I need to do.

The change looks ok to me now, but we really need a test case for this. I tried unsuccessfully so far to induce the error. Was this produced when using the save-temps plugin-opt during your ThinLTO link? Can you go into the debugger and see what the value of E.Dst is?

I'm also at Google so feel free to reach out to me internally if your reproducer can't be shared upstream.

Via the test case Tom provided me, I can see what is going on. This is from an aliasee, where the aliasee is internal linkage, and because it has internal linkage and there are multiple internal symbols with the same name and internal linkage we are not able to get the GUID from the original GUID. I have a more robust fix, and can create a test case. Mind if I commandeer this patch?

Sure, no problem. I don't know Phabricator well, so please let me know if there's something I need to do.

Nope, I can do it. I'll do so when I have something ready to upload later today.

tejohnson commandeered this revision.Nov 2 2018, 2:06 PM
tejohnson edited reviewers, added: tmroeder; removed: tejohnson.

Uploading new patch shortly. It includes this earlier fix along with a newer one to better handle aliases, as it is needed as a fallback in certain cases as described in the updated description.

tejohnson updated this revision to Diff 172430.Nov 2 2018, 2:13 PM

Update with broader fix and test case

tejohnson retitled this revision from [lto] Fix a crash caused by accessing an empty ValueInfo to [LTO] Fix a crash caused by accessing an empty ValueInfo.Nov 2 2018, 2:15 PM
tejohnson edited the summary of this revision. (Show Details)

I can confirm that this version of the revision also fixes the bug I observed. I patched it in and built and re-ran the build step that failed.

However, the test case fails, at least for me:

FAIL: LLVM :: ThinLTO/X86/alias_internal.ll (22171 of 28416)
******************** TEST 'LLVM :: ThinLTO/X86/alias_internal.ll' FAILED ********************
Script:
--
: 'RUN: at line 4';   /usr/local/google/home/tmroeder/src/llvm/svn-build/release/bin/opt -module-summary /usr/local/google/home/tmroeder/src/llvm/svn/test/ThinLTO/X86/alias_internal.ll -o /usr/local/google/home/tmroeder/src/llvm/svn-build/release/test/ThinLTO/X86/Output/alias_internal.ll.tmp1.bc
: 'RUN: at line 5';   /usr/local/google/home/tmroeder/src/llvm/svn-build/release/bin/opt -module-summary /usr/local/google/home/tmroeder/src/llvm/svn/test/ThinLTO/X86/Inputs/alias_internal.ll -o /usr/local/google/home/tmroeder/src/llvm/svn-build/release/test/ThinLTO/X86/Output/alias_internal.ll.tmp2.bc
: 'RUN: at line 6';   /usr/local/google/home/tmroeder/src/llvm/svn-build/release/bin/llvm-lto2 run /usr/local/google/home/tmroeder/src/llvm/svn-build/release/test/ThinLTO/X86/Output/alias_internal.ll.tmp1.bc /usr/local/google/home/tmroeder/src/llvm/svn-build/release/test/ThinLTO/X86/Output/alias_internal.ll.tmp2.bc -o /usr/local/google/home/tmroeder/src/llvm/svn-build/release/test/ThinLTO/X86/Output/alias_internal.ll.tmp.out -save-temps    -r /usr/local/google/home/tmroeder/src/llvm/svn-build/release/test/ThinLTO/X86/Output/alias_internal.ll.tmp1.bc,a1,plx    -r /usr/local/google/home/tmroeder/src/llvm/svn-build/release/test/ThinLTO/X86/Output/alias_internal.ll.tmp2.bc,a2,plx
: 'RUN: at line 10';   cat /usr/local/google/home/tmroeder/src/llvm/svn-build/release/test/ThinLTO/X86/Output/alias_internal.ll.tmp.out.index.dot | /usr/local/google/home/tmroeder/src/llvm/svn-build/release/bin/FileCheck /usr/local/google/home/tmroeder/src/llvm/svn/test/ThinLTO/X86/alias_internal.ll
--
Exit Code: 1

Command Output (stderr):
--
/usr/local/google/home/tmroeder/src/llvm/svn/test/ThinLTO/X86/alias_internal.ll:11:14: error: CHECK-DAG: expected string not found in input
; CHECK-DAG: M0_12511626713252727690 -> M0_1505494552724110632 [style=dotted]; // alias
             ^
<stdin>:1:1: note: scanning from here
digraph Summary {
^
<stdin>:22:2: note: possible intended match here
 M0_12511626713252727690 -> M0_12714169483090670833 [style=dotted]; // alias
 ^

--

I reran the steps by hand without the FileCheck part of the pipe, and I get the following graph:

$ cat /usr/local/google/home/tmroeder/src/llvm/svn-build/release/test/ThinLTO/X86/Output/alias_internal.ll.tmp.out.index.dot
digraph Summary {
  // Module: /usr/local/google/home/tmroeder/src/llvm/svn-build/release/test/ThinLTO/X86/Output/alias_internal.ll.tmp2.bc
  subgraph cluster_1 {
    style = filled;
    color = lightgrey;
    label = "alias_internal.ll.tmp2.bc";
    node [style=filled,fillcolor=lightblue];
    M1_5602470142352873009 [shape="record",label="f|internal (inst: 1, ffl: 0000)}"]; // function
    M1_8129049334585965161 [style="dotted,filled",shape="box",label="a2"]; // alias
    // Edges:
    M1_8129049334585965161 -> M1_5602470142352873009 [style=dotted]; // alias
  }
  // Module: /usr/local/google/home/tmroeder/src/llvm/svn-build/release/test/ThinLTO/X86/Output/alias_internal.ll.tmp1.bc
  subgraph cluster_0 {
    style = filled;
    color = lightgrey;
    label = "alias_internal.ll.tmp1.bc";
    node [style=filled,fillcolor=lightblue];
    M0_12714169483090670833 [shape="record",label="f|internal (inst: 1, ffl: 0000)}"]; // function
    M0_12511626713252727690 [style="dotted,filled",shape="box",label="a1"]; // alias
    // Edges:
    M0_12511626713252727690 -> M0_12714169483090670833 [style=dotted]; // alias
  }
  // Cross-module edges:
}

It looks like the sources match in both CHECK-DAG cases but that the destinations don't.

It looks like the sources match in both CHECK-DAG cases but that the destinations don't.

Thanks for testing! Ack, that makes sense and I should have realized earlier. For internal symbols, the path is included with the name when computing the guid, specifically so multiple locals with the same name are disambiguated. So the rhs of the -> will be different when running the test in different places! Will upload a fix with a looser matching shortly.

tejohnson updated this revision to Diff 172463.Nov 2 2018, 4:37 PM

Fix test so that it will not try to match exact internal symbol GUIDs

tmroeder accepted this revision.Nov 2 2018, 4:49 PM

LGTM. With the looser condition, the tests pass for me.

This revision is now accepted and ready to land.Nov 2 2018, 4:49 PM
This revision was automatically updated to reflect the committed changes.