Page MenuHomePhabricator

Put "built-in" function definitions in global Used list, for LTO. (fix bug 34169)
ClosedPublic

Authored by cmtice on Jul 17 2018, 10:20 AM.

Details

Summary

When building with LTO, builtin functions that are defined but whose calls have not been inserted yet, get internalized. The Global Dead Code Elimination phase in the new LTO implementation then removes these function definitions. Later optimizations add calls to those functions, and the linker then dies complaining that there are no definitions. This CL fixes the new LTO implementation to check if a function is builtin, and if so, to not internalize (and later DCE) the function. As part of this fix I needed to move the RuntimeLibcalls.{def,h} files from the CodeGen subidrectory to the IR subdirectory. I have updated all the files that accessed those two files to access their new location.

Fixes PR34169

Diff Detail

Repository
rL LLVM

Event Timeline

cmtice created this revision.Jul 17 2018, 10:20 AM
cmtice edited the summary of this revision. (Show Details)Jul 17 2018, 10:26 AM
cmtice added a subscriber: llvm-commits.
tejohnson added inline comments.Jul 17 2018, 10:38 AM
lib/Transforms/IPO/GlobalOpt.cpp
2929 ↗(On Diff #155910)

I'm not sure if this is the right place to do this, since it doesn't need to be done iteratively, only once. I.e. this code is executed "while (LocalChange)".

Wondering if other reviewers have thoughts on where the right place to add these to the used is.

test/ThinLTO/X86/builtin-nostrip.ll
1 ↗(On Diff #155910)

You will probably need to add a line like:
; REQUIRES: x86-registered-target

to prevent non-x86 bot failures.

15 ↗(On Diff #155910)

Need to add an llvm-nm invocation on the correct object file emitted by llvm-lto2. Otherwise you aren't doing any checking like you are for llvm-lto above.

22 ↗(On Diff #155910)

This llvm-lto2 invocation is exactly the same as the prior one - I don't think you should do this twice?

51 ↗(On Diff #155910)

Needs some kind of FileCheck on one of the output files, similar to my earlier comment on the first llvm-lto2 invocation.

pcc added a comment.Jul 17 2018, 10:39 AM

I don't think that globalopt is the right place for this because there is no guarantee that the pass is run. If we wanted to do something about this, it seems like a better place would be in lib/Analysis/ModuleSummaryAnalysis.cpp where we decide which functions are GC roots.

Also this is only a subset of the functions that the code generator can generate calls to. There is a list of runtime functions in llvm/include/llvm/CodeGen/RuntimeLibcalls.def, but as far as I know it isn't 100% comprehensive. Using that list would probably be better than just 3 functions though.

pcc@ Originally I tried to make the change in ModuleSummaryAnalysis.cpp, but it turns out that this is not allowed in LLVM: I am not supposed to make changes to the IR in the Analysis layer of LLVM. I need to find some place in the Transform layer to make the change. Hence my first guess as to a location: GlobalOpt.cpp. I would be happy to make the change in some other location instead, but it must be someplace in the Transform layer.

pcc added a comment.Jul 17 2018, 12:21 PM

You wouldn't be making changes to the IR, only the summary. I was thinking that you might add code to http://llvm-cs.pcc.me.uk/lib/Analysis/ModuleSummaryAnalysis.cpp#534 that would effectively do:

if (ValueInfo VI = Index.getValueInfo(GlobalValue::getGUID("memcpy")))
    for (auto &Summary : VI.getSummaryList())
      Summary->setLive(true);
if (ValueInfo VI = Index.getValueInfo(GlobalValue::getGUID("memmove")))
    for (auto &Summary : VI.getSummaryList())
      Summary->setLive(true);
// etc.
In D49434#1165612, @pcc wrote:

You wouldn't be making changes to the IR, only the summary. I was thinking that you might add code to http://llvm-cs.pcc.me.uk/lib/Analysis/ModuleSummaryAnalysis.cpp#534 that would effectively do:

if (ValueInfo VI = Index.getValueInfo(GlobalValue::getGUID("memcpy")))
    for (auto &Summary : VI.getSummaryList())
      Summary->setLive(true);
if (ValueInfo VI = Index.getValueInfo(GlobalValue::getGUID("memmove")))
    for (auto &Summary : VI.getSummaryList())
      Summary->setLive(true);
// etc.

If I recall correctly it wasn't the LTO dead code elimination that was removing them. Caroline, I forget what was happening in the original case - were they being internalized by LTO and then eliminated by GlobalDCE? If so, then Peter's approach should fix the issue by preventing them from being internalized (at least by ThinLTO, not sure about regular LTO).

In D49434#1165381, @pcc wrote:

I don't think that globalopt is the right place for this because there is no guarantee that the pass is run. If we wanted to do something about this, it seems like a better place would be in lib/Analysis/ModuleSummaryAnalysis.cpp where we decide which functions are GC roots.

Also this is only a subset of the functions that the code generator can generate calls to. There is a list of runtime functions in llvm/include/llvm/CodeGen/RuntimeLibcalls.def, but as far as I know it isn't 100% comprehensive. Using that list would probably be better than just 3 functions though.

FWIW I agree. Having a more central place that these functions are gotten will help avoid a "just this function, and this one" set of patches in the future.

In D49434#1165612, @pcc wrote:

You wouldn't be making changes to the IR, only the summary. I was thinking that you might add code to http://llvm-cs.pcc.me.uk/lib/Analysis/ModuleSummaryAnalysis.cpp#534 that would effectively do:

if (ValueInfo VI = Index.getValueInfo(GlobalValue::getGUID("memcpy")))
    for (auto &Summary : VI.getSummaryList())
      Summary->setLive(true);
if (ValueInfo VI = Index.getValueInfo(GlobalValue::getGUID("memmove")))
    for (auto &Summary : VI.getSummaryList())
      Summary->setLive(true);
// etc.

If I recall correctly it wasn't the LTO dead code elimination that was removing them. Caroline, I forget what was happening in the original case - were they being internalized by LTO and then eliminated by GlobalDCE? If so, then Peter's approach should fix the issue by preventing them from being internalized (at least by ThinLTO, not sure about regular LTO).

The legacy LTOCodeGenerator has a preserve list which you can declare symbols to preserve (which is essentially the same mechanism as this implementation). Not sure what is the equivalent for the C++ API but I guess you can just reuse that?

cmtice updated this revision to Diff 156360.Jul 19 2018, 2:51 PM
cmtice edited the summary of this revision. (Show Details)

pcc's original suggested fix did not work, because the real issue was the function definitions being initially internalized by LTO and later DCE'd. Peter's suggestion did not prevent them from being internalized. I consulted with him offline and we came up with this alternative solution. It fixes the issue for the new LTO implemention. The old LTO implementation still internalizes the function definitions, but they do not get Dead Code Eliminated. I have also fixed the test case, and have generalized the code to check for all builtin functions (as requested).

The old LTO implementation still internalizes the function definitions, but they do not get Dead Code Eliminated.

Quick clarification based on our discussion earlier - the old LTO implementation internalizes but does not eliminate with regular LTO. The old LTO implementation with ThinLTO is still broken (hence that part of the test is currently disabled).

pcc added inline comments.Jul 19 2018, 3:39 PM
include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
28 ↗(On Diff #156360)

I don't think we should move RuntimeLibcalls.h (because it references CodeGen internals), just RuntimeLibcalls.def.

lib/Object/IRSymtab.cpp
26 ↗(On Diff #156360)

With my suggestion for the array you shouldn't need to include this header.

46 ↗(On Diff #156360)

I think this can just be a statically initialized array. You can use something like:

static const char *LibcallRoutineNames[] = {
#define HANDLE_LIBCALL(code, name) name,
#include "llvm/IR/RuntimeLibcalls.def"
};
239 ↗(On Diff #156360)

Use CamelCase to match the surrounding style.

241 ↗(On Diff #156360)

Use a range for loop to enumerate this array.

cmtice updated this revision to Diff 156500.Jul 20 2018, 8:36 AM

I've made Peter's suggested changes.

pcc added inline comments.Jul 20 2018, 10:23 AM
lib/Object/IRSymtab.cpp
235 ↗(On Diff #156500)

IsBuiltinFunc

237 ↗(On Diff #156500)

I don't think we need this FIXME. As far as I know, this problem is already "solved" in the old LTO API by the client providing an appropriate preserve list.

241 ↗(On Diff #156500)

LibcallName

test/ThinLTO/X86/builtin-nostrip.ll
1 ↗(On Diff #156500)

A lot of this test case can be removed I think.

  • You just need one libcall function, testing more than one wouldn't give you additional coverage.
  • A lot of the test input (including the second input file) is irrelevant to what you're trying to test. Basically I think this test just needs two functions, both with resolution p, one which is a libcall and one which isn't, and you would test that the first one is kept and the second one isn't.
  • Remove the parts about the old LTO API because the problem is solved in a different way there.
tejohnson added inline comments.Jul 20 2018, 10:31 AM
lib/Object/IRSymtab.cpp
237 ↗(On Diff #156500)

I disagree regarding the old LTO API - just as we are solving it here for all clients of the new LTO API, I think it would be nice to be consistent and user-friendly to solve it in a similar way in the old LTO API. However, I don't believe the FIXME belongs here. I would just move it to the new test case above the llvm-lto line that has the commented out llvm-nm check.

cmtice updated this revision to Diff 156553.Jul 20 2018, 11:25 AM
cmtice marked 3 inline comments as done.

Made requested fixes.

For the old API, I don't think it is reasonable to expect user or linker to pass in the correct preserve list. There is likely no knowledge outside the LTO modules that these lib funcs are needed.

The old LTO implementation still internalizes the function definitions, but they do not get Dead Code Eliminated.

Quick clarification based on our discussion earlier - the old LTO implementation internalizes but does not eliminate with regular LTO. The old LTO implementation with ThinLTO is still broken (hence that part of the test is currently disabled).

This also sounds pretty fragile (not just LTO, but in general). If you have a module with implementation of memcpy and all you have is memcpy builtin function, DCE might just cause linker failure. I wonder the problem should be solved in a lower level.
People who actually implement libfunc in the module are probably working on some embedded environment and code size is probably a big concern. It would be good if those libfuncs can be eliminated when it is safe to do so.

pcc added a comment.Jul 20 2018, 12:27 PM

For the old API, I don't think it is reasonable to expect user or linker to pass in the correct preserve list. There is likely no knowledge outside the LTO modules that these lib funcs are needed.

I was under the impression that that was how things worked already on Apple platforms but given that it isn't, I agree.

The old LTO implementation still internalizes the function definitions, but they do not get Dead Code Eliminated.

Quick clarification based on our discussion earlier - the old LTO implementation internalizes but does not eliminate with regular LTO. The old LTO implementation with ThinLTO is still broken (hence that part of the test is currently disabled).

This also sounds pretty fragile (not just LTO, but in general). If you have a module with implementation of memcpy and all you have is memcpy builtin function, DCE might just cause linker failure. I wonder the problem should be solved in a lower level.
People who actually implement libfunc in the module are probably working on some embedded environment and code size is probably a big concern. It would be good if those libfuncs can be eliminated when it is safe to do so.

I would imagine that folks who care about code size would be linking with --gc-sections (or -dead_strip in ld64) so the library functions would end up being dropped if they aren't actually needed.

In D49434#1170267, @pcc wrote:

I would imagine that folks who care about code size would be linking with --gc-sections (or -dead_strip in ld64) so the library functions would end up being dropped if they aren't actually needed.

This is true.

This also sounds pretty fragile (not just LTO, but in general). If you have a module with implementation of memcpy and all you have is memcpy builtin function, DCE might just cause linker failure. I wonder the problem should be solved in a lower level.

Now I think again, it seems this is LTO specific because I can't think of any other condition that those function can be internal.

In D49434#1170267, @pcc wrote:

For the old API, I don't think it is reasonable to expect user or linker to pass in the correct preserve list. There is likely no knowledge outside the LTO modules that these lib funcs are needed.

I was under the impression that that was how things worked already on Apple platforms but given that it isn't, I agree.

No, it doesn't work. I think it just happens that regularLTO from old API doesn't actually drop them. I think it would be better to update the Internalize pass used by regularLTO C API to not internalize those function to be safe (it already has an exception for stack protector symbols) just in case. The thinLTO C API shouldn't be that hard to fix as well. I am ok to fix them with a separate commits.

cmtice updated this revision to Diff 156608.Jul 20 2018, 2:35 PM

I just realized that my patch did not contain the actual move of RuntimeLibcalls.def from CodeGen to IR. This update includes that move.

pcc added inline comments.Jul 20 2018, 2:47 PM
test/ThinLTO/X86/builtin-nostrip.ll
1 ↗(On Diff #156500)

Can you simplify the test case as I suggested? I think the first two bullet points should still apply.

cmtice updated this revision to Diff 156637.Jul 20 2018, 4:34 PM

Simplify test case, as requested by Peter.

I *think* this takes care of all the changes I was requested to make. Have I missed any?

pcc accepted this revision.Jul 20 2018, 5:45 PM

LGTM

test/ThinLTO/X86/builtin-nostrip.ll
40 ↗(On Diff #156637)

Comment should refer to __stack_chk_fail.

This revision is now accepted and ready to land.Jul 20 2018, 5:45 PM
This revision was automatically updated to reflect the committed changes.

Hi Caroline, I was looking at this patch as I needed to reference it in another discussion, and realized there was no test committed with the patch. Looks like old review comments exist for a test/ThinLTO/X86/builtin-nostrip.ll on a prior version of the patch, but that seems to have disappeared on the version that got committed (probably just a mistake during the commit process since it was a new file). Can you grab that test off the prior diff and commit it now?

Herald added a project: Restricted Project. · View Herald TranscriptMar 29 2019, 8:24 AM
Herald added a subscriber: jdoerfert. · View Herald Transcript
cmtice added a comment.Apr 1 2019, 9:28 AM

Test case has been committed, as of revision 357409.