This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)
ClosedPublic

Authored by hfinkel on Nov 29 2016, 3:18 PM.

Details

Summary

This patch aims to unify and correct our logic for when we need to allow for the possibility of the linker adding a TOC restoration instruction after a call. This comes up in two contexts:

  1. When determining tail-call eligibility. If we make a tail call (i.e. directly branch to a function) then there is no place for the linker to add a TOC restoration.
  2. When determining when we need to add a cop instruction after a call. Likewise, if there is a possibility that the linker might need to add a TOC restoration after a call, then we need to put a nop after the call (the bl instruction).

First problem: We were using similar, but different, logic to decide (1) and (2). This is just wrong. Both the resideInSameModule function (used when determining tail-call eligibility) and the isLocalCall function (used when deciding if the post-call nop is needed) were supposed to be determining the same underlying fact (i.e. might a TOC restoration be needed after the call). The same logic should be used in both places.

Second problem: The logic in both places was wrong. We only know that two functions will share the same TOC when both functions come from the same section of the same object. Otherwise the linker might cause the functions to use different TOC base addresses (unless the multi-TOC linker option is disabled, in which case only shared-library boundaries are relevant). There are a number of factors that can cause functions to be placed in different sections or some from different objects (-ffunction-sections, explicitly-specified section names, COMDAT, weak linkage, etc.). All of these need to be checked. The existing logic only checked properties of the callee, but the properties of the caller must also be checked (for example, calling from a function in a COMDAT section means calling between sections).

There was a conceptual error in the resideInSameModule function in that it allowed tail calls to functions with weak linkage and protected/hidden visibility. While protected/hidden visibility does prevent the function implementation from being replaced at runtime (via interposition), it does not prevent the linker from using an alternate implementation at link time (i.e. using some strong definition to replace the provided weak one during linking). If this happens, then we're still potentially looking at a required TOC restoration upon return.

Otherwise, in general, the post-call nop is needed wherever ELF interposition needs to be supported. We don't currently support ELF interposition at the IR level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html for more information), and don't think we should try to make it appear to work in the backend in spite of that fact. This will yield subtle bugs if interposition is attempted. As a result, regardless of whether we're in PIC mode, we don't assume that we need to add the nop to support the possibility of ELF interposition. However, the necessary check is in place (i.e. calling GV->isInterposable and TM.shouldAssumeDSOLocal) so when we have functions for which interposition is allowed at the IR level, we'll add the nop as necessary. In the mean time, we'll generate more tail calls and fewer nops when compiling position-independent code.

Diff Detail

Event Timeline

hfinkel updated this revision to Diff 79652.Nov 29 2016, 3:18 PM
hfinkel retitled this revision from to [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility).
hfinkel updated this object.
hfinkel added reviewers: joerg, echristo, kbarton.
hfinkel added a subscriber: llvm-commits.
kbarton accepted this revision.Dec 1 2016, 9:33 AM
kbarton edited edge metadata.

Aside from some minor comments, this LGTM.

lib/Target/PowerPC/PPCISelLowering.cpp
4128

Probably should update this comment, since we're now checking for a section, not a module.
Also, I'm not sure if the link below is still relevant

test/CodeGen/PowerPC/ppc64-blnop.ll
4

Do we also want to test -relocation-model=pic with -mtriple=powerpc64le-unknown-linux-gnu or will the code path be the same for powerpc64-unknown-linux-gnu?

5

Same question here, but for function-sections

This revision is now accepted and ready to land.Dec 1 2016, 9:33 AM
This revision was automatically updated to reflect the committed changes.

I found what was causing our tests to fail.

llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
4017 ↗(On Diff #81349)

This line has to remain for the code to be correct, but it's overly broad. The correct line is:

if (TM.getRelocationModel() == Reloc::PIC_ &&

  GV->hasDefaultVisibility() &&
  !GV->hasLocalLinkage())
return false;

See PPCSubtarget::classifyGlobalReference in PPCSubtarget.cpp

under PIC, any non-local global is interposable, and so the linker will insert a trampoline for those calls.

I'll verify that this fixes the broken test, but it fixes the testcase I extracted.

I should clarify the problem.

LLVM / Clang may not support ELF interposition. But the linker just doesn't care. If we emit a symbol that is ExternallyAvailable and not local linkage, the linker will emit a trampoline that saves r2, even for a tail call.

This is a problem for a tail call as before the jump, it removes its own stack frame (or it lived in the red zone in the case I debugged), so the r2 being saved by the trampoline is not correct.

a -- calls -> b -- tail calls -> c

if a is in a different module from b, then a's saved copy of r2 will be clobbered by the b->c trampoline that the linker generates.

I should clarify the problem.

LLVM / Clang may not support ELF interposition. But the linker just doesn't care. If we emit a symbol that is ExternallyAvailable and not local linkage, the linker will emit a trampoline that saves r2, even for a tail call.

This is a problem for a tail call as before the jump, it removes its own stack frame (or it lived in the red zone in the case I debugged), so the r2 being saved by the trampoline is not correct.

a -- calls -> b -- tail calls -> c

if a is in a different module from b, then a's saved copy of r2 will be clobbered by the b->c trampoline that the linker generates.

Thanks for tracking this down! I'll update the patch accordingly.

I should clarify the problem.

LLVM / Clang may not support ELF interposition. But the linker just doesn't care. If we emit a symbol that is ExternallyAvailable and not local linkage, the linker will emit a trampoline that saves r2, even for a tail call.

This is a problem for a tail call as before the jump, it removes its own stack frame (or it lived in the red zone in the case I debugged), so the r2 being saved by the trampoline is not correct.

a -- calls -> b -- tail calls -> c

if a is in a different module from b, then a's saved copy of r2 will be clobbered by the b->c trampoline that the linker generates.

Thanks for tracking this down! I'll update the patch accordingly.

I've reapplied this in r291003 with a comment based on your explanation.