This is an archive of the discontinued LLVM Phabricator instance.

Remove unnecessary load via GOT when accessing globals with PIE in x86_64
ClosedPublic

Authored by tmsriram on Apr 12 2016, 4:10 PM.

Details

Summary

PR #24964

LLVM generates the following code with PIE for this example:

hello.cpp

int a = 0;

int main() {

return a;
}

$ clang -O2 -fPIE hello.cc -S
$ cat hello.s

main: # @main
movq a@GOTPCREL(%rip), %rax
movl (%rax), %eax
retq

Creating a GOT entry for global 'a' and storing its address there is unnecessary as 'a' is defined in hello.cpp which will be linked into a position independent executable (fPIE). Hence, the definition of 'a' cannot be overridden and we can remove a load. The efficient access is this:

main: # @main
movl a(%rip), %eax
retq

This patch fixes the problem.

Diff Detail

Repository
rL LLVM

Event Timeline

tmsriram updated this revision to Diff 53485.Apr 12 2016, 4:10 PM
tmsriram retitled this revision from to Remove unnecessary load via GOT when accessing globals with PIE in x86_64.
tmsriram updated this object.
tmsriram added reviewers: rnk, davidxl.
tmsriram added a subscriber: llvm-commits.
rnk edited edge metadata.Apr 12 2016, 4:24 PM

This needs a new LLVM IR lit test. Take what you have in the comments and add -S -emit-llvm to the clang command line, and you'll have the beginning of one. You should probably test stores, loads, and taking the address of such a global. You should probably also exhaustively test the linkage types. The non-ODR linkage types in particular probably can't use this optimization.

lib/Target/X86/X86Subtarget.cpp
91 ↗(On Diff #53485)

You don't want hasExactDefinition, it will be false for linkonce_odr and weak_odr globals, which can benefit from this more efficient access. I think you want something equivalent to this condition:

TM.Options.PositionIndependentExecutable && !GV->isDeclaration() && !GV->mayBeOverridden()
DavidKreitzer added inline comments.
lib/Target/X86/X86Subtarget.cpp
103 ↗(On Diff #53485)

Do you want to fix the 32-bit case too? Under the same isPIEDefinition condition as in the 64-bit case, you can use X86II::MO_GOTOFF which also saves a load from the GOT.

AsafBadouh added inline comments.
lib/Target/X86/X86Subtarget.cpp
91 ↗(On Diff #53485)

I agree with RNK,
I'm not sure if it will handle weak attribute and reference to function.
for example:

long  bar_add; 
int __attribute__((weak)) weak_x = 3; 
int ext_bar(void);  // extern function

int bar()
{
  return (long)&ext_bar; // reading address of extern function should use GOT
}

long foo()
{
  bar_add = (long)&bar;       // reading address of bar should be directly (not by GOT)
  return bar_add
          + weak_x;
}

bar() should be compute directly, (not loaded from GOT like extern ext_bar()
so it should generate code like:

bar():
[...]
        mov     rax, QWORD PTR ext_bar()@GOTPCREL[rip]
[...]
        ret
foo():
[...]
        lea     rax, bar()[rip]
        mov     QWORD PTR bar_add[rip], rax
        mov     eax, DWORD PTR weak_x[rip]
[...]
        ret

I think the condition should be something similar to this :

TM.Options.PositionIndependentExecutable && !(GV->isDeclarationForLinker() && isa<Function>(GV))
tmsriram added inline comments.Apr 18 2016, 12:11 PM
lib/Target/X86/X86Subtarget.cpp
91 ↗(On Diff #53485)

It handles function pointer references fine. FWIW, I tried the same example with my patch and I got the code you desired. Checking explicity for functions is not necessary. I will incorporate rnk@'s changes.

tmsriram updated this object.Apr 18 2016, 4:19 PM
tmsriram edited edge metadata.
tmsriram updated this revision to Diff 54430.Apr 20 2016, 3:02 PM
  • Fixed the check that decides when to not use GOT.
  • Extended the test case for x86-64.
tmsriram updated this revision to Diff 54441.Apr 20 2016, 5:12 PM

Handle global access for 32-bit.

tmsriram updated this revision to Diff 54442.Apr 20 2016, 5:17 PM

Fix emutls-pie.ll

AsafBadouh added inline comments.Apr 21 2016, 1:17 AM
test/CodeGen/X86/global-access-pie.ll
30 ↗(On Diff #54442)

I think that weak global symbols should not be referred with GOT.
gcc5.3.0 for example will generate:

mov b(%rip), %rax
tmsriram added inline comments.Apr 21 2016, 9:36 AM
test/CodeGen/X86/global-access-pie.ll
30 ↗(On Diff #54442)

I agree and I know about it. I am going to work on another patch that uses copy relocations to avoid the GOT for any kind of global access. This what happens in GCC now with PIE. I punted on this as this case would be handled then. Maybe I should not and just fix this too.

tmsriram updated this revision to Diff 54544.Apr 21 2016, 11:01 AM
  • Moved check for Global Definition in PIE to a new function.
  • Fixed weak global symbol access for PIE to not use the GOT.
rnk added inline comments.Apr 21 2016, 11:33 AM
lib/Target/X86/X86Subtarget.cpp
62–63 ↗(On Diff #54544)

This isn't the usual formatting, consider running clang-format

65–67 ↗(On Diff #54544)

After doing some experiments, I think the predicate we want here is really TM.Options.PositionIndependentExecutable && !GV->isDeclarationForLinker().

Basically, once you know you are looking at a symbol definition that will become part of an executable, the only thing that can replace it is another definition within the executable. The glibc loader does *not* appear to replace weak symbols in an executable with strong symbols from a DSO. I mistakenly thought it would work the same way the linker does, where strong symbols beat weak symbols.

tmsriram updated this revision to Diff 54556.Apr 21 2016, 11:54 AM
  • Fix check for global definition.
  • Move the definition of isGlobalDefinedInPIE into X86Subtarget.h
rnk accepted this revision.Apr 21 2016, 12:33 PM
rnk edited edge metadata.

lgtm

We might consider merging the new tests with pie.ll, but that is not essential.

This revision is now accepted and ready to land.Apr 21 2016, 12:33 PM
This revision was automatically updated to reflect the committed changes.