Page MenuHomePhabricator

[ELF] - Implemented optimization for R_X86_64_GOTPCREL relocation.
ClosedPublic

Authored by grimar on Dec 25 2015, 9:16 AM.

Details

Summary

System V Application Binary Interface AMD64 Architecture Processor Supplement Draft Version 0.99.8
(https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-r249.pdf, B.2 "B.2 Optimize GOTPCRELX Relocations")
introduces possible relaxations for R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX.

That patch implements the next relaxation:
mov foo@GOTPCREL(%rip), %reg => lea foo(%rip), %reg
and also opens door for implementing all other ones.

Implementation was suggested by Rafael Ávila de Espíndola with few additions and testcases by myself.

Diff Detail

Repository
rL LLVM

Event Timeline

grimar updated this revision to Diff 43635.Dec 25 2015, 9:16 AM
grimar retitled this revision from to [ELF] - Implemented optimization for R_X86_64_GOTPCREL relocation..
grimar updated this object.
grimar added reviewers: ruiu, rafael.
grimar added subscribers: llvm-commits, grimar.
rafael edited edge metadata.Jan 18 2016, 8:13 AM

Do you also intend to also implement the other optimizations listed in the ABI?

ELF/Target.cpp
664 ↗(On Diff #43635)

canBePreempted already checks this, no?

test/ELF/gotpc-relax-und-dso.s
24 ↗(On Diff #43635)

Use {{.*}} instead of checking the bits.

grimar updated this revision to Diff 45247.Jan 19 2016, 5:54 AM
grimar edited edge metadata.
grimar marked 2 inline comments as done.
  • Addressed review comments
  • Rebased

Do you also intend to also implement the other optimizations listed in the ABI?

I have plan to implement all those which listed in "B.2 Optimize GOTPCRELX Relocations":
call *foo@GOTPCREL(%rip) -> nop call foo
call *foo@GOTPCREL(%rip) -> call foo nop
jmp *foo@GOTPCREL(%rip) -> jmp foo nop
mov foo@GOTPCREL(%rip), %reg -> lea foo(%rip), %reg (this patch implements it)

I have no any plans about any others optimizations.

ELF/Target.cpp
664 ↗(On Diff #43635)

Looks like so, fixed.

test/ELF/gotpc-relax-und-dso.s
24 ↗(On Diff #43635)

I leaved bits specially here. I think for patches that emit binary data it is reasonable to check it.
For example the same instruction can be encoded in a different way. This one patch changes one byte and so I would like to check binary output as well as instructions generated.

One thing to keep in mind is the possibility of overflow (https://sourceware.org/bugzilla/show_bug.cgi?id=18591). No need to handle it now since llvm is also broken (pr 26208). I will try to fix the LLVM, at which point we should probably check for overflows in lld.

ELF/Target.cpp
698 ↗(On Diff #45247)

Update the comment to say that we can do mov -> lea with R_X86_64_GOTPCREL, but we need R_X86_64_GOTPCRELX for the other optimizations

702 ↗(On Diff #45247)

Could canBePreempted take care of the ifunc check?

955 ↗(On Diff #45247)

You are passing -1 in here because the offset is not available, correct?
Is there any point in ever passing the offset if we cannot pass it here?

grimar added inline comments.Jan 19 2016, 8:34 AM
ELF/Target.cpp
698 ↗(On Diff #45247)

Ok, I can do that.
I just supposed that since we are not going to implement any other ones before R_X86_64_GOTPCRELX then such comment is excessive.
Current one just describes what we do, not sure if we want to list everything we will not do :)

702 ↗(On Diff #45247)

I would not do it.
For example we have the next code currently:

if (!CanBePreempted && Body && isGnuIFunc<ELFT>(*Body))
      Reloc = Target->getIRelativeReloc();

Currently we implemented ifunc for static linking case. But afaik gold/bfd supports dynamic case for which real CBP makes sence. And I am not sure we should cover such special type under clear for understanding canBePreempted() name.

955 ↗(On Diff #45247)

I cannot pass it here, but I still do the check for that place inside canOptimizeGotPcRel().
I just assume that there is not possible to have negative buffer overflow from here.
My point was: since we dont have R_X86_64_GOTPCRELX yet its better to do all possible additional checks. After implementation of R_X86_64_GOTPCRELX we probably can remove it (like we dont check overflows for TLS relaxations, assuming inputs are correct).

Also I am not sure will we still perform relaxation for raw R_X86_64_GOTPCREL then or not (I quess better would be to stop support to be consistent with ABI, but if both gold/bfd do that then it is still safe I think).

One thing to keep in mind is the possibility of overflow (https://sourceware.org/bugzilla/show_bug.cgi?id=18591). No need to handle it now since llvm is also broken (pr 26208). I will try to fix the LLVM, at which point we should probably check for overflows in lld.

That is interesting issue, thaks for info.

Could canBePreempted take care of the ifunc check?

I would not do it.
For example we have the next code currently:

if (!CanBePreempted && Body && isGnuIFunc<ELFT>(*Body))
      Reloc = Target->getIRelativeReloc();

Currently we implemented ifunc for static linking case. But afaik gold/bfd supports dynamic case for which real CBP makes sence. And I am not sure we should cover such special type under clear for understanding canBePreempted() name.

Good point. Thanks.

================
Comment at: ELF/Target.cpp:955
@@ -906,1 +954,3 @@

case R_X86_64_GOTPCREL:

+ if (S && canOptimizeGotPcRel(Type, *S, Loc, (uint64_t)-1))
+ optimizeGotPcRel(Loc);


rafael wrote:

You are passing -1 in here because the offset is not available, correct?
Is there any point in ever passing the offset if we cannot pass it here?

I cannot pass it here, but I still do the check for that place inside canOptimizeGotPcRel().
I just assume that there is not possible to have negative buffer overflow from here.
My point was: since we dont have R_X86_64_GOTPCRELX yet its better to do all possible additional checks. After implementation of R_X86_64_GOTPCRELX we probably can remove it (like we dont check overflows for TLS relaxations, assuming inputs are correct).

You would still get a buffer overflow. Say Off is 1. On the first call
you pass 1 and avoid the buffer access. On the second one you pass -1
and still access position -2.

The options I see are

  • Save the result of the call.
  • Propagate Off to RelocateOne
  • Don't check it.

I would go with 3 for now.

Could canBePreempted take care of the ifunc check?

I would not do it.
For example we have the next code currently:

if (!CanBePreempted && Body && isGnuIFunc<ELFT>(*Body))
      Reloc = Target->getIRelativeReloc();

Currently we implemented ifunc for static linking case. But afaik gold/bfd supports dynamic case for which real CBP makes sence. And I am not sure we should cover such special type under clear for understanding canBePreempted() name.

Good point. Thanks.

================
Comment at: ELF/Target.cpp:955
@@ -906,1 +954,3 @@

case R_X86_64_GOTPCREL:

+ if (S && canOptimizeGotPcRel(Type, *S, Loc, (uint64_t)-1))
+ optimizeGotPcRel(Loc);


rafael wrote:

You are passing -1 in here because the offset is not available, correct?
Is there any point in ever passing the offset if we cannot pass it here?

I cannot pass it here, but I still do the check for that place inside canOptimizeGotPcRel().
I just assume that there is not possible to have negative buffer overflow from here.
My point was: since we dont have R_X86_64_GOTPCRELX yet its better to do all possible additional checks. After implementation of R_X86_64_GOTPCRELX we probably can remove it (like we dont check overflows for TLS relaxations, assuming inputs are correct).

You would still get a buffer overflow. Say Off is 1. On the first call
you pass 1 and avoid the buffer access. On the second one you pass -1
and still access position -2.

Yes, I know, as "overflow" I was mean that there would not be access violation in that place.
(It is not possible to get it here I believe because of lot of other data in front of buffer before relocations).
So the worst thing could happen is a possible read of some side data at second check. But at least one correct check would still be performed (first one).

The options I see are

  • Save the result of the call.
  • Propagate Off to RelocateOne
  • Don't check it.

    I would go with 3 for now.

I am agree with that. Now I think that having partial checks is no better than absence of them here.

grimar added a comment.EditedJan 19 2016, 12:20 PM

Lots of tests started to crash if I remove the "Off <= 2" check. I think that was the reason why I added that initially.

25> lld :: ELF/dynamic-reloc-weak.s
25> lld :: ELF/got.s
25> lld :: ELF/gotpc-relax-und-dso.s
25> lld :: ELF/local-got-shared.s
25> lld :: ELF/local-got.s
25> lld :: ELF/relocation.s
25> lld :: ELF/relro.s

For example ELF/local-got.s which has

_start:
	call bar@gotpcrel

So it looks would be better or to leave check as is for now or to add Off argument to RelocateOne.
I think we should check the buffer somehow to have correct solution. So I would add the argument.

Lots of tests started to crash if I remove the "Off <= 2" check. I think that was the reason why I added that initially.

I`ll recheck the reasons of it and how to fix that tomorrow. Probably I am mistaken about that Off argument is needed. Will update the patch then.

ruiu edited edge metadata.Jan 19 2016, 3:47 PM

This is not a comment for this particular patch, but in general, pieces of code for code relaxation are scattered to many places -- you have multiple calls of canXXX and call optimizeXXX at some place. It tend to hard to understand. Each patch doesn't increase complexity that much, but when they accumulate, they look being entangled.

Can we separate code relaxation from relocation application? I think that we can create a function that visits all relocations and rewrite code and possibly relocations to relax code, and call that function before applying relocations. Then when we are applying relocations, we don't need to think about code relaxation at all. Anyway, I'd like to find some way to reduce complexity of relocation application.

In D15779#330631, @ruiu wrote:

This is not a comment for this particular patch, but in general, pieces of code for code relaxation are scattered to many places -- you have multiple calls of canXXX and call optimizeXXX at some place. It tend to hard to understand. Each patch doesn't increase complexity that much, but when they accumulate, they look being entangled.

Can we separate code relaxation from relocation application? I think that we can create a function that visits all relocations and rewrite code and possibly relocations to relax code, and call that function before applying relocations. Then when we are applying relocations, we don't need to think about code relaxation at all. Anyway, I'd like to find some way to reduce complexity of relocation application.

I need to think about how possible to do that better. In general that looks for me like additional pass through all relocations and marking them or excluding from futher pass. Idea itself looks attractive for me, but I afraid of possible code duplication that might happen in that case.
Also do you mean you want to see that change before landing any other relaxation patch or its fine to make it right after current pending ones (there are two of them from my side now at reviews: http://reviews.llvm.org/D16201, http://reviews.llvm.org/D15779) ?

grimar updated this revision to Diff 45368.Jan 20 2016, 3:53 AM
grimar edited edge metadata.
grimar marked 4 inline comments as done.
  • Addressed review comments.
  • Removed Off argument.
  • Added error message for case when relocation overflow happens during relaxation.
grimar added a comment.EditedJan 20 2016, 4:02 AM

One thing to keep in mind is the possibility of overflow (https://sourceware.org/bugzilla/show_bug.cgi?id=18591). No need to handle it now since llvm is also broken (pr 26208). I will try to fix the LLVM, at which point we should probably check for overflows in lld.

Regarding this I added error message to relocateOne() and that have 2 problems:

  1. I just unsure how to find out the overflow and not convert mov->lea before we already applying relocations in relocateOne(). I need to find out the distance between symbol and instructions really early to use canOptimizeGotPcRel() method for avoiding create of GOT entries and so on.
  2. I did not include the test for it, because it creates 4gb file, I am not sure that is good for test. But I used sample from https://sourceware.org/bugzilla/show_bug.cgi?id=18591 to check that error is displaying.
rafael added inline comments.Jan 20 2016, 6:37 AM
ELF/Target.cpp
782 ↗(On Diff #45368)

This highlights Rui's comment that we are spreading optimizations out too much. A relocation being relative or not should really not depend on it being optimized. If it is optimized, some other relocation takes its place.

Can you just check for optimization at the caller and pass the new relocation value?

Or, we should try not computing the size upfront:

It might be possible to avoid this by outputting the file with write:
* Write the allocated output sections, computing addresses.
* Apply relocations, recording which ones require a dynamic reloc.
* Write the dynamic relocations.
// * Write the rest of the file.

emaste added a subscriber: emaste.Feb 11 2016, 2:20 PM

X86_64TargetInfo::canOptimizeGotPcRel() needs to have check for "_DYNAMIC" symbol either (since we almost have it, http://reviews.llvm.org/D17607) + testcase.
Both gold/bfd does not apply such optimization for it.

grimar updated this revision to Diff 50918.EditedMar 17 2016, 3:13 AM

That patch was old, so I:

  • Rebased it
  • Added _DYNAMIC symbol ignore.
  • Reformated the testcases.
grimar updated this revision to Diff 53862.Apr 15 2016, 2:52 AM
  • Reimplemented to match updated relocations handling code.
grimar updated this revision to Diff 57580.May 18 2016, 2:18 AM

Rebased to fit with the latest llvm and lld code.

Note: this version does not check the instuction op code and does not relax with common gotpcrel relocation.
I think it is save and correct now just to handle R_X86_64_REX_GOTPCRELX one for what this patch do.
This approach simplifies the code a little and should be fully compatible with ABI now.

rafael added inline comments.May 18 2016, 12:20 PM
ELF/Target.cpp
513 ↗(On Diff #57580)

Why is _DYNAMIC special?

The other checks are not architecture specific and should be in architecture independent code.

537 ↗(On Diff #57580)

I would probably have this return R_GOT_PC and then have a generic logic for "can we optimize this got use". This would be somewhat similar to the fact that we optimize plt acesses to local symbols.

grimar updated this revision to Diff 57761.May 19 2016, 3:53 AM
  • Reimplemented in more generic way (addressed review comments).
ELF/Target.cpp
513 ↗(On Diff #57580)

That is something directly mentioned in docs I think, but initially in binutils there also were no exception for _DYNAMIC,
and they had to do that finally because it turns out ld.so uses it:
https://sourceware.org/ml/binutils/2012-09/msg00000.html

537 ↗(On Diff #57580)

Done, reimplemented in more generic way.

grimar added inline comments.May 19 2016, 3:55 AM
ELF/Target.cpp
529 ↗(On Diff #57761)

Something *not* mentioned, I mean.

I am reworking this, it was not correct to remove opcode check (I was for some reason thinking that R_X86_64_REX_GOTPCRELX is generated specially for mov->lea relaxation),
so I plan to update this soon.

grimar updated this revision to Diff 58087.May 23 2016, 6:18 AM
  • Restored removed earlier by mistake a check of opcode for mov->lea conversion.
  • Updated code and testcases to produce/relax R_X86_64_GOTPCRELX.
rafael added inline comments.May 23 2016, 3:47 PM
ELF/InputSection.h
67 ↗(On Diff #58087)

It is a bit surprising to see this here. If the got access is relaxed, it result will not refer to a got entry, no?

ELF/Target.cpp
244 ↗(On Diff #58087)

I am pretty sure this reference to "_DYNAMIC" is bogus. The email you posted refers to got[0], but we write that value directly. This code will never be used for that.

758 ↗(On Diff #58087)

Every target would have to call this, no? Why not call it from target independent code?

762 ↗(On Diff #58087)

It is not clear why you need this. relocateOne has a value check. Why you need one here?

rafael added inline comments.May 23 2016, 4:36 PM
ELF/Writer.cpp
535 ↗(On Diff #58087)

I don't think you can do an early return here. You should set Expr and let the function continue so we get the checks.

I decided to apply the patch to give it a better review.

Some of the issues I pointed out in the review (R_RELAX_GOT referring
to got entry) were the causes of other changes (unnecessary continue).

Also, the *GOTPCRELX relocations can always be relaxed if the symbol
is not ifunc or preemptable, no? It is just that it is not always a
mov->lea.

In any case, I have attached the result of simplifying your patch a
bit. Let me know what you think.

Cheers,
Rafael

I decided to apply the patch to give it a better review.

Some of the issues I pointed out in the review (R_RELAX_GOT referring
to got entry) were the causes of other changes (unnecessary continue).

Thanks !

Also, the *GOTPCRELX relocations can always be relaxed if the symbol
is not ifunc or preemptable, no? It is just that it is not always a
mov->lea.

Sure, but this patch supports only mov->lea conversion. So for our case
I decided to check that instruction is exactly mov, because of next:

In any case, I have attached the result of simplifying your patch a
bit. Let me know what you think.

I think it looks better than mine, but it have an issue:
Imagine you have next code to relax:

.text
.globl foo
.type foo, @function
foo:
 nop

.text
.globl _start
 call	*foo@GOTPCREL(%rip)

Since your patch does not check the mov opcode before relaxing, it "relax" that to lea instead
of "nop call foo" or "call foo nop". That is because relaxGot() does not yet know about how to relax the other
instructions yet.

Cheers,
Rafael

If you dont mind, I`ll use your code as a base, will fix the issue above, add the testcase for it and update.
(I also plan to add other relaxations, but I wonder will it be better to do that in this patch or separatelly,
I`ll take a look how to do that cleaner, probably implementing all of them at once can help to simplify
the patch, probably not).

ELF/Target.cpp
762 ↗(On Diff #58087)

It would write something like "relocation R_X86_64_PC32 is out of range". That does not imo provide enough
info here about what really happened.
Ideally we just should not do that relaxation if we know that overflow happens.

I decided to apply the patch to give it a better review.

Some of the issues I pointed out in the review (R_RELAX_GOT referring
to got entry) were the causes of other changes (unnecessary continue).

Also, the *GOTPCRELX relocations can always be relaxed if the symbol
is not ifunc or preemptable, no? It is just that it is not always a
mov->lea.

In any case, I have attached the result of simplifying your patch a
bit. Let me know what you think.

Cheers,
Rafael

Rafael, returning to this, I have few thoughts now.

Latest x64 ABI describes 3 possible groups of relaxations atm (https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI):

  1. Convert call and jump
  2. Convert mov->lea
  3. Convert Test and Binop Convert memory operand of test and binop into

immediate operand, where binop is one of adc, add, and, cmp, or,
sbb, sub, xor instructions, when position-independent code is disabled.

If there was no 3, we probably would be able to leave R_RELAX_GOT_PC expression assigning
logic as is and just teach relaxGot() to relax all of them at once. If do that at once, that would help to avoid
the op code checks anywhere except relaxGot() itself.

But 3 says that depending on PIC for a specific set of instruction relaxation is possible or not.

That probably does not leave us a chance for above and solution I see is introduce Target method
canGotBeRelaxed() or something, which will check the instructions opcodes, like:

bool canGotBeRelaxed() {
if (call or jump or move opcodes) 
  return true;
if (Test or given binop opcodes)
  return !PIC:
return false; //I guess it is possible to have instructions that can not be relaxed even if RELX relaxation is present ?
}

and call it from adjustExpr(). What do you think ?

Rafael, returning to this, I have few thoughts now.

Latest x64 ABI describes 3 possible groups of relaxations atm (https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI):

  1. Convert call and jump
  2. Convert mov->lea
  3. Convert Test and Binop Convert memory operand of test and binop into

    immediate operand, where binop is one of adc, add, and, cmp, or, sbb, sub, xor instructions, when position-independent code is disabled.

    If there was no 3, we probably would be able to leave R_RELAX_GOT_PC expression assigning logic as is and just teach relaxGot() to relax all of them at once. If do that at once, that would help to avoid the op code checks anywhere except relaxGot() itself.

    But 3 says that depending on PIC for a specific set of instruction relaxation is possible or not.

    That probably does not leave us a chance for above and solution I see is introduce Target method canGotBeRelaxed() or something, which will check the instructions opcodes, like:

    bool canGotBeRelaxed() { if (call or jump or move opcodes) return true; if (Test or given binop opcodes) return !PIC: return false; //I guess it is possible to have instructions that can not be relaxed even if RELX relaxation is present ? }

    and call it from adjustExpr(). What do you think ?

Yes, I missed that case 3 is non pic only :-(

Given that we don't want to complicate the interface of getRelExpr, I
think that, relative to my patch, what is needed is

  • Delete R_RELAXABLE_GOT_PC, getRelExpr returns just R_GOT_PC.
  • In Writer.cpp, the code

    if (Expr == R_RELAXABLE_GOT_PC)

becomes

if (Expr == R_GOT_PC && Target->canRelaxGot(....))

Given that we will need the predicate, I would leave this patch doing
just mov -> lea for now.

Cheers,
Rafael

Given that we will need the predicate, I would leave this patch doing
just mov -> lea for now.

Cheers,
Rafael

Was this about this patch only or it means I should suspend working on
other relaxations for following patches either ?

George.

Given that we will need the predicate, I would leave this patch doing
just mov -> lea for now.

Cheers,
Rafael

Ah, I misread. Please ignore my previous comment.

grimar updated this revision to Diff 58398.May 25 2016, 3:31 AM
grimar updated this object.
  • Updated code according to Rafael's directions.
  • Updated testcase to check few other instructions (that them are not converted to lea).
grimar updated this revision to Diff 58404.May 25 2016, 4:48 AM
  • Rebased to top
rafael accepted this revision.May 25 2016, 7:10 AM
rafael edited edge metadata.

LGTM with nits.

Thank you so much for caring this over some many redesigns. I am very happy to see that this fits nicely in the existing infrastructure.

ELF/Relocations.cpp
345 ↗(On Diff #58404)

Don't make uintX_t a template parameter. You can just use ELFT::uint

490 ↗(On Diff #58404)

Use the existing variable Buf.

ELF/Target.cpp
531 ↗(On Diff #58404)

Now this is just moving the label, please leave it in the original location.

744 ↗(On Diff #58404)

Converting

This revision is now accepted and ready to land.May 25 2016, 7:10 AM

LGTM wit nits. See the webpage for the nits, phab is not sending
emails again :-(

grimar marked 4 inline comments as done.May 25 2016, 7:33 AM

LGTM with nits.

Thank you so much for caring this over some many redesigns. I am very happy to see that this fits nicely in the existing infrastructure.

Thank you for all that reviews and final code suggestion. Btw today is exactly 5 month from first time this was posted :)

This revision was automatically updated to reflect the committed changes.