This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Stop producing broken output for R_386_GOT32X relocation.
ClosedPublic

Authored by grimar on Mar 7 2017, 8:51 AM.

Details

Reviewers
ruiu
rafael
Summary

Previously we silently produced broken output for R_386_GOT32X/R_386_GOT32 relocations if
they were used to compute the address of the symbol’s global offset table entry without
base register when position-independent code is disabled.

Situation happened because of recent ABI changes. Released ABI mentions that
R_386_GOT32X can be calculated in a two different ways (so we did not follow ABI here before this patch)
, but draft ABI also mentions R_386_GOT32 relocation here.
We should use the same calculations for both relocations.

Problem is that we always calculated them as G + A - GOT (offset from end of GOT),
but for case when PIC is disabled, according to i386 ABI calculation should be G + A,
what should produce just an address in GOT finally.

ABI: https://github.com/hjl-tools/x86-psABI/wiki/intel386-psABI-draft.pdf (p36, p60).

Diff Detail

Event Timeline

grimar created this revision.Mar 7 2017, 8:51 AM
grimar updated this revision to Diff 90865.Mar 7 2017, 8:56 AM
  • Fixed wierd symbol in copy pasted comment.
ruiu added inline comments.Mar 7 2017, 9:24 AM
ELF/Target.cpp
369

Is there any way to not add a new parameter to this function? This part of code has grown organically to the point that that's hard to understand. Needs simplifying.

377–388

I don't think you need to repeat the x86 instruction format here. Instead you want to describe it at high level. If I understand correctly, you are checking if the instruction the relocation is pointing to has %ebp as its operand, right?

394–395

Concatenate the strings.

grimar updated this revision to Diff 91161.Mar 9 2017, 4:45 AM
  • Addressed review comments
  • Updated comments and testcase.
ELF/Target.cpp
369

I tried playing with code, but
problem is that this relocation has 2 ways of calculations which depends on was base register used or not.
in this method we return the expression used for calculation. And it depends on specific target instruction.

One way to avoid adding parameter here is to introduce new expression type like
R_GOT_OR_GOTOFFSET for this and postpone instruction checks.

But it does not work good.
At some point (in getRelocTargetVA()) we still need to know how to calculate the final value,
that requires the same target specific logic (scanning ModRM), what does not fit there because getRelocTargetVA() is completely target independent.

Also It does not look that introducing one more relocation expression type really worth that.
It involves additional code changes for places where expressions are checked, though at fact
we can use already existent ones and evaluate the relocation expression type early, like I do.

It seems for me that method is a best place for this logic.

377–388

No, I am checking that effective address is "disp32", and not "[SOME REGISTER] + disp32" (The disp32 nomenclature denotes a 32-bit displacement that follows the ModR/M byte (or the SIB byte if one is present) and that is added to the index.)
see "Table 2-2. 32-Bit Addressing Forms with the ModR/M Byte" (2-6 Vol. 2A).

Examples of baseless instructions:
movl ifunc@GOT, %eax (8b 05 00 00 00 00)
movl ifunc@GOT, %ebx (8b 1d 00 00 00 00)
05 == 00101b, ModRM = 0x5 & 0xc7 = 0x5
1d == 11101b, ModRM = 0x1d & 0xc7 = 0x5
I am checking that ModR/M == 00101 == 0x5, that corresponds to effective address = disp32, so instruction
pointed by relocation has no base register and uses disp32 value as is.
After link it contains got entry address as value (and not an offset), what works only when no-PIC. And for PIC
case instruction has to have base register for keeping GOT address, relocation resolves to GOT offset then.

394–395

Done.

grimar updated this revision to Diff 91164.Mar 9 2017, 5:28 AM
  • Cosmetic changes.
ruiu edited edge metadata.Mar 9 2017, 4:26 PM

We do not rewrite instructions for GOT32X relocations, so I don't think we've implemented the optimization for GOT32X. In https://groups.google.com/forum/#!topic/ia32-abi/GbJJskkid4I, H.J. Lu said that GOT32X can be handled as GOT32 if the linker chose not to optimize the relocation. I'm little confused. What are you trying to resolve with this patch?

In D30699#697084, @ruiu wrote:

We do not rewrite instructions for GOT32X relocations, so I don't think we've implemented the optimization for GOT32X. In https://groups.google.com/forum/#!topic/ia32-abi/GbJJskkid4I, H.J. Lu said that GOT32X can be handled as GOT32 if the linker chose not to optimize the relocation. I'm little confused. What are you trying to resolve with this patch?

Yes, we do not optimize(rewrite) the instructions yet (I am working on a patch for this now, btw), so let me explain.
Optimizing this relocation and evaluating it is orthogonal things. We currently do not evaluate it properly.

There are 2 possible cases when relocation can be generated, example:
movl ifunc@GOT, %eax vs movl ifunc@GOT(%eax), %eax
First one should be evaluated as G + A and so resolves to entry address in GOT (allowed for no-PIC only).
Second is evaluated to offset because address in runtime will be [eax] + offset.
Notice that is not relative to optimizations at all at this moment.

H.J. Lu writes: "I am proposing to add a new relocation, R_386_GOT32X, to i386 psABI.
Instead of generating R_386_GOT32 relocation agasint foo for
foo@GOT(%reg), we generate R_386_GOT32X. R_386_GOT32X relocation can
also be used without the base register for the global offset table,
foo@GOT, when position-independent code is disable. In this case, the
static base address of the global offset table will be used instead.

Linker can treat R_386_GOT32X the same as R_386_GOT32 or it can perform
the transformations listed above. "

Here is a patch by H.J. Lu for gold:
https://sourceware.org/ml/binutils/2015-10/msg00259.html
(See that place below, notice it touches both relaxed and not relaxed cases)
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gold/i386.cc;h=0b447ef0d68f3ea8965091502d5a1302d69d1b32;hb=HEAD#l2916)

At fact both gold and bfd implement R_386_GOT32 and R_386_GOT32X in the same way:

  • G + A - GOT for case when base register present
  • G + A when base register is absent (only allowed for no-PIC).

(that is how R_386_GOT32X must be evaluated according to ABI, see p36).

My patch changes calculation for R_386_GOT32X only for now, because 2 different calculations
are mentioned on p36 of ABI (https://github.com/hjl-tools/x86-psABI/wiki/intel386-psABI-1.1.pdf) for this
relocation and not for R_386_GOT32. Though we might want to follow GNU linkers behavior probably and do that for
R_386_GOT32 too (I would do that separatelly).

ruiu added a comment.Mar 10 2017, 9:09 AM

In short, do you mean that H.J. Lu's statement that GOT32X can be handled as GOT32 is wrong?

grimar added a comment.EditedMar 10 2017, 10:55 PM
In D30699#697854, @ruiu wrote:

In short, do you mean that H.J. Lu's statement that GOT32X can be handled as GOT32 is wrong?

No, in short - I agree with that.
And gnu linkers handle these relocations in the same way. That way is different from what we do now.

To match LLD behavior with GNU linkers we need to fix the calculation according to ABI, we do it wrong now and
have broken output.
Patch fixes handling for R_386_GOT32X case. If we change both R_386_GOT32 and R_386_GOT32X we will end
up with the same result as GNU linkers has.

joerg added a subscriber: joerg.Mar 11 2017, 12:06 AM

Are we talking about statically or dynamically linked programs? It makes a different for the former, but shouldn't for the latter.

Are we talking about statically or dynamically linked programs? It makes a different for the former, but shouldn't for the latter.

Why ? Issue is not relative to static/dynamic linking I believe.
We have "movl foo@GOT, %eax",

lld generates currenly:
8b 05 fc ff ff ff mov 0xfffffffc,%eax, where 0xfffffffc is offset.

but bfd generates:
8b 05 fc 9f 04 08 mov 0x8049ffc,%eax, where 08049ffc is address in .got

That the issue I am trying to fix, it does not seem to be relative with static/dynamic linked programs.
It is just about wrong relocation calculation for case when there is no base register in instructiuon.

"movl foo@GOT, %eax" is restricted for use in PIC code.
Patch adds a error for that: "relocation R_386_GOT32X against 'ifunc' without base register can not be used when PIC enabled".

joerg added a comment.Mar 13 2017, 4:34 AM

Statically linked binaries don't normally have a GOT, but should synthesize an entry in .data for it. Chances are that is exactly the problem you are seeing?

Statically linked binaries don't normally have a GOT, but should synthesize an entry in .data for it. Chances are that is exactly the problem you are seeing?

Case I am fixing in this patch is next:
https://lists.gnu.org/archive/html/bug-binutils/2015-10/msg00332.html
I think we have exactly the same issue as gold had that time.

R_386_GOT32X has 2 ways of calculation according to ABI. G + A or G + A - GOT, depending on was base register used or not.
Both ways of calculation assumes we have GOT entry. Difference is do we write offset from end of GOT or GOT entry address.
If there is no GOT then different relocation should be emited by producer I belive.

ruiu added a comment.Mar 13 2017, 1:13 PM

I think that if we are doing something wrong for the R_386_GOT32 relocation, you want to fix that first, as that is not (directly) related to the GOT32X relocation.

In D30699#699688, @ruiu wrote:

I think that if we are doing something wrong for the R_386_GOT32 relocation, you want to fix that first, as that is not (directly) related to the GOT32X relocation.

I disagree. We probably do not want to do something with R_386_GOT32 at all. For R_386_GOT32X ABI explicitly says that: "The R_386_GOT32X relocation can be used to compute the address of the
symbol’s global offset table entry without base register when position-independent code is disabled. ".
ABI says nothing like this about R_386_GOT32. I assume that gold/bfd implemented them in the same way to support outdated producers.

It is like they optimize R_X86_64_GOTPCREL, R_X86_64_GOTPCRELX, R_X86_64_REX_GOTPCRELX, in the same way, but we do that only for latter 2. We are always trying to follow what ABI says explicitly, them are not.

My patch is about R_386_GOT32X for baseless instruction case, I know a producer of this relocation - it is "GNU as v.2.27" as mentioned in testcase. I do not know modern producer which
will emit R_386_GOT32 for the same place. So my patch is 100% consistent with current ABI and I do not think we should do something different from what ABI says explicitly until there is a solid reason.
For now there is no such reason for R_386_GOT32, but there is for R_386_GOT32X (broken output using GAS 2.27).

ruiu added a comment.Mar 16 2017, 4:22 PM

My apologies, but I'm really confused by your conflicting responses about this patch.

H.J. Lu said that GOT and GOTX are handled as the same relocation if linkers decide to not optimize them. You agreed with that. Therefore, I suggested handling them as the same relocation, because that's at least not wrong. And then you are saying that that is wrong. Your explanation is too detailed, and I don't get the gist of that. Simply put, is H.J. Lu.'s comment wrong?

In D30699#703365, @ruiu wrote:

My apologies, but I'm really confused by your conflicting responses about this patch.

H.J. Lu said that GOT and GOTX are handled as the same relocation if linkers decide to not optimize them. You agreed with that.

I agreed "in short", sorry probably "in short" format was not good format here to explain.

See what he is exactly saying: "R_386_GOT32X relocation can
also be used without the base register for the global offset table,
foo@GOT, when position-independent code is disable.
In this case, the
static base address of the global offset table will be used instead.
Linker can treat R_386_GOT32X the same as R_386_GOT32 or it can perform
the transformations listed above. "

See ? Latter bold part partially conflicts with first part.
He does not say that R_386_GOT32 can be "used without base register ....", he said that only for R_386_GOT32X.
And current ABI does not say that. So in short R_386_GOT32X != R_386_GOT32 even for non-optimized case.

Therefore, I suggested handling them as the same relocation, because that's at least not wrong.

It can't be wrong, otherwize GNU linkers would generate broken output (I did not notice complains about that, so I assume
they don't do that, that works for years), but it seems useless to handle them as the same relocation now (2017 year).
And it is definetely would be a change incompatible with ABI.

And then you are saying that that is wrong. Your explanation is too detailed, and I don't get the gist of that. Simply put, is H.J. Lu.'s comment wrong?

If I should answer basing just on what ABI currently says, then statement "can treat R_386_GOT32X the same as R_386_GOT32" is not correct,
but it just should not be read outside of whole message context.

For baseless code like "movl bar@GOT, %eax", GNU as v.2.27 generates R_386_GOT32X and not R_386_GOT32.
So current behavior of GNU as matches the ABI, though both gold and bfd also handle R_386_GOT32 for case "without the base register".
I assume that was implemented to support some outdated producers (I don't know exactly, but that would explain everything),
but that is incompatible with what ABI explicitly says.

I do not see reasons to just blindly copy behavior of gold/bfd and handle R_386_GOT32 in the same way right now.

ruiu added a comment.Mar 17 2017, 12:49 PM

You are talking based on https://github.com/hjl-tools/x86-psABI/wiki/intel386-psABI-draft.pdf, right? There is a table of relocation types on p.36. Please open that page and see R_386_GOT32 and R_386_GOT32X. The calculation expressions for the two relocations are the same, including the footnote saying that "[a]pplied on memory operand without base register when position-independent code is disabled." So, they are the same, unless you do optional optimization for GOT32X.

In D30699#704135, @ruiu wrote:

You are talking based on https://github.com/hjl-tools/x86-psABI/wiki/intel386-psABI-draft.pdf, right? There is a table of relocation types on p.36. Please open that page and see R_386_GOT32 and R_386_GOT32X. The calculation expressions for the two relocations are the same, including the footnote saying that "[a]pplied on memory operand without base register when position-independent code is disabled." So, they are the same, unless you do optional optimization for GOT32X.

Hmm no, I was talking basing on latest released ABI (The Intel386 psABI version 1.1, https://github.com/hjl-tools/x86-psABI/wiki/intel386-psABI-1.1.pdf). On p.36 calculation expression is different there.
I did not notice that draft before, it clarifies the situation.

But then question is: do you think we should implement draft version of ABI ? It is most probably contains change that will be released I guess, but it still did not yet.

That I think means we still can go with R_386_GOT32X first, because it is already released.

ruiu added a comment.Mar 20 2017, 8:02 AM

So, you were sending contradicting messages. You should have said "no, his comment is wrong" when asked if the H.J. Lu's comment was correct, because that is wrong from the point of view of your documentation. Anyway, we are finally on the same page. Please make a change to R_386_GOT32 first, so that the relocation is handled as described in https://github.com/hjl-tools/x86-psABI/wiki/intel386-psABI-draft.pdf. The R_386_GOT32X optimization is optional, and we don't need to do that at the moment.

I think head gold implementation is broken for what llvm-mc produces.
I reported an issue: https://sourceware.org/bugzilla/show_bug.cgi?id=21285 and suggest to wait for reply before doing anything for R_386_GOT32.

And at the same time, we still can land this patch first because it uses GNU as 2.27 and relies on released ABI and not on draft.

ruiu added a comment.Mar 22 2017, 2:57 PM

Do you mean that https://github.com/hjl-tools/x86-psABI/wiki/intel386-psABI-1.1.pdf and https://github.com/hjl-tools/x86-psABI/wiki/intel386-psABI-draft.pdf are incompatible in such a way that valid code produced by gas 2.27 that worked with the former spec no longer works with the latter spec? I'd be surprised if there's such code because that would mean the new draft spec is going to introduce an incompatible change to the ABI, which shouldn't happen.

In D30699#708077, @ruiu wrote:

Do you mean that https://github.com/hjl-tools/x86-psABI/wiki/intel386-psABI-1.1.pdf and https://github.com/hjl-tools/x86-psABI/wiki/intel386-psABI-draft.pdf are incompatible in such a way that valid code produced by gas 2.27 that worked with the former spec no longer works with the latter spec? I'd be surprised if there's such code because that would mean the new draft spec is going to introduce an incompatible change to the ABI, which shouldn't happen.

No, code produced by GAS is not affected I think. Problem is relative to code produced by llvm-mc.

Released ABI does not say that R_386_GOT32 should be be calculated differently for baseless case. Draft says that.
See what llvm-mc currently produces when -relax-relocations=false:
movl bar@GOT, %eax encoded as a1 00 00 00 00 mov 0x0,%eax (R_386_GOT32)
movl foo@GOT, %ebx encoded as 8b 1d 00 00 00 00 mov 0x0,%ebx (R_386_GOT32)

So uses 0xa1 opcode to encode first operation instead 0x8b opcode + ModR/M byte.
gold HEAD I think calculates relocation wrong for such objects (see bug I posted https://sourceware.org/bugzilla/show_bug.cgi?id=21285).
Because gold assumes that "0x8b" is always used to encode "mov", what true for GAS but not true for llvm-mc.

Both draft and released ABI on page 37 says: "mov name@GOT, %eax must be encoded with opcode 0x8b, not 0xa0, to allow linker optimization.".
But I am not doing any optimizations yet. R_386_GOT32 is not supposed to be optimized according to ABI. So code produced by llvm-mc looks correct for me.

And then it is not clear for me how to check the instructions properly to calculate R_386_GOT32.
I can't assume that Buf[-1] is ModRM byte, it can be either instruction opcode or ModRM at the same time. I would like to
see something like "mov name@GOT, %eax must be encoded with opcode 0x8b, not 0xa0 Period" in ABI, that way implementation would be clear.

ruiu added a comment.Mar 23 2017, 10:37 AM

I spent a few hours to understand the situation. Here are my conclusions:

  1. The expression to compute R_386_GOT32 value in the i386 psABI draft version 1.1 doesn't indeed make sense. It says that you need to use G+A (as opposed to G+A-GOT) when the relocation is "[a]pplied on memory operand without base register when position-independent code is disabled". However, it is not doable because you are allowed to use R_386_GOT32 anywhere and we cannot make an assumption that an R_386_GOT32 follows some certain limited set of instructions. I think this is a bug in the draft spec.
  1. That being said, I believe what we are doing now for R_386_GOT32X is correct, and this patch will break it. When you compute G+A instead of G+A-GOT for an R_386_GOT32X, you always need to rewrite instructions as well. If you don't rewrite instructions, you should always compute G+A-GOT (which is what we are currently doing.)
In D30699#708994, @ruiu wrote:

When you compute G+A instead of G+A-GOT for an R_386_GOT32X, you always need to rewrite instructions as well.

Why ? Optimization in not applicable for some situations, for example for IFUNC.
If I have movl ifunc@GOT, %eax code, there is no base register and formula should be G+A.
No instructions will be overwrited here.
Otherwize result of (G+A-GOT) would be an offset against nothing (since there is no base register). That is what fixed in patch.

I think llvm-mc emits GOT32X relocation only for 0x8b opcode case.
So for case movl ifunc@GOT, %eax it would emit 0xa1 00 00 00 00 and R_386_GOT32, and R_386_GOT32X + 0x8b for
movl ifunc@GOT, %ebx and other registers.

GAS looks always uses 0x8b form for R_386_GOT32X.

That means that if we see R_386_GOT32X relocation, then I belive we *can* assume it used with 0x8b.
That should work with both GAS and llvm-mc I think.

Though in any case, I still hope to see comments on issue I posted to gold bugtracker.
We can postpone this one, but I really believe it is correct.

rafael added inline comments.Mar 31 2017, 7:48 AM
ELF/Target.cpp
379

Why the if? Just use the existing switch.

test/ELF/got32x-i386.s
10

You don't need an ifunc for this, please simplify.

grimar updated this revision to Diff 94034.Apr 4 2017, 3:47 AM
  • Addressed review comments.
ELF/Target.cpp
379

Done.

test/ELF/got32x-i386.s
10

Ok, though below in comments I had explained why I did that (because for futher patches it is convinent to have ifunc here since it will not be affected by relocation optimization). We can go for start with non-ifunc also, not a problem.

rafael accepted this revision.Apr 4 2017, 10:53 AM

LGTM with a nit.

test/ELF/got32-i386.s
5

I would suggest just running llvm-mc for now. I understand that we will change llvm-mc to produce a R_386_GOT32X, but using it here should actually remind us to switch the other binary to a llvm-mc run when we do.

This revision is now accepted and ready to land.Apr 4 2017, 10:53 AM
grimar updated this revision to Diff 94192.Apr 5 2017, 4:28 AM
  • Addressed review comments,
ruiu added inline comments.Apr 5 2017, 1:14 PM
ELF/Target.cpp
394

What is "this" in this sentence?

406

You want to use toString(Type) instead of R_386_GOT32/R_386_GOT32X.

grimar updated this revision to Diff 94325.Apr 6 2017, 12:57 AM
  • Addressed review comments.
ELF/Target.cpp
394

I rewrote this comment, hope it is more readable now.

406

Fixed.

ruiu accepted this revision.Apr 6 2017, 3:20 PM

LGTM

ELF/Target.cpp
392

Remove "a".

393

This comment needs rewritten. I'll do that after you commit this.

grimar edited the summary of this revision. (Show Details)Apr 7 2017, 11:20 PM
grimar added inline comments.Apr 7 2017, 11:24 PM
ELF/Target.cpp
393

Thanks in advance.

grimar closed this revision.Apr 7 2017, 11:29 PM

r299812