This is an archive of the discontinued LLVM Phabricator instance.

[LLD][ELF] Ifunc implementation using synthetic sections
ClosedPublic

Authored by peter.smith on Dec 5 2016, 4:03 AM.

Details

Summary

This change implements ifunc support using additional synthetic sections for the parts of the plt, got and relocations that ifunc uses. This has involved more changes than I would have liked so I think it is worth getting some feedback on the design before going much further.

The background is that there are several problems with the Ifunc implementation
in lld that I think we can use SyntheticSections to fix. These problems are:

  • The IRELATIVE relocations must be processed last by the dynamic loader as

the IFunc resolver may call other functions with PLT entries (https://sourceware.org/bugzilla/show_bug.cgi?id=13302)

  • lld does not generate IRELATIVE relocations in rel.dyn with --pie [*] when the address of an IFunc resolver function is taken. These must also be processed last by the dynamic loader.
  • ARM (and Power) places all R_ARM_IRELATIVE relocations in rel.dyn and ld.so won't accept R_ARM_RELATIVE in .rel.plt.

The patch here addresses the first and the third but not the second, although I believe it could be extended to do so. The existing tests have been updated because they only need the .plt for Ifuncs so no PLT header is needed. At least one more test case for each Target is needed for the case when the .plt does need a header and uses Ifunc.

I've not done any testing with linker scripts yet, as the new synthesised sections are assigned to the same output section as the .plt, .got, .rela.dyn and .rela.plt I'm not expecting any problems.

Review Questions:

  • Is this the right general approach? I've outlined some alternatives below.
  • Do we want to support taking the address of ifunc resolvers, an alternative is to give a not supported error message.

The outline of the design is to use SyntheticSections for the Ifunc related parts of the plt, plt.got (.got for ARM) and rela.plt (rela.dyn for ARM). In psuedo
linker script notation using the synthetic section names we have for AArch64, x86_32 and x86_64:

`.plt : { PltSection IpltSection }
.got.plt : { GotPltSection IgotPltSection }
.rel.plt : { RelaPlt RelaIPlt }
.rel.dyn : { RelaDyn }`

For ARM we have:
`.plt : { PltSection IpltSection }
.got : { GotSection IgotPltSection }
.rel.plt : { RelaPlt }
.rel.dyn : { RelaDyn RelaIPlt }`

Implications of the design:

  • When looking up the Plt, Got or Relocation we need to know whether the Symbol is in the standard or Ifunc section and redirect appropriately.
  • We cannot assume that there is one InputSection for the PLT, GOT and Relocations.
  • We don't need a header in the Iplt section so for static linking when there is only the IpltSection we omit the Plt header completely. My understanding is that there isn't much use of Ifunc outside of glibc and it could be a reasonable position to say we support ifunc to statically link with glibc and for straight forward user-defined cases. The gcc bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70082 has some useful information.

Some thoughts and questions:
Do we need a new Synthetic Section for the IPltSection?
The Plt section always accesses the PltGot and is relocated by the RelaPlt. An alternative would be to have addEntry() and addIfuncEntry() to merge the logic in IPlt to the Plt. I'm thinking that this might be less code overall but that code is likely to be messier.

Do we need a new Synthetic IgotPlt section?
For AArch64 and x86_64 the IgotPlt should write each entry using Target->writeGotPlt() for ARM we need to write each entry using getGotVA() if we were to use on of the existing gotPlt or Got sections we'd need to add the logic in there.

Do we need the separate RelaIPlt relocation sections?
The separate sections solve the ordering problem and avoids some of the if statements needed for ARM.

Do we need to alter the behaviour of getPltVA() and getGotPltVA()?
An alternative is to add getIpltVA() and getIGotPltVA() but then the calling code needs to know which function to call.

  • Example when compiled with gcc -fpie --pie will produce an R_X86_64_IRELATIVE relocation in both .rel.dyn and .rel.plt. LLD only produces the R_X86_64_IRELATIVE relocation for .rel.plt and as a result misses the call through the function pointer.
#include <stdio.h>
void alt2(void) { printf("alt2\n"); }

static void (*resolve (void)) (void)
{
  return alt2;
}

void fct(void) __attribute__ ((ifunc ("resolve")));

void (*g)(void)  = fct;
int
main ()
{
  g ();
  fct();
  return 0;
}

Diff Detail

Repository
rL LLVM

Event Timeline

peter.smith updated this revision to Diff 80250.Dec 5 2016, 4:03 AM
peter.smith retitled this revision from to [LLD][ELF] Ifunc implementation using synthetic sections.
peter.smith updated this object.
peter.smith added a subscriber: llvm-commits.
evgeny777 edited edge metadata.Dec 5 2016, 6:23 AM

It looks like IpltSection and PltSection share large fraction of common code.
I wonder if it is possible to virtualize addEntry and addReloc, creating ARMRelaDynSection, ARMPltSection and ARMGotSection.
In the latter case you don't need IsInIgot and IsInIplt flags as well.

Thank you for the comments.

If I've understood you correctly, I would keep

if (Body.isGnuIFunc() && !Preemptible)
  Rel = Target->IRelativeRel;
else
  Rel = Target->PltRel;

In<ELFT>::GotPlt->addEntry(Body);
In<ELFT>::RelaPlt->addReloc({Rel, In<ELFT>::GotPlt,
                             Body.getGotPltOffset<ELFT>(), !Preemptible,
                             &Body, 0});

The overloads in ARMGotPlt and ARMRelaPlt would redirect (possibly to ARMGot and ARMRelaDyn) if Body.isGNUIFunc(). If that is the case I think that could help solve the difference with ARM and the other Targets, without adding too many explicit if Target is ARM.

I don't think it would help with the ordering of relocations in RelaPlt (RelaDyn for ARM) as this affects all Targets. The IRelative relocations need to be last in the table so that the dynamic loader can initialise the entries for functions that the ifunc resolver might call. It could be possible to sort the relocations, but just doing that would invalidate the RelocOffsets that the x86 and x86_64 uses.

I think that there is some scope for using inheritance, for example the IPltSection could be a specialisation of PltSection.

Apologies if I've misunderstood.

I don't think it would help with the ordering of relocations in RelaPlt (RelaDyn for ARM) as this affects all Targets

But you can add some code to compRelocations, no ?

The PltSection stores the offset in the RelaPlt of each relocation when adding the entry. These are used when writing the x86 and x86_64 writePlt(). If we were to sort the PLT relocations later we'd have to fix up these offsets after sorting (or only calculate the offsets after sorting). This is certainly possible but I'm not sure it is the best solution.

One of the attractions of adding an Iplt section is that there is no need to sort or fix up the offsets stored in the PLT.

ruiu edited edge metadata.Dec 5 2016, 1:28 PM

Overall, I think this is towards a right direction. Synthetic sections as input sections allows us do this kind of stuff, and it seems to be working well to separate ARM-specific things from other architectures nicely. Is this patch complete? I have a few minor comments.

My hope is that the implementation is complete for direct calls to ifunc resolvers. I definitely need a few more test cases, which I hope to get done today.

The area this patch doesn't cover is adding IRelative relocations for function pointers to ifunc resolvers when position independent executables are used. This isn't supported by the existing implementation at the moment so I kept it out of the patch to make it easier to review.

I've been doing some more testing today, I can't get a lld built x86_32 ifunc to work on my machine before or after applying the patch. When I try to run the program I get "Inconsistency detected by ld.so: dl-runtime.c: 87: _dl_fixup: Assertion `((reloc->r_info) & 0xff) == 7' failed!". There aren't any obvious differences from the ld.bfd generated object.

I'll try to get to the bottom of why this is failing. It would be good to know if anyone else has had any success with ifunc on lld for x86_32? It is possible that the multiarch glibc on my machine is too old.

For what it is worth my test program is:

int val = 1;
static void func1_x(void)
{
    val = 0;
}
void* func1_ifunc();
void func1(void) __attribute__ ((ifunc("func1_ifunc")));
void* func1_ifunc()
{
    return &func1_x;
}
int main(void)
{
    func1();
    return val;
}

gcc -m32 t.c -o t.exe # with ld.lld as the linker
./t.exe
Inconsistency detected by ld.so: dl-runtime.c: 87: _dl_fixup: Assertion `((reloc->r_info) & 0xff) == 7' failed!
The assertion is stating that it was expecting a R_386_JUMP_SLOT relocation.

My current thinking is that the x86_32 ifunc support is not working at the moment.
When I look at the disassembly and the .plt.got that both ld.bfd and lld generate, I see that in ld.bfd the .plt.got entry for the ifunc .plt entry contains the address of the ifunc resolver, whereas in lld the equivalent .plt.got entry points back to the lazy function resolver. I think that this is because on x86_64 the address of the ifunc resolver is in the relocation addend, whereas in x86_32 (which uses rel) it has to put the address directly into the .plt.got.

ld.bfd (*804a008 == 0804847b == func1_ifunc)

 08048370 <*ABS*@plt>:
 8048370:	ff 25 08 a0 04 08    	jmp    *0x804a008
 8048376:	68 10 00 00 00       	push   $0x10
 804837b:	e9 c0 ff ff ff       	jmp    8048340 <_init+0x24>

Contents of section .got.plt:
 8049ff4 089f0408 00000000 00000000 56830408  ............V...
 804a004 66830408 7b840408                    f...{...       

0804847b <func1_ifunc>:

lld (*1301c == 11216 == *ABS*@plt + 6)

000111f0 <__libc_start_main@plt-0x10>:
   111f0:	ff 35 10 30 01 00    	pushl  0x13010
   111f6:	ff 25 14 30 01 00    	jmp    *0x13014
   111fc:	90                   	nop
   111fd:	90                   	nop
   111fe:	90                   	nop
   111ff:	90                   	nop

00011210 <*ABS*@plt>:
   11210:	ff 25 1c 30 01 00    	jmp    *0x1301c
   11216:	68 08 00 00 00       	push   $0x8
   1121b:	e9 d0 ff ff ff       	jmp    111f0 <_fini+0x24>

Contents of section .got.plt:
 1300c 0c200100 00000000 00000000 06120100  . ..............
 1301c 16120100
ruiu added a comment.Dec 6 2016, 10:40 AM

I think I like this patch. Some minor comments.

ELF/Symbols.h
120 ↗(On Diff #80250)

nit: add .

121 ↗(On Diff #80250)

I wonder if they need to be IPlt and IGot instead of Iplt or Igot.

ELF/SyntheticSections.cpp
726 ↗(On Diff #80250)

nit: remove extraneous ().

838 ↗(On Diff #80250)
ELF/Writer.cpp
352 ↗(On Diff #80250)

Please add a blank line before the comment.

355–356 ↗(On Diff #80250)

This is a bit tricky. How about this?

(Config->EMachine == EM_ARM) ? ".rel.dyn" : In<ELFT>::RelaPlt->Name
ruiu added a comment.Dec 6 2016, 10:50 AM

As to 32-bit x86, it is likely that it's just broken. I personally don't create 32-bit executables, so I don't know the current status that much. If it's broken, it's of course nice to fix, but that's out of scope of this patch.

peter.smith updated this revision to Diff 80597.Dec 7 2016, 8:05 AM
peter.smith edited edge metadata.

I've updated the implementation with suggested changes. I've added some tests for AArch64, ARM and x86_64 to cover the ordering of the IRELATIVE relocations, and a check for IRELATIVE to non-preemptible symbols in a shared library.

The x86_32 problem is simple to fix as it needs to do the same thing as ARM and write the getVA() of the ifunc resolver instead of getPltVA(). This can come in a separate patch.

ruiu accepted this revision.Dec 7 2016, 11:23 AM
ruiu edited edge metadata.

LGTM. It seems that the abstraction of synthetic input section is working well here. I'm glad to see that.

This revision is now accepted and ready to land.Dec 7 2016, 11:23 AM
This revision was automatically updated to reflect the committed changes.