This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Don't special case symbolic relocations with 0 addend to ifunc in writable locations
ClosedPublic

Authored by MaskRay on Aug 8 2019, 11:41 PM.

Details

Summary

Currently the following 3 relocation types do not trigger the creation
of a canonical PLT (which changes STT_GNU_IFUNC to STT_FUNC and
redirects all references):

  1. GOT-generating (needsGot)
  2. PLT-generating (needsPlt)
  3. R_ABS with 0 addend in a writable location. This is used for for ifunc function pointers in writable sections such as .data and .toc.

This patch deletes case 3) to simplify the R_*_IRELATIVE generating
logic added in D57371. Other advantages:

  • It is guaranteed no more than 1 R_*_IRELATIVE is created for an ifunc.
  • PPC64: no need to special case ifunc in toc-indirect to toc-relative relaxation. See D65755

Diff Detail

Repository
rL LLVM

Event Timeline

MaskRay created this revision.Aug 8 2019, 11:41 PM
Herald added a project: Restricted Project. · View Herald Transcript
MaskRay updated this revision to Diff 214312.Aug 8 2019, 11:46 PM

Improve aarch64-gnu-ifunc-nonpreemptable2.s to test .data as well

MaskRay updated this revision to Diff 214314.Aug 8 2019, 11:55 PM

Fix a comment

ruiu added a comment.Aug 8 2019, 11:57 PM

Currently the following 3 relocation types do not trigger the creation of a canonical PLT:

  • GOT-generating
  • PLT-generating
  • symbolic relocation with 0 addend in a writable location

Is it a typo that a PLT-generating relocation do *not* trigger the creation a canonical PLT, or am I missing something?

Currently the following 3 relocation types do not trigger the creation of a canonical PLT:

  • GOT-generating
  • PLT-generating
  • symbolic relocation with 0 addend in a writable location

Is it a typo that a PLT-generating relocation do *not* trigger the creation a canonical PLT, or am I missing something?

It isn't a typo. The creation of a canonical PLT refers to this piece of code:

} else if (!needsPlt(expr)) {
  // Make the ifunc's PLT entry canonical by changing the value of its
  // symbol to redirect all references to point to it.
  unsigned entryOffset = sym.pltIndex * target->pltEntrySize;
  if (config->zRetpolineplt)
    entryOffset += target->pltHeaderSize;

  auto &d = cast<Defined>(sym);
  d.section = in.iplt;
  d.value = entryOffset;
  d.size = 0;
  // It's important to set the symbol type here so that dynamic loaders
  // don't try to call the PLT as if it were an ifunc resolver.
  d.type = STT_FUNC;

It changes the type from STT_GNU_IFUNC to STT_FUNC.

If a non-preemptable ifunc just has PLT-generating and GOT-generating relocations. The type doesn't need a change.

MaskRay updated this revision to Diff 214325.Aug 9 2019, 12:40 AM
MaskRay edited the summary of this revision. (Show Details)
MaskRay removed a subscriber: wuzish.

Fix comments and clarify "canonical PLT"

If I've understood correctly, this will make the case "address of an ifunc is taken from RW but there are no other relocations that would create a canonical PLT entry" worse as we would always create the canonical PLT entry even if it isn't strictly needed. The trade off is simpler code and a possible size saving from only needing one irelative relocation?

I think that taking the address of Ifunc's is rare in most user code, so even if we make this case worse it isn't likely to make a lot of difference, however it would be good to hear from PCC as he may have a special use case using lots more ifuncs than usual. I think FreeBSD also make heavy use of ifuncs so they may have some concerns.

Personally I'd like to avoid canonical PLT entries as much as possible as if we can show that the PLT addresses don't leak, the BTI PLT entries can be optimised to remove the BTI as it is only needed if the PLT entry is called indirectly. I recognise that this is only of theoretical interest at the moment though so I'm happy to go with the consensus.

ELF/Relocations.cpp
1095 ↗(On Diff #214325)

I think that this is only used by code that you have deleted. Could be removed as well?

test/ELF/aarch64-gnu-ifunc-nonpreemptable2.s
1 ↗(On Diff #214325)

Assuming I've not made a mistake somewhere, this test is passing with lld prior to this change. Is this intentional?

Possibly the reference from RO is causing the canonical PLT entry even with the previous LLD?

MaskRay updated this revision to Diff 214401.Aug 9 2019, 10:18 AM
MaskRay marked 3 inline comments as done.

Delete IRelativeReloc

MaskRay added a comment.EditedAug 9 2019, 10:30 AM

If I've understood correctly, this will make the case "address of an ifunc is taken from RW but there are no other relocations that would create a canonical PLT entry" worse as we would always create the canonical PLT entry even if it isn't strictly needed. The trade off is simpler code and a possible size saving from only needing one irelative relocation?

We have probably overloaded the meaning of "canonical PLT" here:) Let me give a bit more details about the two types of canonical PLT:

  1. STT_FUNC, Undefined or SharedSymbol. The canonical PLT makes it a Defined and exports it (replaceWithDefined sets the exportDynamic field) in the dynamic symbol table. This fake definition (st_value=addr(PLT)!=0, st_shndx=0, ld.so has logic to handle such symbols) can preempt a real definition in a DSO. This can be seen as an address leak (in AArch64 BTI protection such PLT needs BTI c).
  2. STT_IFUNC, Defined. The canonical PLT is created because the symbol is referenced by a non-PLT-generating-non-GOT-generating relocation. This case is already a Defined, so there is no new Defined (exportDynamic field is not set). We just change some fields of the symbol (st_value and st_type are important to ld.so. For st_shndx, as long as is non-zero, it doesn't matter what its actual value is). If the STT_GNU_IFUNC symbol was exported before, the converted STT_FUNC is still exported. If the STT_GNU_IFUNC was not (e.g. local/hidden), the new STT_FUNC is not.

Both are created for pointer equality. This patch deals with 2).

If the ifunc is referenced by another component.

  • Without a canonical PLT, its type is STT_GNU_IFUNC. A reference (symbolic relocation/GLOB_DAT/JUMP_SLOT) has to call the ifunc resolver to get the real address.
  • With a canonical PLT, its type is STT_FUNC. A reference does not have to call the ifunc resolver, but every subsequent function call has to go through the canonical PLT. Address taken of a non-preemptable ifunc in a static storage (.rodata, .data, etc) is rare, so when making the trade-off, we can lean toward implementation complexity.
ELF/Relocations.cpp
1095 ↗(On Diff #214325)

Thanks! Will delete.

test/ELF/aarch64-gnu-ifunc-nonpreemptable2.s
1 ↗(On Diff #214325)

Yes, the reference from .rodata creates the canonical PLT. This test is to increase test coverage. It passes prior to this change.

If I delete .data, there is a canonical PLT.

If I delete .rodata, there is no canonical PLT, but there are 2 R_*_IRELATIVE. The R_*_IRELATIVE relocating .got.plt is redundant. We can move the if (expr == R_ABS ... code before if (!sym.isInPlt()) to decrease one. That was what I experimented before I thought: we should probably just simplify the cases.

If I've understood correctly, this will make the case "address of an ifunc is taken from RW but there are no other relocations that would create a canonical PLT entry" worse as we would always create the canonical PLT entry even if it isn't strictly needed. The trade off is simpler code and a possible size saving from only needing one irelative relocation?

We have probably overloaded the meaning of "canonical PLT" here:) Let me give a bit more details about the two types of canonical PLT:

  1. STT_FUNC, Undefined or SharedSymbol. The canonical PLT makes it a Defined and exports it (replaceWithDefined sets the exportDynamic field) in the dynamic symbol table. This fake definition (st_value=addr(PLT)!=0, st_shndx=0, ld.so has logic to handle such symbols) can preempt a real definition in a DSO. This can be seen as an address leak (in AArch64 BTI protection such PLT needs BTI c).
  2. STT_IFUNC, Defined. The canonical PLT is created because the symbol is referenced by a non-PLT-generating-non-GOT-generating relocation. This case is already a Defined, so there is no new Defined (exportDynamic field is not set). We just change some fields of the symbol (st_value and st_type are important to ld.so. For st_shndx, as long as is non-zero, it doesn't matter what its actual value is). If the STT_GNU_IFUNC symbol was exported before, the converted STT_FUNC is still exported. If the STT_GNU_IFUNC was not (e.g. local/hidden), the new STT_FUNC is not.

Both are created for pointer equality. This patch deals with 2).

If the ifunc is referenced by another component.

  • Without a canonical PLT, its type is STT_GNU_IFUNC. A reference (symbolic relocation/GLOB_DAT/JUMP_SLOT) has to call the ifunc resolver to get the real address.
  • With a canonical PLT, its type is STT_FUNC. A reference does not have to call the ifunc resolver, but every subsequent function call has to go through the canonical PLT. Address taken of a non-preemptable ifunc in a static storage (.rodata, .data, etc) is rare, so when making the trade-off, we can lean toward implementation complexity.

I agree that in hand written code it is not common to take the address of an ifunc in such a way that a canonical PLT entry isn't needed. My concern is that this part of the code is motivated by something like HWASAN, which I believe may make compiler generated use of ifuncs to access shadow memory for example: D50544 . I must confess to not knowing too much about the details about HWASAN so I'm hoping that PCC can let us know if this will be a real performance problem or not.

pcc added a comment.Aug 12 2019, 6:08 PM

HWASAN should only be using GOT relative relocations to access shadow memory, so I wouldn't expect this change to have an impact on HWASAN.

I wouldn't object too much to removing this code at this point, but it may be worth bringing it back if we ever manage to improve relocation processing here to make this simpler to implement.

In D65995#1626365, @pcc wrote:

HWASAN should only be using GOT relative relocations to access shadow memory, so I wouldn't expect this change to have an impact on HWASAN.

Thanks for clarification!

I wouldn't object too much to removing this code at this point, but it may be worth bringing it back if we ever manage to improve relocation processing here to make this simpler to implement.

Do you have other things in mind that this may be beneficial?

In D65995#1626365, @pcc wrote:

HWASAN should only be using GOT relative relocations to access shadow memory, so I wouldn't expect this change to have an impact on HWASAN.

Thanks for clarification!

Thanks, I've no more objections.

I wouldn't object too much to removing this code at this point, but it may be worth bringing it back if we ever manage to improve relocation processing here to make this simpler to implement.

Do you have other things in mind that this may be beneficial?

I think that there are a few things that are made difficult due to a single pass through the relocations. For correctness we can always make decisions based on considering a relocation in isolation, but there are a few optimizations where we need to know about all the relocations. For example if we know that there is no need to create a canonical PLT then we could very easily avoid a lot of the complexity. This is easier said than done though, we'd need to be careful of performance as there can be a lot of relocations in a program.

peter.smith accepted this revision.Aug 13 2019, 1:46 AM

Given that this is mostly deleting code, I'm happy to approve.

This revision is now accepted and ready to land.Aug 13 2019, 1:46 AM
This revision was automatically updated to reflect the committed changes.