This is an archive of the discontinued LLVM Phabricator instance.

[BOLT][RFC] Implement composed relocations
ClosedPublic

Authored by jobnoorman on Mar 21 2023, 10:28 AM.

Details

Summary

Note: I marked this patch as RFC since it implements a rather obscure
ELF feature at a relatively high cost in terms of code complexity.
Therefore, I would like some feedback on whether this approach is
desirable.

Second note: the implemented feature isn't used (afaict) by X86 or
AArch64 so this should in principle be NFC. It is used by RISC-V so this
patch is part of my efforts to bring BOLT support to that platform. The
patch itself doesn't contain any RISC-V-specific code though.

BOLT currently assumes (and asserts) that no two relocations can share
the same offset. Although this is true in most cases, ELF has a feature
called (not sure if this is an official term) composed relocations [1]
where multiple relocations at the same offset are combined to produce a
single value.

For example, to support label subtraction (a - b) on RISC-V, two
relocations are emitted at the same offset:

  • R_RISCV_ADD32 a + 0
  • R_RISCV_SUB32 b + 0

which, when combined, will produce the value of (a - b).

To support this in BOLT, first, RelocationSetType in BinarySection is
changed to be a multiset in order to allow it to store multiple
relocations at the same offset.

Next, Relocation::emit() is changed to receive an iterator pair of
relocations. In most cases, these will point to a single relocation in
which case its behavior is unaltered by this patch. For composed
relocations, they should point to all relocations at the same offset and
the following happens:

  • A new method Relocation::createExpr() is called for every relocation. This method is essentially the same as the original emit() except that it returns the MCExpr without emitting it.
  • The MCExprs of relocations i and i+1 are combined using the opcode returned by the new method Relocation::getComposeOpcodeFor().
  • After combining all MCExprs, the last one is emitted.

Note that in the current patch, getComposeOpcodeFor() simply calls
llvm_unreachable() since none of the current targets use composed
relocations. This will change once the RISC-V target lands.

Finally, BinarySection::emitAsData() is updated to group relocations by
offset and emit them all at once.

Note that this means composed relocations are only supported in data
sections. Since this is the only place they seem to be used in RISC-V, I
believe it's reasonable to only support them there for now to avoid
further code complexity.

[1]: https://www.sco.com/developers/gabi/latest/ch4.reloc.html

Diff Detail

Event Timeline

jobnoorman created this revision.Mar 21 2023, 10:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 21 2023, 10:28 AM
jobnoorman requested review of this revision.Mar 21 2023, 10:28 AM

Nice! I didn't know about this feature. May I ask you to create dependent review with getComposeOpcodeFor ? Thank you!

Nice! I didn't know about this feature. May I ask you to create dependent review with getComposeOpcodeFor ? Thank you!

Sure! D146554

Note that I still have to write tests for that patch; I will add them later.

treapster added a comment.EditedMar 22 2023, 2:12 AM

The implemented feature isn't used (afaict) by X86 or AArch64 so this should in principle be NFC

In fact we need to have it for other targets because for example some golang structures contain offsets into .text which is ok if we know output .text address, but if we land https://reviews.llvm.org/D144560 it would be cumbersume to reserve .text address ahead of time, and i think other cases of symbol diffs or other expressions may emerge outside of golang. Last week i tried to merge golang with -rewrite and ended up creating bogus relocation type which can contain arbitrary symbol expression and has advantages of not using multiset and extra indirection at the cost of Relocation's PODness and privatizing Relocation::Symbol. However, in case of RISCV we'll need to collect composed relocations in a single expression while reading them and only then add them to BinarySection. Here's the patch, should i open a competing RFC with it?:)

The implemented feature isn't used (afaict) by X86 or AArch64 so this should in principle be NFC

In fact we need to have it for other targets because for example some golang structures contain offsets into .text which is ok if we know output .text address, but if we land https://reviews.llvm.org/D144560 it would be cumbersume to reserve .text address ahead of time, and i think other cases of symbol diffs or other expressions may emerge outside of golang. Last week i tried to merge golang with -rewrite and ended up creating bogus relocation type which can contain arbitrary symbol expression and has advantages of not using multiset and extra indirection at the cost of Relocation's PODness and privatizing Relocation::Symbol. However, in case of RISCV we'll need to collect composed relocations in a single expression while reading them and only then add them to BinarySection. Here's the patch, should i open a competing RFC with it?:)

Thanks, this is an interesting alternative approach! I believe it could indeed be used to implement composed relocations on RISC-V, at least for the ones I currently care about.

I do think there is some information loss, though: since the relocation type is replaced by a "bogus" one, we won't know what the original type was. For composed relocations, the type of the last one determines how the value is written. Since we lost that type, this could be an issue. Now, for the relocations on RISC-V that are used in a composed way (ADD32/SUB32), this doesn't matter since their size corresponds to the "bogus" ones so this might just be a mere theoretical issue.

In any case, since this heavily changes one of the core data structures in BOLT, I'll leave it for others to decide which approach is preferable :)

I do think there is some information loss, though: since the relocation type is replaced by a "bogus" one, we won't know what the original type was. For composed relocations, the type of the last one determines how the value is written. Since we lost that type, this could be an issue. Now, for the relocations on RISC-V that are used in a composed way (ADD32/SUB32), this doesn't matter since their size corresponds to the "bogus" ones so this might just be a mere theoretical issue.

In any case, since this heavily changes one of the core data structures in BOLT, I'll leave it for others to decide which approach is preferable :)

We do lose the original type, but on the other hand MCStreamer doesn't get original type anyway since it only knows about expressions and size(which we save in type), and it's just a matter of whether we convert to MCExpr in Relocation::emit or earlier when creating the relocation instance. But since i mix Relocation with MCExpr and explicitly delegate handling of composed relocations to external code, i'd also wait for others to decide whether they like it.

yota9 added a comment.Mar 22 2023, 5:15 AM

From my standpoint this patch seems to be more solid and universal. We don't have to change structure, don't have to create pseudo relocation types on BOLT level and can compose relocations with more then 2 symbols easily if needed (although I'm, not sure if there are such cases). Indeed I was interested in this patch since the first thought was that might be re-used in golang support in couple of places and it seems to be a solution that might help with it. But the only major downside of this solution - we don't have special label relocations for aarch64/x86. But it seems to be reasonable addon and probably worth to speak with community about adding such types to other platforms too.

treapster added a comment.EditedMar 22 2023, 5:58 AM

From my standpoint this patch seems to be more solid and universal. We don't have to change structure, don't have to create pseudo relocation types on BOLT level and can compose relocations with more then 2 symbols easily if needed (although I'm, not sure if there are such cases). Indeed I was interested in this patch since the first thought was that might be re-used in golang support in couple of places and it seems to be a solution that might help with it. But the only major downside of this solution - we don't have special label relocations for aarch64/x86. But it seems to be reasonable addon and probably worth to speak with community about adding such types to other platforms too.

IMO if we can create some expressions in opaque way regardless of architecture, it's simpler than stacking up proper arch-specific relocations only to convert them to the same expression later. While this patch is more solid for incoming RISC-V relocs, having an internal opaque type is also useful, not necessarily implemented in the way it's done in my patch. It's definitely easier to implement internal hack than to force other platforms to introduce new relocation types. And since now we don't have the needed types for AArch64/X86, i guess we have to invent our own anyway.

So it feels to me (please correct me if I'm wrong) that there is some agreement that the current patch is the preferred approach for dealing with the composed relocations in RISC-V. There might be a need though for supporting custom MCExprs in relocations for features found in golang. I think it should be possible to implement that *in addition to* the current patch so maybe it would make sense to move that discussion to another RFC?

Friendly ping.

This revision is now accepted and ready to land.Apr 10 2023, 5:01 PM

@yota9 Are you ok with landing this?

This revision was automatically updated to reflect the committed changes.