This is an archive of the discontinued LLVM Phabricator instance.

[ELF][RISCV] Implement --emit-relocs with relaxation
ClosedPublic

Authored by jobnoorman on Aug 29 2023, 4:12 AM.

Details

Summary

Linker relaxation may change relocations (offsets and types). However,
when --emit-relocs is used, relocations are simply copied from the input
section causing a mismatch with the corresponding (relaxed) code
section.

This patch fixes this as follows: for non-relocatable RISC-V binaries,
InputSection::copyRelocations reads relocations from the relocated
section's relocations array (since this gets updated by the relaxation
code). For all other cases, relocations are read from the input section
directly as before.

In order to reuse as much code as possible, and to keep the diff small,
the original InputSection::copyRelocations is changed to accept the
relocations as a range of Relocation objects. This means that, in the
general case when reading from the input section, raw relocations need
to be converted to Relocations first, which introduces quite a bit of
boiler plate. It also means there's a slight code size increase due to
the extra instantiations of copyRelocations (for both range types).

Diff Detail

Event Timeline

jobnoorman created this revision.Aug 29 2023, 4:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 29 2023, 4:12 AM
jobnoorman requested review of this revision.Aug 29 2023, 4:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 29 2023, 4:12 AM

Nice!

[lld][RISCV] Fix --emit-relocs with relaxation

Sorry for some bikeshedding.
For projects like llvm,clang,lld, I think we omit [lld] unless we do treewide cleanup.
And a more specific tag is probably more common, like [ELF].

I think "Implement" or "Support" is better than "Fix" as we don't claim that --emit-relocs for RISC-V is implemented.
We just don't report an error.

GNU ld changes R_RISCV_RELAX to R_RISCV_NONE. I think keeping R_RISCV_RELAX is better. Projects depending on the relocation type should be aware.

lld/ELF/InputSection.cpp
364

indicate nonnull-ness

lld/test/ELF/riscv-relax-emit-relocs.s
14

We need --no-relax tests.

40

Add alignment directives to create R_RISCV_ALIGN

jobnoorman updated this revision to Diff 555588.Sep 2 2023, 5:26 AM
jobnoorman retitled this revision from [lld][RISCV] Fix --emit-relocs with relaxation to [ELF][RISCV] Implement --emit-relocs with relaxation.

Address @MaskRay's comments:

  • Make MapRel::file a reference;
  • Add tests for --no-relax (and make sure it works because it didn't before);
  • Add tests for R_RISCV_ALIGN;
  • Update title.
jobnoorman marked 3 inline comments as done.Sep 2 2023, 5:35 AM

GNU ld changes R_RISCV_RELAX to R_RISCV_NONE. I think keeping R_RISCV_RELAX is better. Projects depending on the relocation type should be aware.

GNU ld seems to keep R_RISCV_RELAX but changes R_RISCV_ALIGN to R_RISCV_NONE:

$ riscv64-linux-gnu-gcc -nostdlib -Wl,--emit-relocs,-Ttext=0x10000 lld/test/ELF/riscv-relax-emit-relocs.s
$ riscv64-linux-gnu-objdump -dr a.out

a.out:     file format elf64-littleriscv


Disassembly of section .text:

0000000000010000 <_start>:
   10000:	008000ef          	jal	ra,10008 <f>
			10000: R_RISCV_JAL	f
			10000: R_RISCV_RELAX	*ABS*
   10004:	004000ef          	jal	ra,10008 <f>
			10004: R_RISCV_JAL	f
			10004: R_RISCV_RELAX	*ABS*

0000000000010008 <f>:
   10008:	8082                	ret
			10008: R_RISCV_NONE	*ABS*+0x6

I can see the logic of removing R_RISCV_ALIGN since it doesn't make too much sense anymore once the padding nops are removed. Do you think we should follow this behavior? The current version of the patch keeps it which at least allows analysis tools to recover the original alignment requirements.

jrtc27 added a comment.Sep 2 2023, 9:56 AM

GNU ld changes R_RISCV_RELAX to R_RISCV_NONE. I think keeping R_RISCV_RELAX is better. Projects depending on the relocation type should be aware.

GNU ld seems to keep R_RISCV_RELAX but changes R_RISCV_ALIGN to R_RISCV_NONE:

$ riscv64-linux-gnu-gcc -nostdlib -Wl,--emit-relocs,-Ttext=0x10000 lld/test/ELF/riscv-relax-emit-relocs.s
$ riscv64-linux-gnu-objdump -dr a.out

a.out:     file format elf64-littleriscv


Disassembly of section .text:

0000000000010000 <_start>:
   10000:	008000ef          	jal	ra,10008 <f>
			10000: R_RISCV_JAL	f
			10000: R_RISCV_RELAX	*ABS*
   10004:	004000ef          	jal	ra,10008 <f>
			10004: R_RISCV_JAL	f
			10004: R_RISCV_RELAX	*ABS*

0000000000010008 <f>:
   10008:	8082                	ret
			10008: R_RISCV_NONE	*ABS*+0x6

I can see the logic of removing R_RISCV_ALIGN since it doesn't make too much sense anymore once the padding nops are removed. Do you think we should follow this behavior? The current version of the patch keeps it which at least allows analysis tools to recover the original alignment requirements.

They would be invalid though:

The addend indicates the number of bytes occupied by nop instructions at the relocation offset. The alignment boundary is specified by the addend rounded up to the next power of two.

jrtc27 added inline comments.Sep 2 2023, 10:00 AM
lld/ELF/InputSection.cpp
355

Isn’t that true of other architectures though? TLS relaxations, GOT->PCREL relaxation, …

jobnoorman updated this revision to Diff 555688.Sep 4 2023, 1:42 AM

Emit R_RISCV_NONE instead of R_RISCV_ALIGN since the latter would be invalid
once padding nops are removed. This is also consistent with the behavior of GNU
ld.

Note that I set the addend of R_RISCV_NONE to zero since the addend doesn't have
any meaning here. However, this is not consistent with GNU ld, which has an
addend but its value is a bit mysterious to me (it's not the same as the
original R_RISCV_ALIGN).

Note that I would have liked to be able to implement this in RISC-V specific
code (e.g., in riscvFinalizeRelax). This proved problematic, however, since
with --no-relax, R_RISCV_RELAX relocations will not be added to
InputSectionBase::relocations. Since we still want to emit those relocations,
copyRelocations cannot use the relocations array when --no-relax is used but
instead has to copy them directly from the input section. Therefore, I ended up
changing R_RISCV_ALIGN to R_RISCV_NONE inside copyRelocations.

MaskRay added a comment.EditedSep 5 2023, 2:42 PM

I think I like R_RISCV_ALIGN better as it is simpler and conveys more information.
Changing to R_RISCV_NONE may not be necessarily their intention. It's just a default behavior in GNU ld when a relocation is nullified.

Ultimately, I think a user processing these relocations need to know these relocations cannot be applied again.

--emit-relocs copied relocations do not have the SHF_ALLOC flag, so they are easily recognizable.

I think I like R_RISCV_ALIGN better as it is simpler and conveys more information.
Changing to R_RISCV_NONE may not be necessarily their intention. It's just a default behavior in GNU ld when a relocation is nullified.

Ultimately, I think a user processing these relocations need to know these relocations cannot be applied again.

--emit-relocs copied relocations do not have the SHF_ALLOC flag, so they are easily recognizable.

I think I agree: since the output of --emit-relocs is not supposed to be processed by linkers, there doesn't seem to be a need to follow the psABI to the letter. In fact, we might not be doing that anyway: relocation pairs like R_RISCV_JAL+R_RISCV_RELAX are not defined as valid relaxations but can be produced by --emit-relocs --relax.

I will revert the patch to the previous version. Do you agree @jrtc27?

I think it's still a bit dodgy for R_RISCV_ALIGN specifically, and don't like the divergence from GNU ld -- there should really be consensus between the toolchains there -- but it may be ok.

Regarding R_RISCV_RELAX pairs that don't normally exist: nothing in the psABI says you can't do that, they just don't have a defined meaning, and in fact that's a good thing, because it means new relaxations can be defined in future and toolchains can unconditionally emit them without having to worry about whether or not the linker knows about them; old linkers will just treat them as if they didn't have R_RISCV_RELAX.

I think it's still a bit dodgy for R_RISCV_ALIGN specifically, and don't like the divergence from GNU ld -- there should really be consensus between the toolchains there -- but it may be ok.

I definitely see the point here as well, especially about toolchain consensus.

While trying to figure out whether GNU ld's treatment of R_RISCV_ALIGN is intentional, I noticed that the behavior of --emit-relocs --relax has changed quite a bit since 2.40 (more specifically, since this commit). Since then, most R_RISCV_RELAX relocations are also changed to R_RISCV_NONE. This is the output of GNU ld 2.41 on the test case in this patch:

$ riscv64-linux-gnu-gcc -c llvm-project/lld/test/ELF/riscv-relax-emit-relocs.s
$ binutils-gdb/build/ld/ld-new -q -Ttext=0 riscv-relax-emit-relocs.o
$ riscv64-linux-gnu-objdump -dr a.out

a.out:     file format elf64-littleriscv


Disassembly of section .text:

0000000000000000 <_start>:
   0:	008000ef          	jal	ra,8 <f>
			0: R_RISCV_JAL	f
   4:	004000ef          	jal	ra,8 <f>
			4: R_RISCV_NONE	*ABS*+0x4
			4: R_RISCV_JAL	f

0000000000000008 <f>:
   8:	8082                	ret
			8: R_RISCV_NONE	*ABS*+0x4
			8: R_RISCV_NONE	*ABS*+0x6

I have the feeling this behavior is not intentional at all but rather an artifact of the implementation of relaxation (metadata about necessary code deletion is encoded in relocations which are later overwritten with R_RISCV_NONE).

Given the behavior of the two toolchains has diverged, I think it would make sense to sync with the GNU devs to agree on what the behavior should be. I will start with either opening a bug report or proposing a patch for GNU binutils tomorrow.

I think it's still a bit dodgy for R_RISCV_ALIGN specifically, and don't like the divergence from GNU ld -- there should really be consensus between the toolchains there -- but it may be ok.

I definitely see the point here as well, especially about toolchain consensus.

While trying to figure out whether GNU ld's treatment of R_RISCV_ALIGN is intentional, I noticed that the behavior of --emit-relocs --relax has changed quite a bit since 2.40 (more specifically, since this commit). Since then, most R_RISCV_RELAX relocations are also changed to R_RISCV_NONE. This is the output of GNU ld 2.41 on the test case in this patch:

$ riscv64-linux-gnu-gcc -c llvm-project/lld/test/ELF/riscv-relax-emit-relocs.s
$ binutils-gdb/build/ld/ld-new -q -Ttext=0 riscv-relax-emit-relocs.o
$ riscv64-linux-gnu-objdump -dr a.out

a.out:     file format elf64-littleriscv


Disassembly of section .text:

0000000000000000 <_start>:
   0:	008000ef          	jal	ra,8 <f>
			0: R_RISCV_JAL	f
   4:	004000ef          	jal	ra,8 <f>
			4: R_RISCV_NONE	*ABS*+0x4
			4: R_RISCV_JAL	f

0000000000000008 <f>:
   8:	8082                	ret
			8: R_RISCV_NONE	*ABS*+0x4
			8: R_RISCV_NONE	*ABS*+0x6

I have the feeling this behavior is not intentional at all but rather an artifact of the implementation of relaxation (metadata about necessary code deletion is encoded in relocations which are later overwritten with R_RISCV_NONE).

Given the behavior of the two toolchains has diverged, I think it would make sense to sync with the GNU devs to agree on what the behavior should be. I will start with either opening a bug report or proposing a patch for GNU binutils tomorrow.

Opening an https://sourceware.org/ issue is useful but I do believe R_RISCV_RELAX=>R_RISCV_NONE on our side is unnecessary.

--emit-relocs is a debug or advanced feature. Giving more information by preserving the original type has some value. The R_RISCV_RELAX=>R_RISCV_NONE write is unneeded complexity, albeit small.

MaskRay added inline comments.Sep 7 2023, 8:29 PM
lld/ELF/InputSection.cpp
355

Perhaps Cause and Effect should not be used here. I.e. remove "so", and just state what we do.

jobnoorman updated this revision to Diff 556235.Sep 8 2023, 1:37 AM
  • Revert to emitting R_RISCV_ALIGN instead of R_RISCV_NONE;
  • Improve comment.

Opening an https://sourceware.org/ issue is useful but I do believe R_RISCV_RELAX=>R_RISCV_NONE on our side is unnecessary.

Agreed.

--emit-relocs is a debug or advanced feature. Giving more information by preserving the original type has some value. The R_RISCV_RELAX=>R_RISCV_NONE write is unneeded complexity, albeit small.

I assume you meant R_RISCV_ALIGN => R_RISCV_NONE here? I've reverted the patch to its former behavior (emitting R_RISCV_ALIGN) in any case.

lld/ELF/InputSection.cpp
355

Updated, is this what you had in mind?

MaskRay accepted this revision.Sep 8 2023, 3:44 PM
This revision is now accepted and ready to land.Sep 8 2023, 3:44 PM
This revision was automatically updated to reflect the committed changes.

Opening an https://sourceware.org/ issue is useful but I do believe R_RISCV_RELAX=>R_RISCV_NONE on our side is unnecessary.

Agreed.

--emit-relocs is a debug or advanced feature. Giving more information by preserving the original type has some value. The R_RISCV_RELAX=>R_RISCV_NONE write is unneeded complexity, albeit small.

I assume you meant R_RISCV_ALIGN => R_RISCV_NONE here? I've reverted the patch to its former behavior (emitting R_RISCV_ALIGN) in any case.

I filed https://sourceware.org/bugzilla/show_bug.cgi?id=30844 (ld riscv: --emit-relocs does not retain the original relocation type).
I am quite concerned if RISC-V --emit-relocs get more uses before we have figured out the issues.
Please consider that the current lld behavior is highly experimental and don't rely on the behavior in some components like bolt/ (or consider its code highly experimental and may break any time).

I filed https://sourceware.org/bugzilla/show_bug.cgi?id=30844 (ld riscv: --emit-relocs does not retain the original relocation type).

Thanks for that! Still waiting for my Sourceware account :/

I am quite concerned if RISC-V --emit-relocs get more uses before we have figured out the issues.
Please consider that the current lld behavior is highly experimental and don't rely on the behavior in some components like bolt/ (or consider its code highly experimental and may break any time).

Yes, will keep this in mind.