This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Relax R_RISCV_ALIGN
ClosedPublic

Authored by MaskRay on Jun 11 2022, 4:51 PM.

Details

Summary

Alternative to D125036. Implement R_RISCV_ALIGN relaxation so that we can handle
-mrelax object files (i.e. -mno-relax is no longer needed) and creates a
framework for future relaxation.

relaxAux is placed in a union with InputSectionBase::jumpInstrMod, storing
auxiliary information for relaxation. In the first pass, relaxAux is allocated.
The main data structure is relocDeltas: when referencing relocations[i], the
actual offset is r_offset - (i ? relocDeltas[i-1] : 0).

relaxOnce performs one relaxation pass. It computes relocDeltas for all text
section. Then, adjust st_value/st_size for symbols relative to this section
based on SymbolAnchor. bytesDropped is set so that assignAddresses knows
that the size has changed.

Run relaxOnce in the finalizeAddressDependentContent loop to wait for
convergence of text sections and other address dependent sections (e.g.
SHT_RELR). Note: extrating relaxOnce into a separate loop works for many cases
but has issues in some linker script edge cases.

After convergence, compute section contents: shrink the NOP sequence of each
R_RISCV_ALIGN as appropriate. Instead of deleting bytes, we run a sequence of
memcpy on the content delimitered by relocation locations. For R_RISCV_ALIGN let
the next memcpy skip the desired number of bytes. Section content computation is
parallelizable, but let's ensure the implementation is mature before
optimizations. Technically we can save a copy if we interleave some code with
OutputSection::writeTo, but let's not pollute the generic code (we don't have
templated relocation resolving, so using conditions can impose overhead to
non-RISCV.)

Tested:
make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- LLVM=1 defconfig all
built kernel is bootable.

Diff Detail

Event Timeline

MaskRay created this revision.Jun 11 2022, 4:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 11 2022, 4:51 PM
MaskRay requested review of this revision.Jun 11 2022, 4:51 PM
MaskRay updated this revision to Diff 436196.Jun 12 2022, 2:12 AM
MaskRay retitled this revision from WIP [ELF] Relax R_RISCV_ALIGN to [ELF] Relax R_RISCV_ALIGN.
MaskRay edited the summary of this revision. (Show Details)

x

MaskRay updated this revision to Diff 436277.Jun 12 2022, 11:04 PM
MaskRay edited the summary of this revision. (Show Details)

R_RISCV_CALL patch will follow

MaskRay edited the summary of this revision. (Show Details)Jun 14 2022, 12:22 AM
MaskRay updated this revision to Diff 436688.Jun 14 2022, 1:08 AM
MaskRay edited the summary of this revision. (Show Details)

adjust test. add more comments

MaskRay updated this revision to Diff 436867.Jun 14 2022, 11:29 AM

Handle --gc-sections

luismarques added inline comments.Jun 15 2022, 12:08 PM
lld/ELF/LinkerScript.cpp
949–950 ↗(On Diff #436867)

Document return

1326 ↗(On Diff #436867)

We don't have test coverage for this.

MaskRay updated this revision to Diff 437298.Jun 15 2022, 12:34 PM
MaskRay edited the summary of this revision. (Show Details)

Remove unneeded assignAddress change

MaskRay marked 2 inline comments as done.Jun 15 2022, 12:38 PM
MaskRay added inline comments.
lld/ELF/LinkerScript.cpp
949–950 ↗(On Diff #436867)

I think assignOffsets does not need a change. Removed

1326 ↗(On Diff #436867)

I removed this change.

If we exit the loop with

relaxOnce => true
assignAddresses   # address updated with correct section size information
relaxOnce => false   # instruction relaxation output does not change

It has converged.

MaskRay added a reviewer: peter.smith.EditedJun 19 2022, 12:13 PM
MaskRay marked 2 inline comments as done.
MaskRay added a subscriber: compnerd.

(I intended to only bother Peter after some RISC-V folks made some verification, but I guess the two steps can be parallel. This works for some configurations I know, including @compnerd's usage.)
I have asked a distrubotr @felixonmars for testing as well.

Do you really need a plain union? How about a PointerUnion or something llvmy std::variant?

MaskRay added a comment.EditedJun 19 2022, 9:39 PM

Do you really need a plain union? How about a PointerUnion or something llvmy std::variant?

The size is very important for memory usage. 8 or 16 bytes here may cost 1% maximum RSS.
Since PointerUnion abstraction seems unnecessary, I'd want to avoid it.

Do you really need a plain union? How about a PointerUnion or something llvmy std::variant?

The size is very important for memory usage. 8 or 16 bytes here may cost 1% maximum RSS.
Since PointerUnion abstraction seems unnecessary, I'd want to avoid it.

I would expect the size of the pointer union to be 8 bytes. My main motivation is that unions are odd.

MaskRay added a comment.EditedJun 20 2022, 12:52 AM

Do you really need a plain union? How about a PointerUnion or something llvmy std::variant?

The size is very important for memory usage. 8 or 16 bytes here may cost 1% maximum RSS.
Since PointerUnion abstraction seems unnecessary, I'd want to avoid it.

I would expect the size of the pointer union to be 8 bytes. My main motivation is that unions are odd.

PointerUnion will not increase the size, but the additional abstraction is unnecessary. Both features are used in very specific context and there is no need to worry about misuse (and if there is a misuse, it will fail immediately). I'd also want to avoid the cost clearing the least significant bit.

I haven't looked at the code yet, but I can confirm that I was able to build and boot FreeBSD for RISCV64 with this patch and the FreeBSD patch below to remove -mno-relax (I could have just defined the riscv-relaxations linker feature, but I wanted to make sure that the code is actually built with relaxations):

diff --git a/share/mk/bsd.lib.mk b/share/mk/bsd.lib.mk
index 36d91ea019f3..b37ebe39ecf4 100644
--- a/share/mk/bsd.lib.mk
+++ b/share/mk/bsd.lib.mk
@@ -123,8 +123,8 @@ CXXFLAGS+= ${DEBUG_FILES_CFLAGS}
 CTFFLAGS+= -g
 .endif
 
-.if ${MACHINE_CPUARCH} == "riscv" && ${LINKER_FEATURES:Mriscv-relaxations} == ""
-CFLAGS += -mno-relax
+.if ${MACHINE_CPUARCH} == "riscv"
+CFLAGS += -mrelax
 .endif
 
 .include <bsd.libnames.mk>
diff --git a/share/mk/bsd.prog.mk b/share/mk/bsd.prog.mk
index 6b8da09edaf0..3743250c2c87 100644
--- a/share/mk/bsd.prog.mk
+++ b/share/mk/bsd.prog.mk
@@ -84,8 +84,8 @@ CXXFLAGS+= -ftrivial-auto-var-init=pattern
 # bsd.sanitizer.mk is not installed, so don't require it (e.g. for ports).
 .sinclude "bsd.sanitizer.mk"
 
-.if ${MACHINE_CPUARCH} == "riscv" && ${LINKER_FEATURES:Mriscv-relaxations} == ""
-CFLAGS += -mno-relax
+.if ${MACHINE_CPUARCH} == "riscv"
+CFLAGS += -mrelax
 .endif
 
 .if defined(CRUNCH_CFLAGS)
diff --git a/share/mk/bsd.sys.mk b/share/mk/bsd.sys.mk
index 221e8b028479..f6266d7b991b 100644
--- a/share/mk/bsd.sys.mk
+++ b/share/mk/bsd.sys.mk
@@ -83,6 +83,13 @@ CWARNFLAGS.clang+=   -Wno-unused-const-variable
 .if ${COMPILER_TYPE} == "clang" && ${COMPILER_VERSION} >= 130000
 CWARNFLAGS.clang+=     -Wno-error=unused-but-set-variable
 .endif
+.if ${COMPILER_TYPE} == "clang" && ${COMPILER_VERSION} >= 150000
+CWARNFLAGS.clang+=     -Wno-deprecated-non-prototype
+CWARNFLAGS.clang+=     -Wno-unreachable-code-generic-assoc
+CWARNFLAGS.clang+=     -Wno-strict-prototypes
+CWARNFLAGS.clang+=     -Wno-error=unused-but-set-parameter
+CWARNFLAGS.clang+=     -Wno-error=implicit-function-declaration
+.endif
 .endif # WARNS <= 6
 .if ${WARNS} <= 3
 CWARNFLAGS.clang+=     -Wno-tautological-compare -Wno-unused-value\
diff --git a/stand/defs.mk b/stand/defs.mk
index e9c97f7720ab..753bf39ced31 100644
--- a/stand/defs.mk
+++ b/stand/defs.mk
@@ -175,9 +175,7 @@ CFLAGS+=    -fPIC
 
 # Some RISC-V linkers have support for relaxations, while some (lld) do not
 # yet. If this is the case we inhibit the compiler from emitting relaxations.
-.if ${LINKER_FEATURES:Mriscv-relaxations} == ""
-CFLAGS+=       -mno-relax
-.endif
+CFLAGS+=       -mrelax
 
 # The boot loader build uses dd status=none, where possible, for reproducible
 # build output (since performance varies from run to run). Trouble is that
diff --git a/sys/conf/kern.mk b/sys/conf/kern.mk
index b86149ab4618..2c82ff97b88e 100644
--- a/sys/conf/kern.mk
+++ b/sys/conf/kern.mk
@@ -34,6 +34,9 @@ NO_WUNUSED_BUT_SET_VARIABLE=  -Wno-unused-but-set-variable
 .if ${COMPILER_VERSION} >= 140000
 NO_WBITWISE_INSTEAD_OF_LOGICAL=        -Wno-bitwise-instead-of-logical
 .endif
+.if ${COMPILER_VERSION} >= 150000
+CWARNFLAGS+=   -Wno-strict-prototypes -Wno-error=unused-but-set-variable
+.endif
 # Several other warnings which might be useful in some cases, but not severe
 # enough to error out the whole kernel build.  Display them anyway, so there is
 # some incentive to fix them eventually.
@@ -151,9 +154,7 @@ CFLAGS.clang+=      -mcmodel=medium
 CFLAGS.gcc+=   -mcmodel=medany
 INLINE_LIMIT?= 8000
 
-.if ${LINKER_FEATURES:Mriscv-relaxations} == ""
-CFLAGS+=       -mno-relax
-.endif
+CFLAGS+=       -mrelax
 .endif
 
 #

I'll leave the RISCV details to the experts. General approach looks good to me.

lld/ELF/InputSection.h
102

It looks like the definitions are only used in RISCV.cpp as only a pointer is used in the union below. Could these be forward declared here?

I could be missing some use though.

208

update comment for relaxAux? I assume that the union is because we don't have relaxation and basic block sections simultaneously?

lld/ELF/Target.h
92

Will be worth a comment like needsThunk to describe what this does, just in case another architecture chooses to do RiscV like relaxations.

lld/ELF/Writer.cpp
1637

Although more source changes. Would it be cleaner to have the passes variable here, and pass it into createThunks as a parameter?

gkm added inline comments.Jun 20 2022, 1:27 PM
lld/ELF/Arch/RISCV.cpp
593

InputSectionBase::bytesDropped is merely uint8_t, and feels vulnerable to overflow. The comment on the decl says it is intended for basic-block sections, for which 8 bits is reasonable, but this new use, it might be inadequate. Perhaps uint16_t ?

MaskRay updated this revision to Diff 438522.Jun 20 2022, 6:33 PM
MaskRay marked 5 inline comments as done.

comments

kito-cheng added inline comments.Jun 21 2022, 4:40 AM
lld/ELF/Arch/RISCV.cpp
610

I hit overflow here as @gkm concern, and the fixed by changing bytesDropped to uint16_t (yeah, I tested the uint8_t version), maybe we can put an assert (delta <= numeric_limits<uint16_t>::max()); here to make sure this could catch earlier?

I saw there are assertions for byteDropped in other place, so I think that should be reasonable?

[kitoc@xxxx llvm-project]$ grpe bytesDropped  * -R
...
lld/ELF/InputSection.h:  uint8_t bytesDropped = 0;
lld/ELF/InputSection.h:    assert(bytesDropped + num < 256);
lld/ELF/InputSection.h:    bytesDropped += num;
lld/ELF/InputSection.h:    assert(bytesDropped >= num);
lld/ELF/InputSection.h:    bytesDropped -= num;
...

Gonna run second round of testing.

luismarques added inline comments.Jun 21 2022, 6:18 AM
lld/ELF/Arch/RISCV.cpp
554–562

Can't we skip this for the first pass?

563–565

No test coverage for this?

luismarques added inline comments.Jun 21 2022, 8:18 AM
lld/ELF/Arch/RISCV.cpp
563–565

Nevermind. D127611.

gkm added inline comments.Jun 21 2022, 9:00 AM
lld/ELF/InputSection.h
146
MaskRay updated this revision to Diff 438897.Jun 21 2022, 8:16 PM
MaskRay marked 5 inline comments as done.

add // namespace

lld/ELF/Arch/RISCV.cpp
554–562

This code takes nearly no time. I don't think we should special case the first pass.

593

Thanks!

610

push_back/drop_back is a code problem of the basic block sections feature. I don't intend to touch the functions for this patch.
Changed RISCV.cpp:611 instead.

lld/ELF/InputSection.h
146

push_back/drop_back is a code problem of the basic block sections feature. I don't intend to touch the functions for this patch.
Changed RISCV.cpp:611 instead.

lld/ELF/Writer.cpp
1637

ThunkCreator::pass needs to be retained, otherwise uint32_t pass needs to be threading though most of its member functions.

Thanks for the updates, I don't have any more comments. Happy for someone comfortable with RISCV to approve when they are happy with the details.

I have used this patch to compile and link to the software below. Generally, there are no obvious problems with this patch.

  1. curl-7.82.0:
    • command: ./configure CC=/path/to/clang LDFLAGS="-fuse-ld=lld --ld-path=/path/to/ld.lld -mrelax" --without-ssl
    • result:The compiler and linker works fine and all the tests that are provided have passed. Simply running is fine.
  1. bash-5.1:
    • command: ./configure CC_FOR_BUILD=/path/to/clang LDFLAGS_FOR_BUILD="-fuse-ld=lld --ld-path=/path/to/ld.lld -mrelax" CC=/path/to/clang LDFLAGS="-fuse-ld=lld --ld-path=/path/to/ld.lld -mrelax"
    • result:The compiler and linker works fine and part of the tests that are provided has passed. Failed test not related to this patch. Simply running is fine.
  1. vim 8.2.5:
    • command: ./configure CC=/path/to/clang LDFLAGS="-fuse-ld=lld --ld-path=/path/to/ld.lld -mrelax
    • result:The compiler and linker works fine and part of the tests that are provided has passed. Failed test not related to this patch. Simply running is fine.
  1. libevent:
    • command: ./configure CC=/path/to/clang LDFLAGS="-fuse-ld=lld --ld-path=/path/to/ld.lld -mrelax"
    • result: The compiler and linker works fine and all the tests that are provided have passed.
  1. tmux:
    • command: ./configure CC=/path/to/clang LDFLAGS="-fuse-ld=lld --ld-path=/path/to/ld.lld -mrelax"
    • result: The compiler and linker works fine. run make check have no error. Simply running is fine.

Tested with this patch with LLVM testsuite and internal testsuite, and no failure :)

Thanks for all the suggestions and testing! I think the approach implemented in this patch series is what we should use. I'll wait a week and push the two...

I have asked a distrubotr @felixonmars for testing as well.

Thanks for the patch. We (@Arch) have done some extended testing with for example the Firefox browser, and everything looks good so far.

MaskRay accepted this revision.Jun 23 2022, 2:45 PM
This revision is now accepted and ready to land.Jun 23 2022, 2:45 PM
jrtc27 added inline comments.Jul 3 2022, 9:39 AM
lld/ELF/Arch/RISCV.cpp
277

ALIGN is not a hint, it's a requirement (as opposed to R_RISCV_RELAX, which is a true hint). I would suggest separating these two notions at the RelExpr level, then the config->relax check to skip R_RISCV_RELAX in the CALL(_PLT) patch can instead be done at the generic level rather than in the target.

491

Should this not be Elf_Sword or similar? ELFCLASS64 _can_ overflow this, even if you really really really shouldn't.

494

A getter that returns nothing is odd, saveSymbolAnchors/recordSymbolAnchors/initSymbolAnchors/similar?

500

Do you not just want an int32_t * (or smart pointer) given it's either 0 or relocations.size() elements? SmallVector adds overhead as it tracks both size and capacity, but we don't need any dynamic behaviour (beyond "does not exist" (null) and "exists with relocations.size() elements").

523

Does this not need to be stable_sort to guarantee the zero-sized symbols have their anchors in the right order? Also, does the order of A's end vs B's start matter for this implementation? That should be documented (and, ideally, why).

540

This seems like it belongs in generic code?

554

Is this actually the original st_value? If you interleave relaxation with other adjustment of st_value, won't delta stay the same but then the "unrelaxed" st_value will be different?

570

It might be nice to hoist this out to a separate function, it's quite nested here and this is the bit people care about editing to add new relaxations, so separating the "do the relaxations" part from all the tracking infrastructure would help there.

653

I don't know if it's a requirement that, say, 6 padding bytes be emitted as nop; c.nop rather than c.nop; nop. Does binutils make this assumption?

MaskRay updated this revision to Diff 441971.Jul 3 2022, 12:02 PM
MaskRay marked 8 inline comments as done.

comments

lld/ELF/Arch/RISCV.cpp
277

The R_RISCV_ALIGN case label is specific to RISC-V. The pass is done after scanRelocations, so the code does not fit into the generic code. RelExpr can be split for ALIGN and RELAX but its necessity isn't that high. If just that HINT is a bit of misnomer I can rename it separately to R_RELAX or R_RELAX_OR_ALIGN.

491

There are precedents using int32_t in many places. Elf_Sword is not used. Since relocations aren't that many, just switched to int64_t.

500

changed to std::unique_ptr<int64_t[]>

523

llvm::sort is fine. The previous comment explains it. Improved the comment a bit.

540

This is RISC-V specific. Unless another supported architecture adds relaxation, this can stay here.

554

For most symbols, this is the original st_value.
Linker script symbol assignments may rewrite st_value (and does not care about the original value).
The code should be fine.

653

I vaguely remember that GNU ld seems to use nop; c.nop (prefer long to short). This code should match its behavior.

MaskRay added inline comments.Jul 3 2022, 9:35 PM
lld/ELF/Arch/RISCV.cpp
491

Hmm. I guess the original int32_t or uint32_t may be better.

MaskRay updated this revision to Diff 442448.Jul 5 2022, 11:45 PM

Thank for the thorough comments.
I'll push this in few days if there is no further request.

Rebase after a getInputSections optimization

This revision was landed with ongoing or failed builds.Jul 7 2022, 10:16 AM
This revision was automatically updated to reflect the committed changes.
luismarques added inline comments.Jan 19 2023, 6:13 AM
lld/ELF/Arch/RISCV.cpp
599

@MaskRay I ran into this error when building LLVM with LLD in a RISC-V host. I guess we actually need an int32?

MaskRay added inline comments.Jan 19 2023, 7:47 PM
lld/ELF/Arch/RISCV.cpp
599

This will increase the size of InputSection which we should try to avoid (memory usage increase). I am on a trip so cannot investigate it closely.

It will help if you can ask the author of --optimize-bb-jumps whether it is still used. Removing nopFiller will make room for delta.

luismarques added inline comments.
lld/ELF/Arch/RISCV.cpp
599

It will help if you can ask the author of --optimize-bb-jumps whether it is still used. Removing nopFiller will make room for delta.

@tmsriram any comments?

lld/test/ELF/riscv-relax-align.s