This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Alignment relaxation
AbandonedPublic

Authored by gkm on May 5 2022, 12:53 PM.

Details

Summary

This is the first in a stack of diffs that repackages and revises D100835 for RISC-V linker relaxation.

This diff implements machinery to elide unwanted instruction bytes, and to rewrite NOP sequences that are optimal for code-alignment padding.

Caveat: this has not been tested with option --emit-relocs.

Diff Detail

Event Timeline

gkm created this revision.May 5 2022, 12:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2022, 12:53 PM
gkm requested review of this revision.May 5 2022, 12:53 PM
gkm edited the summary of this revision. (Show Details)May 5 2022, 1:36 PM
gkm edited the summary of this revision. (Show Details)
arichardson added inline comments.May 6 2022, 1:08 AM
lld/ELF/InputSection.cpp
196

Missing test coverage, you should add some symbols to your test cases.

lld/ELF/InputSection.h
273

Moving this after the other bool above should avoid the size change.

gkm marked 2 inline comments as done.May 6 2022, 11:56 AM
gkm updated this revision to Diff 427703.May 6 2022, 11:57 AM

Revise according to review feedback:

  • place new bool member adjacent to another bool to avoid size change of lld/ELF/InputSection.cpp
  • add test case for symbol-size
reames added a comment.May 6 2022, 3:29 PM

Minor style comments. Not really my area, so I can't really comment on how this fits within the existing structure.

lld/ELF/Arch/RISCV.cpp
498

the name nopBytes is slightly ambiguous here. Maybe requiredNopBytes? To distinguish it on first glance from the number available.

519

Extend this comment to indicate why this is required - i.e. split a 4 byte into a 2-byte remainder case + not wanting to assume the compiler emitted nops at all.

526

The rvc part of this check is redundant with the checks above. Maybe drop it from the else if clause, and turn it into an assert within the block?

lld/ELF/InputSection.cpp
177

This code seems to assume that ranges appear in order of increasing start address. I believe that is true in the code, but could easily be asserted more heavily to prevent future subtle mistakes.

181

You should probably assert that no range contains the symbol address.

lld/ELF/InputSection.h
168

Having variables defined between functions looks odd to my eye, but I see this already exists in the class. Maybe do a pre-nfc to group all the data together, and then add your new ones there?

gkm retitled this revision from [LLD][RISCV] Alignment relaxation to [RISCV] Alignment relaxation.May 9 2022, 9:30 PM
gkm edited the summary of this revision. (Show Details)
gkm updated this revision to Diff 428291.EditedMay 9 2022, 10:28 PM
gkm marked 6 inline comments as done.
  • Revise according to review feedback
  • Rewrite InputSectionBase::deleteRanges() to use a compact single pass that unifies handling of symbol addresses and sizes
  • expand lld/test/ELF/riscv-relax-syms.s
gkm added inline comments.May 10 2022, 7:41 AM
lld/ELF/InputSection.cpp
177

The relaxation algorithm processes relocations by ascending address, and thus delete ranges are pushed to the vector by ascending address. I added an assertion.

177
gkm updated this revision to Diff 429051.May 12 2022, 12:46 PM

Refactor ...

  • s/DeleteRanges/AdjustRanges/: these can now handle deletion and insertion, supporting multiple-pass relaxation where alignment padding can expand, and/or relaxations can be undone
  • Build framework for adding more relaxation types in later diffs
  • Add comments
gkm updated this revision to Diff 429100.May 12 2022, 4:31 PM
  • Fix indentation caught by clang-format

Clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git , apply this patch:

--- i/arch/riscv/Makefile
+++ w/arch/riscv/Makefile
@@ -38,9 +38,9 @@ endif
 
 ifeq ($(CONFIG_LD_IS_LLD),y)
-       KBUILD_CFLAGS += -mno-relax
-       KBUILD_AFLAGS += -mno-relax
+       #KBUILD_CFLAGS += -mno-relax
+       #KBUILD_AFLAGS += -mno-relax
 ifndef CONFIG_AS_IS_LLVM
-       KBUILD_CFLAGS += -Wa,-mno-relax
-       KBUILD_AFLAGS += -Wa,-mno-relax
+       #KBUILD_CFLAGS += -Wa,-mno-relax
+       #KBUILD_AFLAGS += -Wa,-mno-relax
 endif
 endif

(My llvm build directory is /tmp/RelA)

ninja -C /tmp/RelA clang lld llvm-{nm,size,strings,objcopy,objdump,readelf,ar,strip}

cd ~/Dev/linux
PATH=/tmp/RelA/bin:$PATH make O=/tmp/linux/riscv64 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- LLVM=1 -j60 defconfig all
...

I see a segfault

  LD      .tmp_vmlinux.kallsyms1
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: ld.lld -melf64lriscv --build-id=sha1 --script=./arch/riscv/kernel/vmlinux.lds --strip-debug -o .tmp_vmlinux.kallsyms1 --whole-archive arch/riscv/kernel/head.o init/built-in.a usr/built-in.a arch/riscv/built-in.a arch/riscv/errata/built-in.a kernel/built-in.a certs/built-in.a mm/built-in.a fs/built-in.a ipc/built-in.a security/built-in.a crypto/built-in.a block/built-in.a lib/built-in.a arch/riscv/lib/built-in.a lib/lib.a arch/riscv/lib/lib.a drivers/built-in.a sound/built-in.a net/built-in.a virt/built-in.a --no-whole-archive --start-group ./drivers/firmware/efi/libstub/lib.a --end-group
 #0 0x000056142291b4d3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/tmp/RelA/bin/lld+0x387c4d3)
 #1 0x000056142291940e llvm::sys::RunSignalHandlers() (/tmp/RelA/bin/lld+0x387a40e)
 #2 0x000056142291bada SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f8197d53200 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12200)
 #4 0x0000561422a7ad01 void std::__introsort_loop<SymbolAddr*, long, __gnu_cxx::__ops::_Iter_comp_iter<lld::elf::InputSectionBase::adjustRanges(llvm::ArrayRef<lld::elf::InputSectionBase::AdjustRange>)::$_0>>(SymbolAddr*, SymbolAddr*, long, __gnu_cxx::__ops::_Iter_comp_iter<lld::elf::InputSectionBase::adjustRanges(llvm::ArrayRef<lld::elf::InputSectionBase::AdjustRange>)::$_0>) InputSection.cpp:0:0
 #5 0x0000561422a6d458 lld::elf::InputSectionBase::adjustRanges(llvm::ArrayRef<lld::elf::InputSectionBase::AdjustRange>) (/tmp/RelA/bin/lld+0x39ce458)
 #6 0x0000561422b61a7c (anonymous namespace)::RISCV::finalizeSections() const RISCV.cpp:0:0
 #7 0x0000561422ba4223 (anonymous namespace)::Writer<llvm::object::ELFType<(llvm::support::endianness)1, true>>::finalizeSections() Writer.cpp:0:0
 #8 0x0000561422b827dc void lld::elf::writeResult<llvm::object::ELFType<(llvm::support::endianness)1, true>>() (/tmp/RelA/bin/lld+0x3ae37dc)
 #9 0x00005614229f170f lld::elf::LinkerDriver::link(llvm::opt::InputArgList&) (/tmp/RelA/bin/lld+0x395270f)
#10 0x00005614229e3ab4 lld::elf::LinkerDriver::linkerMain(llvm::ArrayRef<char const*>) (/tmp/RelA/bin/lld+0x3944ab4)
#11 0x00005614229e242d lld::elf::link(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, bool, bool) (/tmp/RelA/bin/lld+0x394342d)
#12 0x000056142286f488 lldMain(int, char const**, llvm::raw_ostream&, llvm::raw_ostream&, bool) lld.cpp:0:0
#13 0x000056142286ec62 main (/tmp/RelA/bin/lld+0x37cfc62)
#14 0x00007f81975fa7fd __libc_start_main ./csu/../csu/libc-start.c:332:16
#15 0x000056142286e7aa _start (/tmp/RelA/bin/lld+0x37cf7aa)
lld/ELF/Arch/RISCV.cpp
476

Consider SmallVector<InputSectionBase::AdjustRange, 0>

488

constexpr

491

static

496

static

Actually, avoid defining such a short function.

501

avoid defining such a short function.

514

Newer code uses osec instead of os. os is usually for raw_ostream.

519

+2 needs a comment.

525

Add some const

548
550

Use

if (!(is->flags & SHF_EXECINSTR))
  continue;

to decrease indentation.

lld/ELF/InputSection.cpp
187

Use sym for Symbol *.

Avoid Symbol *b from some old code.

1080

Prefer moving R_RELAX_HINT to the switch below to not penalize other architectures. They don't want to take a comparison overhead here.

lld/test/ELF/riscv-relax-align-rvc.s
12

Avoid a temporary file for FIleCheck input.

lld/test/ELF/riscv-relax-align.s
40

Add a few more text sections to demonstrate that multiple sections can be handled.

lld/test/ELF/riscv-relax-syms.s
3

Avoid -p if the directory is known to not exist. Use cd %t if you need to refer to %t/ many times.

5

##

18

CHECK-DAG:

30

Add a comment somewhere to explain what the notation [0..4) means.

MaskRay added inline comments.May 14 2022, 5:19 PM
lld/ELF/Arch/RISCV.cpp
508

A NOTE does not need a username.

lld/ELF/InputSection.h
168

The mutable can be avoided.

gkm marked 17 inline comments as done.May 16 2022, 10:44 AM

. . .

I see a segfault

Yes, I see it also. Will debug.
Meanwhile, I responded to other feedback and asked you some follow-up questions.
Thanks for reviewing!

lld/ELF/Arch/RISCV.cpp
496

What do you propose as an alternative? Open-code the shift?

501

What do you propose as an alternative? Open-code the bitwise ops? The idea is to abstract access to subfields of addend. Ideally, I could access alignNopBytes and alignBoundary as distinct struct members, but I am hijacking the existing Relocation interface that I can't change.

lld/ELF/InputSection.h
168

Say more. How?

lld/test/ELF/riscv-relax-align-rvc.s
12

I find that the temporary files are very useful for debugging tests. I can see the output of llvm-objdump without the effort of extracting the command from the log and running it manually.

Although I wish it were standard practice to retain temp files, I can revert back to a pipeline prior to pushing upstream.

gkm updated this revision to Diff 429778.May 16 2022, 10:45 AM
gkm marked 2 inline comments as done.
  • Revise according to review feedback
gkm updated this revision to Diff 429790.May 16 2022, 11:10 AM
  • Handle case R_RELAX_HINT in InputSectionBase::getRelocTargetVA()
gkm added inline comments.May 16 2022, 11:14 AM
lld/ELF/InputSection.cpp
1080

I am unconvinced:

  1. The extra comparison overhead will be lost in the noise, though I will measure to be certain
  2. Delaying R_RELAX_HINT means we run the block of code between here and the switch, which requires handling R_RELAX_HINT in InputSectionBase::getRelocTargetVA() to return a fake value 0.
MaskRay added inline comments.May 16 2022, 11:28 AM
lld/ELF/Arch/RISCV.cpp
496

Yes, open-code it.

501

Open code it since it's only called once. A comment is sufficient to express the intent.

lld/ELF/InputSection.cpp
1080

R_NONE is also not nice here. It would be better be placed in the switch cases but I haven't got time to refine the basic-block-section usage.

R_RELAX_HINT is risc-v specific. I don't want it to penalize other code. The cost is certainly small but every a little makes a mickle.

gkm updated this revision to Diff 429851.May 16 2022, 2:19 PM
  • Rework class SymbolAddr. The broken version's comparison function confused llvm::sort by returning true when given the same object for args a and b. This violates the API because comparisions should return a < b, which is false when a and b are the same object.
gkm marked 3 inline comments as done.May 16 2022, 2:27 PM
gkm updated this revision to Diff 429859.May 16 2022, 2:37 PM
  • open code the trivial getter/setter functions atop R_RISCV_ALIGN's Relocation::addend for align boundary & nop byte count.
gkm added a comment.May 16 2022, 2:40 PM

LLD no longer crashes when linking Linux.

On lld/ELF/Relocations.cpp:1439, R_RELAX_HINT should not trigger sym.hasDirectReloc = true;, otherwise an ifunc may unnecessarily be converted to a canonical PLT. You may read *-ifunc-nonpreemptible*.s tests for what they do. I have some notes on https://maskray.me/blog/2021-01-18-gnu-indirect-function#address-significance

lld/ELF/Arch/RISCV.cpp
40

The name is unnecessarily the same as Writer::finalizeSections. relaxSections may be a better name.

488
520
522

emplace_back

573

The variable and the member function have the same name. Rename the variable to ranges?

Move it outside the outer loop and use a ranges.clear() here. Try avoiding constructing/destructing a vector in a loop.

576

r.addend is guaranteed to be non-zero

591

Don't perform relaxation for a relocatable link.

lld/ELF/InputSection.cpp
154

struct

186

If a file compiled with -ffunction-sections has N text sections/function symbols, this takes O(N^2). I think we should build a map to decrease the time complexity. You can add a map to InputFile. It's unfortunate to increase the size of InputFile which it is acceptable.

189

emplace_back

192

stable_sort

Two symbols may be defined at the same offset.

221
236

The condition isn't correct. It may trigger the undefined behavior src0 < dest+length < srcN.


We should avoid std::copy when

The behavior is undefined if d_first is within the range [first, last)

Otherwise we should use std::copy.


I am not sure the code works with adding bytes. In that case we need to get the NOP pattern.

lld/ELF/InputSection.h
184

Then remove mutable from copiedData

186
lld/test/ELF/riscv-relax-align-rvc.s
39

The test needs many .balign {8,16} directives to test InputSectionBase::adjustRanges.

The current InputSectionBase::adjustRanges is flawed but the tests aren't able to detect that.

MaskRay added inline comments.May 19 2022, 8:07 PM
lld/ELF/Arch/RISCV.cpp
582

This is inefficient. relaxOnce should call script->assignAddresses() at most once. The call should be moved just before return.

MaskRay requested changes to this revision.May 19 2022, 8:08 PM
This revision now requires changes to proceed.May 19 2022, 8:08 PM

While I think there are still significant problems needing to address, it may work with quite a few application. It'd be nice if folks can check how this patch works without -mno-relax.

gkm marked 17 inline comments as done.May 22 2022, 4:54 PM
gkm added inline comments.
lld/ELF/Arch/RISCV.cpp
40

This is an override of target-independent virtual void TargetInfo::finalizeSections(), which is meant to be a generic hook for targets to do target-specific things. For RISC-V, that target-specific thing happens to be relaxation.

522
/scratch/llvm-project/lld/ELF/Arch/RISCV.cpp:523:18: error: no matching member function for call to 'emplace_back'
    adjustRanges.emplace_back({r.offset, incr});
    ~~~~~~~~~~~~~^~~~~~~~~~~~
/scratch/llvm-project/llvm/include/llvm/ADT/SmallVector.h:927:45: note: candidate template ignored: substitution failure: deduced incomplete pack <(no value)> for template parameter 'ArgTypes'
  template <typename... ArgTypes> reference emplace_back(ArgTypes &&... Args) {
                        ~~~~~~~~            ^

The only way I have found to code this without complaint from clang is this:

AdjustRange range {r.offset, incr};
adjustRanges.push_back(range);

Godbolt reveals that the generated code is substantially the same at -O2 for push_back() and emplace_back().

591

The comment is redundant and provides no insight, so I removed it.

lld/ELF/InputSection.cpp
186

Adding a map to InputFile is unnecessary, especially for something specific to one target.

192

For this algorithm, retaining the original sequence for symbols at the same offset is irrelevant.

236

You are correct that the condition was wrong. This is an arch-independent loop that is responsible only for moving code that falls outside the adjust ranges. Arch-dependent code is responsible for filling contents of expanded adjust ranges. See fillAlignNops() in Arch/RISCV.cpp.

It is not yet possible to test expanding adjust ranges. The first pass of relaxOnce() will only shrink ranges. Only on a subsequent pass will we ever grow an alignment pad, or undo a call/jump/load/store/arithmetic relaxation. Alignment alone without other relaxations will always complete in one pass.

lld/test/ELF/riscv-relax-align-rvc.s
39

How many .balign directives do you think are necessary? I think the important case is where moved bytes overlap. Since ranges can only shrink in this diff, we can only test for src0 < dest+length < srcN, but not for src0 < dest < srcN.

Note that in libc++, the implementations of std::copy() and std::copy_backward() both call __builtin_memmove(), which is bidirectionally safe and works no matter how the regions overlap.

gkm updated this revision to Diff 431274.May 22 2022, 4:56 PM
gkm marked 7 inline comments as done.
  • revise according to review feedback

While I think there are still significant problems needing to address, it may work with quite a few application. It'd be nice if folks can check how this patch works without -mno-relax.

My testing hasn't revealed any problems so far. I have looked at the code but I haven't yet evaluated it as carefully as I would like.

lld/ELF/Arch/RISCV.cpp
533–535

Please describe in the comment what you mean by a corrupted NOP sequence.

542

add static?

571–573

What expectations should we have about the convergence? Let's write some of that down. Either in this comment or in some place that provides a more global overview.

585

Nit: remove the user from the TODO.

lld/test/ELF/riscv-relax-align-rvc.s
11

Typo: "the after".

gkm updated this revision to Diff 432412.May 26 2022, 4:33 PM
gkm marked 5 inline comments as done.
  • revise according to review feedback
gkm updated this revision to Diff 432434.May 26 2022, 6:25 PM
  • restructure NOP-rewriter loop in fillAdjustGaps()
gkm updated this revision to Diff 432472.May 26 2022, 11:42 PM

Expand comments before relaxOnce() and fillAdjustGaps()

gkm updated this revision to Diff 432619.May 27 2022, 12:49 PM
  • Cleanup & improve commentary in lld/test/ELF/riscv-relax-syms.s
gkm updated this revision to Diff 432624.May 27 2022, 1:29 PM
  • fix indentation, as caught by clang-format
luismarques added inline comments.May 31 2022, 3:10 AM
lld/ELF/Arch/RISCV.cpp
539–540

Sorry, where is this done?

lld/ELF/InputSection.cpp
176

This was changed but the tests didn't change. Sounds like an opportunity to improve test coverage?

gkm marked an inline comment as done.Jun 1 2022, 12:23 PM
gkm added inline comments.
lld/ELF/Arch/RISCV.cpp
539–540

I should modify the comment because it pertains to non-alignment relaxation types that are implemented in subsequent diffs.

571–573

I will elaborate on that in later diffs. Alignment alone finishes in one pass.

lld/ELF/InputSection.cpp
176

The change is an optimization, and does not alter behavior, so no new test is necessary. Before, I was doing unnecessary work to store symbol boundaries for non-executable sections which would never be relaxed.

gkm updated this revision to Diff 434622.Jun 6 2022, 2:57 PM
gkm marked an inline comment as done.
  • revise comments according to review feedback
gkm updated this revision to Diff 434677.Jun 6 2022, 7:48 PM
  • rework AlignAddend as a union
MaskRay added inline comments.Jun 11 2022, 9:52 AM
lld/ELF/Arch/RISCV.cpp
40

My point is that I don't think other ports will do similar things. Rename this to relaxSections to make debugging easier. We can rename to finalizeSections if there is ever a need.

lld/test/ELF/riscv-relax-align.s
40

Not done? As mentioned, this test only has 12-byte nops in one place. This is not strong enough to catch issues.
We need multiple .blign in this section and .blign in other text sections.

MaskRay added inline comments.Jun 11 2022, 9:59 AM
lld/ELF/Arch/RISCV.cpp
511

This accesses an inactive member of the union and is UB. https://en.cppreference.com/w/cpp/language/union
You can drop union, replace u64 with a member function which returns a uint64_t instead.

Using just one buffer with the copy_backward usage is unsafe.

Consider a case when all AdjustRange::delta values are positive (inflating), I feel that one iteration may overwrite some bytes which will be used later.

Currently the content move is done in InputSectionBase::adjustRanges MutableArrayRef<uint8_t> buf = this->mutableData();. So every relax iteration shuffles the content.
I think a better way is to do the shuffle only once, after the relaxation finishes.

lld/ELF/InputSection.cpp
164
183

stable_sort

197

Delete since the caller has checked emptiness.

209

The invariant on ranges is checked by symbolAddrs. Ranges after the end of symbolAddrs is not tested.
I'd remove all assert in the for loop.

if (i > 0)
  // An AdjustRange should not span a symbol start/end offset.
  assert(!ranges[i - 1].contains(sa.offset));

below is sufficient.

If there is a need to check

are all disjoint, i.e., do not overlap; and ...

Use a separate loop which calls fatal and do this after the ranges object is constructed.

234
245

Using just one buffer with the copy_backward usage is unsafe.

Consider a case when all AdjustRange::delta values are positive (inflating), I feel that one iteration may overwrite some bytes which will be used later.

MaskRay added a comment.EditedJun 11 2022, 11:19 AM

I will try to rewrite R_RISCV_ALIGN, but I will probably not spend too much time on other relocation types (you can add them :) )

I feel uneasy with this patch series mainly for two things (a) intrusive changes to generic code (this could be fixed) (b) deleting bytes and undo seem hacky.
There is a minor issue that st_size computation is incorrect for the Linux kernel which can be fixed.
Another issue is with

while (relaxOnce(sectionSymbolAddrs))
   ;

being separate from the finalizeAddressDependentContent loop. Some linker scripts can cause the relax to be re-done. I think the only waterproof approach is to place relaxation into finalizeAddressDependentContent.

I created D127611/D127581 as an alternative implementation for alignment and call relaxation. It only performs "deleting bytes" after convergence.

gkm abandoned this revision.Jun 26 2022, 10:03 PM
lld/test/ELF/riscv-relax-syms.s