Page MenuHomePhabricator

Propeller: LLD Support for Basic Block Sections
Needs ReviewPublic

Authored by tmsriram on Sep 25 2019, 5:58 PM.

Details

Summary

LLD Support for Basic Block Sections

This is part of the Propeller framework to do post link code layout optimizations. Please see the RFC here: https://groups.google.com/forum/#!msg/llvm-dev/ef3mKzAdJ7U/1shV64BYBAAJ and the detailed RFC doc here: https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf

This is one in the series of patches for Propeller.

This patch adds lld support for basic block sections and performs relaxations after the basic blocks have been reordered.

After the linker has reordered the basic block sections according to the desired sequence, it runs a relaxation pass to optimize jump instructions. Currently, the compiler emits the long form of all jump instructions . AMD64 ISA supports variants of jump instructions with one byte offset or a four byte offset. The compiler generates jump instructions with R_X86_64 32-bit PC relative relocations. We would like to use a new relocation type for these jump instructions as it makes it easy and accurate while relaxing these instructions.

A new relocation type for jmp instructions which need to be relaxed makes it easy and accurate when the linker tries to find these instructions. Our current method peeks back to look at the opcode of the relocation type that could correspond to jmp instructions.

The relaxation pass does three things:

First, it deletes all explicit fall-through direct jump instructions between adjacent basic blocks. This is done by discarding the tail of the basic block section.

Second, it shrinks jump instructions whose offsets can fit in a smaller equivalent jump instruction. The AMD64 ISA supports variants of jump instructions with either a one byte offset or a four byte offset. Shorter jump instructions are preferred where possible to reduce code size. The jump instructions are shrunk by using jump relocations. Jump relocations are used to modify the opcode of the jump instruction. Jump relocations contain three values, instruction offset, jump type and size. There is a one to one correspondence between jump relocations and jump instructions. While writing this jump instruction out to the final binary, the linker uses the jump relocation to determine the opcode and the size of the modified jump instruction. These new relocations are required because the input object files are memory-mapped without write permissions and directly modifying the object files requires copying these sections. Copying a large number of basic block sections significantly bloats memory and we have invented jump relocations as a simple solution to avoid this bloat.

Third, the relaxation pass replaces a two jump instruction sequence, JX and JY , with just one jump instruction if JX is a conditional branch whose target is the adjacent basic block and JY is a direct branch. This can be done by replacing the two jump instructions with a single jump instruction which has as the inverse op-code of JX as its op-code, and the target of JY as its branch target. This change is also made using a jump relocation at the appropriate code offset in that section.

Diff Detail

Event Timeline

tmsriram created this revision.Sep 25 2019, 5:58 PM

Thank you and others who have contributed to the efforts! I hope this can fill some gap of the missing optimization opportunities by PGO.

I've seen two patches posted for lld now, this patch and D68062. To make it easier to review, please make them organized by "Edit Related Revisions..."
Some of comments in D68062 are applicable to this patch as well. Use isec for IS variables, osec for OS variables, etc.

There is also a question whether we should make --optimize-bb-jumps the default. I'd suggest defaulting to --no-optimize-bb-jumps. The various tests updated with --no-optimize-bb-jumps is a concern to me. Non-propeller users probably don't want their instructions unexpectedly rewritten by the linker.

ruiu added a comment.Sep 25 2019, 10:18 PM

Some random comments...

lld/ELF/Arch/X86_64.cpp
154

nit: Return I here and add an llvm_unreachable after the loop, so that we can catch an impossible condition at runtime.

191–192

If you define new set of relocation types for bb sections, you can make it guaranteed that the relocations always point to the beginning of a section (because if there's a jump instruction jumping to a middle of a bb section, it is not really a BB).

lld/ELF/OutputSections.cpp
346

This SpecialFiller is always NOP instructions, so how about adding just a boolean flag (e.g. isec->fallthrough) to an input section and remove this SpecialFiller vector?

lld/ELF/Writer.cpp
1740

Please add a comment as to in what condition this shrink sections too much. If it is monotonically decreasing and alignments are all the same, I think this condition should never occur.

1747

Is growing different from undoing? I'm not quite sure why you can't simply revert all changes we made to a section if we have to grow it.

ruiu added a comment.Sep 25 2019, 10:45 PM

Please run clang-format-diff on this patch.

MaskRay added a subscriber: PkmX.Sep 26 2019, 12:22 AM

The shrinking section nature may be similar to R_RISCV_RELAX as used by RISC-V. The RISC-V port of binutils ld.bfd has a similar routine to change section sizes and update symbol values. @PkmX and I haven't implemented the relaxation in the lld RISC-V work. This patch inspires me to explore more on the possibility to implement section shrinking and make it general. I don't like the binutils implementation because it is quadratic about several variables.

lld/ELF/Arch/X86_64.cpp
511

Delete ()

513

Delete superfluous {} and blank lines.

564

Delete if (BytesGrown)

852

loc[-1]

882

llvm_unreachable if this is unreachable

lld/ELF/OutputSections.cpp
257

at -> operator[]

at has a bound checking overhead.

258

failed ... and no full stop

lld/ELF/SyntheticSections.h
589

unordered_map if the ordered property is not needed.

lld/ELF/Target.h
96

If growJmpInsn is X86 specific. Can it be moved to Arch/X86_64.cpp?

lld/ELF/Writer.cpp
49

Delete these.

559

Delete unneeded comments.

644

Delete the blank line.

646

Delete surrounding {}

652

return condition

rather than

if condition

return true

else

return false
705

Use vectors and move them outside the loop.

732

Delete {}
auto -> Symbol

1653

auto->SectionBase

1663

This is a dangerous operation. Are you arbitrarily changing st_value/st_size? In what conditions can this be triggered?

1685

optimizeBasicBlockJumps should be placed in finalizeAddressDependentContent to accommodate thunks and linker script symbol changes.

1686

Why !ELFT::Is64Bits

1693

vector<bool> is thread hostile because it is represented as a bit vector.

See SyntheticSections.cpp:createSymbols for an example how to parallel correctly.

1708

MaxIt is not needed.

MaxAlign = 0;
if !config->shrinkJumpsAggressively && !sections.empty()
  MaxAlign = max_element(...)->alignment;
1715

More comments on how sections are shrunk.

1717

vector<bool> is thread hostile.

1724

Supefluous ()

1732

Magic 4 here seems very x86 specific.

1747

Another related question: why does shrinkJumpsAggressively call growJmpInsn?

1769

Try not adding another field InputSectionBase::Trimmed. Instead, moving it here.

2118

Delete unrelated comments.

PkmX added a comment.Sep 26 2019, 2:41 AM

The shrinking section nature may be similar to R_RISCV_RELAX as used by RISC-V. The RISC-V port of binutils ld.bfd has a similar routine to change section sizes and update symbol values. @PkmX and I haven't implemented the relaxation in the lld RISC-V work. This patch inspires me to explore more on the possibility to implement section shrinking and make it general. I don't like the binutils implementation because it is quadratic about several variables.

RISC-V relaxation would require a more complicated framework since relaxation can happen in the middle of sections, but it is still possible to do it in a mostly linear fashion.


About the patch itself: There are no tests?

lld/ELF/SyntheticSections.cpp
2061

Duplicate assert.

lld/ELF/Target.h
87

Document the meaning of the bool returned.

lld/ELF/Writer.cpp
1647

I don't think you need to loop over all input files, just files with sections that have been shrunk.

1663

I think having symbols in the part of section that is shrunk should just be an error, or at least it should clamp them to the end of a section.

1738

Let the user specify how many rounds to run? I suppose the last few rounds may have marginal benefits.

ruiu added a comment.Sep 26 2019, 2:55 AM

The shrinking section nature may be similar to R_RISCV_RELAX as used by RISC-V. The RISC-V port of binutils ld.bfd has a similar routine to change section sizes and update symbol values. @PkmX and I haven't implemented the relaxation in the lld RISC-V work. This patch inspires me to explore more on the possibility to implement section shrinking and make it general. I don't like the binutils implementation because it is quadratic about several variables.

RISC-V relaxation would require a more complicated framework since relaxation can happen in the middle of sections, but it is still possible to do it in a mostly linear fashion.

Does that happen even in a middle of a basic block?

tmsriram updated this revision to Diff 222510.Sep 30 2019, 3:49 PM

Add new tests for the relaxation code.

We forgot to add these tests earlier, doing it now.