This is an archive of the discontinued LLVM Phabricator instance.

[BOLT] RFC: Add golang support
Needs ReviewPublic

Authored by yota9 on Apr 24 2022, 9:52 AM.

Details

Summary

This patch adds golang binaries support built with original golang
compiler (gc). The golang support requers 3 separate optimization
passes:

  • golang-preprocess: This pass must be executed before any changes were

perfomed in BinaryFunctions. Its main purpose is to parse the initial
metadata for golang functions.

  • golang-postprocess: This pass must be executed after all optimizations

were performed (except for the LongJmp) pass. Its main purpose is too
adjust the code to sync it with functions metadata.

  • golang: This pass must be executed after all code changes were

performed, no code changes are allowed by this pass or after it. This
pass only changes golang-specific data sections that contains functions
metadata.

Golang compiler patches:
Aarch64 support needs mapping symbols patch support, the PR was created
here: https://go-review.googlesource.com/c/go/+/343150 . The patches for
the gc supported versions might be found here:
https://github.com/yota9/golang_aarch64_mapping_symbols . Apply the
patch for your gc version and rebuilt the compiler.

Golang binaries compilation:
The golang default linker does not support emit-relocs option, the
external linker must be used for that purpose. The compilation command
examples:

  • x86_64 (pie): go build -a -buildmode=pie -ldflags='-linkmode=external -extld=gcc -extldflags "-fuse-ld=gold -fPIE -pie -Wl,--threads -Wl,--emit-relocs -Wl,--compress-debug-sections=none"
  • x86_64 (no-pie): go build -a -buildmode=exe -ldflags='-linkmode=external -extld=gcc -extldflags "-fuse-ld=gold -no-pie -Wl,--threads -Wl,--emit-relocs -Wl,--compress-debug-sections=none"
  • aarch64 (non-pie): go build -a -ldflags='-linkmode=external -extld=gcc -extldflags "-no-pie -Wl,--emit-relocs -Wl,--compress-debug-sections=none"
NOTE: The aarch64 ld and gold linkers currently has some bugs related to PIE binaries. The ld emits wrong addresses in data sections, which does not affect runtime due to dynamic relocations, but affects our binary data extractions. The gold linker emits static relocations on the wrong offsets.

Restrictions:

  • The golang support is tightly connected to the gc version, so we only

can optimize supported by BOLT gc versions compiled binaries.

  • The inlining, frame optimization, split functions, lite mode

optimizations are unsupported.

Results:
Up to 19% of performance improvements were received on internal services.

NOTE: Golang binaries are compatible with instrumentation!

Testing:
TBD, probably a few binaries would be add in bolt-tests repo for both
x86 and aarch64 platforms.

Golang fixes related to support:
https://go-review.googlesource.com/c/go/+/396797
https://go-review.googlesource.com/c/go/+/380894
https://go-review.googlesource.com/c/go/+/334789

AArch64 gold linker veneer fix:
https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=cde010e1a866e67b7e895cbcb95dedd3de0a1e56;hp=a686428a8bb4cc6a32a976e36f1ec59aa5a9f58e

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Diff Detail

Event Timeline

yota9 created this revision.Apr 24 2022, 9:52 AM
yota9 requested review of this revision.Apr 24 2022, 9:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 24 2022, 9:52 AM

Some places are explained and marked by me that needs extra attention or some suggestions for re-implemetation

bolt/include/bolt/Core/BinarySection.h
447

The mechanism of in-place data section change should be re-implemented, your suggestions are welcomed :)

bolt/lib/Core/BinaryBasicBlock.cpp
638

We need this, since some of the instructions annotations might be "locked". E.g. when dealing with gc inlined functions, we might inline only one instruction (e.g NOP) which will be removed by bolt. But this instruction has inline metadata that we need to have in order to restore inlining golang tables. So before removing such instruction we need to create NOP instruction on it's place that will store this metadata.

bolt/lib/Core/BinaryEmitter.cpp
1169

The extra sections creation and emission mechanism should be re-implemented, your suggestions are welcomed :)

bolt/lib/Passes/Golang.cpp
1041

The mmap is used since at this point we are unable to tell the final section size, we are allocating 8 times more of original space, which is more than enough. Since mmap allocates pages lazily we don't have extra memory consumption. However we probalbe should use ordinary new here

bolt/lib/Passes/Instrumentation.cpp
188

The golang has a sp bias table for each function. Since instrumentation uses the stack we need to mark inserted instructions so later in golang-postprocess and golang pass we will handle these instructions specially

yota9 updated this revision to Diff 424791.Apr 24 2022, 10:27 AM

clang format warn

This is a large patch.

Do you have plans to add Go support to llvm-objdump for testing?

ayermolo added a comment.EditedApr 24 2022, 11:42 AM

Tests?
More specifically. Would it be possible to add unit tests, and not just binaries to bolt-tests?

yota9 added a comment.EditedApr 24 2022, 11:53 AM

@tschuett What do you mean by add go to llvm-objdump?
@ayermolo 'm not sure if I will be able to cover it compeletely or even all corner cases, probably only basic moments, but I will try to. As a first step I suggest to discuss technical point of the patch, there are some places that surely should be redone, I need some suggestions on this :) I don't think it will be fast process due to the patch size, my personal workload & etc,, but during that time I will try to add some test cases :)
UPD Could somebody please point me out what is the problem with git-clang-format stage? Locally It does not show any problem and I don't see any comments in the diff too, no idea what it is complaining about :(

Using llvm-objdump to read properties of Go binaries.

@tschuett please checkout the answer above, I'm not sure if i will use llvm-objdump or will use other methods right now.

bolt/lib/Passes/BinaryPasses.cpp
1219

The problem here that if we set hot section based on the hasProfile insetad of hasValidIndex we will break LongJmp (tentativeLayoutRelocMode checks index), so we need to remove hasProfile case at all here (I will remove it with the next reupload). AFAIR the problem here is that if the function has profile but don't have exec count (which is not set if the main entry point does not have profile information for some reason) we would and up with split function with invalid index both parts of which will be emitted one by one in cold section. So we need to fix either function reordering alg to set hot index if the function has a profile (not exec count) or add check during profile read so that if the function hasPprofile() set exec count to be at least 1. Looking for your suggestions here too, probably I will separate this part from this commit to another one

Amir added a comment.May 4 2022, 1:45 PM

Vladislav, thank you for adding this support! It's a massive change, so let's make it easier to review. Can you please split this diff up into small isolated chunks?
I would suggest having separate diffs for:

  1. Core interface changes/additions
  2. Instrumentation changes
  3. MCPlusBuilder changes
  4. Passes changes
  5. Base Golang passes/functions/data structures
  6. Interaction between Golang stuff and existing passes
  7. Documentation changes

An example of such a stack would be D124900.

It would be much easier to integrate the work into the codebase with bite-sized diffs.

yota9 updated this revision to Diff 487218.Jan 8 2023, 12:36 PM

Rebase & some updates

yota9 edited the summary of this revision. (Show Details)Jan 8 2023, 12:43 PM
yota9 added a comment.Jan 8 2023, 1:49 PM

Hello @Amir ! As you suggested I'm starting to splitting this patch to the parts. I suggest not to use stack of changes, it would be more convenient for me to post group of changes one-by-one, you can refer to the rest of the changes on this patch. As I understand your idea although smaller changes by it self are not very useful we still can commit them one by one due to the size of this patch. So the first change was prepared for core: https://reviews.llvm.org/D141234 . Thank you!

yota9 updated this revision to Diff 507371.Mar 22 2023, 8:40 AM

rebase, use new instead of mmap

yota9 updated this revision to Diff 507388.Mar 22 2023, 9:16 AM

Removs BS isChanged

yota9 updated this revision to Diff 507402.Mar 22 2023, 9:37 AM

fix tests

yota9 updated this revision to Diff 507492.Mar 22 2023, 1:54 PM

remove IsChanged