This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO][llvm-profgen] Allow multiple executable load segments.
ClosedPublic

Authored by hoy on May 26 2021, 9:09 AM.

Details

Summary

The linker or post-link optimizer can create an ELF image with multiple executable segments each of which will be loaded separately at run time. This breaks the assumption of llvm-profgen that currently only supports one base load address. What it ends up with is that the subsequent mmap events will be treated as an overwrite of the first mmap event which will in turn screw up address mapping. While it is non-trivial to support multiple separate load addresses and given that on x64 those segments will always be loaded at consecutive addresses (though via separate mmap
sys calls), I'm adding an error checking logic to bail out if that's violated and keep using a single load address which is the address of the first executable segment.

Also changing the disassembly output from printing section offset to printing the virtual address instead, which matches the behavior of objdump.

Diff Detail

Event Timeline

hoy created this revision.May 26 2021, 9:09 AM
hoy requested review of this revision.May 26 2021, 9:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 26 2021, 9:09 AM
hoy edited the summary of this revision. (Show Details)May 26 2021, 9:11 AM
hoy added reviewers: wenlei, wlei, wmi.
wlei added inline comments.Jun 1 2021, 12:36 PM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
149

Could you teach me on the difference of Phdr.p_vaddr and Phdr.p_offset? why the MMP event's offset is supposed to be equal to the first Phdr.p_offset?

hoy added inline comments.Jun 1 2021, 12:58 PM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
149

p_vaddr stands for the preferred base virtual address of the segment. p_offset is the offset of the segment from the start of the image. Mostly p_vaddr - vaddr of first segment == p_offset. However, they are sections that do not take physical space in an image, such as the .BSS section which contains zero values for some globals, but they will have an allocated space at runtime. Sections after .BSS will have a mismatched vaddr and offset.

hoy updated this revision to Diff 358003.Jul 12 2021, 11:20 AM

Supporting single load segement multiple mmap events.

wenlei added inline comments.Jul 13 2021, 5:53 PM
llvm/tools/llvm-profgen/PerfReader.cpp
332–334

This could be simplified if we store ExeSegmentOffsets as a std::set<uint64_t>, and let getExeSegmentOffsets return std::set<uint64_t>&?

349

another nit: we named MMapEvent::BaseAddress based on the assumption of single text segment and single load, in which case the address is always the base load address of the binary. now it turns out that's not always the case, perhaps Event::MappedAddress is more appropriate?

llvm/tools/llvm-profgen/ProfiledBinary.h
105

nit: I think PreferredBaseAddress is generally about binary load address. Here perhaps we should call it PreferredTextSegmentAddresses? Same for helpers.

107

ExeSegment -> TextSegment?

wenlei added inline comments.Jul 13 2021, 5:55 PM
llvm/tools/llvm-profgen/PerfReader.cpp
332–334

nvm, I see that you need the range search to find the containing segment later, in which case set won't work.

hoy added inline comments.Jul 13 2021, 6:05 PM
llvm/tools/llvm-profgen/PerfReader.cpp
349

Event::MappedAddress sounds good to me, or how about name it just Address?

struct MMapEvent {
  uint64_t PID = 0;
  uint64_t BaseAddress = 0;
  uint64_t Size = 0;
  uint64_t Offset = 0;
  StringRef BinaryPath;
};
llvm/tools/llvm-profgen/ProfiledBinary.h
105

PreferredTextSegmentAddresses sounds better.

107

Sounds good, thanks.

wenlei added inline comments.Jul 13 2021, 6:14 PM
llvm/tools/llvm-profgen/PerfReader.cpp
349

Address is good too.

hoy updated this revision to Diff 358474.Jul 13 2021, 6:19 PM

Addressing Wenlei's comments.

This revision was not accepted when it landed; it landed in state Needs Review.Jul 13 2021, 6:23 PM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.

one more place needs a name change, otherwise lgtm, thanks.

llvm/tools/llvm-profgen/PerfReader.cpp
668

Change BASE_ADDRESS to ADDRESS or MMAPPED_ADDRESS too?

hoy added a comment.Jul 13 2021, 6:25 PM

Sorry

llvm/tools/llvm-profgen/PerfReader.cpp
668

Sorry, landed this too fast. Meant to land the other patch. Good catch, will make a follow-up commit.

llvm/test/tools/llvm-profgen/Inputs/symbolize.ll