This is an archive of the discontinued LLVM Phabricator instance.

[profile] Add binary ids into indexed profiles
ClosedPublic

Authored by gulfem on Oct 13 2022, 6:41 PM.

Details

Summary

This patch adds support for including binary ids in an indexed profile.
It adds a new field into the header that points to the offset of the
binary id section. The binary id section consists of a size of the
section, and a list of binary ids (if they are present) that consist
of two parts: length and data.

This patch guarantees that indexed profile is backwards compatible
after adding binary ids.

Diff Detail

Event Timeline

gulfem created this revision.Oct 13 2022, 6:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2022, 6:41 PM
gulfem requested review of this revision.Oct 13 2022, 6:41 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptOct 13 2022, 6:41 PM
Herald added subscribers: llvm-commits, Restricted Project. · View Herald Transcript
mysterymath added inline comments.Oct 14 2022, 1:42 PM
llvm/include/llvm/ProfileData/InstrProfReader.h
102

Can we use ArrayRef<uint8_t> in lieu of std::pair<uint64_t, const uint8_t *>? Lengths larger than size_t won't work on 32-bit platforms any, since the data is in memory.
This would allow us to use the convenience ArrayRef internally, then convert to 64-bit sizes at the (de)serialization edge.

llvm/lib/ProfileData/InstrProfReader.cpp
78

nit: This is alignToPowerOf2 in MathExtras.h.

llvm/lib/ProfileData/InstrProfWriter.cpp
379–383

nit: 'HashOffset', 'MemProfOffset', and 'BinaryIdOffset'.

500

It's a bit odd that binary ID parsing logic is inlined here into the writer.
It seems like this should be a responsibility of the reader, such that it would present and interface for iterating over each binary ID semantically.

gulfem updated this revision to Diff 469303.EditedOct 20 2022, 11:21 AM

Provide readBinaryIds() function and address feedback.

gulfem updated this revision to Diff 469424.Oct 20 2022, 5:23 PM

Fixed the issue about error reporting in llvm-profdata that caused invalid text format tests failed.

mysterymath added inline comments.Oct 21 2022, 11:29 AM
llvm/include/llvm/ProfileData/InstrProf.h
1053

nit: IDs

llvm/lib/ProfileData/InstrProfReader.cpp
114

nit: Just 8. The comment says "aligned to 8 bytes", so that's the origin of 8, rather than it being the size of a uint64_t.

llvm/lib/ProfileData/InstrProfWriter.cpp
284

ArrayRef<llvm:object::BuildIDRef>; that way, any vector-like collection can be used here. Also saves a copy.

Also, just a heads up, but there looks to be a reliable prof-data-related crash in the Windows presubmit tests.

gulfem updated this revision to Diff 470857.Oct 26 2022, 10:23 AM
  1. Assign 0 for BinaryIdsSize by default to fix the llvm-profdata

test failures happened in Windows, where the indexed profiles are
generated with an earlier version of indexed profile.

  1. Addressed further feedback
gulfem marked 6 inline comments as done.Oct 26 2022, 10:30 AM
gulfem added inline comments.
llvm/include/llvm/ProfileData/InstrProf.h
1053

I consistently use Id instead of an ID throughout the implementation.

llvm/lib/ProfileData/InstrProfWriter.cpp
284

I overlooked this, and thanks for pointing that out. It should at least be a reference. I changed that to ArrayRef to make it more flexible.

LGTM from my POV (general build IDisms, usability for debuginfod).

llvm/lib/ProfileData/InstrProfWriter.cpp
285–286

llvm::append_range(BinaryIds, BIs);

gulfem updated this revision to Diff 470879.Oct 26 2022, 11:15 AM
gulfem marked an inline comment as done.

Use llvm::append_range

gulfem added a subscriber: ellis.Oct 26 2022, 11:20 AM

@davidxl, @vsk and @ellis
We are planning to implement support for the debuginfod protocol in llvm-cov, which will be used by the source-based code coverage in Fuchsia.
Supporting build ids in indexed profiles is the first step towards that goal. Could you please take a look at that?

What is the use case for the feature?

ellis added inline comments.Oct 26 2022, 11:50 AM
llvm/tools/llvm-profdata/llvm-profdata.cpp
344

SmallVector<T, 0> seems to be preferred over std::vector<T> even if the number of elements is large.

https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h

gulfem added a comment.EditedOct 26 2022, 4:12 PM

What is the use case for the feature?

Our current coverage pipeline roughly has the following steps:

  1. Read binary-id in each raw profile (.profraw) via llvm-profdata
  2. Fetch all the binaries with the corresponding binary ids
  3. Merge all the raw profiles, and generate an indexed profile (.profdata)
  4. Generate a coverage report via llvm-cov by providing the indexed profile and all the fetched binaries.

We would like to simplify our coverage pipeline by using debuginfod in llvm-cov. When we include binary ids into indexed profiles and start using debuginfod in llvm-cov, our coverage pipeline can look like this:

  1. Merge all the raw profiles, and generate an indexed profile (.profdata)
  2. Generate a coverage report via llvm-cov by providing only an indexed profile. llvm-cov is going to read all the binary ids from an indexed profile, and fetch the associated binaries from the debuginfod server.

We would like to simplify our coverage pipeline by using debuginfod in llvm-cov. When we include binary ids into indexed profiles and start using debuginfod in llvm-cov

Thanks for sharing! Our downstream coverage flow is very similar to yours, and so I suspect we could also benefit from this improvement.

What is the use case for the feature?

Our current coverage pipeline has the following steps:

  1. Read binary-id in each raw profile (.profraw) via using llvm-profdata
  2. Fetch all the binaries with the corresponding binary id
  3. Merge all the raw profiles, and generate an indexed profile (.profdata)
  4. Generate a coverage report via using llvm-cov by providing all the fetched binaries and indexed profile.

We would like to simplify our coverage pipeline by using debuginfod in llvm-cov. When we include binary ids into indexed profiles and start using debuginfod in llvm-cov, our coverage pipeline can look like this:

  1. Merge all the raw profiles, and generate an indexed profile (.profdata)
  2. Generate a coverage report via using llvm-cov by providing only an indexed profile. llvm-cov is going to read all the binary ids from an indexed profile, and fetch the associated binaries from the debuginfod server.

thanks.

We would like to simplify our coverage pipeline by using debuginfod in llvm-cov. When we include binary ids into indexed profiles and start using debuginfod in llvm-cov

Thanks for sharing! Our downstream coverage flow is very similar to yours, and so I suspect we could also benefit from this improvement.

It looks like coverage users share a similar flow, so we expect that this work can be leveraged by other coverage users. We have custom tools/scripts that handle the workflow that I described above in Fuchsia.
If we can move some of the workflow to llvm tools like llvm-cov, the customization is going to be smaller. Please feel free to review the code.

gulfem added inline comments.Oct 31 2022, 4:09 PM
llvm/tools/llvm-profdata/llvm-profdata.cpp
344

The way I interpret Programmers Manual on SmallVector is that it good for the containers that have small number of elements:
"This is good for vectors that are “usually small” (e.g. the number of predecessors/successors of a block is usually less than 8)."
This container might include thousands of elements when all the profiles are merged together while collecting coverage in a big project.

Does anybody have any more feedback, questions or concerns? If not, I'm planning to land this patch.

https://reviews.llvm.org/D136702 is the WIP patch that adds support using debuginfod into llvm-cov, which relies on embedding binary ids into indexed profiles.

ellis accepted this revision.Nov 14 2022, 2:24 PM

Seems good to me! I just have some minor nits.

llvm/lib/ProfileData/InstrProfReader.cpp
98

Can we use endian::readNext() here and remove BI += sizeof(BILen) later?

llvm/lib/ProfileData/InstrProfWriter.cpp
485–486

NIT: Can we combine these two lines into one?

This revision is now accepted and ready to land.Nov 14 2022, 2:24 PM
gulfem updated this revision to Diff 481821.EditedDec 9 2022, 8:23 PM

Fix a few things after testing on Fuchsia and rebase

  1. Merge binary ids when writer contexts from different threads are merged
  2. Remove duplicated binary ids
  3. Use object::BuildID instead of object::BuildIDRef
gulfem marked 2 inline comments as done.Dec 9 2022, 8:24 PM
ellis added inline comments.Dec 12 2022, 10:11 AM
llvm/lib/ProfileData/InstrProfWriter.cpp
298–301

I'm not sure if this early return makes sense here. If we fail to merge the "frame mappings" then does that mean we should also skip merging the binary ids? I think we should either move this to its own function or move merging the binary ids before this.

mysterymath added inline comments.Dec 12 2022, 11:54 AM
llvm/lib/ProfileData/InstrProfWriter.cpp
309

Shouldn't this be reserve(BinaryIds.size() + IPW.BinaryIDs.size())?

gulfem updated this revision to Diff 482351.Dec 12 2022, 8:44 PM

Addressed feedback about merging binary ids.

gulfem marked 2 inline comments as done.Dec 12 2022, 8:47 PM
gulfem added inline comments.
llvm/lib/ProfileData/InstrProfWriter.cpp
298–301

Thanks for pointing that out. I moved it earlier.

This revision was landed with ongoing or failed builds.Dec 14 2022, 12:27 PM
This revision was automatically updated to reflect the committed changes.
gulfem marked an inline comment as done.
gulfem reopened this revision.Dec 28 2022, 5:33 PM
This revision is now accepted and ready to land.Dec 28 2022, 5:33 PM
gulfem updated this revision to Diff 485558.Dec 28 2022, 5:33 PM

There were two issues when I landed the change:

  1. Failed on big endian powerpc64 bot

https://lab.llvm.org/buildbot#builders/231/builds/6229
The issue is that we use the host-endian format in raw profiles,
but we always use little-endian format in indexed profiles.
readBinaryIdsInternal() is shared between raw and indexed profiles,
so I extended the function to take endian argument.

  1. Failed on some profile tests

Some tests use different flags while instrumenting the code, and
compare the generated profiles via using diff. After adding
binary ids into profiles, the generated profiles have different
binary ids when they are compiled with different flags.
So, I updated the tests to compare the llvm-profdata show
result instead of directly comparing the profiles.

gulfem updated this revision to Diff 485572.Dec 28 2022, 8:32 PM

Use Endian instead of Endiannness to be consistent.

This revision was landed with ongoing or failed builds.Dec 29 2022, 10:48 AM
This revision was automatically updated to reflect the committed changes.
mingmingl added a subscriber: mingmingl.EditedOct 27 2023, 4:42 PM

Hi, I noticed that the binary id section is padded so its section size is a power of 2. So the actual bytes written could be approx. 2x of binary id content size (e.g, 17 is padded to 32). May I know what's the motivation? If the section is not padded to a power of 2 (e.g., pad up so the section size is a multiple of 8), instr prof reader could be changed accordingly to read binary id correctly.