This is an archive of the discontinued LLVM Phabricator instance.

Differential D102039

[profile] Add binary id into profiles
ClosedPublic

Authored by gulfem on May 6 2021, 6:01 PM.

Details

Reviewers

phosek
mcgrathr
vsk
davidxl
bogner
aeubanks

Commits

rGe50a38840dc3: [profile] Add binary id into profiles
rGf984ac2715f7: [profile] Add binary id into profiles

Summary

This patch adds binary id into profiles to easily associate binaries
with the corresponding profiles. There is also an RFC that
discusses the motivation, design and implementation in more detail:
https://lists.llvm.org/pipermail/llvm-dev/2021-June/151154.html

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	20 ms	x64 debian > LLVM.tools/llvm-profdata::c-general.test
	30 ms	x64 debian > LLVM.tools/llvm-profdata::malformed-ptr-to-counter-array.test
	30 ms	x64 debian > LLVM.tools/llvm-profdata::raw-32-bits-be.test
	40 ms	x64 debian > LLVM.tools/llvm-profdata::raw-32-bits-le.test
	30 ms	x64 debian > LLVM.tools/llvm-profdata::raw-64-bits-be.test
	(15 tests failed in total)

Event Timeline

gulfem created this revision. May 6 2021, 6:01 PM

Herald added a subscriber: hiraditya. May 6 2021, 6:01 PM

gulfem requested review of this revision. May 6 2021, 6:01 PM

Herald added projects: Restricted Project, Restricted Project. May 6 2021, 6:01 PM

Herald added subscribers: llvm-commits, Restricted Project.

gulfem added reviewers: phosek, mcgrathr. May 6 2021, 6:02 PM

Harbormaster completed remote builds in B103101: Diff 343548. May 6 2021, 6:15 PM

phosek added inline comments. May 10 2021, 11:06 AM

compiler-rt/include/profile/InstrProfData.inc
132	Instead of `HasBuildId` as a boolean, I think it might be better to use `BuildIdSize` to allow multiple ids. While raw profiles should have either 0 or 1 in practice, indexed profiles created by merging multiple raw profiles could have multiple ids, so using `BuildIdSize` would allow using the same header format for both raw and indexed format.
compiler-rt/lib/profile/InstrProfilingWriter.c
36	I'm not sure we should use the term `BuildId`, since it can be interpreted as ELF-specific. We want this to generalize to other binary formats like Mach-O and COFF. In Mach-O, it's called `LC_UUID`. In COFF, it's usually referred to as a GUID. When searching online, I noticed that lld also uses the term build ID for COFF (see D36758), but I'm not sure whether that's the official name or whether they just copied it from the ELF implementation. So one option would be to keep `BuildId` but make it clear that it's a generic term. Another option would be a new name, for example `BinaryId`.
263	This function implements ELF-specific logic, so it cannot be in `InstrProfilingWriter.c`, which is platform agnostic. It should be in `InstrProfilingPlatformLinux.c`, and we'll need COFF and Mach-O equivalents in `InstrProfilingPlatformWindows.c` and `InstrProfilingPlatformDarwin.c` respectively (which could be empty in the initial implementation).
291	I think we should avoid allocating the `BuildId` struct on the heap, which could cause an issue when the allocator itself is instrumented. I can think of two solutions: either allocate the struct statically (it's 8-16 bytes, which is not so bad), or have this function use `ProfDataWriter` to write out the struct directly, in which case it could be allocated on the stack.

Move elf specific details into InstrProfilingPlatformLinux.c
Remove build id struct
Use binary id instead of build id

gulfem retitled this revision from [profile] WIP Add build id into profiles to [profile] WIP Add binary id into profiles. May 11 2021, 10:37 AM

gulfem edited the summary of this revision.

gulfem marked an inline comment as done. May 11 2021, 10:46 AM

gulfem added inline comments.

compiler-rt/include/profile/InstrProfData.inc
132	Can a raw profile with a build id and another raw profile without a build id be merged to create an indexed profile?
compiler-rt/lib/profile/InstrProfilingWriter.c
36	I renamed it to `binary id`, and I will briefly explain other ids used in other platforms in the RFC.
291	It seems like ValueProfiling already uses heap allocation `compiler-rt/lib/profile/InstrProfilingValue.c`, but we don't really need heap allocation for build id. So, I removed build id struct. Please let me know what you think about the new implementation.

Harbormaster completed remote builds in B103789: Diff 344474. May 11 2021, 11:59 AM

gulfem added inline comments. May 12 2021, 6:18 PM

compiler-rt/include/profile/InstrProfData.inc
132	@phosek, it seems to me that `raw` and `indexed` profiles do not share the same profile format. For example, the `raw` profile header is defined in InstrProfData.inc (https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ProfileData/InstrProfData.inc#L130) whereas the `indexed` profile header is defined in InstrProf.h (https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ProfileData/InstrProf.h#L999). I think we might need to extend both formats with binary id then. Please let me know if I'm missing anything.

phosek added inline comments. May 12 2021, 11:33 PM

compiler-rt/include/profile/InstrProfData.inc
132	Yes, it looks like we'll need to extend both formats, but I still think it'd be preferable to support an arbitrary number of binary IDs in both formats since, at least in ELF, you can have more than one build ID. The format should also be the same in both the raw and indexed formats, that is, a sequence of tuples where each tuple is a length and a sequence of bytes of that length.
compiler-rt/lib/profile/InstrProfiling.h
101	I'd slightly prefer returning both the data and the size as output parameters for consistency. We might also consider having a boolean/int return argument to signal an error.
compiler-rt/lib/profile/InstrProfilingWriter.c
287–288	You shouldn't need an extra variable for the pointer.
291	That's true but it's also possible to allocate counter for value profiling statically if you use `-mllvm -vp-static-alloc` in which case you can avoid `malloc` which may be desirable on some platforms (this is what we would likely use on Fuchsia for example).
llvm/tools/llvm-profdata/llvm-profdata.cpp
2227	I think we'll want this functionality even in the final version but we should introduce additional flag, for example `-show-binary-id`, and only print the binary id if that flag is set.
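
For illustration, the "sequence of tuples" layout described above could be sketched in C roughly as follows. This is a minimal sketch and not the code from this revision: `BinaryIdView` and `encode_binary_ids` are hypothetical names, lengths are assumed to be stored as host-endian `uint64_t` values, and no padding between records is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of one binary ID (not an LLVM type). */
typedef struct {
  const uint8_t *data;
  uint64_t length;
} BinaryIdView;

/*
 * Serialize `count` IDs as consecutive (uint64_t length, bytes) records.
 * Returns the number of bytes required. When `out` is NULL nothing is
 * written, so a caller can size the buffer first.
 * Assumption: lengths are stored in host byte order, with no padding.
 */
static size_t encode_binary_ids(const BinaryIdView *ids, size_t count,
                                uint8_t *out, size_t out_size) {
  size_t needed = 0;
  for (size_t i = 0; i < count; ++i)
    needed += sizeof(uint64_t) + (size_t)ids[i].length;
  if (out == NULL)
    return needed;

  assert(out_size >= needed && "output buffer too small");
  size_t offset = 0;
  for (size_t i = 0; i < count; ++i) {
    /* Record header: the length of the ID that follows. */
    memcpy(out + offset, &ids[i].length, sizeof(uint64_t));
    offset += sizeof(uint64_t);
    /* Record payload: the raw ID bytes. */
    memcpy(out + offset, ids[i].data, (size_t)ids[i].length);
    offset += (size_t)ids[i].length;
  }
  return needed;
}
```

Because each record carries its own length, the same layout can hold zero, one, or many IDs without a separate per-format header field for each case.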

Support multiple binary ids
Extend indexed profile format to include binary ids
Add tests

Harbormaster completed remote builds in B106577: Diff 348352. May 27 2021, 12:55 PM

Use clang-format to correctly format the code

Harbormaster completed remote builds in B106628: Diff 348420. May 27 2021, 6:20 PM

mcgrathr added inline comments. Jun 1 2021, 6:26 PM

compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
100	This is insufficient. The note "name" determines the space of n_type values. You must verify that the n_namesz=4 and the name bytes are "GNU\0" as well, or else this is some unrelated note that happens to use n_type=3 where 3 does not mean NT_GNU_BUILD_ID.
103	This arithmetic is not right since Note has a type with sizeof>1. An example of correct arithmetic is: Note = (const ElfW(Nhdr) *)((const char *)(Note + 1) + RoundUp(Note->n_namesz, 4) + RoundUp(Note->n_descsz, 4));
118	Can you abstract this logic into a subroutine so the code is not repeated in the two functions? A simple approach would be a subroutine that takes a maybe-null ProfDataWriter pointer and when called with a null pointer just returns the size instead of calling the writer.
compiler-rt/test/profile/binary-id.c
1 ↗	(On Diff #348420)	You may need to pass explicit -Wl,--build-id=none through the compiler here since it's sometimes on by default.
llvm/include/llvm/ProfileData/InstrProfReader.h
85	Is this the number of distinct IDs, or the byte size of the total block? (AIUI the total block is a sequence of {uint64_t n, uint8_t[n]} blocks.) If it's the number of distinct IDs, then I don't understand how a single uint8_t* from getBinaryIds() is meant to be used. For a number like that, I think Count is a clearer name to use than Size.
91	Should these really be two separate methods? Or should it just be one method that returns both the size and the data in a span-style data structure?
491	I don't understand the commented-out declaration.
llvm/lib/ProfileData/InstrProfReader.cpp
542	This is wildly susceptible to bad data.
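
The two points about note parsing (check the "GNU" name as well as the type, and advance by byte counts rounded up to 4) can be sketched as below. This is an illustrative sketch, not the code that landed: `NoteHeader` mirrors the three 32-bit fields of `ElfW(Nhdr)` so that the example does not depend on `<elf.h>`, and `find_gnu_build_id` and `round_up4` are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same three 32-bit fields as Elf32_Nhdr/Elf64_Nhdr, redefined here so the
 * sketch is self-contained. */
typedef struct {
  uint32_t n_namesz;
  uint32_t n_descsz;
  uint32_t n_type;
} NoteHeader;

#define NOTE_TYPE_GNU_BUILD_ID 3u /* numeric value of NT_GNU_BUILD_ID */

static size_t round_up4(size_t n) { return (n + 3u) & ~(size_t)3u; }

/*
 * Scan the note data in [begin, end) for a GNU build ID note.
 * On success returns a pointer to the descriptor bytes and stores their
 * length in *desc_size; returns NULL if none is found or data is truncated.
 */
static const uint8_t *find_gnu_build_id(const uint8_t *begin,
                                        const uint8_t *end,
                                        size_t *desc_size) {
  const uint8_t *p = begin;
  while ((size_t)(end - p) >= sizeof(NoteHeader)) {
    NoteHeader hdr;
    memcpy(&hdr, p, sizeof hdr); /* copy out to avoid unaligned reads */

    const uint8_t *name = p + sizeof hdr;
    size_t remaining = (size_t)(end - name);
    if (hdr.n_namesz > remaining)
      return NULL; /* name runs past the end */
    size_t name_span = round_up4(hdr.n_namesz);
    if (name_span > remaining)
      return NULL;
    size_t desc_span = round_up4(hdr.n_descsz);
    if (hdr.n_descsz > remaining - name_span ||
        desc_span > remaining - name_span)
      return NULL; /* descriptor runs past the end */

    const uint8_t *desc = name + name_span;
    /* The type alone is not enough: the name selects the type namespace. */
    if (hdr.n_type == NOTE_TYPE_GNU_BUILD_ID && hdr.n_namesz == 4 &&
        memcmp(name, "GNU", 4) == 0) {
      *desc_size = hdr.n_descsz;
      return desc;
    }
    /* Advance in bytes, not in units of sizeof(NoteHeader). */
    p = desc + desc_span;
  }
  return NULL;
}
```

Working on a `uint8_t` pointer throughout sidesteps the pointer-arithmetic mistake entirely, since every offset is already expressed in bytes.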

Addressed Roland's comments

gulfem marked 4 inline comments as done. Jun 4 2021, 4:14 PM

gulfem added inline comments.

compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
103	My understanding is that each note in notes section starts with a `struct` which includes `n_namesz`, `n_descsz`, and `n_type` members. It is followed by the name (whose length is defined in `n_namesz`) and then by the descriptor (whose length is defined in `n_descsz`). So, type is already part of the struct. The arithmetic I have increases Note by struct size, and name data, and descriptor data. Am I missing anything?
llvm/include/llvm/ProfileData/InstrProfReader.h
85	"Is this the number of distinct IDs, or the byte size of the total block?" It is the number of binary IDs. That file consistently uses `Size` to refer to the number of elements, as in `DataSize`, `CounterSize`, `NamesSize`, etc. I don't like that either, but I used `Size` to be consistent with the rest of the implementation. I think `Count` or `NumOf` is more self-explanatory.
llvm/lib/ProfileData/InstrProfReader.cpp
542	"This is wildly susceptible to bad data." How can I improve that? I need to somehow increment the pointer by the binary id length.

gulfem marked an inline comment as done. Jun 4 2021, 4:22 PM

Harbormaster completed remote builds in B107758: Diff 349985. Jun 4 2021, 5:07 PM

phosek added inline comments. Jun 14 2021, 1:25 AM

compiler-rt/include/profile/InstrProfData.inc
132	I'd prefer to put this field at the end of the header to match the position of binary ID in the profile.
compiler-rt/lib/profile/InstrProfilingInternal.h
206 ↗	(On Diff #349985)	This name is a bit of a mouthful. I'd consider calling it just `__llvm_write_binary_ids` and say in the comment that the function always returns the number of binary ids even if writer is `NULL`.
compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
103	I think we should be returning the number of binary ids in every case, even if `writer != NULL`.
103	When you increment a pointer, the address is moved by the size of the underlying type:
	char* c = ...; c++; // c increments by 1 == sizeof(char)
	uint64_t* l = ...; l++; // l increments by 8 == sizeof(uint64_t)
	In this case `sizeof(Note) > 1` so you first need to cast it to `const char *`.
llvm/lib/ProfileData/InstrProfReader.cpp
518	Where is this deallocated? It seems like this memory gets leaked. `BinaryIdTy` is just a plain struct with just two 8 byte fields (on a 64-bit machine), can we avoid allocating it on the heap and return/pass it by value to simplify lifetime management?
542	I think at minimum, you need to check that you aren't reading past the end of the file in the case the length is incorrect.
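
A bounds check of the kind requested for line 542 might look like the following. This is a sketch under stated assumptions rather than the reader code in this revision: the block is assumed to be a sequence of host-endian `(uint64_t length, bytes)` records with no padding, and `count_binary_ids` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Walk a block of (uint64_t length, bytes) records in [begin, end).
 * Returns the number of records, or -1 if any record would extend past
 * `end`, which indicates a corrupt or truncated profile.
 */
static int count_binary_ids(const uint8_t *begin, const uint8_t *end) {
  const uint8_t *p = begin;
  int count = 0;
  while (p < end) {
    size_t remaining = (size_t)(end - p);
    if (remaining < sizeof(uint64_t))
      return -1; /* not enough bytes left for a length field */

    uint64_t length;
    memcpy(&length, p, sizeof length);
    remaining -= sizeof length;

    /* Compare against the bytes that are left instead of computing
     * p + length first, which could overflow for a bogus length. */
    if (length > remaining)
      return -1;

    p += sizeof length + (size_t)length;
    ++count;
  }
  return count;
}
```

The key design choice is to validate each length against the remaining byte count before advancing the pointer, so a bad length can never move the cursor outside the buffer.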

Addressed Petr's comments

gulfem marked 4 inline comments as done. Jun 16 2021, 4:06 PM

gulfem added inline comments.

llvm/lib/ProfileData/InstrProfReader.cpp
518	I used `unique_ptr`, so memory will be released after BinaryIdTy goes out of scope.

gulfem marked 2 inline comments as done. Jun 16 2021, 4:07 PM

gulfem edited the summary of this revision. Jun 16 2021, 4:11 PM

gulfem added reviewers: vsk, davidxl.

phosek added inline comments. Jun 16 2021, 4:12 PM

llvm/lib/ProfileData/InstrProfReader.cpp
518	I'd still prefer passing these by value, heap allocation in this case seems unnecessary and is going to be less efficient.

Here's the link to the RFC:
https://lists.llvm.org/pipermail/llvm-dev/2021-June/151154.html

Harbormaster completed remote builds in B109613: Diff 352576. Jun 17 2021, 2:08 AM

gulfem retitled this revision from [profile] WIP Add binary id into profiles to [profile] Add binary id into profiles. Jun 24 2021, 2:28 PM

gulfem edited the summary of this revision.

gulfem edited the summary of this revision. Jun 24 2021, 2:32 PM

@vsk @davidxl Do you have any thoughts on this? We've already done a few rounds of reviews, but we'd like to get your opinion as well.

Repeating the same question asked in RFC: the profile data has builtin fine grained matching mechanism. For the top level stamping, is it enough to embed the build id in the file name of the profile data?

In D102039#2841108, @davidxl wrote:

Repeating the same question asked in RFC: the profile data has builtin fine grained matching mechanism. For the top level stamping, is it enough to embed the build id in the file name of the profile data?

Just to clarify: we are suggesting embedding the build ID in the profile file itself, not in the profile file name.
This will also allow us to embed multiple build IDs if profiles are merged together.
Can you please elaborate on the built-in fine-grained matching mechanism and why you think it will be a problem?

In D102039#2841108, @davidxl wrote:

Repeating the same question asked in RFC: the profile data has builtin fine grained matching mechanism. For the top level stamping, is it enough to embed the build id in the file name of the profile data?

We need to be able to match the profile back to the corresponding binary when generating coverage. Embedding build ID in filename doesn't work for us because on Fuchsia, most processes don't have access to filesystem (it's a capability that most processes shouldn't need). Rather the profile is exported via IPC as a memory object that doesn't have any name.

There might be a Fuchsia-specific solution, like wrapping the profile in an envelope that contains the build ID, but that solution would be Fuchsia-specific and we thought that making profiles more self-descriptive could be generally useful.

One idea we would like to pursue in the future is implementing support for the debuginfod protocol in LLVM, including llvm-cov. You could then use llvm-cov --instr-profile default.profraw, and llvm-cov would fetch both the binary and the source from the debuginfod server using the build ID inside the profile. In our case, this would not only simplify our infrastructure but also make it easy for developers to generate coverage reports from profiles fetched directly from our CI to reproduce results.

I am less concerned about raw profile format change (the version still needs to be bumped), but for the indexed format, the version needs to be bumped and it needs to guarantee that format is backward compatible. The patch does not seem to handle it.

Is it necessary to have binary id support in the indexed format?

In D102039#2841389, @davidxl wrote:

I am less concerned about raw profile format change (the version still needs to be bumped), but for the indexed format, the version needs to be bumped and it needs to guarantee that format is backward compatible. The patch does not seem to handle it.

Is it necessary to have binary id support in the indexed format?

Yes, we need to bump the version. There is still some work that needs to be done in this patch like bumping the version, adjusting some of the tests, and adding more tests, etc.
We would like to get early feedback to see whether there is any fundamental issue about the approach.
@phosek can correct me, but I think we need to have binary id support in the indexed format.
We merge raw profiles into indexed profiles before generating source code coverage reports, and in order to associate binaries while generating coverage, we need to have binary id in the indexed profiles.

In D102039#2841703, @gulfem wrote:

In D102039#2841389, @davidxl wrote:

I am less concerned about raw profile format change (the version still needs to be bumped), but for the indexed format, the version needs to be bumped and it needs to guarantee that format is backward compatible. The patch does not seem to handle it.

Is it necessary to have binary id support in the indexed format?

Yes, we need to bump the version. There is still some work that needs to be done in this patch like bumping the version, adjusting some of the tests, and adding more tests, etc.
We would like to get early feedback to see whether there is any fundamental issue about the approach.
@phosek can correct me, but I think we need to have binary id support in the indexed format.
We merge raw profiles into indexed profiles before generating source code coverage reports, and in order to associate binaries while generating coverage, we need to have binary id in the indexed profiles.

Is the merging done in process or offline? Assuming it is offline, it seems possible to use name base approach.

In D102039#2841709, @davidxl wrote:

Is the merging done in process or offline? Assuming it is offline, it seems possible to use name base approach.

It's done offline, but since the profiles published via IPC don't have any names, we cannot rely on filenames; we would need to introduce a custom Fuchsia wrapper format (see my previous comment about the "envelope" idea).

We don't necessarily need binary ID in the indexed format, what we could do is:

llvm-profdata show --binary-id a.profraw >a.buildid
# fetch a binary with a.buildid as a.out
llvm-profdata show --binary-id b.profraw >b.buildid
# fetch a binary with b.buildid as b.out
...
llvm-profdata merge -o merged.profdata a.profraw b.profraw ...
llvm-cov show --instr-profile=merged.profdata a.out b.out ...

If we also stored binary IDs inside the indexed format, we could simplify this and do:

llvm-profdata show --binary-ids merged.profdata >merged.buildids
# fetch all binaries

This is more efficient but either would be fine with us for now.

Storing binary IDs inside the indexed profile would become really valuable if/once we have debuginfod support, at which point we should be able to just do:

llvm-profdata merge -o merged.profdata a.profraw b.profraw ...
llvm-cov show --instr-profile=merged.profdata

and llvm-cov would fetch binaries directly from the debuginfod server using binary IDs stored inside the indexed profile. It'll take a while before we have debuginfod support available in LLVM though so this is not critical for now, we're just trying to plan ahead.

Is extending the indexed format a problem? Is there some way to make it less of an issue?

In D102039#2845002, @phosek wrote:
In D102039#2841709, @davidxl wrote:

Is the merging done in process or offline? Assuming it is offline, it seems possible to use name base approach.

It's done offline, but since the profiles published via IPC don't have any names, we cannot rely on filenames; we would need to introduce a custom Fuchsia wrapper format (see my previous comment about the "envelope" idea).

We don't necessarily need binary ID in the indexed format, what we could do is:
llvm-profdata show --binary-id a.profraw >a.buildid
# fetch a binary with a.buildid as a.out
llvm-profdata show --binary-id b.profraw >b.buildid
# fetch a binary with b.buildid as b.out
...
llvm-profdata merge -o merged.profdata a.profraw b.profraw ...
llvm-cov show --instr-profile=merged.profdata a.out b.out ...
If we also stored binary IDs inside the indexed format, we could simplify this and do:
llvm-profdata show --binary-ids merged.profdata >merged.buildids
# fetch all binaries
This is more efficient but either would be fine with us for now.

Storing binary IDs inside the indexed profile would become really valuable if/once we have debuginfod support, at which point we should be able to just do:
llvm-profdata merge -o merged.profdata a.profraw b.profraw ...
llvm-cov show --instr-profile=merged.profdata
and llvm-cov would fetch binaries directly from the debuginfod server using binary IDs stored inside the indexed profile. It'll take a while before we have debuginfod support available in LLVM though so this is not critical for now, we're just trying to plan ahead.

I suppose this workflow can also be done with

llvm-cov show --instr-profile=merged.profdata --buildids=<file of list of ids> without the need to fetch the binaries.

Is extending the indexed format a problem? Is there some way to make it less of an issue?

It is not an issue, but I think it is preferable to minimize version bumps for the indexed format to reduce longer-term churn. Given this restriction, it is better to make version change tied to functionalities that can do without a format change. Scripts, wrappers, etc. are better for non-essential changes, especially for workflows that can be mostly automated (which minimizes human inconvenience).

Add binary id between profile header and data
Raw profile version bump
Only add binary id into raw profiles

gulfem added a reviewer: aeubanks. Jul 15 2021, 11:10 AM

gulfem added a reviewer: bogner. Jul 15 2021, 11:13 AM

I'm not familiar enough with this code

In D102039#2880917, @aeubanks wrote:

I'm not familiar enough with this code

@aeubanks I just added you because I modified a test file (corrupted-profile.c), and you are the author of that test file.

that test change seems fine
I remember I copied the structure from another existing test though, somewhere in llvm/test/tools/llvm-profdata IIRC

In D102039#2881096, @aeubanks wrote:

that test change seems fine
I remember I copied the structure from another existing test though, somewhere in llvm/test/tools/llvm-profdata IIRC

Thanks!

Harbormaster completed remote builds in B114300: Diff 359056. Jul 15 2021, 1:06 PM

@davidxl we split the indexed profile change out of this change since it's not necessary for us right now, would it be possible to take a look again?

compiler-rt/include/profile/InstrProfData.inc
141	I'd prefer to move this field just after `Version` before `DataSize` to match the order of data inside the profile.

davidxl added inline comments. Jul 19 2021, 1:08 PM

compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
90	Is it possible to extract this into a common helper function like void ForEachNote(int note_type, .. note_start, .. note_end, ... call_back ) { ... }
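
One way such a helper could look in C is sketched below. It is only an illustration of the callback idea, not what the revision ended up doing (the follow-up splits the function into several smaller ones instead). `NoteHdr`, `NoteCallback`, `for_each_gnu_note`, and `add_desc_size` are hypothetical names, and the helper is assumed to match only notes whose name is "GNU". Passing a NULL callback returns just the match count, similar in spirit to the maybe-null writer suggested earlier in the review.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as ElfW(Nhdr); redefined so the sketch needs no <elf.h>. */
typedef struct {
  uint32_t n_namesz, n_descsz, n_type;
} NoteHdr;

/* Hypothetical callback type: receives the descriptor of a matching note. */
typedef void (*NoteCallback)(const uint8_t *desc, uint32_t desc_size,
                             void *ctx);

static size_t align4(size_t n) { return (n + 3u) & ~(size_t)3u; }

/*
 * Invoke `cb` (if non-NULL) for every well-formed "GNU" note of type
 * `note_type` in [start, end). Returns the number of matching notes.
 */
static int for_each_gnu_note(uint32_t note_type, const uint8_t *start,
                             const uint8_t *end, NoteCallback cb, void *ctx) {
  int matches = 0;
  const uint8_t *p = start;
  while ((size_t)(end - p) >= sizeof(NoteHdr)) {
    NoteHdr hdr;
    memcpy(&hdr, p, sizeof hdr);
    const uint8_t *name = p + sizeof hdr;
    size_t remaining = (size_t)(end - name);
    if (hdr.n_namesz > remaining)
      break; /* truncated name */
    size_t name_span = align4(hdr.n_namesz);
    if (name_span > remaining || hdr.n_descsz > remaining - name_span)
      break; /* truncated descriptor */

    const uint8_t *desc = name + name_span;
    if (hdr.n_type == note_type && hdr.n_namesz == 4 &&
        memcmp(name, "GNU", 4) == 0) {
      if (cb != NULL)
        cb(desc, hdr.n_descsz, ctx);
      ++matches;
    }

    size_t desc_span = align4(hdr.n_descsz);
    if (desc_span > remaining - name_span)
      break; /* last note has no trailing padding */
    p = desc + desc_span;
  }
  return matches;
}

/* Example callback: accumulate the total descriptor size into *ctx. */
static void add_desc_size(const uint8_t *desc, uint32_t desc_size, void *ctx) {
  (void)desc;
  *(size_t *)ctx += desc_size;
}
```

A caller interested only in sizes could pass `add_desc_size` with a `size_t` accumulator, while a writer-style callback could emit each descriptor into the profile.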

Break WriteBinaryIds into multiple functions to increase readability
Rebase

gulfem marked an inline comment as done. Jul 19 2021, 3:43 PM

gulfem added inline comments.

compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
90	I think your concern was about the readability of that function, so I broke that function into multiple functions to increase readability. Please let me know if you have other concerns.

LGTM

llvm/lib/ProfileData/InstrProfReader.cpp
538	When there are no binary ids to print, perhaps output a line with 'None' instead of nothing (which makes the output look truncated).

This revision is now accepted and ready to land. Jul 19 2021, 3:51 PM

gulfem marked an inline comment as done. Jul 19 2021, 4:12 PM

gulfem added inline comments.

llvm/lib/ProfileData/InstrProfReader.cpp
538	When there is no binary id, we don't print anything at all. At line 517, we check the binary id size, and if there is no binary id, we just return without printing anything: `if (BinaryIdsSize == 0) return success();`

Harbormaster completed remote builds in B114967: Diff 359943. Jul 19 2021, 4:25 PM

Modified some raw profile test files to include the new profile version (version 6) and new header element (BinaryIdsSize).

Harbormaster completed remote builds in B115222: Diff 360315. Jul 20 2021, 6:31 PM

This revision was landed with ongoing or failed builds. Jul 21 2021, 10:56 AM

Closed by commit rGf984ac2715f7: [profile] Add binary id into profiles (authored by gulfem).

This revision was automatically updated to reflect the committed changes.

gulfem added a commit: rGf984ac2715f7: [profile] Add binary id into profiles.

gulfem added a reverting change: rGfd895bc81ba7: Revert "[profile] Add binary id into profiles". Jul 21 2021, 12:15 PM

gulfem added a commit: rGe50a38840dc3: [profile] Add binary id into profiles. Jul 22 2021, 5:40 PM

hans mentioned this in D107143: [profile] Fix profile merging with binary IDs. Jul 31 2021, 12:44 AM

gulfem mentioned this in D109122: [profile] Extend binary id profile test. Sep 1 2021, 6:59 PM

vitalybuka added a subscriber: vitalybuka. Dec 1 2021, 10:46 PM

vitalybuka added inline comments.

compiler-rt/test/profile/binary-id.c
12 ↗	(On Diff #360523)	Hi @gulfem, do you have any idea why this fails on this bot? https://lab.llvm.org/staging/#/builders/97/builds/840

gulfem added inline comments. Dec 3 2021, 6:15 PM

compiler-rt/test/profile/binary-id.c
12 ↗	(On Diff #360523)	Hi @vitalybuka, I tried looking into that, but I am not able to access that bot at the moment. I'm getting a `502 Bad Gateway` error, but I'll try to reproduce it later. Hopefully, I can give you more info then!

Yes, looks like both LLVM servers are down https://lab.llvm.org/
FYI, what is special about that staging bot is that it runs Ubuntu with a recent
glibc 2.34. Our primary bot runs exactly the same build on Debian with
glibc 2.28 and the test passes there. It could be glibc or other dependencies.

In D102039#3171276, @vitalybuka wrote:

Yes, looks like both LLVM servers are down https://lab.llvm.org/
FYI, what is special about that staging bot is that it runs Ubuntu with a recent
glibc 2.34. Our primary bot runs exactly the same build on Debian with
glibc 2.28 and the test passes there. It could be glibc or other dependencies.

I tried to reproduce it on my machine, but it uses Debian (not Ubuntu), so the issue did not reproduce.
With that patch, we started embedding the build id (a unique identifier) into llvm profiles.
That test enables the build id in the binary by using the -Wl,--build-id -O2 options.
It then generates an llvm profile and tries to read the embedded build id from the profile.
What is the output of the following command?

/b/sanitizer-x86_64-linux/build/llvm_build64/bin/clang   -m64  -ldl  -fprofile-instr-generate -Wl,--build-id -O2 -o /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/profile/Linux/binary-id.c
readelf -n /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp

If we have a build id in the binary, readelf (you can also use llvm-readelf instead) should produce output like this:

Displaying notes found in: .note.gnu.build-id
  Owner                Data size        Description
  GNU                  0x00000014       NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 8699f6e0c4e12b872aae7e1f37fa6ba2564e9702

If we have build id in the binary, we should then check whether build id is embedded in the profile:

env LLVM_PROFILE_FILE=/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profraw  /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp
llvm-profdata show --binary-ids  /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profraw

What is the output of the above command?

buildbot@sanitizer-buildbot-vb-wlz1:/b/sanitizer-x86_64-linux/build/compiler_rt_build$ /b/sanitizer-x86_64-linux/build/llvm_build64/bin/clang -m64 -ldl -fprofile-instr-generate -Wl,--build-id=none -O2 -o /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/profile/Linux/binary-id.c
buildbot@sanitizer-buildbot-vb-wlz1:/b/sanitizer-x86_64-linux/build/compiler_rt_build$ env LLVM_PROFILE_FILE=/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profraw /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp
buildbot@sanitizer-buildbot-vb-wlz1:/b/sanitizer-x86_64-linux/build/compiler_rt_build$ ../llvm_build64/bin/llvm-profdata show --binary-ids /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profraw > /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.out
buildbot@sanitizer-buildbot-vb-wlz1:/b/sanitizer-x86_64-linux/build/compiler_rt_build$ ../llvm_build64/bin/llvm-profdata show --binary-ids /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profraw

Instrumentation level: Front-end
Total functions: 3
Maximum function count: 1
Maximum internal block count: 0
buildbot@sanitizer-buildbot-vb-wlz1:/b/sanitizer-x86_64-linux/build/compiler_rt_build$ ../llvm_build64/bin/llvm-profdata merge -o /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profdata /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profraw
buildbot@sanitizer-buildbot-vb-wlz1:/b/sanitizer-x86_64-linux/build/compiler_rt_build$ /b/sanitizer-x86_64-linux/build/llvm_build64/bin/clang -m64 -ldl -fprofile-instr-generate -Wl,--build-id -O2 -o /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/profile/Linux/binary-id.c
buildbot@sanitizer-buildbot-vb-wlz1:/b/sanitizer-x86_64-linux/build/compiler_rt_build$ env LLVM_PROFILE_FILE=/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profraw /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp
buildbot@sanitizer-buildbot-vb-wlz1:/b/sanitizer-x86_64-linux/build/compiler_rt_build$ ../llvm_build64/bin/llvm-profdata show --binary-ids /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profraw

Instrumentation level: Front-end
Total functions: 3
Maximum function count: 1
Maximum internal block count: 0

And no Binary IDs

../llvm_build64/bin/llvm-readelf -n test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp
Displaying notes found in: .note.ABI-tag
  Owner                Data size 	Description
  GNU                  0x00000010	NT_GNU_ABI_TAG (ABI version tag)
    OS: Linux, ABI: 3.2.0

Displaying notes found in: .note.gnu.property
  Owner                Data size 	Description
  GNU                  0x00000010	NT_GNU_PROPERTY_TYPE_0 (property note)
    Properties:    x86 ISA needed: x86-64-baseline


Displaying notes found in: .note.gnu.build-id
  Owner                Data size 	Description
  GNU                  0x00000014	NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 3dff2793618e5a04cbfef83483bc82d7d63b807d

Displaying notes found in: .note.gnu.gold-version
  Owner                Data size 	Description
  GNU                  0x00000009	NT_GNU_GOLD_VERSION (gold version)
    Version: gold 1.16

../llvm_build64/bin/llvm-profdata show --binary-ids test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profraw
Instrumentation level: Front-end
Total functions: 3
Maximum function count: 1
Maximum internal block count: 0

vitalybuka added inline comments. Dec 14 2021, 10:36 PM

compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
176	Here is the problem: this loop stops on the first PT_NOTE, even if it's not the build ID (e.g. .note.ABI-tag). Check my log in the message above.

gulfem added inline comments. Dec 15 2021, 10:31 AM

compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
176	`PT_NOTE` is the note segment in the program header, and that segment can have multiple notes (https://man7.org/linux/man-pages/man5/elf.5.html). Like, the first note is `.note.ABI-tag`, and the second note is `.note.gnu.build-id` in your log. When I run that locally, there are still two notes, but just the order is swapped.

gulfem@gulfem:~/llvm-release-build$ bin/llvm-readelf -n /usr/local/google/home/gulfem/llvm-release-build/projects/compiler-rt/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp
Displaying notes found in: .note.gnu.build-id
  Owner                Data size 	Description
  GNU                  0x00000014	NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 56559dcdb2c5b111e7e875c183ca2e0a1f433c96

Displaying notes found in: .note.ABI-tag
  Owner                Data size 	Description
  GNU                  0x00000010	NT_GNU_ABI_TAG (ABI version tag)
    OS: Linux, ABI: 3.2.0

Here, we iterate through all the notes:
https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/profile/InstrProfilingPlatformLinux.c#L155
This should be iterating through all the notes in your case, too. I'm looking at that code to see whether we are missing something else.

@vitalybuka, it looks like we can have multiple note segments in the program header, and each note segment can contain multiple notes.
Thank you @mcgrathr for pointing that out.
I think your case is like that, and I'll push a fix for that.

I uploaded https://reviews.llvm.org/D115830 to fix the issue.
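
The situation described in the thread above (several `PT_NOTE` segments, each holding several notes) can be sketched as a scan that gives up only after every segment has been examined. This is a minimal illustration and not the code from D115830: `NoteHeader`, `NoteSegment`, `RoundUp4` and `FindBuildId` are hypothetical names, and the segments are assumed to be plain in-memory byte ranges rather than program headers located through `__ehdr_start`. The sketch also checks the note name, as requested in the inline review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ElfW(Nhdr): three 32-bit words. */
typedef struct {
  uint32_t n_namesz; /* size of the name, including the trailing NUL */
  uint32_t n_descsz; /* size of the descriptor (the payload) */
  uint32_t n_type;   /* meaning depends on the note name */
} NoteHeader;

/* Hypothetical view of one PT_NOTE segment that is already in memory. */
typedef struct {
  const uint8_t *Begin;
  size_t Size;
} NoteSegment;

enum { kNtGnuBuildId = 3 }; /* value of NT_GNU_BUILD_ID in <elf.h> */

static size_t RoundUp4(size_t N) { return (N + 3) & ~(size_t)3; }

/* Scans every note of every segment and returns the first GNU build ID.
 * Sets *Size to its length, or to 0 (and returns NULL) if none is found. */
const uint8_t *FindBuildId(const NoteSegment *Segments, size_t NumSegments,
                           uint32_t *Size) {
  assert(Size != NULL);
  for (size_t I = 0; I < NumSegments; I++) {
    const uint8_t *Cur = Segments[I].Begin;
    const uint8_t *End = Cur + Segments[I].Size;
    while ((size_t)(End - Cur) >= sizeof(NoteHeader)) {
      NoteHeader Hdr;
      memcpy(&Hdr, Cur, sizeof(Hdr)); /* avoids unaligned struct access */
      const uint8_t *Name = Cur + sizeof(Hdr);
      size_t NameLen = RoundUp4(Hdr.n_namesz);
      size_t DescLen = RoundUp4(Hdr.n_descsz);
      size_t Left = (size_t)(End - Name);
      if (NameLen > Left || DescLen > Left - NameLen)
        break; /* truncated note: give up on this segment only */
      const uint8_t *Desc = Name + NameLen;
      /* n_type is only meaningful together with the owner name "GNU\0". */
      if (Hdr.n_type == kNtGnuBuildId && Hdr.n_namesz == 4 &&
          memcmp(Name, "GNU", 4) == 0) {
        *Size = Hdr.n_descsz;
        return Desc;
      }
      Cur = Desc + DescLen; /* byte-wise advance to the next note */
    }
    /* No build ID in this segment: continue with the next one. */
  }
  *Size = 0;
  return NULL;
}
```

The key difference from a loop that breaks after the first `PT_NOTE` is that a segment without a build ID falls through to the next iteration of the outer loop.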

phosek mentioned this in D76482: [lld][ELF] Provide optional hidden symbols for build ID. Nov 14 2023, 12:15 PM

Revision Contents

Path	Size
compiler-rt/
  include/profile/
    InstrProfData.inc	1 line
  lib/profile/
    InstrProfiling.h	8 lines
    InstrProfilingPlatformLinux.c	46 lines
    InstrProfilingWriter.c	20 lines
llvm/
  include/llvm/ProfileData/
    InstrProfData.inc	1 line
    InstrProfReader.h	14 lines
  lib/ProfileData/
    InstrProfReader.cpp	41 lines
  tools/llvm-profdata/
    llvm-profdata.cpp	3 lines

Diff 344474

compiler-rt/include/profile/InstrProfData.inc

	Show First 20 Lines • Show All 123 Lines • ▼ Show 20 Lines
	/* Definition of member fields of the raw profile header data structure. */			/* Definition of member fields of the raw profile header data structure. */
	#ifndef INSTR_PROF_RAW_HEADER			#ifndef INSTR_PROF_RAW_HEADER
	#define INSTR_PROF_RAW_HEADER(Type, Name, Initializer)			#define INSTR_PROF_RAW_HEADER(Type, Name, Initializer)
	#else			#else
	#define INSTR_PROF_DATA_DEFINED			#define INSTR_PROF_DATA_DEFINED
	#endif			#endif
	INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic())			INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic())
	INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version())			INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version())
				INSTR_PROF_RAW_HEADER(uint64_t, HasBinaryId, 0) // TODO: Can use a bool?
				phosek (Not Done): Instead of `HasBuildId` as a boolean, I think it might be better to use `BuildIdSize` to allow multiple ids. While raw profiles should have either 0 or 1 in practice, indexed profiles created by merging multiple raw profiles could have multiple ids, so using `BuildIdSize` would allow using the same header format for both raw and indexed format.
				phosek (Done): I'd prefer to put this field at the end of the header to match the position of binary ID in the profile.
				gulfem (Done): Can a raw profile with a build id and another raw profile without a build id be merged to create an indexed profile?
				gulfem (Done): @phosek, it seems to me that `raw` and `indexed` profiles do not share the same profile format. For example, the `raw` profile header is defined in InstrProfData.inc (https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ProfileData/InstrProfData.inc#L130), whereas the `indexed` profile header is defined in InstrProf.h (https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ProfileData/InstrProf.h#L999). I think we might need to extend both formats with binary id then. Please let me know if I'm missing anything.
				phosek (Not Done): Yes, looks like we'll need to extend both formats, but I still think it'd be preferable to support an arbitrary number of binary ids in both formats since, at least in ELF, you can have more than one build ID. The format should also be the same in both the raw and indexed format, that is, a sequence of tuples where each tuple is a length and a sequence of bytes of that length.
	INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize)			INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize)
	INSTR_PROF_RAW_HEADER(uint64_t, PaddingBytesBeforeCounters, PaddingBytesBeforeCounters)			INSTR_PROF_RAW_HEADER(uint64_t, PaddingBytesBeforeCounters, PaddingBytesBeforeCounters)
	INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize)			INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize)
	INSTR_PROF_RAW_HEADER(uint64_t, PaddingBytesAfterCounters, PaddingBytesAfterCounters)			INSTR_PROF_RAW_HEADER(uint64_t, PaddingBytesAfterCounters, PaddingBytesAfterCounters)
	INSTR_PROF_RAW_HEADER(uint64_t, NamesSize, NamesSize)			INSTR_PROF_RAW_HEADER(uint64_t, NamesSize, NamesSize)
	INSTR_PROF_RAW_HEADER(uint64_t, CountersDelta, (uintptr_t)CountersBegin)			INSTR_PROF_RAW_HEADER(uint64_t, CountersDelta, (uintptr_t)CountersBegin)
	INSTR_PROF_RAW_HEADER(uint64_t, NamesDelta, (uintptr_t)NamesBegin)			INSTR_PROF_RAW_HEADER(uint64_t, NamesDelta, (uintptr_t)NamesBegin)
	INSTR_PROF_RAW_HEADER(uint64_t, ValueKindLast, IPVK_Last)			INSTR_PROF_RAW_HEADER(uint64_t, ValueKindLast, IPVK_Last)
	#undef INSTR_PROF_RAW_HEADER			#undef INSTR_PROF_RAW_HEADER
				phosek (Not Done): I'd prefer to move this field just after `Version` before `DataSize` to match the order of data inside the profile.
	/* INSTR_PROF_RAW_HEADER end */			/* INSTR_PROF_RAW_HEADER end */

	/* VALUE_PROF_FUNC_PARAM start */			/* VALUE_PROF_FUNC_PARAM start */
	/* Definition of parameter types of the runtime API used to do value profiling			/* Definition of parameter types of the runtime API used to do value profiling
	* for a given value site.			* for a given value site.
	*/			*/
	#ifndef VALUE_PROF_FUNC_PARAM			#ifndef VALUE_PROF_FUNC_PARAM
	#define VALUE_PROF_FUNC_PARAM(ArgType, ArgName, ArgLLVMType)			#define VALUE_PROF_FUNC_PARAM(ArgType, ArgName, ArgLLVMType)
	▲ Show 20 Lines • Show All 602 Lines • ▼ Show 20 Lines
	/* InstrProfile per-function control data alignment. */			/* InstrProfile per-function control data alignment. */
	#define INSTR_PROF_DATA_ALIGNMENT 8			#define INSTR_PROF_DATA_ALIGNMENT 8

	/* The data structure that represents a tracked value by the			/* The data structure that represents a tracked value by the
	* value profiler.			* value profiler.
	*/			*/
	typedef struct InstrProfValueData {			typedef struct InstrProfValueData {
	/* Profiled value. */			/* Profiled value. */
	uint64_t Value;			uint64_t Value;
				Lint: Pre-merge checks: clang-tidy: error: unknown type name 'uint64_t' [clang-diagnostic-error]
	/* Number of times the value appears in the training run. */			/* Number of times the value appears in the training run. */
	uint64_t Count;			uint64_t Count;
				Lint: Pre-merge checks: clang-tidy: error: unknown type name 'uint64_t' [clang-diagnostic-error]
	} InstrProfValueData;			} InstrProfValueData;

	#endif /* INSTR_PROF_DATA_INC */			#endif /* INSTR_PROF_DATA_INC */

	#ifndef INSTR_ORDER_FILE_INC			#ifndef INSTR_ORDER_FILE_INC
	/* The maximal # of functions: 1281024 (the buffer size will be 1284 KB). */			/* The maximal # of functions: 1281024 (the buffer size will be 1284 KB). */
	#define INSTR_ORDER_FILE_BUFFER_SIZE 131072			#define INSTR_ORDER_FILE_BUFFER_SIZE 131072
	#define INSTR_ORDER_FILE_BUFFER_BITS 17			#define INSTR_ORDER_FILE_BUFFER_BITS 17
	▲ Show 20 Lines • Show All 126 Lines • Show Last 20 Lines
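
phosek's inline comments above propose storing binary ids as a sequence of tuples, each a length followed by that many bytes. A minimal sketch of that layout follows, assuming a 64-bit length in host byte order and no padding between records. `AppendBinaryId` and `CountBinaryIds` are hypothetical helpers for illustration, not functions from compiler-rt or LLVM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Appends one (length, bytes) record to Buf at *Offset.
 * Returns 0 on success, -1 if the record does not fit in Cap bytes. */
int AppendBinaryId(uint8_t *Buf, size_t Cap, size_t *Offset,
                   const uint8_t *Id, uint64_t Len) {
  assert(Buf != NULL && Offset != NULL && *Offset <= Cap);
  size_t Avail = Cap - *Offset;
  if (Avail < sizeof(Len) || Len > Avail - sizeof(Len))
    return -1;
  memcpy(Buf + *Offset, &Len, sizeof(Len));
  if (Len)
    memcpy(Buf + *Offset + sizeof(Len), Id, (size_t)Len);
  *Offset += sizeof(Len) + (size_t)Len;
  return 0;
}

/* Walks the records in Buf[0..Size) and returns how many there are,
 * or -1 if a length field or payload runs past the end. */
int CountBinaryIds(const uint8_t *Buf, size_t Size) {
  size_t Offset = 0;
  int Count = 0;
  while (Offset < Size) {
    uint64_t Len;
    if (Size - Offset < sizeof(Len))
      return -1;
    memcpy(&Len, Buf + Offset, sizeof(Len));
    Offset += sizeof(Len);
    if (Len > Size - Offset)
      return -1;
    Offset += (size_t)Len;
    Count++;
  }
  return Count;
}
```

Because each record carries its own length, a reader can handle zero, one, or many ids with the same loop, which is the property the review asks for when raw profiles are merged into an indexed profile.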

compiler-rt/lib/profile/InstrProfiling.h

	Show First 20 Lines • Show All 87 Lines • ▼ Show 20 Lines
	const char *__llvm_profile_end_names(void);			const char *__llvm_profile_end_names(void);
	uint64_t *__llvm_profile_begin_counters(void);			uint64_t *__llvm_profile_begin_counters(void);
	uint64_t *__llvm_profile_end_counters(void);			uint64_t *__llvm_profile_end_counters(void);
	ValueProfNode *__llvm_profile_begin_vnodes();			ValueProfNode *__llvm_profile_begin_vnodes();
	ValueProfNode *__llvm_profile_end_vnodes();			ValueProfNode *__llvm_profile_end_vnodes();
	uint32_t *__llvm_profile_begin_orderfile();			uint32_t *__llvm_profile_begin_orderfile();

	/*!			/*!
				* \brief Reads binary id.
				*
				* If binary id exists, returns it and sets its size.
				* Otherwise, returns null.
				*/
				const uint8_t *__llvm_read_binary_id(uint32_t *Size);
				phosek (Not Done): I'd slightly prefer returning both the data and the size as output parameters for consistency. We might also consider having a boolean/int return argument to signal an error.

				/*!
	* \brief Clear profile counters to zero.			* \brief Clear profile counters to zero.
	*			*
	*/			*/
	void __llvm_profile_reset_counters(void);			void __llvm_profile_reset_counters(void);

	/*!			/*!
	* \brief Merge profile data from buffer.			* \brief Merge profile data from buffer.
	*			*
	▲ Show 20 Lines • Show All 226 Lines • Show Last 20 Lines

compiler-rt/lib/profile/InstrProfilingPlatformLinux.c

/===- InstrProfilingPlatformLinux.c - Profile data Linux platform ------===\		/===- InstrProfilingPlatformLinux.c - Profile data Linux platform ------===\
\|*		\|*
\|* Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		\|* Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
\|* See https://llvm.org/LICENSE.txt for license information.		\|* See https://llvm.org/LICENSE.txt for license information.
\|* SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		\|* SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
\|*		\|*
\===----------------------------------------------------------------------===/		\===----------------------------------------------------------------------===/

#if defined(__linux__) \|\| defined(__FreeBSD__) \|\| defined(__Fuchsia__) \|\| \		#if defined(__linux__) \|\| defined(__FreeBSD__) \|\| defined(__Fuchsia__) \|\| \
(defined(__sun__) && defined(__svr4__)) \|\| defined(__NetBSD__)		(defined(__sun__) && defined(__svr4__)) \|\| defined(__NetBSD__)

		#include <elf.h>
		#include <link.h>
#include <stdlib.h>		#include <stdlib.h>

#include "InstrProfiling.h"		#include "InstrProfiling.h"

#define PROF_DATA_START INSTR_PROF_SECT_START(INSTR_PROF_DATA_COMMON)		#define PROF_DATA_START INSTR_PROF_SECT_START(INSTR_PROF_DATA_COMMON)
#define PROF_DATA_STOP INSTR_PROF_SECT_STOP(INSTR_PROF_DATA_COMMON)		#define PROF_DATA_STOP INSTR_PROF_SECT_STOP(INSTR_PROF_DATA_COMMON)
#define PROF_NAME_START INSTR_PROF_SECT_START(INSTR_PROF_NAME_COMMON)		#define PROF_NAME_START INSTR_PROF_SECT_START(INSTR_PROF_NAME_COMMON)
#define PROF_NAME_STOP INSTR_PROF_SECT_STOP(INSTR_PROF_NAME_COMMON)		#define PROF_NAME_STOP INSTR_PROF_SECT_STOP(INSTR_PROF_NAME_COMMON)
▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines	__llvm_profile_begin_vnodes(void) {
return &PROF_VNODES_START;		return &PROF_VNODES_START;
}		}
COMPILER_RT_VISIBILITY ValueProfNode *__llvm_profile_end_vnodes(void) {		COMPILER_RT_VISIBILITY ValueProfNode *__llvm_profile_end_vnodes(void) {
return &PROF_VNODES_STOP;		return &PROF_VNODES_STOP;
}		}
COMPILER_RT_VISIBILITY ValueProfNode *CurrentVNode = &PROF_VNODES_START;		COMPILER_RT_VISIBILITY ValueProfNode *CurrentVNode = &PROF_VNODES_START;
COMPILER_RT_VISIBILITY ValueProfNode *EndVNode = &PROF_VNODES_STOP;		COMPILER_RT_VISIBILITY ValueProfNode *EndVNode = &PROF_VNODES_STOP;

		static size_t RoundUp(size_t size, size_t align) {
		return (size + align - 1) & ~(align - 1);
		}

		/* Returns build id and sets its size.
		* ELF file format has optional unique build id that can be used as binary id
		*/
		COMPILER_RT_VISIBILITY const uint8_t *__llvm_read_binary_id(uint32_t *Size) {
		extern const ElfW(Ehdr) __ehdr_start __attribute__((visibility("hidden")));
		const ElfW(Ehdr) *ElfHeader = &__ehdr_start;
		const ElfW(Phdr) *ProgramHeader =
		(const ElfW(Phdr) *)((uintptr_t)ElfHeader + ElfHeader->e_phoff);

		uint32_t i;
		davidxl (Done): Is it possible to extract this into a common helper function like void ForEachNote(int note_type, .. note_start, .. note_end, ... call_back ) { ... }
		gulfem (Done): I think your concern was about the readability of that function, so I broke that function into multiple functions to increase readability. Please let me know if you have other concerns.
		/* Iterate through entries in the program header. */
		for (i = 0; i < ElfHeader->e_phnum; i++) {
		/* Look for the note section in program header entries. */
		if (ProgramHeader[i].p_type != PT_NOTE)
		continue;

		const ElfW(Nhdr) *Note =
		(const ElfW(Nhdr) *)((uintptr_t)ElfHeader + ProgramHeader[i].p_offset);
		const ElfW(Nhdr) *NotesEnd = Note + ProgramHeader[i].p_filesz;

		mcgrathr (Done): This is insufficient. The note "name" determines the space of n_type values. You must verify that the n_namesz=4 and the name bytes are "GNU\0" as well, or else this is some unrelated note that happens to use n_type=3 where 3 does not mean NT_GNU_BUILD_ID.
		while (Note < NotesEnd) {
		/* Look for the NT_GNU_BUILD_ID type in note section. */
		if (Note->n_type != NT_GNU_BUILD_ID) {
		mcgrathr (Not Done): This arithmetic is not right since Note has a type with sizeof>1. An example of correct arithmetic is: Note = (const ElfW(Nhdr) *)((const char *)(Note + 1) + RoundUp(Note->n_namesz, 4) + RoundUp(Note->n_descsz, 4));
		phosek (Done): I think we should be returning the number of binary ids in every case, even if `writer != NULL`.
		gulfem (Done): My understanding is that each note in notes section starts with a `struct` which includes `n_namesz`, `n_descsz`, and `n_type` members. It is followed by the name (whose length is defined in `n_namesz`) and then by the descriptor (whose length is defined in `n_descsz`). So, type is already part of the struct. The arithmetic I have increases Note by struct size, and name data, and descriptor data. Am I missing anything?
		phosek (Not Done): When you increment a pointer, the address is moved by the size of the underlying type: char* c = ...; c++; // c increments by 1 == sizeof(char) uint64_t* l = ...; l++; // l increments by 8 == sizeof(uint64_t) In this case `sizeof(Note) > 1` so you first need to cast it to `const char *`.
		Note = Note + sizeof(ElfW(Nhdr)) + RoundUp(Note->n_namesz, 4) +
		RoundUp(Note->n_descsz, 4);
		Lint: Pre-merge checks: clang-format: please reformat the code (indentation of `RoundUp(Note->n_descsz, 4);`)
		continue;
		}
		*Size = Note->n_descsz;
		uint8_t *Data =
		(uint8_t *)((uintptr_t)Note + sizeof(ElfW(Nhdr)) + Note->n_namesz);
		return Data;
		}

		/* If build id does not exist, stop reading entries in program header. */
		break;
		}

		return NULL;
		mcgrathr (Done): Can you abstract this logic into a subroutine so the code is not repeated in the two functions? A simple approach would be a subroutine that takes a maybe-null ProfDataWriter pointer and when called with a null pointer just returns the size instead of calling the writer.
		}

#endif		#endif
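
The pointer arithmetic point raised by mcgrathr and phosek in the inline comments above can be checked in isolation. The sketch below uses a hypothetical `NoteHeader` struct in place of `ElfW(Nhdr)`; `BytesSkippedTyped` and `NextNote` are illustrative names only. It shows that adding `N` to a typed pointer moves it by `N * sizeof(NoteHeader)` bytes, which is why byte counts have to be added through a `const char *`. The same scaling would apply to an end pointer computed as `Note + p_filesz`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ElfW(Nhdr): 12 bytes on common ABIs. */
typedef struct {
  uint32_t n_namesz;
  uint32_t n_descsz;
  uint32_t n_type;
} NoteHeader;

/* Distance in bytes between Note and Note + Count.
 * The result is Count * sizeof(NoteHeader), not Count. */
size_t BytesSkippedTyped(const NoteHeader *Note, size_t Count) {
  return (size_t)((const char *)(Note + Count) - (const char *)Note);
}

/* Correct advance to the following note: step over the header with
 * Note + 1, then add the padded name and descriptor sizes as bytes. */
const NoteHeader *NextNote(const NoteHeader *Note) {
  assert(Note != NULL);
  size_t Name = (Note->n_namesz + 3u) & ~(size_t)3;
  size_t Desc = (Note->n_descsz + 3u) & ~(size_t)3;
  return (const NoteHeader *)((const char *)(Note + 1) + Name + Desc);
}
```

With a 4-byte name and a 5-byte descriptor, `NextNote` moves 12 + 4 + 8 = 24 bytes, whereas `Note + sizeof(NoteHeader) + 4 + 8` on the typed pointer would move 24 * 12 bytes.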

compiler-rt/lib/profile/InstrProfilingWriter.c

Show All 27 Lines

static uint8_t BufferIOBuffer[VP_BUFFER_SIZE]; static uint8_t BufferIOBuffer[VP_BUFFER_SIZE];

static InstrProfValueData VPDataArray[16]; static InstrProfValueData VPDataArray[16];

static uint32_t VPDataArraySize = sizeof(VPDataArray) / sizeof(*VPDataArray); static uint32_t VPDataArraySize = sizeof(VPDataArray) / sizeof(*VPDataArray);

COMPILER_RT_VISIBILITY uint8_t *DynamicBufferIOBuffer = 0; COMPILER_RT_VISIBILITY uint8_t *DynamicBufferIOBuffer = 0;

COMPILER_RT_VISIBILITY uint32_t VPBufferSize = 0; COMPILER_RT_VISIBILITY uint32_t VPBufferSize = 0;

/* The buffer writer is reponsponsible in keeping writer state /* The buffer writer is reponsponsible in keeping writer state

* across the call. * across the call.

phosek (Done): I'm not sure if we should be using the term BuildId since it can be interpreted as ELF-specific. We want this to generalize to other binary formats like Mach-O and COFF. In Mach-O, it's called LC_UUID. In COFF, it's usually referred to as GUID.

When searching online, I noticed that lld also uses build ID for COFF, see D36758, but I'm not sure if that's the official name or if they just copied the name from the ELF implementation. So one option would be to just use BuildId but make it clear that it's a generic term. Another option would be a new name, for example BinaryId.

gulfem (Done): I renamed it to binary id, and I will briefly explain other ids used in other platforms in the RFC.

*/ */

COMPILER_RT_VISIBILITY uint32_t lprofBufferWriter(ProfDataWriter *This, COMPILER_RT_VISIBILITY uint32_t lprofBufferWriter(ProfDataWriter *This,

ProfDataIOVec *IOVecs, ProfDataIOVec *IOVecs,

uint32_t NumIOVecs) { uint32_t NumIOVecs) {

uint32_t I; uint32_t I;

char **Buffer = (char **)&This->WriterCtx; char **Buffer = (char **)&This->WriterCtx;

for (I = 0; I < NumIOVecs; I++) { for (I = 0; I < NumIOVecs; I++) {

size_t Length = IOVecs[I].ElmSize * IOVecs[I].NumElm; size_t Length = IOVecs[I].ElmSize * IOVecs[I].NumElm;

▲ Show 20 Lines • Show All 210 Lines • ▼ Show 20 Lines

COMPILER_RT_VISIBILITY int COMPILER_RT_VISIBILITY int

lprofWriteDataImpl(ProfDataWriter *Writer, const __llvm_profile_data *DataBegin, lprofWriteDataImpl(ProfDataWriter *Writer, const __llvm_profile_data *DataBegin,

const __llvm_profile_data *DataEnd, const __llvm_profile_data *DataEnd,

const uint64_t *CountersBegin, const uint64_t *CountersEnd, const uint64_t *CountersBegin, const uint64_t *CountersEnd,

VPDataReaderType *VPDataReader, const char *NamesBegin, VPDataReaderType *VPDataReader, const char *NamesBegin,

const char *NamesEnd, int SkipNameDataWrite) { const char *NamesEnd, int SkipNameDataWrite) {

/* Calculate size of sections. */ /* Calculate size of sections. */

phosek (Done): This function implements ELF-specific logic so it cannot be in InstrProfilingWriter.c which is platform agnostic. It should be in InstrProfilingPlatformLinux.c and we'll need COFF and Mach-O equivalents in InstrProfilingPlatformWindows.c and InstrProfilingPlatformDarwin.c respectively (which could be empty in the initial implementation).

const uint64_t DataSize = __llvm_profile_get_data_size(DataBegin, DataEnd); const uint64_t DataSize = __llvm_profile_get_data_size(DataBegin, DataEnd);

const uint64_t CountersSize = CountersEnd - CountersBegin; const uint64_t CountersSize = CountersEnd - CountersBegin;

const uint64_t NamesSize = NamesEnd - NamesBegin; const uint64_t NamesSize = NamesEnd - NamesBegin;

/* Create the header. */ /* Create the header. */

__llvm_profile_header Header; __llvm_profile_header Header;

if (!DataSize) if (!DataSize)

return 0; return 0;

/* Determine how much padding is needed before/after the counters and after /* Determine how much padding is needed before/after the counters and after

* the names. */ * the names. */

uint64_t PaddingBytesBeforeCounters, PaddingBytesAfterCounters, uint64_t PaddingBytesBeforeCounters, PaddingBytesAfterCounters,

PaddingBytesAfterNames; PaddingBytesAfterNames;

__llvm_profile_get_padding_sizes_for_counters( __llvm_profile_get_padding_sizes_for_counters(

DataSize, CountersSize, NamesSize, &PaddingBytesBeforeCounters, DataSize, CountersSize, NamesSize, &PaddingBytesBeforeCounters,

&PaddingBytesAfterCounters, &PaddingBytesAfterNames); &PaddingBytesAfterCounters, &PaddingBytesAfterNames);

/* Initialize header structure. */ /* Initialize header structure. */

#define INSTR_PROF_RAW_HEADER(Type, Name, Init) Header.Name = Init; #define INSTR_PROF_RAW_HEADER(Type, Name, Init) Header.Name = Init;

#include "profile/InstrProfData.inc" #include "profile/InstrProfData.inc"

uint32_t BinaryIdSize = 0;

uint32_t *BinaryIdSizePtr = &BinaryIdSize;

const uint8_t *BinaryId = __llvm_read_binary_id(BinaryIdSizePtr);

phosek (Not Done): You shouldn't need an extra variable for the pointer.

  uint32_t BinaryIdSize = 0;
- uint32_t *BinaryIdSizePtr = &BinaryIdSize;
- const uint8_t *BinaryId = __llvm_read_binary_id(BinaryIdSizePtr);
+ const uint8_t *BinaryId = __llvm_read_binary_id(&BinaryIdSize);
  if (BinaryId)

if (BinaryId)

Header.HasBinaryId = 1;

phosek (Not Done): I think that we should avoid allocating the BuildId struct on the heap which could cause an issue when allocator itself is instrumented. Two solutions I can think of is to either allocate the struct as static (it's 8-16 bytes which is not so bad) or alternatively this function could use ProfDataWriter to directly write out the struct in which case it could be allocated on stack.

gulfem (Not Done): It seems like ValueProfiling already uses heap allocation in compiler-rt/lib/profile/InstrProfilingValue.c, but we don't really need heap allocation for build id.
So, I removed build id struct.
Please let me know what you think about the new implementation.

phosek (Not Done): That's true but it's also possible to allocate counter for value profiling statically if you use -mllvm -vp-static-alloc in which case you can avoid malloc which may be desirable on some platforms (this is what we would likely use on Fuchsia for example).

/* Write the data. */ /* Write the data. */

ProfDataIOVec IOVec[] = { ProfDataIOVec IOVec[] = {

{&Header, sizeof(__llvm_profile_header), 1, 0}, {&Header, sizeof(__llvm_profile_header), 1, 0},

{DataBegin, sizeof(__llvm_profile_data), DataSize, 0}, {DataBegin, sizeof(__llvm_profile_data), DataSize, 0},

{NULL, sizeof(uint8_t), PaddingBytesBeforeCounters, 1}, {NULL, sizeof(uint8_t), PaddingBytesBeforeCounters, 1},

{CountersBegin, sizeof(uint64_t), CountersSize, 0}, {CountersBegin, sizeof(uint64_t), CountersSize, 0},

{NULL, sizeof(uint8_t), PaddingBytesAfterCounters, 1}, {NULL, sizeof(uint8_t), PaddingBytesAfterCounters, 1},

{SkipNameDataWrite ? NULL : NamesBegin, sizeof(uint8_t), NamesSize, 0}, {SkipNameDataWrite ? NULL : NamesBegin, sizeof(uint8_t), NamesSize, 0},

{NULL, sizeof(uint8_t), PaddingBytesAfterNames, 1}}; {NULL, sizeof(uint8_t), PaddingBytesAfterNames, 1}};

if (Writer->Write(Writer, IOVec, sizeof(IOVec) / sizeof(*IOVec))) if (Writer->Write(Writer, IOVec, sizeof(IOVec) / sizeof(*IOVec)))

return -1; return -1;

/* Value profiling is not yet supported in continuous mode. */ /* Value profiling is not yet supported in continuous mode. */

if (__llvm_profile_is_continuous_mode_enabled()) if (__llvm_profile_is_continuous_mode_enabled())

return 0; return 0;

return writeValueProfData(Writer, VPDataReader, DataBegin, DataEnd); if (writeValueProfData(Writer, VPDataReader, DataBegin, DataEnd))

return -1;

if (!BinaryId)

return 0;

/* Write binary id size and data. */

ProfDataIOVec BinaryIdIOVec[] = {

{BinaryIdSizePtr, sizeof(uint32_t), 1, 0},

{BinaryId, sizeof(uint8_t), *BinaryIdSizePtr, 0}};

return Writer->Write(Writer, BinaryIdIOVec,

sizeof(BinaryIdIOVec) / sizeof(*BinaryIdIOVec));

Lint: Pre-merge checks

clang-format: please reformat the code

-    sizeof(BinaryIdIOVec) / sizeof(*BinaryIdIOVec));
+                       sizeof(BinaryIdIOVec) / sizeof(*BinaryIdIOVec));


} }
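
Two review suggestions from this diff fit together: mcgrathr's idea (in the InstrProfilingPlatformLinux.c comments) of a subroutine that takes a maybe-null writer and only reports the size when the writer is null, and phosek's point that writing through the writer lets the size live on the stack instead of the heap. The sketch below is an assumption-laden illustration of that shape. `Writer`, `Buffer`, `BufferWrite` and `WriteOneBinaryId` are hypothetical and much simpler than the real `ProfDataWriter` and `ProfDataIOVec` interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified writer interface (not ProfDataWriter). */
typedef struct Writer {
  int (*Write)(struct Writer *This, const void *Data, size_t Len);
  void *Ctx;
} Writer;

/* Hypothetical fixed-size sink used to exercise the writer. */
typedef struct {
  uint8_t Bytes[64];
  size_t Used;
} Buffer;

/* Copies Len bytes into the Buffer behind This->Ctx; 0 on success. */
int BufferWrite(Writer *This, const void *Data, size_t Len) {
  Buffer *B = (Buffer *)This->Ctx;
  if (Len > sizeof(B->Bytes) - B->Used)
    return -1;
  memcpy(B->Bytes + B->Used, Data, Len);
  B->Used += Len;
  return 0;
}

/* Emits one binary id as (uint64_t length, bytes).
 * Returns the record size in bytes, or -1 if a write fails.
 * With W == NULL nothing is written and only the size is returned. */
int64_t WriteOneBinaryId(Writer *W, const uint8_t *Id, uint64_t Len) {
  assert(Id != NULL || Len == 0);
  if (W) {
    /* Len is a stack variable: no heap allocation is involved. */
    if (W->Write(W, &Len, sizeof(Len)) || W->Write(W, Id, (size_t)Len))
      return -1;
  }
  return (int64_t)(sizeof(Len) + Len);
}
```

A caller could first pass `NULL` to learn how many bytes the binary id section will occupy (for a header field), then call again with a real writer to emit the data, so the size computation and the write share one code path.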

llvm/include/llvm/ProfileData/InstrProfData.inc

	Show First 20 Lines • Show All 123 Lines • ▼ Show 20 Lines
	/* Definition of member fields of the raw profile header data structure. */			/* Definition of member fields of the raw profile header data structure. */
	#ifndef INSTR_PROF_RAW_HEADER			#ifndef INSTR_PROF_RAW_HEADER
	#define INSTR_PROF_RAW_HEADER(Type, Name, Initializer)			#define INSTR_PROF_RAW_HEADER(Type, Name, Initializer)
	#else			#else
	#define INSTR_PROF_DATA_DEFINED			#define INSTR_PROF_DATA_DEFINED
	#endif			#endif
	INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic())			INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic())
	INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version())			INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version())
				INSTR_PROF_RAW_HEADER(uint64_t, HasBinaryId, 0) // TODO: Can use a bool?
	INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize)			INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize)
	INSTR_PROF_RAW_HEADER(uint64_t, PaddingBytesBeforeCounters, PaddingBytesBeforeCounters)			INSTR_PROF_RAW_HEADER(uint64_t, PaddingBytesBeforeCounters, PaddingBytesBeforeCounters)
	INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize)			INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize)
	INSTR_PROF_RAW_HEADER(uint64_t, PaddingBytesAfterCounters, PaddingBytesAfterCounters)			INSTR_PROF_RAW_HEADER(uint64_t, PaddingBytesAfterCounters, PaddingBytesAfterCounters)
	INSTR_PROF_RAW_HEADER(uint64_t, NamesSize, NamesSize)			INSTR_PROF_RAW_HEADER(uint64_t, NamesSize, NamesSize)
	INSTR_PROF_RAW_HEADER(uint64_t, CountersDelta, (uintptr_t)CountersBegin)			INSTR_PROF_RAW_HEADER(uint64_t, CountersDelta, (uintptr_t)CountersBegin)
	INSTR_PROF_RAW_HEADER(uint64_t, NamesDelta, (uintptr_t)NamesBegin)			INSTR_PROF_RAW_HEADER(uint64_t, NamesDelta, (uintptr_t)NamesBegin)
	INSTR_PROF_RAW_HEADER(uint64_t, ValueKindLast, IPVK_Last)			INSTR_PROF_RAW_HEADER(uint64_t, ValueKindLast, IPVK_Last)
	▲ Show 20 Lines • Show All 611 Lines • ▼ Show 20 Lines
	/* InstrProfile per-function control data alignment. */			/* InstrProfile per-function control data alignment. */
	#define INSTR_PROF_DATA_ALIGNMENT 8			#define INSTR_PROF_DATA_ALIGNMENT 8

	/* The data structure that represents a tracked value by the			/* The data structure that represents a tracked value by the
	* value profiler.			* value profiler.
	*/			*/
	typedef struct InstrProfValueData {			typedef struct InstrProfValueData {
	/* Profiled value. */			/* Profiled value. */
	uint64_t Value;			uint64_t Value;
				Lint: Pre-merge checks: clang-tidy: error: unknown type name 'uint64_t' [clang-diagnostic-error]
	/* Number of times the value appears in the training run. */			/* Number of times the value appears in the training run. */
	uint64_t Count;			uint64_t Count;
				Lint: Pre-merge checks: clang-tidy: error: unknown type name 'uint64_t' [clang-diagnostic-error]
	} InstrProfValueData;			} InstrProfValueData;

	#endif /* INSTR_PROF_DATA_INC */			#endif /* INSTR_PROF_DATA_INC */

	#ifndef INSTR_ORDER_FILE_INC			#ifndef INSTR_ORDER_FILE_INC
	/* The maximal # of functions: 1281024 (the buffer size will be 1284 KB). */			/* The maximal # of functions: 1281024 (the buffer size will be 1284 KB). */
	#define INSTR_ORDER_FILE_BUFFER_SIZE 131072			#define INSTR_ORDER_FILE_BUFFER_SIZE 131072
	#define INSTR_ORDER_FILE_BUFFER_BITS 17			#define INSTR_ORDER_FILE_BUFFER_BITS 17
	▲ Show 20 Lines • Show All 126 Lines • Show Last 20 Lines

llvm/include/llvm/ProfileData/InstrProfReader.h

Show First 20 Lines • Show All 76 Lines • ▼ Show 20 Lines	public:
virtual ~InstrProfReader() = default;		virtual ~InstrProfReader() = default;

/// Read the header. Required before reading first record.		/// Read the header. Required before reading first record.
virtual Error readHeader() = 0;		virtual Error readHeader() = 0;

/// Read a single record.		/// Read a single record.
virtual Error readNextRecord(NamedInstrProfRecord &Record) = 0;		virtual Error readNextRecord(NamedInstrProfRecord &Record) = 0;

		/// Read binary id.
		mcgrathr [Not Done]: Is this the number of distinct IDs, or the byte size of the total block? (AIUI the total block is a sequence of {uint64_t n, uint8_t[n]} blocks.) If it's the number of distinct IDs, then I don't understand how a single uint8_t* from getBinaryIds() is meant to be used. For a number like that, I think Count is a clearer name to use than Size.
		gulfem (author) [Done]:
		> Is this the number of distinct IDs, or the byte size of the total block?
		It is the number of binary IDs. That file consistently uses `Size` to refer to the number of elements, as in `DataSize`, `CounterSize`, `NamesSize`, etc. I don't like that either, but I used `Size` to be consistent with the rest of the implementation. I think `Count` or `NumOf` is more self-explanatory.
		/// TODO: Consider implementing it as a pure virtual function,
		/// and override it every subclass.
		virtual Error readBinaryId() { return success(); }

		/// Print binary id on stream OS.
		virtual void printBinaryId(raw_ostream &OS){};
		mcgrathr [Done]: Should these really be two separate methods? Or should it just be one method that returns both the size and the data in a span-style data structure?

/// Iterator over profile data.		/// Iterator over profile data.
InstrProfIterator begin() { return InstrProfIterator(this); }		InstrProfIterator begin() { return InstrProfIterator(this); }
InstrProfIterator end() { return InstrProfIterator(); }		InstrProfIterator end() { return InstrProfIterator(); }

virtual bool isIRLevelProfile() const = 0;		virtual bool isIRLevelProfile() const = 0;

virtual bool hasCSIRLevelProfile() const = 0;		virtual bool hasCSIRLevelProfile() const = 0;

[124 lines collapsed]	private:
const char *NamesStart;		const char *NamesStart;
uint64_t NamesSize;		uint64_t NamesSize;
// After value profile is all read, this pointer points to		// After value profile is all read, this pointer points to
// the header of next profile data (if exists)		// the header of next profile data (if exists)
const uint8_t *ValueDataStart;		const uint8_t *ValueDataStart;
uint32_t ValueKindLast;		uint32_t ValueKindLast;
uint32_t CurValueDataSize;		uint32_t CurValueDataSize;

		uint64_t HasBinaryId; // TODO: Can use a bool?
		uint32_t BinaryIdSize;
		const uint8_t *BinaryId;

public:		public:
RawInstrProfReader(std::unique_ptr<MemoryBuffer> DataBuffer)		RawInstrProfReader(std::unique_ptr<MemoryBuffer> DataBuffer)
: DataBuffer(std::move(DataBuffer)) {}		: DataBuffer(std::move(DataBuffer)) {}
RawInstrProfReader(const RawInstrProfReader &) = delete;		RawInstrProfReader(const RawInstrProfReader &) = delete;
RawInstrProfReader &operator=(const RawInstrProfReader &) = delete;		RawInstrProfReader &operator=(const RawInstrProfReader &) = delete;

static bool hasFormat(const MemoryBuffer &DataBuffer);		static bool hasFormat(const MemoryBuffer &DataBuffer);
Error readHeader() override;		Error readHeader() override;
Error readNextRecord(NamedInstrProfRecord &Record) override;		Error readNextRecord(NamedInstrProfRecord &Record) override;
		Error readBinaryId() override;
		void printBinaryId(raw_ostream &OS) override;

bool isIRLevelProfile() const override {		bool isIRLevelProfile() const override {
return (Version & VARIANT_MASK_IR_PROF) != 0;		return (Version & VARIANT_MASK_IR_PROF) != 0;
}		}

bool hasCSIRLevelProfile() const override {		bool hasCSIRLevelProfile() const override {
return (Version & VARIANT_MASK_CSIR_PROF) != 0;		return (Version & VARIANT_MASK_CSIR_PROF) != 0;
}		}
[227 lines collapsed]	private:
// end of the summary data if it exists or the input \c Cur.		// end of the summary data if it exists or the input \c Cur.
// \c UseCS indicates whether to use the context-sensitive profile summary.		// \c UseCS indicates whether to use the context-sensitive profile summary.
const unsigned char *readSummary(IndexedInstrProf::ProfVersion Version,		const unsigned char *readSummary(IndexedInstrProf::ProfVersion Version,
const unsigned char *Cur, bool UseCS);		const unsigned char *Cur, bool UseCS);

public:		public:
IndexedInstrProfReader(		IndexedInstrProfReader(
std::unique_ptr<MemoryBuffer> DataBuffer,		std::unique_ptr<MemoryBuffer> DataBuffer,
std::unique_ptr<MemoryBuffer> RemappingBuffer = nullptr)		std::unique_ptr<MemoryBuffer> RemappingBuffer = nullptr)
		mcgrathr [Done]: I don't understand the commented-out declaration.
: DataBuffer(std::move(DataBuffer)),		: DataBuffer(std::move(DataBuffer)),
RemappingBuffer(std::move(RemappingBuffer)), RecordIndex(0) {}		RemappingBuffer(std::move(RemappingBuffer)), RecordIndex(0) {}
IndexedInstrProfReader(const IndexedInstrProfReader &) = delete;		IndexedInstrProfReader(const IndexedInstrProfReader &) = delete;
IndexedInstrProfReader &operator=(const IndexedInstrProfReader &) = delete;		IndexedInstrProfReader &operator=(const IndexedInstrProfReader &) = delete;

/// Return the profile version.		/// Return the profile version.
uint64_t getVersion() const { return Index->getVersion(); }		uint64_t getVersion() const { return Index->getVersion(); }
bool isIRLevelProfile() const override { return Index->isIRLevelProfile(); }		bool isIRLevelProfile() const override { return Index->isIRLevelProfile(); }
[70 lines collapsed]
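
mcgrathr's inline comments on `readBinaryId` and `printBinaryId` above describe the binary ID block as a sequence of {uint64_t n, uint8_t[n]} entries and ask for a span-style result that carries both size and data. A minimal standalone sketch of that idea follows. It is not code from this patch: `BinaryIdRef` and `splitBinaryIds` are hypothetical names, lengths are read in host byte order, and any padding between entries is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical span-style view of one binary ID (pointer plus length).
// This type is an illustration only and does not appear in the patch.
struct BinaryIdRef {
  const uint8_t *Data;
  uint64_t Size;
};

// Splits [Begin, Begin + TotalSize) into {uint64_t n, uint8_t[n]} entries.
// Returns std::nullopt if a length field or its payload would run past the
// end of the block. Assumes host byte order and no padding between entries.
std::optional<std::vector<BinaryIdRef>> splitBinaryIds(const uint8_t *Begin,
                                                       size_t TotalSize) {
  std::vector<BinaryIdRef> Ids;
  size_t Offset = 0;
  while (Offset < TotalSize) {
    uint64_t Len;
    if (TotalSize - Offset < sizeof(Len))
      return std::nullopt; // truncated length field
    std::memcpy(&Len, Begin + Offset, sizeof(Len));
    Offset += sizeof(Len);
    if (Len > TotalSize - Offset)
      return std::nullopt; // payload extends past the block
    Ids.push_back({Begin + Offset, Len});
    Offset += static_cast<size_t>(Len);
  }
  return Ids;
}
```

With a result of this shape, the number of IDs is `Ids->size()` and each entry knows its own length, so a separate count accessor and a bare `uint8_t *` are not needed.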

llvm/lib/ProfileData/InstrProfReader.cpp

[18 lines collapsed]
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/IR/ProfileSummary.h"		#include "llvm/IR/ProfileSummary.h"
#include "llvm/ProfileData/InstrProf.h"		#include "llvm/ProfileData/InstrProf.h"
#include "llvm/ProfileData/ProfileCommon.h"		#include "llvm/ProfileData/ProfileCommon.h"
#include "llvm/Support/Endian.h"		#include "llvm/Support/Endian.h"
#include "llvm/Support/Error.h"		#include "llvm/Support/Error.h"
#include "llvm/Support/ErrorOr.h"		#include "llvm/Support/ErrorOr.h"
#include "llvm/Support/MemoryBuffer.h"		#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/SymbolRemappingReader.h"
#include "llvm/Support/SwapByteOrder.h"		#include "llvm/Support/SwapByteOrder.h"
		#include "llvm/Support/SymbolRemappingReader.h"
#include <algorithm>		#include <algorithm>
#include <cctype>		#include <cctype>
#include <cstddef>		#include <cstddef>
#include <cstdint>		#include <cstdint>
#include <limits>		#include <limits>
#include <memory>		#include <memory>
#include <system_error>		#include <system_error>
#include <utility>		#include <utility>
[324 lines collapsed]

template <class IntPtrT>		template <class IntPtrT>
Error RawInstrProfReader<IntPtrT>::readHeader(		Error RawInstrProfReader<IntPtrT>::readHeader(
const RawInstrProf::Header &Header) {		const RawInstrProf::Header &Header) {
Version = swap(Header.Version);		Version = swap(Header.Version);
if (GET_VERSION(Version) != RawInstrProf::Version)		if (GET_VERSION(Version) != RawInstrProf::Version)
return error(instrprof_error::unsupported_version);		return error(instrprof_error::unsupported_version);

		HasBinaryId = swap(Header.HasBinaryId);
CountersDelta = swap(Header.CountersDelta);		CountersDelta = swap(Header.CountersDelta);
NamesDelta = swap(Header.NamesDelta);		NamesDelta = swap(Header.NamesDelta);
auto DataSize = swap(Header.DataSize);		auto DataSize = swap(Header.DataSize);
auto PaddingBytesBeforeCounters = swap(Header.PaddingBytesBeforeCounters);		auto PaddingBytesBeforeCounters = swap(Header.PaddingBytesBeforeCounters);
auto CountersSize = swap(Header.CountersSize);		auto CountersSize = swap(Header.CountersSize);
auto PaddingBytesAfterCounters = swap(Header.PaddingBytesAfterCounters);		auto PaddingBytesAfterCounters = swap(Header.PaddingBytesAfterCounters);
NamesSize = swap(Header.NamesSize);		NamesSize = swap(Header.NamesSize);
ValueKindLast = swap(Header.ValueKindLast);		ValueKindLast = swap(Header.ValueKindLast);
[98 lines collapsed]	Error RawInstrProfReader<IntPtrT>::readValueProfilingData(
// remapped into function name hashes.		// remapped into function name hashes.
VDataPtrOrErr.get()->deserializeTo(Record, Symtab.get());		VDataPtrOrErr.get()->deserializeTo(Record, Symtab.get());
CurValueDataSize = VDataPtrOrErr.get()->getSize();		CurValueDataSize = VDataPtrOrErr.get()->getSize();
return success();		return success();
}		}

template <class IntPtrT>		template <class IntPtrT>
Error RawInstrProfReader<IntPtrT>::readNextRecord(NamedInstrProfRecord &Record) {		Error RawInstrProfReader<IntPtrT>::readNextRecord(NamedInstrProfRecord &Record) {
if (atEnd())		if (atEnd()) {
		// Read binary id that starts after record.
		// TODO: In which cases there are multiple headers?
		if (Error E = readBinaryId())
		return error(std::move(E));

// At this point, ValueDataStart field points to the next header.		// At this point, ValueDataStart field points to the next header.
if (Error E = readNextHeader(getNextHeaderPos()))		if (Error E = readNextHeader(getNextHeaderPos()))
return error(std::move(E));		return error(std::move(E));
		}

// Read name ad set it in Record.		// Read name ad set it in Record.
if (Error E = readName(Record))		if (Error E = readName(Record))
return error(std::move(E));		return error(std::move(E));

// Read FuncHash and set it in Record.		// Read FuncHash and set it in Record.
if (Error E = readFuncHash(Record))		if (Error E = readFuncHash(Record))
return error(std::move(E));		return error(std::move(E));

// Read raw counts and set Record.		// Read raw counts and set Record.
if (Error E = readRawCounts(Record))		if (Error E = readRawCounts(Record))
return error(std::move(E));		return error(std::move(E));

// Read value data and set Record.		// Read value data and set Record.
if (Error E = readValueProfilingData(Record))		if (Error E = readValueProfilingData(Record))
return error(std::move(E));		return error(std::move(E));

// Iterate.		// Iterate.
advanceData();		advanceData();
return success();		return success();
}		}

		template <class IntPtrT> Error RawInstrProfReader<IntPtrT>::readBinaryId() {
		if (HasBinaryId == 0)
		return success();
		phosek [Not Done]: Where is this deallocated? It seems like this memory gets leaked. `BinaryIdTy` is just a plain struct with just two 8 byte fields (on a 64-bit machine), can we avoid allocating it on the heap and return/pass it by value to simplify lifetime management?
		gulfem (author) [Done]: I used `unique_ptr`, so memory will be released after BinaryIdTy goes out of scope.
		phosek [Not Done]: I'd still prefer passing these by value, heap allocation in this case seems unnecessary and is going to be less efficient.

		// Read binary id size.
		const uint32_t *BinaryIdBuffer =
		reinterpret_cast<const uint32_t *>(ValueDataStart);
		BinaryIdSize = *BinaryIdBuffer;
		BinaryIdBuffer++;

		// Read binary id data.
		BinaryId = reinterpret_cast<const uint8_t *>(BinaryIdBuffer);
		// Increment next profile start by binary id size and data.
		ValueDataStart += sizeof(BinaryIdSize);
		ValueDataStart += BinaryIdSize;

		return success();
		}
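
The review comments on this function ask for two things: pass the binary ID by value rather than through a heap allocation, and check that a bad length cannot move the read past the end of the buffer. The sketch below illustrates both for the uint32_t length prefix used in this revision of the diff. It is a standalone illustration, not the code that landed: `BinaryIdView` and `readLengthPrefixedId` are hypothetical names, `End` stands in for the end of the profile buffer, and byte swapping is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical by-value view of a binary ID: small enough to copy freely,
// so no heap allocation or ownership tracking is involved.
struct BinaryIdView {
  const uint8_t *Data;
  uint32_t Size;
};

// Reads a uint32_t length followed by that many bytes, starting at Cur and
// never reading at or beyond End. On success, fills Out and advances Cur
// past the entry. On failure, returns false and leaves Cur unchanged.
// Precondition (assumed): Cur <= End, both pointing into the same buffer.
bool readLengthPrefixedId(const uint8_t *&Cur, const uint8_t *End,
                          BinaryIdView &Out) {
  uint32_t Len;
  if (static_cast<size_t>(End - Cur) < sizeof(Len))
    return false; // not enough bytes left for the length field
  // memcpy avoids the potentially misaligned load of a reinterpret_cast.
  std::memcpy(&Len, Cur, sizeof(Len));
  const uint8_t *Payload = Cur + sizeof(Len);
  if (Len > static_cast<size_t>(End - Payload))
    return false; // length claims more bytes than the buffer holds
  Out.Data = Payload;
  Out.Size = Len;
  Cur = Payload + Len;
  return true;
}
```

In the reader, a failed check of this kind would presumably map to an existing `instrprof_error` value rather than a bare `false`; which one is appropriate is a decision for the patch author.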

		template <class IntPtrT>
		void RawInstrProfReader<IntPtrT>::printBinaryId(raw_ostream &OS) {
		if (HasBinaryId == 0)
		return;
		davidxl [Not Done]: When there are no binary ids to print, perhaps output a line with 'None' instead of nothing (which looks like the output is truncated).
		gulfem (author) [Done]: When there is no binary id, we don't print anything at all. At line 517, we check the binary id size, and if there is no binary id, just return without printing anything: `if (BinaryIdsSize == 0) return success();`

		OS << "Binary ID: ";
		for (uint32_t I = 0; I < BinaryIdSize; I++)
		OS.write_hex(BinaryId[I]);
		mcgrathr [Not Done]: This is wildly susceptible to bad data.
		gulfem (author) [Done]: > This is wildly susceptible to bad data.
		gulfem (author) [Done]: How can I improve that? I need to somehow increment the pointer by binary id length.
		phosek [Done]: I think at minimum, you need to check that you aren't reading past the end of the file in the case the length is incorrect.
		OS << "\n";
		}
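
One detail in the print loop above may be worth checking: to my understanding, `raw_ostream::write_hex` emits no leading zeros, so a byte such as 0x0a would print as a single digit and the resulting string would not round-trip as a build ID. The standalone sketch below shows fixed-width formatting with two hex digits per byte. `formatBinaryId` is a hypothetical helper using only the standard library, not an LLVM API and not part of this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Formats Size bytes starting at Data as lowercase hex, always two digits
// per byte, so 0x0a becomes "0a" rather than "a".
std::string formatBinaryId(const uint8_t *Data, size_t Size) {
  static const char Digits[] = "0123456789abcdef";
  std::string Out;
  Out.reserve(Size * 2);
  for (size_t I = 0; I < Size; ++I) {
    Out.push_back(Digits[Data[I] >> 4]);  // high nibble
    Out.push_back(Digits[Data[I] & 0xf]); // low nibble
  }
  return Out;
}
```

For the bytes 0x0a, 0xff, 0x00 this yields `0aff00`, whereas unpadded per-byte output would give `aff0`.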

namespace llvm {		namespace llvm {

template class RawInstrProfReader<uint32_t>;		template class RawInstrProfReader<uint32_t>;
template class RawInstrProfReader<uint64_t>;		template class RawInstrProfReader<uint64_t>;

} // end namespace llvm		} // end namespace llvm

InstrProfLookupTrait::hash_value_type		InstrProfLookupTrait::hash_value_type
[417 lines collapsed]

llvm/tools/llvm-profdata/llvm-profdata.cpp

[2,216 lines collapsed]	if (ShownFunctions && ShowMemOPSizes) {
showValueSitesStats(OS, IPVK_MemOPSize, VPStats[IPVK_MemOPSize]);		showValueSitesStats(OS, IPVK_MemOPSize, VPStats[IPVK_MemOPSize]);
}		}

if (ShowDetailedSummary) {		if (ShowDetailedSummary) {
OS << "Total number of blocks: " << PS->getNumCounts() << "\n";		OS << "Total number of blocks: " << PS->getNumCounts() << "\n";
OS << "Total count: " << PS->getTotalCount() << "\n";		OS << "Total count: " << PS->getTotalCount() << "\n";
PS->printDetailedSummary(OS);		PS->printDetailedSummary(OS);
}		}

		// This is only for testing binary id prototype.
		Reader->printBinaryId(OS);
		phosek [Not Done]: I think we'll want this functionality even in the final version but we should introduce additional flag, for example `-show-binary-id`, and only print the binary id if that flag is set.
return 0;		return 0;
}		}

static void showSectionInfo(sampleprof::SampleProfileReader *Reader,		static void showSectionInfo(sampleprof::SampleProfileReader *Reader,
raw_fd_ostream &OS) {		raw_fd_ostream &OS) {
if (!Reader->dumpSectionInfo(OS)) {		if (!Reader->dumpSectionInfo(OS)) {
WithColor::warning() << "-show-sec-info-only is only supported for "		WithColor::warning() << "-show-sec-info-only is only supported for "
<< "sample profile in extbinary format and is "		<< "sample profile in extbinary format and is "
[315 lines collapsed]


Diff 344474
