This is an archive of the discontinued LLVM Phabricator instance.

[profile] Add binary id into profiles
ClosedPublic

Authored by gulfem on May 6 2021, 6:01 PM.

Details

Summary

This patch adds binary id into profiles to easily associate binaries
with the corresponding profiles. There is also an RFC that
discusses the motivation, design and implementation in more detail:
https://lists.llvm.org/pipermail/llvm-dev/2021-June/151154.html

Diff Detail

Event Timeline

gulfem created this revision.May 6 2021, 6:01 PM
gulfem requested review of this revision.May 6 2021, 6:01 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 6 2021, 6:01 PM
Herald added subscribers: llvm-commits, Restricted Project. · View Herald Transcript
phosek added inline comments.May 10 2021, 11:06 AM
compiler-rt/include/profile/InstrProfData.inc
132

Instead of HasBuildId as a boolean, I think it might be better to use BuildIdSize to allow multiple ids. While raw profiles should have either 0 or 1 in practice, indexed profiles created by merging multiple raw profiles could have multiple ids, so using BuildIdSize would allow using the same header format for both raw and indexed format.

compiler-rt/lib/profile/InstrProfilingWriter.c
36

I'm not sure if we should be using the term BuildId since it can be interpreted as ELF-specific. We want this to generalize to other binary formats like Mach-O and COFF. In Mach-O, it's called LC_UUID. In COFF, it's usually referred to as GUID.

When searching online, I noticed that lld also build ID for COFF, see D36758, but I'm not sure if that's official name or if they just copied the name from the ELF implementation. So one option would be to just use BuildId but make it clear that it's a generic term. Another option would be a new name, for example BinaryId.

263

This function implements ELF-specific logic so it cannot be in InstrProfilingWriter.c which is platform agnostic. It should be InstrProfilingPlatformLinux.c and we'll need COFF and Mach-O equivalents in InstrProfilingPlatformWindows.c and InstrProfilingPlatformDarwin.c respectively (which could be empty in the initial implementation).

291

I think that we should avoid allocating the BuildId struct on the heap which could cause an issue when allocator itself is instrumented. Two solutions I can think of is to either allocate the struct as static (it's 8-16 bytes which is not so bad) or alternatively this function could use ProfDataWriter to directly write out the struct in which case it could be allocated on stack.

gulfem updated this revision to Diff 344474.May 11 2021, 10:36 AM
  1. Move elf specific details into InstrProfilingPlatformLinux.c
  2. Remove build id struct
  3. Use binary id instead of build id
gulfem retitled this revision from [profile] WIP Add build id into profiles to [profile] WIP Add binary id into profiles.May 11 2021, 10:37 AM
gulfem edited the summary of this revision. (Show Details)
gulfem edited the summary of this revision. (Show Details)
gulfem marked an inline comment as done.May 11 2021, 10:46 AM
gulfem added inline comments.
compiler-rt/include/profile/InstrProfData.inc
132

Can a raw profile with build id and another raw profile without a build id be merged and create indexed profiled?

compiler-rt/lib/profile/InstrProfilingWriter.c
36

I renamed it to binary id, and I will briefly explain other ids used in other platforms in the RFC.

291

It seems like ValueProfiling already uses heap allocation compiler-rt/lib/profile/InstrProfilingValue.c, but we don't really need heap allocation for build id.
So, I removed build id struct.
Please let me know what you think about the new implementation.

gulfem added inline comments.May 12 2021, 6:18 PM
compiler-rt/include/profile/InstrProfData.inc
132

@phosek, it seems to me like raw and indexed profile do not share the same profile format.
For ex, raw profile header is defined in:
https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ProfileData/InstrProfData.inc#L130

Whereas, indexed profile header is defined in:
https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ProfileData/InstrProf.h#L999

I think, we might need to extend both formats with binary id then.
Please let me know if I'm missing anything.

phosek added inline comments.May 12 2021, 11:33 PM
compiler-rt/include/profile/InstrProfData.inc
132

Yes, looks like we'll need to extend both formats but I still think it'd be preferable to support arbitrary number of binary id's in both formats since at least in ELF, you can have more than one build ID. The format of should also be the same in both the raw and indexed format, that is a sequence of tuples where each tuple is a length and a sequence of bytes of that length.

compiler-rt/lib/profile/InstrProfiling.h
101 ↗(On Diff #344474)

I'd slightly prefer returning both the data and the size as output parameters for consistency. We might also consider having a boolean/int return argument to signal an error.

compiler-rt/lib/profile/InstrProfilingWriter.c
287–288

You shouldn't need an extra variable for the pointer.

291

That's true but it's also possible to allocate counter for value profiling statically if you use -mllvm -vp-static-alloc in which case you can avoid malloc which may be desirable on some platforms (this is what we would likely use on Fuchsia for example).

llvm/tools/llvm-profdata/llvm-profdata.cpp
2256

I think we'll want this functionality even in the final version but we should introduce additional flag, for example -show-binary-id, and only print the binary id if that flag is set.

gulfem updated this revision to Diff 348352.May 27 2021, 12:14 PM
  1. Support multiple binary ids
  2. Extend indexed profile format to include binary ids
  3. Add tests
gulfem updated this revision to Diff 348420.May 27 2021, 5:09 PM

Use clang-format to correctly format the code

mcgrathr added inline comments.Jun 1 2021, 6:26 PM
compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
102

This is insufficient. The note "name" determines the space of n_type values. You must verify that the n_namesz=4 and the name bytes are "GNU\0" as well, or else this is some unrelated note that happens to use n_type=3 where 3 does not mean NT_GNU_BUILD_ID.

105

This arithmetic is not right since Note has a type with sizeof>1. An example of correct arithmetic is:

Note = (const ElfW(Nhdr)*)((const char *)(Note + 1) + RoundUp(Note->n_namesz, 4) + RoundUp(Note->n_descsz, 4));
120

Can you abstract this logic into a subroutine so the code is not repeated in the two functions?
A simple approach would be a subroutine that takes a maybe-null ProfDataWriter pointer and when called with a null pointer just returns the size instead of calling the writer.

compiler-rt/test/profile/binary-id.c
2

You may need to pass explicit -Wl,--build-id=none through the compiler here since it's sometimes on by default.

llvm/include/llvm/ProfileData/InstrProfReader.h
85

Is this the number of distinct IDs, or the byte size of the total block? (AIUI the total block is a sequence of {uint64_t n, uint8_t[n]} blocks.) If it's the number of distinct IDs, then I don't understand how a single uint8_t* from getBinaryIds() is meant to be used. For a number like that, I think Count is a clearer name to use than Size.

91

Should these really be two separate methods? Or should it just be one method that returns both the size and the data in a span-style data structure?

484

I don't understand the commented-out declaration.

llvm/lib/ProfileData/InstrProfReader.cpp
541

This is wildly susceptible to bad data.

gulfem updated this revision to Diff 349985.Jun 4 2021, 3:59 PM

Addressed Roland's comments

gulfem marked 4 inline comments as done.Jun 4 2021, 4:14 PM
gulfem added inline comments.
compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
105

My understanding is that each note in notes section starts with a struct which includes n_namesz, n_descsz, and n_type members.
It is followed by the name (whose length is defined in n_namesz) and then by the descriptor (whose length is defined in n_descsz).
So, type is already part of the struct. The arithmetic I have increases Note by struct size, and name data, and descriptor data.
Am I missing anything?

llvm/include/llvm/ProfileData/InstrProfReader.h
85

Is this the number of distinct IDs, or the byte size of the total block?

It is the number of binary IDs. That file consistently uses Size to refer to number of elements like DataSize, CounterSize, NamesSize, etc.
I don't like that either, but I just used Size to be consistent with the rest of the implementation.
I think Count or NumOf is more self-explanatory.

llvm/lib/ProfileData/InstrProfReader.cpp
541

This is wildly susceptible to bad data.

541

How can I improve that? I need to somehow increment the pointer by binary id length.

gulfem marked an inline comment as done.Jun 4 2021, 4:22 PM
phosek added inline comments.Jun 14 2021, 1:25 AM
compiler-rt/include/profile/InstrProfData.inc
132

I'd prefer to put this field at the end of the header to match the position of binary ID in the profile.

compiler-rt/lib/profile/InstrProfilingInternal.h
202

This name is a bit of a mouthful. I'd consider calling it just __llvm_write_binary_ids and say in the comment that the function always returns the number of binary ids even if writer is NULL.

compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
105

I think we should be returning the number of binary ids in every case, even if writer != NULL.

105

When you increment a pointer, the address is moved by the size of the underlying type:

char* c = ...;
c++; // c increments by 1 == sizeof(char)
uint64_t* l = ...;
l++; // l increments by 8 == sizeof(uint64_t)

In this case sizeof(Note) > 1 so you first need to cast it to const char *.

llvm/lib/ProfileData/InstrProfReader.cpp
517

Where is this deallocated? It seems like this memory gets leaked. BinaryIdTy is just a plain struct with just two 8 byte fields (on a 64-bit machine), can we avoid allocating it on the heap and return/pass it by value to simplify lifetime management?

541

I think at minimum, you need to check that you aren't reading past the end of the file in the case the length is incorrect.

gulfem updated this revision to Diff 352576.Jun 16 2021, 4:01 PM

Addressed Petr's comments

gulfem marked 4 inline comments as done.Jun 16 2021, 4:06 PM
gulfem added inline comments.
llvm/lib/ProfileData/InstrProfReader.cpp
517

I used unique_ptr, so memory will be released after BinaryIdTy goes out of scope.

gulfem marked 2 inline comments as done.Jun 16 2021, 4:07 PM
gulfem edited the summary of this revision. (Show Details)Jun 16 2021, 4:11 PM
gulfem added reviewers: vsk, davidxl.
phosek added inline comments.Jun 16 2021, 4:12 PM
llvm/lib/ProfileData/InstrProfReader.cpp
517

I'd still prefer passing these by value, heap allocation in this case seems unnecessary and is going to be less efficient.

gulfem retitled this revision from [profile] WIP Add binary id into profiles to [profile] Add binary id into profiles.Jun 24 2021, 2:28 PM
gulfem edited the summary of this revision. (Show Details)
gulfem edited the summary of this revision. (Show Details)Jun 24 2021, 2:32 PM

@vsk @davidxl Do you have any thoughts on this? We've already done a few rounds of reviews, but we'd like to get your opinion as well.

Repeating the same question asked in RFC: the profile data has builtin fine grained matching mechanism. For the top level stamping, is it enough to embed the build id in the file name of the profile data?

Repeating the same question asked in RFC: the profile data has builtin fine grained matching mechanism. For the top level stamping, is it enough to embed the build id in the file name of the profile data?

Just to clarify: we are suggesting to embed build in the profile file itself, not in the profile file name.
This will also allow us to embed multiple build ids if profiles are merged together.
Can you please elaborate on the builtin fine grained matching mechanism and why do you think it will be a problem?

Repeating the same question asked in RFC: the profile data has builtin fine grained matching mechanism. For the top level stamping, is it enough to embed the build id in the file name of the profile data?

We need to be able to match the profile back to the corresponding binary when generating coverage. Embedding build ID in filename doesn't work for us because on Fuchsia, most processes don't have access to filesystem (it's a capability that most processes shouldn't need). Rather the profile is exported via IPC as a memory object that doesn't have any name.

There might be a Fuchsia-specific solution, like wrapping the profile in an envelope that contains he build ID, but that solution would be Fuchsia-specific and we thought that making profiles more self-descriptive could be generally useful.

One idea we would like to pursue in the future is implementing support for the debuginfod protocol in LLVM, including llvm-cov, you could then use llvm-cov --instr-profile default.profraw and llvm-cov would fetch both the binary and the source from debuginfod server using the build ID inside the profile. In our case, this would not only simplify our infrastructure but also make it easy for developers to generate coverage reports from profiles fetched directly from our CI to reproduce results.

I am less concerned about raw profile format change (the version still needs to be bumped), but for the indexed format, the version needs to be bumped and it needs to guarantee that format is backward compatible. The patch does not seem to handle it.

Is it necessary to have binary id support in the indexed format?

I am less concerned about raw profile format change (the version still needs to be bumped), but for the indexed format, the version needs to be bumped and it needs to guarantee that format is backward compatible. The patch does not seem to handle it.

Is it necessary to have binary id support in the indexed format?

Yes, we need to bump the version. There is still some work that needs to be done in this patch like bumping the version, adjusting some of the tests, and adding more tests, etc.
We would like to get early feedback to see whether there is any fundamental issue about the approach.
@phosek can correct me, but I think we need to have binary id support in the indexed format.
We merge raw profiles into indexed profiles before generating source code coverage reports, and in order to associate binaries while generating coverage, we need to have binary id in the indexed profiles.

I am less concerned about raw profile format change (the version still needs to be bumped), but for the indexed format, the version needs to be bumped and it needs to guarantee that format is backward compatible. The patch does not seem to handle it.

Is it necessary to have binary id support in the indexed format?

Yes, we need to bump the version. There is still some work that needs to be done in this patch like bumping the version, adjusting some of the tests, and adding more tests, etc.
We would like to get early feedback to see whether there is any fundamental issue about the approach.
@phosek can correct me, but I think we need to have binary id support in the indexed format.
We merge raw profiles into indexed profiles before generating source code coverage reports, and in order to associate binaries while generating coverage, we need to have binary id in the indexed profiles.

Is the merging done in process or offline? Assuming it is offline, it seems possible to use name base approach.

Is the merging done in process or offline? Assuming it is offline, it seems possible to use name base approach.

It's done offline but since the profiles published via IPC don't have any names, we cannot rely or filenames, we would need to introduce a custom Fuchsia wrapper format (see my previous comment about the "envelope" idea).

We don't necessarily need binary ID in the indexed format, what we could do is:

llvm-profdata show --binary-id a.profraw >a.buildid
# fetch a binary with a.buildid as a.out
llvm-profdata show --binary-id b.profraw >b.buildid
# fetch a binary with b.buildid as b.out
...
llvm-profdata merge -o merged.profdata a.profraw b.profraw ...
llvm-cov show --instr-profile=merged.profdata a.out b.out ...

If we also stored binary IDs inside the indexed format, we could simplify this and do:

llvm-profdata show --binary-ids merged.profdata >merged.buildids
# fetch all binaries

This is more efficient but either would be fine with us for now.

Storing binary IDs inside the indexed profile would become really valuable if/once we have support debuginfod support at which point we should be able to just do:

llvm-profdata merge -o merged.profdata a.profraw b.profraw ...
llvm-cov show --instr-profile=merged.profdata

and llvm-cov would fetch binaries directly from the debuginfod server using binary IDs stored inside the indexed profile. It'll take a while before we have debuginfod support available in LLVM though so this is not critical for now, we're just trying to plan ahead.

Is extending the indexed format a problem? Is there some way to make it less of an issue?

Is the merging done in process or offline? Assuming it is offline, it seems possible to use name base approach.

It's done offline but since the profiles published via IPC don't have any names, we cannot rely or filenames, we would need to introduce a custom Fuchsia wrapper format (see my previous comment about the "envelope" idea).

We don't necessarily need binary ID in the indexed format, what we could do is:

llvm-profdata show --binary-id a.profraw >a.buildid
# fetch a binary with a.buildid as a.out
llvm-profdata show --binary-id b.profraw >b.buildid
# fetch a binary with b.buildid as b.out
...
llvm-profdata merge -o merged.profdata a.profraw b.profraw ...
llvm-cov show --instr-profile=merged.profdata a.out b.out ...

If we also stored binary IDs inside the indexed format, we could simplify this and do:

llvm-profdata show --binary-ids merged.profdata >merged.buildids
# fetch all binaries

This is more efficient but either would be fine with us for now.

Storing binary IDs inside the indexed profile would become really valuable if/once we have support debuginfod support at which point we should be able to just do:

llvm-profdata merge -o merged.profdata a.profraw b.profraw ...
llvm-cov show --instr-profile=merged.profdata

and llvm-cov would fetch binaries directly from the debuginfod server using binary IDs stored inside the indexed profile. It'll take a while before we have debuginfod support available in LLVM though so this is not critical for now, we're just trying to plan ahead.

I suppose this workflow can also be done with

llvm-cov show --instr-profile=merged.profdata --buildids=<file of list of ids> without the need to fetch the binaries.

Is extending the indexed format a problem? Is there some way to make it less of an issue?

It is not an issue, but I think it is preferable minimize version bumps for indexed format to reduce longer term churns. Given this restriction, it is better to make version change tied to functionalities that can do without a format change. Scripts, wrappers etc. are better for non-essential changes -- especially for workflows that can be mostly automated (which minimizes human inconveniences).

gulfem updated this revision to Diff 359056.Jul 15 2021, 11:04 AM
  1. Add binary id between profile header and data
  2. Raw profile version bump
  3. Only add binary id into raw profiles
aeubanks resigned from this revision.Jul 15 2021, 11:15 AM

I'm not familiar enough with this code

I'm not familiar enough with this code

@aeubanks I just added you because I modified a test file (corrupted-profile.c), and you are the author of that test file.

that test change seems fine
I remember I copied the structure from another existing test though, somewhere in llvm/test/tools/llvm-profdata IIRC

that test change seems fine
I remember I copied the structure from another existing test though, somewhere in llvm/test/tools/llvm-profdata IIRC

Thanks!

@davidxl we split the indexed profile change out of this change since it's not necessary for us right now, would it be possible to take a look again?

compiler-rt/include/profile/InstrProfData.inc
140

I'd prefer to move this field just after Version before DataSize to match the order of data inside the profile.

davidxl added inline comments.Jul 19 2021, 1:08 PM
compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
92

Is it possible to extract this into a common helper function like

void ForEachNote(int note_type, .. note_start, .. note_end, ... call_back ) {

...

}

gulfem updated this revision to Diff 359943.Jul 19 2021, 3:40 PM
  1. Break WriteBinaryIds into multiple functions to increase readibility
  2. Rebase
gulfem marked an inline comment as done.Jul 19 2021, 3:43 PM
gulfem added inline comments.
compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
92

I think your concern was about the readability of that function, so I broke that function into multiple functions to increase readability.
Please let me know if you other concerns.

davidxl accepted this revision.Jul 19 2021, 3:51 PM

LGTM

llvm/lib/ProfileData/InstrProfReader.cpp
537

When there is no binary ids to be print, perhaps output a line with 'None' instead of empty (which looks like the output is truncated.

This revision is now accepted and ready to land.Jul 19 2021, 3:51 PM
gulfem marked an inline comment as done.Jul 19 2021, 4:12 PM
gulfem added inline comments.
llvm/lib/ProfileData/InstrProfReader.cpp
537

When there is no binary id, we don't print anything at all.
At line 517, we check the binary id size, and if there is no binary id, just return without printing anything.

if (BinaryIdsSize == 0)
  return success();
gulfem updated this revision to Diff 360315.Jul 20 2021, 4:29 PM

Modified some raw profile test files to include the new profile version (version 6) and new header element (BinaryIdsSize).

This revision was landed with ongoing or failed builds.Jul 21 2021, 10:56 AM
This revision was automatically updated to reflect the committed changes.
vitalybuka added inline comments.
compiler-rt/test/profile/binary-id.c
12

Hi @gulfem,
Do you have some ideas why this fails on this bot?
https://lab.llvm.org/staging/#/builders/97/builds/840

gulfem added inline comments.Dec 3 2021, 6:15 PM
compiler-rt/test/profile/binary-id.c
12

Hi @vitalybuka,
I tried looking into that, but I am not able to access that bot at the moment.
I'm getting 502 Bad Gateway error, but I'll try to reproduce it later.
Hopefully, I can give your more info then!

Yes, looks like both LLVM servers are down https://lab.llvm.org/
FYI The special about that staging bot is that it is Ubuntu with recent
glibc 2.34. Our primary bot runs exactly the same build on Debian with
glibc 2.28 and the test passes there. It can be glibc or other dependencies.

Yes, looks like both LLVM servers are down https://lab.llvm.org/
FYI The special about that staging bot is that it is Ubuntu with recent
glibc 2.34. Our primary bot runs exactly the same build on Debian with
glibc 2.28 and the test passes there. It can be glibc or other dependencies.

I tried to reproduce it on my machine, but I it uses Debian (not Ubuntu), so the issue did not reproduce.
With that patch, we started embedding build id (a unique identifier) into llvm profiles.
That test basically enables build id in the binary by using -Wl,--build-id -O2 option.
It then generates a llvm profile, and tries to read embedded build id from the profile.
What is the output of the following command?

/b/sanitizer-x86_64-linux/build/llvm_build64/bin/clang   -m64  -ldl  -fprofile-instr-generate -Wl,--build-id -O2 -o /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/profile/Linux/binary-id.c
readelf -n /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp

If we have build id in the binary, readelf (you can also use llvm-readelf instead) should have some output like that:

Displaying notes found in: .note.gnu.build-id
  Owner                Data size        Description
  GNU                  0x00000014       NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 8699f6e0c4e12b872aae7e1f37fa6ba2564e9702

If we have build id in the binary, we should then check whether build id is embedded in the profile:

env LLVM_PROFILE_FILE=/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profraw  /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp
llvm-profdata show --binary-ids  /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profraw

What is the output of the above command?

vitalybuka added a comment.EditedDec 14 2021, 7:33 PM
buildbot@sanitizer-buildbot-vb-wlz1:/b/sanitizer-x86_64-linux/build/compiler_rt_build$
/b/sanitizer-x86_64-linux/build/llvm_build64/bin/clang   -m64  -ldl
 -fprofile-instr-generate -Wl,--build-id=none -O2 -o
/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp
/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/profile/Linux/binary-id.c
buildbot@sanitizer-buildbot-vb-wlz1:/b/sanitizer-x86_64-linux/build/compiler_rt_build$
env
LLVM_PROFILE_FILE=/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profraw
 /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp
buildbot@sanitizer-buildbot-vb-wlz1:/b/sanitizer-x86_64-linux/build/compiler_rt_build$
../llvm_build64/bin/llvm-profdata show --binary-ids
 /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profraw
>
/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.out
buildbot@sanitizer-buildbot-vb-wlz1:/b/sanitizer-x86_64-linux/build/compiler_rt_build$
../llvm_build64/bin/llvm-profdata show --binary-ids
 /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profraw

Instrumentation level: Front-end
Total functions: 3
Maximum function count: 1
Maximum internal block count: 0
buildbot@sanitizer-buildbot-vb-wlz1:/b/sanitizer-x86_64-linux/build/compiler_rt_build$
../llvm_build64/bin/llvm-profdata merge -o
/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profdata
/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profraw
buildbot@sanitizer-buildbot-vb-wlz1:/b/sanitizer-x86_64-linux/build/compiler_rt_build$
/b/sanitizer-x86_64-linux/build/llvm_build64/bin/clang   -m64  -ldl
 -fprofile-instr-generate -Wl,--build-id -O2 -o
/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp
/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/profile/Linux/binary-id.c
buildbot@sanitizer-buildbot-vb-wlz1:/b/sanitizer-x86_64-linux/build/compiler_rt_build$
env
LLVM_PROFILE_FILE=/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profraw
 /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp
buildbot@sanitizer-buildbot-vb-wlz1:/b/sanitizer-x86_64-linux/build/compiler_rt_build$
../llvm_build64/bin/llvm-profdata show --binary-ids
 /b/sanitizer-x86_64-linux/build/compiler_rt_build/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profraw

Instrumentation level: Front-end
Total functions: 3
Maximum function count: 1
Maximum internal block count: 0

And no Binary IDs

vitalybuka added a comment.EditedDec 14 2021, 9:37 PM
../llvm_build64/bin/llvm-readelf -n test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp
Displaying notes found in: .note.ABI-tag
  Owner                Data size 	Description
  GNU                  0x00000010	NT_GNU_ABI_TAG (ABI version tag)
    OS: Linux, ABI: 3.2.0

Displaying notes found in: .note.gnu.property
  Owner                Data size 	Description
  GNU                  0x00000010	NT_GNU_PROPERTY_TYPE_0 (property note)
    Properties:    x86 ISA needed: x86-64-baseline


Displaying notes found in: .note.gnu.build-id
  Owner                Data size 	Description
  GNU                  0x00000014	NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 3dff2793618e5a04cbfef83483bc82d7d63b807d

Displaying notes found in: .note.gnu.gold-version
  Owner                Data size 	Description
  GNU                  0x00000009	NT_GNU_GOLD_VERSION (gold version)
    Version: gold 1.16
../llvm_build64/bin/llvm-profdata show --binary-ids test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp.profraw
Instrumentation level: Front-end
Total functions: 3
Maximum function count: 1
Maximum internal block count: 0
vitalybuka added inline comments.Dec 14 2021, 10:36 PM
compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
178

Here is the problem
This loop stops on the first PT_NOTE, even if it's not build-id
e.g. .note.ABI-tag, check my log in the message above

gulfem added inline comments.Dec 15 2021, 10:31 AM
compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
178

PT_NOTE is the note segment in the program header, and that segment can have multiple notes (https://man7.org/linux/man-pages/man5/elf.5.html).
Like, the first note is .note.ABI-tag, and the second note is .note.gnu.build-id in your log.
When I run that locally, there are still two notes, but just the order is swapped.

gulfem@gulfem:~/llvm-release-build$ bin/llvm-readelf -n /usr/local/google/home/gulfem/llvm-release-build/projects/compiler-rt/test/profile/Profile-x86_64/Linux/Output/binary-id.c.tmp
Displaying notes found in: .note.gnu.build-id
  Owner                Data size        Description
  GNU                  0x00000014       NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 56559dcdb2c5b111e7e875c183ca2e0a1f433c96

Displaying notes found in: .note.ABI-tag
  Owner                Data size        Description
  GNU                  0x00000010       NT_GNU_ABI_TAG (ABI version tag)
    OS: Linux, ABI: 3.2.0

Here, we iterate through all the notes.
https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/profile/InstrProfilingPlatformLinux.c#L155
This should be iterating through all the notes in your case, too.
I'm looking at that code to see whether we are missing something else.

@vitalybuka, it looks like we can have have multiple note segments in the program header, and each note segment can contain multiple notes.
Thank you @mcgrathr for pointing that out.
I think your case is like that, and I'll push a fix for that.