This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profgen] Filter out invalid debug line
ClosedPublic

Authored by wlei on Sep 20 2021, 10:03 AM.

Details

Summary

There appears some big inline numbers in the symbolizer's output which is invalid for compiler to consume. See the example below:

400618: movl  $1, %edx                          main:4294967285
40061d: subl  %eax, %edx                        main:4294967285
40061f: addl  %ecx, %edx                        bar:1 @ main:3.2
40062b: addl  %ecx, %eax                        bar:1 @ main:3.2
40062d: movl  %eax, %esi                        bar:4294967292 @ main:3.2
40062f: addl  $1, %ecx                          main:2.2
...

Here changed to filter them out in a centralized way and also adjusted some total count samples due to it.

Diff Detail

Event Timeline

wlei created this revision.Sep 20 2021, 10:03 AM
wlei requested review of this revision.Sep 20 2021, 10:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 20 2021, 10:03 AM
wlei edited the summary of this revision. (Show Details)Sep 20 2021, 10:09 AM
wlei added reviewers: hoy, wenlei, wmi.
hoy added inline comments.Sep 20 2021, 1:15 PM
llvm/test/tools/llvm-profgen/cs-preinline.test
13

Wondering what is causing the total sample to change but the body samples stay the same? The lines filtered out should not have mapped to the same context previously?

I think I've seen negative line offsets due to use of macro. If macro is used the line info may point to location where the macro is defined, then it can lead to negative line offset if we use function start line as the base.

In such cases, negative offset should still work for correlating profile back to IR.. What is the corresponding line offset you saw on relevant IR? Do they match the offset in the profile?

wlei added a comment.Sep 28 2021, 12:43 PM

I think I've seen negative line offsets due to use of macro. If macro is used the line info may point to location where the macro is defined, then it can lead to negative line offset if we use function start line as the base.

In such cases, negative offset should still work for correlating profile back to IR.. What is the corresponding line offset you saw on relevant IR? Do they match the offset in the profile?

As we discussed offline, we might have the invalid number in non-macro case and our internal tools also haven't covered this which can be a separate improvements later.

llvm/test/tools/llvm-profgen/cs-preinline.test
13

Sorry for the late reply.

This is because previously we add total sample even it's an invalid line number. With current change, the invalid line is filtered out at the early stage, so we then won't have them sum to the total number. This is causing the difference.

I think it should be minor impact by the difference since the invalid line is rare and total number itself is kind of a estimation(like it counts against all instructions but not the source code)

hoy accepted this revision.Sep 28 2021, 3:49 PM
hoy added inline comments.
llvm/tools/llvm-profgen/ProfiledBinary.cpp
543

Can we potentially lose contexts when an invalid line offset is one of the frames? Like A:-1 @ B:2 @ C:3, without this change, we could still have samples for B:2 @ C:3. But I think that's rare.

LGTM.

This revision is now accepted and ready to land.Sep 28 2021, 3:49 PM
wlei added inline comments.Sep 30 2021, 3:13 PM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
543

Can we potentially lose contexts when an invalid line offset is one of the frames? Like A:-1 @ B:2 @ C:3, without this change, we could still have samples for B:2 @ C:3. But I think that's rare.

I haven't seen those cases, it seems it only happened for leaf frame. Even it can be leaf call, there is no samples hit the callee. I can add a warning on the non-leaf frame invalid line of stack address.

wenlei accepted this revision.Oct 1 2021, 12:14 AM

I think I've seen negative line offsets due to use of macro. If macro is used the line info may point to location where the macro is defined, then it can lead to negative line offset if we use function start line as the base.

In such cases, negative offset should still work for correlating profile back to IR.. What is the corresponding line offset you saw on relevant IR? Do they match the offset in the profile?

As we discussed offline, we might have the invalid number in non-macro case and our internal tools also haven't covered this which can be a separate improvements later.

Okay, as long as this matches the behavior of original tool, this looks good. I'm also curious how google's tool deal with such negative offset though..

wlei added a comment.Oct 1 2021, 9:36 AM

I think I've seen negative line offsets due to use of macro. If macro is used the line info may point to location where the macro is defined, then it can lead to negative line offset if we use function start line as the base.

In such cases, negative offset should still work for correlating profile back to IR.. What is the corresponding line offset you saw on relevant IR? Do they match the offset in the profile?

As we discussed offline, we might have the invalid number in non-macro case and our internal tools also haven't covered this which can be a separate improvements later.

Okay, as long as this matches the behavior of original tool, this looks good. I'm also curious how google's tool deal with such negative offset though..

Yeah, it seems like google also ignore those negative offset : https://github.com/google/autofdo/blob/963a8c1f55ed86db6b909ee603a46742b3980139/source_info.h

// Return the offset using the line number and the discriminator.
static constexpr uint64_t GenerateOffset(uint32_t Line,
                                         uint32_t Discriminator) {
  return (static_cast<uint64_t>(Line) << 32) |
         static_cast<uint64_t>(Discriminator);
}

// Return the line number from the offset.
static constexpr uint32_t GetLineNumberFromOffset(uint64_t Offset) {
  return (Offset >> 32) & 0xffff;
}
This revision was landed with ongoing or failed builds.Oct 4 2021, 7:14 PM
This revision was automatically updated to reflect the committed changes.