This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profgen] Support perf script without parsing MMap events
ClosedPublic

Authored by wlei on Jul 29 2021, 12:41 PM.

Details

Summary

This change supports to run without parsing MMap binary loading events instead it always assumes binary is loaded at the preferred address. This is used when we have assured no binary load address changes or we have pre-processed the addresses resolution. Warn if there's interior mmap event but without leading mmap events.

Diff Detail

Event Timeline

wlei created this revision.Jul 29 2021, 12:41 PM
wlei requested review of this revision.Jul 29 2021, 12:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 29 2021, 12:41 PM
wlei edited the summary of this revision. (Show Details)Jul 29 2021, 12:51 PM
wlei added reviewers: hoy, wenlei, wmi.
wlei retitled this revision from [llvm-profgen] Ignore parsing MMap events to [llvm-profgen] An option to ignore parsing MMap events.Jul 29 2021, 12:54 PM
wlei edited the summary of this revision. (Show Details)

Having both "--show-mmap-events --ignore-mmap-events" seems a bit counter-intuitive.
If we already preprocessed the mmap events, could we also assume that we have the right pids and hence don't need --show-mmap-events?

hoy added a comment.Jul 29 2021, 1:05 PM

Can the mmap events be treated as optional, so when they present we parse them otherwise just move forward to other events?

wlei added a comment.Jul 29 2021, 1:13 PM

Having both "--show-mmap-events --ignore-mmap-events" seems a bit counter-intuitive.
If we already preprocessed the mmap events, could we also assume that we have the right pids and hence don't need --show-mmap-events?

Thanks for the feedback, the --show-mmap-events here is different from the linux perf script, here is to show the mmap events paring results, only for debugging use I think.

And in this test, see the directive: ; CHECK-IGNORE-NOT: Mmap: Binary /home/a.out loaded at 0x400000 which means --show-mmap-events won't work with --ignore-mmap-events.

I can add a warning here if it's confusing.

wlei added a comment.Jul 29 2021, 1:24 PM

Can the mmap events be treated as optional, so when they present we parse them otherwise just move forward to other events?

That might have issues with the leading samples before we meet a MMAP, they won't know which binary it belongs to. Currently If not specify ignore-mmap-events, we will always check the Addr2Binary table which is empty without mmap events parsed.

hoy added a comment.Jul 29 2021, 1:34 PM

Can the mmap events be treated as optional, so when they present we parse them otherwise just move forward to other events?

That might have issues with the leading samples before we meet a MMAP, they won't know which binary it belongs to. Currently If not specify ignore-mmap-events, we will always check the Addr2Binary table which is empty without mmap events parsed.

Can this be fixed? Without mmap events we can just use the preferred address as the load address.

It sounds a bit unclear to me by giving ignore-mmap-events when the map events are there. That means we basically do not trust those events.

wlei added a comment.Jul 29 2021, 1:47 PM

Can the mmap events be treated as optional, so when they present we parse them otherwise just move forward to other events?

That might have issues with the leading samples before we meet a MMAP, they won't know which binary it belongs to. Currently If not specify ignore-mmap-events, we will always check the Addr2Binary table which is empty without mmap events parsed.

Can this be fixed? Without mmap events we can just use the preferred address as the load address.

It sounds a bit unclear to me by giving ignore-mmap-events when the map events are there. That means we basically do not trust those events.

Understood! My concern is if the scripts has mmap events and only the leading sample missed the load address, they still can be loaded at a different base address which is not the preferred address. it's wrong base address in that case,

How about change it to always-trust-mmap-events, with this, the leading no-matched samples will be dropped. otherwise, it will use preferred address as load address.

hoy added a comment.Jul 29 2021, 1:52 PM

Can the mmap events be treated as optional, so when they present we parse them otherwise just move forward to other events?

That might have issues with the leading samples before we meet a MMAP, they won't know which binary it belongs to. Currently If not specify ignore-mmap-events, we will always check the Addr2Binary table which is empty without mmap events parsed.

Can this be fixed? Without mmap events we can just use the preferred address as the load address.

It sounds a bit unclear to me by giving ignore-mmap-events when the map events are there. That means we basically do not trust those events.

Understood! My concern is if the scripts has mmap events and only the leading sample missed the load address, they still can be loaded at a different base address which is not the preferred address. it's wrong base address in that case,

If that's a real case, we'll need to fix that. For now I think we want to cover to cases: 1. with leading mmaps 2. without leading mmaps (assuming the preferred address is the real load address). Have you seen a third case? I'm wondering if the input is invalid.

wlei added a comment.Jul 29 2021, 1:59 PM

Can the mmap events be treated as optional, so when they present we parse them otherwise just move forward to other events?

That might have issues with the leading samples before we meet a MMAP, they won't know which binary it belongs to. Currently If not specify ignore-mmap-events, we will always check the Addr2Binary table which is empty without mmap events parsed.

Can this be fixed? Without mmap events we can just use the preferred address as the load address.

It sounds a bit unclear to me by giving ignore-mmap-events when the map events are there. That means we basically do not trust those events.

Understood! My concern is if the scripts has mmap events and only the leading sample missed the load address, they still can be loaded at a different base address which is not the preferred address. it's wrong base address in that case,

If that's a real case, we'll need to fix that. For now I think we want to cover to cases: 1. with leading mmaps 2. without leading mmaps (assuming the preferred address is the real load address). Have you seen a third case? I'm wondering if the input is invalid.

Oh, I see, you mean if turn on --show-mmap-events, it will always have the leading mmap event, otherwise it's the invalid input? I indeed haven't seen the third case before. If so, that makes sense to use preferred address when without leading mmaps.

hoy added a comment.Jul 29 2021, 2:08 PM

Can the mmap events be treated as optional, so when they present we parse them otherwise just move forward to other events?

That might have issues with the leading samples before we meet a MMAP, they won't know which binary it belongs to. Currently If not specify ignore-mmap-events, we will always check the Addr2Binary table which is empty without mmap events parsed.

Can this be fixed? Without mmap events we can just use the preferred address as the load address.

It sounds a bit unclear to me by giving ignore-mmap-events when the map events are there. That means we basically do not trust those events.

Understood! My concern is if the scripts has mmap events and only the leading sample missed the load address, they still can be loaded at a different base address which is not the preferred address. it's wrong base address in that case,

If that's a real case, we'll need to fix that. For now I think we want to cover to cases: 1. with leading mmaps 2. without leading mmaps (assuming the preferred address is the real load address). Have you seen a third case? I'm wondering if the input is invalid.

Oh, I see, you mean if turn on --show-mmap-events, it will always have the leading mmap event, otherwise it's the invalid input? I indeed haven't seen the third case before. If so, that makes sense to use preferred address when without leading mmaps.

I think --show-mmap-events should always give leading mmap events or no mmap events at all. We should warn or error out when only interior mmap events are seen.

wlei added a comment.Jul 29 2021, 2:29 PM

Can the mmap events be treated as optional, so when they present we parse them otherwise just move forward to other events?

That might have issues with the leading samples before we meet a MMAP, they won't know which binary it belongs to. Currently If not specify ignore-mmap-events, we will always check the Addr2Binary table which is empty without mmap events parsed.

Can this be fixed? Without mmap events we can just use the preferred address as the load address.

It sounds a bit unclear to me by giving ignore-mmap-events when the map events are there. That means we basically do not trust those events.

Understood! My concern is if the scripts has mmap events and only the leading sample missed the load address, they still can be loaded at a different base address which is not the preferred address. it's wrong base address in that case,

If that's a real case, we'll need to fix that. For now I think we want to cover to cases: 1. with leading mmaps 2. without leading mmaps (assuming the preferred address is the real load address). Have you seen a third case? I'm wondering if the input is invalid.

Oh, I see, you mean if turn on --show-mmap-events, it will always have the leading mmap event, otherwise it's the invalid input? I indeed haven't seen the third case before. If so, that makes sense to use preferred address when without leading mmaps.

I think --show-mmap-events should always give leading mmap events or no mmap events at all. We should warn or error out when only interior mmap events are seen.

That makes sense, let me change it, thanks for the feedback!

Can the mmap events be treated as optional, so when they present we parse them otherwise just move forward to other events?

+1. I was thinking about the same..

We can start with assuming all binaries are loaded at preferred based, then on mmap event we update the base for binary, on other event, we process them using the current base (would be preferred base if no relevant mmap is seen before), and warn if there's no mmap before that event?

The above would be the default behavior without any switch. Is that what we're trying to do? If we want we could add a switch to always require mmap and error out if we don't find mmap for a binary.

hoy added a comment.Jul 29 2021, 3:17 PM

Can the mmap events be treated as optional, so when they present we parse them otherwise just move forward to other events?

+1. I was thinking about the same..

We can start with assuming all binaries are loaded at preferred based,

This is my two cents.

then on mmap event we update the base for binary,

I was thinking about warning or erroring for this. Erroring is probably two strong. An interior mmap event without a corresponding leading event likely means the profiling system doesn't work consistently, i.e., it sometimes grabs mmap event but sometimes not.

wlei updated this revision to Diff 363174.Jul 30 2021, 11:55 AM

addressing reviewers' feedback, default to use preferred adddress as loading address without mmap events, give warning if no leading mmap events

hoy added inline comments.Jul 30 2021, 2:34 PM
llvm/tools/llvm-profgen/PerfReader.cpp
320–327

nit: prefered -> preferred

751

Instead of a prescan for leading mmap events based on limited number of lines, can we compute a flag, say EventSeenFirst, the value of which can be sample events or mmap events, during parseAndAggregateTrace and check it in parseMMap2Event?

wlei updated this revision to Diff 363222.Jul 30 2021, 3:15 PM

addressing Hongtao's feedback

wlei added inline comments.Jul 30 2021, 3:16 PM
llvm/tools/llvm-profgen/PerfReader.cpp
751

Good idea! Fixed.

wlei retitled this revision from [llvm-profgen] An option to ignore parsing MMap events to [llvm-profgen] Support perf script without parsing MMap events.Jul 30 2021, 3:24 PM
wlei edited the summary of this revision. (Show Details)
wenlei added inline comments.Jul 30 2021, 3:33 PM
llvm/tools/llvm-profgen/PerfReader.cpp
673

The term "Interior mmap event" could be confusing..

The warning I was thinking is for when LBR/stack sample is seen before a relevant mmap is seen, in which case we warn about assuming preferred based address due to not seeing corresponding mmap. It doesn't matter whether we found other mmap event later or not.

hoy added inline comments.Jul 30 2021, 3:45 PM
llvm/tools/llvm-profgen/PerfReader.cpp
673

That sounds reasonable. In that case, the warning should be issued in parseSample for the first sample that does not have a leading mmap event.

wlei updated this revision to Diff 363232.Jul 30 2021, 4:08 PM

addressing Wenlei and Hongtao's feedback

hoy accepted this revision.Jul 30 2021, 4:38 PM

LGTM, thanks.

llvm/tools/llvm-profgen/PerfReader.h
657 ↗(On Diff #363232)

Nit: add a comment to indicate what the flag is used for.

This revision is now accepted and ready to land.Jul 30 2021, 4:38 PM
wlei updated this revision to Diff 363238.Jul 30 2021, 5:05 PM

add comments

wenlei added inline comments.Jul 30 2021, 6:29 PM
llvm/tools/llvm-profgen/PerfReader.cpp
673

Sorry if I wasn't clear. But note the "relevant mmap" in my previous comment - the way EventSeen is set here doesn't guarantee that the mmap seen is relevant for the binary we're interested in. Say if we want to generate profile for foo.so, we'd still want to warn if mmap for foo.so isn't seen (we may have seen mmap for other binaries) before we have a sample that is concluded to belong to foo.so's address space based on the preferred base.

hoy added inline comments.Aug 2 2021, 9:13 AM
llvm/tools/llvm-profgen/PerfReader.cpp
673

That's more precise. EventSeen should be set after the mmap event for the current binary is analyzed.

wenlei added inline comments.Aug 2 2021, 9:15 AM
llvm/tools/llvm-profgen/PerfReader.cpp
673

As for the implementation, it's more like a MMapSeen member of ProfiledBinary.

wlei updated this revision to Diff 363506.Aug 2 2021, 9:52 AM

change to check the field in ProfiledBinary

Thanks for making the changes. It's almost ready, except two more tweaks.

llvm/tools/llvm-profgen/PerfReader.cpp
664–671

Would it be cleaner if we move this to be after line 646 - at that point, we've located the binary using top addresses from stack sample, and we're ready to warn even if we don't get any lbr extracted due to leading outgoing branches.

llvm/tools/llvm-profgen/ProfiledBinary.h
145

Do we need a three-way Optional value here? Would it work if we just always initialize it to false, and set it to true when mmap is seen for the binary?

wlei added inline comments.Aug 2 2021, 11:15 AM
llvm/tools/llvm-profgen/PerfReader.cpp
664–671

Good point!

llvm/tools/llvm-profgen/ProfiledBinary.h
145

That's to avoid redundant warnings if there're many sample without matching any leading mmap. see in line 663 isLoadedByMMap.reset(); will mark it as "reported".

wlei updated this revision to Diff 363535.Aug 2 2021, 11:32 AM

addressing Wenlei's feedback

wenlei accepted this revision.Aug 2 2021, 11:54 AM

lgtm with nit on flags. thanks!

llvm/tools/llvm-profgen/ProfiledBinary.h
145

Ok, I see, thanks for clarification. In that case, after resetting, the meaning of the flag no longer abides to its definition described in the comment above? Mmap is still not seen, but we reset it to unknown just to avoid duplicated warning.. May consider a separate flag (like MissingMMapWarned) is cleaner.

Another nit: we use upper case initial for member field - IsLoadedByMMap

wlei updated this revision to Diff 363565.Aug 2 2021, 1:59 PM

addressing Wenlei's feedback

wenlei added inline comments.Aug 2 2021, 4:16 PM
llvm/tools/llvm-profgen/PerfReader.cpp
325

With IsLoadedByMMap initialized to false, this is no longer needed? (We don't setMissingMMapWarned here either)

wlei updated this revision to Diff 363606.Aug 2 2021, 5:12 PM

remove a redundant assignment of IsLoadedByMMap

llvm/tools/llvm-profgen/PerfReader.cpp
325

Good catch, fixed!