This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO][llvm-profgen] Fix external address issues of perf reader (leading external LBR part)
ClosedPublic

Authored by wlei on Dec 10 2021, 10:19 AM.

Details

Summary

We can have the sampling just hit into the external addresses, in that case, both the top stack frame and the latest LBR target are external addresses. For example:

	        ffffffff
 0x4006c8/0xffffffff/P/-/-/0  0x40069b/0x400670/M/-/-/0

 	          ffffffff
	          40067e
0xffffffff/0xffffffff/P/-/-/0  0x4006c8/0xffffffff/P/-/-/0  0x40069b/0x400670/M/-/-/0

Before we will ignore the entire samples. However, we found there exists some internal LBRs in the remaining part of sample, the range between them is still a valid range, we will lose some valid LBRs. Those LBRs will be unwinded based on a empty(context-less) call stack.

This change tries to fix it, instead of ignoring the entire sample, we only ignore the leading external addresses.

Note that the first outgoing LBR is useful since there is a valid range between it's source and next LBR's target.

Diff Detail

Event Timeline

wlei created this revision.Dec 10 2021, 10:19 AM
wlei requested review of this revision.Dec 10 2021, 10:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 10 2021, 10:19 AM
wlei edited the summary of this revision. (Show Details)Dec 10 2021, 10:38 AM
wlei added reviewers: hoy, wenlei.
wlei edited the summary of this revision. (Show Details)Dec 10 2021, 10:41 AM
wenlei added inline comments.Dec 11 2021, 10:16 AM
llvm/test/tools/llvm-profgen/inline-noprobe2.test
92

Ok, so this happens with existing test cases too. I didn't realize the entire function profile for small function could be missing due to external addresses even in contrived tests. Did you verify what is the external address (what function from what library) leads to samples involving quick_sort being ignored?

llvm/tools/llvm-profgen/PerfReader.cpp
256–257

A bit confused by the comment on "unwinded using a context-less frame stack" - I thought after we strip out the leading external frame and external LBR, we would still use the remaining stack as context, hence it's not context-less.

259

I think we can do some sanity check here:

  1. assert that we have a paired LBR here? ie. LBR target is external.
  2. also assert that the next frame on stack aligns with LBR source?
260

Should this be pushFrame/popFrame instead of switchToFrame depending on the instruction at the source of LBR?
Basically we are unwinding through the internal->external LBR here, which should be either a call or return, right?

612

If only leaf frames are external, can we keep the remaining frames for unwinding?

973

nit: move this into parseSample

1157

nit: top->leaf to make it explicit, for message and variable name.

wlei updated this revision to Diff 393765.Dec 12 2021, 10:56 AM
wlei edited the summary of this revision. (Show Details)

addressing feedbacks.

wlei added inline comments.Dec 12 2021, 10:57 AM
llvm/test/tools/llvm-profgen/inline-noprobe2.test
80–85

Here it's indeed fixed by this patch.

92

Sorry for the confusing. This is not related with this patch, I deleted this part by mistake in https://reviews.llvm.org/D115013. When I fixed this test case, I realized it and tried to sneak in this diff :)

llvm/tools/llvm-profgen/PerfReader.cpp
256–257

Good point. Based on this, we don't need the code here. We can have CALL/Return/JMP to external address and current unwinder all covered this, we can just reuse them. One exception is return to external code, that should be handed by the next patch. Removed this code.

259

assert that we have a paired LBR here? ie. LBR target is external.

Sounds good, added assertion in validateInitialState.

also assert that the next frame on stack aligns with LBR source?

If I understand correctly, that assumes the LBR source should be a CALL? The tricky part is we don't only have CALL into external address, we can have RETURN, JMP. One common cases is the PLT, internal call --> PLT--> external address. PLT code is a JMP.

How about to give a warning if the following CALL unwinding hit a mismatched frame.

260

Yeah, same to above comment. the current unwinder should already covered pushFrame/popFrame. One difference is the Return. For returning to external address, we don't have a target callsite, hence we can't push a valid frame. But this case should be handled by https://reviews.llvm.org/D115550 which used an artificial frame to truncate the call stack.

which should be either a call or return.

You see, it appeared that the PLT code is a JMP.

612

Sounds good, tried to keep the remaining internal call stack for unwinding.

973

fixed, thanks!

1157

Fixed

wenlei added inline comments.Dec 12 2021, 11:17 PM
llvm/tools/llvm-profgen/PerfReader.cpp
62

Actually do we need the check on isLeadingFrameExternal? Should we always expect parent frame address and LBR source to align for any calls?

Additionally, should there a one instruction shift between frame address and source/call address (see getCallAddrFromFrameAddr)?

And when mismatch happens, should we abort unwinding? It seems that at this point it's guaranteed to generate incorrect context?

How you seen such mismatch happening?

240

nit: this is more like total branches. usually LBR record is considered the full 16/32 entires of branch pairs in LBR. same applies to the warning message.

259

On 2nd thought, you're right that if the transfer is not a call, there will be mismatch, PLT is one, tail call jump can be another.

Thought for PLT, perhaps we can treat PLT as external too, so anything involved PLT will be stripped too.

I was hoping that we can have a sense of how intrusive this kind of mismatch is. Hopefully it's rare. Warning is good but perhaps PLT case can be handled?

271–272

Update the comment as it's no longer accurate - it could regular intra function branch, or artificial branch cross function boundaries when there're external functions involved (example would be good).

llvm/tools/llvm-profgen/PerfReader.h
302

nit: LeadingFrame -> LeafFrame. LeafLBR -> LastLBR. Leaf or root is used for frame, but there's no leaf or root for LBR, which isn't hierarchical. If there're other inconsistent usage of the terms, we can fix them too.

Additionally, why not check on CurrentLeafFrame instead given the function name says "Frame"?

hoy added inline comments.Dec 13 2021, 12:23 PM
llvm/tools/llvm-profgen/PerfReader.cpp
524

nit: unwinded -> unwound

532

nit: NumLeadingExternalLBR -> NumLeadingOutgoingLBR?

612

Maybe we can keep the whole call stack by replacing consecutive external frames with an artificial frame introduced in the other patch. This should be synchronized with the LBR parser where we replace consecutive external LBRs with an artificial branch.

llvm/tools/llvm-profgen/PerfReader.h
217

nit: address -> addresses

Also enum class is preferred.

305

nit: use a single return statement return !LBRStack.empty() && LBRStack[0].Target == ExternalAddr .

wlei updated this revision to Diff 394125.Dec 13 2021, 8:36 PM

addressing reviewer's feedback

wlei added inline comments.Dec 13 2021, 8:45 PM
llvm/tools/llvm-profgen/PerfReader.cpp
62

Should we always expect parent frame address and LBR source to align for any calls?

you see there is a comment to explain "// TODO: Currently we just assume all the addr that can't match the
2nd frame is in prolog/epilog.", so before we saw frame in prolog/epilog mismatch the call stack.

Additionally, should there a one instruction shift between frame address and source/call address (see getCallAddrFromFrameAddr)?

We already fix this mismatch in extractCallstack see

// We need to translate return address to call address for non-leaf frames.
if (!CallStack.empty()) {
  auto CallAddr = Binary->getCallAddrFromFrameAddr(FrameAddr);
  if (!CallAddr) {
  ....

And when mismatch happens, should we abort unwinding? It seems that at this point it's guaranteed to generate incorrect context?

Yeah, before for the frame in prolog/epilog, we didn't abort. If mismatch happen, we still can process using a context-less stack, but that will make things even complicated.

How you seen such mismatch happening?

I haven't tested it. I will add a warning for both leading frame issue and frame prolog/epilog. Maybe we can see how many cases of them to decide if we need to abort.

240

thanks for the suggestion.

259

PLT case can be handled, we can have a diff to make PLT code as external code. then we can have some statistic base on this. I will have a separated diff for PLT stuffs.

524

fixed, thanks!

532

fixed, thanks!

612

Good point. Changed to both use external branch, update in the next patch.

llvm/tools/llvm-profgen/PerfReader.h
217

Fixed, thanks

Also enum class is preferred.

It appeared that an complication issue to initialize the static const variable in ProfiledFrame

302

Thanks for the suggestion. The warning is moved to the next patch.

305

The warning is moved to the next patch, thanks!

wenlei accepted this revision.Dec 13 2021, 9:00 PM

lgtm, thanks.

llvm/tools/llvm-profgen/PerfReader.cpp
259

Dealing with PLT in separate patch sounds good. And I guess we only need to deal with it if it's important enough.

This revision is now accepted and ready to land.Dec 13 2021, 9:00 PM
hoy added inline comments.Dec 13 2021, 10:15 PM
llvm/tools/llvm-profgen/PerfReader.cpp
612

Thanks. Just to be clear, we will no longer need to break, instead, we'll continue whenever the top of stack is also an ExternalAddr. This way we could handle the following case

Ext
Ext
foo
Ext
Ext
bar
main
Ext/Ext /Ext/Ext foo/Ext foo/foo Ext/foo Ext/Ext bar/Ext bar/bar bar/bar ...

Though this should be rare, for completeness does it sounds reasonable?

wlei added inline comments.Dec 13 2021, 10:28 PM
llvm/tools/llvm-profgen/PerfReader.cpp
612

That's a good point. Only one thing: we don't know Ext/Ext is a Call or Return or Jmp. So I think we should preprocess to deduplicate the Ext in the call stack. like we also deduplicate the Ext in LBRsample.
in your case, we will get.

Ext
Ext
foo
Ext
Ext
bar
main

to

Ext
foo
Ext
bar
main
hoy added inline comments.Dec 13 2021, 10:37 PM
llvm/tools/llvm-profgen/PerfReader.cpp
612

Exactly, that is actually what I meant, i.e, to skip the current Ext frame only if the top of stack is already an Ext.

Do you think we need the other change to correctly truncate such as mixed callsite when reporting samples? Sounds good to implement there.

wlei added inline comments.Dec 13 2021, 10:44 PM
llvm/tools/llvm-profgen/PerfReader.cpp
612

Sounds good, I can move to code of truncating external addr from the 2nd patch to here. but here it's just truncate the context we need next patch to handle Call/Return of artificial branch.

hoy added inline comments.Dec 13 2021, 10:48 PM
llvm/tools/llvm-profgen/PerfReader.cpp
612

I see. Sounds good to move the truncating code here to make the current patch more complete and self-contained,

wlei updated this revision to Diff 394156.Dec 13 2021, 11:39 PM

addressing Hontao's feedback: Also keep the external address in the middle, deduplicate them. They can still be unwound(in next patch). And truncate the context for them.

hoy accepted this revision.Dec 14 2021, 8:38 AM

lgtm, thanks.

This revision was landed with ongoing or failed builds.Dec 14 2021, 4:42 PM
This revision was automatically updated to reflect the committed changes.