This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO][llvm-profgen] Handle return to external transition.
ClosedPublic

Authored by hoy on Jun 18 2021, 9:31 AM.

Details

Summary

In a callback case, a return from internal code, say A, to external runtime can happen. The external runtime can then call back to another internal routine, say B. Making an artificial branch that looks like a return from A to B can confuse the unwinder to treat the instruction before B as the call instruction.

Diff Detail

Event Timeline

hoy created this revision.Jun 18 2021, 9:31 AM
hoy requested review of this revision.Jun 18 2021, 9:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 18 2021, 9:31 AM
wenlei added inline comments.Jun 21 2021, 5:29 PM
llvm/tools/llvm-profgen/PerfReader.cpp
464

If we have a transition from A to B, which is in the form of A calling <external> and <external> calling B, we would form artificial branch from A to B as well. But the same issue with unwinder still exists, right?

So IIUC, the bail out condition should be A and B are not the same frame/function? In order for artificial branch to work with unwinder, the original execution flow can't involve extra external frames.

hoy added inline comments.Jun 21 2021, 5:50 PM
llvm/tools/llvm-profgen/PerfReader.cpp
464

The processing order here is in reversed timing order. E..g, we are processing a trace from latest to oldest like:

[C, D] [E, F] [ext, B] [ext, ext] .... , [A, ext], and now we are at [A, ext], without this change we would group [ext, B] (cached by PrevTrDst) and [A, ext] to form an artificial branch. With this change, we only keep processed traces up to [E, F].

Checking A and B are not in the same function can narrow down the cases. But even if they are in the same function, returning from A to runtime and then form to A ends up with an artificial branch meaning A returning to A can still be confusing, since there may not be a callsite corresponding to that.

hoy added inline comments.Jun 21 2021, 5:59 PM
llvm/tools/llvm-profgen/PerfReader.cpp
464

If we have a transition from A to B, which is in the form of A calling <external> and <external> calling B, we would form artificial branch from A to B as well. But the same issue with unwinder still exists, right?

This is fine so far. The problem with returning from A to B is that it causes the unwinder to treat the instruction in front of B is a call instruction, which is not always the case.

wenlei added inline comments.Jun 21 2021, 6:14 PM
llvm/tools/llvm-profgen/PerfReader.cpp
464

Checking A and B are not in the same function can narrow down the cases. But even if they are in the same function, returning from A to runtime and then form to A ends up with an artificial branch meaning A returning to A can still be confusing, since there may not be a callsite corresponding to that.

Fair point.

If we have a transition from A to B, which is in the form of A calling <external> and <external> calling B, we would form artificial branch from A to B as well. But the same issue with unwinder still exists, right?

This is fine so far. The problem with returning from A to B is that it causes the unwinder to treat the instruction in front of B is a call instruction, which is not always the case.

For this case, we don't have the problem of expecting a non-existing call site, but I was wondering if unwinder can yield correct results. When unwinding through artificial branch A->B, we see it's a call and will pop the frame of B, however in reality, we should pop frames for both B and <external>?

hoy added inline comments.Jun 21 2021, 6:29 PM
llvm/tools/llvm-profgen/PerfReader.cpp
464

If there is a preceding return from B to somewhere which pushes a frame, then the artificial call from A to B should naturally pop it. This likely means calling into the runtime from A, and then calling back to user code B, and then returning to user code, which is likely to happen due to tail calls.

If there is no preceding return in LBR, the call stack should be empty since the call stack trace truncation happens at external calling B point. The leftover traces will be treated without a calling context.

wenlei accepted this revision.Jun 21 2021, 11:07 PM

lgtm, thanks.

llvm/tools/llvm-profgen/PerfReader.cpp
464

ok, forgot about the truncation, A->external->B will never exist for calling context.

This revision is now accepted and ready to land.Jun 21 2021, 11:07 PM
wmi added inline comments.Jun 22 2021, 8:49 AM
llvm/tools/llvm-profgen/PerfReader.cpp
464

Didn't follow the related change closely so I have some basic questions.

When we need artificial branch?

In which case the truncation will happen?

464
hoy added inline comments.Jun 22 2021, 9:24 AM
llvm/tools/llvm-profgen/PerfReader.cpp
464

An artificial branch is used to model a round-trip transition from user code to runtime and back. For example, for a LBR trace like [C, D] [E, F] [ext, B] [ext, ext] .... , [A, ext], [M,N], [P,Q], an artificial branch [A, B] is created to connect the trace before and after the user/external transition, so that the original LBR trace is made use of maximally.

Here by truncation I mean the callstack trace truncation. A hybrid trace may look like

	          40057d
	          4005b9
	    7f67469af555
                  400579

0x40059f/0x400553/P/-/-/0 0x40059f/0x400553/P/-/-/0 0x40059f/0x400553/P/-/-/0 0x7f67469af555/0x4005b98/P/-/-/0 0x400578/0x7f67469af530/P/-/-/0 0x40099f/0x400588/P/-/-/0

Note that the return address (0x7f67469af555) sitting in the middle of the call stack indicates an external call back to user code. The callstack will be truncated to just the upper two entries since there isn't a good way to use a calling context with external code in the compiler. However, we don't truncate the LBR traces at the point of external calls. We'd like to fully use the trace before and after the transition, thought they may only be under a partial calling context.

hoy retitled this revision from [CSSPGO][llvm-profgen] Handle reurn to external transition. to [CSSPGO][llvm-profgen] Handle return to external transition..Jun 22 2021, 9:24 AM
wmi added inline comments.Jun 22 2021, 12:08 PM
llvm/tools/llvm-profgen/PerfReader.cpp
464

Thanks for the detailed explanation.

For artificial branch, sounds like you drop the parts belonging to external. I can understand for the branch records which are not calls/rets, it is ok to drop them since it is unrelated to the current binary. But if the branch records are calls/rets, they may correlate with the frames in the unwinding stack. I guess that is why you also have callstack trace truncation.

What if we keep all the calls/rets related branch records no matter they are ingoing/outgoing/external and keep all the frames in the callstack, can we match them and get the correct context easier?

wenlei added inline comments.Jun 22 2021, 12:17 PM
llvm/tools/llvm-profgen/PerfReader.cpp
464

External code may have frame pointer omitted, and stack sample from external code can be inaccurate. Additionally, we can't really leverage external frames in the context in the compiler for inlining.

wmi accepted this revision.Jun 22 2021, 2:33 PM
wmi added inline comments.
llvm/tools/llvm-profgen/PerfReader.cpp
464

I see. Thanks for explaining it.

This revision was landed with ongoing or failed builds.Jun 22 2021, 4:25 PM
This revision was automatically updated to reflect the committed changes.