This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/tools/llvm-profgen/
-
tools/
-
llvm-profgen/
4/12
PerfReader.cpp

Differential D104546

[CSSPGO][llvm-profgen] Handle return to external transition.
ClosedPublic

Authored by hoy on Jun 18 2021, 9:31 AM.

Download Raw Diff

Details

Reviewers

wenlei
wlei
wmi

Commits

rG5c8659801a49: [CSSPGO][llvm-profgen] Handle return to external transition.

Summary

In a callback case, a return from internal code, say A, to external runtime can happen. The external runtime can then call back to another internal routine, say B. Making an artificial branch that looks like a return from A to B can confuse the unwinder to treat the instruction before B as the call instruction.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hoy created this revision.Jun 18 2021, 9:31 AM

Herald added a subscriber: wenlei. · View Herald TranscriptJun 18 2021, 9:31 AM

hoy requested review of this revision.Jun 18 2021, 9:31 AM

Herald added a project: Restricted Project. · View Herald TranscriptJun 18 2021, 9:31 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

Harbormaster completed remote builds in B109949: Diff 353030.Jun 18 2021, 9:32 AM

hoy added reviewers: wenlei, wlei, wmi.Jun 18 2021, 9:37 AM

hoy added a parent revision: D104276: [CSSPGO][llvm-profgen] Ignore LBR records after interrupt transition.

wenlei added inline comments.Jun 21 2021, 5:29 PM

llvm/tools/llvm-profgen/PerfReader.cpp
464	If we have a transition from A to B, which is in the form of A calling <external> and <external> calling B, we would form artificial branch from A to B as well. But the same issue with unwinder still exists, right? So IIUC, the bail out condition should be A and B are not the same frame/function? In order for artificial branch to work with unwinder, the original execution flow can't involve extra external frames.

hoy added inline comments.Jun 21 2021, 5:50 PM

llvm/tools/llvm-profgen/PerfReader.cpp
464	The processing order here is in reversed timing order. E..g, we are processing a trace from latest to oldest like: [C, D] [E, F] [ext, B] [ext, ext] .... , [A, ext], and now we are at [A, ext], without this change we would group [ext, B] (cached by `PrevTrDst`) and [A, ext] to form an artificial branch. With this change, we only keep processed traces up to [E, F]. Checking A and B are not in the same function can narrow down the cases. But even if they are in the same function, returning from A to runtime and then form to A ends up with an artificial branch meaning A returning to A can still be confusing, since there may not be a callsite corresponding to that.

hoy added inline comments.Jun 21 2021, 5:59 PM

llvm/tools/llvm-profgen/PerfReader.cpp
464	If we have a transition from A to B, which is in the form of A calling <external> and <external> calling B, we would form artificial branch from A to B as well. But the same issue with unwinder still exists, right? This is fine so far. The problem with returning from A to B is that it causes the unwinder to treat the instruction in front of B is a call instruction, which is not always the case.

wenlei added inline comments.Jun 21 2021, 6:14 PM

llvm/tools/llvm-profgen/PerfReader.cpp
464	Checking A and B are not in the same function can narrow down the cases. But even if they are in the same function, returning from A to runtime and then form to A ends up with an artificial branch meaning A returning to A can still be confusing, since there may not be a callsite corresponding to that. Fair point. If we have a transition from A to B, which is in the form of A calling <external> and <external> calling B, we would form artificial branch from A to B as well. But the same issue with unwinder still exists, right? This is fine so far. The problem with returning from A to B is that it causes the unwinder to treat the instruction in front of B is a call instruction, which is not always the case. For this case, we don't have the problem of expecting a non-existing call site, but I was wondering if unwinder can yield correct results. When unwinding through artificial branch A->B, we see it's a call and will pop the frame of B, however in reality, we should pop frames for both B and <external>?

hoy added inline comments.Jun 21 2021, 6:29 PM

llvm/tools/llvm-profgen/PerfReader.cpp
464	If there is a preceding return from B to somewhere which pushes a frame, then the artificial call from A to B should naturally pop it. This likely means calling into the runtime from A, and then calling back to user code B, and then returning to user code, which is likely to happen due to tail calls. If there is no preceding return in LBR, the call stack should be empty since the call stack trace truncation happens at external calling B point. The leftover traces will be treated without a calling context.

lgtm, thanks.

llvm/tools/llvm-profgen/PerfReader.cpp
464	ok, forgot about the truncation, A->external->B will never exist for calling context.

This revision is now accepted and ready to land.Jun 21 2021, 11:07 PM

wmi added inline comments.Jun 22 2021, 8:49 AM

llvm/tools/llvm-profgen/PerfReader.cpp
464	Didn't follow the related change closely so I have some basic questions. When we need artificial branch? In which case the truncation will happen?
464

hoy added inline comments.Jun 22 2021, 9:24 AM

llvm/tools/llvm-profgen/PerfReader.cpp
464	An artificial branch is used to model a round-trip transition from user code to runtime and back. For example, for a LBR trace like `[C, D] [E, F] [ext, B] [ext, ext] .... , [A, ext], [M,N], [P,Q]`, an artificial branch `[A, B]` is created to connect the trace before and after the user/external transition, so that the original LBR trace is made use of maximally. Here by truncation I mean the callstack trace truncation. A hybrid trace may look like 40057d 4005b9 7f67469af555 400579 0x40059f/0x400553/P/-/-/0 0x40059f/0x400553/P/-/-/0 0x40059f/0x400553/P/-/-/0 0x7f67469af555/0x4005b98/P/-/-/0 0x400578/0x7f67469af530/P/-/-/0 0x40099f/0x400588/P/-/-/0 Note that the return address (0x7f67469af555) sitting in the middle of the call stack indicates an external call back to user code. The callstack will be truncated to just the upper two entries since there isn't a good way to use a calling context with external code in the compiler. However, we don't truncate the LBR traces at the point of external calls. We'd like to fully use the trace before and after the transition, thought they may only be under a partial calling context.

hoy retitled this revision from [CSSPGO][llvm-profgen] Handle reurn to external transition. to [CSSPGO][llvm-profgen] Handle return to external transition..Jun 22 2021, 9:24 AM

wmi added inline comments.Jun 22 2021, 12:08 PM

llvm/tools/llvm-profgen/PerfReader.cpp
464	Thanks for the detailed explanation. For artificial branch, sounds like you drop the parts belonging to external. I can understand for the branch records which are not calls/rets, it is ok to drop them since it is unrelated to the current binary. But if the branch records are calls/rets, they may correlate with the frames in the unwinding stack. I guess that is why you also have callstack trace truncation. What if we keep all the calls/rets related branch records no matter they are ingoing/outgoing/external and keep all the frames in the callstack, can we match them and get the correct context easier?

wenlei added inline comments.Jun 22 2021, 12:17 PM

llvm/tools/llvm-profgen/PerfReader.cpp
464	External code may have frame pointer omitted, and stack sample from external code can be inaccurate. Additionally, we can't really leverage external frames in the context in the compiler for inlining.

wmi accepted this revision.Jun 22 2021, 2:33 PM

wmi added inline comments.

llvm/tools/llvm-profgen/PerfReader.cpp
464	I see. Thanks for explaining it.

This revision was landed with ongoing or failed builds.Jun 22 2021, 4:25 PM

Closed by commit rG5c8659801a49: [CSSPGO][llvm-profgen] Handle return to external transition. (authored by hoy). · Explain Why

This revision was automatically updated to reflect the committed changes.

hoy added a commit: rG5c8659801a49: [CSSPGO][llvm-profgen] Handle return to external transition..

Revision Contents

Path

Size

llvm/

tools/

llvm-profgen/

PerfReader.cpp

10 lines

Diff 353817

llvm/tools/llvm-profgen/PerfReader.cpp

Show First 20 Lines • Show All 454 Lines • ▼ Show 20 Lines	while (Index < Records.size()) {

if (IsOutgoing) {		if (IsOutgoing) {
if (!PrevTrDst) {		if (!PrevTrDst) {
// This is unpaired outgoing jump which is likely due to interrupt or		// This is unpaired outgoing jump which is likely due to interrupt or
// incomplete LBR trace. Ignore current and subsequent entries since		// incomplete LBR trace. Ignore current and subsequent entries since
// they are likely in different contexts.		// they are likely in different contexts.
break;		break;
}		}

		if (Binary->addressIsReturn(Src)) {
		wenleiUnsubmitted Not Done Reply Inline Actions If we have a transition from A to B, which is in the form of A calling <external> and <external> calling B, we would form artificial branch from A to B as well. But the same issue with unwinder still exists, right? So IIUC, the bail out condition should be A and B are not the same frame/function? In order for artificial branch to work with unwinder, the original execution flow can't involve extra external frames. wenlei: If we have a transition from A to B, which is in the form of A calling <external> and…
		hoyAuthorUnsubmitted Done Reply Inline Actions The processing order here is in reversed timing order. E..g, we are processing a trace from latest to oldest like: [C, D] [E, F] [ext, B] [ext, ext] .... , [A, ext], and now we are at [A, ext], without this change we would group [ext, B] (cached by `PrevTrDst`) and [A, ext] to form an artificial branch. With this change, we only keep processed traces up to [E, F]. Checking A and B are not in the same function can narrow down the cases. But even if they are in the same function, returning from A to runtime and then form to A ends up with an artificial branch meaning A returning to A can still be confusing, since there may not be a callsite corresponding to that. hoy: The processing order here is in reversed timing order. E..g, we are processing a trace from…
		wmiUnsubmitted Not Done Reply Inline Actions wmi:
		hoyAuthorUnsubmitted Done Reply Inline Actions If we have a transition from A to B, which is in the form of A calling <external> and <external> calling B, we would form artificial branch from A to B as well. But the same issue with unwinder still exists, right? This is fine so far. The problem with returning from A to B is that it causes the unwinder to treat the instruction in front of B is a call instruction, which is not always the case. hoy: > If we have a transition from A to B, which is in the form of A calling <external> and…
		wenleiUnsubmitted Not Done Reply Inline Actions Checking A and B are not in the same function can narrow down the cases. But even if they are in the same function, returning from A to runtime and then form to A ends up with an artificial branch meaning A returning to A can still be confusing, since there may not be a callsite corresponding to that. Fair point. If we have a transition from A to B, which is in the form of A calling <external> and <external> calling B, we would form artificial branch from A to B as well. But the same issue with unwinder still exists, right? This is fine so far. The problem with returning from A to B is that it causes the unwinder to treat the instruction in front of B is a call instruction, which is not always the case. For this case, we don't have the problem of expecting a non-existing call site, but I was wondering if unwinder can yield correct results. When unwinding through artificial branch A->B, we see it's a call and will pop the frame of B, however in reality, we should pop frames for both B and <external>? wenlei: > Checking A and B are not in the same function can narrow down the cases. But even if they are…
		hoyAuthorUnsubmitted Done Reply Inline Actions If there is a preceding return from B to somewhere which pushes a frame, then the artificial call from A to B should naturally pop it. This likely means calling into the runtime from A, and then calling back to user code B, and then returning to user code, which is likely to happen due to tail calls. If there is no preceding return in LBR, the call stack should be empty since the call stack trace truncation happens at external calling B point. The leftover traces will be treated without a calling context. hoy: If there is a preceding return from B to somewhere which pushes a frame, then the artificial…
		wenleiUnsubmitted Not Done Reply Inline Actions ok, forgot about the truncation, A->external->B will never exist for calling context. wenlei: ok, forgot about the truncation, A->external->B will never exist for calling context.
		wmiUnsubmitted Not Done Reply Inline Actions Didn't follow the related change closely so I have some basic questions. When we need artificial branch? In which case the truncation will happen? wmi: Didn't follow the related change closely so I have some basic questions. When we need…
		hoyAuthorUnsubmitted Done Reply Inline Actions An artificial branch is used to model a round-trip transition from user code to runtime and back. For example, for a LBR trace like `[C, D] [E, F] [ext, B] [ext, ext] .... , [A, ext], [M,N], [P,Q]`, an artificial branch `[A, B]` is created to connect the trace before and after the user/external transition, so that the original LBR trace is made use of maximally. Here by truncation I mean the callstack trace truncation. A hybrid trace may look like 40057d 4005b9 7f67469af555 400579 0x40059f/0x400553/P/-/-/0 0x40059f/0x400553/P/-/-/0 0x40059f/0x400553/P/-/-/0 0x7f67469af555/0x4005b98/P/-/-/0 0x400578/0x7f67469af530/P/-/-/0 0x40099f/0x400588/P/-/-/0 Note that the return address (0x7f67469af555) sitting in the middle of the call stack indicates an external call back to user code. The callstack will be truncated to just the upper two entries since there isn't a good way to use a calling context with external code in the compiler. However, we don't truncate the LBR traces at the point of external calls. We'd like to fully use the trace before and after the transition, thought they may only be under a partial calling context. hoy: An artificial branch is used to model a round-trip transition from user code to runtime and…
		wmiUnsubmitted Not Done Reply Inline Actions Thanks for the detailed explanation. For artificial branch, sounds like you drop the parts belonging to external. I can understand for the branch records which are not calls/rets, it is ok to drop them since it is unrelated to the current binary. But if the branch records are calls/rets, they may correlate with the frames in the unwinding stack. I guess that is why you also have callstack trace truncation. What if we keep all the calls/rets related branch records no matter they are ingoing/outgoing/external and keep all the frames in the callstack, can we match them and get the correct context easier? wmi: Thanks for the detailed explanation. For artificial branch, sounds like you drop the parts…
		wenleiUnsubmitted Not Done Reply Inline Actions External code may have frame pointer omitted, and stack sample from external code can be inaccurate. Additionally, we can't really leverage external frames in the context in the compiler for inlining. wenlei: External code may have frame pointer omitted, and stack sample from external code can be…
		wmiUnsubmitted Not Done Reply Inline Actions I see. Thanks for explaining it. wmi: I see. Thanks for explaining it.
		// In a callback case, a return from internal code, say A, to external
		// runtime can happen. The external runtime can then call back to
		// another internal routine, say B. Making an artificial branch that
		// looks like a return from A to B can confuse the unwinder to treat
		// the instruction before B as the call instruction.
		break;
		}

// For transition to external code, group the Source with the next		// For transition to external code, group the Source with the next
// availabe transition target.		// availabe transition target.
Dst = PrevTrDst;		Dst = PrevTrDst;
PrevTrDst = 0;		PrevTrDst = 0;
IsArtificial = true;		IsArtificial = true;
} else {		} else {
if (PrevTrDst) {		if (PrevTrDst) {
// If we have seen an incoming transition from external code to internal		// If we have seen an incoming transition from external code to internal
▲ Show 20 Lines • Show All 222 Lines • Show Last 20 Lines