In a callback case, a return from internal code, say A, to external runtime can happen. The external runtime can then call back to another internal routine, say B. Making an artificial branch that looks like a return from A to B can confuse the unwinder to treat the instruction before B as the call instruction.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/tools/llvm-profgen/PerfReader.cpp | ||
---|---|---|
464 | If we have a transition from A to B, which is in the form of A calling <external> and <external> calling B, we would form artificial branch from A to B as well. But the same issue with unwinder still exists, right? So IIUC, the bail out condition should be A and B are not the same frame/function? In order for artificial branch to work with unwinder, the original execution flow can't involve extra external frames. |
llvm/tools/llvm-profgen/PerfReader.cpp | ||
---|---|---|
464 | The processing order here is in reversed timing order. E..g, we are processing a trace from latest to oldest like: [C, D] [E, F] [ext, B] [ext, ext] .... , [A, ext], and now we are at [A, ext], without this change we would group [ext, B] (cached by PrevTrDst) and [A, ext] to form an artificial branch. With this change, we only keep processed traces up to [E, F]. Checking A and B are not in the same function can narrow down the cases. But even if they are in the same function, returning from A to runtime and then form to A ends up with an artificial branch meaning A returning to A can still be confusing, since there may not be a callsite corresponding to that. |
llvm/tools/llvm-profgen/PerfReader.cpp | ||
---|---|---|
464 |
This is fine so far. The problem with returning from A to B is that it causes the unwinder to treat the instruction in front of B is a call instruction, which is not always the case. |
llvm/tools/llvm-profgen/PerfReader.cpp | ||
---|---|---|
464 |
Fair point.
For this case, we don't have the problem of expecting a non-existing call site, but I was wondering if unwinder can yield correct results. When unwinding through artificial branch A->B, we see it's a call and will pop the frame of B, however in reality, we should pop frames for both B and <external>? |
llvm/tools/llvm-profgen/PerfReader.cpp | ||
---|---|---|
464 | If there is a preceding return from B to somewhere which pushes a frame, then the artificial call from A to B should naturally pop it. This likely means calling into the runtime from A, and then calling back to user code B, and then returning to user code, which is likely to happen due to tail calls. If there is no preceding return in LBR, the call stack should be empty since the call stack trace truncation happens at external calling B point. The leftover traces will be treated without a calling context. |
lgtm, thanks.
llvm/tools/llvm-profgen/PerfReader.cpp | ||
---|---|---|
464 | ok, forgot about the truncation, A->external->B will never exist for calling context. |
llvm/tools/llvm-profgen/PerfReader.cpp | ||
---|---|---|
464 | An artificial branch is used to model a round-trip transition from user code to runtime and back. For example, for a LBR trace like [C, D] [E, F] [ext, B] [ext, ext] .... , [A, ext], [M,N], [P,Q], an artificial branch [A, B] is created to connect the trace before and after the user/external transition, so that the original LBR trace is made use of maximally. Here by truncation I mean the callstack trace truncation. A hybrid trace may look like 40057d 4005b9 7f67469af555 400579 0x40059f/0x400553/P/-/-/0 0x40059f/0x400553/P/-/-/0 0x40059f/0x400553/P/-/-/0 0x7f67469af555/0x4005b98/P/-/-/0 0x400578/0x7f67469af530/P/-/-/0 0x40099f/0x400588/P/-/-/0 Note that the return address (0x7f67469af555) sitting in the middle of the call stack indicates an external call back to user code. The callstack will be truncated to just the upper two entries since there isn't a good way to use a calling context with external code in the compiler. However, we don't truncate the LBR traces at the point of external calls. We'd like to fully use the trace before and after the transition, thought they may only be under a partial calling context. |
llvm/tools/llvm-profgen/PerfReader.cpp | ||
---|---|---|
464 | Thanks for the detailed explanation. For artificial branch, sounds like you drop the parts belonging to external. I can understand for the branch records which are not calls/rets, it is ok to drop them since it is unrelated to the current binary. But if the branch records are calls/rets, they may correlate with the frames in the unwinding stack. I guess that is why you also have callstack trace truncation. What if we keep all the calls/rets related branch records no matter they are ingoing/outgoing/external and keep all the frames in the callstack, can we match them and get the correct context easier? |
llvm/tools/llvm-profgen/PerfReader.cpp | ||
---|---|---|
464 | External code may have frame pointer omitted, and stack sample from external code can be inaccurate. Additionally, we can't really leverage external frames in the context in the compiler for inlining. |
llvm/tools/llvm-profgen/PerfReader.cpp | ||
---|---|---|
464 | I see. Thanks for explaining it. |
If we have a transition from A to B, which is in the form of A calling <external> and <external> calling B, we would form artificial branch from A to B as well. But the same issue with unwinder still exists, right?
So IIUC, the bail out condition should be A and B are not the same frame/function? In order for artificial branch to work with unwinder, the original execution flow can't involve extra external frames.