This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profgen] Decouple artificial branch from LBR parser and fix external address related issues
ClosedPublic

Authored by wlei on Jan 25 2022, 12:37 PM.

Details

Summary

This patch is fixing two issues for both CS and non-CS.

  1. For external-call-internal, the head samples of the the internal function should be recorded.
  2. avoid ignoring LBR after meeting the interrupt branch for CS profile

LBR parser is shared between CS and non-CS, we found it's error-prone while dealing with artificial branch inside LBR parser. Since artificial branch is mainly used for CS profile unwinding, this patch tries to simplify LBR parser by decoupling artificial branch code from it, the concept of artificial branch is removed and split into two transitional branches(internal-to-external, external-to-internal). Then we leave all the processing of external branch to unwinder.

Specifically for unwinder, remembering that we introduce external frame in https://reviews.llvm.org/D115550. We can just take external address as a regular address and reuse current unwind function(unwindCall, unwindReturn). For a normal case, the external frame will match an external LBR, and it will be filtered out by unwindLinear without losing any context.

The data also shows that the interrupt or standalone LBR pattern(unpaired case) does exist, we choose to handle it by clearing the call stack and keeping unwinding. Here we leverage checking in unwindLinear, because a standalone LBR, no matter its type, since it doesn’t have other part to pair, it will eventually cause a wrong linear range, like [external, internal], [internal, external]. Then set the state to invalid there.

Diff Detail

Event Timeline

wlei created this revision.Jan 25 2022, 12:37 PM
wlei requested review of this revision.Jan 25 2022, 12:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 25 2022, 12:37 PM
wlei retitled this revision from [llvm-profgen] Decouple artificial branch from perfreader and fix external addr related issues to [llvm-profgen] Decouple artificial branch from LBR parser and fix external address related issues.Jan 25 2022, 1:14 PM
wlei edited the summary of this revision. (Show Details)
wlei added reviewers: hoy, wenlei.
hoy added a comment.Jan 25 2022, 6:55 PM

Thanks for working on this! The refactoring does make the code base a lot cleaner.

llvm/tools/llvm-profgen/PerfReader.cpp
256

This can be an interrupt branch which does not correspond to any external frame on the call stack. Once such a branch is see, should the state have already been reset to dummy? I'm not seeing the resetting code anywhere. Maybe the stack resetting code being removed from `VirtualUnwinder::unwindCall can be reused to handle such case?

302

Would be nice to make the two cases fit into the call and return case below, respectively.

llvm/tools/llvm-profgen/ProfiledBinary.h
81 ↗(On Diff #403007)

This could be a potential legal address. Use UINT64_MAX instead?

wlei updated this revision to Diff 424978.Apr 25 2022, 11:05 AM
wlei edited the summary of this revision. (Show Details)

Updating D118177: [llvm-profgen] Decouple artificial branch from LBR parser and fix external address related issues

Herald added a project: Restricted Project. · View Herald TranscriptApr 25 2022, 11:05 AM
wlei added inline comments.Apr 25 2022, 11:14 AM
llvm/tools/llvm-profgen/PerfReader.cpp
256

Good point. There can be two patterns for the interrupt branch, I fixed them in other parts.

  1. LBR target is an external address caused by interrupt. This will be a regular branch, Resetting the call stack there if a regular branch's target is external address. see PerfReader.cpp:293
  1. LBR source is an external address caused by interrupt. This will go into unwindCall, we will find a mismatch call source and the call stack leaf. Resetting the call stack there. see PerfReader.cpp:67
302

Sounds good. Merging all the code into isCallState or isReturnState, now the four pattern: external-call-internal/internal-call-external/external-return-internal/internal-return-external are all handled by them.

llvm/tools/llvm-profgen/ProfiledBinary.h
81 ↗(On Diff #403007)

Okay, Before I didn't use UINT64_MAX because there can be add operations to address value and adding value to UINT64_MAX caused overflow, we might always keep in mind of this. I changed to UINT64_MAX and fixed one overflow issue, also wondering if we have a better number for this so that overflow can't be issues in the future.

hoy added inline comments.Apr 25 2022, 2:24 PM
llvm/tools/llvm-profgen/PerfReader.cpp
253

nit: external-to-internal

870–873

Please leave a comment about why such branch is still useful, i.e, when sourceOffset is external.

llvm/tools/llvm-profgen/PerfReader.h
490

Is this detecting a general return or just return from external?

522

Can we have an internal-to-external return here? I'm wondering if it should be filtered out.

llvm/tools/llvm-profgen/ProfileGenerator.cpp
771

Can this be fused into populateBoundarySamplesForFunction?

Looking at how this is handled on the non-CS side, especially in populateBoundarySamplesForAllFunctions, there is a check if (!FrameVec.empty()) , can we do something similar in populateBoundarySamplesForFunction? Of course we need the check here to bypass populateBodySamplesForFunction call, or perhaps we can move the getFunctionProfileForContext call into the populating functions.

wlei updated this revision to Diff 425059.Apr 25 2022, 4:41 PM

Updating D118177: [llvm-profgen] Decouple artificial branch from LBR parser and fix external address related issues

llvm/tools/llvm-profgen/PerfReader.cpp
253

fixed!

870–873

comments added.

llvm/tools/llvm-profgen/PerfReader.h
490

This is a heuristic to only detect returning from external because the source is external(unknown), so we leverage target address checking if it's the next inst of call inst.

For the general return, the source is known, so we use addressIsReturn to determine if it's a return.

522

The internal-to-external return pattern is handled by the right above branch(if (Binary->addressIsReturn(State.getCurrentLBRSource()))) which checking the source if it's a return.

llvm/tools/llvm-profgen/ProfileGenerator.cpp
771

Yes, fused into populateBoundarySamplesForFunction.

A little difference is I didn't move getFunctionProfileForContext into the populating functions because this is an optimization we did before to reuse the FunctionSamples of one context, otherwise we need to fetch the FunctionSamples twice(one in populateBodySamplesForFunction, one in populateBoundarySamplesForFunction).

Here I just directly bypassed the populateBodySamplesForFunction if the context is empty.

hoy added inline comments.Apr 25 2022, 5:00 PM
llvm/tools/llvm-profgen/PerfReader.h
490

I meant to say the name of this function doesn't exactly match its code. Maybe including the isSourceExternal check in this function?

522

Ok, a return to external is considered as a return. But in unwindReturn, the pushing external frame code is being removed. Should we keep it?

wlei updated this revision to Diff 425098.Apr 25 2022, 6:50 PM

Updating D118177: [llvm-profgen] Decouple artificial branch from LBR parser and fix external address related issues

wlei added inline comments.Apr 25 2022, 6:51 PM
llvm/tools/llvm-profgen/PerfReader.h
490

Ah, I see, the function name is not clear. Renamed and merged isSourceExternal

522

Before the ArtificialBranch is actually from two branches [internal-to-external, external-to-internal], so at that time, we take two operations in one call to unwindReturn, first push an external frame, then push an internal address. the stack is [external, internal]

Now after decoupling ArtificialBranch into two branches, it calls unwindReturn twice: In the first call to unwindReturn , say the internal-to-external will push the first address which is the external address(the stack is [external]) and for the second call to unwindReturn , the external-to-internal will push the internal address.(the stack is [external, internal]).

The current one and the previous are expected to be equivalent, just right now it's split into two steps.

Same to`unwindCall`, you can also find it removes the code to pop the extra external frame.

wlei updated this revision to Diff 425100.Apr 25 2022, 6:52 PM

Updating D118177: [llvm-profgen] Decouple artificial branch from LBR parser and fix external address related issues

hoy added inline comments.Apr 26 2022, 12:19 AM
llvm/tools/llvm-profgen/PerfReader.cpp
307

What kind of branch is not a call or return but with an external target address?

llvm/tools/llvm-profgen/PerfReader.h
522

Thanks for the explanation. I was trying to enumerate new cases we may see here. Previously with artificial branches, unpaired internal/external transitions may be thrown away, as well as samples following them. Now with this change, we never throw away any samples except for external/external transitions. This means the unwinder will see new patterns like unpaired I/E or E/I transitions, and I'm trying to make sense of them.

Perhaps we should bring this offline for a complete understanding and verification of the problem (you may already did that).

wenlei added inline comments.Apr 26 2022, 3:40 PM
llvm/tools/llvm-profgen/PerfReader.h
495

Check isValidState first.

512

Good that we don't need to check for artificial return, but perhaps we could assert that the target is next to a call for internal targets?

llvm/tools/llvm-profgen/ProfiledBinary.h
81 ↗(On Diff #403007)

Technically, both and 1 and UINT64_MAX can be legal address, but practically that probably will never happen. While overflow is an issue with UINT64_MAX, I think 1 is fine?

wlei updated this revision to Diff 425611.Apr 27 2022, 2:36 PM

Updating D118177: [llvm-profgen] Decouple artificial branch from LBR parser and fix external address related issues

llvm/tools/llvm-profgen/PerfReader.h
495

It's already checked in the only caller, see isCallState.

512

Not sure I fully get your point, we have the code in unwindReturn which call getCallAddrFromFrameAddr to get the target's next call inst. If the return variable(CallAddr) is not a call inst, it just return 0. Do you mean to assert on this zero value, I guess currently it won't pass for many cases. I think before we intent to silently give a zero, then the call stack is truncated by this wrong match.

void VirtualUnwinder::unwindReturn(UnwindState &State) {
  // Add extra frame as we unwind through the return
  const LBREntry &LBR = State.getCurrentLBR();
  uint64_t CallAddr = Binary->getCallAddrFromFrameAddr(LBR.Target);
  State.switchToFrame(CallAddr);
  ...
llvm/tools/llvm-profgen/ProfiledBinary.h
81 ↗(On Diff #403007)

This is the address not offset, it will be minus the base address, this is more common than "add" overflow. Both 1 and UINT64_MAX can overflow, that's why I choose the value in the middle 0x1000000000000000ULL. But I'm fine with either value.

wlei added a comment.Apr 27 2022, 2:51 PM

As we discussed offline, noted the summary here:

The data shows we the interrupt or standalone LBR pattern does exist, we should handle it(clear the call stack and keep unwinding).

Now we don’t choose to rely on detecting this LBR based on its type(call, return, jmp), we will leverage checking in linearUnwind, because a standalone LBR, no matter its type, since it doesn’t have other part to pair, it will eventually cause a wrong linear range, like [external, internal], [internal, external]. Then set the state to invalid there, this should make the code cleaner.

hoy added a comment.Apr 27 2022, 6:06 PM

The current implementation looks much cleaner now, thanks for working on this!

llvm/tools/llvm-profgen/ProfileGenerator.cpp
852

nit: CalleeCtx.append(ContextId.begin(), ContextId.end());

wlei updated this revision to Diff 425677.Apr 27 2022, 6:54 PM

Updating D118177: [llvm-profgen] Decouple artificial branch from LBR parser and fix external address related issues

llvm/tools/llvm-profgen/ProfileGenerator.cpp
852

Done, thanks!

hoy added inline comments.Apr 27 2022, 10:56 PM
llvm/tools/llvm-profgen/ProfiledBinary.h
81 ↗(On Diff #403007)

How can UINT64_MAX overflow? Do we add anything to it?

I guess with value 1 we would need special handling when converting it to offset right?

Actually IIRC, the upper half of the 64-bit mem space, i.e, FFFF8000'00000000 through FFFFFFFF'FFFFFFFF, is used as kernel space on Linux, so using UINT64_MAX to represent external address should be fine.

wlei added inline comments.Apr 27 2022, 11:10 PM
llvm/tools/llvm-profgen/ProfiledBinary.h
81 ↗(On Diff #403007)

Uh.. when it detect bogus trace LeafAddr >= LBRLeaf + 0x100, I already fixed this.

yeah, UINT64_MAX overflow are very unlikely, it should be fine.

1 is also okay I guess, minus a number, then become a negative value, but still doesn't conflict to binary offset.

hoy accepted this revision.Apr 27 2022, 11:37 PM

LGTM.

llvm/tools/llvm-profgen/ProfiledBinary.h
81 ↗(On Diff #403007)

I see. Then either value works.

This revision is now accepted and ready to land.Apr 27 2022, 11:37 PM
wenlei added inline comments.Apr 28 2022, 9:55 AM
llvm/tools/llvm-profgen/PerfReader.cpp
86

This doesn't damage the call stack, so the state should remain valid, just skip recording this range.

There's an implicit assumption that is we also filter out external stack frames which matches external LBRs. Perhaps worth pointing out very briefly.

509–515

I think we need to be a bit more disciplined about adding these warnings now. As you can see, for any llvm-profgen run, we're always printing out a bunch of warnings, which can be over whelming to users (non-compiler devs).

It's ok to add these two in this change, but we probably need a follow up change to restructure those warnings: 1) try to see if we can trim some, 2) perhaps hide some less important ones under something like a verbose mode, or different warning level.

866

The notion of ExternalOffset is a bit weird -- it translates a special constant into something special but not a const... Wondering if it could be avoid, e.g. replace X == ExternalOffset with Binary->offsetIsCode(X)?

llvm/tools/llvm-profgen/ProfiledBinary.h
81 ↗(On Diff #403007)

I don't have strong opinion either, but I'd suggest keep 1 if UINT64_MAX isn't better than 1. Make changes only if needed.

wlei updated this revision to Diff 425912.Apr 28 2022, 3:06 PM

Updating D118177: [llvm-profgen] Decouple artificial branch from LBR parser and fix external address related issues

llvm/tools/llvm-profgen/PerfReader.cpp
86

Comments refined.

509–515

Agreed the less important warnings should be trimmed . Will follow up with a patch.

866

Changed to use Binary->offsetIsCode

llvm/tools/llvm-profgen/ProfiledBinary.h
81 ↗(On Diff #403007)

Sounds good to use 1

wenlei accepted this revision.Apr 28 2022, 3:49 PM

lgtm, thanks.

wlei edited the summary of this revision. (Show Details)Apr 28 2022, 4:06 PM
This revision was landed with ongoing or failed builds.Apr 28 2022, 4:08 PM
This revision was automatically updated to reflect the committed changes.