This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO][llvm-profgen] Ignore LBR records after interrupt transition
ClosedPublic

Authored by hoy on Jun 14 2021, 6:34 PM.

Details

Summary

If we have seen an inwards transition from external code to internal code, but not a following outwards transition, the inwards transition is likely due to interrupt which is usually unpaired. Ignore current and subsequent entries since they are likely from an unrelated pre-interrupt context.

LBR records from different interrupt context are unrelated and they should not be mixed together. Currenlty the OS does this for task-scheduling interrupt but not for all interrupts.

Diff Detail

Event Timeline

hoy created this revision.Jun 14 2021, 6:34 PM
hoy requested review of this revision.Jun 14 2021, 6:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2021, 6:34 PM
wenlei added inline comments.Jun 15 2021, 12:18 PM
llvm/tools/llvm-profgen/PerfReader.cpp
457

What do you mean by unrelated pre-interrupt context? After interrupt is handled, execution is supposed to continue from the same spot. If we only missed one of the interrupt transition branch, the pre-interrupt IP should be right before the current IP. If that's not the case, then what's missing is more than a transition branch from the pair - it then sounds like the problem is LBR buffer is not being cleared on context switch.

hoy added inline comments.Jun 15 2021, 12:32 PM
llvm/tools/llvm-profgen/PerfReader.cpp
457

Yes, LBR buffer is not cleared for all interrupts but task-scheduling interrupt. Interrupt does not reflect a normal execution flow. Pre-interrupt code and post-resume code should have the same context. For task-scheduling interrupt, the OS knows how to regroup them so that we have consistent execution flow in LBR for a single thread. Looks like it doesn't do that for other type of interrupts. We are working around here to break the LBR trace at the interrupt point.

On the second thought, we should also break at the resume point.

wenlei added inline comments.Jun 15 2021, 12:43 PM
llvm/tools/llvm-profgen/PerfReader.cpp
457

Yes, LBR buffer is not cleared for all interrupts but task-scheduling interrupt.

I'm not sure if this is expected. My understanding of the expectation is that, since save/restore LBR states could be expensive, we don't need to do that for all interrupts and context switches. But the fall back is to clear LBR buffer, with that we shouldn't see inconsistent LBR entries. We never saw such inconsistent LBR on windows, and we also never saw that on linux until today. We need to follow up with kernel folks.

On the second thought, we should also break at the resume point.

As a workaround, yes we need to bail out for both case.

486–487

Would be clearer if we do this:

if (IsOutwards) {
 ...
}
else {
  // Not finding outward transition after seeing an inward transition ...
  if (PrevTrDst)
     break;
}
hoy added inline comments.Jun 17 2021, 6:54 PM
llvm/tools/llvm-profgen/PerfReader.cpp
486–487

Looks better, thanks.

hoy updated this revision to Diff 352895.Jun 17 2021, 6:55 PM

Addressing back-from-interrupt case.

wenlei accepted this revision.Jun 18 2021, 10:31 AM

lgtm, thanks.

This revision is now accepted and ready to land.Jun 18 2021, 10:31 AM
wlei accepted this revision.Jun 18 2021, 11:02 AM

LGTM, thanks!

hoy retitled this revision from [CSSPGO][llvm-profgen] Ignore LBR records after interrup transition to [CSSPGO][llvm-profgen] Ignore LBR records after interrupt transition.Jun 18 2021, 12:04 PM
This revision was landed with ongoing or failed builds.Jun 18 2021, 12:14 PM
This revision was automatically updated to reflect the committed changes.