This is an archive of the discontinued LLVM Phabricator instance.

[JumpThreading] merge debug info when merging select+br
ClosedPublic

Authored by nickdesaulniers on Apr 8 2021, 1:41 PM.

Details

Summary

Jump threading can replace select then unconditional branch with
conditional branch, but when doing so loses debug info.

This destructive transform is eventually leading to a failed Verifier
run during full LTO builds of the Linux kernel with CFI and KCOV
enabled, as reported in PR39531.

ModuleSanitizerCoveragePass will insert calls to
sanitizer_cov_trace_pc, and sometimes split critical edges,
using whatever debug info may or may not exist for the branch for
the added libcall. Since we can inline calls to
sanitizer_cov_trace_pc due to LTO, this can lead to the error
observed in PR39531 when the debug info isn't propagated to
the libcall, because of prior destructive transforms that failed to
retain debug info.

Diff Detail

Event Timeline

nickdesaulniers created this revision.Apr 8 2021, 1:41 PM
nickdesaulniers requested review of this revision.Apr 8 2021, 1:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 8 2021, 1:41 PM
nickdesaulniers added inline comments.Apr 8 2021, 1:44 PM
llvm/test/Transforms/JumpThreading/branch-debug-info2.ll
14

I probably can remove all these meaningless calls. Reducing this case from LTO was a dog. -passes=debugify helped a lot though.

vsk added a subscriber: vsk.Apr 8 2021, 2:03 PM

If you'd like, you can pass -debugify-level=locations to elide those dbg.value calls.

vedant

Link: b/184085493
Link: https://bugs.llvm.org/show_bug.cgi?id=39531
Link: https://www.llvm.org/docs/HowToUpdateDebugInfo.html#when-to-merge-instruction-locations
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

We don't have any tag styling like this in LLVM - maybe include some sentences about these links? (& if you like you can write the bug number with "PR39531") - oh, and at least Google tries not to include links to internal bugs in upstream contributions (not an entirely consistent policy, Apple folks regularly link to their internal bug tracker & I'm sure there are some google bugs referenced upstream, though we try not to). And "Signed off by" seems inaccurate (& Phabricator will add "Reviewed By" I think anyway, which is usually the thing we might want to record)

Could you explain further why this could result in a verifier failure? I don't know of any reason a branch would /need/ a debug location. Call instructions need locations due to some inlining concerns/constraints, but I don't think we require all/other instructions to have locations.

llvm/test/Transforms/JumpThreading/branch-debug-info2.ll
27

This looks incorrect - if this is the result of merging the select and br on lines 27 and 29, then the location can't be the existing !29 (because that would pick one line over the other, which can, in general, cause problems for profiles). Perhaps !29 on the output isn't !29 on the input - in which case (actually in any case) it'd be good to not hardcode !29, instead using a pattern match on the dbg attachment, and verifying the specific contents of that dbg attachment in another CHECK line. (I'm guessing it should show that this location will have line 0 to resolve the ambiguity)

Link: b/184085493
Link: https://bugs.llvm.org/show_bug.cgi?id=39531
Link: https://www.llvm.org/docs/HowToUpdateDebugInfo.html#when-to-merge-instruction-locations
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

We don't have any tag styling like this in LLVM - maybe include some sentences about these links? (& if you like you can write the bug number with "PR39531")

I agree that Signed-off-by: should be dropped, and Link: should be replaced with inline references. And concise message is a good property.
(I am not quite used to the verbose Linux kernel style.)

oh, and at least Google tries not to include links to internal bugs in upstream contributions (not an entirely consistent policy, Apple folks regularly link to their internal bug tracker & I'm sure there are some google bugs referenced upstream, though we try not to). And "Signed off by" seems inaccurate (& Phabricator will add "Reviewed By" I think anyway, which is usually the thing we might want to record)

Yeah, mostly a courtesy to other contributors, because links to internal things are completely not useful to others.
You can find such links more in other projects, e.g. chromium (https://chromium.googlesource.com/chromium/src/+/4fa3314af759654ed05c1d085210e92655cc7713)
but for llvm-project such links are rare. I don't object to including a link, though.

nickdesaulniers edited the summary of this revision. (Show Details)Apr 8 2021, 4:39 PM
nickdesaulniers added a reviewer: MaskRay.
nickdesaulniers removed a subscriber: MaskRay.
  • regen test with -debugify-level=locations, pattern match in test
nickdesaulniers marked an inline comment as done.Apr 8 2021, 4:50 PM
nickdesaulniers edited the summary of this revision. (Show Details)

The syntax the LLVM project uses to reference bugs is "PR<number>" (eg: PR39531) not "pr/<number>", if you could update the patch description to match that format. (this is also supported on llvm.org, where you can go to llvm.org/PR39531 to reach that bug, for instance)

So reading the bug, it sounds like the branch's location is used for the location of a sanitizer coverage function call ( __sanitizer_cov_trace_pc ) and that's where the verifier trips up? I guess it'd probably be worth fixing that code to do something to add a valid location to those function calls too - unless we want to broaden this constraint to apply to require that all branch instructions have debug locations too (which I'd be on the fence about - I was the one who implemented the "calls must have locations if the function they're in have a location" and its found a bunch of bugs, but it's been a non-trivial cost and I'm not sure it'd reasonably extend to all branches).

If we are going to require that all branches have locations (in functions that have debug info) then we probably want to add that to the verifier, so these aren't found late/only when using sanitizers, for instance.

If we aren't, then the sanitizer shouldn't rely on this guarantee/should be able to cope correctly when it a branch does not have a location.

llvm/test/Transforms/JumpThreading/branch-debug-info2.ll
23–24

If you like, you can include the ! in the match, eg: [[DBG:![0-9]+]] ... [[DBG]] = !DILocation... (no hard rule either way - lots of examples in both ways in LLVM's tests I'm sure, though I find it a bit neater to include the !, avoids needing to include it in both places or accidentally omit it in the second match & allow other prefixes (eg: !345 = ... could match [[THING]] = ... where THING = 45))

nickdesaulniers edited the summary of this revision. (Show Details)Apr 8 2021, 5:23 PM
  • move ! check into match expression
nickdesaulniers marked an inline comment as done.Apr 8 2021, 5:25 PM

I guess it'd probably be worth fixing that code to do something to add a valid location to those function calls too

Yeah, that sounds like the best long term solution to me; otherwise we'll be playing whack-a-mole with destructive transforms losing debug and results in confusing-for-developers error messages. When splitting a critical edge, what location do we use if neither contain debug info? If only one is missing debug info, I'd guess that Instruction::applyMergedLocation will pick the one existing one?

then the sanitizer shouldn't rely on this guarantee

It's not so much the sanitizer relying on this, but the inliner. See http://reviews.llvm.org/rG93035c8f47589590b65041603524f48a7c007e1f which added it. LTO just exposes it since normally we can't/don't inline callees from compiler-rt, IIUC.

then we probably want to add that to the verifier

And then have to fix all of the bugs it finds...? A worthy cause, but perhaps a severe undertaking (as you alluded to with "calls must have locations if the function they're in have a location").

I guess it'd probably be worth fixing that code to do something to add a valid location to those function calls too

Yeah, that sounds like the best long term solution to me; otherwise we'll be playing whack-a-mole with destructive transforms losing debug and results in confusing-for-developers error messages. When splitting a critical edge, what location do we use if neither contain debug info? If only one is missing debug info, I'd guess that Instruction::applyMergedLocation will pick the one existing one?

Created https://reviews.llvm.org/D100158 to followup and pursue that idea (though this issue addressed by this patch should still be fixed IMO). It looks like llvm::SplitCriticalEdge utilizes the debug location from the predecessors branch instruction; if it has none, oh well. So my previous question about multiple locations doesn't make much sense, as we only have 1 edge.

I guess it'd probably be worth fixing that code to do something to add a valid location to those function calls too

Yeah, that sounds like the best long term solution to me; otherwise we'll be playing whack-a-mole with destructive transforms losing debug and results in confusing-for-developers error messages. When splitting a critical edge, what location do we use if neither contain debug info? If only one is missing debug info, I'd guess that Instruction::applyMergedLocation will pick the one existing one?

Not necessarily - if the code is going into another basic block, we shouldn't preserve the existing location

then the sanitizer shouldn't rely on this guarantee

It's not so much the sanitizer relying on this, but the inliner. See http://reviews.llvm.org/rG93035c8f47589590b65041603524f48a7c007e1f which added it. LTO just exposes it since normally we can't/don't inline callees from compiler-rt, IIUC.

Sorry, to phrase this more clearly - yes, for the sake of the inliner we added an invariant/constraint that calls from functions that have a !dbg attachment where the call may be inlined must have a !dbg attachment that has a scope chain that leads to the same DISubprogram as the caller's !dbg attachment.

The sanitizer is relying on a further guarantee that any instruction it builds a call from must have this requirement too - that guarantee isn't... guaranteed (ie: it's not enforced by the verifier/hasn't been determined to be a requirement/invariant of the IR).

then we probably want to add that to the verifier

And then have to fix all of the bugs it finds...? A worthy cause, but perhaps a severe undertaking (as you alluded to with "calls must have locations if the function they're in have a location").

If they're all real bugs, I'm OK saying we should go that way and fix them as we go - that's what we did with the calls for the inliner & fixed a bunch of location quality bugs along the way. If there's a for-sure example where this new guarantee cannot be met (ie: a case that's definitely not a bug (because an instruction is synthesized from nothing or moved over between basic blocks or the like - whatever the things we do that let instructions not have locations) that isn't a bug, then it's clear we should allow/support it and stop the sanitizers from depending on this unguaranteed constraint.

But yeah, not really going to insist on pursuing it until the counterexample is provided - but it should be one or the other at least: Fix compiler-rt to not require the guarantee that all instructions it builds calls from have locations, or enforce that guarantee in the verifier.

Since https://reviews.llvm.org/D100158 is approved, are we still interested in this patch fixing what looks to me like a loss of debug info?

Since https://reviews.llvm.org/D100158 is approved, are we still interested in this patch fixing what looks to me like a loss of debug info?

I'd be up for getting this fixed still - perhaps a more motivating test case would be if the two instructions had the same location, then the merged instruction should have that same location- a clear goodness, compared to the fuzzier "why does it matter that we managed to provide a line zero location on this instruction".

  • test matching dbg merge, rather than mismatch dbg merge

heh, sure, done. PTAL

dblaikie accepted this revision.Apr 12 2021, 5:37 PM

Perfect - thanks a lot!

This revision is now accepted and ready to land.Apr 12 2021, 5:37 PM