This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo][LICM] Drop DebugLoc from IntrinsicInst when hoisting
ClosedPublic

Authored by jmmartinez on Sep 22 2022, 5:20 AM.

Details

Summary

The DebugLoc is conserved when hoisting function calls, to ensure the
DIScope is preserved if inlining occurs.

This commit drops the DebugLoc in the case the call is an intrinsic
call that won't be lowered into a function call.

Diff Detail

Event Timeline

jmmartinez created this revision.Sep 22 2022, 5:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 22 2022, 5:20 AM
jmmartinez requested review of this revision.Sep 22 2022, 5:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 22 2022, 5:20 AM

What's the motivation for the change - reduced debug info size by having fewer zero locations?

What's the motivation for the change - reduced debug info size by having fewer zero locations?

The idea is to reduce the amount zero locations since they can be confusing for the users.

I thought that dropping the debug location for these kind of intrinsics would consistent with what happens for the rest of the instructions. However, I'm fairly new to the debug-info so I'm not confident about the impact that these changes may have.

Offhand, handling non-call intrinsic functions the same way as hoisting any other instruction seems like the way to go.

The test case looks like it could be more focused, it doesn't look complicated enough to need to be generated from C source.

What's the motivation for the change - reduced debug info size by having fewer zero locations?

The idea is to reduce the amount zero locations since they can be confusing for the users.

At least GDB mostly ignores line zero, I think - so what sort of user confusion are you encountering/trying to address? Good to know what the use cases are, etc.

I thought that dropping the debug location for these kind of intrinsics would consistent with what happens for the rest of the instructions. However, I'm fairly new to the debug-info so I'm not confident about the impact that these changes may have.

Yeah, fair enough - works OK for me.

Offhand, handling non-call intrinsic functions the same way as hoisting any other instruction seems like the way to go.

Yeah, sounds OK

The test case looks like it could be more focused, it doesn't look complicated enough to need to be generated from C source.

Yep, if there's something simpler that'd be great.

rnk added a subscriber: rnk.Sep 26 2022, 2:33 PM

Offhand, handling non-call intrinsic functions the same way as hoisting any other instruction seems like the way to go.

+1,

The test case looks like it could be more focused, it doesn't look complicated enough to need to be generated from C source.

llvm/lib/IR/DebugInfo.cpp
834

We now have three instances of setDebugLoc(DebugLoc()) in this function. I think the "early return" guidance is leading us to code duplication.

Can you please refactor this so that we calculate the conditions under which we want to apply a line zero location, and have that be the early return case? It is, if this is an intrinsic which may lower to a call, or a non-intrinsic function call.

  • Updated dropLocation() to avoid redudant setDebugLoc(DebugLoc()); return;
  • Simplify the licm-hoist-intrinsic-debug-loc.ll test

What's the motivation for the change - reduced debug info size by having fewer zero locations?

The idea is to reduce the amount zero locations since they can be confusing for the users.

At least GDB mostly ignores line zero, I think - so what sort of user confusion are you encountering/trying to address? Good to know what the use cases are, etc.

The issue was raised, in two separate occasions, from people developing static-analysis tools that maps the assembly back to the source code (for example, to map register spills back to source-code).

llvm/lib/IR/DebugInfo.cpp
834

Updated to avoid code duplication. Thanks!

dblaikie accepted this revision.Sep 27 2022, 10:23 AM

What's the motivation for the change - reduced debug info size by having fewer zero locations?

The idea is to reduce the amount zero locations since they can be confusing for the users.

At least GDB mostly ignores line zero, I think - so what sort of user confusion are you encountering/trying to address? Good to know what the use cases are, etc.

The issue was raised, in two separate occasions, from people developing static-analysis tools that maps the assembly back to the source code (for example, to map register spills back to source-code).

Setting no location doesn't really make the instructions reliable though - they'll be arbitrary, based on whatever instructions happen to come before them. While it's useful to ignore location 0 to provide that flow-on location behavior for an interactive debugger (rather than stepping back and forth to "I don't know where I am" to "I'm on line 5", etc) so maybe this is suggesting that those users would like data that isn't available & still won't be reliably available with this change & might cause such tools to go from "we don't have an answer" to "now we sometimes have the wrong answer and we don't know which cases are reliable and which aren't" - those tools could implement "pretend line 0 has a flow-on location" and be able to show that to the user as "this is a best guess/might be wrong" and other places that don't have line zero might be more reliable. (I mean, not a lot more reliable, we're probably using no-location in a bunch of other places and the line table's probably too expensive to encode every no-location as line 0)

But, yeah, this is consistent with the existing code here and elsewhere that uses no location, I guess. Just some concerns about what it means that someone's finding this to be a problem ^.

This revision is now accepted and ready to land.Sep 27 2022, 10:23 AM

Well, yes, there are different scenarios for consumers of the line table. Profilers really should care most about "why are we in this block" rather than "where exactly did this instruction come from" while someone using addr2line to try to track down a trapping instruction would really want the most accurate provenance possible. Line info can't realistically meet all needs equally. Until we get to a line table that can comfortably address all the needs, we have to make choices.
In this particular case, adding non-call-intrinsics to the set of instructions that are already handled a particular way seems like the most consistent behavior we can have, and consistency seems like a positive thing.

jmmartinez marked an inline comment as done.Sep 29 2022, 1:30 AM
This revision was landed with ongoing or failed builds.Sep 30 2022, 2:25 AM
This revision was automatically updated to reflect the committed changes.