This is an archive of the discontinued LLVM Phabricator instance.

[RFC] Rewrite the way we generate debug locations for variables.
AbandonedPublic

Authored by samsonov on Jan 30 2014, 8:20 AM.

Details

Reviewers
dblaikie
Summary

Completely change the way we turn DBG_VALUE machine instructions
into location information for local variables. This is inspired by discussion
in r128327 review thread.

We used to terminate location range for each variable at the end of the basic
block with DBG_VALUE for it. This is incorrect, e.g. the register used to hold
the variable location may be never clobbered in the entire function, so one
DBG_VALUE is enough to define the location of a variable in the whole function.

Now we actually analyze the control flow: calculate the presumed location of
a variable at the end and the beginning of each basic block. For example,
if the location of a variable is the same at the end of all predecessors of
basic block B, we may assume that we know the location of a variable in B,
and avoid terminating the location range too early.

I think it's pretty much all we can do w/o actually changing the CodeGen for
DBG_VALUE instructions.

If you agree with the direction of this patch, I should probably test it
extensively. There are many concerns:
0) Correctness. LLVM/Clang regression tests don't really check anything -

we should run it on gdb test-suite or smth. comprehensive.
  1. Debug info size. We should verify that .debug_loc section wouldn't explode.
  2. Compilation time - in the worst case we smth. like O(NumVars * NumInstructions), which can be a lot for large functions.

Diff Detail

Event Timeline

samsonov updated this revision to Unknown Object (????).Jan 30 2014, 8:21 AM

clang-formatize the new code

Quick gut-reaction: that class is big enough that it deserves its own file.
(I realize you're just looking for a sanity check of the approach - and
I'll try to express some opinion on that too, though I don't think I'm
fully qualified to be the arbiter for this increased (though probably
justified) complexity - I don't have nearly enough knowledge about the
backend, location information, and other ways to approach this problem -
I'll try to ask around)

  • David

I also /really/ wish we could get this info from register allocation or
something rather than recomputing it. CC'ing lhames here in case he's got
ideas. Will pester him via other channels as well.

I don't think we're doing the work of register allocator here, though this may look similar :) Instead of tracking the "liveness" of registers, we're calculating the "liveness" of DBG_VALUE instructions, which are tied to user variables, and only occasionally - to registers.

Hi Alexey,

thanks for tackling this problem!

as for quick gut reactions:
Having some doxygen-style comments for each of the member functions of DbgValueHistoryCalculator would make reviewing this patch a lot easier.

As for testing, in addition to the gdb test suite, there is also the lldb test suite, and if you’re looking for a place to put in more end-to-end debug-info tests, there is the llvm debuginfo-tests repository which runs on both lldb and gdb buildbots. (There are only a handful of tests in there right now, but it doesn’t have to stay that way.) On the llvm side this will need at least a test exercising llc | llvm-dwarfdump.

This is a really naive question, but it will help me understand your decisions better: would it be possible to change codegen to insert a DBG_VALUE into every MC block that post-dominates the basic block that contained the original dbg.declare/value intrinsic?

cheers,
adrian

Hi Adrian,

Sorry for the lack of doxygen-comments for functions, I'll add them with the next version of the patch.

This is a really naive question, but it will help me understand your decisions better: would it be possible to change codegen to insert a DBG_VALUE into every MC block that post-dominates the basic block that contained the original dbg.declare/value intrinsic?

This might make things a bit easier, but:
a) this would bloat the MachineFunction, and IIRC David mentioned that some people complain about the number of DBG_VALUE calls already.
b) This might be confusing. E.g.

BB1:
  DBG_VALUE: var "x" is in %rax
  clobber %rax
  jmp BB2
BB2:
  // we probably shouldn't insert DBG_VALUE for "x" here, it's inaccessible.

or

BB1:
  DBG_VALUE: var "x" is in %rax
  jmp BB3
BB2:
  DBG_VALUE: var "x" is in %rbx
  jmp BB3
BB3:
  // what DBG_VALUE for "x" should we insert here?

We probably can add a pass to CodeGen that would perform similar (to this patch) kind of analysis and add additional DBG_VALUE calls, making debug info generation more straightforward. Here, instead, I try to not make any assumptions about the CodeGen output and do the best I can, w/o modifying the code.

I really really really wish I had time to review this now.
For the moment some kibitzing based on prior experience.

-----Original Message-----
From: llvm-commits-bounces@cs.uiuc.edu [mailto:llvm-commits-
bounces@cs.uiuc.edu] On Behalf Of Alexey Samsonov
Sent: Thursday, January 30, 2014 8:20 AM
To: dblaikie@gmail.com; samsonov@google.com
Cc: llvm-commits@cs.uiuc.edu
Subject: [PATCH] [RFC] Rewrite the way we generate debug locations for
variables.

Hi dblaikie,

Completely change the way we turn DBG_VALUE machine instructions
into location information for local variables. This is inspired by
discussion
in r128327 review thread.

We used to terminate location range for each variable at the end of the
basic
block with DBG_VALUE for it. This is incorrect, e.g. the register used
to hold
the variable location may be never clobbered in the entire function, so
one
DBG_VALUE is enough to define the location of a variable in the whole
function.

Terminating at that point is the safe thing to do. Working out
when it's safe not to terminate is the tricky part, yeah.

Now we actually analyze the control flow: calculate the presumed
location of
a variable at the end and the beginning of each basic block. For
example,
if the location of a variable is the same at the end of all predecessors
of
basic block B, we may assume that we know the location of a variable in
B,

Sure.

and avoid terminating the location range too early.

Whoa. Looking at predecessors can tell you where something is on
block entry, but that's not the same as being able to not terminate
a range. Predecessors are a control-flow thing, ranges are a
physical-layout thing. Maybe you account for this in your patch,
like I said I haven't looked at it yet, but it's a trap for the
unwary.

--paulr

I think it's pretty much all we can do w/o actually changing the CodeGen
for
DBG_VALUE instructions.

If you agree with the direction of this patch, I should probably test it
extensively. There are many concerns:
0) Correctness. LLVM/Clang regression tests don't really check anything

  • we should run it on gdb test-suite or smth. comprehensive.
  • Debug info size. We should verify that .debug_loc section wouldn't

explode.

  1. Compilation time - in the worst case we smth. like O(NumVars * NumInstructions), which can be a lot for large functions.

http://llvm-reviews.chandlerc.com/D2658

Files:

lib/CodeGen/AsmPrinter/DwarfDebug.cpp

-----Original Message-----
From: llvm-commits-bounces@cs.uiuc.edu [mailto:llvm-commits-
bounces@cs.uiuc.edu] On Behalf Of Adrian Prantl
Sent: Thursday, January 30, 2014 12:41 PM
To: reviews+D2658+public+165c5c32e55e8f4a@llvm-reviews.chandlerc.com
Cc: llvm-commits@cs.uiuc.edu
Subject: Re: [PATCH] [RFC] Rewrite the way we generate debug locations
for variables.

samsonov abandoned this revision.Jun 9 2015, 10:44 AM