This is an archive of the discontinued LLVM Phabricator instance.

Rewrite calculateDbgValueHistory to make it (hopefully) more transparent.
ClosedPublic

Authored by samsonov on May 2 2014, 3:29 PM.

Details

Reviewers
dblaikie
Summary

This change preserves the original algorithm of generating history
for user variables, but makes it more clear.

High-level description of algorithm:
Scan all the machine basic blocks and machine instructions in the order
they are emitted to the object file. Do the following:

  1. If we see a DBG_VALUE instruction, add it to the history of the

corresponding user variable. Keep track of all user variables, whose
locations are described by a register.

  1. If we see a regular instruction, look at all the registers it clobbers,

and terminate the location range for all variables described by these registers.

  1. At the end of the basic block, terminate location ranges for all

user variables described by some register.

(3) is too restrictive, and may result in a poor debug info.
For example, the variable location can be described by %rsp, or any other
register which is never clobbered in a function. I think that handling
this special case may improve situation for many real-life programs and
plan to address this in future commits.

Diff Detail

Event Timeline

samsonov updated this revision to Diff 9046.May 2 2014, 3:29 PM
samsonov retitled this revision from to Rewrite calculateDbgValueHistory to make it (hopefully) more transparent..
samsonov updated this object.
samsonov edited the test plan for this revision. (Show Details)
samsonov added a reviewer: dblaikie.
samsonov added subscribers: echristo, Unknown Object (MLST).
dblaikie edited edge metadata.May 2 2014, 3:32 PM

Given that this is a substantial rewrite/change to the algorithm, would it be worthwhile to make it even more readable to begin with. What I have in mind (but perhaps it doesn't make sense - this is just an off-the-cuff idea) is trying to break down the new algorithm into some smaller functions with meaningful names that would make it easier to read through the code.

(since I'm going to have to read through the new code and understand it from scratch, since it doesn't map simply to the old algorithm, even if they produce the same result)

samsonov updated this revision to Diff 9052.May 2 2014, 4:51 PM
samsonov edited edge metadata.

Split the algorithm into several functions to make it more readable.

Uploaded a new version. Hope it would be more understandable.

-----Original Message-----
From: llvm-commits-bounces@cs.uiuc.edu [mailto:llvm-commits-
bounces@cs.uiuc.edu] On Behalf Of Alexey Samsonov

Hi dblaikie,

This change preserves the original algorithm of generating history
for user variables, but makes it more clear.

High-level description of algorithm:
Scan all the machine basic blocks and machine instructions in the order
they are emitted to the object file. Do the following:

  1. If we see a DBG_VALUE instruction, add it to the history of the

corresponding user variable. Keep track of all user variables, whose
locations are described by a register.

  1. If we see a regular instruction, look at all the registers it

clobbers,
and terminate the location range for all variables described by these
registers.

  1. At the end of the basic block, terminate location ranges for all

user variables described by some register.

(3) is too restrictive, and may result in a poor debug info.
For example, the variable location can be described by %rsp, or any
other
register which is never clobbered in a function. I think that handling
this special case may improve situation for many real-life programs and
plan to address this in future commits.

Taking no stand on the patch, just on that last statement...

(3) is correctly conservative, because variable-in-register liveness
depends on control flow but debug-value ranges are expressed in terms
of physical order. Getting this truly right requires a dataflow
analysis to track liveness across block boundaries.

There may be special cases of small multi-block functions where some
variable lives in a register that is never clobbered, but I question
how often that will come up in real programs.

If a variable has a "home" on the stack, that's a different story.
Naively you can fall back to the stack location after a register
copy is clobbered and probably 90%-95% of the time you're fine.
--paulr

http://reviews.llvm.org/D3597

Files:

lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.cpp
dblaikie added inline comments.May 2 2014, 5:40 PM
lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.cpp
136

Not sure about making a local class here - I'd probably just pass the map around to a few functions for now - but I haven't read this all yet, perhaps it makes more sense the way you've got it here.

138

unnecessary default ctor - you can just omit this

142

Yeah, sort of feel like the RegDescribedVars could be passed through and this function (calculate) could be inlined into calculateDbgValueHistory. Then it wouldn't be weird about whether this DbgValueHistoryCalculator can be reused or not (it's not reused in its one use case, but it clears RegDescribedVars so it could be reused..)

144

This long condition could probably be inverted to a series of early returns with comments describing what each case represents.

As it stands it's a pretty opaque query of various operands and their types.

147

A const reference to an iterator is a bit weird. I'd probably use it by value (or is an iterator also an instruction, in which case I'd probably write it as "auto *" or "Instructior *")

148

This function could return Optional<unsigned> rather than bool + unsigned out parameter.

153

This 'if' is pretty long & the various things it's doing aren't exactly obvious - any way to split them out into some more named functions that would make it easier to read the high level algorithm? Maybe it's just some higher level comments, I'm not sure.

I'm not sure it would be better, but this could be inverted for less nesting:

if (!MI.isDebugValue()) {
  handleClobberedRegs(MI, TRI, Result);
  continue;
}

...

162

Is this a new feature of your algorithm, or did the old algorithm do this too?

172

The current LLVM convention is to not repeat the function name in the documentation comment (also doxy comments should be / not ). either omit it entirely, or if use \brief if the comment is brief (I forget when/why \brief is used/not used, though)

175

clobberVarsUsingRegister? clobberRegisterUses?

"RegDescribed" is a bit sub-optimal in general, I think, perhaps.

Maybe "clobberHistoryForInRegVars"?

196

I take it "RegDescribed" means "variables in registers" or just "variables not on the stack"? (what about variables that are indirected through a register?)

210

Sidenote: I wonder if MCRegAliasIterator could be a real iterator one day & this could use range-based for.

samsonov updated this revision to Diff 9054.May 2 2014, 7:04 PM

Address David's comments.

lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.cpp
136

Got rid of the class as suggested.

138

Done.

142

Done.

144

Done.

147

Dropped the reference.

148

Done.

153

Used last suggestion to reduce the nesting. Moved recalculating contents of RegDescribedVars to a separate function.

162

This data structure is different in my implementation. Current code just uses std::vector of size TRI->getNumRegs() instead of map. Yes, it assumes that there's at most one variable for each register.

172

Done (added \brief to comments).

175

Renamed to clobberRegisterUses/clobberAllRegisterUses.

196

It means variables in a register (DBG_VALUE %reg %reg), or indirected through register (DBG_VALUE %reg %imm). I can't think of a better name for it.

210

Ack.

dblaikie added inline comments.May 5 2014, 9:58 AM
lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.cpp
38

Should this just be an assertion?

40

When does this happen? (when is the first parameter not a register? or zero? (and what does zero mean - your comment above doesn't seem to describe this case, does it?))

Another assertion instead?

43

This condition's still a bit hard to read (not sure if there's a better way, though) - but I also don't understand it. In both plain register and indirect uses, we'd still want to clobber the variable if the register in operand 0 was used, right? So why do we need to check whether this is indirect or direct?

77

variables

101

'handle' is fairly vague - maybe "terminateLiveRanges"? (maybe it's best to avoid "live ranges" terminology so as not to confuse it with real register allocation live ranges? Is there an alternative/better way we could refer to the ranges of a debug location? DWARF just refers to them as "location list entries" but the things we're building here aren't quite a DWARF location list entry yet... though they will be)

(I'd ever consider naming clobberRegisterUses/handlerClobberedRegs the same, just overloaded - just that one takes a specific register and the other takes a whole instruction & extracts the registers (ie: the functionality the have now, but they're really doing the same thing, kind of))

124

You can use *PrevReg to get the value out of the Optional, if you like.

127

and here

142

This variable and its use seem confusing to me.

So the following loop loops over all MI in the MBB, does work if it's a debug_value, unless it's a trailing debug_value (a debug_value after all non-debug_value instructions in the block).

Do we have a lot of those? Is that worth a special case? I mean we should certainly prune zero-length location ranges, but they could arise from non-trailing debug_value anyway, right?

161

leaving a loop way down here seems a bit subtle - the idea here is that if we're in the last block of the function and there are no trailing debug instructions in the MBB?

I don't understand that second part of the condition, I suppose. I think I get the first part - if we're at the last block, let's not terminate anything, it'll be terminated by the end of the function anyway. (& I guess that produces correct debug info later on, even though we didn't properly terminate the range?)

samsonov updated this revision to Diff 9087.May 5 2014, 2:16 PM

Address more comments.

lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.cpp
38

Done.

40

Yes, there can be instructions like:

DBG_VALUE %noreg, 0, !"var"

which mark that debug location of "var" is now unknown. I've added a comment.

43

Yeah, let's just get rid of this block :)

77

Done.

101

Sounds good, renamed both to clobberRegisterUses.

124

Done.

127

Done.

142

Yes, this happens. Suppose we have the following machine basic block:

<begin MBB>
%R15 = ...
DBG_VALUE %R15, %noreg, !"var"
<end MBB>

Here we don't need to add zero-length location range for "var", and it's easier to just drop the range at this point.

In the history produced by this algorithm, location range is started by DBG_VALUE instruction and is terminated by some following instruction. If we start the new range by adding

DBG_VALUE %R15, %noreg, !"var"

to the history, how do we terminate it? Nothing follows it in the same machine basic block.

161

Yes. We're not required to terminate DBG_VALUE entries in the last MBB - we just assume the location is valid until the end of the function. There is a code which deals with this in DwarfDebug::collectVariableInfo().

The second condition in the if-statement just checks if there is an instruction we can use to terminate location ranges. If there is no such instruction - no problem we shouldn't have started any new location ranges anyway, see "if (SeenLastMBBInstr && isDescribedByReg(MI))" above.

dblaikie added inline comments.May 7 2014, 10:57 AM
lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.cpp
31

Is this ever called with a non_DBG_VALUE? It doesn't look like it. Should this just be an assertion?

36

Is the first operand ever not a register? I thought that was an invariant of these intrinsics.

So it looks like isDescribedByReg boils down to "return MI.getOperand(0).getReg()", and some assertions?

(assuming the return type changes to "unsigned" rather than "Optional<unsigned>" - correctly returning 0/false when there is no register use (ie: this is a range terminator))

142

"easier" in terms of computational work, but not necessarily easier in terms of being able to understand this algorithm, right?

I'd like to make this as legible as possible, because it is rather confusing. The comment you opened this patch with described a fairly clear algorithm, but that's not visible to me in the code, which is really unfortunate.

That being said, I know this is just a precursor to other refactoring, so I shouldn't dwell on making this "perfect".

The issue you describe here where there's no way to terminate an instruction would come up with intermediate values too:

DBG_VALUE %R15, %noreg, !"var"
DBG_VALUE %noreg, 0, !"var"
...

While I realize that's probably not a thing we ever create, it seems like having the algorithm tolerate that general case would probably make it more legible. (I don't know for sure, I haven't written it out that way)

Plucking ideas out of the air, maybe having a "pending start of range" list that is built whenever a DBG_VALUE is seen, but isn't added to the real list until a non-DBG_VALUE instruction is seen (that instruction would then form the start of the range).

That way the above example falls out by replacing (Or removing, in the case of the end-of-range %noreg) elements in the pending list, then throwing out the whole pending list when we get to the end, since anything in there never saw an instruction to start the range at. (and if we do see an instruction we build the start-of-real-range for each element in the pending list and clear out the pending list)

Alternatively if ranges can cover DBG_VALUE intrinsicts (since they start at the intrinsic, rather than the next instruction) - what actually breaks if they /only/ cover DBG_VALUE intrinsics? (can we just build these ranges, even if they're silly - to simplify the algorithm - or is there something that breaks if we do?)

samsonov updated this revision to Diff 9195.May 7 2014, 4:36 PM

Address more comments.

lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.cpp
31

It's used once for non-DBG_VALUE instruction, but I agree that it's easier to make an assertion here.

36

No. See getDebugLocValue() in DwarfDebug.cpp which actually distinguishes between different kinds of MachineOperands used as a first argument of DBG_VALUE. I've updated the comment in this function.

142

OK, I'm not opposed to generating lots of empty location ranges to simplify the algorithm. The example you mention:

DBG_VALUE %R15, %noreg, !"var"
DBG_VALUE %noreg, 0, !"var"

should already be handled fine - both of these instruction appear in History for "var", and we will construct the range which starts before first instruction, and ends before the second one (it will of course be empty, as instructions are transient).

That said, for simplicity we may simply *always* terminate location ranges starting with register-using DBG_VALUE instruction by appending the last instruction in the MBB. Why not? This would require a minor change in contract, though: now History vector for Variable may contain arbitrary instructions - if the instruction is DBG_VALUE *for the same variable*, we assume it's a start of a new location range. Otherwise it's a clobbering instruction.

See the updated code.

(side note: probably it makes sense to factor out the code which turns History vector into actual DebugLocEntry list, as it is somewhat connected to the way we construct/represent the History vector).

(gentle ping)

David, could you take a look in case you have more comments?

dblaikie added inline comments.May 16 2014, 2:44 PM
lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.cpp
36

OK. I see, it's either a register or a constant.

I'd still probably switch to just using an unsigned return value, dropping the Optional. Since zero is "not in a register" anyway.

if (!MI.getOperand(0).isReg())
  return 0;
return MI.getOperand(0).getReg();

I know it's a bit more subtle, using 0 as the "not in a register" state, but having two null states (Optional-with-value-zero and Optional-without-value) isn't ideal either...

160

This is the normal for-loop exit condition, right?

Rather than breaking early (I found this hard to parse because I wasn't sure if it was exiting the loop early under some interesting condition - having to think about the addresses of elements, etc, was a bit taxing) could we just roll the condition into the clobberAllRegistersUses instead?

I suppose the condition will still be hard to pass, but I won't be trying to think about non-linear control flow at the same time.

Maybe the comment could be clarified here. "<existing comment> unless it's the last basic block, in which case let their liveness run off to the end of the function" or something?

if (!MBB.empty() && &MBB != &MF->back())
lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1229

What's the reason for these extra conditions? (the getDebugVariable() == DV checks here and below)

samsonov updated this revision to Diff 9500.May 16 2014, 3:37 PM

Address more comments.

lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.cpp
36

Done.

160

Agree. Done.

lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1229

After this change, the location range for the variable DV:

  1. starts at a DBG_VALUE instruction for DV
  2. ends at another instruction. In particular, it can be DBG_VALUE instruction for some other variable.
dblaikie accepted this revision.May 16 2014, 4:02 PM
dblaikie edited edge metadata.

Looks good - thanks for your patience!

This revision is now accepted and ready to land.May 16 2014, 4:02 PM
dblaikie added inline comments.May 16 2014, 4:09 PM
lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1229

Not to hold up the submission of this patch, but I can't quite grasp your explanation here.

Are you suggesting that this change (this particular part of your patch, or the patch in general) is changing behavior? It'd be helpful to keep the behavior change separate, if possible.

Thanks for the review! Submitted as r209225.

lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1229

This shouldn't be a behavior change, we're slightly changing the assumptions about which instructions can be used for tracking the location of a variable (but this would result in the same labels and .debug_loc contents). Anyway, watching the bots.

samsonov closed this revision.May 20 2014, 11:45 AM