Page MenuHomePhabricator

[DebugInfo] Add new instruction and expression operator for variadic debug values
Needs ReviewPublic

Authored by StephenTozer on Jun 23 2020, 3:22 AM.

Details

Summary

Continuing the work discussed in the RFC[0], this is the next patch for implementing variadic debug values. This patch adds a new instruction that can represent variadic debug values, DBG_VALUE_VAR. This patch alone covers the addition of the instruction and a set of basic code changes in MachineInstr and a few adjacent areas, but does not correctly handle variadic debug values outside of these areas, nor does it generate them at any point; these will be introduced in later patches.

The new instruction is similar to the existing DBG_VALUE instruction, with the following differences:

  • The operands are in a different order; DBG_VALUE’s operands are Value, Offset, Variable, Expression. DBG_VALUE_VAR’s operands are Variable, Expression[, Values]*
  • Any number of values may be used in the instruction following the Variable and Expression operands. These are referred to in code as “debug operands”, and are indexed from 0 so that getDebugOperand(X) == getOperand(X+2).
  • The Expression in a DBG_VALUE_VAR must use the DW_OP_LLVM_arg operator, also added in this patch, to pass arguments into the expression: none of the values in the instruction are pushed onto the expression stack by default.

The new DW_OP_LLVM_arg operator is only valid in expressions appearing in a DBG_VALUE_VAR; it takes a single argument and pushes the debug operand at the index given by the argument onto the Expression stack. For example, in the instruction below:

DBG_VALUE_VAR !”a”, ! DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_minus), %0, %1

DW_OP_LLVM_arg, 0 has the meaning “Push the 0th debug operand (%0) onto the stack.” The whole expression thus evaluates to %0 - %1.
There are two main motivations for introducing this as a separate instruction rather than simply changing the old instruction:

  • Smaller change to MIR: by creating a new instruction we avoid invalidating old MIR, and will not cause produced MIR to be different in any cases apart from where the new instruction is needed. This simplifies the process of getting this change integrated, without preventing us from eventually removing the old instruction if desired.
  • Conciseness: DBG_VALUE_VAR instructions are more verbose than the existing instruction, which neatly covers simple direct and indirect values. By using DBG_VALUE_VAR for only the complex cases where it is needed, we avoid making debug information any more incomprehensible. This also simplifies (in my opinion) the question of whether or not to automatically push debug operands onto the expression stack (as was discussed in the original RFC).

[0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139376.html

Diff Detail

Event Timeline

StephenTozer created this revision.Jun 23 2020, 3:22 AM

@StephenTozer Thanks for this!

Is there an overlap of the DW_OP_LLVM_arg with the D70642?

DBG_VAL_VAR stands for variadic, any alternative for the name? VAR may indicate to 'variable' I think...

Another question/concern (may be too early): the Salvage Debug Info feature when supporting multiple SSA values may produce an exploded/very huge number of dbg_values I think, so we may want to introduce a magic value to cut off the input if it is too big. Have you encountered such situation?

llvm/test/CodeGen/MIR/Generic/dbg-value-var-spill.mir
55 ↗(On Diff #272659)

unused

llvm/test/CodeGen/MIR/Generic/dbg-value-var.mir
23 ↗(On Diff #272659)

unused

53 ↗(On Diff #272659)

Most of these MF attributes are unused, so we can get rid of them.

djtodoro added a subscriber: alok.Jun 23 2020, 5:33 AM

Is there an overlap of the DW_OP_LLVM_arg with the D70642?

Yes, the names here are conflicting; mainly because I didn't see a good alternative name for these ones, and none of the implicit pointer patches appear to have received any comments or updates since Jan 10th. They have similar purposes, but with the exception that this version is used to index values, while the other can also be used to index variables; given that, a reasonable conflict resolution might be to have one specifically for variables and another for debug operands.

DBG_VAL_VAR stands for variadic, any alternative for the name? VAR may indicate to 'variable' I think...

DBG_VALUE_VARIADIC seems reasonable though also quite long, but it's better than being misleading.

Another question/concern (may be too early): the Salvage Debug Info feature when supporting multiple SSA values may produce an exploded/very huge number of dbg_values I think, so we may want to introduce a magic value to cut off the input if it is too big. Have you encountered such situation?

I don't have a sample to prove that it will or will not, the full implementation is still in progress, but basic measurements of the number of debug values lost to salvaging suggests that it's not going to be a truly staggering number; even moreso when we consider that debug values relying on multiple SSA values or machine locations are more likely to have one of those values become unavailable at some point due to optimizations, losing the debug value.

DBG_VAL_VAR stands for variadic, any alternative for the name? VAR may indicate to 'variable' I think...

DBG_VALUE_VARIADIC seems reasonable though also quite long, but it's better than being misleading.

DBG_VALUE_LIST? And the predicates could be isDebugValueList and isSimpleDebugValue or something along those lines.

llvm/lib/CodeGen/MachineInstr.cpp
61

Is this a note to yourself that should be removed?

llvm/lib/CodeGen/PrologEpilogInserter.cpp
12

Is this a note to yourself that should be removed?

StephenTozer marked an inline comment as done.Jun 23 2020, 9:19 AM
StephenTozer added inline comments.
llvm/lib/CodeGen/MachineInstr.cpp
61

Yes, along with the one below.

DBG_VALUE_LIST? And the predicates could be isDebugValueList and isSimpleDebugValue or something along those lines.

"List" sounds good to me, although for the predicates something other than Simple would be good, since it's prone to overloading; perhaps just "NonList", or "Singular" or something along those lines.

Is there an overlap of the DW_OP_LLVM_arg with the D70642?

Yes, the names here are conflicting; mainly because I didn't see a good alternative name for these ones, and none of the implicit pointer patches appear to have received any comments or updates since Jan 10th. They have similar purposes, but with the exception that this version is used to index values, while the other can also be used to index variables; given that, a reasonable conflict resolution might be to have one specifically for variables and another for debug operands.

I see...
Hi @alok, do you have any input on this?

DBG_VAL_VAR stands for variadic, any alternative for the name? VAR may indicate to 'variable' I think...

DBG_VALUE_VARIADIC seems reasonable though also quite long, but it's better than being misleading.

Another question/concern (may be too early): the Salvage Debug Info feature when supporting multiple SSA values may produce an exploded/very huge number of dbg_values I think, so we may want to introduce a magic value to cut off the input if it is too big. Have you encountered such situation?

As @probinson suggested, DBG_VALUE_LIST sounds good to me.

I don't have a sample to prove that it will or will not, the full implementation is still in progress, but basic measurements of the number of debug values lost to salvaging suggests that it's not going to be a truly staggering number; even moreso when we consider that debug values relying on multiple SSA values or machine locations are more likely to have one of those values become unavailable at some point due to optimizations, losing the debug value.

I think it will trigger on any "larger" project build, e.g. llvm-project itself, even if we add support for instructions with two SSA values, but we can cut off the user-values if it reach certain number. But, as you said, it may be too early for this.

Is there an overlap of the DW_OP_LLVM_arg with the D70642?

I think the preferred way of resolving this would be to post to llvm-dev and make sure the authors of D70642 are CC'ed. As far as I can see D70642 is less space-efficient, but allows for referring to the same argument multiple times. Given the DW_OP_dup/over operators, I'm not sure if that is necessary.

Other than that, (apologies for the bike-sheddingI find the name confusing, because in the context of debug info, I'd associate "var" with "variable" and not with "variadic". DBG_VALUES, DBG_VALUE_MULTI? Or we could rename DBG_VALUE to DBG_VALUE_LEGACY and use DBG_VALUE for the new format, if we plan for it to completely replace it.

I think the preferred way of resolving this would be to post to llvm-dev and make sure the authors of D70642 are CC'ed. As far as I can see D70642 is less space-efficient, but allows for referring to the same argument multiple times. Given the DW_OP_dup/over operators, I'm not sure if that is necessary.

Drafting something up now. Also I'm not sure the DW_OP_pick operator and its various specializations are useful here; they can only be used to copy an element on the expression stack, but there is no guarantee that the required argument will actually be on the expression stack when DW_OP_pick is evaluated; if an argument is pushed on the stack and then immediately modified, the original argument can't be picked out later.

Addressing the review comments: remove the note-to-self comments, and rename DBG_VALUE_VAR -> DBG_VALUE_LIST. This could change again if people think something else would be more appropriate, but this looks to have general approval.

Also fixed the test instances of DBG_VALUE_VAR to have DW_OP_stack_value, as all actual produced instructions should be stack values.

Also /s/var/list/ in the test filenames.

Apologies for the update spam, this is the last one: remove unused lines from test.

Update: Duplicate registers in the debug operands could be an issue with the previous patch version, and duplicates aren't easy to prevent, so always treat DBG_VALUE_LISTs as potentially having them.

djtodoro added inline comments.Jul 9 2020, 6:13 AM
llvm/include/llvm/CodeGen/MachineInstr.h
483–484

\p Reg ?

493

Can we use templates to avoid duplicated code here for getDebugOperandsForReg()?

1166

Should the name of the method follow the new name of the meta instr ?

llvm/lib/CodeGen/MachineInstr.cpp
2235–2236

No need for the extra curly brackets

llvm/test/CodeGen/MIR/Generic/dbg-value-list.mir
2

Please add a top level comment describing the purpose of the test.

StephenTozer marked an inline comment as done.Jul 9 2020, 8:25 AM
StephenTozer added inline comments.
llvm/include/llvm/CodeGen/MachineInstr.h
493

We can, as long as we use a static function to hold the common code (if there's a way to do so without a static function then I'd be happy to go with that instead); the solution looks something like this:

  template <typename Operand, typename Instruction>
  static ArrayRef<Operand *> getDebugOperandsForReg(Instruction *MI, Register Reg) {
    assert(MI->isDebugValue() && "Tried to get debug operands for non-debug_value");
    SmallVector<Operand *, 2> Ops;
    for (Operand &Op : MI->debug_operands()) {
      if (Op.isReg() && Op.getReg() == Reg)
        Ops.push_back(&Op);
    }
    return Ops;
  }
  ArrayRef<const MachineOperand *> getDebugOperandsForReg(Register Reg) const {
	  return MachineInstr::getDebugOperandsForReg<const MachineOperand, const MachineInstr>(this, Reg);
  }
  ArrayRef<MachineOperand *> getDebugOperandsForReg(Register Reg) {
	  return MachineInstr::getDebugOperandsForReg<MachineOperand, MachineInstr>(this, Reg);
  }

Does this look good? It removes the duplication, it's just a bit more verbose and leaves an otherwise useless static function hanging around, unless it's moved to a private block (which is also fine but reduces readability by moving it far away from the public functions).

Address latest review comments.

djtodoro added inline comments.Jul 10 2020, 5:08 AM
llvm/include/llvm/CodeGen/MachineInstr.h
486

const does not have any impact here, since it is a return type, so it should be removed.

493

This looks good to me, thanks.

llvm/lib/Target/NVPTX/NVPTXPrologEpilogPass.cpp
79

Should be isNonListDebugValue() ?

Remove useless const, add rename that somehow escaped last diff.

Rename appendOpsToLoc->appendOpsToArg to fit the rest of the names, update isUndefDebugValue check to check all operands (pulled ahead from a patch further up the stack).

Conciseness: DBG_VALUE_VAR instructions are more verbose than the existing instruction, which neatly covers simple direct and indirect values. By using DBG_VALUE_VAR for only the complex cases where it is needed, we avoid making debug information any more incomprehensible. This also simplifies (in my opinion) the question of whether or not to automatically push debug operands onto the expression stack (as was discussed in the original RFC).

My take on this is: Given how limited the audience that reads MIR is, I think it may be better to always canonicalize to the variadic version, since it keeps the code simpler, as it only has to deal with one kind of debug intrinsic.