This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Add new instruction and expression operator for variadic debug values
ClosedPublic

Authored by StephenTozer on Jun 23 2020, 3:22 AM.

Details

Summary

Continuing the work discussed in the RFC[0], this is the next patch for implementing variadic debug values. This patch adds a new instruction that can represent variadic debug values, DBG_VALUE_VAR. This patch alone covers the addition of the instruction and a set of basic code changes in MachineInstr and a few adjacent areas, but does not correctly handle variadic debug values outside of these areas, nor does it generate them at any point; these will be introduced in later patches.

The new instruction is similar to the existing DBG_VALUE instruction, with the following differences:

  • The operands are in a different order; DBG_VALUE’s operands are Value, Offset, Variable, Expression. DBG_VALUE_VAR’s operands are Variable, Expression[, Values]*
  • Any number of values may be used in the instruction following the Variable and Expression operands. These are referred to in code as “debug operands”, and are indexed from 0 so that getDebugOperand(X) == getOperand(X+2).
  • The Expression in a DBG_VALUE_VAR must use the DW_OP_LLVM_arg operator, also added in this patch, to pass arguments into the expression: none of the values in the instruction are pushed onto the expression stack by default.

The new DW_OP_LLVM_arg operator is only valid in expressions appearing in a DBG_VALUE_VAR; it takes a single argument and pushes the debug operand at the index given by the argument onto the Expression stack. For example, in the instruction below:

DBG_VALUE_VAR !”a”, ! DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_minus), %0, %1

DW_OP_LLVM_arg, 0 has the meaning “Push the 0th debug operand (%0) onto the stack.” The whole expression thus evaluates to %0 - %1.
There are two main motivations for introducing this as a separate instruction rather than simply changing the old instruction:

  • Smaller change to MIR: by creating a new instruction we avoid invalidating old MIR, and will not cause produced MIR to be different in any cases apart from where the new instruction is needed. This simplifies the process of getting this change integrated, without preventing us from eventually removing the old instruction if desired.
  • Conciseness: DBG_VALUE_VAR instructions are more verbose than the existing instruction, which neatly covers simple direct and indirect values. By using DBG_VALUE_VAR for only the complex cases where it is needed, we avoid making debug information any more incomprehensible. This also simplifies (in my opinion) the question of whether or not to automatically push debug operands onto the expression stack (as was discussed in the original RFC).

[0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139376.html

Diff Detail

Event Timeline

StephenTozer created this revision.Jun 23 2020, 3:22 AM

@StephenTozer Thanks for this!

Is there an overlap of the DW_OP_LLVM_arg with the D70642?

DBG_VAL_VAR stands for variadic, any alternative for the name? VAR may indicate to 'variable' I think...

Another question/concern (may be too early): the Salvage Debug Info feature when supporting multiple SSA values may produce an exploded/very huge number of dbg_values I think, so we may want to introduce a magic value to cut off the input if it is too big. Have you encountered such situation?

llvm/test/CodeGen/MIR/Generic/dbg-value-var-spill.mir
55 ↗(On Diff #272659)

unused

llvm/test/CodeGen/MIR/Generic/dbg-value-var.mir
23 ↗(On Diff #272659)

unused

53 ↗(On Diff #272659)

Most of these MF attributes are unused, so we can get rid of them.

djtodoro added a subscriber: alok.Jun 23 2020, 5:33 AM

Is there an overlap of the DW_OP_LLVM_arg with the D70642?

Yes, the names here are conflicting; mainly because I didn't see a good alternative name for these ones, and none of the implicit pointer patches appear to have received any comments or updates since Jan 10th. They have similar purposes, but with the exception that this version is used to index values, while the other can also be used to index variables; given that, a reasonable conflict resolution might be to have one specifically for variables and another for debug operands.

DBG_VAL_VAR stands for variadic, any alternative for the name? VAR may indicate to 'variable' I think...

DBG_VALUE_VARIADIC seems reasonable though also quite long, but it's better than being misleading.

Another question/concern (may be too early): the Salvage Debug Info feature when supporting multiple SSA values may produce an exploded/very huge number of dbg_values I think, so we may want to introduce a magic value to cut off the input if it is too big. Have you encountered such situation?

I don't have a sample to prove that it will or will not, the full implementation is still in progress, but basic measurements of the number of debug values lost to salvaging suggests that it's not going to be a truly staggering number; even moreso when we consider that debug values relying on multiple SSA values or machine locations are more likely to have one of those values become unavailable at some point due to optimizations, losing the debug value.

DBG_VAL_VAR stands for variadic, any alternative for the name? VAR may indicate to 'variable' I think...

DBG_VALUE_VARIADIC seems reasonable though also quite long, but it's better than being misleading.

DBG_VALUE_LIST? And the predicates could be isDebugValueList and isSimpleDebugValue or something along those lines.

llvm/lib/CodeGen/MachineInstr.cpp
61

Is this a note to yourself that should be removed?

llvm/lib/CodeGen/PrologEpilogInserter.cpp
12

Is this a note to yourself that should be removed?

StephenTozer marked an inline comment as done.Jun 23 2020, 9:19 AM
StephenTozer added inline comments.
llvm/lib/CodeGen/MachineInstr.cpp
61

Yes, along with the one below.

DBG_VALUE_LIST? And the predicates could be isDebugValueList and isSimpleDebugValue or something along those lines.

"List" sounds good to me, although for the predicates something other than Simple would be good, since it's prone to overloading; perhaps just "NonList", or "Singular" or something along those lines.

Is there an overlap of the DW_OP_LLVM_arg with the D70642?

Yes, the names here are conflicting; mainly because I didn't see a good alternative name for these ones, and none of the implicit pointer patches appear to have received any comments or updates since Jan 10th. They have similar purposes, but with the exception that this version is used to index values, while the other can also be used to index variables; given that, a reasonable conflict resolution might be to have one specifically for variables and another for debug operands.

I see...
Hi @alok, do you have any input on this?

DBG_VAL_VAR stands for variadic, any alternative for the name? VAR may indicate to 'variable' I think...

DBG_VALUE_VARIADIC seems reasonable though also quite long, but it's better than being misleading.

Another question/concern (may be too early): the Salvage Debug Info feature when supporting multiple SSA values may produce an exploded/very huge number of dbg_values I think, so we may want to introduce a magic value to cut off the input if it is too big. Have you encountered such situation?

As @probinson suggested, DBG_VALUE_LIST sounds good to me.

I don't have a sample to prove that it will or will not, the full implementation is still in progress, but basic measurements of the number of debug values lost to salvaging suggests that it's not going to be a truly staggering number; even moreso when we consider that debug values relying on multiple SSA values or machine locations are more likely to have one of those values become unavailable at some point due to optimizations, losing the debug value.

I think it will trigger on any "larger" project build, e.g. llvm-project itself, even if we add support for instructions with two SSA values, but we can cut off the user-values if it reach certain number. But, as you said, it may be too early for this.

Is there an overlap of the DW_OP_LLVM_arg with the D70642?

I think the preferred way of resolving this would be to post to llvm-dev and make sure the authors of D70642 are CC'ed. As far as I can see D70642 is less space-efficient, but allows for referring to the same argument multiple times. Given the DW_OP_dup/over operators, I'm not sure if that is necessary.

Other than that, (apologies for the bike-sheddingI find the name confusing, because in the context of debug info, I'd associate "var" with "variable" and not with "variadic". DBG_VALUES, DBG_VALUE_MULTI? Or we could rename DBG_VALUE to DBG_VALUE_LEGACY and use DBG_VALUE for the new format, if we plan for it to completely replace it.

I think the preferred way of resolving this would be to post to llvm-dev and make sure the authors of D70642 are CC'ed. As far as I can see D70642 is less space-efficient, but allows for referring to the same argument multiple times. Given the DW_OP_dup/over operators, I'm not sure if that is necessary.

Drafting something up now. Also I'm not sure the DW_OP_pick operator and its various specializations are useful here; they can only be used to copy an element on the expression stack, but there is no guarantee that the required argument will actually be on the expression stack when DW_OP_pick is evaluated; if an argument is pushed on the stack and then immediately modified, the original argument can't be picked out later.

Addressing the review comments: remove the note-to-self comments, and rename DBG_VALUE_VAR -> DBG_VALUE_LIST. This could change again if people think something else would be more appropriate, but this looks to have general approval.

Also fixed the test instances of DBG_VALUE_VAR to have DW_OP_stack_value, as all actual produced instructions should be stack values.

Also /s/var/list/ in the test filenames.

Apologies for the update spam, this is the last one: remove unused lines from test.

Update: Duplicate registers in the debug operands could be an issue with the previous patch version, and duplicates aren't easy to prevent, so always treat DBG_VALUE_LISTs as potentially having them.

djtodoro added inline comments.Jul 9 2020, 6:13 AM
llvm/include/llvm/CodeGen/MachineInstr.h
483–484

\p Reg ?

493

Can we use templates to avoid duplicated code here for getDebugOperandsForReg()?

1166

Should the name of the method follow the new name of the meta instr ?

llvm/lib/CodeGen/MachineInstr.cpp
2235–2236

No need for the extra curly brackets

llvm/test/CodeGen/MIR/Generic/dbg-value-list.mir
2

Please add a top level comment describing the purpose of the test.

StephenTozer marked an inline comment as done.Jul 9 2020, 8:25 AM
StephenTozer added inline comments.
llvm/include/llvm/CodeGen/MachineInstr.h
493

We can, as long as we use a static function to hold the common code (if there's a way to do so without a static function then I'd be happy to go with that instead); the solution looks something like this:

  template <typename Operand, typename Instruction>
  static ArrayRef<Operand *> getDebugOperandsForReg(Instruction *MI, Register Reg) {
    assert(MI->isDebugValue() && "Tried to get debug operands for non-debug_value");
    SmallVector<Operand *, 2> Ops;
    for (Operand &Op : MI->debug_operands()) {
      if (Op.isReg() && Op.getReg() == Reg)
        Ops.push_back(&Op);
    }
    return Ops;
  }
  ArrayRef<const MachineOperand *> getDebugOperandsForReg(Register Reg) const {
	  return MachineInstr::getDebugOperandsForReg<const MachineOperand, const MachineInstr>(this, Reg);
  }
  ArrayRef<MachineOperand *> getDebugOperandsForReg(Register Reg) {
	  return MachineInstr::getDebugOperandsForReg<MachineOperand, MachineInstr>(this, Reg);
  }

Does this look good? It removes the duplication, it's just a bit more verbose and leaves an otherwise useless static function hanging around, unless it's moved to a private block (which is also fine but reduces readability by moving it far away from the public functions).

Address latest review comments.

djtodoro added inline comments.Jul 10 2020, 5:08 AM
llvm/include/llvm/CodeGen/MachineInstr.h
486

const does not have any impact here, since it is a return type, so it should be removed.

493

This looks good to me, thanks.

llvm/lib/Target/NVPTX/NVPTXPrologEpilogPass.cpp
79

Should be isNonListDebugValue() ?

Remove useless const, add rename that somehow escaped last diff.

Rename appendOpsToLoc->appendOpsToArg to fit the rest of the names, update isUndefDebugValue check to check all operands (pulled ahead from a patch further up the stack).

Conciseness: DBG_VALUE_VAR instructions are more verbose than the existing instruction, which neatly covers simple direct and indirect values. By using DBG_VALUE_VAR for only the complex cases where it is needed, we avoid making debug information any more incomprehensible. This also simplifies (in my opinion) the question of whether or not to automatically push debug operands onto the expression stack (as was discussed in the original RFC).

My take on this is: Given how limited the audience that reads MIR is, I think it may be better to always canonicalize to the variadic version, since it keeps the code simpler, as it only has to deal with one kind of debug intrinsic.

I don't know that my opinion holds much weight, but I agree that reducing the number of nearly-equivalent-but-subtly-different intrinsics/pseudo-instructions for debug-information would make it easier for a new dev to understand their semantics, and make it much easier to extend. I would argue that using different representations based on complexity doesn't make the result any more comprehensible, it just adds more things to comprehend.

I'm still catching up on what has already been discussed/decided here and in the RFC, as well as the history of debug info in LLVM, because AMD is working on extensions to DWARF to support heterogeneous debugging in a standard way (i.e. to support AMD GPU, Intel GPU, NVidia GPU, etc.; the intro at https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html#introduction does a good job of providing an overview of what we are thinking). Some of the concepts from there seem like they can simplify parts of the debug info handling while also supporting more expressiveness.

One big example of where this is needed is to represent address-spaces when working with memory locations. With the "implicit dereference" model in the current IR intrinsics, and the offset field in the MIR instruction, this becomes fairly complex. By making things explicit in the expression most operations on it are simplified.

Would it be appropriate to propose having a discussion somewhere on the topic with anyone interested, preferably on Discord or somewhere we can talk? I think we are leaning towards a very similar end-state, and I worry about just working in a silo until I can post an RFC to hash this out in text.

llvm/include/llvm/CodeGen/MachineInstr.h
495

It doesn't seem like you can implement this interface (at least without using static storage somewhere); it is static, the return type is non-owning, and none of the inputs provide the storage.

Can this just be replaced with a new "filter" iterator type? The caller can always collect the results into a container if they prefer.

504

I.e. here ArrayRef doesn't copy or move anything out of Ops, and then Ops goes out of scope and the ArrayRef is dangling.

llvm/lib/CodeGen/MachineInstr.cpp
2097

Can this assert on the size of MOs?

Would it be appropriate to propose having a discussion somewhere on the topic with anyone interested, preferably on Discord or somewhere we can talk? I think we are leaning towards a very similar end-state, and I worry about just working in a silo until I can post an RFC to hash this out in text.

In a normal year I would have suggested a round table at the upcoming LLVM dev meeting, but this time around we'll have to improvise :-)
In the past, a post to LLVM-dev with (it does not have to be a fully fleshed out RFC at all) was always a good way to get a discussing with all stakeholders started.

In a normal year I would have suggested a round table at the upcoming LLVM dev meeting, but this time around we'll have to improvise :-)
In the past, a post to LLVM-dev with (it does not have to be a fully fleshed out RFC at all) was always a good way to get a discussing with all stakeholders started.

I've sent an email to LLVM-dev, hopefully this ensures that everyone gets a chance to weigh in. I'm not sure how long to give it before deciding that everyone has "had their chance", but with any luck we can move things forward soon. I also dropped a reference to the IR implementation in there although a public review isn't up yet; I'll soon be putting that up, so look forward to it!

Updated changes for latest master (there may be some lurking unseen errors, still trying to flush any out), addressed some review comments. Right now, I think the only thing holding this patch up is the discussion on the llvm-dev mailing list about using DBG_VALUE_LIST as the default for all debug values. If anyone on this review has any input for that, please go ahead and reply to the chain - I think that the latest proposal I put forward should satisfy everyone's requirements, but I don't want to push ahead without being sure of that.

Could you add this to SourceLevelDebugging.rst or the IR reference, whichever is more appropriate?

Add documentation for DBG_VALUE_LIST and DW_OP_LLVM_arg.

Documentation looks good!

llvm/docs/SourceLevelDebugging.rst
581 ↗(On Diff #294752)

Since we — contrary to LLVM bitcode — don't bother with compatibility for MIR — should we just always use DBG_VALUE_LIST instead or turn DBG_VALUE into DBG_VALUE_LIST to keep the code simpler?

djtodoro added inline comments.Sep 30 2020, 12:30 AM
llvm/docs/SourceLevelDebugging.rst
581 ↗(On Diff #294752)

+1 for this, but do we want to keep the old one (DBG_VALUE) for the legacy (for some time) or it is too expensive?

StephenTozer added inline comments.Sep 30 2020, 3:20 AM
llvm/docs/SourceLevelDebugging.rst
581 ↗(On Diff #294752)

This is being discussed on the mailing list right now, although the conversation somewhat died down without a firm resolution. I'm leaning towards removing the old form (it was suggested that replacing it with DBG_VALUE_OLD for a transition period before outright removal would be a good idea) and using DBG_VALUE_LIST everywhere.

The main point of discussion that needed resolution was how to represent the different location types with DBG_VALUE_LIST, since this current version only cares about stack values. My most recent proposal was the addition of a new expression operator, DW_OP_LLVM_direct, that could be used primarily to differentiate between the register location DW_OP_reg0 and the memory location DW_OP_breg0, since this distinction doesn't exist in LLVM (we use the indirect flag to do so, which DBG_VALUE_LIST doesn't have).

I'm prepared to implement whatever solution we agree upon; and although we haven't got a firm agreement on any particular outcome, there seems to be more or less unanimous agreement that removing the old form is the right move. Given that we're keeping the behaviour of the old instruction and that we may want to keep it around in some legacy form for a time (DBG_VALUE_OLD?) as was suggested on the mailing list, it will probably be quite a small patch to implement the replacement. I'd also like that patch to be one of the last in this stack, so that we don't end up with a broken intermediate patch state (replacing the existing DBG_VALUE's uses before the new DBG_VALUE is fully functional).

djtodoro added inline comments.Oct 1 2020, 2:27 AM
llvm/docs/SourceLevelDebugging.rst
581 ↗(On Diff #294752)

I am now strongly +1 for removing/replacing the DBG_VALUE, and I agree on that the DBG_VALUE_OLD would be useful to keep for some time.

I think the distinguishing between register location and memory location is different problem (out of this particular change), and I think the discussion about that should continue on the mailing list, since we really don't have a mechanism to differentiate that.

StephenTozer added inline comments.Oct 1 2020, 4:44 AM
llvm/docs/SourceLevelDebugging.rst
581 ↗(On Diff #294752)

The issue is that I don't think we can replace DBG_VALUE with DBG_VALUE_LIST if the latter has no way to express the difference between a register location and a memory location - without a means of doing so, we would have to drop either all register locations or all simple memory locations, which would be disastrous for debug info.

We can at least change the literal names, i.e. do a simple string replace "DBG_VALUE -> DBG_VALUE_OLD" and then "DBG_VALUE_LIST -> DBG_VALUE" across the entire project. That should probably be its own patch, and given that it is just a string replacement it doesn't really matter where the patch is applied (my thought is that it should be applied after every other patch, so that DBG_VALUE_LIST is fully functional at the point of replacement).

Thank you, that makes sense. Yes we should probably have a separate patch that move DBG_VALUE out of the way (or renames it to DBG_VALUE_OLD), then we let this one take over the DBG_VALUE name and then remove DBG_VALUE_OLD. I will see that I'll reply to the "direct" thread to bring this to a resolution.

jmorse added a subscriber: jmorse.Nov 19 2020, 9:22 AM

@StephenTozer what's the status of the latest discussions on this patch? As far as I understand it:

  • DBG_VALUE may go away, but that'll be in another patch,
  • There was some conclusion to whether all DBG_VALUE_LISTs should be stack_values too, can't remember what, on the mailing list,
  • This patch, adding the plumbing for the DBG_VALUE_LIST, would be ready to land.

For the record, it LGTMs. I've added some mega-nits inline that are largely style based.

llvm/docs/SourceLevelDebugging.rst
608 ↗(On Diff #294752)

Gratuitous nit -- it's a sequence rather than a set

llvm/lib/CodeGen/MachineInstr.cpp
2180

Braces to avoid ambiguity with the contained blocks?

llvm/lib/Target/X86/X86OptimizeLEAs.cpp
610–615

I'd suggest lambda-ising this for readability

@StephenTozer what's the status of the latest discussions on this patch? As far as I understand it:

  • DBG_VALUE may go away, but that'll be in another patch,
  • There was some conclusion to whether all DBG_VALUE_LISTs should be stack_values too, can't remember what, on the mailing list,
  • This patch, adding the plumbing for the DBG_VALUE_LIST, would be ready to land.

The decision made on the mailing list was to add a flag to the DBG_VALUE_LIST that acts similar to the directness flag on DBG_VALUE, albeit with a more strictly defined meaning. It still needs to be implemented here, and will be added as part of this patch: it will be easiest if there is a single patch which contains the entire DBG_VALUE_LIST, rather than adding a new instruction and then changing its format in a subsequent patch. Everything else is correct, and the patch that renames DBG_VALUE -> DBG_VALUE_OLD, DBG_VALUE_LIST -> DBG_VALUE can easily be added at the end of this patch stack.

Rebased onto recent master.

I'm still working through the whole patch series (thank you for the careful breakdown of the changes and their dependencies, it has been a big help!) but I'm going to be out for a bit around the holidays and wanted to post at least some of the feedback I do have.

These are mostly just nits, I haven't really considered the design any more fundamentally. I'll also post some of the feedback I have on other patches in the series.

llvm/docs/SourceLevelDebugging.rst
606 ↗(On Diff #310155)

Not that there is a strong convention, but could you make these consistent with the "The Nth operand..." phrasing above?

llvm/include/llvm/CodeGen/MachineInstr.h
487
495
495

What's the advantage of this being a static method? It seems like below you wrap it with member versions, but I don't see when this version would be necessary?

497

I'd drop this assert, the call to debug_operands asserts the same thing.

499–503
506
511
llvm/lib/CodeGen/MachineInstr.cpp
849–852
861–864
2096–2097

Why not just assert that MCID.Opcode == TargetOpcode::DBG_VALUE_LIST instead? This seems like surprising behavior to me.

Alternatively a more generic interface that doesn't require the MCID at all and decides which opcode to use would make sense, although I would vote to always choose the _LIST variant in that case anyway.

2176–2178
llvm/lib/Target/NVPTX/NVPTXPrologEpilogPass.cpp
71
llvm/test/CodeGen/MIR/Generic/dbg-value-list-spill.mir
18

I'm curious why a reference to c ends up in the DBG_VALUE* for e? Or maybe put another way, why does the last CHECK line refer to $noreg?

StephenTozer added inline comments.Jan 6 2021, 2:08 PM
llvm/include/llvm/CodeGen/MachineInstr.h
495

The static version is needed to share code between the const and non-const versions of this function; it doesn't need to be called directly outside of those two functions.. If this function wasn't static, then it would have to either a) be const and thus unable to return non-const MachineOperands, or b) be non-const and thus not be callable from a const MachineInstr. The static version on the other hand can be called with either a const MachineInstr to return const MachineOperand*, or called with a MachineInstr to return MachineOperand*.

llvm/lib/CodeGen/MachineInstr.cpp
2096–2097

Why not just assert that MCID.Opcode == TargetOpcode::DBG_VALUE_LIST instead? This seems like surprising behavior to me.

Right now this function is called for both DBG_VALUE and DBG_VALUE_LIST; in places like LiveDebugValues where we remove delete debug value instructions, store them in some other format, modify them, and then re-emit them, this function will be called with MCID = DebugValue.WasList() ? TargetOpcode::DBG_VALUE_LIST : TargetOpcode::DBG_VALUE;. The MCID argument could be removed, with this function split up into two copies for single and list debug values; for what it's worth though, we'd still be making the same asserts as we do here, just done immediately prior calling this function in several other places.

Alternatively a more generic interface that doesn't require the MCID at all and decides which opcode to use would make sense, although I would vote to always choose the _LIST variant in that case anyway.

Given that this only produced DBG_VALUEs in the first place I'm not sure why it was ever needed as an argument; the best time to remove it might be when we go back to only producing one type of instruction, i.e. when we remove DBG_VALUE further down the line.

Rebased onto recent master (993c488ed), addressed review comments (except for the question about dbg-value-list-spill.mir).

Rebased onto recent master (resolving clashes with the DW_OP_implicit_ptr patch); rewrote dbg-value-list-spill.mir test.

Ping - this one has had a lot of discussion, and I think all of the substantial concerns have been addressed; are there any further comments?

W.r.t the replacement of DBG_VALUE with DBG_VALUE_LIST, the current plan is to make that all happen in a patch following this stack. The patch itself should mostly be deletion of old code and simplifying where possible, renaming DBG_VALUE_LIST to DBG_VALUE, adding an extra operand to the new DBG_VALUE to represent directness, adding several tests, and updating existing tests as appropriate. It's not a no-op, but it should be relatively simple to review since there's little to no "new logic" being added; it will however involve updating a large amount of code and be more "disruptive" than this patch, since it will be likely to break private code changes and tests relating to DBG_VALUEs. It makes more sense to land the existing patches before pushing the update for this reason. I was originally also planning to add the new operand as part of this patch, but since it would have no effect, I believe it would be better to add in a followup patch instead of adding an operand that is never actually read at any point.

W.r.t the replacement of DBG_VALUE with DBG_VALUE_LIST, the current plan is to make that all happen in a patch following this stack. The patch itself should mostly be deletion of old code and simplifying where possible, renaming DBG_VALUE_LIST to DBG_VALUE, adding an extra operand to the new DBG_VALUE to represent directness, adding several tests, and updating existing tests as appropriate.

IMO if that part could be done in stages without excessive churn, that would be best. In particular the new operand for directness seems like its own thing? The rest of it would seem fairly mechanical/NFC, although admittedly I haven't been following this closely. (So, pay more attention to the real reviewers, if they have a different opinion.)

This tentatively looks LGTM to me, modulo some nits and the comment on dbg-value-list-spill.mir.

llvm/lib/CodeGen/RegAllocFast.cpp
339

Is this rewriting

DBG_VALUE $noreg

to

DBG_VALUE %stack.0

?

Obviously this patch just preserves old behaviour, which is fine, but I get the feeling that the old behaviour is wrong.

llvm/lib/IR/DebugInfoMetadata.cpp
1190–1193

Mega-nit: Could I suggest "ExprOp" as the iterating variable -- having Op and Ops in the same method makes me think that Op is a member of Ops, but that's not the case.

llvm/lib/Target/NVPTX/NVPTXPrologEpilogPass.cpp
82–85

Should this be using the new "getOffsetOpcodes" hook?

llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp
278–284

Should this be using the new "getOffsetOpcodes" hook?

llvm/lib/Target/X86/X86OptimizeLEAs.cpp
587

The if that this "else" applies to is ambiguous -- curly braces everywhere please

llvm/test/CodeGen/MIR/Generic/dbg-value-list-spill.mir
2

Seeing how the input file doesn't have any virtual registers, surely running regalloc is going to do nothing?

Plus: the code below is X86, it probably deserves to be in the MIR/x86 directory and have a triple on the command line (I've been bitten by missing triples several times)

llvm/test/CodeGen/MIR/Generic/dbg-value-list.mir
4

Needs to use metadata node captures rather than hard-coded numbers

StephenTozer added inline comments.Feb 19 2021, 2:51 AM
llvm/lib/CodeGen/RegAllocFast.cpp
339

I'll have to run through RegAllocFast again to be sure of the full behaviour here... as far as I understand, the intent here is that if the DBG_VALUE used the spilled virtual register but was unassigned for some reason in this pass, then we recover it using the spilled value here. I think that that could be valid, but I don't understand why the location would be set to $noreg if the virtual register hasn't been totally removed yet (which it shouldn't be if it's being spilled here).

llvm/lib/Target/NVPTX/NVPTXPrologEpilogPass.cpp
82–85

Probably - the existing code isn't, but I'm happy to correct it for the "new" case.

llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp
278–284

Ditto the above response, old code isn't calling it but we can fix it as part of this patch.

jmorse added inline comments.Feb 19 2021, 2:53 AM
llvm/lib/CodeGen/RegAllocFast.cpp
339

Cool; not a blocker to landing of course.

Rebased onto latest master, numerous small cleanups.

This revision was not accepted when it landed; it landed in state Needs Review.Mar 4 2021, 3:46 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Mar 4 2021, 4:19 AM

This fails on bots that have a non-x86 default triple: http://45.33.8.238/macm1/4580/step_10.txt

Probably just needs an explicit triple.

Please take a look, and revert for now if it takes a while to fix.

This fails on bots that have a non-x86 default triple: http://45.33.8.238/macm1/4580/step_10.txt

Probably just needs an explicit triple.

Please take a look, and revert for now if it takes a while to fix.

Thanks for the heads up - I've already reverted, but will push up again with this fix once I've checked to make sure there are no other issues.