This is an archive of the discontinued LLVM Phabricator instance.

[codeview] support more DW_OPs for more complete debug info
ClosedPublic

Authored by inglorion on Aug 18 2017, 5:28 PM.

Details

Summary

Some variables show up in Visual Studio as "optimized out" even in -O0
-Od builds. This change fixes two issues that would cause this to
happen. The first issue is that not all DIExpressions we generate were
recognized by the CodeView writer. This has been addressed by adding
support for DW_OP_constu, DW_OP_minus, and DW_OP_plus. The second
issue is that we had no way to encode DW_OP_deref in CodeView. We get
around that by changinge the type we encode in the debug info to be
a reference to the type in the source code.

This fixes PR34261.

Event Timeline

inglorion created this revision.Aug 18 2017, 5:28 PM

(Still a work in progress.)

inglorion updated this revision to Diff 112124.Aug 22 2017, 2:00 AM

Added support for DW_OP_deref, and a test case. This is now ready for review.

inglorion updated this revision to Diff 112125.Aug 22 2017, 2:01 AM
inglorion edited the summary of this revision. (Show Details)
inglorion added reviewers: aprantl, rnk, zturner.
inglorion added a subscriber: llvm-commits.

arc diff --verbatim

aprantl edited edge metadata.Aug 22 2017, 8:52 AM

test

llvm/test/CodeGen/X86/codeview-nrvo.ll
41 ↗(On Diff #112125)

We usually strip out all quoted attributes from testcases to make them more future-proof / readable.

73 ↗(On Diff #112125)

Shouldn't the testcase include a complex expression?

rnk added inline comments.Aug 22 2017, 9:19 AM
llvm/test/CodeGen/X86/codeview-nrvo.ll
73 ↗(On Diff #112125)

The dbg.declare gets lowered to a DEBUG_VALUE MI instruction with a DW_OP_deref. The complex DIExpression would be more obvious if this were a .mir test case. We probably want all of an IR NRVO test case, an MIR NRVO test case, and hand-crafted IR test cases with complex DIExpressions.

rnk added inline comments.Aug 22 2017, 9:29 AM
llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
1001

Please factor out the DIExpression parsing into a helper that extracts the fragment info, Deref, Supported, and Offset.

In the long run, we might want to remodel DIExpression so that it encodes this information directly. Right now DIExpressions feel very "stringly-typed": they're just a sequence of integer opcodes and arguments that don't have any clear relationship to each other. Every bit of code that touches them rolls its own parsing.

In the long run, we might want to remodel DIExpression so that it encodes this information directly. Right now DIExpressions feel very "stringly-typed": they're just a sequence of integer opcodes and arguments that don't have any clear relationship to each other. Every bit of code that touches them rolls its own parsing.

I think it might make sense to separate out the DW_OP_LLVM_fragment, but otherwise the DIExpression operand iterators are there to abstract from the array of opcodes implementation.

Thanks for taking a look!

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
1001

In the long run, we might want to remodel DIExpression so that it encodes this information directly.

I was thinking that, too. It's great to be able to express arbitrarily complex expressions, but if we generate only simple ones and backends handle only a few specific forms anyway, we might as well simplify our lives and avoid the extra parsing.

For now, I'll factor this out. I was actually planning to do that today anyway.

inglorion added inline comments.Aug 22 2017, 11:48 AM
llvm/test/CodeGen/X86/codeview-nrvo.ll
73 ↗(On Diff #112125)

How can I make the DIExpressions more obvious in the MIR? I'm new to MIR, so I'm not sure what the best way to generate it is. I did llc -O0 -stop-after=prologepilog, which gives me machine IR that has the debug records that I want, but the MIR still has dbg.declare and empty DIExpressions, so I'm not sure that's what we want. I tried several other passes, with similar results. At some point, the translation from dbg.declare to dbg.value must happen, and there must be a point where we have DIExpressions with arithmetic and deref, but I can't seem to find an invocation that dumps this.

Which pass generates the DIExpressions you are interested in? If it is a MIR pass, you'll need to stop-after that pass, and the DIExpressions will be referenced by the DBG_VALUE instructions in the MIR, and should appear in the IR dump metadata (but not in the IR code portion).

rnk added inline comments.Aug 22 2017, 1:10 PM
llvm/test/CodeGen/X86/codeview-nrvo.ll
73 ↗(On Diff #112125)

So, for some reason -stop-after=prologepilog doesn't print any DBG_VALUE instructions. I'm not sure why, but we weren't able to use that to make a test case. I think we're find with just the original NRVO test case plus some handcrafted IR DIExpression test cases.

inglorion updated this revision to Diff 112248.Aug 22 2017, 4:05 PM

How is this for the testcase? It checks positive offsets, negative offsets, and deref.

Testcase seems fine as far as I am concerned. Is it possible to include a comment with instructions for how to regenerate the test from source if that becomes necessary in the future?

rnk added inline comments.Aug 22 2017, 4:46 PM
llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
949

Shouldn't need this with a member initializer.

1056

Are you sure this always works? I would expect it to assert sometimes. We haven't run the assembler yet. We can't in general know the distance between two labels, even if they are in the same section. If we can, we can make a large number of simplifications to this code.

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.h
97

Let's initialize this to false so we don't need to remember it everywhere.

@aprantl, I would like to do that, but this doesn't really correspond to any source anymore. I suppose you could take the LLVM IR part of it, take the offsets out of the DIExpressions, and then run it through llc -O0 -stop-after=prologepilog. Then you can edit the DIExpressions to be what you need them to be. Would it be useful to include that in a comment?

Sometimes I say something like "compiled from the following source with -O0 and manually edited", I'll leave it up to your judgement.

inglorion updated this revision to Diff 112472.Aug 23 2017, 4:10 PM
inglorion marked 6 inline comments as done.
  • rewrite non-deref ranges for variables we changed to references
  • factored out interpretation of debug info expressions
inglorion updated this revision to Diff 112473.Aug 23 2017, 4:12 PM

removed stray assert

inglorion updated this revision to Diff 112652.Aug 24 2017, 7:46 PM
  • updated COFF/pieces.ll test to check for records we now emit
  • refactored to avoid iterating twice

Thanks for the feedback! I think I incorporated all of it. Does this look good now?

rnk added inline comments.Aug 28 2017, 10:24 AM
llvm/lib/CodeGen/AsmPrinter/VariableLocation.cpp
34–51

I was thinking about how to improve DIExpression over the weekend, and one of the ideas I came up with is that we should "canonicalize" them on construction, and we should add a high-level opcode like DW_OP_LLVM_fragment that expresses DW_OP_deref plus a signed offset. That would eliminate the need for a ton of DW_OP_const/minus/plus parsing and centralize the logic in the DIExpression opcode canonicalization phase. CodeGen and mid-level passes would be able to reason about chains of loads with offsets, or they would give up if the expression is too complicated.

This is an idle thought for a future patch, though...

llvm/lib/CodeGen/AsmPrinter/VariableLocation.h
2

IMO we should call this DbgVariableLocation. I'd recommend moving it to DebugHandlerBase.h for now. That's the current home for common functionality between dwarf + CV emission.

14–15

These are not system headers, these should be quoted.

llvm/test/DebugInfo/COFF/pieces.ll
182 ↗(On Diff #112652)

Cool. :)

inglorion added inline comments.Aug 28 2017, 2:35 PM
llvm/lib/CodeGen/AsmPrinter/VariableLocation.cpp
34–51

Yeah, I was thinking along similar lines: Have a DW_OP that represents what we commonly want, and keep the existing format for DIExpression so we don't have to change the representation and we can still express complex things. Like you said, that's for a future patch, though.

inglorion updated this revision to Diff 112975.Aug 28 2017, 2:59 PM

changes requested by @rnk

rnk accepted this revision.Aug 28 2017, 4:20 PM

lgtm

llvm/lib/CodeGen/AsmPrinter/DebugHandlerBase.h
41 ↗(On Diff #112975)

Make this unsigned, otherwise it will only store the values 0 and -1, which can be confusing from time to time.

This revision is now accepted and ready to land.Aug 28 2017, 4:20 PM
inglorion updated this revision to Diff 112990.Aug 28 2017, 4:44 PM

changed 1-bit ints to unsigneds, ran clang-format

This revision was automatically updated to reflect the committed changes.