Page MenuHomePhabricator

[DebugInfo][Docs] Document how variable location metadata is transformed through target codegen
ClosedPublic

Authored by jmorse on Mar 25 2019, 12:07 PM.

Details

Summary

This patch adds documentation to SourceLevelDebugging.rst that explains how variable location information is compiled from the IR representation down to the end of the codegen pipeline, but avoiding discussion of file formats and encoding.

As far as I'm aware this is all currently undocumented (in html docs at least). There may be much better ways of structuring the added information, and I'm very happy to reorganize/rewrite the text.

Things that I believe this documents:

  • How llvm.dbg.value intrinsics map down onto DBG_VALUE instructions
  • Several examples of how variable locations should be changed when instruction scheduling moves instructions
  • What LiveDebugVariables does
  • What LiveDebugValues does
  • The rationale behind _why_ LiveDebugValues expands variable location information at a late stage, versus mem2reg encoding lots of information
  • The fact that DBG_VALUE changes meaning (I believe!) after LiveDebugValues
  • That each block must define all valid variable locations in it by the end of codegen (see discussion in D59431)

What it doesn't explain, and is still a mystery to me:

  • _Why_ does LiveDebugVariables do what it does?

I'd also be highly interested in things that are currently uncertain/unknown, but really should be documented. Ideally this page should contain all the rationale behind how debuginfo is designed/operates.

Diff Detail

Repository
rL LLVM

Event Timeline

jmorse created this revision.Mar 25 2019, 12:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 25 2019, 12:07 PM

I have not read through everything yet, but thanks for documenting this!

docs/SourceLevelDebugging.rst
550 ↗(On Diff #192178)

nit: function is created

575 ↗(On Diff #192178)

Is it worth mentioning that there are some exceptions with parameter debug values (ArgDbgValues), or is that too detailed?

vsk added a comment.Apr 2 2019, 7:08 PM

Thanks so much for writing this down! Imho this is in pretty good shape (better than the drafts of similar docs I've thrown away). I've taken a look at the first third. I'll try to give a more detailed pass over the remaining bits later.

docs/SourceLevelDebugging.rst
519 ↗(On Diff #192178)

Wdyt of keeping with the title of this patch and calling this 'How variable location metadata is used' (or, transformed)?

522 ↗(On Diff #192178)

Start with: 'LLVM preserves ... throughout mid-level and backend passes'? No need to mention file formats, per the patch summary.

532 ↗(On Diff #192178)

Isn't it the reverse? I thought we mapped variable locations to instruction ranges. Also, not sure it's necessary to mention encoding/serializing here.

533 ↗(On Diff #192178)

Instead of mentioning serious changes, how about giving a very brief outline? Like: '''The major transformations which affect variable location fidelity include: <bulleted list>\n The following sections go into more detail about each transformation.'''

I really like that the sections are ordered by how they occur in the pipeline. Maybe mention that, so readers have an easier time forming a timeline of when things happen?

539 ↗(On Diff #192178)

Can you start by mentioning that this transformation converts IR into MIR?

540 ↗(On Diff #192178)

Can we drop the 'or', or is there a need to distinguish 'location' from 'memory address'?

543 ↗(On Diff #192178)

Just: 'In MIR, variable locations may be identified in a number of different ways'?

544 ↗(On Diff #192178)

Can you split the second example out into its own sentence?

554 ↗(On Diff #192178)

Can you clarify what is meant by undefined? Is this something that gets patched up later, or is the assignment lost?

556 ↗(On Diff #192178)

'After MIR locations are assigned to each variable'?

576 ↗(On Diff #192178)

Mention that llvm makes a best-effort attempt to ensure the positions of DBG_VALUEs in the instruction stream correspond to the order in which assignments/updates occur in source? Maybe you already discuss this later, but this seems like a good place to bring it up.

vsk added inline comments.Apr 2 2019, 7:20 PM
docs/SourceLevelDebugging.rst
784 ↗(On Diff #192178)

@aprantl, do you remember our conversation about why llvm doesn't have a DBG_PHI MI? Was the main reason that it would require too many new meta-instructions to be inserted? Or was it that it wouldn't obviate the need for a convenient, block-local view of which variables are live?

bjope added a comment.Apr 5 2019, 4:34 AM

Nice!

One more thing that could be described (not neccessarily in this patch):

  • dbg.declare is removed by ISel and the information is squirreled away in a table in the MachineFunction. Later DwarfDebug::collectVariableInfoFromMFTable is fetching the debug info using MF->getVariableDbgInfo(). That kind of explains why there is no DBG_DECLARE meta instruction, or any DBG_VALUE instructions when using -O0, which otherwise could be a mystery for someone not familiar with that part.
jmorse updated this revision to Diff 194287.Apr 9 2019, 4:49 AM
jmorse marked 17 inline comments as done.

Many thanks for the reviews, I've juggled the text in the first third in response to comments,

Bjorn wrote:

One more thing that could be described (not neccessarily in this patch):

dbg.declare is removed by ISel and the information is squirreled away in a table in the MachineFunction. Later DwarfDebug::collectVariableInfoFromMFTable is fetching the debug info using MF->getVariableDbgInfo(). That kind of explains why there is no DBG_DECLARE meta instruction, or any DBG_VALUE instructions when using -O0, which otherwise could be a mystery for someone not familiar with that

Ooof, that's another thing I was unaware of -- which is a great reason for it to be documented. (I'll work out where to drop that in shortly).

docs/SourceLevelDebugging.rst
519 ↗(On Diff #192178)

Works for me, I guess that's less abstract,

532 ↗(On Diff #192178)

True,

540 ↗(On Diff #192178)

Hmm, all I really wanted to get across was that there are multiple categories of machine locations, no real need to distinguish different IR locations.

554 ↗(On Diff #192178)

Revised patch now describes locations as "unavailable", then adds a sentence defining what that means. I've also explicitly linked dbg.value(undef) and $noreg in the DBG_VALUE description below.

575 ↗(On Diff #192178)

Given the amount of code that juggles parameters, it's good to mention. I think it's also a design goal as well as an implementation detail.

vsk accepted this revision.Apr 12 2019, 3:50 PM

Looks great, thanks!

docs/SourceLevelDebugging.rst
590 ↗(On Diff #194287)

Consider: s/, however/. However ... scheduling, which can reorder assignments/

This revision is now accepted and ready to land.Apr 12 2019, 3:50 PM
This revision was automatically updated to reflect the committed changes.
jmorse marked an inline comment as done.

Thanks for the all the reviews; still outstanding is the dbg.declares-get-stashed-in-MF-objects matter, which I'll generate a follow-up for.