This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo][RemoveDIs] Add prototype storage classes for non-instruction variable debug-info

Authored by jmorse on Jun 28 2023, 10:25 AM.



Here's a patch that adds the basic storage for non-instruction variable location information. This has the potential to be very exciting, but only in the future: this implementation is deliberately na\"ive, implementing a base level of functionality. We're not looking to make debug-info super fast in this specific patch series.

Additionally, it also has a "rough" surface area because there's one artefact that we haven't managed to polish out yet: the total order of debug records, i.e. the fact that blocks of dbg.values in a block have a meaningful order and can affect each other. Eliminating that is tied in with future storage optimisations, so we haven't tried to eliminate that yet. There are only roughly five places in LLVM that really need to care about this (see future patches).

Hopefully the file-level comment in the added files and other docucomments are sufficient to piece together what's happening here: dbg.values are represented by DPValues, with their position stored in a DPMarker, which is then attached to an instruction. DPValues attempt to look as much like a DbgValueInst as possible, having most of the same accessors, the primary difference with them is that they're not an Instruction and don't exist in the Value hierachy. DPMarkers exist as a connection to the Instruction that is the position of the DPValue in a block, and to provide various utility methods for shuffling DPValues around. These are exercised in the DebugInfoTest.cpp unit test added.

Because a DPValue is neither metadata nor a Value, but still needs to refer to ValueAsMetadata objects wrapping instructions and things in the Value hierachy, we need to plumb DPValues into the metadata-lookup facilities. This is so that we can later query "give me all the DPValues referring to this Value", in the same way that you can do that for dbg.values. It's also needed so that when a Value gets RAUW'd, all the metadata-connected users get told about it: see the added unit test in LocalTest.cpp

This patch is probably the correct place to talk about two significant storage changes:

  • Instructions will gain another pointer in their body, pointing at an optional DPMarker. This is a noteworthy cost because it affects non-debuginfo builds. I also think it's unavoidable, we need to store precise positions for debug-info and access them from an instruction quickly. This is a trade-off: more memory in normal builds contributes to faster -g builds, in the same way that there's a DebugLoc in class Instruction already.
  • Blocks will gain a "TrailingDPValues" object, essentially a list of "dangling" DPValues for when blocks transiently don't have a terminator. Again, this inflates the size of a data structure; again, I don't think this is avoidable. Removing the terminator from a block is a legitimate operation, and we still require a location for debug-info in those circumstances.

Naming: note that the files added are named "DebugProgramInstruction", and other classes prefixed "DP". This is because we originally prototyped all of this as a "shadow" list in a block, effectively a "debug program" behind the real program. We've since moved on from that, but there are still a few names baked in, and I can't for the life of me come up with anything better right now. Suggestions welcome!

Looking back over this, we should be using a unique_ptr to store the pointer references being added, but I'm a bit to burnt out to fix that right now; coming in a future revision!

Diff Detail

Event Timeline

jmorse created this revision.Jun 28 2023, 10:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2023, 10:25 AM
jmorse requested review of this revision.Jun 28 2023, 10:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2023, 10:25 AM

I started reviewing this but didn't have time to finish it. For fear of losing my comments I'll post what I've got so far. Note to self: start at Instruction.h.


How frequently do we expect TrailingDPValues to be used? Is it worth having this heap allocated instead to save some bits?


Comment SGTM


nit: LLVM's


Looks like these fields have 'getters' already - should they be private? Or, if they don't change after ctor we could make them const instead.


nit: we changed DbgVariableIntrinsic's get/setUndef to get/setKillLocation - IMO it'd be better to keep the names consistent here.


TODO? Or can this comment be removed?


nit: Chrome's built in spell check says gnarlyness isn't a word, but it also doesn't think the result from Googling it, gnarliness, is either, so I don't know what to believe.

Code changes LGTM pending inline comments. I haven't looked at the tests yet.


Looking at the constructor definition (and the parameter type), this comment is possibly out of date? DVI is not necessarily a dbg.value.


Might be worth mentioning something about assuming this is a dbg.value?


from_here -> FromHere?
Froms -> From's


Not sure about "happens" but can't come up with anything better myself. For debugging information that takes effect before this instruction? For variable locations that start before this instruction? I'll leave it with you, feel free to ignore this.






Please could you explain this change to me (why/what it's doing)? It seems the couple of months I've not touched metadata have clouded my memory.


Can we drop these braces? IMO they are just a bit of added clutter here but YMMV, feel free to ignore.


Any reason Type isn't initialized in the member initializer list like the other fields? And is it right to assume LocationType::Value?


IMO worth asserting this, since metadata tends to be tricksy. wdyt?


(personal preference nit: unnecessary braces, ymmv, feel free to ignore)


I think this line is redundant (eraseFromParent -> removeFromParent -> MarkedInstr->DbgMarker = nullptr?).


nit: Unnecessary return


Why reverse here? (And if it's not needed, can the fors be merged with a conditional insert pos?)


nit: unnecessary braces, same below.

LGTM once inline comments have been addressed




Is there an API for handling this behind the scenes by adding DPValues to instructions? If not, why not, if yes, please can we test that too?

jmorse added inline comments.Oct 2 2023, 9:14 AM

Yuh; in fact, given that this is a transitory state that crops up rarely, we can probably put a map in Function rather than making every block pay a penalty.


I...... can't remember. Or at least, I think this comes from the work @StephenTozer did on these patches, and this deep into the metadata is out of my zone of comfort too. Any opinion on this Stephen?

I'll also see if we can just delete it.

jmorse updated this revision to Diff 557539.Oct 2 2023, 11:08 AM
jmorse marked 12 inline comments as done.

Implement review feedback from Orlando.

New changes LGTM. There are still a few outstanding questions to look at (including the inline question for @StephenTozer) so I'll hold off on Accepting just yet.


SGTM, IMO it's ok to put that change in another patch if it makes things easier, since this does already all work and has been performance tested etc.





jmorse marked an inline comment as done.Oct 3 2023, 2:56 AM

(Missed inline comments being submitted...)


Shifted them up to the "private" area of the class


Giving this another eyeball -- previously all DIArgList's have been referred to through a single MetadataAsValue object which (AFAIUI) could be directly looked up, and whenever an operand changed that single MAV/DIArgList gets rewritten and re-uniqueified. Which is fine because with only one reference to that Metadata, it's like being RAUW'd.

However with the DebugValueUser changes in Metadata.cpp here, we're tolerating replaceable metadata references to ValueAsMetadata's and DIArgLists -- instead of funnelling connections from the Value hierarchy to the Metadata hierarchy through a VAM/MAV pair there's a web of connections. Thus, now when an operand to a DIArgList changes, we have to go through all the DPValues that refer to this DIArgList and redirect them to the new one. That means the code here changes from "this Metadata object now means something different" to "rewrite all the users to point at this new object".

Is it a good idea to use this web of connections instead of the funnel? That's something we can model/measure at a later date seeing how DPValues are going to be using it a lot.


There is one coming in D154353, just this is the order that it made sense to stage the patches (insert shrugging emoticon here). Alright if we patch this up later?

Orlando accepted this revision.Oct 4 2023, 2:14 AM



Ok, tyvm for the explanation.


Ah I've got it - if InsertAtHead then prepend, otherwise append.



This revision is now accepted and ready to land.Oct 4 2023, 2:14 AM
jmorse marked an inline comment as done.Oct 17 2023, 9:26 AM

Sitrep on the TrailingDPValues situation -- we had a chat offline, and this is sort of a leftover from an earlier prototype where all DPValues were in a "shadow" list that existed behind the main instruction-list, and had a total order on them. TrailingPDValues was the "end" of that list. The model we've ended up with doesn't have that total order (because collections of DPValues are split between instructions) but we still use TrailingDPValues to pretend it is, so there's a bit of extra reshuffling that has to happen.

Long story short, I'm going to shift storage of any DPValues that dangle while we remove/add terminators into LLVMContext, as it's effectively temporary global data that we want to hide from the usual instruction APIs, and it doesn't make sense to allocate it into every block or function.

jmorse updated this revision to Diff 557864.Oct 24 2023, 5:51 AM

Here's a revised patch -- I've shifted the location of the TrailingDPValues data structure into LLVMContextImpl, as it seems we only ever have one or two blocks with trailing DPValues at a time. It doesn't make sense to bloat BasicBlock with a dedicated field for a special case. I'm planning on plumbing this through BasicBlock, as a result the relevant useful methods turn up in the next patch (D154080), so I've left an llvm_unreachable in DPMarker::removeMarker until that lands.

I've also shoved an assertion in LLVMContext that checks whether any DPValues get accidentally left in TrailingDPValues for the lifetime of the LLVMContext. The only scenario where I think that can happen is where someone erases a terminator and then destroys the LLVMContext? It's sort of awkward behaviour, however I think in most scenarios if you leave a datastructure in an illegal state and try to clean it up gracefully, you'll run into issues anyway.

The confusing pair of loops in DPMarker::cloneDebugInfoFrom have also been refactored as a I realised they can just be a single one, and I've added some comments.

Finally, I've noticed that this patch won't build independently as there are some forward declarations of stuff that doesn't have a definition, that later needs a definition -- this is due to the shifting sands of this patch series. I could jam in various shims to make it work; or I could just land the next patch at the same time, which works fine. I think that's a reasonable approach as these were only split up to make review easier anyway.

jmorse requested review of this revision.Oct 24 2023, 5:56 AM

Other bits and pieces that have changed, now that I look at the history-diff:

  • setTailCall() is called on DPValues that are converted back to dbg.values. This is consistent with the rest of LLVM.
  • I've added a global static DPMarker with an empty StoredDPValues field, so that we can always produce a legitimate, empty range of DPValues in the added getEmptyDPValueRange, from methods where there's no debug-info.

The latter is another case of patch shuffling. This prototype as it stands is going to always allocate one DPMarker per instruction, but a later patch is going to optimise that to reduce the number of memory allocations. As a result there are going to be (in the future) scenarios where we need to describe empty ranges of DPValues, but there's no DPMarker to do that with, hence the static one. It just happened to end up in this particular patch.

Orlando accepted this revision.Nov 2 2023, 2:50 AM

LGTM + a few comments

Finally, I've noticed that this patch won't build independently as there are some forward declarations of stuff that doesn't have a definition, that later needs a definition -- this is due to the shifting sands of this patch series. I could jam in various shims to make it work; or I could just land the next patch at the same time, which works fine. I think that's a reasonable approach as these were only split up to make review easier anyway.

If the two commits can't be easily separated, perhaps better to squash them? YMMV.


This will bite us if someone tries to insert a DPValue in that range

Is there no way to create an empty range without an object to grab iterators from?

If not, you could return make_range(EmptyDPMarker.StoredDPValues.end(), EmptyDPMarker.StoredDPValues.end()); (end(), end()) to avoid those issues? wdyt?

50 ↗(On Diff #557864)

nit: remove comma

1661–1663 ↗(On Diff #557864)

DenseMap::lookup returns a default constructed ValueType (so, nullptr here).

This revision is now accepted and ready to land.Nov 2 2023, 2:50 AM
jmorse marked an inline comment as done.Nov 2 2023, 3:34 AM
jmorse added inline comments.

We can generate iterators with nullptr pointers in them, and they'll compare equal indicating that the range is empty, but I fear this is going to be leaning on undefined behaviour in some way that comes back to bite us later. Returning end-end ranges sounds good, hopefully any faults that cause things to be stored in EmptyDPMarker will then trigger the leak-sanitizer buildbots.

jmorse marked an inline comment as done.Nov 2 2023, 3:34 AM