This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo][RemoveDIs] Add conversion utilities between dbg.value form and DPValue new-debug-info form

Authored by jmorse on Jun 29 2023, 6:51 AM.



This patch contributes some more plumbing to the "removing debug intrinsics" design: specifically utility methods for converting from one format into another. As it stands, it's going to be possible for blocks to move between "dbg.value" format (intrinsics) and "DPValue" format (no intrinsics) as needs be. As a result, Modules / Functions / Blocks grow a Boolean flag to indicate which format they currently contain, and have helper methods to initiate transferring from one to the other.

There's no consideration for serialising the flag to textual IR or bitcode: this is purely an in-memory representation right now. There's a "validate" method added to blocks to check that the debug-info isn't obviously broken, this is a smoke test rather than something the Verifier needs to be rigorous about.

There are a few places where we scatter calls to set the "IsNewDbgInfoFormat" flag on block or function creation: normally the functions and blocks have the flag set when they're inserted into their parent, but in a few scenarios debug-info sensitive operations occur before that happens, therefore the flag has to be explicitly set on creation. This feels fairly dumb, but it's all in aid of migration/transition rather than a long term code decision.

The unit test added takes a function with dbg.values, converts it to the new DPValue form, and confirms that the layout of the data structures is as expected, and that we can convert back to exactly the same format.

Diff Detail

Event Timeline

jmorse created this revision.Jun 29 2023, 6:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2023, 6:51 AM
Herald added subscribers: Enna1, ormris, foad and 4 others. · View Herald Transcript
jmorse requested review of this revision.Jun 29 2023, 6:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2023, 6:51 AM

LGTM with some minor questions


I think this comment could be slightly misleading - the function appears to set the flag and change the block to the desired debug info format.


I think it's worth copying the comment you have in BasicBlock.h here (or redirecting to it) - that one clearly identifies what true and false indicate.


See my comment in Function.h




What about other debug intrinsics? What's the plan for those?


Any reason not to put in the member initialiser list?


This pattern is used in a few places. Could be worth using a filter_iterator (llvm::make_filter_range etc), similar to what instructionsWithoutDebug() does, that skips instructions without debug markers? YMMV.


for (auto &Inst : instructionsWithDebugMarkers())


I think the verifier has a raw_ostream member OS; better to pass that in than always printing to stderr in validateDbgValues?

1777–1778 ↗(On Diff #535765)

Shouldn't CloneBasicBlock be responsible for this?


As discussed in the previous patch can we please now use createMarker and friends in this function?


Possibly worth confirming the other dbg.value fields are what we expect to see?

jmorse updated this revision to Diff 557866.Oct 24 2023, 7:50 AM
jmorse marked 9 inline comments as done.

Address review comments of various forms; also add the plumbing for replacing TrailingDPValues with an LLVMContext stored map of Block<=>TrailingDPValues.

Looking at the history-diff, I've also added some flag-maintenance in IRMover, which I believe is involved in moving modules around. I think there were just a few code paths there where we start constructing replacement modules/functions, and have to start off with the flag being correct. (This is all cost+overhead of having the two implementations available right now).

I've also added a createMarker method for iterators, to ease the creation of TrailingDPValue records and for when we're not completely sure if we're inserting at end() or not.


Later patches, basically. As this is being staged-in incrementally, we can get a totally-working implementation that only operates on dbg.values (which represent the most complicated problems to solve) and then naturally extend to dbg.declare, dbg.assign, dbg.label and pseudo probes.


I've been trying to steer clear of that for the moment, to make it harder to write things that rely on implementation details. That'll discourage us using such things like filter iterators in the optimisation passes in favour of proving APIs into/out-of the core classes that Do Things (TM). That then means we can decouple optimisations from how things are stored, making it easier to test out different storage models.

Orlando accepted this revision.Nov 2 2023, 2:21 AM


This revision is now accepted and ready to land.Nov 2 2023, 2:21 AM
jmorse closed this revision.Nov 2 2023, 6:31 AM

Closed by rG7d77bbef4ad9230 . Note that I lost one of the hunks in IRMover for assigning IsNewDbgInfoFormat as that was leading to some tests assertions where slightly-incompatible objects are linked between IR files (link-global-to-func.ll, which should fail, not assert). This hunk will instead appear in the next patch series once I'm confident enough there aren't any unintended side-effects.

jmorse added a subscriber: nikic.Nov 2 2023, 7:56 AM

I'm slightly surprised to see that this is regressing -g performance a bit [0], I'd thought I'd kept all these patches from actually being _used_ by LLVM until later ones land and switches are enabled. Two possibilities to my mind:

  • It's the linear walk over the blocks instructions to clear DPMarkers at the end. That's something we can polish later, and #ifdef guard for now.
  • Something in the metadata maps added in D153990 is getting slower? If so that must be down to branching?

All of this is #ifdef guardable while we're getting this prototype in, I just need to work out what part is the cause. CC @nikic to signal that we're doing something about this.


nikic added a comment.Nov 2 2023, 12:43 PM

FYI this has also increased time to build clang by a whopping 3%. My first suspicion is that your DebugProgramInstruction.h include in BasicBlock.h pulls in the kitchen sink.

jmorse added a comment.Nov 3 2023, 3:08 AM

FYI this has also increased time to build clang by a whopping 3%. My first suspicion is that your DebugProgramInstruction.h include in BasicBlock.h pulls in the kitchen sink.

Ah, that's not on our radar at all, there'll be much to improve there.

jmorse added a comment.Nov 8 2023, 7:46 AM

It looks like the major parts of the compile-time regression was down to the Metadata-handling changes in D153990, specifically handling DIArgList differently to other MDNodes causes a big IR-switch statement to not be optimised away, and that has a knock-on effect on inlining decisions all around Metadata.cpp. This is unfortunate, but it's also subject to change, as we're planning on re-profiling and then reimplementing how metadata-tracking works at a later date, once we're confident there are no more invasive changes. I've had some close chats with @StephenTozer and he's got some ideas on what we could change in the future to avoid an eventual regression.

Thus, in the meantime, I'm going to reland with the small key segments #ifdef'd away with the EXPERIMENTAL_DEBUGINFO_ITERATORS flag. This is enabled on this buildbot: so it's not like we're landing dead code that will rot immediately.

The net result: DIArgList's are going to continue to be distinct without the compile-time flag on, which is alright because there can only ever be one reference to them, baked into a MetadataAsValue object that can be directly looked up out of LLVMContext (which is how things currently work). With the compile-time flag on, they'll be unique'd, and will require replaceable references to them to be maintained, because DebugValueUser's (aka DPValues) can refer to them.

(Compile-of-clang-time regression fixed by switching to forward declarations).