This is an archive of the discontinued LLVM Phabricator instance.

[Assignment Tracking][5/*] Add core infrastructure for instruction reference
ClosedPublic

Authored by Orlando on Aug 19 2022, 6:14 AM.

Details

Summary

The Assignment Tracking debug-info feature is outlined in this RFC. This first series of patches adds documentation, the changes necessary to start emitting and using the new metadata, and updates clang with an option to enable the feature. Working with the new metadata in the middle and back end will come later. There are still a few rough edges but I'm putting these patches up now hoping to get feedback on the design and implementation from the upstream community.


Overview

It's possible to find intrinsics linked to an instruction by looking at the MetadataAsValue uses of the attached DIAssignID. That covers instruction -> intrinsic(s) lookup. Add a global DIAssignID -> instruction(s) map which gives us the ability to perform intrinsic -> instruction(s) lookup. Add plumbing to keep the map up to date through optimisations and add utility functions including two that perform those lookups. Finally, add a unittest.

Details / patch tour

In llvm/lib/IR/LLVMContextImpl.h add AssignmentIDToInstrs which maps DIAssignID * attachments to Instruction *s. Because the DIAssignID * is the key we can't use a TrackingMDNodeRef for it, and therefore cannot easily update the mapping when a temporary DIAssignID is replaced.

Temporary DIAssignID's are only used in IR parsing to deal with metadata forward references. Update llvm/lib/AsmParser/LLParser.cpp to avoid using temporary DIAssignID's for attachments.

In llvm/lib/IR/Metadata.cpp add Instruction::updateDIAssignIDMapping which is called to remove or add an entry (or both) to AssignmentIDToInstrs. Call this from Instruction::setMetadata and add a call to setMetadata in Intruction's dtor that explicitly unsets the DIAssignID so that the mappging gets updated.

In llvm/lib/IR/DebugInfo.cpp and DebugInfo.h add utility functions:

getAssignmentInsts(const DbgAssignIntrinsic *DAI)
getAssignmentMarkers(const Instruction *Inst)
RAUW(DIAssignID *Old, DIAssignID *New)
deleteAll(Function *F)

These core utils are tested in llvm/unittests/IR/DebugInfoTest.cpp.

Notes / observations

This all needs to be looked at closely from a performance perspective once it's up and running. This mapping is obviously quite intrusive, but I'm not sure we can get around that.

Diff Detail

Event Timeline

Orlando created this revision.Aug 19 2022, 6:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2022, 6:14 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
Orlando requested review of this revision.Aug 19 2022, 6:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2022, 6:14 AM
Orlando updated this revision to Diff 456661.Aug 30 2022, 7:17 AM

+ Add context to diff

jmorse added a subscriber: jmorse.Sep 5 2022, 10:03 AM
jmorse added inline comments.
llvm/include/llvm/IR/DebugInfo.h
163

On the one hand, "llvm::at::" isn't the most descriptive of namespaces; but on the other hand, I don't believe llvm provides a stable C++ API, so if someone doesn't like it then it can just be changed. Good use of namespaces IMO.

163

NB: a high level comment saying "Utilities for enumerating storing instructions from an assignment ID", and then from line 175 an equivalent saying "For enumerating dbg.assigns..." would make this easier to read. (IMHO, YMMV).

llvm/include/llvm/IR/Instruction.h
518–519

(Three slashes)

llvm/include/llvm/IR/IntrinsicInst.h
288–291 ↗(On Diff #456661)

IMO can be condensed to just "This method abstracts where the fragment is stored, for intrinsics with more than one expression" or something that doesn't name specific instructions. (The concern being that the comment will rot rapidly if it's too specific).

llvm/lib/IR/DebugInfo.cpp
1656–1659

Depending on how this API is to be used, would it be better to assert that an existing DIAssignID is always used? "Fail fast fail hard" works pretty well to detect deep errors.

1662

Spurious ;

1673–1675

Is there a risk that the SmallVector providing storage to the range being iterated on, is re-allocated / invalidates-iterators during the call to setMetadata, seeing how setMetadata -> updateDIAssignIDMapping -> erases / clears parts of the SmallVector?

If so, might be best replaced with something that modifies AssignmentIDToInstrs directly.

llvm/lib/IR/IntrinsicInst.cpp
86 ↗(On Diff #456661)

This has a side-effect, but is called in an assertion -- that'll mean the side-effect only happens when LLVM is built with assertions, which is presumably undesirable?

94–95 ↗(On Diff #456661)

DbgAssignAddrReplaced should be called? (It's missing parentheses)

179–184 ↗(On Diff #456661)

AFAIUI, and I might be very wrong here, replaceUsesOfWith isn't declared virtual and so only code that casts an instruction to DbgAssignIntrinsic will take this code path. Also, the base replaceUsesOfWith already iterates over all operands and updates them, so this might not be necessary.

llvm/lib/IR/LLVMContextImpl.h
1502–1504

(Three slashes)

llvm/lib/IR/Metadata.cpp
1430

Could we make this DIAssignID * instead of auto *?

(I have this twitchy feeling because every variable in this function is auto; and all the rest of them are totally justifiable, they're container references and iterators. Except this!).

1438

Clearer assertion message please

1442

IMO: can / should be an assertion, right?

1445
llvm/lib/IR/Verifier.cpp
4554–4557

Possibly stupid question, but isn't the set iterated over here the same as in the users list above? Is it worth putting the AssertDI inside the loop above? (Feels like a massive nit pick over a tiny performance thing, feel free to ignore).

6025–6032

Feels better to use OpAddress etc rather than hard coded operand indexes.

llvm/unittests/IR/DebugInfoTest.cpp
444

std::next preferred I think (what if begin() returns a reference that gets mutated?)

Orlando updated this revision to Diff 458694.Sep 8 2022, 2:55 AM
Orlando marked 18 inline comments as done.

+ Address review comments
+ Fix verifier failure in unittest (wrong subprogram for variable)
+ Add test for verifier check added in this patch

llvm/include/llvm/IR/IntrinsicInst.h
288–291 ↗(On Diff #456661)

That change leaked into this patch from the previous patch in the series (D132223) that added the dbg.assign boilerplate, sorry for the noise.

llvm/lib/IR/DebugInfo.cpp
1656–1659

This isn't a error state - you can get into the situation where you have a DIAssignID attachment on a function which is not used by any llvm.dbg.assign intrinsics. For instance, if a llvm.dbg.assign has been deleted due to living in a dead block -- whether or not _that_ is desirable needs to be looked at on a case-by-case basis IMO, so an assert would be too broad.

1673–1675

Good point, and I've added some "Iterators invalidated by ..." comments to the getAssignment... functions.

llvm/lib/IR/IntrinsicInst.cpp
86 ↗(On Diff #456661)

This stuff is from the previous patch in the stack, sorry!

llvm/lib/IR/Verifier.cpp
4554–4557

Not a stupid question, they are the same. I don't mind either way so I've changed it to your preference. I've also updated the verifier tests added in recent updates to earlier patches to check this codepath.

6025–6032

This also comes from the previous patch in the stack, but I will apply this suggestion to it over there.

jmorse accepted this revision.Sep 8 2022, 4:35 AM

LGTM

llvm/lib/IR/DebugInfo.cpp
1656–1659

SGTM

This revision is now accepted and ready to land.Sep 8 2022, 4:35 AM

Thanks!

llvm/lib/IR/DebugInfo.cpp
1656–1659

I can't seem to edit inline comments. For posterity: in the first sentence, when I said "on a function" I meant "on an instruction".

chrisjackson added inline comments.Sep 10 2022, 8:36 AM
llvm/include/llvm/IR/DebugInfo.h
163

I'm bikeshedding but i think the namespace name is too terse. Also possibly not nice to read because it's a word and an acronym

llvm/lib/AsmParser/LLParser.cpp
860

Possibly need to assert on ToReplace?

862

nit, Is assert text really a question? Can you just state instead?

Orlando marked an inline comment as done.Sep 12 2022, 1:55 AM

Hi @chrisjackson, thank you for taking a look at these patches! :-)

llvm/include/llvm/IR/DebugInfo.h
163

Any suggestions or thoughts on alternatives?
astr (or does the 'ast' or possible 'str' stand out too much?)
astra (too fun?)
asst (not a fan)
atra ... etc.

llvm/lib/AsmParser/LLParser.cpp
860

Original loop didn't see fit to do so, so I think we're okay not to too?

862

Sure, will do before landing if there are no major changes to make.

Orlando marked 3 inline comments as done.Nov 7 2022, 4:03 AM
This revision was landed with ongoing or failed builds.Nov 7 2022, 4:03 AM
This revision was automatically updated to reflect the committed changes.

This patch added a cyclic dependency that breaks the module build. Could you please fix/revert it?

https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/48197/consoleFull#-69937453049ba4694-19c4-4d7e-bec5-911270d8a58c

In file included from <module-includes>:1:
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/IR/Argument.h:18:10: fatal error: cyclic dependency in module 'LLVM_IR': LLVM_IR -> LLVM_intrinsic_gen -> LLVM_IR
#include "llvm/IR/Value.h"
         ^
While building module 'LLVM_MC' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/MC/MCAsmInfoCOFF.cpp:14:
While building module 'LLVM_IR' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/MC/MCPseudoProbe.h:57:
In file included from <module-includes>:12:
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/IR/DebugInfo.h:24:10: fatal error: could not build module 'LLVM_intrinsic_gen'
#include "llvm/IR/IntrinsicInst.h"
 ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
While building module 'LLVM_MC' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/MC/MCAsmInfoCOFF.cpp:14:
In file included from <module-includes>:15:
In file included from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/MC/MCContext.h:23:
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/MC/MCPseudoProbe.h:57:10: fatal error: could not build module 'LLVM_IR'
#include "llvm/IR/PseudoProbe.h"
 ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/MC/MCAsmInfoCOFF.cpp:14:10: fatal error: could not build module 'LLVM_MC'
#include "llvm/MC/MCAsmInfoCOFF.h"
 ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
4 errors generated.

This patch also breaks an LLDB test on Apple Silicon:

https://ci.swift.org/view/LLDB/job/llvm-org-lldb-release-debuginfo/658/consoleFull

******************** TEST 'lldb-shell :: ScriptInterpreter/Python/Crashlog/app_specific_backtrace_crashlog.test' FAILED ********************
Script:
--
: 'RUN: at line 3';   mkdir -p /Users/ec2-user/jenkins/workspace/llvm-org-lldb-release-debuginfo/tools/lldb/test/Shell/ScriptInterpreter/Python/Crashlog/Output/app_specific_backtrace_crashlog.test.tmp.dir
: 'RUN: at line 4';   /Users/ec2-user/jenkins/workspace/llvm-org-lldb-release-debuginfo/bin/yaml2obj /Users/ec2-user/jenkins/workspace/llvm-org-lldb-release-debuginfo/llvm-project/lldb/test/Shell/ScriptInterpreter/Python/Crashlog/Inputs/application_specific_info/asi.yaml > /Users/ec2-user/jenkins/workspace/llvm-org-lldb-release-debuginfo/tools/lldb/test/Shell/ScriptInterpreter/Python/Crashlog/Output/app_specific_backtrace_crashlog.test.tmp.dir/asi
: 'RUN: at line 5';   /Users/ec2-user/jenkins/workspace/llvm-org-lldb-release-debuginfo/bin/lldb --no-lldbinit -S /Users/ec2-user/jenkins/workspace/llvm-org-lldb-release-debuginfo/tools/lldb/test/Shell/lit-lldb-init-quiet -o 'command script import lldb.macosx.crashlog'  -o 'crashlog -a -i -t /Users/ec2-user/jenkins/workspace/llvm-org-lldb-release-debuginfo/tools/lldb/test/Shell/ScriptInterpreter/Python/Crashlog/Output/app_specific_backtrace_crashlog.test.tmp.dir/asi /Users/ec2-user/jenkins/workspace/llvm-org-lldb-release-debuginfo/llvm-project/lldb/test/Shell/ScriptInterpreter/Python/Crashlog/Inputs/application_specific_info/asi.ips'  -o "thread list" -o "bt all" 2>&1 | /Users/ec2-user/jenkins/workspace/llvm-org-lldb-release-debuginfo/bin/FileCheck /Users/ec2-user/jenkins/workspace/llvm-org-lldb-release-debuginfo/llvm-project/lldb/test/Shell/ScriptInterpreter/Python/Crashlog/app_specific_backtrace_crashlog.test
--
Exit Code: 1

Command Output (stderr):
--
/Users/ec2-user/jenkins/workspace/llvm-org-lldb-release-debuginfo/llvm-project/lldb/test/Shell/ScriptInterpreter/Python/Crashlog/app_specific_backtrace_crashlog.test:47:15: error: CHECK-NEXT: expected string not found in input
# CHECK-NEXT: frame #6: 0x00000001a05d3e4f dyld`start{{.*}}
              ^
<stdin>:42:45: note: scanning from here
 frame #5: 0x00000001047e3ecf asi`main + 127
                                            ^
<stdin>:43:2: note: possible intended match here
 frame #6: 0x00000001a05d3e4f
 ^

Input file: <stdin>
Check file: /Users/ec2-user/jenkins/workspace/llvm-org-lldb-release-debuginfo/llvm-project/lldb/test/Shell/ScriptInterpreter/Python/Crashlog/app_specific_backtrace_crashlog.test

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
          37:  frame #0: 0x00000001a0a58418 
          38:  frame #1: 0x00000001a05a2ea7 
          39:  frame #2: 0x00000001a0b3dcc3 
          40:  frame #3: 0x00000001a0b46af3 
          41:  frame #4: 0x00000001a09a12a3 
          42:  frame #5: 0x00000001047e3ecf asi`main + 127 
next:47'0                                                 X error: no match found
          43:  frame #6: 0x00000001a05d3e4f 
next:47'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:47'1      ?                             possible intended match
          44: (lldb) thread list 
next:47'0     ~~~~~~~~~~~~~~~~~~~
          45: Process 96535 stopped 
next:47'0     ~~~~~~~~~~~~~~~~~~~~~~
          46: * thread #1: tid = 0x1af8f3, 0x00000001a08c7224, queue = 'com.apple.main-thread', stop reason = EXC_CRASH (code=0, subcode=0x0) 
next:47'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          47: (lldb) bt all 
next:47'0     ~~~~~~~~~~~~~~
          48: * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_CRASH (code=0, subcode=0x0) 
next:47'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           .
           .
           .
>>>>>>

--

********************

Could you please revert the patch and fix the issues?

I am reverting this patch because of the bot failures Adrian mentioned above. I am sorry for any inconvenience!

I am reverting this patch because of the bot failures Adrian mentioned above. I am sorry for any inconvenience!

No worries, this was after office hours for me. Thanks for the revert(s).

Re-landed in 26382a4412d29e1c31fc3cda5071d5d60832b69c in which I've updated llvm/include/llvm/module.modulemap. I'm not familiar with how module maps work but this looks like it has fixed it.

I've folded D133576 into this commit (as was the original plan).

This patch added a cyclic dependency that breaks the module build. Could you please fix/revert it?

That should be fixed now.

This patch also breaks an LLDB test on Apple Silicon:
...
Could you please revert the patch and fix the issues?

I don't think my patch was responsible for this test failure though given that a run with the patch reverted (https://ci.swift.org/view/LLDB/job/llvm-org-lldb-release-debuginfo/673/) still has that failure AFAICT.