This is an archive of the discontinued LLVM Phabricator instance.

[llvm-reduce] Add pass that reduces DebugInfo metadata
ClosedPublic

Authored by ormris on Aug 17 2022, 3:15 PM.

Details

Summary

This new pass for llvm-reduce attempts to reduce DebugInfo metadata. The strategy is as follows:

  1. Scan every MD node, keeping track of nodes already visited.
  2. Look for DebugInfo nodes, then record any operands that are lists.
    • Ignore empty lists.
  3. Bisect though all the elements of the collected lists.
    • Ignore any elements that are not references to other DebugInfo MD.
  4. If all elements of the list are null, replace with an empty list.

This process can dramatically simplify the module. Without going into exactly what is removed, the difference in the final file size show how large the difference is. This uses a large internal bitcode test case.

Reduction strategySize
No reduction22M
Current49K
With Patch5.6K

I've taken more measurements of the runtime impact and it's not too bad. It's negligible when only 1 thread is used, but increases with multiple threads.

Threads:Current (s)With Patch (s)
-j 1203.33203.67
-j 8201.33203.33

Diff Detail

Event Timeline

ormris created this revision.Aug 17 2022, 3:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 17 2022, 3:15 PM
Herald added a subscriber: mgorny. · View Herald Transcript
ormris requested review of this revision.Aug 17 2022, 3:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 17 2022, 3:15 PM

thanks for the patch, I've really wanted something like this for a while (actually my simpler idea was to just run the strip pass at the beginning to see if we could remove all debug info, but this is probably better).

I'm a bit busy right now, but will try to find time to review this unless somebody else gets to it before me

llvm/tools/llvm-reduce/DeltaManager.cpp
71

I wonder if we should move the metadata passes later where we should already have removed as many potential metadata havers, e.g. after instructions or ir-passes

ormris added inline comments.Aug 18 2022, 5:59 PM
llvm/tools/llvm-reduce/DeltaManager.cpp
71

Good point. I've run some tests locally and it seems like moving it after the instructions pass saves a few seconds. I'll change the patch.

PositionTime (s)
Current85.3
Last81.8
After instructions80.96
ormris updated this revision to Diff 454066.Aug 19 2022, 11:05 AM

Change pass pipeline slightly to improve performance.

Gentle ping

sorry for the slow review

do you have cases where stripping the debug info e.g. passing the IR through opt -passes=strip` makes the IR uninteresting, but this significantly helps over just the metadata pass?

llvm/test/tools/llvm-reduce/remove-dimetadata.ll
2–3

looks copy/pasted

5

could you add --abort-on-invalid-reduction?

llvm/tools/llvm-reduce/DeltaManager.cpp
71

running dimetadata before metadata seems better since dimetadata can make metadata do less work, but not vice versa

74

can this be di-metadata? dimetadata is too squished for me

llvm/tools/llvm-reduce/deltas/ReduceDIMetadata.cpp
46–47

making ToLook a vector and using pop_back here seems nicer

51

nit: formatting
I see some other unformatted code, did you run clang-format?

54

check if MDT is null?

65

should this be at the top of the loop?

71

with C++17 we should be able to use auto [a, b, c] = foo;

97

check if this is null?

aeubanks added inline comments.Sep 1 2022, 2:22 PM
llvm/tools/llvm-reduce/deltas/ReduceDIMetadata.cpp
75

use brackets for bodies with more than one line

84–85

is it possible for multiple DINodes to refer to a tuple?

actually I just happened to run across a case where this would be useful, but I'm getting tons of verifier errors in reduction to do with invalid template parameter

ormris updated this revision to Diff 461363.Sep 19 2022, 2:34 PM

Changelog:

  • clang-format
  • Rename "dimetadata" to "di-metadata"
  • Code and data structure clean-up
  • Add "--abort-on-invalid-reduction" to test
  • Fix header comments

Everything should be addressed. Sorry for the delay. Things have been busy over here.

actually I just happened to run across a case where this would be useful, but I'm getting tons of verifier errors in reduction to do with invalid template parameter

Yes, this does lean on the verifier. It's fairly blind to the structure of DI Metadata. Did it provide a useful reduction?

llvm/test/tools/llvm-reduce/remove-dimetadata.ll
2–3

Fixed

5

Fixed

llvm/tools/llvm-reduce/DeltaManager.cpp
71

Fixed

74

Fixed

llvm/tools/llvm-reduce/deltas/ReduceDIMetadata.cpp
46–47

FIxed

51

I did not. Should be fixed.

54

Fixed

65

No, that messes with checks in the body of the loop. There are two places where if check if the node is visited.

71

Fixed.

75

Fixed.

84–85

Yes. In my example IR files, multiple nodes refer to the same empty tuple.

97

Fixed

fhahn added a subscriber: fhahn.Sep 22 2022, 2:18 AM

Thanks for working on this. I was also looking into improving the situation, but this patch is more complete. I pushed an additional test case in 5be203ec4db5 which also covers the DICompileUnit case. It would need updating to use the di-metadata pass here.

llvm/tools/llvm-reduce/DeltaManager.cpp
64

Should this be split off as a separate patch? It seems independent of adding a new DI-metadata reduction pass.

llvm/tools/llvm-reduce/deltas/ReduceDIMetadata.cpp
94

I think you should also include DICompileUnit nodes: Program.debug_compile_units()

arsenm added inline comments.Sep 22 2022, 5:58 AM
llvm/tools/llvm-reduce/deltas/ReduceDIMetadata.cpp
31

DenseSet?

33

DenseSet?

Everything should be addressed. Sorry for the delay. Things have been busy over here.

actually I just happened to run across a case where this would be useful, but I'm getting tons of verifier errors in reduction to do with invalid template parameter

Yes, this does lean on the verifier. It's fairly blind to the structure of DI Metadata. Did it provide a useful reduction?

IIRC I eventually gave up because it kept failing the verifier.
In its current state I think this patch can't run in the default llvm-reduce pipeline because of this. In cases with certain types of debug info (e.g. templates) it'll significantly slow down the reduce. I'm ok with adding it as a pass that isn't run by default. Or if you can somehow get the pass to not fail the verifier with templated code (i.e. llvm-reduce --abort-on-invalid-reduction -delta-passes=di-metadata doesn't fail on some IR with template debug info).

ormris added inline comments.Sep 23 2022, 4:06 PM
llvm/tools/llvm-reduce/DeltaManager.cpp
64

Actually, this is an incorrect merge. Sorry for the confusion.

ormris updated this revision to Diff 462629.Sep 23 2022, 5:31 PM

Updating this patch to address verifier issues. From my reading of the verifier, and some internal tests, it looks like removing all elements of the tuple is valid for tuples attached to nodes like DISubprogram and DICompositeType. This also allows the pass to reduce those node types more fully. DICompileUnit nodes are now added to the queue. I've also updated fhahn's test to reflect the changes.

even with the latest patch I'm seeing verifier errors

!6249 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "BindUnwrapTraits<base::internal::UnretainedWrapper<v8::JobTask> >", scope: !22, file: !1379, line: 1456, size: 8, flags: DIFlagTypePassByValue, elements: !7, templateParams: !2223, identifier: ".?AU?$BindUnwrapTraits@V?$UnretainedWrapper@VJobTask@v8@@@internal@base@@@base@@")
!2223 = distinct !{null}
invalid template parameter

is it possible to not replace the metadata with null but rather create a new metadata node with just the remaining operands?

here's what I've been testing with

$ cat /tmp/reduce.sh 
#!/bin/bash
set -eu
[[ $(rg --count DILocation $@ ) -gt 2000 ]]
$ ./build/rel/bin/llvm-reduce --test /tmp/reduce.sh /tmp/b.ll -o /tmp/c.ll --delta-passes=di-metadata
ormris updated this revision to Diff 465169.Oct 4 2022, 2:50 PM

Thanks for the example and suggestion, that was really helpful. I've changed the pass to use that strategy. I don't get any verification errors when using my internal test case or the test case you provided, and the code is overall cleaner, I think.

arsenm added inline comments.Oct 4 2022, 2:56 PM
llvm/test/tools/llvm-reduce/remove-debug-info-nodes.ll
4

Can you add -abort-on-invalid-reduction here?

ormris updated this revision to Diff 465429.Oct 5 2022, 9:13 AM

Add --abort-on-invalid-reduction to test

ormris marked an inline comment as done.Oct 5 2022, 9:14 AM

some nits, but thanks for going this direction, looks much better

llvm/test/tools/llvm-reduce/remove-dimetadata.ll
23

can we be more explicit about this CHECK and the one below?
e.g.

; CHECK: DISubroutineType(types: [[CHILD:![0-9]+]])
; CHECK: [[CHILD]] = !{}

(I might have the exact syntax wrong)

llvm/tools/llvm-reduce/deltas/ReduceDIMetadata.cpp
37

when would this be false/null?

46

some indentation weirdness, git clang-format?

49

the outer if just checked that this isn't false/null

56

ditto

63

could you give these more descriptive names?

ormris marked 4 inline comments as done.Oct 5 2022, 3:44 PM
ormris added inline comments.
llvm/tools/llvm-reduce/deltas/ReduceDIMetadata.cpp
37

NMD could be null if debug metadata attachment was null. I believe it's legal to say !dbg !null, but I could be misremembering. The second condition could be false if multiple nodes had the same debug metadata attachment.

46

This is what clang-format outputs. Happy to make it look cleaner, but I couldn't see a way to improve it.

ormris updated this revision to Diff 465582.Oct 5 2022, 3:45 PM
arsenm added inline comments.Oct 5 2022, 3:45 PM
llvm/test/tools/llvm-reduce/remove-dimetadata.ll
5

Don't need cat, FileCheck %s < %t

ormris updated this revision to Diff 465592.Oct 5 2022, 4:22 PM
ormris marked an inline comment as done.

uh looks like https://reviews.llvm.org/D135237 was submitted, I'll need to see what's going on between this patch and that patch

lg with comments addressed, but we need to figure out what to do with https://reviews.llvm.org/D135237. I think that one is overly aggressive, I'll ask if we can use this patch instead

llvm/tools/llvm-reduce/deltas/ReduceDIMetadata.cpp
2

headers need updating

46

if you delete one of the leading spaces before the comment, the whole block will be properly formatted by clang-format

59

typically we check visited at the top of the loop, rather than when adding, because we may add something to the queue multiple times before visiting it, and we end up processing something multiple times

ormris added a comment.Oct 6 2022, 1:09 PM

Agreed. I just tested it using your test case and it crashed.

ormris updated this revision to Diff 465858.Oct 6 2022, 1:34 PM

Fix header comment and check for visited nodes.

llvm/tools/llvm-reduce/deltas/ReduceDIMetadata.cpp
59

typically we check visited at the top of the loop, rather than when adding, because we may add something to the queue multiple times before visiting it, and we end up processing something multiple times

ormris marked an inline comment as done.Oct 6 2022, 1:35 PM
ormris added inline comments.
llvm/tools/llvm-reduce/deltas/ReduceDIMetadata.cpp
59

Sorry, pressed reply rather than "Done". Ignore the comment above. This should be fixed.

aeubanks accepted this revision.Oct 6 2022, 2:04 PM

lg, and the other patch has been reverted. thanks for pushing this through!

This revision is now accepted and ready to land.Oct 6 2022, 2:04 PM
ormris added a comment.Oct 6 2022, 2:06 PM

Thanks for the review, @aeubanks and @arsenm! I'll committing this shortly.

ellis added a subscriber: ellis.Oct 6 2022, 2:09 PM
ellis added inline comments.Oct 6 2022, 2:18 PM
llvm/tools/llvm-reduce/deltas/ReduceDIMetadata.cpp
84

In D135237 I searched through Program.named_metadata() which includes things like !llvm.module.flags and !llvm.dbg.cu (I assume this is the same as Program.debug_compile_units()). Do we also want to reduce these? My guess is this patch won't remove unnecessary module flags. Is that the case?

aeubanks added inline comments.Oct 6 2022, 2:22 PM
llvm/tools/llvm-reduce/deltas/ReduceDIMetadata.cpp
84

At least in regards to !llvm.dbg.cu, I've seen DICompileUnit not listed in llvm.dbg.cu verifier errors, so we need to take care there. !llvm.module.flags should be reducable (and I do that manually a lot, so it'd be nice to get that working)

This revision was landed with ongoing or failed builds.Oct 6 2022, 2:25 PM
This revision was automatically updated to reflect the committed changes.
MatzeB added a subscriber: MatzeB.Oct 6 2022, 4:27 PM
MatzeB added inline comments.
llvm/tools/llvm-reduce/DeltaManager.cpp
71

running dimetadata before metadata seems better since dimetadata can make metadata do less work, but not vice versa

Is that true? I think I typically see metadata drop references to DICompileUnit nodes and with it all the debug info disappears, reducing the work for dimetadata...

llvm/tools/llvm-reduce/deltas/ReduceDIMetadata.cpp
35–37

Isn't Visited empty in this loop and !Visited() will just always be true? So this loop in effect just copies from MDs to ToLook but filtering out nullptr. The next loop then seems to use Visited to ensure every node is only checked once even if it appears multiple times in the ToLook list.

Unless I miss something here, wouldn't it be a lot easier to:

  1. Turn MDs into a std::set<MDNode*> or similar so duplicates get eliminated immediately?
  2. Don't add nullptrs in the first place (or alternative just skip the 1 unique nullptr in the later loop)
  3. No need for ToLook or pop_back anymore, just do for (MDNode *MD : MDs) for the loop in line 39?
MatzeB added inline comments.Oct 6 2022, 4:29 PM
llvm/test/tools/llvm-reduce/Inputs/remove-dimetadata.py
1–8

This checks whether there is at least 1 line in the input with the word "interesting" in it.

But you can get that with somethign like FileCheck --prefix=INTERESTING %s < xxx INTERESTING: interesting instead with the need for a custom python script, can't you?

MatzeB added inline comments.Oct 6 2022, 4:34 PM
llvm/tools/llvm-reduce/deltas/ReduceDIMetadata.cpp
35–37

Oh I guess this a worklist like algorithm to traverse MDNodes recursively. So you can ignore most of that my comment. Though the Visited use in line 36 still seems unnecessary.

MatzeB added a comment.Oct 6 2022, 4:36 PM

Also just realized this already landed, I guess I'm also fine not having my nitpicks addressed then.

chapuni added a subscriber: chapuni.Oct 7 2022, 2:09 AM

I have seen flaky behavior.
(excuse me, in internal builders)

llvm/tools/llvm-reduce/deltas/ReduceDIMetadata.cpp
63

'Tuples' is DenseMap. Would it be safe for loop?

I have seen flaky behavior.
(excuse me, in internal builders)

you mean a failing test? do you have example output?

We're seeing a test fail intermittently as well. See below. I have a fix and am about to post a review for it.

https://lab.llvm.org/buildbot/#/builders/139/builds/29185

if it's not a trivial fix, I'd revert this patch and reupload it for review with the fix

It's pretty trivial, but I also wanted to address @MatzeB's post commit feedback. https://reviews.llvm.org/D135473