We frequently run bugpoint on a linked-together module of all modules we jit while compiling the julia standard library. As a result, this module has a very large number of compile units (10000+) in llvm.dbg.cu, which didn't get reduced at all, requiring manual post processing. This is an attempt to
have bugpoint go through and attempt to reduce the number of global named metadata nodes as well as their operands, to cut down the number of roots for such metadata.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Test cases?
tools/bugpoint/CrashDebugger.cpp | ||
---|---|---|
530 ↗ | (On Diff #38337) | Please use a shared_ptr or something. Raw pointers should be avoided. |
533 ↗ | (On Diff #38337) | min(10, size) |
549 ↗ | (On Diff #38337) | Possibly: ToDelete.reserve(NamedMD.size())? |
611 ↗ | (On Diff #38337) | shared_ptr |
615 ↗ | (On Diff #38337) | Do you need to worry about iterator invalidation here? Would it be easier to just remove the operands or set them to undef? |
816 ↗ | (On Diff #38337) | Please add an option to easily disable these passes. |
821 ↗ | (On Diff #38337) | insert(end, md_begin, md_end) |
831 ↗ | (On Diff #38337) | insert(end, op_begin, op_end) |
This sounds awesome. I won't have a chance to look until at least tomorrow, but I know Adrian and Pete both had ideas around bugpoint and debug info so I'm CCing them.
- dpnes
tools/bugpoint/CrashDebugger.cpp | ||
---|---|---|
530 ↗ | (On Diff #38337) | I copied that from above. Happy to change the API throughout, but that should be a separate change. |
549 ↗ | (On Diff #38337) | Sure |
615 ↗ | (On Diff #38337) | No, the iterators are in the original module, but the ones we're dropping are in the new one. I didn't see an obvious way to just remove operands. I agree that would be preferable. |
816 ↗ | (On Diff #38337) | Will do. |
821 ↗ | (On Diff #38337) | Needs to convert from named md to its name |
Philip has been doing most of the reviewing, so I'll defer to him for a final LGTM. But FWIW, this LGTM given the work to address his feedback.
BTW, my approach was a little different. You are reducing the number of CU's which is great. I should have tried that first.
My approach was to reduce the number of types, global variables, imported entities, and subprograms in an individual CU. Given that these tend to also be referenced from llvm.dbg.* intrinsics, many dbg info entries would remain in the metadata, even after being removed from the CU. From there, I would then strip dbg info from instructions and see if there was a reduced set which would still crash. Eventually I'd get the smallest possible CU in terms of types, subprograms, etc, which demonstrate the bug.
Unfortunately, this didn't seem to be 100% reliable. I kept encountering crashes in llc when it tried to emit the debug info as it thought entries were missing, or weren't in states it wanted them to be in. So I may have actually found real bugs, i'm not sure, but those bugs made it impossible to run llc in bug point and know that the crashes were the original bug and not what i was causing with reduced debug info.
Anyway, i still have the patch lying around if its ever of interest. Its about 6 months old though so likely needs fixed to work on ToT.
Yes, I think that would be a great step further. As I mentioned, this patch arose from the very specific situation of having tons of otherwise unreferenced CUs left over, but there is certainly room for more.
(Just in case you're waiting for me: I still haven't looked, but if I find anything I can review post-commit.)