When a new function "NewF" is created with instructions extracted from another function "OldF", the CodeExtractor only preserves debug line/column of the extracted instructions. However: 1. Any inlinedAt nodes are dropped. 2. The scope chain is replaced with a single node, the Subprogram of NewF. Both of these are incorrect: most of the debug metadata from the original instructions should be preserved. We only need to update the Subprogram found at the scope of the last node of the inline chain; this Subprogram used to be OldF but now should be NewF.
Details
- Reviewers
aprantl - Group Reviewers
debug-info - Commits
- rG62b27f893ecc: [CodeExtractor] Correctly propagate scope information post extraction
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test | |
---|---|---|
60,050 ms | x64 debian > libFuzzer.libFuzzer::minimize_crash.test |
Event Timeline
This is great! The old code was quite incorrect :-)
I have one possible suggestion for a cleaner/safer implementation of replaceOperand().
Did you test this on something large, like Clang?
llvm/lib/IR/DebugLoc.cpp | ||
---|---|---|
71 | should this be static? | |
74 | This seems to be very brute force. How much more code would it be to Switch over getKind() or put a replaceScope() method on DILexicalScopeBase? Can you make an assertion that the Node is distinct? Otherwise this would be dangerous. | |
114 | This might benefit from a comment. | |
126 | maybe also make clear this this means we are now in the DILocation belonging to the current subprogram and that that's why we need to reparent the scope into the new function. |
Some more context for the other reviewers: This affects hot-cold splitting, particularly the inlined frames of an assertion.
Yeah, this sounds like it was quite incorrect & the general vibe of the new code looks good to me (I'd wonder if it could be simpler to write recursively & be OK (admittedly that'd be a bit more recursive than the real code - since it'd recurse for every lexical scope, not just every inlined frame) - but, yeah, probably better the way it's written here, but more complicated to follow).
I'll leave the approval/review to you, @aprantl, but endorse the direction.
I had only run the Clang/LLVM/LLDB test suites, but I will try compiling a larger project.
llvm/lib/IR/DebugLoc.cpp | ||
---|---|---|
71 | Yes! Though this code is going to be removed in the new version | |
74 |
The assert on line 73 is doing that (assert(!N.isUniqued())). This assert allo
Agreed, this implementation is very similar to what the ValueMapper class does, so maybe it is more general than it needs to be. | |
114 | Fixed in the next version | |
126 | Will fix in the next version. |
I've now used this patch to compile a RelWithDebugSymbols build of llvm-project, and had no issues. Also tried on an e2e test that used to be buggy without the fix.
Can you make an assertion that the Node is distinct? Otherwise this would be dangerous.
The assert on line 73 is doing that (assert(!N.isUniqued())). This assert allo
Oops, I realized I never finished the sentence above! That assert (which is now in the setScope function), checks whether the node is either temporary or distinct, both conditions which would allow mutation.
This looks good to me now. I have a few suggestions for better names and comments, but I'm fine with the mechanics!
llvm/include/llvm/IR/DebugInfoMetadata.h | ||
---|---|---|
2124 | The other sibling classes would call this replaceScope(). | |
llvm/lib/IR/DebugLoc.cpp | ||
72 | This function doesn't replace anything. What do you think of reparentScopeChain, cloneScopesForSubprogram()? | |
87 | Maybe add a comment explaining what the entire loop does? | |
126 | Top-level comment for this loop? |
The other sibling classes would call this replaceScope().