This is an archive of the discontinued LLVM Phabricator instance.

[CodeExtractor] Correctly propagate scope information post extraction
ClosedPublic

Authored by fdeazeve on Dec 2 2022, 11:44 AM.

Details

Summary
When a new function "NewF" is created with instructions extracted from
another function "OldF", the CodeExtractor only preserves debug
line/column of the extracted instructions. However:

1. Any inlinedAt nodes are dropped.
2. The scope chain is replaced with a single node, the Subprogram of NewF.

Both of these are incorrect: most of the debug metadata from the
original instructions should be preserved. We only need to update the
Subprogram found at the scope of the last node of the inline chain; this
Subprogram used to be OldF but now should be NewF.

Diff Detail

Event Timeline

fdeazeve created this revision.Dec 2 2022, 11:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 2 2022, 11:44 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
fdeazeve requested review of this revision.Dec 2 2022, 11:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 2 2022, 11:44 AM
fdeazeve updated this revision to Diff 479702.Dec 2 2022, 11:51 AM

Simplified map update

fdeazeve updated this revision to Diff 479704.Dec 2 2022, 11:58 AM

Update test

fdeazeve updated this revision to Diff 479707.Dec 2 2022, 12:04 PM

Updated commit message

fdeazeve edited the summary of this revision. (Show Details)Dec 2 2022, 12:05 PM
fdeazeve added reviewers: aprantl, debug-info.
fdeazeve updated this revision to Diff 479710.Dec 2 2022, 12:07 PM

Rename a variable in the test for clarity

This is great! The old code was quite incorrect :-)

I have one possible suggestion for a cleaner/safer implementation of replaceOperand().
Did you test this on something large, like Clang?

llvm/lib/IR/DebugLoc.cpp
71

should this be static?

74

This seems to be very brute force. How much more code would it be to Switch over getKind() or put a replaceScope() method on DILexicalScopeBase?

Can you make an assertion that the Node is distinct? Otherwise this would be dangerous.

114

This might benefit from a comment.
IIUC, we collect a chain of not-yet processed inlined-at locations, which would be useful when processing increasingly deeper inlined instructions.

126

maybe also make clear this this means we are now in the DILocation belonging to the current subprogram and that that's why we need to reparent the scope into the new function.

Some more context for the other reviewers: This affects hot-cold splitting, particularly the inlined frames of an assertion.

Yeah, this sounds like it was quite incorrect & the general vibe of the new code looks good to me (I'd wonder if it could be simpler to write recursively & be OK (admittedly that'd be a bit more recursive than the real code - since it'd recurse for every lexical scope, not just every inlined frame) - but, yeah, probably better the way it's written here, but more complicated to follow).

I'll leave the approval/review to you, @aprantl, but endorse the direction.

This is great! The old code was quite incorrect :-)

I have one possible suggestion for a cleaner/safer implementation of replaceOperand().
Did you test this on something large, like Clang?

I had only run the Clang/LLVM/LLDB test suites, but I will try compiling a larger project.

llvm/lib/IR/DebugLoc.cpp
71

Yes! Though this code is going to be removed in the new version

74

Can you make an assertion that the Node is distinct? Otherwise this would be dangerous.

The assert on line 73 is doing that (assert(!N.isUniqued())). This assert allo

This seems to be very brute force. How much more code would it be to Switch over getKind() or put a replaceScope() method on DILexicalScopeBase?

Agreed, this implementation is very similar to what the ValueMapper class does, so maybe it is more general than it needs to be.
Adding a method to DILexicalBlockBase (I'm assuming this is the class you meant?) seems straightforward.

114

Fixed in the next version

126

Will fix in the next version.

fdeazeve updated this revision to Diff 479748.Dec 2 2022, 2:41 PM

Addressed review comments

I've now used this patch to compile a RelWithDebugSymbols build of llvm-project, and had no issues. Also tried on an e2e test that used to be buggy without the fix.

fdeazeve added a comment.EditedDec 4 2022, 2:11 PM

Can you make an assertion that the Node is distinct? Otherwise this would be dangerous.

The assert on line 73 is doing that (assert(!N.isUniqued())). This assert allo

Oops, I realized I never finished the sentence above! That assert (which is now in the setScope function), checks whether the node is either temporary or distinct, both conditions which would allow mutation.

aprantl accepted this revision.Dec 5 2022, 12:41 PM

This looks good to me now. I have a few suggestions for better names and comments, but I'm fine with the mechanics!

llvm/include/llvm/IR/DebugInfoMetadata.h
2124

The other sibling classes would call this replaceScope().

llvm/lib/IR/DebugLoc.cpp
72

This function doesn't replace anything. What do you think of reparentScopeChain, cloneScopesForSubprogram()?

87

Maybe add a comment explaining what the entire loop does?

126

Top-level comment for this loop?

This revision is now accepted and ready to land.Dec 5 2022, 12:41 PM
fdeazeve added inline comments.Dec 6 2022, 6:24 AM
llvm/include/llvm/IR/DebugInfoMetadata.h
2124

Good point!

llvm/lib/IR/DebugLoc.cpp
72

The irony here is that we've just created a "replaceScope" method... but you're right, this makes it sound like we're modifying the original scope. I'll go with the "clone" variant

fdeazeve updated this revision to Diff 480463.Dec 6 2022, 6:24 AM

Addressed review comments

This revision was landed with ongoing or failed builds.Dec 6 2022, 11:26 AM
This revision was automatically updated to reflect the committed changes.

Test failure due to the spurious timeout from libfuzzer, unrelated to this patch.