Details

Reviewers

davide
aprantl
rriddle
dblaikie
danielcdh
wmi

Commits

rGe5089e2e9479: [CodeExtractor] Add debug locations for new call and branch instrs.
rL320199: [CodeExtractor] Add debug locations for new call and branch instrs.

Summary

If a partially inlined function has debug info, we have to add debug
locations to the call instruction calling the outlined function.
We use the debug location of the first instruction in the outlined
function, as the introduced call transfers control to this statement and
there is no other equivalent line in the source code.

We also use the same debug location for the branch instruction added
to the artificial entry block of the outlined function, which just
jumps to the first actual basic block of the outlined function.

Event Timeline

fhahn created this revision. Nov 24 2017, 2:54 AM

Herald added subscribers: JDevlieghere, eraman. Nov 24 2017, 2:54 AM

fhahn added a parent revision: D40412: [InlineFunction] Only replace call if there are VarArgs to forward. Nov 24 2017, 2:54 AM

fhahn added a child revision: D40432: [InlineFunction] Set debug loc for call to forward varargs. Nov 24 2017, 5:56 AM

Improve test slightly.

ping

This is probably okay. Is there an introduction into partial inlining somewhere? I'd like to better understand what the transformation is doing, so I can give more helpful review feedback.

lib/Transforms/Utils/CodeExtractor.cpp
750	we usually don't spell out the `!= nullptr`
1039	Could you please copy the entire text of the description you added to this phabricator review into the comment here? It is non-obvious what we are doing here, and I prefer having the complete story in the comment.

fhahn updated this revision to Diff 125361. Dec 4 2017, 9:58 AM

fhahn marked 2 inline comments as done.

In D40413#942301, @aprantl wrote:

This is probably okay. Is there an introduction into partial inlining somewhere? I'd like to better understand what the transformation is doing, so I can give more helpful review feedback.

Muth, Robert, and Saumya Debray. "Partial inlining." might be useful (although none of the PDF versions I found are really good quality :(). The basic idea is to 1) create a clone of the original function f_cloned, 2) extract cold code from the cloned function into function f_extracted and 3) replace the cold code in f_cloned with a call to f_extracted. The smaller function f_cloned might be viable to inline where f was not. This is useful, for example, if you have a function that is too big to inline directly but has an early return at the beginning.

This patch adds debug info for the call instruction added in step 3) and the branch instruction from the artificial entry block added to f_extracted to ensure the function entry block does not have any predecessors.
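To make the three steps concrete, here is a hand-written source-level sketch of what the transformation does to a function shaped like `callee` in the test case. The names `callee_if_then` and `callee_cloned` are made up for illustration (the pass works on IR and chooses its own names), and passing the result back through a pointer is only an approximation of how outputs are handled.

```cpp
#include <cassert>

// Original function f: a cheap early exit plus a region treated as cold.
int callee(int v) {
  if (v > 2000)
    v = v * 10;  // cold region
  return v + 200;
}

// Step 2: the cold region extracted into its own function (f_extracted).
// The value it computes is handed back through a pointer.
void callee_if_then(int v, int *mul_loc) { *mul_loc = v * 10; }

// Steps 1 and 3: the clone (f_cloned) calls f_extracted instead of
// containing the cold code itself.
int callee_cloned(int v) {
  if (v > 2000) {
    int mul;
    callee_if_then(v, &mul);  // the new call that needs a debug location
    v = mul;
  }
  return v + 200;
}

// What a caller could look like once the small clone has been inlined:
// the cheap check is inline, the cold code stays behind a call.
int caller(int v) {
  if (v > 2000) {
    int mul;
    callee_if_then(v, &mul);
    v = mul;
  }
  return v + 200;
}
```

For any input, `caller(v)` and `callee(v)` return the same value; only the placement of the cold code changes.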

lib/Transforms/Utils/CodeExtractor.cpp
1039	I've adjusted the text for the position.

Thanks! I found one actual problem with the assertion.

lib/Transforms/Utils/CodeExtractor.cpp
1046	This assertion will fail if you have a nodebug function inlined into a function with debuginfo.

Otherwise this seems good to me. Someone who is interested in profile-based-PGO should probably also look at this, because they typically don't like it when debug info is moved from one basic block into another.

This revision is now accepted and ready to land. Dec 4 2017, 1:34 PM

The patch LGTM. Notes about SamplePGO use cases below, which may be addressed by separate patch/discussion, if necessary.

test/Transforms/CodeExtractor/PartialInlineDebug.ll
31	I'm wondering if we inline callee.1_if.then back into caller, what would the inline stack be for this instruction. Looks like it will be:

  test.c:10 (callee.1_if.then)
  test.c:10 (callee)
  test.c:5 (caller)

But from the PGO point of view, the correct inline stack should be:

  test.c:10 (callee)
  test.c:5 (caller)

Not sure if PGO-desired inline stack is possible with partial inlining with or without this patch. Maybe we should just disable partial inlining when the binary is to be used to collect SamplePGO profile, unless it's designed for flattened profile. +wmi who's evaluating flattened profile.

danielcdh added a reviewer: wmi. Dec 4 2017, 2:12 PM

Thanks for having a look! I'll address the assertion before committing tomorrow and also think a bit about the PGO case.

lib/Transforms/Utils/CodeExtractor.cpp
1046	Ah yes, I will look into how the inliner deals with that case and handle it appropriately.
test/Transforms/CodeExtractor/PartialInlineDebug.ll
31	Great, thanks for having a look! Currently, partial inlining is disabled by default, but it sounds like we have to handle this issue before enabling it or disable it with PGO for now. Is the problem for PGO with splitting callee into 2 functions the following: all samples would be (incorrectly) attributed to `callee` and not split between `callee` and `callee.1_if.then`?

danielcdh added inline comments. Dec 4 2017, 9:15 PM

test/Transforms/CodeExtractor/PartialInlineDebug.ll
31	On the contrary, in the profile, we want to have all samples be attributed to callee instead of split between 2 parts. But as callee.1_if.then is in a separate function, this does not seem possible.
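The attribution concern can be illustrated with a toy model. This is not how SamplePGO is implemented: the `Frame` and `attribute` names are hypothetical, the `test.c:9` frame is invented for the hot path, and the rule "charge each sample to the function in the innermost frame" is a simplifying assumption made only to show the split.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One frame of an inline stack, e.g. "test.c:10 (callee)".
struct Frame {
  std::string Function;
  unsigned Line;
};

// Frames are stored innermost first, as in the stacks quoted above.
using InlineStack = std::vector<Frame>;

// Toy rule (an assumption): each sample is charged to the function named
// in the innermost frame of its inline stack.
std::map<std::string, unsigned>
attribute(const std::vector<InlineStack> &Samples) {
  std::map<std::string, unsigned> Counts;
  for (const InlineStack &S : Samples)
    if (!S.empty())
      ++Counts[S.front().Function];
  return Counts;
}
```

With two samples on the hot path of `callee` and three in the extracted region, the stack containing `callee.1_if.then` yields counts of 2 for `callee` and 3 for `callee.1_if.then`, whereas the desired stack would give `callee` all 5.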

fhahn added inline comments. Dec 5 2017, 6:29 AM

lib/Transforms/Utils/CodeExtractor.cpp
1046	The basic block `header` is from a function with debug info (it comes from `oldFunction`). Isn't it safe to assume that instructions in this basic block should have debug info? Do you mean when inlining a nodebug function in one with debug info, some basic blocks could be missing debug info which could trigger this assertion in the partial inliner? (Sorry, my knowledge of the assumptions about debug info is quite light).

rriddle added inline comments. Dec 5 2017, 11:31 AM

lib/Transforms/Utils/CodeExtractor.cpp
1046	It's not safe to assume that the header block will have debug locations. So it should probably just find the first valid debug location, if there is one, and use that for the call instruction. It's only needed if there is a valid debug location in the abstracted blocks anyways. For nodebug functions that are inlined, all of the instructions being inlined inherit the debug location of the call. If there are missing debug locations due to that, then the function was already missing debug info on some instructions.

fhahn added inline comments. Dec 5 2017, 1:39 PM

lib/Transforms/Utils/CodeExtractor.cpp
1046	Ah thanks for confirming. That makes things slightly more tricky. What is the suggested way to find the "first" valid debug location? BFS through the CFG? Could it happen that `getSubprogram()` is not null, but there are no debug locations?

aprantl added inline comments. Dec 5 2017, 1:47 PM

lib/Transforms/Utils/CodeExtractor.cpp
1046	I think so:

  __attribute__((nodebug)) void f() { /* actual code */ }
  __attribute__((nodebug)) void g() { f(); }
  void h() { g(); }

After inlining f into g into h, all the code of f is in h, but it won't have debug locations.

rriddle added inline comments. Dec 5 2017, 3:10 PM

lib/Transforms/Utils/CodeExtractor.cpp
1046	If the header doesn't have a valid debug location it's going to be a bit wonky in terms of step debugging. At that point it's really just satisfying the need to have a location on the call.

As for getting the location, you could either iterate the set of blocks being extracted or try and search the CFG. Searching the CFG will require logic for making sure that a particular BB is in the extracted set and not already checked. The extracted set could contain loops, multiple exits, etc. IMO it doesn't really seem worth the work for little to no benefit in the debugging experience.

For the last question, yes! CodeExtractor is a generic extraction utility, it needs to work in all potential scenarios. So it's possible that there could be a small set of block(s) in a function in which the instructions do not have debug locations. A more reasonable scenario for the partial inliner could be if a call to a nodebug function is inlined and the call has no debug location.

For the reverse, we may need to remove the check for the subprogram. For example this partial inline scenario:

  void f() { /* Code */ }
  __attribute__((nodebug)) void g() {
    if (/* something */) {
      f();
      /* more code */
    }
    /* Other code */
  }
  void h() { g(); }

After inlining f into g the instructions still maintain their debug locations. If we extract the part containing the now inlined f, we will end up with debug locations and no subprogram. That creates a problem if we want to then inline the stub g into h.
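For comparison, the CFG-search alternative mentioned above might look roughly like the following. This is a self-contained toy, not LLVM code: `ToyBlock`, `findLocBFS`, and the index-based CFG are hypothetical stand-ins for `BasicBlock` and its successor lists. It shows the two pieces of bookkeeping the comment calls out, namely the membership check against the extracted set and the visited set that keeps loops from being walked forever.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <queue>
#include <set>
#include <vector>

// Toy basic block: the source line of its first located instruction (if
// any) and the indices of its successor blocks.
struct ToyBlock {
  std::optional<unsigned> Line;
  std::vector<std::size_t> Succs;
};

// Breadth-first search from Header that only follows edges into blocks of
// the extracted region and never revisits a block.
std::optional<unsigned> findLocBFS(const std::vector<ToyBlock> &CFG,
                                   std::size_t Header,
                                   const std::set<std::size_t> &Extracted) {
  std::set<std::size_t> Visited{Header};
  std::queue<std::size_t> Work;
  Work.push(Header);
  while (!Work.empty()) {
    std::size_t BB = Work.front();
    Work.pop();
    if (CFG[BB].Line)
      return CFG[BB].Line;  // closest block (in edges) with a location
    for (std::size_t S : CFG[BB].Succs)
      if (Extracted.count(S) && Visited.insert(S).second)
        Work.push(S);
  }
  return std::nullopt;  // no block in the region has a location
}
```

Simply iterating over the extracted blocks avoids both checks, at the cost of not necessarily picking the location closest to the header.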

Thanks for all the feedback, it's been really helpful. I have updated the code to look for a valid debug location in all extracted blocks and added a test case where there is no debug loc in the header.

Fix code to use the first debug location found, not the last! Updated the test case with multiple debug locations, to check we actually pick the first one, if there are multiple blocks with debug info.
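The "first, not last" behaviour can be sketched without LLVM. The types below (`DebugLoc`, `Instr`, `Block`) are simplified stand-ins invented for this example, not the LLVM classes, and plain loops with an early return are used here where the patch uses nested `any_of` calls that stop at the first hit.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Simplified stand-ins for the LLVM types, for illustration only.
struct DebugLoc {
  unsigned Line;
  unsigned Col;
};
struct Instr {
  std::optional<DebugLoc> Loc;  // empty when the instruction has no location
};
using Block = std::vector<Instr>;

// Return the first debug location found when walking the extracted blocks
// in order, or nothing if no instruction in any block carries one.
std::optional<DebugLoc> firstDebugLoc(const std::vector<Block> &Blocks) {
  for (const Block &BB : Blocks)
    for (const Instr &I : BB)
      if (I.Loc)
        return I.Loc;  // stop at the first hit instead of overwriting it
  return std::nullopt;
}
```

For a region whose header block has no location but whose later blocks do (as in `callee2` of the test), this returns the location from the earliest block that has one; for a region with no locations at all it returns nothing, so the branch is simply left without a debug location.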

It would be great if someone could have another quick look at this version of the patch, as it changed quite a bit since being accepted.

LGTM

fhahn closed this revision. Dec 8 2017, 1:49 PM

Diff 126122

lib/Transforms/Utils/CodeExtractor.cpp

[740 lines hidden; context: for (unsigned i = 0, e = inputs.size(); i != e; ++i) {]
       StoreInst *SI = new StoreInst(StructValues[i], GEP);
       codeReplacer->getInstList().push_back(SI);
     }
   }

   // Emit the call to the function
   CallInst *call = CallInst::Create(newFunction, params,
                                     NumExitBlocks > 1 ? "targetBlock" : "");
+  // Add debug location to the new call, if the original function has debug
+  // info. In that case, the terminator of the entry block of the extracted
+  // function contains the first debug location of the extracted function,
+  // set in extractCodeRegion.
+  if (codeReplacer->getParent()->getSubprogram()) {
+    if (auto DL = newFunction->getEntryBlock().getTerminator()->getDebugLoc())
+      call->setDebugLoc(DL);
+  }
   codeReplacer->getInstList().push_back(call);

   Function::arg_iterator OutputArgBegin = newFunction->arg_begin();
   unsigned FirstOut = inputs.size();
   if (!AggregateArgs)
     std::advance(OutputArgBegin, inputs.size());

   // Reload the outputs passed in by reference.
[261 lines hidden; context: Function *CodeExtractor::extractCodeRegion() {]
   BasicBlock *codeReplacer = BasicBlock::Create(header->getContext(),
                                                 "codeRepl", oldFunction,
                                                 header);

   // The new function needs a root node because other nodes can branch to the
   // head of the region, but the entry node of a function cannot have preds.
   BasicBlock *newFuncRoot = BasicBlock::Create(header->getContext(),
                                                "newFuncRoot");
-  newFuncRoot->getInstList().push_back(BranchInst::Create(header));
+  auto *BranchI = BranchInst::Create(header);
+  // If the original function has debug info, we have to add a debug location
+  // to the new branch instruction from the artificial entry block.
+  // We use the debug location of the first instruction in the extracted
+  // blocks, as there is no other equivalent line in the source code.
+  if (oldFunction->getSubprogram()) {
+    any_of(Blocks, [&BranchI](const BasicBlock *BB) {
+      return any_of(*BB, [&BranchI](const Instruction &I) {
+        if (!I.getDebugLoc())
+          return false;
+        BranchI->setDebugLoc(I.getDebugLoc());
+        return true;
+      });
+    });
+  }
+  newFuncRoot->getInstList().push_back(BranchI);

   findAllocas(SinkingCands, HoistingCands, CommonExit);
   assert(HoistingCands.empty() || CommonExit);

   // Find inputs to, outputs from the code region.
   findInputsOutputs(inputs, outputs, SinkingCands);

   // Now sink all instructions which only have non-phi uses inside the region
[88 lines hidden]

test/Transforms/CodeExtractor/PartialInlineDebug.ll

This file was added.

				; RUN: opt < %s -S -partial-inliner -skip-partial-inlining-cost-analysis=true | FileCheck %s

				; CHECK-LABEL: @callee
				; CHECK: %mul = mul nsw i32 %v, 10, !dbg ![[DBG1:[0-9]+]]
				define i32 @callee(i32 %v) !dbg !16 {
				entry:
				%cmp = icmp sgt i32 %v, 2000
				br i1 %cmp, label %if.then, label %if.end

				if.then: ; preds = %entry
				%mul = mul nsw i32 %v, 10, !dbg !17
				br label %if.then2

				if.then2:
				%sub = sub i32 %v, 10, !dbg !23
				br label %if.end

				if.end: ; preds = %if.then, %entry
				%v2 = phi i32 [ %v, %entry ], [ %mul, %if.then2 ]
				%add = add nsw i32 %v2, 200
				ret i32 %add
				}

				; CHECK-LABEL: @caller
				; CHECK: codeRepl.i:
				; CHECK-NEXT: call void @callee.2_if.then(i32 %v, i32* %mul.loc.i), !dbg ![[DBG2:[0-9]+]]
				define i32 @caller(i32 %v) !dbg !8 {
				entry:
				%call = call i32 @callee(i32 %v), !dbg !14
				ret i32 %call
				}


				; CHECK-LABEL: @callee2
				; CHECK: %sub = sub i32 %v, 10, !dbg ![[DBG3:[0-9]+]]
				define i32 @callee2(i32 %v) !dbg !18 {
				entry:
				%cmp = icmp sgt i32 %v, 2000
				br i1 %cmp, label %if.then, label %if.end

				if.then:
				br label %if.then2

				if.then2:
				%sub = sub i32 %v, 10, !dbg !20
				br label %if.end

				if.end:
				%v2 = phi i32 [ %v, %entry ], [ %sub, %if.then2 ]
				%add = add nsw i32 %v2, 200
				ret i32 %add
				}

				; CHECK-LABEL: @caller2
				; CHECK: codeRepl.i:
				; CHECK-NEXT: call void @callee2.1_if.then(i32 %v, i32* %sub.loc.i), !dbg ![[DBG4:[0-9]+]]
				define i32 @caller2(i32 %v) !dbg !21 {
				entry:
				%call = call i32 @callee2(i32 %v), !dbg !22
				ret i32 %call
				}

				; CHECK-LABEL: define internal void @callee2.1_if.then
				; CHECK: br label %if.then, !dbg ![[DBG5:[0-9]+]]

				; CHECK-LABEL: define internal void @callee.2_if.then
				; CHECK: br label %if.then, !dbg ![[DBG6:[0-9]+]]

				; CHECK: ![[DBG1]] = !DILocation(line: 10, column: 7,
				; CHECK: ![[DBG2]] = !DILocation(line: 10, column: 7,
				; CHECK: ![[DBG3]] = !DILocation(line: 110, column: 17,
				; CHECK: ![[DBG4]] = !DILocation(line: 110, column: 17,
				; CHECK: ![[DBG5]] = !DILocation(line: 110, column: 17,
				; CHECK: ![[DBG6]] = !DILocation(line: 10, column: 7,


				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!3, !4, !5, !6}
				!llvm.ident = !{!7}

				!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 6.0.0 (trunk 177881)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
				!1 = !DIFile(filename: "test.c", directory: "/tmp")
				!2 = !{}
				!3 = !{i32 2, !"Dwarf Version", i32 4}
				!4 = !{i32 2, !"Debug Info Version", i32 3}
				!5 = !{i32 1, !"wchar_size", i32 4}
				!6 = !{i32 1, !"min_enum_size", i32 4}
				!7 = !{!"clang version 6.0.0"}
				!8 = distinct !DISubprogram(name: "caller", scope: !1, file: !1, line: 3, type: !9, isLocal: false, isDefinition: true, scopeLine: 3, flags: DIFlagPrototyped, isOptimized: true, unit: !0, variables: !12)
				!9 = !DISubroutineType(types: !10)
				!10 = !{!11, !11}
				!11 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
				!12 = !{!13}
				!13 = !DILocalVariable(name: "v", arg: 1, scope: !8, file: !1, line: 3, type: !11)
				!14 = !DILocation(line: 5, column: 10, scope: !8)
				!15 = distinct !DILexicalBlock(scope: !16, file: !1, line: 9, column: 7)
				!16 = distinct !DISubprogram(name: "callee", scope: !1, file: !1, line: 8, type: !9, isLocal: false, isDefinition: true, scopeLine: 8, flags: DIFlagPrototyped, isOptimized: true, unit: !0, variables: !12)
				!17 = !DILocation(line: 10, column: 7, scope: !15)
				!18 = distinct !DISubprogram(name: "callee2", scope: !1, file: !1, line: 8, type: !9, isLocal: false, isDefinition: true, scopeLine: 8, flags: DIFlagPrototyped, isOptimized: true, unit: !0, variables: !12)
				!19 = distinct !DILexicalBlock(scope: !18, file: !1, line: 100, column: 1)
				!20 = !DILocation(line: 110, column: 17, scope: !19)
				!21 = distinct !DISubprogram(name: "caller2", scope: !1, file: !1, line: 8, type: !9, isLocal: false, isDefinition: true, scopeLine: 8, flags: DIFlagPrototyped, isOptimized: true, unit: !0, variables: !12)
				!22 = !DILocation(line: 110, column: 17, scope: !21)
				!23 = !DILocation(line: 15, column: 7, scope: !15)

This is an archive of the discontinued LLVM Phabricator instance.

[CodeExtractor] Add debug locations for new call and branch instrs.
ClosedPublic
