This is an archive of the discontinued LLVM Phabricator instance.

Differential D44399

[ThinLTO] Add funtions in callees metadata to CallGraphEdges
ClosedPublic

Authored by twoh on Mar 12 2018, 12:14 PM.

Details

Reviewers

tejohnson
mehdi_amini
pcc

Commits

rG7646e7796979: [ThinLTO] Add funtions in callees metadata to CallGraphEdges
rL327358: [ThinLTO] Add funtions in callees metadata to CallGraphEdges

Summary

If !callees metadata is attached to an indirect call instruction, add CallGraphEdges to the callees listed in the metadata when computing the FunctionSummary.

Why this is necessary:

Consider the following code example:

(foo.c)
static int f1(int x) {...}
static int f2(int x);
static int (*fptr)(int) = f2;
static int f2(int x) { 
  if (x) fptr=f1; return f1(x);
}
int foo(int x) {
  return (*fptr)(x); // !callees metadata of !{i32 (i32)* @f1, i32 (i32)* @f2} would be attached to this call.
}

(bar.c)
int bar(int x) { 
  return foo(x); 
}

At LTO time, when foo.o is imported into bar.o, function foo might be inlined into bar, and PGO-guided indirect call promotion will run after that. If the profile data indicates that promoting @f1 or @f2 is beneficial, the optimizer checks whether the "promoted" @f1 or @f2 (such as @f1.llvm.0 or @f2.llvm.0) is available. Without this patch, importing the !callees metadata adds only promoted declarations of @f1 and @f2 to bar.o, yet the optimizer still assumes that the functions are available and performs the promotion. The result is a link failure with an undefined reference to @f1.llvm.0.

This patch fixes the problem by adding the callees in the !callees metadata to CallGraphEdges so that their definitions are properly imported into the importing module.
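The effect of the missing edges can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `ToySummary` and `importableDefinitions` are hypothetical names, not part of LLVM, and real ThinLTO importing also applies thresholds and other heuristics rather than plain reachability.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a module summary: each function maps to the callees
// recorded as call graph edges. This is NOT the real ModuleSummaryIndex API.
using ToySummary = std::map<std::string, std::vector<std::string>>;

// Collect every function reachable from Root through recorded edges. In this
// toy model, only reachable functions are candidates for definition import.
std::set<std::string> importableDefinitions(const ToySummary &Summary,
                                            const std::string &Root) {
  std::set<std::string> Seen;
  std::vector<std::string> Worklist{Root};
  while (!Worklist.empty()) {
    std::string F = Worklist.back();
    Worklist.pop_back();
    if (!Seen.insert(F).second)
      continue; // already visited
    auto It = Summary.find(F);
    if (It == Summary.end())
      continue; // no summary entry for this function
    for (const std::string &Callee : It->second)
      Worklist.push_back(Callee);
  }
  return Seen;
}
```

In this model, if the entry for foo has no edges, the set computed from foo contains neither f1 nor f2. Once foo records edges to f1 and f2, both become import candidates, which is the behavior the patch aims for.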

One may ask whether there is already logic that adds indirect call promotion targets to CallGraphEdges. There is, but if the profile data says "indirect call promotion is only beneficial under a certain inline context", that logic doesn't work. In the code example above, if the profile data looks like

bar:1000000:100000
  1:100000
    1: foo:100000
        1: 100000 f1:100000

then computing the FunctionSummary for foo.o wouldn't add foo->f1 to CallGraphEdges. (Also, it is at least "possible" to provide profile data only to the link step and not to the compilation step.)

Diff Detail

Repository: rL LLVM

Event Timeline

twoh created this revision. Mar 12 2018, 12:14 PM

Herald added subscribers: eraman, inglorion. Mar 12 2018, 12:14 PM

Harbormaster completed remote builds in B15983: Diff 138069. Mar 12 2018, 12:16 PM

Without this patch, importing !callees metadata would only add promoted declarations of @f1 and @f2 to the bar.o, but still the optimizer will assume that the function is available and perform the promotion.

I guess we end up with the declarations in the importing module (bar.o) because of the references from the callee metadata itself? Normally, we would only end up with the declarations if there was an exported reference, which would cause the exporting module (foo.o) to promote as well. So this is essentially an export of these symbols that foo.o doesn't know about. But in any case, this fix is good because we want these available for not only promotion, but inlining.

One other question below.

lib/Analysis/ModuleSummaryAnalysis.cpp
Line 300 (on Diff #138069): Do we want to give the calls a hotness other than unknown (which is the default)?

@tejohnson I think you're right. What I meant was that when the metadata is imported into bar.o, it references f1 and f2 by their promoted names, which causes declarations with the promoted names to be added. Did I get it right, or am I still missing something?

For your second question, I assumed that the logic following the patch (lines 304-311) will update the hotness info if available.

In D44399#1035340, @twoh wrote:

@tejohnson I think you're right. What I meant was that when the metadata is imported into bar.o, it references f1 and f2 by their promoted names, which causes declarations with the promoted names to be added. Did I get it right, or am I still missing something?

Sort of - we import the references to f1 and f2 (referenced in the callees metadata for the imported function), and ThinLTO knows that any references we import that were previously local must be promoted. This usually works fine because when we export a function (foo in this case), we walk the edges and mark anything it references as exported, which means they will be promoted in the original module (foo.o here). The issue is that there were no reference or call edges created for the references in the callees metadata, so we didn't know those references were exported.
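The export-side bookkeeping described here can be sketched with a toy model. Everything below is hypothetical and for illustration only: `ToyFunction`, `markEdgesExported`, and `definedName` are not LLVM APIs, and the `.llvm.<Suffix>` naming simply mirrors the `@f1.llvm.0` names that appear in this review.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a function in the exporting module. Not an LLVM type.
struct ToyFunction {
  bool IsLocal = false;           // e.g. `static` in C, `internal` in IR
  std::vector<std::string> Edges; // recorded call or reference edges
  bool Exported = false;          // set when another module depends on it
};
using ToyModule = std::map<std::string, ToyFunction>;

// Called when Name is imported by another module: every target it has an
// edge to must remain resolvable from that module, so mark it exported.
void markEdgesExported(ToyModule &M, const std::string &Name) {
  auto It = M.find(Name);
  if (It == M.end())
    return;
  for (const std::string &Target : It->second.Edges) {
    auto T = M.find(Target);
    if (T != M.end())
      T->second.Exported = true;
  }
}

// Name under which the exporting module ends up defining the symbol.
// Exported locals are promoted to "<Name>.llvm.<Suffix>"; Suffix is a
// placeholder for whatever the real implementation appends.
std::string definedName(const ToyModule &M, const std::string &Name,
                        const std::string &Suffix) {
  const ToyFunction &F = M.at(Name);
  return (F.IsLocal && F.Exported) ? Name + ".llvm." + Suffix : Name;
}
```

With no edge from foo to f1, the toy exporting module keeps defining f1 under its local name, while the importing module refers to f1.llvm.0. That mismatch corresponds to the undefined reference described in the summary.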

For your second question, I assumed that the logic following the patch (lines 304-311) will update the hotness info if available.

No, that is specific to indirect call PGO (attached as value profile metadata), which we don't have in this case (although I suppose if PGO is used we could end up with both callees metadata and VP metadata??). The VP metadata lists the count for each profiled target. Here we don't have anything, so I'm not sure what to suggest. There are two possibilities. The first is to take the instruction's scaled count (computed in the preceding code for the direct call case) and divide it evenly between the callee targets (so each would get 50% of that in your example). The other, which requires more changes but is potentially more accurate, is to extend the callees metadata with probability data based on the hotness of the block that assigned the function pointer to that callee. That's probably overkill at this point, so perhaps either leave it as Unknown or use the first strategy I suggested.

Ok to do later, but you should put a FIXME noting that we are not setting any hotness for now.
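For reference, the first strategy mentioned above (an even split of the call instruction's scaled count) could look roughly like the sketch below. This is not what the patch does: the committed change leaves the hotness as Unknown. `splitCountEvenly` is a hypothetical helper, and the integer division that drops any remainder is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Divide the scaled count of an indirect call evenly among the targets
// listed in its !callees metadata. Returns one count per callee, or an
// empty vector when there are no callees. Any remainder is dropped.
std::vector<uint64_t> splitCountEvenly(uint64_t ScaledCount,
                                       std::size_t NumCallees) {
  if (NumCallees == 0)
    return {};
  return std::vector<uint64_t>(NumCallees, ScaledCount / NumCallees);
}
```

For the two-callee example in this review, each callee would receive half of the instruction's count. Whether a synthesized count like this helps or misleads later passes is exactly the open question in this thread.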

test/ThinLTO/X86/callees-metadata.ll
Line 10 (on Diff #138069): Add a comment that we are testing to make sure that callees metadata functions are imported. Check for f2.llvm.0 also?

@tejohnson Thanks for the clarification. Regarding hotness, I'm not sure that providing "some" hotness is better than leaving it as unknown when profile data is not provided (if profile data is given, as you said, VP metadata will be attached to the callsite). I'm afraid that synthesized hotness may confuse optimizers, but please let me know if you have a different idea.

I'll update the test to check f2 as well. Thanks for the comments!

Update test to check f2.llvm.0 as well.

LGTM with comment suggestion below

lib/Analysis/ModuleSummaryAnalysis.cpp
Line 294 (on Diff #138102): s/calles/callees/, and add something like "to reflect the references from the metadata, and to enable importing and subsequent indirect call promotion and inlining".

This revision is now accepted and ready to land. Mar 12 2018, 4:12 PM

Closed by commit rL327358: [ThinLTO] Add funtions in callees metadata to CallGraphEdges (authored by twoh). Mar 12 2018, 9:29 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                      Size
llvm/trunk/lib/Analysis/ModuleSummaryAnalysis.cpp         12 lines
llvm/trunk/test/ThinLTO/X86/Inputs/callees-metadata.ll    34 lines
llvm/trunk/test/ThinLTO/X86/callees-metadata.ll           22 lines

Diff 138126

llvm/trunk/lib/Analysis/ModuleSummaryAnalysis.cpp

     for (const Instruction &I : BB) {
       ...
       } else {
         // Skip inline assembly calls.
         if (CI && CI->isInlineAsm())
           continue;
         // Skip direct calls.
         if (!CalledValue || isa<Constant>(CalledValue))
           continue;
 
+        // Check if the instruction has a callees metadata. If so, add callees
+        // to CallGraphEdges to reflect the references from the metadata, and
+        // to enable importing for subsequent indirect call promotion and
+        // inlining.
+        if (auto *MD = I.getMetadata(LLVMContext::MD_callees)) {
+          for (auto &Op : MD->operands()) {
+            Function *Callee = mdconst::extract_or_null<Function>(Op);
+            if (Callee)
+              CallGraphEdges[Index.getOrInsertValueInfo(Callee)];
+          }
+        }
+
         uint32_t NumVals, NumCandidates;
         uint64_t TotalCount;
         auto CandidateProfileData =
             ICallAnalysis.getPromotionCandidatesForInstruction(
                 &I, NumVals, TotalCount, NumCandidates);
         for (auto &Candidate : CandidateProfileData)
           CallGraphEdges[Index.getOrInsertValueInfo(Candidate.Value)]
               .updateHotness(getHotness(Candidate.Count, PSI));

llvm/trunk/test/ThinLTO/X86/Inputs/callees-metadata.ll

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@fptr = internal unnamed_addr global i32 (i32)* @f2, align 8

define dso_local i32 @foo(i32 %x) local_unnamed_addr {
entry:
  %0 = load i32 (i32)*, i32 (i32)** @fptr, align 8
  %call = tail call i32 %0(i32 %x), !callees !0
  ret i32 %call
}

define internal i32 @f2(i32 %x) {
entry:
  %tobool = icmp eq i32 %x, 0
  br i1 %tobool, label %if.end, label %if.then

if.then: ; preds = %entry
  store i32 (i32)* @f1, i32 (i32)** @fptr, align 8
  %sub.i = add nsw i32 %x, -1
  br label %if.end

if.end: ; preds = %entry, %if.then
  %phi.call = phi i32 [ %sub.i, %if.then ], [ -1, %entry ]
  ret i32 %phi.call
}

define internal i32 @f1(i32 %x) {
entry:
  %sub = add nsw i32 %x, -1
  ret i32 %sub
}

!0 = !{i32 (i32)* @f1, i32 (i32)* @f2}

llvm/trunk/test/ThinLTO/X86/callees-metadata.ll

; Do setup work: generate bitcode and combined index
; RUN: opt -module-summary %s -o %t1.bc
; RUN: opt -module-summary %p/Inputs/callees-metadata.ll -o %t2.bc

; RUN: llvm-lto2 run %t1.bc %t2.bc -o %t.o -save-temps \
; RUN: -r=%t1.bc,bar,plx \
; RUN: -r=%t1.bc,foo,l \
; RUN: -r=%t2.bc,foo,pl
; RUN: llvm-dis %t.o.1.3.import.bc -o - | FileCheck %s
; CHECK: define {{.*}} i32 @f1.llvm.0
; CHECK: define {{.*}} i32 @f2.llvm.0

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define dso_local i32 @bar(i32 %x) {
entry:
  %call = call i32 @foo(i32 %x)
  ret i32 %call
}

declare dso_local i32 @foo(i32)