Page MenuHomePhabricator

[ThinLTO] Add funtions in callees metadata to CallGraphEdges
ClosedPublic

Authored by twoh on Mar 12 2018, 12:14 PM.

Details

Summary

If there's a callees metadata attached to the indirect call instruction, add CallGraphEdges to the callees mentioned in the metadata when computing FunctionSummary.

  • Why this is necessary:

Consider following code example:

(foo.c)
static int f1(int x) {...}
static int f2(int x);
static int (*fptr)(int) = f2;
static int f2(int x) { 
  if (x) fptr=f1; return f1(x);
}
int foo(int x) { 
  (*fptr)(x); // !callees metadata of !{i32 (i32)* @f1, i32 (i32)* @f2} would be attached to this call.
}

(bar.c)
int bar(int x) { 
  return foo(x); 
}

At LTO time when foo.o is imported into bar.o, function foo might be inlined into bar and PGO-guided indirect call promotion will run after that. If the profile data tells that the promotion of @f1 or @f2 is beneficial, the optimizer will check if the "promoted" @f1 or @f2 (such as @f1.llvm.0 or @f2.llvm.0) is available. Without this patch, importing !callees metadata would only add promoted declarations of @f1 and @f2 to the bar.o, but still the optimizer will assume that the function is available and perform the promotion. The result of that is link failure with undefined reference to @f1.llvm.0.

This patch fixes this problem by adding callees in the !callees metadata to CallGraphEdges so that their definition would be properly imported into.

One may ask that there already is a logic to add indirect call promotion targets to be added to CallGraphEdges. However, if profile data says "indirect call promotion is only beneficial under a certain inline context", the logic wouldn't work. In the code example above, if profile data is like

bar:1000000:100000
  1:100000
    1: foo:100000
        1: 100000 f1:100000

, Computing FunctionSummary for foo.o wouldn't add foo->f1 to CallGraphEdges. (Also, it is at least "possible" that one can provide profile data to only link step but not to compilation step).

Diff Detail

Repository
rL LLVM

Event Timeline

twoh created this revision.Mar 12 2018, 12:14 PM

Without this patch, importing !callees metadata would only add promoted declarations of @f1 and @f2 to the bar.o, but still the optimizer will assume that the function is available and perform the promotion.

I guess we end up with the declarations in the importing module (bar.o) because of the references from the callee metadata itself? Normally, we would only end up with the declarations if there was an exported reference, which would cause the exporting module (foo.o) to promote as well. So this is essentially an export of these symbols that foo.o doesn't know about. But in any case, this fix is good because we want these available for not only promotion, but inlining.

One other question below.

lib/Analysis/ModuleSummaryAnalysis.cpp
300 ↗(On Diff #138069)

Do we want to give the calls a hotness other than unknown (which is the default)?

twoh added a comment.Mar 12 2018, 3:01 PM

@tejohnson I think your right. What I meant was that when the metadata is imported to bar.o, it references f1 and f2 by their promoted names, which makes the declarations with the promoted names to be added. Did I get it right, or still miss something?

For your second question, I assumed that the logic following the patch (line 304-311) will update the hotness info if available.

@tejohnson I think your right. What I meant was that when the metadata is imported to bar.o, it references f1 and f2 by their promoted names, which makes the declarations with the promoted names to be added. Did I get it right, or still miss something?

Sort of - we import the references to f1 and f2 (refenced on the callee metadata for the imported function), and ThinLTO knows that any references we import that were previously local must be promoted. This usually works fine because when we export a function (foo in this case), we walk the edges and mark anything it references as exported, which means they will be promoted in the original module (foo.o here). The issue is that there were no reference or call edges created for the references in the callee metadata, so we didn't know those references were exported.

For your second question, I assumed that the logic following the patch (line 304-311) will update the hotness info if available.

No, that is specific to indirect call PGO (attached as value profile metadata), which we don't have in this case (although I suppose if PGO is used we could end up with a callees metadata and VP metadata??). The VP metadata lists the count for each profiled target. Here we don't have anything, so I'm not sure what to suggest. The two possibilities would be to divide the instructions scaled count (computed in preceding code for the direct call case), and divide it evenly between the callee targets (so each would get 50% of that in your example). The other possibility, that requires more changes, but is potentially more accurate, is to extend the callees metadata and put some probability data on that based on the hotness of the block that assigned the function pointer to that callee. That's probably overkill at this point, so perhaps either leave as Unknown, or the first strategy I suggested.

Ok to do later, but you should put a FIXME noting that we are not setting any hotness for now.

test/ThinLTO/X86/callees-metadata.ll
10 ↗(On Diff #138069)

Add comment that we are testing to make sure that callees metadata functions are imported.

Check for f2.llvm.0 also?

twoh added a comment.Mar 12 2018, 4:06 PM

@tejohnson Thanks for the clarification. Regarding hotness, I'm not sure if providing "some" hotness is better than leaving it as unknown if profile data is not provided (If profile data is given, as you said, VP metadata will be attached to the callsite). I'm afraid that synthesized hotness may confuse optimizers, but please let me know if you have different idea.

I'll update the test to check f2 as well. Thanks for the comments!

twoh updated this revision to Diff 138102.Mar 12 2018, 4:08 PM

Update test to check f2.llvm.0 as well.

tejohnson accepted this revision.Mar 12 2018, 4:12 PM

LGTM with comment suggestion below

lib/Analysis/ModuleSummaryAnalysis.cpp
294 ↗(On Diff #138102)

s/calles/callees/
add something like "to reflect the references from the metadata, and to enable importing and subsequent indirect call promotion and inlining".

This revision is now accepted and ready to land.Mar 12 2018, 4:12 PM
This revision was automatically updated to reflect the committed changes.