Page MenuHomePhabricator

[Attributor] Deduce attributes for non-exact functions
Needs ReviewPublic

Authored by jdoerfert on Jun 13 2019, 7:04 PM.

Details

Summary

In the LLVM test suite and SPEC2006, ~67% of all function declarations
are non-exact, thus they can be replaced at link-time. While we cannot
generally use them to perform inter-procedural (IP) reasoning, it is
allowed to "internalize" them first and derive IP information
afterwards. In fact, we can use the information we derived assuming they
have exact definitions to reason if internalization might be beneficial.

This patch allows the Attributor to employ shallow wrappers, the
cheapest internalization method I could think of. A shallow wrapper is a
function with the same type (and attributes) as the original one. Inside
the wrapper there is only call to the original one and a return of the
result. The scheme is shown below:

Assuming the declaration of looks like:

rty F(aty0 arg0, ..., atyN argN);

The wrapper will then look as follows:

rty wrapper(aty0 arg0, ..., atyN argN) {
  return F(arg0, ..., argN);
}

Once the wrapper was created we can change the linkage type of the
original function to internal which allows us to use it for recursive
reasoning. We can also manifest the results, e.g., annotate the
arguments with attributes.

Shallow wrappers are cheap because the new internal function has a
single call site. This likely means we inline them later which results
in "similar" code as we started out with except that passes in-between
the Attributor and the inliner can use the annotated information.

A separate patch will introduce deep wrappers that replace the known
uses of the function with the internalized version. Afterwards, we need
to work on a cost heuristic. I will also see if we can disable inlining
for "wrapper-like" functions and post my findings on the list.

Event Timeline

jdoerfert created this revision.Jun 13 2019, 7:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 13 2019, 7:04 PM

We can, however, use information from non-exact definitions to improve
themselves as they are either replaced as a whole or not at all.

I didn't understand the motivation of this patch by reading the description, I understood it with this comment. In particular I had the impression that you were using the attributes of one internalized function to optimize another internalized function.


I'm curious what function-local optimizations we're missing? If I understand the design, you can deduce that a function is 'readnone', but no callers of the function can be optimized based on that deduction, since the 'readnone' body may be replaced. I'm mostly confident we don't use readnone to optimize the body of the function. Do you have an example of an attribute that we can deduce and then use to optimize the body of the function?


If it were my design de novo, I would prefer treating the function attributes as being "as authoritative as the linkage", which pushes the problem of deciding whether F->doesNotReturn() is sufficient to the caller, it needs to check the linkage (or isExactDefinition) before asking that question. Unfortunately that would churn the API as we would need to update attribute consumers to use a convenience wrapper API when they're examining a callee instead of their function parent, but I expect pushing that decision into the pass might be necessary to get corner cases optimal and correct in passes that look at multiple functions. That seems to me like it would be a lot of work, what do you think?

Even if the wrapper function approach works, I don't think the representation is right. There aren't two functions, which is why you'll need to teach the inliner, etc., that these aren't truly functions. Without adding the concept to the LangRef, you've got two separate passes in LLVM that coordinate optimizations with each other though implementation details.

I'm not sure the wrapper can be used on functions that aren't marked unnamed_addr or you prove the address isn't taken? You update all users, which means that someone who takes the address of the function in this module might get a different address than someone who takes the address in a different module since they're two different internal functions now? Those pointers need to compare equal if you pass the function pointer around.

llvm/lib/Transforms/IPO/Attributor.cpp
1187

FYI, I don't think we have a reserved prefix for llvm-produced functions, but you can make it anonymous with F.setName("");. Since the function has internal linkage, it doesn't need a name to link with other modules.

jdoerfert marked 2 inline comments as done.Jun 13 2019, 10:48 PM

Thanks for the quick feedback!

We can, however, use information from non-exact definitions to improve
themselves as they are either replaced as a whole or not at all.

I didn't understand the motivation of this patch by reading the description, I understood it with this comment. In particular I had the impression that you were using the attributes of one internalized function to optimize another internalized function.

With this patch there is no "inter-internalized-function communication". I have another one to do that. I think I have to reword the whole commit message to make it clear.

With these shallow wrappers you can only use the function itself when it comes to inter-procedural (IP) reasoning. That is why I need to check WrappedFunctions in mayDependOnNonExactDefinition.
You cannot do cross function reasoning because we actually do call the shallow wrapper which is a non-exact decleration. However, you can do recursive reasoning and manifest attributes on the function, parameters and return value of the internalized version.

I'm curious what function-local optimizations we're missing? If I understand the design, you can deduce that a function is 'readnone', but no callers of the function can be optimized based on that deduction, since the 'readnone' body may be replaced. I'm mostly confident we don't use readnone to optimize the body of the function. Do you have an example of an attribute that we can deduce and then use to optimize the body of the function?

The shallow wrappers are not the solution I want to have. Real internalization is what we actually want (later patch).
Shallow wrappers are however not useless. We could, for example, derive nofree and nosync on them which together allow to transform dereferenceable arguments into dereferenceable_globally (D61652) arguments. Invariant loads of the latter can be hoisted out of loops. Direct recursion can be optimized this way as well.

I did run a test on the LLVM-TS and SPEC2006 with "returned" and "nocapture" attribute enabled in the Attributor.
The changes in statistics > 1% are listed here: https://gist.github.com/jdoerfert/aa94861fd59d1564436d2ab490164d19
There was <5 benchmarks that were impacted wrt. compile and runtime. I don't have the numbers right now but I will show more elaborate results before we turn anything on. With deep wrappers 7 or so benchmarks got actually faster, 3-30% if I remember correctly.

If it were my design de novo, I would prefer treating the function attributes as being "as authoritative as the linkage", which pushes the problem of deciding whether F->doesNotReturn() is sufficient to the caller, it needs to check the linkage (or isExactDefinition) before asking that question. Unfortunately that would churn the API as we would need to update attribute consumers to use a convenience wrapper API when they're examining a callee instead of their function parent, but I expect pushing that decision into the pass might be necessary to get corner cases optimal and correct in passes that look at multiple functions. That seems to me like it would be a lot of work, what do you think?

I dislike this solution mostly because it makes user information less valuable (assuming I understand you correctly). If the user gives us information, __restrict__ -> noalias, int & -> dereferenceable(sizeof(int)), etc. we should be able to use that to the fullest extend regardless of the linkage.

Even if the wrapper function approach works, I don't think the representation is right. There aren't two functions, which is why you'll need to teach the inliner, etc., that these aren't truly functions. Without adding the concept to the LangRef, you've got two separate passes in LLVM that coordinate optimizations with each other though implementation details.

Even without teaching the inliner anything, deep wrappers did show a performance benefit (see above). I actually want to teach the inliner something, but I was hoping it is something general, namely:
If a function only contains a call and a return and the call can be implemented with a single jmp instruction (the arguments are passed in order,...), do not inline the call. This seems to me like a heuristic that could even save compile time while not sacrificing performance. I hope it would be applicable to user code as well. Even if not, I actually have another patch to use a similar wrapper concept in conjuction with the __attribute__((callback(...))). The use case there is to allow us to unpack struct arguments with more than 3 members without ever actually increasing/changing the arguments of a function. That is useful to communicate information about the struct members to the callee (and potentially back).

I'm not sure the wrapper can be used on functions that aren't marked unnamed_addr or you prove the address isn't taken? You update all users, which means that someone who takes the address of the function in this module might get a different address than someone who takes the address in a different module since they're two different internal functions now? Those pointers need to compare equal if you pass the function pointer around.

That is a very good point which I haven't considered yet. I will check for unnamed_addr and, if not present, check that the address is not taken.

llvm/lib/Transforms/IPO/Attributor.cpp
1187

I'll do that. The Prefix was just to avoid clashes ("__" is often reserved).

1310

This lambda got somehow in this review, will be committed before.