An abstract call site is a wrapper that allows to treat direct, indirect, and callback calls the same. If an abstract call site represents a direct or indirect call site it behaves like a stripped down version of a normal call site object. The abstract call site can also represent a callback call, thus the fact that the initially called function (=broker) may invoke a third one (=callback callee). In this case, the abstract call side hides the middle man, hence the broker function. The result is a representation of the callback call, inside the broker, but in the context of the original instruction that invoked the broker. Again, there are up to three functions involved when we talk about callback call sites. The caller (1), which invokes the broker function. The broker function (2), that may or may not invoke the callback callee. And finally the callback callee (3), which is the target of the callback call. The abstract call site will handle the mapping from parameters to arguments depending on the semantic of the broker function. However, it is important to note that the mapping is often partial. Thus, some arguments of the call/invoke instruction are mapped to parameters of the callee while others are not. At the same time, arguments of the callback callee might be unknown, thus null if queried. This patch introduces also !callback metadata to describe how a callback maps from parameters to arguments. This metadata is directly created by clang for known broker functions, or later provided through source code attributes by the user. For motivation and additional information please see the corresponding talk (slides/video) https://llvm.org/devmtg/2018-10/talk-abstracts.html#talk20 as well as the LCPC paper http://compilers.cs.uni-saarland.de/people/doerfert/par_opt_lcpc18.pdf
Details
Diff Detail
- Repository
- rL LLVM
- Build Status
Buildable 26000 Build 25999: arc lint + arc unit
Event Timeline
I think using this in IPSCCP would be a great showcase too, given that it is more powerful (and enabled in the default pipeline). lib/Transforms/Scalar/SCCP.cpp contains the solver that is used for both per function SCCP and IPSCCP.
Constant propagation is an obvious use case for this. What other transformations do you envision using this?
With this commit two known broker functions that invoke a callback are
identified by their name. While this is not the perfect way to handle
known broker functions, it should suffice for now.
I do not have any other concrete suggestions right now, but tying code in lib/IR to string names of library calls does not seem ideal. I guess one alternative would be to have some sort of !broker attribute/metadata and then either have the frontend generate it and/or a pass to infer it. Although that might be over the top.
lib/IR/AbstractCallSite.cpp | ||
---|---|---|
81 | Could this be an enum class? I think you could drop the default case in the switch later on, given you handle all cases. The compiler will warn/error on not handled cases. | |
96 | Is there a reason this cannot be part of the switch below? |
Thanks for the comments!
SCCP is certainly on my list, but one of the IPOs I haven't looked into yet.
Constant propagation is an obvious use case for this. What other transformations do you envision using this?
I have however a version of argument promotion (which turns out to be really important wrt parallel codes).
I also have extended the attribute deduction pass in various ways with an older version of the abstract/transitive/callback call sites, see here and here.
Initial results from the prototype implementations can be found in the talk and the paper (see commit message). Basically, they are as good as the dedicated transformations we implemented and evaluated here.
With this commit two known broker functions that invoke a callback are
identified by their name. While this is not the perfect way to handle
known broker functions, it should suffice for now.I do not have any other concrete suggestions right now, but tying code in lib/IR to string names of library calls does not seem ideal. I guess one alternative would be to have some sort of !broker attribute/metadata and then either have the frontend generate it and/or a pass to infer it. Although that might be over the top.
I don't think that is over the top. I have such metadata already. While I annotate it automatically based on the name (in the middle end for now), I want the front end to do it. We have some name based resolution already and we could hide it behind a flag (the current patch is not executed by default!). I also want to support user-provided callbacks either through annotations or identified by an analysis.
Overall, what do you think about think about this?
I have a question in the context of optimising sequential programs -- I'm interested in optimising across a trampoline call, since I have a haskell-like lazy language, which lowers CPS into LLVM-IR by constructing a trampoline. Example IR here Some salient features are:
- All functions are musttail, and communicate data by custom "stacks" which are allocated on the heap.
- LLVM manages to remove some of the trampoline code, but it's still super messy (I can generate a clean version where the trampoline is visible)
Would the AbstractCallSite representation allow one to represent the control flow between:
trampoline -> fn1 -> trampoline -> fn2 -> ...
And could this infrastructure help implement a pass that would resolve these custom stack pushes and pops?
Thanks, and sorry for the somewhat unstructured comment. I'm asking this question as a Haskeller interested in representing laziness within LLVM.
Sure, especially if the front-end provides the information, e.g., in form of metadata. I already use these abstract call sites to bridge more than one
function. Take pthreads for example. Since the interface is fixed (a single void * is passed), we cannot simply run argument promotion to expose the
actual passed values directly. Instead, I have a pass that introduces even more functions which are then bridged through an abstract call site. Thus,
we start with
bar(void *S) { // body } foo() { pthread_create(bar, S); }
and transform it to
__entry_bar(void (void*) __unused__, void *S) { pthread_create(bar, S); } bar(void *S) { __body_bar(S); } __body_bar(void *S) { // body } foo() { __entry_bar(__body_bar, S); }
Afterwards, the abstract call is used to model __entry_bar -> __body_bar inside of foo, with the pass through argument S. Note that the code is actually valid LLVM-IR and that we can now use argument promotion to
expand the struct and make the parameter/argument connection explicit for other passes, e.g., constant propagation.
I hope this makes sense and helps you a bit.
And could this infrastructure help implement a pass that would resolve these custom stack pushes and pops?
That is a good question which I cannot answer for sure right now. I guess it could be useful, if the front-end provides enough information.
Thanks for the answer! That clears up a lot.
So, I believe the main difference between the pthread example and the one that I have is that of mutual recursion: I will often have mutual recursion between the trampoline and the f_i, which usually complicates inter-procedural analysis (as far as I understand?)
An extreme example of such messy code could be something like this:
// computes if 9 is odd by calling two mutually recursive functions is_odd and is_even, // which in turn recurse through the trampoline entrypoint: push_int(9); push_call(IS_ODD); trampoline(); // is_odd :: Int -> Int is_odd: i = pop_int(); if (i == 0) { push_int(0); trampoline(); } if (i == 1) { push_int(1); trampoline(); } push_int(i - 1); push_call(IS_EVEN_LABEL); trampoline(); // is_even Int -> Int is_even: i = pop_int(); if (i == 0) { push_int(1); trampoline(); } if (i == 1) { push_int(0); trampoline(); } push_int(i - 1); push_call(IS_ODD_LABEL); trampoline(); trampoline: tag = pop_call(); switch (tag) { case IS_EVEN_LABEL: is_even(); case IS_ODD_LABEL: is_odd(); case ENTRYPOINT: entrypoint(); }
For you example, the front-end should emit something along the lines of the following for each push_call(...); trampoline(); sequence:
void abstract_trampoline(token, continuation) { push_call(token); trampoline(); } // push_call(IS_ODD); // trampoline(); // is replaced by: abstract_trampoline(IS_ODD, is_odd);
You can then specify that abstract_trampoline has callback behavior and that the second argument is the callee of the callback.
Afterwards, all IPOs enabled to handle abstract call sites will work and the best thing is, you code is actually valid all the time.
When we do this, and the other expansion I described for pthreads_create, we want a late inliner pass to remove these function but as there is
nothing to clean up we should not have much trouble doing that.
FWIW TransitiveCallSite seems like a slightly better name to me than AbstractCallSite, as the CallSite class is already an abstraction over different kinds of calls. As for the location of the class, if it is tied to an analysis that gathers the transitive call sites, having it in lib/Analysis/ might make sense initially as well.
As for testing/benchmarking this, is there an easy way to test this/gather statistics on real benchmarks? (e.g. via the test-suite)
With this commit two known broker functions that invoke a callback are
identified by their name. While this is not the perfect way to handle
known broker functions, it should suffice for now.I do not have any other concrete suggestions right now, but tying code in lib/IR to string names of library calls does not seem ideal. I guess one alternative would be to have some sort of !broker attribute/metadata and then either have the frontend generate it and/or a pass to infer it. Although that might be over the top.
I don't think that is over the top. I have such metadata already. While I annotate it automatically based on the name (in the middle end for now), I want the front end to do it. We have some name based resolution already and we could hide it behind a flag (the current patch is not executed by default!). I also want to support user-provided callbacks either through annotations or identified by an analysis.
Overall, what do you think about think about this?
I like the overall approach and think anything that extends the scope of IPOs is great, as it increases the incentive to work on them.
There are a few things to consider but at the end I will not fight anybody for a certain name or source location. Things that led me to
do this design included:
- Abstract call sites, as implemented here, are not *only* transitive call sites but can be transitive or direct call sites.
- The call site abstraction is going away soon [0]. Afterwards, the only abstraction will be this one on top of the new CallBase class.
- The analysis to identify callback functions doesn't exist yet. When it does, I would probably only annotate the IR and use these annotations in the constructor of AbstractCallSite. This way we can easily pass uses to the constructor and do not need to a dedicated pass that provides both direct and transitive calls (separately or unified through the AbstractCallSite abstraction).
As for testing/benchmarking this, is there an easy way to test this/gather statistics on real benchmarks? (e.g. via the test-suite)
Yes and no. The test suite doesn't contain/exhibit OpenMP/Pthread parallelism (yet). Thus, we cannot create abstract call sites that way.
If we would add the names of sequential callback functions we could test those. We are also looking into options to put OpenMP into the
test suite. Finally, I did test/benchmark a very similar implementation (less clean) on various OpenMP programs for the two papers and the
LLVM-Dev Talk [1]
Overall, what do you think about think about this?
I like the overall approach and think anything that extends the scope of IPOs is great, as it increases the incentive to work on them.
Good to hear. I totally agree and I'm currently working on a proper AttributeDeduction (see [2]) for which I hope to push the first patch up for review soon.
[0] RFC: Removing TerminatorInst, simplifying calls
[1] LLVM-Dev Talk 2018: Slides Video
[2] RFC: "Properly" Derive Function/Argument/Parameter Attributes
Right that makes sense, thanks! I do not have any strong opinions about name or place either. It would be great to get a few more people's thoughts on this.
As for testing/benchmarking this, is there an easy way to test this/gather statistics on real benchmarks? (e.g. via the test-suite)
Yes and no. The test suite doesn't contain/exhibit OpenMP/Pthread parallelism (yet). Thus, we cannot create abstract call sites that way.
If we would add the names of sequential callback functions we could test those. We are also looking into options to put OpenMP into the
test suite. Finally, I did test/benchmark a very similar implementation (less clean) on various OpenMP programs for the two papers and the
LLVM-Dev Talk [1]
Sounds good.
include/llvm/IR/CallSite.h | ||
---|---|---|
694 | side -> site | |
696 | I think it's clearer if you say: instruction -> call to the broker | |
700 | will invoke the callee zero or more times (to cover the fact that it might also be invoked more than once) | |
760 | FWIW, I like 'Callback' because it's pretty self-documenting (most everyone knows what a callback is). Given your terminology above, it might also make sense to use BrokeredCall. Thus, if we need a backup name, I suggest that one. | |
lib/IR/AbstractCallSite.cpp | ||
83 | What about pthread_once? As a higher-level comment, I'd rather not have a list here. It would be better to take this from metadata on the call and have Clang add the metadata for functions that it recognizes (and for the OpenMP runtime calls that it inserts). |
Partial review, been on my queue too long, here's at least a few comments to help drive this forward.
docs/LangRef.rst | ||
---|---|---|
5072 | A common theme running through this review is the use of the terms "callback" and "broker". In the documentation, you need to find a convenient place to describe these two, given an example, and otherwise explain the concepts to the reader. I struggled at first with even your patch description in being not quite sure what you meant by each. You have some of this inlined into the semantic description, but I think you'd do well to introduce the concepts separately from the semantic description. I don't see any reason to exclude metadata on call sites. p.s. I think we'll likely end up changing these, but I'll treat the names as placeholders for the moment. | |
5112 | One major piece missing from this is a discussion of what side effects (if any) the broker function is allowed to have. Also, why the zero or more times thing? Zero seems to complicate a lot of the reasoning. | |
5113 | We have the existing concepts of patchpoint, and statepoint which are both "brokers" in this terminology. Would be good to think about how to represent that. I also see parallels to bundle operads on calls. Maybe another approach to consider would be to represent the broker as a special bundle type on the call site to the callback? (Not sure this works, but talking about why not might be interesting.) | |
include/llvm/IR/CallSite.h | ||
843 | dyn_cast_or_null |
A high level strategy comment. I think you're trying to go a bit too far in one patch. I'd advise first writing a patch which does nothing but implement the IPConstProp adjustments "the hard way" (i.e. without the AbstractCallSite) and focus on defining the metadata with the semantics you once. To really justify the complexity of the abstraction, you need more than one user. You could add another user to this patch, but doing so will just slow down the review. Separating this into a series of patches (IPConstProp w/metadata def, second user, then introduce abstraction) will make it much easier to see the necessity and make each piece more easily reviewable.
include/llvm/IR/CallSite.h | ||
---|---|---|
843 | p.s. Ignore this suggestion. It's buggy due to the need for stripPointerCasts | |
lib/Transforms/IPO/IPConstantPropagation.cpp | ||
73 | While correct, this would be a lot easier to read if you used F.arg_size() as the end condition. | |
79 | Ok, I'm not following a key detail here. If we're processing a call to the broker function with a 5 element argument list (let's say), why is it correct to adjust the operand numbers using the metadata provided offset? I could see how that would be correct when visiting users of the *callback* function, but don't we end up invoking this function for the *broker* function as well? (This relates to the isCallee check removal.) | |
83 | isThreadDependent on a constant is a new one to me. Given I see a total of three uses in all of LLVM, I suspect this hasn't been fully plumbed through. | |
100 | You have some assorted white space and NFC indexing changes here. Would you mind separating those and landing them so they don't clutter the review? |
A high level strategy comment. I think you're trying to go a bit too far in one patch. I'd advise first writing a patch which does nothing but implement the IPConstProp adjustments "the hard way" (i.e. without the AbstractCallSite) and focus on defining the metadata with the semantics you once. To really justify the complexity of the abstraction, you need more than one user. You could add another user to this patch, but doing so will just slow down the review. Separating this into a series of patches (IPConstProp w/metadata def, second user, then introduce abstraction) will make it much easier to see the necessity and make each piece more easily reviewable.
Splitting the patch and providing multiple use cases does not necessarily require rewriting the IPConstProp without the abstraction. This can be split into multiple (dependent) patches and other dependent patches can also be posted showing other use cases.
I can certainly split this into more patches if that is preferred. I'm unsure though how to test the metadata and abstract call sites without a user.
docs/LangRef.rst | ||
---|---|---|
5072 |
I thought I did that in the commit message and in the documentation/examples below.
I'm unsure about this but if people think we need an explicit documentation point on these I can do it. (So far,) They only exist in the context of callbacks, so I described them in that context only.
Me neither. Though, for now, it seems sufficient to put them on the brokers and, as I mention below again, I only see upsides in doing so. Especially we only require a single argument encoding. Once we have use cases that require different argument encodings, or we prefer call sites for some other reason, we can simply lift this restriction. | |
5112 |
I don't think we want to connect general broker function side effects to callbacks in any way. The broker function side effects shall be arbitrary, if not otherwise specified by attributes. HOWEVER, the "passthrough" callback arguments are not to be inspected or modified, as stated in the documentation. This allows us to treat the callback as a "regular call" for the purpose of information transfer. Though, we never lose the actual broker function call and therefore will always be required to consider its side effects. I can add a phrase stating this more clearly.
It is required for the OpenMP use case and I'm unsure why it complicates much. We do not really optimize based on what was executed "for sure", and even if we did, abstract call sites handling would need to be added explicitly. | |
5113 |
I agree, but I'm not too familiar with them right now so I have to come back to you on this point.
The argument encoding has to be somewhere and I'm not sure why it shouldn't be at the broker functions. If we have it on the call sites exclusively, we end up duplicating it a lot and I'd like to hear why that might be worth it. | |
include/llvm/IR/CallSite.h | ||
843 | For the callbacks I want to optimize, that stripPointerCast is unfortunately necessary. | |
lib/Transforms/IPO/IPConstantPropagation.cpp | ||
73 | I did not "write" this code, just kept it. I can use F.arg_size() if ppl prefer that. | |
79 | Lets start here:
If an abstract call site ACS for a use U is built, thus it evaluates to a boolean true (line 67), we know that:
We only adjust argument indices, at least we only should, if the abstract call site represents a callback call. This is the case only if the use that created it is not the CallInst::isCallee() use. So when "the broker function use" of the CallInst is processed, the ACS is not a callback call site and AbstractCallSite::getCallArgOperand(int) is the identify function. Does that make sense/help? | |
83 | I was reminded in a review that I might need it. And I think I do. It should only return true for globals that are thread local and these we should not propagate through "arbitrary callback call sites". | |
100 | I can get rid of the new line and the single space difference, both not intentional but probably caused by some subconscious code cleanup. There are no NFC indexing changes, I had to change the way we iterate to allow for the remapping of indices, that part has to stay. |
I can certainly split this into more patches if that is preferred. I'm unsure though how to test the metadata and abstract call sites without a user.
While not strictly necessary in my book (given that the patch would be committed with its first user which would have tests), unit tests are probably a good idea (i.e., put something in the unittests/IR/ directory).
docs/LangRef.rst | ||
---|---|---|
5072 | Johannes and I talked offline. Here's the suggested redrafting I promised: `callback` metadata may be attached to a function declaration, or definition. For ease of exposition, we'll refer to the function annotated w/metadata as a broker. The metadata describes how the arguments of a call to the broker function are in turn passed to the specified callback function when invoked by. the broker. The only semantic restriction on the broker function itself is that it is assumed not to inspect or modify arguments passed to the callback function. (later in detailed semantics) The broker is not required to actually invoke the callback function dynamically. However, the assumptions about not inspecting or modifying arguments that would be passed to the specified callback function still hold, even if the callback function is not dynamically invoked. The broker is allowed to invoke the callback function more than once per invocation of the broker. The broker is also allowed to invoke (directly or indirectly) the function passed as a callback through another use. However, the pass through argument semantics are assumed to apply to all invocations of the callback function by the broker. (Mostly because we have no way to distinguish said calls after potential inter-procedural constant folding of the function pointer itself.) (end proposed drafting) General comment: I'd pull the definition of the metadata encoding out before introducing the example. |
LGTM. I'm approving this because I think it's reached the point of being a very meaningful step in the right direction, but there are a couple of required follow ups which need to happen in following patches. Specifically:
- Add Verifier support. This should subsume most of the asserts you have on interpreting the metadata.
- Add tests covering both basic syntax parsing, and merge semantics.
A common theme running through this review is the use of the terms "callback" and "broker". In the documentation, you need to find a convenient place to describe these two, given an example, and otherwise explain the concepts to the reader. I struggled at first with even your patch description in being not quite sure what you meant by each. You have some of this inlined into the semantic description, but I think you'd do well to introduce the concepts separately from the semantic description.
I don't see any reason to exclude metadata on call sites.
p.s. I think we'll likely end up changing these, but I'll treat the names as placeholders for the moment.