In order to import functions that exceeds size threshold, additional
information is needed. By knowing which arguments are constant on
the caller site, we can build some heuristics to give some additional
bonus to calls with multiple constant arguments (assuming that the
called functions will get simplified enough to be worth inlining).
Details
- Reviewers
tejohnson eraman mehdi_amini
Diff Detail
Event Timeline
include/llvm/IR/ModuleSummaryIndex.h | ||
---|---|---|
246 | I wasn't sure about the size here, maybe it is even enough to have bitmask here of 8 or less bits. |
Why a bit mask and not an integer with the number of arguments that are constant?
Do you plan to benefit from knowing which argument is constant?
lib/Bitcode/Reader/BitcodeReader.cpp | ||
---|---|---|
6369 | You need to handle existing bitcode. if (Record.size() >= ...) |
Yes, in order to make better heuristic, like the one that Easwaran come up with
- have some bitmask to say if the argument is compered inside function body as heuristic to determine if branch depends on it.
I will need the the information about the arguments.
More context on this (which was an offline conversation I had with Piotr). One major use of knowing the constant parameters in inline cost analysis is to determine if any branch condition becomes constant and not count the body of blocks that becomes unreachable. One thing we could do is to build a map from conditions based on parameters to number of instructions that become unreachable if the predicate is true. For example,
define i32 @foo(i32 %a) {
entry:
%cmp = icmp sgt i32 %a, 10 br i1 %cmp, label %if.then, label %if.end
if.then: ; preds = %entry
// Lots of instructions here.
if.end: ; preds = %if.then, %entry
ret i32 %a
}
we could analyze this and map the (a > 10) condition to the potential reduction in instructions. If we were to build something like this, it will be useful to know the constant values of parameters. This patch only keeps track of whether a parameter receives a constant value in some callsite, so it is less information than what would be needed, but better than knowing how many parameters are constants in some context.
It seems like in order to make the bitmask useful (over a simple count of the number of constant args), we would at a minimum need to have info on the function summary as well (e.g. a bitmask of which formal parameters yield constant folding opportunities because they are compared against constants). And as Easwaran noted, in order to make good conclusions we may need to know what the constant values are (both passed at the callsite, and which are compared against in the callee function summary).
I think until there are some uses of this new info in the function importer and an exploration of the resulting gains (e.g. we start importing and therefore inlining additional profitable functions, or are able to use this to reduce the amount of importing without losing performance) we won't know which summary representation is useful. I'm a little concerned about bloating the summary index with an extra vbr per call edge without any data or specific use cases. I suggest adding the importer heuristic and collecting some data before pushing the index change upstream.
I have some additional comments below, but you probably don't want to address these until the above points are resolved.
include/llvm/Bitcode/LLVMBitCodes.h | ||
---|---|---|
197 | Naming convention for fields uses "_" not lower camel case. | |
include/llvm/IR/ModuleSummaryIndex.h | ||
244 | Since we are accumulating the info for all callsites, there isn't a reason to keep this in a separate structure from CalleeInfo. I.e. the profile counts are accumulated in CalleeInfo, even though that is essentially also callsite info. If we decide to keep a separate edge per callsite eventually, the CalleeInfo struct will simply change to a CallsiteInfo struct and we would remove CallsiteCount. So better to have this info in CalleeInfo directly. | |
lib/Analysis/ModuleSummaryAnalysis.cpp | ||
125 | It isn't clear why it would result in nonsense data - we are setting this based on arguments passed to the indirect call, so if argument N is constant, it presumably should be marked constant for each profiled target. If there is a reason to not do it at this time, then IndirectCallEdges should go back to its previous type (no std::pair). | |
lib/Bitcode/Reader/BitcodeReader.cpp | ||
6276 | Similar to Mehdi's comment below, this part needs to handle existing bitcode. | |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
3359 | Descriptions here and below record need to be updated. | |
3444 | Descriptions here and below record need updating. | |
3501 | Remove newline change. |
include/llvm/IR/ModuleSummaryIndex.h | ||
---|---|---|
244 | oh I see, it doesn't matter what caller calls with constants as long as the info is accumulated only in one module. |
include/llvm/IR/ModuleSummaryIndex.h | ||
---|---|---|
244 | Sorry I don't understand your comment? Each caller function summary contains a single edge to each unique callee, and the info is accumulated for all calls from that function to a given callee function. I don't understand the "as long as the info is accumulated only in one module"? |
include/llvm/IR/ModuleSummaryIndex.h | ||
---|---|---|
244 | So my I firstly thought that I should make a separate Callsite info because this info would be different depending on the caller in one module, so I would probably want to |
include/llvm/IR/ModuleSummaryIndex.h | ||
---|---|---|
244 | Well the CalleeInfo is only combined within each function within each module. If there are calls from multiple functions within a module, it will not be combined across the different callers. Or are you talking about functions with weak linkage where there can be a multiple callers with the same name in different modules? In that case, right, we do not combine across modules. |
include/llvm/IR/ModuleSummaryIndex.h | ||
---|---|---|
244 | Ok nevermind. I though that in computeFunctionSummary it is iterating on every function, which would combine the data of all calls together. |
Naming convention for fields uses "_" not lower camel case.