This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] add constArgumentsBitmask to caller summary
AbandonedPublic

Authored by Prazek on Aug 25 2016, 11:10 AM.

Details

Summary

In order to import functions that exceeds size threshold, additional
information is needed. By knowing which arguments are constant on
the caller site, we can build some heuristics to give some additional
bonus to calls with multiple constant arguments (assuming that the
called functions will get simplified enough to be worth inlining).

Diff Detail

Event Timeline

Prazek updated this revision to Diff 69271.Aug 25 2016, 11:10 AM
Prazek retitled this revision from to [ThinLTO] add constArgumentsBitmask to caller summary.
Prazek updated this object.
Prazek added reviewers: tejohnson, eraman, mehdi_amini.
Prazek added a subscriber: llvm-commits.
Prazek added inline comments.Aug 25 2016, 11:21 AM
include/llvm/IR/ModuleSummaryIndex.h
246

I wasn't sure about the size here, maybe it is even enough to have bitmask here of 8 or less bits.
I assume that this struct will be used to add some extra information that is usefull

Prazek updated this revision to Diff 69272.Aug 25 2016, 11:26 AM

removed empty line

mehdi_amini edited edge metadata.Aug 25 2016, 11:58 AM

Why a bit mask and not an integer with the number of arguments that are constant?
Do you plan to benefit from knowing which argument is constant?

lib/Bitcode/Reader/BitcodeReader.cpp
6369

You need to handle existing bitcode.

if (Record.size() >= ...)

Why a bit mask and not an integer with the number of arguments that are constant?
Do you plan to benefit from knowing which argument is constant?

Yes, in order to make better heuristic, like the one that Easwaran come up with

  • have some bitmask to say if the argument is compered inside function body as heuristic to determine if branch depends on it.

I will need the the information about the arguments.

eraman edited edge metadata.Aug 25 2016, 4:55 PM

Why a bit mask and not an integer with the number of arguments that are constant?
Do you plan to benefit from knowing which argument is constant?

Yes, in order to make better heuristic, like the one that Easwaran come up with

  • have some bitmask to say if the argument is compered inside function body as heuristic to determine if branch depends on it.

I will need the the information about the arguments.

Why a bit mask and not an integer with the number of arguments that are constant?
Do you plan to benefit from knowing which argument is constant?

Yes, in order to make better heuristic, like the one that Easwaran come up with

  • have some bitmask to say if the argument is compered inside function body as heuristic to determine if branch depends on it.

I will need the the information about the arguments.

More context on this (which was an offline conversation I had with Piotr). One major use of knowing the constant parameters in inline cost analysis is to determine if any branch condition becomes constant and not count the body of blocks that becomes unreachable. One thing we could do is to build a map from conditions based on parameters to number of instructions that become unreachable if the predicate is true. For example,

define i32 @foo(i32 %a) {
entry:

%cmp = icmp sgt i32 %a, 10
br i1 %cmp, label %if.then, label %if.end

if.then: ; preds = %entry
// Lots of instructions here.
if.end: ; preds = %if.then, %entry

ret i32 %a

}

we could analyze this and map the (a > 10) condition to the potential reduction in instructions. If we were to build something like this, it will be useful to know the constant values of parameters. This patch only keeps track of whether a parameter receives a constant value in some callsite, so it is less information than what would be needed, but better than knowing how many parameters are constants in some context.

tejohnson edited edge metadata.Aug 26 2016, 8:47 AM

It seems like in order to make the bitmask useful (over a simple count of the number of constant args), we would at a minimum need to have info on the function summary as well (e.g. a bitmask of which formal parameters yield constant folding opportunities because they are compared against constants). And as Easwaran noted, in order to make good conclusions we may need to know what the constant values are (both passed at the callsite, and which are compared against in the callee function summary).

I think until there are some uses of this new info in the function importer and an exploration of the resulting gains (e.g. we start importing and therefore inlining additional profitable functions, or are able to use this to reduce the amount of importing without losing performance) we won't know which summary representation is useful. I'm a little concerned about bloating the summary index with an extra vbr per call edge without any data or specific use cases. I suggest adding the importer heuristic and collecting some data before pushing the index change upstream.

I have some additional comments below, but you probably don't want to address these until the above points are resolved.

include/llvm/Bitcode/LLVMBitCodes.h
197

Naming convention for fields uses "_" not lower camel case.

include/llvm/IR/ModuleSummaryIndex.h
244

Since we are accumulating the info for all callsites, there isn't a reason to keep this in a separate structure from CalleeInfo. I.e. the profile counts are accumulated in CalleeInfo, even though that is essentially also callsite info. If we decide to keep a separate edge per callsite eventually, the CalleeInfo struct will simply change to a CallsiteInfo struct and we would remove CallsiteCount. So better to have this info in CalleeInfo directly.

lib/Analysis/ModuleSummaryAnalysis.cpp
125

It isn't clear why it would result in nonsense data - we are setting this based on arguments passed to the indirect call, so if argument N is constant, it presumably should be marked constant for each profiled target.

If there is a reason to not do it at this time, then IndirectCallEdges should go back to its previous type (no std::pair).

lib/Bitcode/Reader/BitcodeReader.cpp
6276

Similar to Mehdi's comment below, this part needs to handle existing bitcode.

lib/Bitcode/Writer/BitcodeWriter.cpp
3359

Descriptions here and below record need to be updated.

3444

Descriptions here and below record need updating.

3501

Remove newline change.

Prazek added inline comments.Aug 26 2016, 11:34 AM
include/llvm/IR/ModuleSummaryIndex.h
244

oh I see, it doesn't matter what caller calls with constants as long as the info is accumulated only in one module.

tejohnson added inline comments.Aug 26 2016, 11:41 AM
include/llvm/IR/ModuleSummaryIndex.h
244

Sorry I don't understand your comment? Each caller function summary contains a single edge to each unique callee, and the info is accumulated for all calls from that function to a given callee function. I don't understand the "as long as the info is accumulated only in one module"?

Prazek added inline comments.Aug 26 2016, 1:53 PM
include/llvm/IR/ModuleSummaryIndex.h
244

So my I firstly thought that I should make a separate Callsite info because this info would be different depending on the caller in one module, so I would probably want to
keep it somewhere else than the callee info. But the Callee info is combined only in one module, so it really doesn't matter if I will decide to import one function only based on the one caller or on combined all callers.

tejohnson added inline comments.Aug 26 2016, 2:01 PM
include/llvm/IR/ModuleSummaryIndex.h
244

Well the CalleeInfo is only combined within each function within each module. If there are calls from multiple functions within a module, it will not be combined across the different callers.

Or are you talking about functions with weak linkage where there can be a multiple callers with the same name in different modules? In that case, right, we do not combine across modules.

Prazek added inline comments.Aug 26 2016, 4:40 PM
include/llvm/IR/ModuleSummaryIndex.h
244

Ok nevermind. I though that in computeFunctionSummary it is iterating on every function, which would combine the data of all calls together.

Prazek abandoned this revision.Oct 8 2016, 9:18 AM

After testing it on SPEC 2k6, there was no improvement with this. Dropping