Before this patch, the compiler bumped the inline threshold
when the total size of the allocas passed as arguments to the
callee was below 256 bytes.
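As a rough illustration, the old heuristic is an all-or-nothing step function of the total alloca size. This is a minimal sketch, not the actual implementation. The function and constant names are hypothetical, and the bonus value is a placeholder. Only the 256-byte cutoff comes from the description above. Treating "no alloca arguments" as "no bump" is an assumption.

```cpp
#include <cassert>

// Hypothetical constants: the 256-byte cutoff comes from the description,
// the bonus value is a placeholder.
constexpr unsigned ArgAllocaCutoff = 256;
constexpr int ArgAllocaBonus = 4000;

// Old heuristic (sketch): the bump depends only on the total size of the
// allocas passed as arguments, not on whether SROA could remove them.
int oldThresholdBump(unsigned TotalArgAllocaSize) {
  // Assumption: no alloca arguments means no bump.
  if (TotalArgAllocaSize == 0)
    return 0;
  // Full bump below the cutoff, nothing otherwise.
  return TotalArgAllocaSize < ArgAllocaCutoff ? ArgAllocaBonus : 0;
}
```

A call passing 255 bytes of allocas gets the full bump, while one passing 256 bytes gets nothing, even if SROA could have removed most of them after inlining.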
This heuristic ignores that some of these allocas could have been removed
by SROA once the callee is inlined.
Ideally, this bonus would be added to the threshold once the
total size of the allocas that SROA could not handle is known,
i.e. at the end of the InlineCost analysis.
However, we may never reach that point, because the InlineCost analysis
exits early as soon as the accumulated cost exceeds the threshold.
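The early-exit problem can be shown with a toy cost walk that stops once the running cost exceeds the threshold. This is a heavily simplified, hypothetical model and not LLVM's `CallAnalyzer`. Both functions are illustrative names. A bonus applied only after the walk completes can never rescue a call that would exceed the base threshold.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: the walk stops as soon as Cost > Threshold.
bool inlineWithBonusAtEnd(const std::vector<int> &InstCosts, int Threshold,
                          int Bonus) {
  int Cost = 0;
  for (int C : InstCosts) {
    Cost += C;
    if (Cost > Threshold)
      return false; // early exit: the bonus below is never reached
  }
  Threshold += Bonus; // too late to matter: Cost <= Threshold already holds
  return Cost <= Threshold;
}

// Same walk, but the bonus is added to the threshold before it starts.
bool inlineWithBonusUpFront(const std::vector<int> &InstCosts, int Threshold,
                            int Bonus) {
  Threshold += Bonus;
  int Cost = 0;
  for (int C : InstCosts) {
    Cost += C;
    if (Cost > Threshold)
      return false;
  }
  return true;
}
```

With costs `{100, 100, 100}`, a threshold of 250, and a bonus of 100, the first version rejects the call while the second accepts it.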
This patch proposes to:
- Add the bonus to the inline threshold whenever allocas are passed as arguments, regardless of their total size.
- Assign each alloca a cost proportional to its size, such that the costs of all the allocas together cancel the bonus.
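The two points above can be sketched as a small standalone model. It assumes that an alloca's cost is charged only when SROA cannot handle that alloca, which is how the goal stated earlier reads. If all allocas are handled, the full bonus remains. If none are, the charged costs cancel it. All names and values here are hypothetical. Integer division can leave a small remainder, so the cancellation holds only up to rounding.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model; names and values are illustrative, not from the patch.
struct ArgAlloca {
  unsigned SizeInBytes;
  bool HandledBySROA; // assumption: known once the cost analysis has run
};

struct Budget {
  int Threshold;
  int Cost;
};

// One alloca's share of the bonus, proportional to its size.
// Integer division may leave a small remainder (cancellation up to rounding).
int allocaCost(unsigned Size, unsigned TotalSize, int Bonus) {
  assert(TotalSize > 0 && Size <= TotalSize);
  return static_cast<int>(static_cast<long long>(Bonus) * Size / TotalSize);
}

Budget applyAllocaBonus(const std::vector<ArgAlloca> &Allocas,
                        int BaseThreshold, int Bonus) {
  unsigned Total = 0;
  for (const ArgAlloca &A : Allocas)
    Total += A.SizeInBytes;
  Budget B{BaseThreshold, 0};
  if (Total == 0)
    return B; // no alloca arguments: no bonus
  B.Threshold += Bonus; // added up front, whatever the total size
  for (const ArgAlloca &A : Allocas)
    if (!A.HandledBySROA) // assumption: charged only when SROA gives up
      B.Cost += allocaCost(A.SizeInBytes, Total, Bonus);
  return B;
}
```

Because the bonus is in the threshold from the start, an early exit no longer discards it. Allocas that SROA cannot handle claw their share back as they are discovered.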
Potential problems:
- This patch assumes that removing alloca instructions with SROA is always profitable. This may not be the case if the total size of the allocas is still too big to be promoted to registers/LDS.
- Redundant calls to getTotalAllocaSize
- Awkwardly, the added threshold also contributes to the single-BB and vector bonuses, since those are derived from the threshold.
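To see why the last point matters, suppose the single-BB and vector bonuses are computed as percentages of the threshold after the alloca bonus has been added. Then the alloca bonus is amplified. This is a hypothetical model. The function name and the percentages are placeholders, not LLVM's actual values.

```cpp
#include <cassert>

// Hypothetical model: bonuses derived as percentages of the threshold.
struct DerivedBonuses {
  int SingleBB;
  int Vector;
};

DerivedBonuses deriveBonuses(int Threshold, int SingleBBPercent,
                             int VectorPercent) {
  return {Threshold * SingleBBPercent / 100, Threshold * VectorPercent / 100};
}
```

With placeholder percentages of 50 and 150, a base threshold of 1000 yields bonuses of 500 and 1500. Adding an alloca bonus of 4000 first inflates them to 2500 and 7500.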
The way I understand this function, it sums the sizes of all the allocas used to pass arguments to a function, but only takes into account allocas in the flat/private address spaces.
If that's correct, I would rename the function to something like getCallArgsTotalAllocaSize to reflect that.
Also, nit: we seem to generally use unsigned for size types in LLVM. I also prefer size_t, but I would stay consistent with the unsigned below and use unsigned here too.
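For illustration, here is the function as described above, under the suggested name and returning `unsigned`. It is a sketch against hypothetical stand-in types (`AddrSpace`, `Alloca`, `PtrArg`), not LLVM's `Value`/`AllocaInst`. The real code would look through each pointer argument to find its underlying alloca.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for LLVM types; not the real API.
enum class AddrSpace { Flat, Global, Private };

struct Alloca {
  unsigned SizeInBytes;
};

struct PtrArg {
  AddrSpace AS;
  const Alloca *Underlying; // nullptr if the pointer is not an alloca
};

// Sums the sizes of the allocas passed as flat/private pointer arguments.
// Returns unsigned, consistent with the size types used nearby.
unsigned getCallArgsTotalAllocaSize(const std::vector<PtrArg> &Args) {
  unsigned Total = 0;
  for (const PtrArg &A : Args) {
    if (A.AS != AddrSpace::Flat && A.AS != AddrSpace::Private)
      continue; // only flat/private pointers are considered
    if (!A.Underlying)
      continue; // not backed by an alloca
    Total += A.Underlying->SizeInBytes;
  }
  return Total;
}
```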