The LLVM inliner encourages inlining of single-block callees by giving them a higher threshold. However, some large single-block callees still cannot be inlined, even though many of their instructions are redundant and could be removed after inlining.
The motivating example is a fully unrolled 3x3 matrix multiplication. It loads every element of matrices a and b three times because of the stores between the loads. The SROA analysis can determine that these stores can be simplified away, so the redundant loads should also be treated as free. Running sroa and gvn after inlining the callee removes 54% of the instructions.
define void @outer(%struct.matrix* %a, %struct.matrix* %b) {
  %c = alloca %struct.matrix
  call void @matrix_multiply(%struct.matrix* %a, %struct.matrix* %b, %struct.matrix* %c)
  ret void
}

define void @matrix_multiply(%struct.matrix* %a, %struct.matrix* %b, %struct.matrix* %c) {
This simple patch looks for repeated loads in the callee. It stops looking for loads once the callee contains a store that cannot be simplified or a function call. This restriction could be relaxed to find more CSE opportunities, at the cost of a more expensive analysis.
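The heuristic described above can be sketched as a toy model. This is not the patch's code and uses no LLVM APIs: `Inst`, `Kind`, and `countRepeatedLoads` are hypothetical names, addresses are plain strings, and whether a store is simplifiable is supplied as a flag rather than computed. The sketch also assumes that loads counted before a clobbering store or call remain counted, which the description above does not specify.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy instruction kinds; a stand-in for LLVM's instruction classes.
enum class Kind { Load, Store, Call, Other };

// Hypothetical, simplified instruction for a single-block callee.
struct Inst {
  Kind kind;
  std::string addr;   // address operand of a Load/Store, empty otherwise
  bool simplifiable;  // Store only: expected to be simplified away after inlining
};

// Counts loads that repeat an earlier load of the same address.
// Scanning stops at the first call or non-simplifiable store, since
// either may clobber memory (assumption: earlier savings are kept).
int countRepeatedLoads(const std::vector<Inst>& callee) {
  std::set<std::string> loadedAddrs;
  int repeated = 0;
  for (const Inst& inst : callee) {
    switch (inst.kind) {
    case Kind::Load:
      assert(!inst.addr.empty() && "a load needs an address");
      // insert().second is false when the address was already loaded.
      if (!loadedAddrs.insert(inst.addr).second)
        ++repeated;
      break;
    case Kind::Store:
      if (!inst.simplifiable)
        return repeated;  // possible clobber: stop looking for loads
      break;
    case Kind::Call:
      return repeated;    // calls may write memory: stop looking
    case Kind::Other:
      break;
    }
  }
  return repeated;
}
```

In a real cost model, each repeated load found this way would presumably be credited back against the callee's inline cost. The string-keyed set here only stands in for whatever address identity the analysis actually uses.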
I tested the patch with SPEC20xx and llvm-test-suite at O3+LTO, O3, O2, and Os on x86 and AArch64. Only spec2006/milc is affected when using O3+LTO. On x86, performance improves by 3.6% and code size increases by 0.07%. On AArch64, performance improves by 4.8% and code size is unchanged.
I wouldn't describe this as having anything to do with SROA, or limit it to SROA.
It should just model the CSE-ing of repeated loads that is expected to happen whenever we simplify away the stores that would otherwise force those loads to be repeated.
I wouldn't disable it when we don't have SROA-candidates as I can see it being valuable in plenty of other cases. As a trivial example, if a predicate causes all of the stores to memory to be dead, we won't traverse those blocks, and we should track the CSE simplification to any redundant loads. It'd be good to have a test case for this as well. As a concrete example of where I think this might well come up: