[SROA] enable splitting for non-whole-alloca loads and stores
Needs ReviewPublic

Authored by inouehrs on Tue, May 9, 5:52 AM.



Currently, SROA splits loads and stores only when they are accessing the whole alloca.
This patch relaxes this limitation to allow splitting a load/store if all other loads and stores to the alloca are disjoint to or fully included in the current load/store. If there is no other load or store that crosses the boundary of the current load/store, the current splitting implementation works as is.
The whole-alloca loads and stores meet this new condition and so they are still splittable.

Here is a simplified motivating example.

struct record {
    long long a;
    int b;
    int c;

int func(struct record r) {
    for (int i = 0; i < r.c; i++)
    return r.b;

When updating r.b (or r.c as well), LLVM generates redundant instructions on some platforms (such as x86_64, ppc64); here, r.b and r.c are packed into one 64-bit GPR when the struct is passed as a method argument.

With this patch, the above example is compiled into only few instructions without loop.
Without the patch, unnecessary loop-carried dependency is introduced by SROA and the loop cannot be eliminated by the later optimizers.

The path length of the std::string's assign method with istreambuf_iterator (an example below) is reduced by about 10% on ppc64.


Diff Detail

inouehrs created this revision.Tue, May 9, 5:52 AM
rnk added a reviewer: rnk.Fri, May 12, 3:35 PM
inouehrs updated this revision to Diff 98944.Sun, May 14, 11:06 PM
inouehrs edited the summary of this revision. (Show Details)
inouehrs updated this revision to Diff 99408.Thu, May 18, 1:34 AM
inouehrs retitled this revision from [WIP] SROA for method argument to [SROA] enable splitting for non-whole-alloca loads and stores.
inouehrs edited the summary of this revision. (Show Details)
inouehrs added a reviewer: efriedma.

I added a new unit test and modified affected existing unit tests.

rnk added inline comments.Thu, May 18, 10:54 AM

Does this have to be n^2 in the number of slices? Can't you leverage the sorting to be more efficient?


These CHECKs have no colon, they aren't firing.

jroelofs added inline comments.

Also, the NEXT must be capitalized.

inouehrs updated this revision to Diff 99527.EditedThu, May 18, 11:34 PM
  • Implemented a linear-time algorithm (instead of N^2 for number of slices) for searching splittable slices.
  • Fixed errors in the unit test.
inouehrs marked 3 inline comments as done.Thu, May 18, 11:37 PM
inouehrs added inline comments.

I changed the algorithm to a linear-time one. I confirmed that this algorithm generates the same results to the previous N^2 algorithm during the bootstrap test.


Thank you so much for pointing this out. Fixed these errors.

inouehrs marked 2 inline comments as done.Thu, May 18, 11:38 PM