[SROA] enable splitting for non-whole-alloca loads and stores
Needs ReviewPublic

Authored by inouehrs on Tue, May 9, 5:52 AM.

Details

Summary

Currently, SROA splits loads and stores only when they are accessing the whole alloca.
This patch relaxes this limitation to allow splitting a load/store if all other loads and stores to the alloca are disjoint to or fully included in the current load/store. If there is no other load or store that crosses the boundary of the current load/store, the current splitting implementation works as is.
The whole-alloca loads and stores meet this new condition and so they are still splittable.

Here is a simplified motivating example.

struct record {
    long long a;
    int b;
    int c;
};

int func(struct record r) {
    for (int i = 0; i < r.c; i++)
        r.b++;
    return r.b;
}

When updating r.b (or r.c as well), LLVM generates redundant instructions on some platforms (such as x86_64, ppc64); here, r.b and r.c are packed into one 64-bit GPR when the struct is passed as a method argument.

With this patch, the above example is compiled into only few instructions without loop.
Without the patch, unnecessary loop-carried dependency is introduced by SROA and the loop cannot be eliminated by the later optimizers.

The path length of the std::string's assign method with istreambuf_iterator (an example below) is reduced by about 10% on ppc64.

assign((std::istreambuf_iterator<char>(ifs)),
       (std::istreambuf_iterator<char>()));

Diff Detail

inouehrs created this revision.Tue, May 9, 5:52 AM
rnk added a reviewer: rnk.Fri, May 12, 3:35 PM
inouehrs updated this revision to Diff 98944.Sun, May 14, 11:06 PM
inouehrs edited the summary of this revision. (Show Details)
inouehrs updated this revision to Diff 99408.Thu, May 18, 1:34 AM
inouehrs retitled this revision from [WIP] SROA for method argument to [SROA] enable splitting for non-whole-alloca loads and stores.
inouehrs edited the summary of this revision. (Show Details)
inouehrs added a reviewer: efriedma.

I added a new unit test and modified affected existing unit tests.

rnk added inline comments.Thu, May 18, 10:54 AM
lib/Transforms/Scalar/SROA.cpp
3992–3993

Does this have to be n^2 in the number of slices? Can't you leverage the sorting to be more efficient?

test/Transforms/SROA/basictest.ll
1707–1709

These CHECKs have no colon, they aren't firing.

jroelofs added inline comments.
test/Transforms/SROA/basictest.ll
1707–1709

Also, the NEXT must be capitalized.

inouehrs updated this revision to Diff 99527.EditedThu, May 18, 11:34 PM
  • Implemented a linear-time algorithm (instead of N^2 for number of slices) for searching splittable slices.
  • Fixed errors in the unit test.
inouehrs marked 3 inline comments as done.Thu, May 18, 11:37 PM
inouehrs added inline comments.
lib/Transforms/Scalar/SROA.cpp
3992–3993

I changed the algorithm to a linear-time one. I confirmed that this algorithm generates the same results to the previous N^2 algorithm during the bootstrap test.

test/Transforms/SROA/basictest.ll
1707–1709

Thank you so much for pointing this out. Fixed these errors.

inouehrs marked 2 inline comments as done.Thu, May 18, 11:38 PM