This is an archive of the discontinued LLVM Phabricator instance.

[LoopCacheAnalysis]: Add support for negative stride
ClosedPublic

Authored by rcraik on Jan 20 2020, 1:04 PM.

Details

Summary

LoopCacheAnalysis currently assumes the loop will be iterated over in
a forward direction. This patch addresses the issue by using the
absolute value of the stride when iterating backwards.

Note: this patch will treat negative and positive array access the
same, resulting in the same cost being calculated for single and
bi-directional access patterns. This should be improved in a
subsequent patch.

Diff Detail

Event Timeline

rcraik created this revision.Jan 20 2020, 1:04 PM

LGTM, will wait for others to approve.

Correct me if I'm wrong but it seems you pretend negative steps are positive ones. While this is just a best-effort analysis, it seems wrong.
Take

for (i = N; i > 0; i--)
  A[i] = A[N - i];

which should have a different cost than

for (i = N; i > 0; i--)
  A[i] = A[i];

which is the same as

for (i = 0; i < N; i++)
  A[i] = A[i];

Can you add these examples to the tests please?

llvm/lib/Analysis/LoopCacheAnalysis.cpp
359

The example code is not really helpful here.

381

No braces.

rcraik updated this revision to Diff 241765.Jan 31 2020, 10:17 AM

Added the testcases suggested by @jdoerfert

jdoerfert added inline comments.Jan 31 2020, 10:30 AM
llvm/test/Analysis/LoopCacheAnalysis/PowerPC/compute-cost.ll
70

As I mentioned, this should *not* be scored as the ones below. Add a FIXME here to explain that. In the code we should also add a FIXME to indicate that this is not the best handling of negative strides.

I took each of these example loops and, taking N = 1024, ran each loop 10 000 000 times. These testcases I ran 10 times each and found no statistical difference between either the number of L1 cache load misses or the L1 cache miss rate between each testcase. This indicates that the cost of these loops is the same, which is what this patch implements.

llvm/lib/Analysis/LoopCacheAnalysis.cpp
381

I checked the LLVM Coding Standards, and it doesn't mention braces vs no braces for single statement if/for/etc. blocks. Since this statement covers multiple lines, I would rather keep the braces for clarity.

jdoerfert requested changes to this revision.Jan 31 2020, 1:19 PM

I took each of these example loops and, taking N = 1024, ran each loop 10 000 000 times. These testcases I ran 10 times each and found no statistical difference between either the number of L1 cache load misses or the L1 cache miss rate between each testcase. This indicates that the cost of these loops is the same, which is what this patch implements.

Thanks for trying to verify this but there is a problem with the setup. With N = 1024, the memory footprint M of each example loops is 1024 * sizeof(A[0]), now I don't know what types you used but let's assume double. The footprint M is 1024 * sizeof(double) = 1024 * 8 = 8192 bytes. My laptop L1 data cache is 32KiB in size. That means I can fit the array 3 times into the L1 cache. Now, once the loop was traversed a single time, the L1 cache contains all of the array and there is little reason to assume eviction. Every consequent iteration, so 10 000 000 -1 times, will most likely only see L1 cache hits no matter how you iterate over the array.

llvm/lib/Analysis/LoopCacheAnalysis.cpp
359

Especially since it is way to complex. Why do we need the struct, the function, ...

381

FWIW, I went through the coding standard myself again. As far as I can tell, every conditional follows the same rule: If it's a single statement, no braces, otherwise braces. (the alternative case has braces if the consequence has and vise versa)

This revision now requires changes to proceed.Jan 31 2020, 1:19 PM
rcraik added a comment.Feb 4 2020, 8:52 AM

Correct me if I'm wrong but it seems you pretend negative steps are positive ones. While this is just a best-effort analysis, it seems wrong.
Take

for (i = N; i > 0; i--)
  A[i] = A[N - i];

which should have a different cost than

for (i = N; i > 0; i--)
  A[i] = A[i];

which is the same as

for (i = 0; i < N; i++)
  A[i] = A[i];

Can you add these examples to the tests please?

Could you explain why you believe the first test should have a different cost than the second two? The first and last testcase both iterate over the array in a forwards direction (at least in terms of loads) while the middle one is the only one which goes backwards.

Correct me if I'm wrong but it seems you pretend negative steps are positive ones. While this is just a best-effort analysis, it seems wrong.
Take

for (i = N; i > 0; i--)
  A[i] = A[N - i];

which should have a different cost than

for (i = N; i > 0; i--)
  A[i] = A[i];

which is the same as

for (i = 0; i < N; i++)
  A[i] = A[i];

Can you add these examples to the tests please?

Could you explain why you believe the first test should have a different cost than the second two? The first and last testcase both iterate over the array in a forwards direction (at least in terms of loads) while the middle one is the only one which goes backwards.

Going backwards is no different from going forward (assuming we ignore prefetchers). So I argue 2 and 3 are the same when it comes to cache behavior. In both we access the same memory twice in each iteration, but since it is the same memory there is actually only a single cache miss per iteration (assuming one array element per cache line for simplicity, otherwise we just get a factor everywhere). Now in version 1 we access two different elements in all but potentially one iteration (the middle) so we get two cache misses in each iteration.

rcraik updated this revision to Diff 242717.Feb 5 2020, 11:49 AM
rcraik edited the summary of this revision. (Show Details)

Updated comments and testcases

jdoerfert accepted this revision.Feb 8 2020, 11:02 AM

Sorry for my delay, traveling...

LGTM. Thanks for the fixme and the test cases!

This revision is now accepted and ready to land.Feb 8 2020, 11:02 AM
This revision was automatically updated to reflect the committed changes.