In this patch we change test cases from using "CHECK" to using "CHECK-NEXT", which is to ensure the order of loops output by loop cache analysis is correct. After D124725 we fixed the non-deterministic output order hence we did not use "CHECK-DAG" anymore, and now we should really use "CHECK-NEXT" to make sure the loops in the output loop vector follow the right order.
Details
- Reviewers
bmahjour Whitney Meinersbur - Group Reviewers
Restricted Project - Commits
- rG80ab16d0ed75: [NFC][LoopCacheAnalysis] Update test cases to make sure the outputs follow the…
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Although the problem reported in PR55233 is worsened by D123400, the underlying issue existed even before that (as illustrated by the example in PR55233), so this patch won't completely solve the overflow problem, although it would make it a bit less likely to occur. I think a proper solution to PR55233 would need to address the overflow problem itself (eg by increasing the range of values that the cost model can represent).
I had thought about this approach when working on D123400, and while I agree it will make the cost values smaller, I have some reservations about its accuracy.
One observation is that with this patch, the cost values for non-consecutive accesses in deep nests become smaller, but also their relative differences reduce as well. For example in single-store.ll, the cost difference between for.k and for.j is in the orders of magnitude, while the cost difference between for.j and for.i is much smaller. I worry that this dilution of cost difference might make it harder to find the right order in loop nests that contain multiple reference groups. I think this approach slightly diverges from the concepts presented in the paper, inasmuch as, the unit of measurement for the cost value is meant to be number of cache lines, but with the multiplication by the "depth" factor the value will no longer be estimating number of cache lines.
llvm/lib/Analysis/LoopCacheAnalysis.cpp | ||
---|---|---|
323 ↗ | (On Diff #427203) | i-loop -> j-loop? |
324 ↗ | (On Diff #427203) | i-loop -> j-loop? |
332 ↗ | (On Diff #427203) | This will always give a factor of 0 for the inner-most subscript. Is that intentional? |
llvm/test/Analysis/LoopCacheAnalysis/PowerPC/loads-store.ll | ||
14–15 | why does this change here, but didn't change in D123400? |
Thanks Bardia for the comment! Regarding the test case in loads-store.ll: in D123400 it should actually have been changed to "; CHECK: Loop 'for.i' has cost = 300000000" since the cost becomes 300000000 instead of 3000000. Currently the test still passes just because it happens to also match "Loop 'for.i' has cost = 3000000". After this patch the cost reduces from 300000000 to 6000000, which is consistent with the bahavior in other tests.
I agree that our long-term goal is to address the overflow problem itself possibly by increasing the range of values that the cost model can represent, and this patch might be a short-term solution. IMHO I think however we want to solve the problem, the numbers calculated by loop cache analysis will always be an estimation of some sort. We try to preserve its accuracy but might need to do some adjusments due to various reasons. For example, in loads-store.ll, the cost of "300000000" for loop "for.i" that we currently get is already not an accurate estimation of the accessed cache line numbers -- "3000000" might be a closer estimate but since we want to distinguish between outer loops, we adjusted "3000000" to "300000000".
I was hoping what this patch does align with the reasoning above. Nevertheless, I'm open to other solutions as well. I wonder if you feel like we should continue this patch or we should seek other solutions?
llvm/lib/Analysis/LoopCacheAnalysis.cpp | ||
---|---|---|
332 ↗ | (On Diff #427203) | Thanks, will update it in the next version. |
332 ↗ | (On Diff #427203) | For innermost subscript usually the access is consecutive hence this "else" block won't be reached. If the access is not consecutive, the factor being 0 will still make the cost of innermost subscript to be smallest, which kind of works although not perfect. An easy improvement/fix is to use something like (getNumSubscripts() - Index - 1)==0?1:(getNumSubscripts() - Index - 1). If you think this is better I can upate it in the next version. |
Hi Bardia @bmahjour, after some thoughts I think regarding this patch we would chase a longer-term solution. For now I think I could rework and change this patch (both title and content) to an NFC patch that only updates the test cases, i.e., from using CHECK to CHECK-NEXT, since we really want to make sure that the output is ordered and the order is correct. I'm wondering how you think about it?
Yes, I agree a long term solution (eg of using double type) would be better. Updating the checks to expect the exact order seems like a good idea to me. Thanks!
@bmahjour I updated the patch to an NFC patch with test case updates only. I'd appreciate it if you could take a look :)
why does this change here, but didn't change in D123400?