Details
- Reviewers
fhahn Ayal - Commits
- rG494d28ec07dd: [LoopVectorize] Add pre-commit tests for D152366
Diff Detail
Event Timeline
Taking another look, I think we need a few more tests for cases not covered by the current tests:
- uncomputable BTC in the outer loop
- inner and outer inductions decreasing (constant step)
- inner and outer induction incremented by non-constant
llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll | ||
---|---|---|
746 | nit: remove redundant indvars prefixes. |
- inner and outer inductions decreasing (constant step)
Hi @fhahn, perhaps I'm missing something but the runtime checks already look broken to me for decreasing IVs in the inner loop:
#include <stdint.h> void decreasing_inner_iv(int32_t *dst, int32_t *src, int m, int n) { for (int i = 0; i < m; i++) { for (int j = n - 1; j >= 0; j--) { dst[(i * (n + 1)) + j] += src[(i * n) + j]; } } }
These are the SCEVs for the checks:
LAA: Adding RT check for range: Start: {%dst,+,(4 * (zext i32 (1 + %n) to i64))<nuw><nsw>}<%for.cond1.preheader.us> End: {((4 * (zext i32 %n to i64))<nuw><nsw> + %dst),+,(4 * (zext i32 (1 + %n) to i64))<nuw><nsw>}<%for.co nd1.preheader.us> LAA: Adding RT check for range: Start: {%src,+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%for.cond1.preheader.us> End: {((4 * (zext i32 %n to i64))<nuw><nsw> + %src),+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%for.cond1.preheade r.us> Calculating cost of runtime checks: 1 for %bound0 = icmp ult ptr %scevgep, %scevgep36 1 for %bound1 = icmp ult ptr %scevgep35, %scevgep34 1 for %found.conflict = and i1 %bound0, %bound1
I don't see how the Start and End values correspond to actual range of addresses accessed in the inner loop?
- Added more test cases for uncomputable trip count in the outer loop, decreasing inner IV and decreasing outer IV.
- inner and outer induction incremented by non-constant
I've added the other tests you suggested, but I'm not sure of the value of having tests for non-constant IV increments. For the inner loop a non-constant IV increment will lead to SCEV checks to ensure we only enter the vector loop if the stride is 1, in which case it's going to be a constant anyway. For the outer loop the tests already have outer loop non-constant memory access strides, i.e. tests like this:
for (int i = m - 1; i >= 0; i--) { for (int j = 0; j <= n; j++) { dst[(i * stride1) + j] += src[(i * stride2) + j]; }
where the stride for the outer loop is stride1 so by making the outer IV increment non-constant all I'm doing is transforming an already unknown stride into another unknown stride, unless I'm missing something?
Thanks for the update, I'll take a closer look in the middle of the week when I am back from traveling.
I don't see how the Start and End values correspond to actual range of addresses accessed in the inner loop?
IIUC : {%dst,+,(4 * (zext i32 (1 + %n) to i64))<nuw><nsw>}<%for.cond1.preheader.us> is effectively %dst + i * (4 *(1 + n)) where j = 0 (lower bound) and {((4 * (zext i32 %n to i64))<nuw><nsw> + %dst),+,(4 * (zext i32 (1 + %n) to i64))<nuw><nsw>}<%for.cond1.preheader.us> is %dst + n i * (4 *(1 + n)) for j=n-1.
Does that make sense?
Even though we version with 1 at the moment, I think it is still valuable to have the additional test coverage in case the versioning (or something else) changes which may impact the correctness of the generated RT checks.
Hi @fhahn, I added another test case for the inner strides being unknown - see @unknown_inner_stride.
nit: remove redundant indvars prefixes.