This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Add pre-commit tests for D152366
ClosedPublic

Authored by david-arm on Jun 29 2023, 5:37 AM.

Diff Detail

Event Timeline

david-arm created this revision.Jun 29 2023, 5:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2023, 5:37 AM
david-arm requested review of this revision.Jun 29 2023, 5:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2023, 5:37 AM
fhahn accepted this revision.Jul 19 2023, 10:25 AM

LGTM, thanks!

This revision is now accepted and ready to land.Jul 19 2023, 10:25 AM
david-arm updated this revision to Diff 542899.Jul 21 2023, 7:07 AM
  • Add new test case with triple nested loop due to comments on D152366

Taking another look, I think we need a few more tests for cases not covered by the current tests:

  • uncomputable BTC in the outer loop
  • inner and outer inductions decreasing (constant step)
  • inner and outer induction incremented by non-constant
llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll
746

nit: remove redundant indvars prefixes.

  • inner and outer inductions decreasing (constant step)

Hi @fhahn, perhaps I'm missing something but the runtime checks already look broken to me for decreasing IVs in the inner loop:

#include <stdint.h>

void decreasing_inner_iv(int32_t *dst, int32_t *src, int m, int n) {
  for (int i = 0; i < m; i++) {
    for (int j = n - 1; j >= 0; j--) {
      dst[(i * (n + 1)) + j] += src[(i * n) + j];
    }
  }
}

These are the SCEVs for the checks:

LAA: Adding RT check for range:
Start: {%dst,+,(4 * (zext i32 (1 + %n) to i64))<nuw><nsw>}<%for.cond1.preheader.us> End: {((4 * (zext i32 %n to i64))<nuw><nsw> + %dst),+,(4 * (zext i32 (1 + %n) to i64))<nuw><nsw>}<%for.co
nd1.preheader.us>
LAA: Adding RT check for range:
Start: {%src,+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%for.cond1.preheader.us> End: {((4 * (zext i32 %n to i64))<nuw><nsw> + %src),+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%for.cond1.preheade
r.us>
Calculating cost of runtime checks:
  1  for   %bound0 = icmp ult ptr %scevgep, %scevgep36
  1  for   %bound1 = icmp ult ptr %scevgep35, %scevgep34
  1  for   %found.conflict = and i1 %bound0, %bound1

I don't see how the Start and End values correspond to actual range of addresses accessed in the inner loop?

david-arm updated this revision to Diff 545140.Jul 28 2023, 6:51 AM
  • Added more test cases for uncomputable trip count in the outer loop, decreasing inner IV and decreasing outer IV.

Taking another look, I think we need a few more tests for cases not covered by the current tests:

  • inner and outer induction incremented by non-constant

I've added the other tests you suggested, but I'm not sure of the value of having tests for non-constant IV increments. For the inner loop a non-constant IV increment will lead to SCEV checks to ensure we only enter the vector loop if the stride is 1, in which case it's going to be a constant anyway. For the outer loop the tests already have outer loop non-constant memory access strides, i.e. tests like this:

for (int i = m - 1; i >= 0; i--) {
  for (int j = 0; j <= n; j++) {
    dst[(i * stride1) + j] += src[(i * stride2) + j];
 }

where the stride for the outer loop is stride1 so by making the outer IV increment non-constant all I'm doing is transforming an already unknown stride into another unknown stride, unless I'm missing something?

Taking another look, I think we need a few more tests for cases not covered by the current tests:

  • inner and outer induction incremented by non-constant

I've added the other tests you suggested, but I'm not sure of the value of having tests for non-constant IV increments. For the inner loop a non-constant IV increment will lead to SCEV checks to ensure we only enter the vector loop if the stride is 1, in which case it's going to be a constant anyway. For the outer loop the tests already have outer loop non-constant memory access strides, i.e. tests like this:

for (int i = m - 1; i >= 0; i--) {
  for (int j = 0; j <= n; j++) {
    dst[(i * stride1) + j] += src[(i * stride2) + j];
 }

where the stride for the outer loop is stride1 so by making the outer IV increment non-constant all I'm doing is transforming an already unknown stride into another unknown stride, unless I'm missing something?

Thanks for the update, I'll take a closer look in the middle of the week when I am back from traveling.

fhahn added a comment.Aug 2 2023, 4:45 AM
LAA: Adding RT check for range:
Start: {%dst,+,(4 * (zext i32 (1 + %n) to i64))<nuw><nsw>}<%for.cond1.preheader.us> End: {((4 * (zext i32 %n to i64))<nuw><nsw> + %dst),+,(4 * (zext i32 (1 + %n) to i64))<nuw><nsw>}<%for.co
nd1.preheader.us>
LAA: Adding RT check for range:
Start: {%src,+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%for.cond1.preheader.us> End: {((4 * (zext i32 %n to i64))<nuw><nsw> + %src),+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%for.cond1.preheade
r.us>

I don't see how the Start and End values correspond to actual range of addresses accessed in the inner loop?

IIUC : {%dst,+,(4 * (zext i32 (1 + %n) to i64))<nuw><nsw>}<%for.cond1.preheader.us> is effectively %dst + i * (4 *(1 + n)) where j = 0 (lower bound) and {((4 * (zext i32 %n to i64))<nuw><nsw> + %dst),+,(4 * (zext i32 (1 + %n) to i64))<nuw><nsw>}<%for.cond1.preheader.us> is %dst + n i * (4 *(1 + n)) for j=n-1.

Does that make sense?

Taking another look, I think we need a few more tests for cases not covered by the current tests:

  • inner and outer induction incremented by non-constant

I've added the other tests you suggested, but I'm not sure of the value of having tests for non-constant IV increments. For the inner loop a non-constant IV increment will lead to SCEV checks to ensure we only enter the vector loop if the stride is 1, in which case it's going to be a constant anyway. For the outer loop the tests already have outer loop non-constant memory access strides, i.e. tests like this:

for (int i = m - 1; i >= 0; i--) {
  for (int j = 0; j <= n; j++) {
    dst[(i * stride1) + j] += src[(i * stride2) + j];
 }

where the stride for the outer loop is stride1 so by making the outer IV increment non-constant all I'm doing is transforming an already unknown stride into another unknown stride, unless I'm missing something?

Even though we version with 1 at the moment, I think it is still valuable to have the additional test coverage in case the versioning (or something else) changes which may impact the correctness of the generated RT checks.

This revision was automatically updated to reflect the committed changes.

Hi @fhahn, I added another test case for the inner strides being unknown - see @unknown_inner_stride.