This is an archive of the discontinued LLVM Phabricator instance.

[LFTR] Fix post-inc pointer IV with truncated exit count (PR41998)
ClosedPublic

Authored by nikic on Jun 22 2019, 3:09 AM.

Details

Summary

Fixes https://bugs.llvm.org/show_bug.cgi?id=41998. Usually when we have a truncated exit count we'll truncate the IV when comparing against the limit, in which case exit count overflow in post-inc form doesn't matter. However, for pointer IVs we don't do that, so we have to be careful about incrementing the IV in the wide type.

I'm fixing this by removing the IVCount variable (which was ExitCount or ExitCount+1) and replacing it with a UsePostInc flag, and then moving the actual limit adjustment to the individual cases (which are pointer IV where we add to the wide type, integer IV where we add to the narrow type and constant integer IV where we add to the wide type).

Diff Detail

Event Timeline

nikic created this revision.Jun 22 2019, 3:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 22 2019, 3:09 AM
dim added a subscriber: dim.Jun 23 2019, 2:21 PM

FWIW, the original test case with pre-increment is fixed by this, e.g.:

#include <stdio.h>

__attribute__((__noinline__)) void parse_pre(const char *dataptr) {
  int col;
  char line[1024];
  char *lineptr;

  for (col = 0, lineptr = line; *dataptr != '\0'; dataptr++) {
    printf("before inner loop: col=%d, diff=%td\n", col, lineptr - line);
    do {
      printf("inside inner loop: col=%d, diff=%td\n", col, lineptr - line);
      *++lineptr = ' ';
      col++;
    } while (col & 7);
  }
}

int main(void) {
  parse_pre("a");

  return 0;
}

now runs the correct number of loops:

$ ./ps-pdf-pre-r364133-D63686
before inner loop: col=0, diff=0
inside inner loop: col=0, diff=0
inside inner loop: col=1, diff=1
inside inner loop: col=2, diff=2
inside inner loop: col=3, diff=3
inside inner loop: col=4, diff=4
inside inner loop: col=5, diff=5
inside inner loop: col=6, diff=6
inside inner loop: col=7, diff=7

The IR difference is this:

@@ -54,7 +54,7 @@ do.body:                                          ; pr
   %9 = load i8*, i8** %lineptr, align 8, !tbaa !2
   %incdec.ptr = getelementptr inbounds i8, i8* %9, i32 1
   store i8* %incdec.ptr, i8** %lineptr, align 8, !tbaa !2
-  store i8 32, i8* %9, align 1, !tbaa !8
+  store i8 32, i8* %incdec.ptr, align 1, !tbaa !8
   %10 = load i32, i32* %col, align 4, !tbaa !6
   %inc = add nsw i32 %10, 1
   store i32 %inc, i32* %col, align 4, !tbaa !6

I'm hesitant about this. Not because your patch is wrong, but because we've ended up with a ton of complexity involving the post-increment case. I'd really like to take a step back and see if we can factor apart some code here.

Assume for the moment that we have a transform which knows how to generate the pre-increment form, can we do post increment conversion as a post processing step?

If we were willing to strip, I think we could just strip flags and add one to both sides of the comparison *in whatever bitwidth the pre-inc form chose*. Do you agree?

If so, then I think we can move the post-inc legality/stripping entirely into a separate transform which runs *after* we've formed the arguments for the new comparison. (Which is slightly different than the current patch as we'd do the pre-to-post *after* the extend/trunc decision.)

I'm also really questioning the value of doing pre-to-post conversion in the middle end at all. I'm getting increasing tempted to move that to just before codegen, and just strip all the flags without concern. :)

LGTM. As indicated in previous comment, I have some design hesitation here, but the patch is correct and fixes a bug. I'm fine continuing the design discussion after submit as I trust Nikita to engage and follow through as needed.

This revision was not accepted when it landed; it landed in state Needs Review.Jun 29 2019, 2:24 AM
This revision was automatically updated to reflect the committed changes.
nikic added a comment.Jun 29 2019, 2:44 AM

I'm hesitant about this. Not because your patch is wrong, but because we've ended up with a ton of complexity involving the post-increment case. I'd really like to take a step back and see if we can factor apart some code here.

Assume for the moment that we have a transform which knows how to generate the pre-increment form, can we do post increment conversion as a post processing step?

If we were willing to strip, I think we could just strip flags and add one to both sides of the comparison *in whatever bitwidth the pre-inc form chose*. Do you agree?

If so, then I think we can move the post-inc legality/stripping entirely into a separate transform which runs *after* we've formed the arguments for the new comparison. (Which is slightly different than the current patch as we'd do the pre-to-post *after* the extend/trunc decision.)

I'm also really questioning the value of doing pre-to-post conversion in the middle end at all. I'm getting increasing tempted to move that to just before codegen, and just strip all the flags without concern. :)

I've been thinking the same thing ... the post-inc conversion here causes a lot of issues for unclear benefit. Some thoughts:

  • LFTR currently does two things: Simplify an existing exit condition, or switch to a different IV. If we're simplifying an existing post-inc exit condition, we likely want to preserve it in that form. Though I think that this may be a good bit simpler than the general transform as we shouldn't have to deal with different-width IV and exit count in that case (is that correct?) Generally splitting the exit condition simplification and the IV switch parts of LFTR may be worthwhile, because the latter is much more dicey due to all the undef/poison baggage it comes with.
  • Ideally we would not use post-inc form in the middle end and let LSR move to post-inc if profitable. Apart from the issues we're seeing here, this post-inc form is also non-canonical -- clearly (a + C1) == C2 should canonically be a == C2-C1, but we have hacks in InstCombine to preserve this for multi-use cases. There is some relevant discussion on D58633 and in particular some samples where using pre-inc form currently causes regressions. The first step would probably be to figure out what is going on there.
nikic added a comment.Jun 29 2019, 8:36 AM

Ideally we would not use post-inc form in the middle end and let LSR move to post-inc if profitable. Apart from the issues we're seeing here, this post-inc form is also non-canonical -- clearly (a + C1) == C2 should canonically be a == C2-C1, but we have hacks in InstCombine to preserve this for multi-use cases. There is some relevant discussion on D58633 and in particular some samples where using pre-inc form currently causes regressions. The first step would probably be to figure out what is going on there.

After looking into this a bit, I think that the problems of D58633 do not apply here. The issue there is that InstCombine can fold instructions inside loops that LSR does not understand because it requires loop simplify form. This doesn't apply here because indvars also works on loop simplify form. Given that, I think it should be fine to use pre-inc form in LFTR (possibly post-inc only if it's already used) and let LSR pick up the post-inc transform.

reames added a comment.Jul 1 2019, 9:51 AM

Ideally we would not use post-inc form in the middle end and let LSR move to post-inc if profitable. Apart from the issues we're seeing here, this post-inc form is also non-canonical -- clearly (a + C1) == C2 should canonically be a == C2-C1, but we have hacks in InstCombine to preserve this for multi-use cases. There is some relevant discussion on D58633 and in particular some samples where using pre-inc form currently causes regressions. The first step would probably be to figure out what is going on there.

After looking into this a bit, I think that the problems of D58633 do not apply here. The issue there is that InstCombine can fold instructions inside loops that LSR does not understand because it requires loop simplify form. This doesn't apply here because indvars also works on loop simplify form. Given that, I think it should be fine to use pre-inc form in LFTR (possibly post-inc only if it's already used) and let LSR pick up the post-inc transform.

I think we're on the same page here. Any chance you're motivated to actually do this? I'd be happy to review. :)