Patch to fix pragma metadata for do-while loops
ClosedPublic

Authored by deepak2427 on Jun 28 2018, 6:36 AM.

Diff Detail

Repository
rL LLVM
deepak2427 created this revision.Jun 28 2018, 6:36 AM

Test?
(Or was this meant to contain [Private] in title?)

It's a patch for a bug in clang.
I have requested for a Bugzilla account, however thought of putting up the
patch in the meantime.
Do I need to mark it '[Private]'?

Please upload patch with full context

It's a patch for a bug in clang.
I have requested for a Bugzilla account, however thought of putting up the
patch in the meantime.
Do I need to mark it '[Private]'?

Phab is the correct way to submit patches.
But having a bugreport in bugzilla is good too.
But the test will be needed regardless of the patch submission method.
And yes, please do always upload all patches with full context (-U99999).

Add full context

Phab is the correct way to submit patches.
But having a bugreport in bugzilla is good too.
But the test will be needed regardless of the patch submission method.
And yes, please do always upload all patches with full context (-U99999).

Sorry about the context.
Can I add the test file to this patch itself? Or should that be another
patch?

Phab is the correct way to submit patches.
But having a bugreport in bugzilla is good too.
But the test will be needed regardless of the patch submission method.
And yes, please do always upload all patches with full context (-U99999).

Sorry about the context.
Can I add the test file to this patch itself? Or should that be another
patch?

The test needs to be in test/CodeGen/ (i'm guessing).
See other files there for examples.
(And do check that $ ninja check-clang passes)

Add tests and the patch.

deepak2427 added a reviewer: Restricted Project.Jun 28 2018, 2:21 PM
deepak2427 set the repository for this revision to rC Clang.
deepak2427 added a project: Restricted Project.

I'm not sure we can use -O in tests at all, and i'm not sure it is even needed here since you are only testing codegen.

I had based it on the other tests in clang/test/CodeGen.
Do we not need the -o to output to standard output?
Or did you mean something else?

Do I need to add specific reviewers?

xbolva00 added a subscriber: rsmith.Jul 1 2018, 2:28 PM
bjope added a subscriber: bjope.Jul 2 2018, 5:20 PM

I tried running

/clang -cc1 -O3 -funroll-loops -S -emit-llvm pragma-do-while-unroll.cpp -o - -mllvm -print-after-all

and I get this

...
!2 = distinct !{!2, !3}
!3 = !{!"llvm.loop.unroll.count", i32 3}
!4 = !{!5, !5, i64 0}
!5 = !{!"int", !6, i64 0}
!6 = !{!"omnipotent char", !7, i64 0}
!7 = !{!"Simple C/C++ TBAA"}
!8 = distinct !{!8, !9}
!9 = !{!"llvm.loop.unroll.count", i32 5}
*** IR Dump After Combine redundant instructions ***
; Function Attrs: nounwind
define i32 @test(i32* %a, i32 %n) local_unnamed_addr #0 {
entry:
  br label %do.body, !llvm.loop !2

do.body:                                          ; preds = %do.body, %entry
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %do.body ]
  %sum.0 = phi i32 [ 0, %entry ], [ %add5, %do.body ]
  %0 = zext i32 %i.0 to i64
  %arrayidx = getelementptr inbounds i32, i32* %a, i64 %0
  %1 = load i32, i32* %arrayidx, align 4, !tbaa !4
  %add = add nsw i32 %1, 1
  store i32 %add, i32* %arrayidx, align 4, !tbaa !4
  %add5 = add nsw i32 %sum.0, %add
  %inc = add nuw nsw i32 %i.0, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %do.body, label %do.end, !llvm.loop !2

do.end:                                           ; preds = %do.body
  br label %do.body6, !llvm.loop !8

do.body6:                                         ; preds = %do.body6, %do.end
  %i.1 = phi i32 [ 0, %do.end ], [ %inc15, %do.body6 ]
  %sum.1 = phi i32 [ %add5, %do.end ], [ %add14, %do.body6 ]
  %2 = zext i32 %i.1 to i64
  %arrayidx8 = getelementptr inbounds i32, i32* %a, i64 %2
  %3 = load i32, i32* %arrayidx8, align 4, !tbaa !4
  %add9 = add nsw i32 %3, 1
  store i32 %add9, i32* %arrayidx8, align 4, !tbaa !4
  %add14 = add nsw i32 %sum.1, %add9
  %inc15 = add nuw nsw i32 %i.1, 1
  %cmp17 = icmp slt i32 %inc15, %n
  br i1 %cmp17, label %do.body6, label %do.end18, !llvm.loop !8

do.end18:                                         ; preds = %do.body6
  ret i32 %add14
}
*** IR Dump After Simplify the CFG ***
; Function Attrs: nounwind
define i32 @test(i32* %a, i32 %n) local_unnamed_addr #0 {
entry:
  br label %do.body, !llvm.loop !2

do.body:                                          ; preds = %do.body, %entry
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %do.body ]
  %sum.0 = phi i32 [ 0, %entry ], [ %add5, %do.body ]
  %0 = zext i32 %i.0 to i64
  %arrayidx = getelementptr inbounds i32, i32* %a, i64 %0
  %1 = load i32, i32* %arrayidx, align 4, !tbaa !4
  %add = add nsw i32 %1, 1
  store i32 %add, i32* %arrayidx, align 4, !tbaa !4
  %add5 = add nsw i32 %sum.0, %add
  %inc = add nuw nsw i32 %i.0, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %do.body, label %do.body6, !llvm.loop !8

do.body6:                                         ; preds = %do.body, %do.body6
  %i.1 = phi i32 [ %inc15, %do.body6 ], [ 0, %do.body ]
  %sum.1 = phi i32 [ %add14, %do.body6 ], [ %add5, %do.body ]
  %2 = zext i32 %i.1 to i64
  %arrayidx8 = getelementptr inbounds i32, i32* %a, i64 %2
  %3 = load i32, i32* %arrayidx8, align 4, !tbaa !4
  %add9 = add nsw i32 %3, 1
  store i32 %add9, i32* %arrayidx8, align 4, !tbaa !4
  %add14 = add nsw i32 %sum.1, %add9
  %inc15 = add nuw nsw i32 %i.1, 1
  %cmp17 = icmp slt i32 %inc15, %n
  br i1 %cmp17, label %do.body6, label %do.end18, !llvm.loop !8

do.end18:                                         ; preds = %do.body6
  ret i32 %add14
}
...

So up until simplifyCFG I think things look pretty good with different loop-metadata for the two loops. But when simplifyCFG is tranforming

  br i1 %cmp, label %do.body, label %do.end, !llvm.loop !2

do.end:                                           ; preds = %do.body
  br label %do.body6, !llvm.loop !8

into

br i1 %cmp, label %do.body, label %do.body6, !llvm.loop !8

we get incorrect metadata for one branch target.

Is the fault that the metadata only should be put on the back edge, not the branch in the preheader?

I tried running

/clang -cc1 -O3 -funroll-loops -S -emit-llvm pragma-do-while-unroll.cpp -o - -mllvm -print-after-all

and I get this

...
!2 = distinct !{!2, !3}
!3 = !{!"llvm.loop.unroll.count", i32 3}
!4 = !{!5, !5, i64 0}
!5 = !{!"int", !6, i64 0}
!6 = !{!"omnipotent char", !7, i64 0}
!7 = !{!"Simple C/C++ TBAA"}
!8 = distinct !{!8, !9}
!9 = !{!"llvm.loop.unroll.count", i32 5}
*** IR Dump After Combine redundant instructions ***
; Function Attrs: nounwind
define i32 @test(i32* %a, i32 %n) local_unnamed_addr #0 {
entry:
  br label %do.body, !llvm.loop !2

do.body:                                          ; preds = %do.body, %entry
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %do.body ]
  %sum.0 = phi i32 [ 0, %entry ], [ %add5, %do.body ]
  %0 = zext i32 %i.0 to i64
  %arrayidx = getelementptr inbounds i32, i32* %a, i64 %0
  %1 = load i32, i32* %arrayidx, align 4, !tbaa !4
  %add = add nsw i32 %1, 1
  store i32 %add, i32* %arrayidx, align 4, !tbaa !4
  %add5 = add nsw i32 %sum.0, %add
  %inc = add nuw nsw i32 %i.0, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %do.body, label %do.end, !llvm.loop !2

do.end:                                           ; preds = %do.body
  br label %do.body6, !llvm.loop !8

do.body6:                                         ; preds = %do.body6, %do.end
  %i.1 = phi i32 [ 0, %do.end ], [ %inc15, %do.body6 ]
  %sum.1 = phi i32 [ %add5, %do.end ], [ %add14, %do.body6 ]
  %2 = zext i32 %i.1 to i64
  %arrayidx8 = getelementptr inbounds i32, i32* %a, i64 %2
  %3 = load i32, i32* %arrayidx8, align 4, !tbaa !4
  %add9 = add nsw i32 %3, 1
  store i32 %add9, i32* %arrayidx8, align 4, !tbaa !4
  %add14 = add nsw i32 %sum.1, %add9
  %inc15 = add nuw nsw i32 %i.1, 1
  %cmp17 = icmp slt i32 %inc15, %n
  br i1 %cmp17, label %do.body6, label %do.end18, !llvm.loop !8

do.end18:                                         ; preds = %do.body6
  ret i32 %add14
}
*** IR Dump After Simplify the CFG ***
; Function Attrs: nounwind
define i32 @test(i32* %a, i32 %n) local_unnamed_addr #0 {
entry:
  br label %do.body, !llvm.loop !2

do.body:                                          ; preds = %do.body, %entry
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %do.body ]
  %sum.0 = phi i32 [ 0, %entry ], [ %add5, %do.body ]
  %0 = zext i32 %i.0 to i64
  %arrayidx = getelementptr inbounds i32, i32* %a, i64 %0
  %1 = load i32, i32* %arrayidx, align 4, !tbaa !4
  %add = add nsw i32 %1, 1
  store i32 %add, i32* %arrayidx, align 4, !tbaa !4
  %add5 = add nsw i32 %sum.0, %add
  %inc = add nuw nsw i32 %i.0, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %do.body, label %do.body6, !llvm.loop !8

do.body6:                                         ; preds = %do.body, %do.body6
  %i.1 = phi i32 [ %inc15, %do.body6 ], [ 0, %do.body ]
  %sum.1 = phi i32 [ %add14, %do.body6 ], [ %add5, %do.body ]
  %2 = zext i32 %i.1 to i64
  %arrayidx8 = getelementptr inbounds i32, i32* %a, i64 %2
  %3 = load i32, i32* %arrayidx8, align 4, !tbaa !4
  %add9 = add nsw i32 %3, 1
  store i32 %add9, i32* %arrayidx8, align 4, !tbaa !4
  %add14 = add nsw i32 %sum.1, %add9
  %inc15 = add nuw nsw i32 %i.1, 1
  %cmp17 = icmp slt i32 %inc15, %n
  br i1 %cmp17, label %do.body6, label %do.end18, !llvm.loop !8

do.end18:                                         ; preds = %do.body6
  ret i32 %add14
}
...

So up until simplifyCFG I think things look pretty good with different loop-metadata for the two loops. But when simplifyCFG is tranforming

  br i1 %cmp, label %do.body, label %do.end, !llvm.loop !2

do.end:                                           ; preds = %do.body
  br label %do.body6, !llvm.loop !8

into

br i1 %cmp, label %do.body, label %do.body6, !llvm.loop !8

we get incorrect metadata for one branch target.

Is the fault that the metadata only should be put on the back edge, not the branch in the preheader?

Yea. Our past thinking has been that any backedge in the loop is valid. The metadata shouldn't end up other places, although it's benign unless those other places are (or may later become) a backedge for some different loop.

bjope added a comment.Jul 3 2018, 4:17 AM

Is the fault that the metadata only should be put on the back edge, not the branch in the preheader?

Yea. Our past thinking has been that any backedge in the loop is valid. The metadata shouldn't end up other places, although it's benign unless those other places are (or may later become) a backedge for some different loop.

I'm no expert on clang. The patch seems to fix the problem if the goal is to only add the loop-metadata on the backedge , but I'll leave it to someone else to approve it.

I'm a little bit concerned about opt not detecting this kind of problems though.
Would it be possible for some verifier to detect if we have loop metadata on some branch that aren't in the latch block?
And/or should the optimization that "merges" two branches with different loop metadata), be smarter about which loop metadata to keep? Or maybe we should be defensive and discard loop metadata on the merged branch instruction?

I encountered the issue while working with the unroller and found that it was not following the pragma info, and traced it back to the issue with metadata.
As far as I understood, for for-loops and while-loops, we add the metadata only to the loop back-edge. So it would make sense to keep them consistent.
I'm not an expert in clang, and do not know how we can detect such problems.

Is the fault that the metadata only should be put on the back edge, not the branch in the preheader?

Yea. Our past thinking has been that any backedge in the loop is valid. The metadata shouldn't end up other places, although it's benign unless those other places are (or may later become) a backedge for some different loop.

I'm no expert on clang. The patch seems to fix the problem if the goal is to only add the loop-metadata on the backedge , but I'll leave it to someone else to approve it.

I'm a little bit concerned about opt not detecting this kind of problems though.
Would it be possible for some verifier to detect if we have loop metadata on some branch that aren't in the latch block?
And/or should the optimization that "merges" two branches with different loop metadata), be smarter about which loop metadata to keep? Or maybe we should be defensive and discard loop metadata on the merged branch instruction?

We could add this to the verifier. We have transformation passes that aren't explicitly loop aware, and so the question is would those passes now need to do something special to remain valid in the presence of this metadata. As a general rule, metadata in invalid locations is simply ignored. It clearly is a problem, if metadata is moving from one valid location to an unrelated valid location.

I encountered the issue while working with the unroller and found that it was not following the pragma info, and traced it back to the issue with metadata.
As far as I understood, for for-loops and while-loops, we add the metadata only to the loop back-edge. So it would make sense to keep them consistent.
I'm not an expert in clang, and do not know how we can detect such problems.

The code change is likely okay. We need to have the tests updated. With rare exception, we don't have end-to-end tests in Clang. We test Clang's CodeGen independently, and so Clang's CodeGen tests shouldn't run the optimizer. Please write tests that directly check the expected output of Clang (without running the optimizer). If you look at other tests in the CodeGen directory, you should see what I mean. If you have any questions, please feel free to ask. Thanks!

deepak2427 updated this revision to Diff 154237.Jul 5 2018, 7:41 AM

Update the tests.

I have updated the test to not run the optimizer. The test I had added previously for checking if the unroller is respecting the pragma is useful I think. Not sure where that can be added though.
I guess it's independent of this patch anyway. If the patch and test is okay, will update bugzilla as well.

bjope added inline comments.Jul 5 2018, 8:12 AM
test/CodeGen/pragma-do-while.cpp
1 ↗(On Diff #154237)

I think that we can simplify it to use one loop here (as a regression test that we only put the label on one branch).

I also think that the important check is to verify that the llvm.loop is put on the branch in the "do.cond" block.

So may I suggest this slightly modified test case:

// RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck %s

// We expect to get a loop structure like this:
//    do.body:                                       ; preds = %do.cond, ...
//      ...
//      br label %do.cond
//    do.cond:                                       ; preds = %do.body
//      ...
//      br i1 %cmp, label %do.body, label %do.end
//    do.end:                                        ; preds = %do.cond
//      ...
//
// Verify that the loop metadata only is put on the backedge.
//
// CHECK-NOT: llvm.loop
// CHECK-LABEL: do.cond:
// CHECK: br {{.*}}, label %do.body, label %do.end, !llvm.loop ![[LMD1:[0-9]+]]
// CHECK-LABEL: do.end:
// CHECK-NOT: llvm.loop
// CHECK: ![[LMD1]] = distinct !{![[LMD1]], ![[LMD2:[0-9]+]]}
// CHECK: ![[LMD2]] = !{!"llvm.loop.unroll.count", i32 4}

int test(int a[], int n) {
  int i = 0;
  int sum = 0;

#pragma unroll 4
  do
  {
    a[i] = a[i] + 1;
    sum = sum + a[i];
    i++;
  } while (i < n);

  return sum;
}

Yeah, you're right. Only one loop has to be checked in this case. I'll update the test as per your suggestion. Thank you!

deepak2427 updated this revision to Diff 154244.Jul 5 2018, 8:24 AM

Updated with test from Bjorn Pettersson which is much more accurate and clearer. Thanks!

bjope added a comment.Jul 5 2018, 10:14 AM

I have updated the test to not run the optimizer. The test I had added previously for checking if the unroller is respecting the pragma is useful I think. Not sure where that can be added though.
I guess it's independent of this patch anyway. If the patch and test is okay, will update bugzilla as well.

My idea was to forbid the IR that caused problems for the unroller (and other optimization passes). Then we do not need to fixup various passes to handle "misplaced" llvm.loop annotations correctly. And neither do we need to implement test cases to verify that the "misplaced" annotations are handled correctly for lots of passes.

I played around a little bit with teaching the IR Verifier in opt to check that !llvm.loop only is put in loop latches.
Something like this might do the trick, when added to Verifier::visitInstruction in lib/IR/Verifier.cpp:

#include "llvm/Analysis/LoopInfo.h"

if (I.getMetadata(LLVMContext::MD_loop)) {
  // FIXME: Is SwitchInst also allowed?
  Assert(isa<BranchInst>(I), "llvm.loop only expected on branches", &I);
  LoopInfo LI;
  LI.analyze(DT);
  Loop* L = LI.getLoopFor(BB);
  Assert(L, "llvm.loop not in a loop", &I);
  Assert(L->isLoopLatch(BB), "llvm.loop not in a latch", &I);
}

Note that the above just was a simple test. We do not want to reinitialize LoopInfo for each found occurence of !llvm.loop.
Another problem is that the Verifier is used by lots of tools that currently do not link with LoopInfo from the Analysis code module.

So maybe this should go into the LoopVerifier pass instead (although I'm not sure if that is commonly used).
Or is there some other place where we can do it? Or some other way to do a similar check (without using LoopInfo)?

I can bring this up on llvm-dev, to get some more feedback on the idea of having a verifier for this, and how to do it properly.

bjope accepted this revision.Jul 10 2018, 9:24 AM

LGTM!

This revision is now accepted and ready to land.Jul 10 2018, 9:24 AM
deepak2427 marked an inline comment as done.Jul 10 2018, 9:46 AM

@Bjorn, Thanks for reviewing and accepting the patch.

Could you please advise on the next steps?
Would someone else commit this on my behalf or should I request for commit access?

Thanks,
Deepak Panickal

@Bjorn, Thanks for reviewing and accepting the patch.

Could you please advise on the next steps?
Would someone else commit this on my behalf or should I request for commit access?

Thanks,
Deepak Panickal

I can commit this for you, and then you should be able to close the bugzilla ticket. Include a reference to the SVN id that will be the result of my commit when closing the bugzilla ticket.

This revision was automatically updated to reflect the committed changes.