This is an archive of the discontinued LLVM Phabricator instance.

A fix for loop vectorizer with handling loops with volatile induction variables
ClosedPublic

Authored by hsung on Sep 22 2015, 8:16 PM.

Details

Summary

Hi,

I am proposing a patch to prevent passes from destroying canonical loop structure, especially for nested loop. When eliminating or merging empty/small basic blocks, LLVM checks the existence of non-trivial PHI nodes to recognize potential loops and choose not to eliminate the block. However, the current PHI node based algorithm fails to recognize a loop if the loop’s exit condition is evaluated only with volatile values hence no PHI nodes in the loop header. Especially when such a loop is an outer loop of a nested loop, the loop is collapsed into a single loop which prevent later optimizations from being applied (e.g., transforming nested loops into simplified form in LoopSimplify and loop vectorization).

The patch augments the existing PHI node-based check by adding a pre-test if the BB actually belongs to a set of loop headers and not eliminating it if yes. Jump Threading already builds a set of loop headers by identifying backedges, so we can simply reuse the set. For SimplifyCFG, such a set of loop headers can be created per function in iterativelySimplifyCFG().

The following example loops are collapsed by JumpThreading or SimplifyCFG and the LLVM loop vectorizer cannot vectorize the resulting loop. With the patch, they maintain loop structure and get vectorized.

(1) Loop with a volatile iteration variable
void foo2(float * restrict x, float * restrict y, float * restrict z) {

for (volatile int i = 0; i < 1000; ++i) {
  for (int j = 0; j < 1600; ++j) {
    x[j] = y[j] + z[j];
  }
}

}

(2) While loop with volatile arguments for exit function
int done(float *x, float *y, float *z);
void foo3(float * restrict x, float * restrict y, float * restrict z) {

while (!done(x, y, z)) {
  for (int j = 0; j < 1600; ++j) {
    x[j] = y[j] + z[j];
  }
}

}

The patch fails to pass four tests for LoopUnswitch and SimplifyCFG. Two failed tests for LoopUnswitch are simply due to BB name differences and can be trivially resolved. Two failed tests for SimplifyCFG include two nested loops with an empty outer loop header which were supposed to be merged with the inner loop but not any more after the patch. I propose changes to these tests too. The empty blocks will get eliminated when SimplifyCFG is executed again later in the back-end. Therefore, extra branches in test IR's will not affect final codes.

Diff Detail

Repository
rL LLVM

Event Timeline

hsung updated this revision to Diff 35457.Sep 22 2015, 8:16 PM
hsung retitled this revision from to A fix for loop vectorizer with handling loops with volatile induction variables.
hsung updated this object.
hsung set the repository for this revision to rL LLVM.
hsung changed the visibility from "Public (No Login Required)" to "All Users".
hsung added a subscriber: llvm-commits.
hsung updated this revision to Diff 35875.Sep 28 2015, 8:30 AM
hsung edited edge metadata.
hsung removed rL LLVM as the repository for this revision.
hsung changed the visibility from "All Users" to "Public (No Login Required)".

Uploading a full diff.

hsung set the repository for this revision to rL LLVM.Sep 28 2015, 8:31 AM
hsung added a comment.Oct 6 2015, 7:47 AM

A gentle reminder that full diff is available and ready for review.

Hi,

A reminder that a full diff is uploaded and the patch is ready for review. Please let me know if you need any further information or have questions.

hsung added a comment.Oct 20 2015, 8:21 AM

Hi all,

A gentle reminder for feedback.

hfinkel added inline comments.Oct 27 2015, 6:06 AM
include/llvm/Transforms/Utils/Local.h
139

Please update the comment to explain the new parameter. As is, it is not even clear whether this is an optional input or output.

141–142

This line is too long.

lib/Transforms/Scalar/JumpThreading.cpp
211

Indentation is odd here (no tabs). Also, we should explain why we should not eliminate the loop header. Saying something like, "because eliminating a loop header might later prevent LoopSimplify from transforming nested loops into simplified form" is probably sufficient.

lib/Transforms/Utils/SimplifyCFG.cpp
137–140

Line too long.

4629

You're changing the behavior here when LoopHeaders is not passed in. Is that what you intend? If you don't intend that, then I assume you want:

(!LoopHeaders || !LoopHeaders->count(BB))

if you do intend this, then this must be carefully documented and explained in the interface comments.

4891–4892

Line is too long.

hsung updated this revision to Diff 39721.Nov 9 2015, 11:16 AM

Hi all,

Sorry for a delay in updating the patch. The updated patch includes aligned lines, more comments, and an update to the if condition in SimplifyCFG.cpp. The patch passed the regression tests.

hsung added a comment.Dec 1 2015, 7:58 AM

Hi all,

This is a gentle reminder that the patch is updated and ready for feedback. The if condition in SimplifyCFG.cpp is modified to not change the behavior when a set of loop headers is not passed. Thanks!

hfinkel added inline comments.Dec 10 2015, 6:16 PM
include/llvm/Transforms/Utils/Local.h
139

Please note in the comment that LoopHeaders is an optional input parameter, providing the set of loop header that SimplyCFG should not eliminate.

142

You can use SmallPtrSetImpl<BasicBlock *> here, instead of SmallPtrSet<BasicBlock *, 16>, to avoid encoding a particular size on the interface boundary.

lib/Transforms/Utils/SimplifyCFG.cpp
138

Corresponding SmallPtrSetImpl change here.

4624

loops with volatile induction variables -> nested loops

(the problem is much more general than loops with volatile induction variables)

4892

Corresponding SmallPtrSetImpl change.

chandlerc edited edge metadata.Dec 11 2015, 1:56 AM

Just a few drive-by comments. With the current (new, and different from the description) direction, I think Hal has the rest of the review covered.

include/llvm/Transforms/Utils/Local.h
140–142

Try using clang-format to re-format the code.

lib/Transforms/Scalar/JumpThreading.cpp
211

Again, clang-format would be a useful tool to ensure consistent formatting.

lib/Transforms/Utils/SimplifyCFG.cpp
4624

Note that the entire patch description really needs to be updated to correctly indicate that this is about preventing passes from destroying important loop structure, especially of nested loops.

hsung updated this object.Jan 5 2016, 7:52 AM
hsung edited edge metadata.
hsung updated this revision to Diff 44001.Jan 5 2016, 7:55 AM
hsung marked 12 inline comments as done.

Hi all,

Thanks for the feedback. I updated the patch description and uploaded a updated patch.

hsung updated this object.Jan 5 2016, 8:53 AM
hsung added a comment.Jan 28 2016, 5:04 AM

Hi all,

The patch is updated per reviewers' request and ready for further feedback.

As noted, please add some comments to the header-file function description, otherwise looks good.

include/llvm/Transforms/Utils/Local.h
139

I don't see any changes to the comments here describing the LoopHeaders parameter.

lib/Transforms/Utils/SimplifyCFG.cpp
4633

Don't add an extra new line here.

hsung updated this revision to Diff 47357.Feb 9 2016, 1:35 PM
hsung marked an inline comment as done.

Hi all,

Fixed comments and blank lines as requested.

hsung updated this revision to Diff 47358.Feb 9 2016, 1:38 PM

Minor fix for comments in SimplifyCFG.cpp

Hi all,

This is a gentle reminder that the patch is ready for further review or final approval.

Thanks,
Hyojin

hfinkel accepted this revision.Mar 21 2016, 11:30 AM
hfinkel edited edge metadata.

I apologize; I missed that you had updated this. LGTM.

include/llvm/Transforms/Utils/Local.h
139

set of loop header -> set of loop headers

This revision is now accepted and ready to land.Mar 21 2016, 11:30 AM
hsung closed this revision.Mar 29 2016, 11:28 AM

Committed in r264697.