This is an archive of the discontinued LLVM Phabricator instance.

Refine the set of UniformAfterVectorization instructions
ClosedPublic

Authored by wmi on Jun 27 2016, 10:59 AM.

Details

Summary

For %0 in the loop below, it will have both a scalar version and a vector version after vectorization, and it shouldn't be added into uniform set.

define void @test(i32* %a, i64 %n) {
entry:

br label %for.body

for.body:

%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
%0 = trunc i64 %i to i32
%1 = getelementptr inbounds i32, i32* %a, i32 %0    ====> %0 used in uniform instruction.
store i32 %0, i32* %1, align 4                                      ====> %0 used in non-uniform instruction.
%i.next = add nuw nsw i64 %i, 1
%cond = icmp eq i64 %i.next, %n
br i1 %cond, label %for.end, label %for.body

for.end:

ret void

}

Before the patch all dependencies of conditional branch or consecutive ptrs will be added into uniform set. The patch changes that to: except seed uniform instructions (conditional branch and consecutive ptr instructions), dependencies to be added into uniform set should only be used by existing uniform instructions or intructions outside of current loop.

This patch uses parts in http://reviews.llvm.org/D21631 from mkuper.

Diff Detail

Repository
rL LLVM

Event Timeline

wmi updated this revision to Diff 61979.Jun 27 2016, 10:59 AM
wmi retitled this revision from to Refine the set of UniformAfterVectorization instructions.
wmi updated this object.
wmi added reviewers: mkuper, hfinkel, jmolloy.
wmi set the repository for this revision to rL LLVM.
wmi added subscribers: llvm-commits, davidxl, mssimpso, test.
mkuper added inline comments.Jun 27 2016, 5:10 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
4947

Did you mean "PHIs will not be added"? Also, can you explain here why they're not added?
(I mean, this doesn't even apply to all PHIs, IIRC? Some PHIs will be added, it depends on whether the branch depends on the PHI itself, or on UpdateV, right?)

4962

This looks a bit weird. If the PHI has a non-uniform use, but the update doesn't, then do we expect to have a vector version of the update?
More generally, it's a bit odd that we look at exactly one instruction backwards past the Phi. But this looks like it's a net improvement over how it worked before.

test/Transforms/LoopVectorize/X86/uniform-phi.ll
31

You probably want a CHECK-NOT for %i being found uniform, right?

wmi added inline comments.Jun 27 2016, 5:38 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
4947

Yes, all PHI will not be added in the above loop. That is because PHI must exist in a dep loop, when we expand Worklist in topological order, all the nodes inside dep loop are impossible to be added into Worklist.

That is why we want to handle induction PHI specially below.

4962

If the PHI has a non-uniform use, but the update doesn't, neither of them will be added into uniform set because we will have vector versions for both pre-increment-induction var (%vec.ind) and post-increment-induction var (%step.add).

I only look at one instruction backwords past PHI because I think it covers most of the cases.

for.body:

%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
 ......
%indvars.iv.next = add nuw nsw i64 %indvars.iv, %step
br i1 %exitcond, label %for.end, label %for.body

Like the loop here, all the recursive dependents of induction PHI only includes %indvars.iv.next, %indvars.iv and %step. Because it is a vectorizable loop, %step is a loop invariant and should be defined out of loop. Then we only care whether %indvars.iv and %indvars.iv.next (i.e., PN and UpdateV) are uniform.

test/Transforms/LoopVectorize/X86/uniform-phi.ll
31

You are right. I will add it.

wmi updated this revision to Diff 62290.Jun 29 2016, 2:55 PM

Update the testcase according to mkuper's comments.

wmi updated this revision to Diff 62323.Jun 29 2016, 5:52 PM

Add more comments to explain why PHI should be processed separately in collectLoopUniform.

mkuper edited edge metadata.Jun 29 2016, 6:20 PM

The logic LGTM, now that I understand it. :-)
A few nits about the code inline.

lib/Transforms/Vectorize/LoopVectorize.cpp
4900–4913

"inside current" -> "inside the current"

4901

"scope where" -> "the scope which"

4904

Why not just "return (!I || !TheLoop->contains(I))"?
But I'm ok with this if you think it's clearer.

4909

Can this be a SetVector<Instruction *>?
I think all of your inserts are protected by isOutOfScope(), so you can never have a non-instruction Value there now, right?

4917

Can you use a range for here, iterating over TheLoop->blocks()?

4920

Range for here too?

4966–4969

Can you do something like Uniforms.insert(Worklist.begin(), Worklist.end()) instead?
We can move the debug print to where things are added to Worklist - I think it may be more useful there regardless, since it will then print the uniforms in the order that they are found.

wmi added inline comments.Jun 30 2016, 9:01 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
4900–4913

Fixed.

4901

Fixed.

4904

What You suggested is better. Fixed.

4909

It can be SetVector<Instruction *>, but then we need to use cast<instruction> before adding values into Worklist or checking whether users are already in Worklist.

4917

Fixed.

4920

Fixed.

4966–4969

If I use Uniforms.insert(Worklist.begin(), Worklist.end()), The elem type of Worklist should be "Instruction *" instead of "Value *". I feel that will increase the number of cast<Instruction> needed. I am not sure which way is better. Either way is fine with me. I updated the patch according to suggestion here.

wmi updated this revision to Diff 62375.Jun 30 2016, 9:03 AM
wmi edited edge metadata.

Address mkuper's comments.

mkuper accepted this revision.Jun 30 2016, 9:15 AM
mkuper edited edge metadata.

LGTM.
(If you strongly prefer the version where Worklist is a SetVector<Value *>, I'm ok with that too.)

This revision is now accepted and ready to land.Jun 30 2016, 9:15 AM
This revision was automatically updated to reflect the committed changes.