This is an archive of the discontinued LLVM Phabricator instance.

[LV] Allow interleaved accesses in loops with predicated blocks
ClosedPublic

Authored by mssimpso on Apr 28 2016, 2:54 PM.

Details

Summary

With this patch, we allow interleaved accesses in loops containing predicated blocks. If an interleaved access is contained in a predicated block, it's only allowed to form an interleaved group with accesses in the same block.

Diff Detail

Event Timeline

mssimpso updated this revision to Diff 55496.Apr 28 2016, 2:54 PM
mssimpso retitled this revision from to [LV] Only bail on interleaved accesses in predicated blocks.
mssimpso updated this object.
mssimpso added reviewers: sbaranga, jmolloy, anemet.
mssimpso added subscribers: mcrosier, llvm-commits.
mssimpso updated this revision to Diff 55823.May 2 2016, 8:28 AM

Simplified test case

mcrosier added inline comments.May 2 2016, 7:57 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
5091–5092

Does the Comment need updating?

sbaranga added inline comments.May 3 2016, 3:22 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
5139

With that change, analyzeInterleaving won't see all the loads/stores (as the predicated ones get skipped). Would this have any impact on the correctness of the analysis (it might be ok, but it's worth thinking about it).

mssimpso added inline comments.May 3 2016, 7:40 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
5091–5092

I think the comment should probably stay the same. We still can't handle predicated accesses. The current patch just prevents us from giving up on the non-predicated accesses in the loop, if the loop contains a predicated block.

5139

Hey Silviu, thanks for commenting! Right, this prevents accesses from being placed into an interleaved group if they are in a predicated block, which seems like the right thing to do to me. This is what the test case demonstrates. Currently, no accesses are recognized as interleaved if the loop contains a single predicated block. If an access is non-consecutive and not in an interleaved group, it should be scalarized (unless gather/scatter is supported).

mcrosier added inline comments.May 3 2016, 8:30 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
5091–5092

Thanks for the clarification, Matt.

sbaranga added inline comments.May 3 2016, 9:35 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
5139

The idea generally makes sense to me.

However we also have some logic here to handle the Write-After-Write dependences (in fact more than just these) which seems to rely on seeing all accesses. For example, we seen to be able to get the vectorizer confused by doing something like (i was able to modify your example to do this):

  1. load a[i]
  2. load a[i+1]
  3. conditional: store a[i]
  4. load a[i] -> it sees this as the first load, because we've ignored 3

Here is my modification of the example:

define void @load_gap_with_pred_store(%pair *%p, i64 %x, i64 %n) {

entry:
  br label %for.body

for.body:
  %i  = phi i64 [ %i.next, %if.merge ], [ 0, %entry ]
  %f1 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 1
  %f2 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 0

  %r0 = load i64, i64* %f1, align 8
  %r1 = and i64 %r0, %x
  %r2 = icmp eq i64 %r1, %x
  %r4 = add i64 %r1, 1

  br i1 %r2, label %if.then, label %if.merge

if.then:
  store i64 0, i64* %f2, align 8
  br label %if.merge

if.merge:
  %r3 = load i64, i64* %f2, align 8

  store i64 %r3, i64 *%f1, align 8 // Here we end up storing an element from the wide load, instead of having to reload

  %i.next = add nuw nsw i64 %i, 1
  %cond = icmp slt i64 %i.next, %n
  br i1 %cond, label %for.body, label %for.end

for.end:
  ret void
}
mssimpso added inline comments.May 3 2016, 10:14 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
5139

Thanks for the counterexample, Silviu. I see what you're getting at. Let me think over this change a little more carefully. We will definitely need to do a little more work here.

mssimpso added inline comments.May 3 2016, 2:48 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
5139

Hi Silviu,

I think there may be a bug in the interleaved access analysis, independent of this patch, that your test case led me to. It seems to be related to the same issue of reusing a loaded value instead of reloading due to an intervening store. I've created PR27626 to track the issue.

sbaranga added inline comments.May 4 2016, 4:35 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
5139

Thanks for digging further! Yes, this appears to also be a problem.

mssimpso updated this revision to Diff 56342.May 5 2016, 2:00 PM

Rebased on top of D19984

With this revision, we collect all accesses in the loop, even those in the predicated blocks, so we can account for any dependences (D19984). We prevent forming interleaved groups with predicated accesses unless the accesses are in the same block. I've also added Silviu's code example as a new test case.

mssimpso updated this revision to Diff 58520.May 25 2016, 3:08 PM

Rebased on top of D19984.

mssimpso updated this revision to Diff 61804.Jun 24 2016, 9:31 AM
mssimpso retitled this revision from [LV] Only bail on interleaved accesses in predicated blocks to [LV] Allow interleaved accesses in loops with predicated blocks.
mssimpso updated this object.

Rebased.

With rL273687, I think we can now resume this review.

mssimpso updated this object.Jun 24 2016, 12:13 PM
mssimpso updated this revision to Diff 62930.Jul 6 2016, 11:55 AM

Slightly refactored code and added a new test case

mssimpso updated this revision to Diff 63252.Jul 8 2016, 10:15 AM

Fixed typo in test case comment.

Silviu/Adam,

Do you have any additional comments for this change? It should be fairly straightforward now that the dependences and iteration order have been fixed.

anemet edited edge metadata.Jul 8 2016, 10:25 AM

Started with the tests, a question below.

test/Transforms/LoopVectorize/interleaved-accesses-pred-stores.ll
37

Is %p.0 unused here?

sbaranga added inline comments.Jul 8 2016, 10:47 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
5223

Could we also have a test where all the accesses in a group are predicated and in the same block? In this case would all the predicated accesses be scalarized as well? (there might be some issues with out of bounds accesses from the wide load/store and the scalar epilogue if that's not the case)

Wow, thanks for the quick response! Some comments inline.

lib/Transforms/Vectorize/LoopVectorize.cpp
5223

Yes, I wanted a test case like that, but I haven't been able to produce one that the vectorizer likes. It complains about being unable to if-convert the block I believe. I can look into this further, but currently I think we are limited to one store per predicated block. In which case, yes, we would always scalarize.

test/Transforms/LoopVectorize/interleaved-accesses-pred-stores.ll
37

Yes, thanks for noticing! I'll remove it.

sbaranga added inline comments.Jul 11 2016, 2:18 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
5223

Ok, makes sense.

Maybe we can remove the BlockA != BlockB condition for now and handle this case in a separate change when we can if-convert the block?

mssimpso added inline comments.Jul 11 2016, 2:47 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
5223

That's a great idea. Thanks! I'll update the patch.

mssimpso updated this revision to Diff 63591.Jul 11 2016, 3:04 PM
mssimpso edited edge metadata.

Addressed comments from Adam and Silviu. Thanks!

mssimpso marked 2 inline comments as done.Jul 11 2016, 3:05 PM
This revision was automatically updated to reflect the committed changes.