With this patch, we allow interleaved accesses in loops containing predicated blocks. If an interleaved access is contained in a predicated block, it's only allowed to form an interleaved group with accesses in the same block.
Details
Diff Detail
Event Timeline
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5175–5176 | Does the Comment need updating? |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5236 | With that change, analyzeInterleaving won't see all the loads/stores (as the predicated ones get skipped). Would this have any impact on the correctness of the analysis (it might be ok, but it's worth thinking about it). |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5175–5176 | I think the comment should probably stay the same. We still can't handle predicated accesses. The current patch just prevents us from giving up on the non-predicated accesses in the loop, if the loop contains a predicated block. | |
5236 | Hey Silviu, thanks for commenting! Right, this prevents accesses from being placed into an interleaved group if they are in a predicated block, which seems like the right thing to do to me. This is what the test case demonstrates. Currently, no accesses are recognized as interleaved if the loop contains a single predicated block. If an access is non-consecutive and not in an interleaved group, it should be scalarized (unless gather/scatter is supported). |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5175–5176 | Thanks for the clarification, Matt. |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5236 | The idea generally makes sense to me. However we also have some logic here to handle the Write-After-Write dependences (in fact more than just these) which seems to rely on seeing all accesses. For example, we seen to be able to get the vectorizer confused by doing something like (i was able to modify your example to do this):
Here is my modification of the example: define void @load_gap_with_pred_store(%pair *%p, i64 %x, i64 %n) { entry: br label %for.body for.body: %i = phi i64 [ %i.next, %if.merge ], [ 0, %entry ] %f1 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 1 %f2 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 0 %r0 = load i64, i64* %f1, align 8 %r1 = and i64 %r0, %x %r2 = icmp eq i64 %r1, %x %r4 = add i64 %r1, 1 br i1 %r2, label %if.then, label %if.merge if.then: store i64 0, i64* %f2, align 8 br label %if.merge if.merge: %r3 = load i64, i64* %f2, align 8 store i64 %r3, i64 *%f1, align 8 // Here we end up storing an element from the wide load, instead of having to reload %i.next = add nuw nsw i64 %i, 1 %cond = icmp slt i64 %i.next, %n br i1 %cond, label %for.body, label %for.end for.end: ret void } |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5236 | Thanks for the counterexample, Silviu. I see what you're getting at. Let me think over this change a little more carefully. We will definitely need to do a little more work here. |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5236 | Hi Silviu, I think there may be a bug in the interleaved access analysis, independent of this patch, that your test case led me to. It seems to be related to the same issue of reusing a loaded value instead of reloading due to an intervening store. I've created PR27626 to track the issue. |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5236 | Thanks for digging further! Yes, this appears to also be a problem. |
Rebased on top of D19984
With this revision, we collect all accesses in the loop, even those in the predicated blocks, so we can account for any dependences (D19984). We prevent forming interleaved groups with predicated accesses unless the accesses are in the same block. I've also added Silviu's code example as a new test case.
Fixed typo in test case comment.
Silviu/Adam,
Do you have any additional comments for this change? It should be fairly straightforward now that the dependences and iteration order have been fixed.
Started with the tests, a question below.
test/Transforms/LoopVectorize/interleaved-accesses-pred-stores.ll | ||
---|---|---|
37 | Is %p.0 unused here? |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5348 | Could we also have a test where all the accesses in a group are predicated and in the same block? In this case would all the predicated accesses be scalarized as well? (there might be some issues with out of bounds accesses from the wide load/store and the scalar epilogue if that's not the case) |
Wow, thanks for the quick response! Some comments inline.
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5348 | Yes, I wanted a test case like that, but I haven't been able to produce one that the vectorizer likes. It complains about being unable to if-convert the block I believe. I can look into this further, but currently I think we are limited to one store per predicated block. In which case, yes, we would always scalarize. | |
test/Transforms/LoopVectorize/interleaved-accesses-pred-stores.ll | ||
37 | Yes, thanks for noticing! I'll remove it. |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5348 | Ok, makes sense. Maybe we can remove the BlockA != BlockB condition for now and handle this case in a separate change when we can if-convert the block? |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5348 | That's a great idea. Thanks! I'll update the patch. |
Does the Comment need updating?