This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
llvm/trunk/test/Transforms/LoopVectorize/interleaved-accesses.ll

Differential D19984

[LV] Preserve order of dependences in interleaved accesses analysis
ClosedPublic

Authored by mssimpso on May 5 2016, 10:48 AM.

Details

Reviewers

anemet
hfinkel
sbaranga

Commits

rGe794678404ab: [LV] Preserve order of dependences in interleaved accesses analysis
rL273687: [LV] Preserve order of dependences in interleaved accesses analysis

Summary

The interleaved access analysis currently assumes that the inserted run-time pointer aliasing checks ensure the absence of dependences that would prevent its instruction reordering. However, this is not the case.

Issues can arise from how code generation is performed for interleaved groups. For a load group, all loads in the group are essentially moved to the location of the first load in program order, and for a store group, all stores in the group are moved to the location of the last store. For groups having members involved in a dependence relation with any other instruction in the loop, this reordering can violate the dependence.

This patch teaches the interleaved access analysis how to avoid breaking such dependences, and should fix PR27626.

An assumption of the original analysis was that the accesses had been collected in "program order". The analysis was then simplified by visiting the accesses bottom-up. However, this ordering was never guaranteed for anything other than single basic block loops. Thus, this patch also enforces the desired ordering.

Diff Detail

Repository: rL LLVM

Event Timeline

mssimpso updated this revision to Diff 56312. May 5 2016, 10:48 AM

mssimpso retitled this revision from to [LV] Handle RAW dependences in interleaved access analysis.

mssimpso updated this object.

mssimpso added reviewers: sbaranga, anemet, hfinkel.

mssimpso added subscribers: mcrosier, llvm-commits.

Herald added a subscriber: mzolotukhin. May 5 2016, 10:48 AM

mssimpso mentioned this in D19694: [LV] Allow interleaved accesses in loops with predicated blocks. May 5 2016, 2:00 PM

mssimpso added a child revision: D19694: [LV] Allow interleaved accesses in loops with predicated blocks.

I think non-zero dependences would have already been rejected by LAA. Would this be the reason why it is correct to only look at the zero distance ones?

Thanks,
Silviu

lib/Transforms/Vectorize/LoopVectorize.cpp
932 ↗	(On Diff #56312)	I guess we can check this with DT because the loads/stores are not predicated? (same in a bunch of other places).
5162 ↗	(On Diff #56312)	Wouldn't we need to add B to LoopIndependentRAWStores even if we add it to StoresToRemove?
5180 ↗	(On Diff #56312)	This sentence seems to be unfinished?

Hi Silviu, thanks for the comments.

I think non-zero dependences would have already been rejected by LAA. Would this be the reason why it is correct to only look at the zero distance ones?

Yes, that should be the case!

lib/Transforms/Vectorize/LoopVectorize.cpp
932 ↗	(On Diff #56312)	I intended the dominance check to work for the predicated accesses as well. The check says that if the read does not dominate the write, then we have to conservatively assume the write may happen first, leading to a read-after-write dependence.
5162 ↗	(On Diff #56312)	StoresToRemove holds stores whose groups will definitely be removed. LoopIndependentRAWStores holds stores that we don't yet have enough information about to determine if they need to be removed or not. In this case, if the current store (B) is already in a group, it means that it will be re-ordered by sinking it to the insert location of the group, violating the dependence. If it's not yet in a group (and thus, won't be re-ordered), we don't yet know that the load (A) won't be hoisted, which would also violate the dependence. Once we know that another load has been added to A's group, we know that A will be re-ordered. When this happens, we move all the stores in LoopIndependentRAWStores to StoresToRemove, to mark them for definite removal.
5180 ↗	(On Diff #56312)	Thanks for catching that. I'll submit an update

Updated comments.

Overall I think this looks good, but it would be better if someone else would also have a look before committing.

Thanks,
Silviu

lib/Transforms/Vectorize/LoopVectorize.cpp
932 ↗	(On Diff #56589)	OK, this makes sense.

Thanks very much Silviu! I'll wait for Adam or Hal to provide additional feedback.

In D19984#424731, @mssimpso wrote:

Hi Silviu, thanks for the comments.

I think non-zero dependences would have already been rejected by LAA. Would this be the reason why it is correct to only look at the zero distance ones?

Yes, that should be the case!

Why are non-zero forward deps rejected by LAA? Because of the HW store-to-load forwarding case? I think that that is only a performance consideration and it's only on conditionally.

I think that we should probably take the time and review the soundness of these code motions with respect to interleaved access vectorization {RAW, WAR, WAW} x {loop-independent, loop-carried}. I've only looked at the LAA aspects of this feature so far but I am a bit worried about this code now.

Matt, do you think you can do this?

I am also thinking if we should disable this feature in the meantime?!

Hi Adam,

Why are non-zero forward deps rejected by LAA? Because of the HW store-to-load forwarding case? I think that that is only a performance consideration and it's only on conditionally.

Silviu was referring to a comment I had made in the source. But I think the idea was that LAA had already ensured the absence of the dependences that would have prevented vectorization. So the interleaved access analysis didn't need to worry about them. Regarding the store-to-load forwarding case, yes, that is the assumption the original analysis made, which was incorrect. The current patch attempts to fix that.

I think that we should probably take the time and review the soundness of these code motions with respect to interleaved access vectorization {RAW, WAR, WAW} x {loop-independent, loop-carried}. I've only looked at the LAA aspects of this feature so far but I am a bit worried about this code now.

Matt, do you think you can do this?

I'm happy to give the analysis a very careful second look. I'll do that today and report back.

I am also thinking if we should disable this feature in the meantime?!

We should definitely disable interleaved access vectorization if we can't correct the bug (PR27626) before release time. Doing so would have performance implications for ARM/AArch64, so I'm hoping we can fix it before then. I wouldn't be strongly opposed to disabling it in the meantime. But to my knowledge, the bug isn't currently blocking anyone, and it's been around since the original implementation. Silviu and I only discovered it in passing, and it wasn't something we encountered in the wild. What do you think?

Just to expand on the point above:

From the algorithm of constructing interleaved groups, we should be able to exclude both WAW (from the algorithm, see the comments) and WAR (we're interleaving and then moving stores down and loads up, so we cannot break these) - for both loop-carried and loop independent dependences.

So the problem should only be RAW. If it is loop carried, then it is a forward dependence and we cannot vectorize, so this should be safe.
And this should handle the loop independent case.

Also we've probably not seen this until now because other optimizations would simply remove the load before this got to the vectorizer?

Cheers,
Silviu

In D19984#428293, @mssimpso wrote:

Hi Adam,

Why are non-zero forward deps rejected by LAA? Because of the HW store-to-load forwarding case? I think that that is only a performance consideration and it's only on conditionally.

Silviu was referring to a comment I had made in the source. But I think the idea was that LAA had already ensured the absence of the dependences that would have prevented vectorization. So the interleaved access analysis didn't need to worry about them. Regarding the store-to-load forwarding case, yes, that is the assumption the original analysis made, which was incorrect. The current patch attempts to fix that.

No, I was asking about *non-zero* distance deps specifically. The current patch only handles zero-distance deps. So my question was whether for non-zero distance we still rely on the store-to-load forwarding detection code to make the dep unsafe for vectorization.

Hi Silviu,

In D19984#428379, @sbaranga wrote:

Just to expand on the point above:

From the algorithm of constructing interleaved groups, we should be able to exclude both WAW (from the algorithm, see the comments) and WAR (we're interleaving and then moving stores down and loads up, so we cannot break these) - for both loop-carried and loop independent dependences.

What about moving elements of an interleaved group over other dependent accesses not in the same group?

Adam

test/Transforms/LoopVectorize/interleaved-accesses.ll
574–579 ↗	(On Diff #56589)	You mean p[i].x etc in the loop.

Hi Guys,

I've been thinking through these dependence issues more carefully. I'm not yet finished with all the permutations, but I've come across another somewhat unrelated issue. I thought I would post an update about that and respond to some of Adam's questions in the meantime. First the questions:

No, I was asking about *non-zero* distance deps specifically. The current patch only handles zero-distance deps. So my question was whether for non-zero distance we still rely on the store-to-load forwarding detection code to make the dep unsafe for vectorization.

For positive dependences, LAA prevents vectorization if the distance is less than some minimum (which determines the maximum safe VF). For positive dependences between strided accesses, LAA tries to prove independence (the accesses are independent if the distance is not a multiple of the stride). Non-positive dependences are allowed, and LAA checks for store-to-load forwarding conflicts for RAWs. The original analysis assumed the store-to-load detection would guarantee the absence of all RAWs, which was incorrect. So we need to consider the non-positive RAWs (as Silviu said, we should be able to exclude the WAR and WAW cases).
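The stride test described above can be sketched as follows. This is a simplified illustration, not LLVM's code: the helper name `stridedAccessesIndependent` and its parameters are assumptions made for the example. The idea is that two accesses sharing a constant stride (in elements) can only touch the same location if the distance between them is a whole multiple of that stride.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only (not the LLVM implementation).
// DistanceBytes: constant byte distance between the two accesses.
// Stride:        common stride of both accesses, in elements.
// TypeBytes:     size in bytes of the accessed element type.
// Returns true only when the accesses are provably independent.
bool stridedAccessesIndependent(uint64_t DistanceBytes, uint64_t Stride,
                                uint64_t TypeBytes) {
  // A stride of zero or one leaves no gaps between accesses to exploit.
  if (Stride <= 1 || TypeBytes == 0)
    return false;
  // A distance that is not a whole number of elements could mean a partial
  // overlap, so stay conservative.
  if (DistanceBytes % TypeBytes != 0)
    return false;
  uint64_t DistanceInElements = DistanceBytes / TypeBytes;
  // Independent if the distance is not a multiple of the stride.
  return DistanceInElements % Stride != 0;
}
```

For a struct of two i32 fields (stride 2, 4-byte elements), the .x and .y members of the same element are 4 bytes apart, i.e. one element, so they are reported independent. The .x members of p[i] and p[i + 2] are 16 bytes apart, i.e. four elements, which is a multiple of the stride, so they have to be treated as dependent.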

What about moving elements of an interleaved group over other dependent accesses not in the same group?

This would be the RAW case. For example, S1-L2 is a RAW dependence, L1-L2 form a group:

L1: load
S1: store
L2: load  // L2 would be hoisted above S1.

Or alternatively, S1-L1 is a RAW dependence, S1-S2 form a group:

S1: store // S1 would be sunk below L1.
L1: load
S2: store
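To make the two cases concrete, here is a small scalar sketch. The `Pair` type, the function names and the constants are made up for illustration; each function simply spells out one instruction order so the result of the original order can be compared with the order produced by forming the group.

```cpp
#include <cassert>

struct Pair { int x, y; };

// Load group case, original order: L1, S1, L2. S1 writes what L2 reads.
int loadGroupOriginal(Pair p) {
  int a = p.x;   // L1: load
  p.y = 42;      // S1: store
  int b = p.y;   // L2: load, must observe the value written by S1
  return a + b;
}

// Load group {L1, L2} emitted at L1's position: L2 is hoisted above S1.
int loadGroupHoisted(Pair p) {
  int a = p.x;   // L1: load
  int b = p.y;   // L2: hoisted, reads the stale value
  p.y = 42;      // S1: store
  return a + b;
}

// Store group case, original order: S1, L1, S2. L1 reads what S1 wrote.
int storeGroupOriginal(Pair p) {
  p.x = 7;       // S1: store
  int v = p.x;   // L1: load, must observe the value written by S1
  p.y = 9;       // S2: store
  return v;
}

// Store group {S1, S2} emitted at S2's position: S1 is sunk below L1.
int storeGroupSunk(Pair p) {
  int v = p.x;   // L1: load, reads the stale value
  p.x = 7;       // S1: sunk
  p.y = 9;       // S2: store
  return v;
}
```

Called with `Pair{1, 2}`, the original orders return 43 and 7, while the reordered versions return 3 and 1, which is the broken RAW dependence in each case.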

The other issue I've uncovered is related to the maximum safe VF. We set this based on the positive dependence distance LAA computes. But for the interleaved accesses, the actual VF used during vectorization is VF * IF, where IF is the interleave factor. The idea is that each component of the wide vector would have VF elements after they are shuffled out. However, this could be greater than the maximum safe VF. Here's an example.

; for (int i = 1; i < 1000; ++i) {
;   p[i + 2].x = p[i].x;
;   p[i + 2].y = p[i].y;
; }
%struct.pair = type { i32, i32 }
for.body:
  %indvars.iv = phi i64 [ 1, %entry ], [ %indvars.iv.next, %for.body ]
  %x1 = getelementptr inbounds %struct.pair, %struct.pair* %p, i64 %indvars.iv, i32 0
  %0 = load i32, i32* %x1, align 4
  %1 = add nuw nsw i64 %indvars.iv, 2
  %x4 = getelementptr inbounds %struct.pair, %struct.pair* %p, i64 %1, i32 0
  store i32 %0, i32* %x4, align 4
  %y = getelementptr inbounds %struct.pair, %struct.pair* %p, i64 %indvars.iv, i32 1
  %2 = load i32, i32* %y, align 4
  %y10 = getelementptr inbounds %struct.pair, %struct.pair* %p, i64 %1, i32 1
  store i32 %2, i32* %y10, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv.next, 1000
  br i1 %exitcond, label %for.cond.cleanup, label %for.body
}

We currently generate <8 x i32> loads and stores, but the maximum safe dependence distance is only 16 bytes, so I think we are generating incorrect code here. I think we should probably check for interleaved accesses when selecting the VF.
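The arithmetic behind this can be sketched as a small check. The function names are hypothetical, and this is not the fix from any actual patch. It only illustrates the numbers in the example: %struct.pair is 8 bytes, so p[i] and p[i + 2] are 16 bytes apart, while an <8 x i32> access (VF = 4, IF = 2) spans 32 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only.
// Bytes touched by one wide interleaved access: VF iterations, each covering
// InterleaveFactor members of ElemBytes bytes.
uint64_t wideAccessBytes(uint64_t VF, uint64_t InterleaveFactor,
                         uint64_t ElemBytes) {
  return VF * InterleaveFactor * ElemBytes;
}

// The wide access is only safe if it does not reach past the maximum safe
// dependence distance (in bytes) computed for the loop.
bool wideAccessIsSafe(uint64_t VF, uint64_t InterleaveFactor,
                      uint64_t ElemBytes, uint64_t MaxSafeDepDistBytes) {
  return wideAccessBytes(VF, InterleaveFactor, ElemBytes) <=
         MaxSafeDepDistBytes;
}
```

With these numbers, `wideAccessIsSafe(4, 2, 4, 16)` is false, while `wideAccessIsSafe(2, 2, 4, 16)` is true: a VF of 2 reads p[i] and p[i + 1] and writes p[i + 2] and p[i + 3], which stays within the 16-byte distance.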

test/Transforms/LoopVectorize/interleaved-accesses.ll
574–579 ↗	(On Diff #56589)	Yes, thanks for catching that!

I submitted D20241 to correct the issue with the vectorization factor mentioned above.

Matt: thanks for doing this analysis!

What about moving elements of an interleaved group over other dependent accesses not in the same group?

Adam

We shouldn't be doing that. We should be preserving dependences, and as long as we do that this should be correct.

It does look like things are more complicated than what I previously stated, at least with regard to RAW. And we do need to consider non-zero distance dependences (because we are essentially moving these loads/stores after interleaving). Here is an example:

for (...) {
  a[i].a = a[i + 2].a + 1
  tmp = a[i - 1].a * 2
  a[i].b = tmp
  a[i].c = tmp
  a[i].d = tmp
}

The problem here is that by moving the store to a[i].a we're breaking the loop-carried dependence from a[i].a to a[i-1].a (which doesn't stop vectorization).
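The effect only shows up once several iterations run together, so here is a hand-written emulation with VF = 2. Everything in it is an assumption made for the demo: the array length, the initial values, the names `scalarLoop` and `vectorLoop`, and the array being called `A` to avoid clashing with the field `a`. It compares the scalar loop with a two-lane version in which the store to `.a` is either kept in place or sunk to the position of the last store of the group.

```cpp
#include <array>
#include <cassert>

struct Elem { int a, b, c, d; };
bool operator==(const Elem &L, const Elem &R) {
  return L.a == R.a && L.b == R.b && L.c == R.c && L.d == R.d;
}

constexpr int N = 11;  // array length, chosen so the trip count is even
using Array = std::array<Elem, N>;

Array makeInput() {
  Array A{};
  for (int k = 0; k < N; ++k)
    A[k] = Elem{10 * k, 0, 0, 0};
  return A;
}

// Scalar reference: the loop as written in the example.
Array scalarLoop(Array A) {
  for (int i = 1; i + 2 < N; ++i) {
    A[i].a = A[i + 2].a + 1;
    int tmp = A[i - 1].a * 2;
    A[i].b = tmp;
    A[i].c = tmp;
    A[i].d = tmp;
  }
  return A;
}

// Two iterations (lanes) at a time. SinkStoreA selects whether the store to
// .a stays before the second load or is sunk next to the other stores.
Array vectorLoop(Array A, bool SinkStoreA) {
  for (int i = 1; i + 3 < N; i += 2) {
    int v0 = A[i + 2].a + 1, v1 = A[i + 3].a + 1;
    if (!SinkStoreA) { A[i].a = v0; A[i + 1].a = v1; }
    // Lane 1 reads A[i].a, which lane 0 writes in this same vector iteration.
    int t0 = A[i - 1].a * 2, t1 = A[i].a * 2;
    if (SinkStoreA) { A[i].a = v0; A[i + 1].a = v1; }
    A[i].b = A[i].c = A[i].d = t0;
    A[i + 1].b = A[i + 1].c = A[i + 1].d = t1;
  }
  return A;
}
```

With the store kept in place, the two-lane version matches the scalar loop. With the store sunk, lane 1 reads the old value of the neighbouring element's `.a`, so for example `A[2].b` becomes 20 instead of 62.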

Cheers,
Silviu

I agree, Silviu. That's basically what I've been thinking. I'm working on an updated patch. Thanks for all the feedback!

Updated the analysis according to feedback from Adam and Silviu.

Sorry for the delay in getting a new version of this patch ready. This update is notably different from the previous version in the following ways:

(1) We collect all memory accesses in collectConstStrideAccesses instead of only the stride-greater-than-one accesses. We have to collect all accesses to check for dependences between interleaved and non-interleaved accesses. The non-strided accesses are ignored when actually creating the groups.
(2) We only ignore the WAR case. I think the code generation strategy can only ensure we won't violate write-after-reads. Everything else is checked.
(3) For strided accesses, we try to prove independence as is done in LAA. All other constant-distance accesses are considered dependent, and we preserve their order. The dependences are checked in canReorderMemAccs.
(4) I've added another test case, and updated existing comments about the algorithm.

I have just a few quick remarks (and most of them were inherited from the existing code).

Cheers,
Silviu

lib/Transforms/Vectorize/LoopVectorize.cpp
5071 ↗	(On Diff #57508)	We should improve this at some point to check for NoDep (which is what we're really looking for here) and is implied by this test.
5077 ↗	(On Diff #57508)	This makes assumptions on how LAA works. It might be technically true at the moment, but if the dependence analysis would be improved it would possibly invalidate this. I think the case where we memcheck each pointer is an edge case? I'm not sure it's worth doing the interleaved access vectorization in this case. The more common case could be if two pointers are in different alias sets or have different underlying objects? But that should be expressed as a NoDep.
5093 ↗	(On Diff #57508)	Same here. Ideally we would know we have NoDep from LAA. Some of the checks here seem to be specifically related to forming the interleaved groups, and less about reordering accesses.

Hi Silviu,

It sounds like you're suggesting that we should just query LAA to determine if any dependence exists between the two candidate instructions, and if so, prevent reordering. Does that sound right?

lib/Transforms/Vectorize/LoopVectorize.cpp
5093 ↗	(On Diff #57508)	If you're referring to the type and factor checks, those are required before we can call areStridedAccessesIndependent.

Addressed Silviu's comments.

Hi Silviu,

I've updated the patch to use LAA for checking for the presence of dependences like you suggested. This is actually much simpler. Thanks! I had to make one change to LAA: We now try to prove strided accesses independent before handling the negative distance cases.

Matt.

In D19984#433539, @mssimpso wrote:

I had to make one change to LAA: We now try to prove strided accesses independent before handling the negative distance cases.

Yes, I've seen this recently too. Can you please split this out (and commit) with an appropriate LAA testcase.

Thanks for making the changes. I think this looks much better.

Cheers,
Silviu

lib/Transforms/Vectorize/LoopVectorize.cpp
5078 ↗	(On Diff #57671)	We should be able to do this faster than O(n) - maybe with some pre-processing of dependences.

mssimpso mentioned this in rL270072: [LAA] Check independence of strided accesses before forward case. May 19 2016, 8:43 AM

In D19984#433930, @anemet wrote:

Yes, I've seen this recently too. Can you please split this out (and commit) with an appropriate LAA testcase.

Sure. I committed the LAA portion in rL270072. I'll post a rebased version of this patch and address Silviu's latest comments.

I am not done reviewing this but I just want to say that I am having a much better feeling about this analysis after your changes. As I said I had major issues with its soundness. Thanks for your patience for working through all this!

I am also sending my initial comments but there may be more coming.

lib/Transforms/Vectorize/LoopVectorize.cpp
947–949 ↗	(On Diff #57671)	Somewhere this should state the criteria for reordering. In theory you can reorder backward dependences but looks like you don't allow any (which is fine).
5058 ↗	(On Diff #57671)	I am wondering if it's time to change the name of this. After your change it no longer contains only "interesting" strided accesses. How about like AccessStrideInfo?
5062 ↗	(On Diff #57671)	I think that we've been using Accesses pretty consistently without any abbreviation.
5093–5095 ↗	(On Diff #57671)	I would drop the part about memchecks. This is already pretty long and that part is trivial.

Addressed comments from Silviu and Adam.

Thanks very much for all the feedback! I agree, this is definitely looking better. In this update, I pre-process the dependences to enable constant-time queries like Silviu suggested, and I address Adam's latest comments.
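One way to get constant-time queries is to index the dependences once, up front. The sketch below only illustrates that idea and is not the code from the patch: `AccessId` and `DependenceIndex` are hypothetical names, and accesses are represented by plain integers instead of instructions.

```cpp
#include <cassert>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for an instruction: its index in program order.
using AccessId = int;

class DependenceIndex {
  // For each dependence source, the set of accesses that depend on it.
  std::unordered_map<AccessId, std::unordered_set<AccessId>> SinksOf;

public:
  // One pass over the (source, sink) pairs reported by the dependence
  // analysis builds the index.
  explicit DependenceIndex(
      const std::vector<std::pair<AccessId, AccessId>> &Deps) {
    for (const auto &D : Deps)
      SinksOf[D.first].insert(D.second);
  }

  // Expected constant-time query: is there a dependence from Source to Sink?
  bool hasDependence(AccessId Source, AccessId Sink) const {
    auto It = SinksOf.find(Source);
    return It != SinksOf.end() && It->second.count(Sink) != 0;
  }
};
```

The grouping loop can then ask `hasDependence(B, A)` for each candidate pair without rescanning the whole dependence list.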

lib/Transforms/Vectorize/LoopVectorize.cpp
5058 ↗	(On Diff #57671)	Sounds good to me. I'll rename StridedAccesses prior to committing the current patch.

anemet added inline comments. May 19 2016, 1:55 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
5162–5166 ↗	(On Diff #57821)	Here I think we're validating the correctness of:

Potentially moving 'A', an interleaved load, before any store 'B', or
Potentially moving 'B', an interleaved store, after any load/store 'A'.

If I am right, I don't think either the comment or the code is tight enough to reflect this. It would also be good to add testcases for this. It would be also great to add a testcase to the earlier problem that only candidate accesses were dependency-checked. What do you think?

mssimpso added inline comments. May 19 2016, 2:39 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
5162–5166 ↗	(On Diff #57821)	Adam, I think you're mostly right in the description of what we're validating the correctness of. When you have a chance, would you mind elaborating a bit more as to why you think we're not satisfying both of the cases you mention?

The algorithm works bottom-up, so B will always precede A in program order, and we always investigate B's that are closest to A first (some dependences do temporarily slip by but are later checked, see below). Also note that we don't yet handle loops with predicated blocks. I'm working on that in D19694.

For case (1) the bottom-up ordering implies that if we can move a load A before a store B, we know that we can also move A before any store that B precedes. Case (2) is a bit more subtle and is probably confusing. Say we have the case below:

S1: Store // depends on L1
L1: Load  // depends on S1
S2: Store

When A is S2 and B is S1, we might say that these are independent, and S1 could be sunk to S2's location. We will add S1 to S2's group. But when A is L1 and B is S1, we will notice the dependence. When we do, we check to see if S1 is already in a group, and if so, invalidate it. This prevents us from moving S1.

Please let me know if I'm still missing something here. We could probably improve the comment and/or rename "canReorderMemAccesses", since this doesn't quite capture what's happening with the instruction ordering. And yes, we can definitely add more tests.
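The bottom-up walk and the group invalidation in the S1/L1/S2 example can be modelled with a toy. This is a deliberately simplified sketch, not the analysis in LoopVectorize.cpp: `ToyAnalysis`, `GroupOf` and `releaseGroup` are hypothetical names, every access is treated as a candidate, and strides, offsets and interleave factors are ignored.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

enum class Kind { Load, Store };

// Toy model: accesses are listed in program order, and Deps holds
// (earlier, later) index pairs whose relative order must be preserved.
struct ToyAnalysis {
  std::vector<Kind> Accesses;
  std::set<std::pair<int, int>> Deps;
  std::vector<int> GroupOf;  // group leader index per access, or -1

  void releaseGroup(int Leader) {
    for (int &G : GroupOf)
      if (G == Leader)
        G = -1;
  }

  void run() {
    GroupOf.assign(Accesses.size(), -1);
    // Outer loop: visit A bottom-up.
    for (int A = static_cast<int>(Accesses.size()) - 1; A >= 0; --A) {
      // Inner loop: visit each B that precedes A, closest first.
      for (int B = A - 1; B >= 0; --B) {
        if (Deps.count(std::make_pair(B, A))) {
          // B must stay where it is: release any group that would move it.
          if (GroupOf[B] != -1)
            releaseGroup(GroupOf[B]);
          break;  // do not extend A's group past the dependence
        }
        if (Accesses[B] == Accesses[A] && GroupOf[B] == -1) {
          if (GroupOf[A] == -1)
            GroupOf[A] = A;  // A becomes the leader of a new group
          GroupOf[B] = GroupOf[A];
        }
      }
    }
  }
};
```

For the accesses {Store, Load, Store} with a dependence from access 0 (S1) to access 1 (L1), the toy first puts S1 into S2's group and then releases that group when it reaches L1, so no access ends up grouped. Without the dependence, S1 and S2 stay grouped.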

anemet added inline comments. May 19 2016, 4:57 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
5162–5166 ↗	(On Diff #57821)	I think you're mostly right in the description of what we're validating the correctness of. When you have a chance, would you mind elaborating a bit more as to why you think we're not satisfying both of the cases you mention?

I said "not tight enough" not that it's incorrect, sorry if that was unclear. I think I would like to include the above two cases in the comment (or an improved version) and then have the checks in the code reflect that. I.e. right now we don't check that at least one of 'A' or 'B' is interleaved.

The algorithm works bottom-up, so B will always precede A in program order, and we always investigate B's that are closest to A first (some dependences do temporarily slip by but are later checked, see below). Also note that we don't yet handle loops with predicated blocks. I'm working on that in D19694.

For case (1) the bottom-up ordering implies that if we can move a load A before a store B, we know that we can also move A before any store that B precedes. Case (2) is a bit more subtle and is probably confusing. Say we have the case below:

S1: Store // depends on L1
L1: Load  // depends on S1
S2: Store

When A is S2 and B is S1, we might say that these are independent, and S1 could be sunk to S2's location. We will add S1 to S2's group. But when A is L1 and B is S1, we will notice the dependence. When we do, we check to see if S1 is already in a group, and if so, invalidate it. This prevents us from moving S1.

Makes sense. I am wondering now if there is a simpler way to formulate this analysis with the same result. Aren't we simply saying that we don't consider an interleaved access for merging if it's either a source or the destination of a dependence? I.e. something like this in the outer loop:

if (isDepSource(A) && A->mayWriteToMemory() || isDepDestination(A)) break;

and then no need for the inner loop?

Please let me know if I'm still missing something here. We could probably improve the comment and/or rename "canReorderMemAccesses", since this doesn't quite capture what's happening with the instruction ordering. And yes, we can definitely add more tests.

Thanks!

mssimpso added inline comments. May 20 2016, 6:42 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
5162–5166 ↗	(On Diff #57821)	I think I would like to include the above two cases in the comment (or an improved version) and then have the checks in the code reflect that. I.e. right now we don't check that at least one of 'A' or 'B' is interleaved.

I see, yes this makes sense.

I am wondering now if there is a simpler way to formulate this analysis with the same result. Aren't we simply saying that we don't consider an interleaved access for merging if it's either a source or the destination of a dependence?

This sounds right to me. Let me think it over before posting another update. Thanks again for all the feedback, Adam!

Addressed Adam's latest round of comments.

Adam,

I've updated the comments and code to be more precise as you suggested. I've also added a couple of new test cases so that we are checking the conditions you enumerated in your last review. I'll reply to your other points inline. Thanks!

Matt.

mssimpso marked 2 inline comments as done. May 24 2016, 12:10 PM

mssimpso added inline comments.

lib/Transforms/Vectorize/LoopVectorize.cpp
5196–5200 ↗	(On Diff #58286)	It would be also great to add a testcase to the earlier problem that only candidate accesses were dependency-checked.

I've tried constructing a testcase that would generate incorrect code for this, but I think the current limitations of LAA might actually make this unrealizable at the moment. After the memory checks, two accesses can be dependent only at constant distances. Thus, if one access is strided, a dependent access at a constant distance would have to be strided as well.

This sounds right to me. Let me think it over before posting another update.

After thinking about this more carefully, I think we will miss cases with a simplification like the one you suggested. For example, if we have a set of interleaved loads followed by a set of interleaved stores that access the same location (or vice versa), giving up in the outer loop would prevent the two groups from being created.

This looks sensible to me. LGTM! (you should wait for further comments from Adam before committing)

Cheers,
Silviu

Hi Matt,

Sorry about the delay. As I said in my earlier heads-up I was on vacation.

My main comment is the reply to your testcase remark. The other ones are just nitpicks but I haven't finished going through the patch. I am just curious what you think about the idea regarding the testcase.

Adam

lib/Transforms/Vectorize/LoopVectorize.cpp
959–960 ↗	(On Diff #58286)	Both A and B are not necessarily strided here.
976–977 ↗	(On Diff #58286)	This requirement is part of the API so it should be in the function comment. Also a nit, can you please flip the order. I think that A preceding B is the more intuitive. Let me know if you disagree.
5196–5200 ↗	(On Diff #58286)	I've tried constructing a testcase that would generate incorrect code for this, but I think the current limitations of LAA might actually make this unrealizable at the moment. After the memory checks, two access can be dependent only at constant distances. Thus, if one access is strided, a dependent access at a constant distance would have to be strided as well.

Can't we use larger than stride offsets/indices to emulate non-interleaved accesses? I believe that these are currently ignored by interleaved analysis. I mean something like:

for (i = 0; i < n; i+=3) {
  ... = A[i]
  A[i+4] = ...
  ... = A[i+1]
}

And then A[i+1] shouldn't be moved across A[i+4]. I haven't tried this, it's just an idea....

Adam,

Welcome back! Thanks for following up. I replied to your comments inline.

Matt.

lib/Transforms/Vectorize/LoopVectorize.cpp
959–960 ↗	(On Diff #58286)	True. I will rename this and update the comments.
976–977 ↗	(On Diff #58286)	Sounds good.
5196–5200 ↗	(On Diff #58286)	This was my first thought as well. In this case, A[i+4] is still a stride-greater-than-one access, so it would've been collected in StrideAccesses and sent through the original analysis. If its stride is equal to anything other than that of the other accesses (e.g., one or some value greater than MaxInterleaveGroupFactor), the distance between it and the other accesses will be non-constant (some SCEV expression). If the distance isn't constant, LAA will report Dependence::Unknown, and we will create the memory checks.

Addressed Adam's comments.

anemet added inline comments. Jun 8 2016, 12:25 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
5254–5258 ↗	(On Diff #60069)	Matt, I am not sure I follow, which of these accesses is not constant distance in my example? I just tried this example:

void f(char *a, char *__restrict c, int n) {
  for (int i = 0; i < n; i+=3) {
    c[i] = a[i];
    a[i+4] = c[i+4];
    c[i+1] = a[i+1];
  }
}

If you compile this on x86_64 with:

-O3 -mllvm -enable-load-pre=0 -mllvm -store-to-load-forwarding-conflict-detection=0 -mllvm -enable-interleaved-mem-accesses -mllvm -force-vector-width=2

Then the loads from 'a' will be merged which breaks the forward dependence between the store of a[i+4] and the load of a[i+1]. No?

mssimpso added inline comments. Jun 8 2016, 12:54 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
5254–5258 ↗	(On Diff #60069)	Adam, We may be talking past each other a bit. I think your original request (correct me if I'm wrong) was for a test case showing that the existing analysis doesn't consider checking memory accesses that are not "candidates" for interleaved groups. Candidate accesses are the ones collected in StrideAccesses prior to the analysis. However, every access in your example actually is a candidate access because they are all strided. They are not ignored. Yes, we currently do the wrong thing here, but that's because the current analysis isn't correctly checking dependences between the actual candidate accesses, not because it isn't considering a dependence between a candidate access and a non-candidate access. Are you asking for something different?

anemet added inline comments. Jun 8 2016, 3:15 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
5254–5258 ↗	(On Diff #60069)	Ah, the disconnect was that I didn't realize that accesses with offsets larger than the stride were considered as candidates. The idea was to have these mimic non-strided accesses. I guess it makes sense to consider these candidates as well because the offset is not really an offset but rather the distance between two accesses. So I guess we can't have a testcase for this case at the moment... I'll continue reviewing the patch.

This is very close. I just want to take extra care to update/improve the comments while this is all fresh in our memory.

lib/Transforms/Vectorize/LoopVectorize.cpp
967–968 ↗	(On Diff #60069)	Actually sorry, let's make this even more precise. At this point the function is called with any accesses. (The function later checks that if both accesses are non-strided we don't bother checking deps since we would never reorder those.)
971 ↗	(On Diff #60069)	We should probably have some qualification in the name, something like canReorderAccessesForInterleavedGroups or something like that. This is not a general canReorder predicate but takes into consideration the actual code motion strategy.
5135–5142 ↗	(On Diff #60069)	Is my understanding correct, that you removed this because we no longer support this case? I.e. the dep 1->2 does not currently allow for this. If this is true, we should probably add a FIXME. Also, I thought the WAW case was the only reason for the bottom-up ordering. That is a bit confusing now too.
5254–5257 ↗	(On Diff #60069)	It would be nice to use A and B related iterators in this part because that is what you mention in the comment. E.g. AI, BI or IA, IB.

mssimpso added inline comments. Jun 15 2016, 7:26 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
971 ↗	(On Diff #60069)	Sounds good. I'll update this and the comments.
5135–5142 ↗	(On Diff #60069)	No, we do still support this case, for essentially the reason the existing comment states. When the outer loop of the analysis is on (3) and we visit (2) and (1) in the inner loop, we will form the (3,2) group. (1) can't be added to (3,2) because a member already exists in group (3,2) with the same offset. When the outer loop is on (2), we will again visit (1) in the inner loop. This time we will see the dependence between (1) and (2) and give up trying to add additional accesses to (2)'s group. (1) is not yet in a group, so the outer loop moves on to (1). Thus, (1) must form a group with accesses that precede it and won't be sunk, as the existing comment says.

I removed this comment because I thought it was somewhat misleading. The bottom-up ordering is not sufficient to prevent us from breaking WAW dependencies. We still have to explicitly check for them. For example, if we had something like the following for a factor 2 group:

A[i]   = x  (1)
A[i+2] = y  (2)
A[i+1] = z  (3)

The analysis would proceed as before. When the outer loop is on (3): this time (2) is not added to (3)'s group because its index is too large. (1) is temporarily added to (3)'s group, creating a (3,1) group. When the outer loop is on (2): we see the dependence between (1) and (2). As before, we stop trying to add additional instructions to (2)'s group. But now since (1) is already in a group, and we now know it shouldn't be reordered, we release the (3,1) group.

So the bottom-up ordering isn't sufficient as we still have to check for the dependence. It's probably worth adding a comment that clarifies why the analysis is bottom-up.
5254–5257 ↗	(On Diff #60069)	I agree. I think we should also swap A and B in this function to match what we did to the canReorderMemAccesses function. There, A precedes B, but here B precedes A. They should be consistent.

mssimpso added inline comments. Jun 15 2016, 10:03 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
5135–5142 ↗	(On Diff #60069)	I got the second example here slightly wrong, but the idea is the same. I was intending to describe something like:

A[i-1] = x (1)
A[i-3] = y (2)
A[i] = z (3)

Now, the dependence between (1) and (2) will cause the (3,1) group to be released, as I originally described. I'm adding test cases for both of these WAW examples.
5254–5257 ↗	(On Diff #60069)	Actually, swapping A with B in this function will make the review very difficult. I think I'd rather swap A and B in canReorderMemAccesses to make things consistent. This will mean B precedes A both places. We can then swap them both in an NFC patch if we want A to precede B.
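
The two WAW walk-throughs in the replies above can be replayed with a small standalone model. This is only a sketch, not the LoopVectorize.cpp implementation: `Access`, `groupStores`, and the offset-span membership rule are hypothetical simplifications, every access is assumed to be a strided store, and the dependences are supplied by hand as (source, sink) pairs instead of being queried from LoopAccessInfo.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a strided store "A[i + Offset] = ...".
struct Access {
  int Id;     // Label used in the discussion: (1), (2), (3), ...
  int Offset; // Constant element offset from the induction variable.
};

// Toy model of the bottom-up grouping loop. Accesses are given in program
// order; Deps holds (source Id, sink Id) pairs chosen by the caller.
// Returns a map from access Id to a group number, or -1 if ungrouped.
std::map<int, int> groupStores(const std::vector<Access> &Accesses,
                               const std::set<std::pair<int, int>> &Deps,
                               int Factor) {
  assert(Factor >= 2 && "an interleaved group needs a factor of at least 2");
  std::map<int, int> GroupOf;           // Access Id -> group number.
  std::map<int, std::set<int>> Offsets; // Group number -> member offsets.
  for (const Access &A : Accesses)
    GroupOf[A.Id] = -1;
  int NextGroup = 0;

  // Dissolve group G: all of its members become ungrouped again.
  auto Release = [&](int G) {
    for (auto &Entry : GroupOf)
      if (Entry.second == G)
        Entry.second = -1;
    Offsets.erase(G);
  };

  // Outer loop: visit A from the last access to the first.
  for (std::size_t AI = Accesses.size(); AI-- > 0;) {
    const Access &A = Accesses[AI];
    if (GroupOf[A.Id] == -1) {
      GroupOf[A.Id] = NextGroup;
      Offsets[NextGroup].insert(A.Offset);
      ++NextGroup;
    }
    const int G = GroupOf[A.Id];

    // Inner loop: visit every B that precedes A, nearest first.
    for (std::size_t BI = AI; BI-- > 0;) {
      const Access &B = Accesses[BI];
      if (Deps.count(std::make_pair(B.Id, A.Id))) {
        // B must not be sunk below A: dissolve B's group if it has one,
        // then stop extending A's group past B.
        if (GroupOf[B.Id] != -1)
          Release(GroupOf[B.Id]);
        break;
      }
      if (GroupOf[B.Id] != -1)
        continue;
      // Simplified membership rule (an assumption of this sketch): the
      // slot must be free and the group must span fewer than Factor slots.
      std::set<int> &Members = Offsets[G];
      if (Members.count(B.Offset))
        continue;
      int Lo = std::min(*Members.begin(), B.Offset);
      int Hi = std::max(*Members.rbegin(), B.Offset);
      if (Hi - Lo >= Factor)
        continue;
      Members.insert(B.Offset);
      GroupOf[B.Id] = G;
    }
  }
  return GroupOf;
}
```

Calling `groupStores` with the accesses (1) A[i-1], (2) A[i-3], (3) A[i], a factor of 2, and the single dependence from (1) to (2) first places (1) in (3)'s group and then releases that group when the outer loop reaches (2), which mirrors the corrected example. With the accesses A[i], A[i], A[i+1] from the existing source comment, the model keeps the (3,2) group and leaves (1) to start its own.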

Addressed Adam's comments.

Renamed canReorderMemAccesses and updated comments
Swapped A and B in canReorderMemAccesses to be consistent with the rest of the analysis (i.e., B precedes A). We can swap the entire analysis in a follow-on.
Updated iterators to be more descriptive (i.e., AI and BI).
Commented about the bottom-up ordering.
Added test cases for the two WAW examples mentioned in my last update.

Adam,

Do you have any additional comments for this patch? Thanks!

Matt.

Matt,

All agreed, I just have a few more suggestions to improve comments for this complex piece.

Let me know what you think.

Adam

lib/Transforms/Vectorize/LoopVectorize.cpp
5200–5207 ↗	(On Diff #60862)	Wow, this is very subtle too. This and the previous case of why we need to remove elements from a group needs a high-level comment somewhere around the call to canReorder... See my comments on the particular lines.
5334–5343 ↗	(On Diff #60862)	I don't think the comment is sufficient to cover all the subtleties here. I would prefer to say something like: "We can't have dependences between accesses in a group and other accesses that are located between the first and the last element of the group." Probably before the break statement we should also say: "It's OK to have dependences between accesses in a group and other accesses before the first instruction; we just can't extend the group beyond these." I am also wondering if we need a picture to further visualize this, i.e. a case where there is an intervening access and one where it falls outside the range of the group.
5351–5354 ↗	(On Diff #60862)	OK.
test/Transforms/LoopVectorize/interleaved-accesses.ll
769–770 ↗	(On Diff #60862)	... but exclude a[i] = x
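
The constraint requested for lines 5334–5343 can be phrased as a check on a candidate group. The sketch below is an illustration only and is not code from the patch: `groupSpanIsLegal` is a hypothetical helper, accesses are identified by their 1-based program-order label, and each dependent pair is assumed to be stored as (earlier label, later label).

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Returns true if no access located strictly between the first and last
// member of the group is dependent with any member of the group.
// Members holds program-order labels; Deps holds (earlier, later) pairs.
bool groupSpanIsLegal(const std::vector<int> &Members,
                      const std::set<std::pair<int, int>> &Deps) {
  assert(!Members.empty() && "a group needs at least one member");
  const int First = *std::min_element(Members.begin(), Members.end());
  const int Last = *std::max_element(Members.begin(), Members.end());

  for (int Pos = First + 1; Pos < Last; ++Pos) {
    // Group members themselves are not "other accesses".
    if (std::find(Members.begin(), Members.end(), Pos) != Members.end())
      continue;
    // An intervening access must not depend on, or be depended on by,
    // any member of the group.
    for (int M : Members) {
      const auto Key = std::make_pair(std::min(M, Pos), std::max(M, Pos));
      if (Deps.count(Key))
        return false;
    }
  }
  return true;
}
```

For four stores labelled (1) to (4) where only (2) and (3) are dependent, `groupSpanIsLegal({1, 2}, Deps)` accepts the group because nothing sits between its members, while `groupSpanIsLegal({2, 4}, Deps)` rejects it because (3) falls inside the span. Dependences with accesses outside the span do not affect the result, which matches the second half of the suggested wording.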

Adam,

Thanks very much for the suggestions. They all look good to me! I will update the patch.

Matt.

Sharpened comments according to Adam's feedback.

Looks great to me! Thanks for the improvements and of course initially taking this on!

This revision is now accepted and ready to land. Jun 23 2016, 1:51 PM

Thanks Adam and Silviu for all the detailed feedback! Getting these dependences right can be tricky, so I appreciate the attention.

Closed by commit rL273687: [LV] Preserve order of dependences in interleaved accesses analysis (authored by mssimpso). Jun 24 2016, 8:40 AM

This revision was automatically updated to reflect the committed changes.

mssimpso mentioned this in rL275473: [LV] Rename StrideAccesses to AccessStrideInfo (NFC). Jul 14 2016, 2:12 PM

mssimpso mentioned this in rL275567: [LV] Swap A and B in interleaved access analysis (NFC). Jul 15 2016, 8:30 AM

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp	259 lines
llvm/trunk/test/Transforms/LoopVectorize/interleaved-accesses.ll	305 lines

Diff 61798

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

Show First 20 Lines • Show All 826 Lines • ▼ Show 20 Lines
/// a loop. Otherwise it's meaningless to do analysis as the vectorization		/// a loop. Otherwise it's meaningless to do analysis as the vectorization
/// on interleaved accesses is unsafe.		/// on interleaved accesses is unsafe.
///		///
/// The analysis collects interleave groups and records the relationships		/// The analysis collects interleave groups and records the relationships
/// between the member and the group in a map.		/// between the member and the group in a map.
class InterleavedAccessInfo {		class InterleavedAccessInfo {
public:		public:
InterleavedAccessInfo(PredicatedScalarEvolution &PSE, Loop *L,		InterleavedAccessInfo(PredicatedScalarEvolution &PSE, Loop *L,
DominatorTree *DT)		DominatorTree *DT, LoopInfo *LI)
: PSE(PSE), TheLoop(L), DT(DT), RequiresScalarEpilogue(false) {}		: PSE(PSE), TheLoop(L), DT(DT), LI(LI), LAI(nullptr),
		RequiresScalarEpilogue(false) {}

~InterleavedAccessInfo() {		~InterleavedAccessInfo() {
SmallSet<InterleaveGroup *, 4> DelSet;		SmallSet<InterleaveGroup *, 4> DelSet;
// Avoid releasing a pointer twice.		// Avoid releasing a pointer twice.
for (auto &I : InterleaveGroupMap)		for (auto &I : InterleaveGroupMap)
DelSet.insert(I.second);		DelSet.insert(I.second);
for (auto *Ptr : DelSet)		for (auto *Ptr : DelSet)
delete Ptr;		delete Ptr;
Show All 24 Lines	if (InterleaveGroupMap.count(Instr))
return InterleaveGroupMap.find(Instr)->second;		return InterleaveGroupMap.find(Instr)->second;
return nullptr;		return nullptr;
}		}

/// \brief Returns true if an interleaved group that may access memory		/// \brief Returns true if an interleaved group that may access memory
/// out-of-bounds requires a scalar epilogue iteration for correctness.		/// out-of-bounds requires a scalar epilogue iteration for correctness.
bool requiresScalarEpilogue() const { return RequiresScalarEpilogue; }		bool requiresScalarEpilogue() const { return RequiresScalarEpilogue; }

		/// \brief Initialize the LoopAccessInfo used for dependence checking.
		void setLAI(const LoopAccessInfo *Info) { LAI = Info; }

private:		private:
/// A wrapper around ScalarEvolution, used to add runtime SCEV checks.		/// A wrapper around ScalarEvolution, used to add runtime SCEV checks.
/// Simplifies SCEV expressions in the context of existing SCEV assumptions.		/// Simplifies SCEV expressions in the context of existing SCEV assumptions.
/// The interleaved access analysis can also add new predicates (for example		/// The interleaved access analysis can also add new predicates (for example
/// by versioning strides of pointers).		/// by versioning strides of pointers).
PredicatedScalarEvolution &PSE;		PredicatedScalarEvolution &PSE;
Loop *TheLoop;		Loop *TheLoop;
DominatorTree *DT;		DominatorTree *DT;
		LoopInfo *LI;
		const LoopAccessInfo *LAI;

/// True if the loop may contain non-reversed interleaved groups with		/// True if the loop may contain non-reversed interleaved groups with
/// out-of-bounds accesses. We ensure we don't speculatively access memory		/// out-of-bounds accesses. We ensure we don't speculatively access memory
/// out-of-bounds by executing at least one scalar epilogue iteration.		/// out-of-bounds by executing at least one scalar epilogue iteration.
bool RequiresScalarEpilogue;		bool RequiresScalarEpilogue;

/// Holds the relationships between the members and the interleave group.		/// Holds the relationships between the members and the interleave group.
DenseMap<Instruction *, InterleaveGroup *> InterleaveGroupMap;		DenseMap<Instruction *, InterleaveGroup *> InterleaveGroupMap;

		/// Holds dependences among the memory accesses in the loop. It maps a source
		/// access to a set of dependent sink accesses.
		DenseMap<Instruction *, SmallPtrSet<Instruction *, 2>> Dependences;

/// \brief The descriptor for a strided memory access.		/// \brief The descriptor for a strided memory access.
struct StrideDescriptor {		struct StrideDescriptor {
StrideDescriptor(int Stride, const SCEV *Scev, unsigned Size,		StrideDescriptor(int Stride, const SCEV *Scev, unsigned Size,
unsigned Align)		unsigned Align)
: Stride(Stride), Scev(Scev), Size(Size), Align(Align) {}		: Stride(Stride), Scev(Scev), Size(Size), Align(Align) {}

StrideDescriptor() : Stride(0), Scev(nullptr), Size(0), Align(0) {}		StrideDescriptor() : Stride(0), Scev(nullptr), Size(0), Align(0) {}

int Stride; // The access's stride. It is negative for a reverse access.		int Stride; // The access's stride. It is negative for a reverse access.
const SCEV *Scev; // The scalar expression of this access		const SCEV *Scev; // The scalar expression of this access
unsigned Size; // The size of the memory object.		unsigned Size; // The size of the memory object.
unsigned Align; // The alignment of this access.		unsigned Align; // The alignment of this access.
};		};

		/// \brief A type for holding instructions and their stride descriptors.
		typedef std::pair<Instruction *, StrideDescriptor> StrideEntry;

/// \brief Create a new interleave group with the given instruction \p Instr,		/// \brief Create a new interleave group with the given instruction \p Instr,
/// stride \p Stride and alignment \p Align.		/// stride \p Stride and alignment \p Align.
///		///
/// \returns the newly created interleave group.		/// \returns the newly created interleave group.
InterleaveGroup *createInterleaveGroup(Instruction *Instr, int Stride,		InterleaveGroup *createInterleaveGroup(Instruction *Instr, int Stride,
unsigned Align) {		unsigned Align) {
assert(!InterleaveGroupMap.count(Instr) &&		assert(!InterleaveGroupMap.count(Instr) &&
"Already in an interleaved access group");		"Already in an interleaved access group");
Show All 9 Lines	void releaseGroup(InterleaveGroup *Group) {

delete Group;		delete Group;
}		}

/// \brief Collect all the accesses with a constant stride in program order.		/// \brief Collect all the accesses with a constant stride in program order.
void collectConstStridedAccesses(		void collectConstStridedAccesses(
MapVector<Instruction *, StrideDescriptor> &StrideAccesses,		MapVector<Instruction *, StrideDescriptor> &StrideAccesses,
const ValueToValueMap &Strides);		const ValueToValueMap &Strides);

		/// \brief Returns true if \p Stride is allowed in an interleaved group.
		static bool isStrided(int Stride) {
		unsigned Factor = std::abs(Stride);
		return Factor >= 2 && Factor <= MaxInterleaveGroupFactor;
		}

		/// \brief Returns true if LoopAccessInfo can be used for dependence queries.
		bool areDependencesValid() const {
		return LAI && LAI->getDepChecker().getDependences();
		}

		/// \brief Returns true if memory accesses \p B and \p A can be reordered, if
		/// necessary, when constructing interleaved groups.
		///
		/// \p B must precede \p A in program order. We return false if reordering is
		/// not necessary or is prevented because \p B and \p A may be dependent.
		bool canReorderMemAccessesForInterleavedGroups(StrideEntry *B,
		StrideEntry *A) const {

		// Code motion for interleaved accesses can potentially hoist strided loads
		// and sink strided stores. The code below checks the legality of the
		// following two conditions:
		//
		// 1. Potentially moving a strided load (A) before any store (B) that
		// precedes A, or
		//
		// 2. Potentially moving a strided store (B) after any load or store (A)
		// that B precedes.
		//
		// It's legal to reorder B and A if we know there isn't a dependence from B
		// to A. Note that this determination is conservative since some
		// dependences could potentially be reordered safely.

		// B is potentially the source of a dependence.
		auto *Src = B->first;
		auto SrcDes = B->second;

		// A is potentially the sink of a dependence.
		auto *Sink = A->first;
		auto SinkDes = A->second;

		// Code motion for interleaved accesses can't violate WAR dependences.
		// Thus, reordering is legal if the source isn't a write.
		if (!Src->mayWriteToMemory())
		return true;

		// At least one of the accesses must be strided.
		if (!isStrided(SrcDes.Stride) && !isStrided(SinkDes.Stride))
		return true;

		// If dependence information is not available from LoopAccessInfo,
		// conservatively assume the instructions can't be reordered.
		if (!areDependencesValid())
		return false;

		// If we know there is a dependence from source to sink, assume the
		// instructions can't be reordered. Otherwise, reordering is legal.
		return !Dependences.count(Src) || !Dependences.lookup(Src).count(Sink);
		}

		/// \brief Collect the dependences from LoopAccessInfo.
		///
		/// We process the dependences once during the interleaved access analysis to
		/// enable constant-time dependence queries.
		void collectDependences() {
		if (!areDependencesValid())
		return;
		auto *Deps = LAI->getDepChecker().getDependences();
		for (auto Dep : *Deps)
		Dependences[Dep.getSource(LAI)].insert(Dep.getDestination(LAI));
		}
};		};

/// Utility class for getting and setting loop vectorizer hints in the form		/// Utility class for getting and setting loop vectorizer hints in the form
/// of loop metadata.		/// of loop metadata.
/// This class keeps a number of loop annotations locally (as member variables)		/// This class keeps a number of loop annotations locally (as member variables)
/// and can, upon request, write them back as metadata on the loop. It will		/// and can, upon request, write them back as metadata on the loop. It will
/// initially scan the loop for existing metadata, and will update the local		/// initially scan the loop for existing metadata, and will update the local
/// values based on information in the loop.		/// values based on information in the loop.
▲ Show 20 Lines • Show All 314 Lines • ▼ Show 20 Lines
/// This class is also used by InnerLoopVectorizer for identifying		/// This class is also used by InnerLoopVectorizer for identifying
/// induction variable and the different reduction variables.		/// induction variable and the different reduction variables.
class LoopVectorizationLegality {		class LoopVectorizationLegality {
public:		public:
LoopVectorizationLegality(Loop *L, PredicatedScalarEvolution &PSE,		LoopVectorizationLegality(Loop *L, PredicatedScalarEvolution &PSE,
DominatorTree *DT, TargetLibraryInfo *TLI,		DominatorTree *DT, TargetLibraryInfo *TLI,
AliasAnalysis *AA, Function *F,		AliasAnalysis *AA, Function *F,
const TargetTransformInfo *TTI,		const TargetTransformInfo *TTI,
LoopAccessAnalysis *LAA,		LoopAccessAnalysis *LAA, LoopInfo *LI,
LoopVectorizationRequirements *R,		LoopVectorizationRequirements *R,
LoopVectorizeHints *H)		LoopVectorizeHints *H)
: NumPredStores(0), TheLoop(L), PSE(PSE), TLI(TLI), TheFunction(F),		: NumPredStores(0), TheLoop(L), PSE(PSE), TLI(TLI), TheFunction(F),
TTI(TTI), DT(DT), LAA(LAA), LAI(nullptr), InterleaveInfo(PSE, L, DT),		TTI(TTI), DT(DT), LAA(LAA), LAI(nullptr),
Induction(nullptr), WidestIndTy(nullptr), HasFunNoNaNAttr(false),		InterleaveInfo(PSE, L, DT, LI), Induction(nullptr),
Requirements(R), Hints(H) {}		WidestIndTy(nullptr), HasFunNoNaNAttr(false), Requirements(R),
		Hints(H) {}

/// ReductionList contains the reduction descriptors for all		/// ReductionList contains the reduction descriptors for all
/// of the reductions that were found in the loop.		/// of the reductions that were found in the loop.
typedef DenseMap<PHINode *, RecurrenceDescriptor> ReductionList;		typedef DenseMap<PHINode *, RecurrenceDescriptor> ReductionList;

/// InductionList saves induction variables and maps them to the		/// InductionList saves induction variables and maps them to the
/// induction descriptor.		/// induction descriptor.
typedef MapVector<PHINode *, InductionDescriptor> InductionList;		typedef MapVector<PHINode *, InductionDescriptor> InductionList;
▲ Show 20 Lines • Show All 589 Lines • ▼ Show 20 Lines	if (TC > 0u && TC < TinyTripCountVectorThreshold) {
return false;		return false;
}		}
}		}

PredicatedScalarEvolution PSE(SE, L);		PredicatedScalarEvolution PSE(SE, L);

// Check if it is legal to vectorize the loop.		// Check if it is legal to vectorize the loop.
LoopVectorizationRequirements Requirements;		LoopVectorizationRequirements Requirements;
LoopVectorizationLegality LVL(L, PSE, DT, TLI, AA, F, TTI, LAA,		LoopVectorizationLegality LVL(L, PSE, DT, TLI, AA, F, TTI, LAA, LI,
&Requirements, &Hints);		&Requirements, &Hints);
if (!LVL.canVectorize()) {		if (!LVL.canVectorize()) {
DEBUG(dbgs() << "LV: Not vectorizing: Cannot prove legality.\n");		DEBUG(dbgs() << "LV: Not vectorizing: Cannot prove legality.\n");
emitMissedWarning(F, L, Hints);		emitMissedWarning(F, L, Hints);
return false;		return false;
}		}

// Use the cost model.		// Use the cost model.
▲ Show 20 Lines • Show All 3,067 Lines • ▼ Show 20 Lines	while (!Worklist.empty()) {

// Insert all operands.		// Insert all operands.
Worklist.insert(Worklist.end(), I->op_begin(), I->op_end());		Worklist.insert(Worklist.end(), I->op_begin(), I->op_end());
}		}
}		}

bool LoopVectorizationLegality::canVectorizeMemory() {		bool LoopVectorizationLegality::canVectorizeMemory() {
LAI = &LAA->getInfo(TheLoop);		LAI = &LAA->getInfo(TheLoop);
		InterleaveInfo.setLAI(LAI);
auto &OptionalReport = LAI->getReport();		auto &OptionalReport = LAI->getReport();
if (OptionalReport)		if (OptionalReport)
emitAnalysis(VectorizationReport(*OptionalReport));		emitAnalysis(VectorizationReport(*OptionalReport));
if (!LAI->canVectorizeMemory())		if (!LAI->canVectorizeMemory())
return false;		return false;

if (LAI->hasStoreToLoopInvariantAddress()) {		if (LAI->hasStoreToLoopInvariantAddress()) {
emitAnalysis(		emitAnalysis(
▲ Show 20 Lines • Show All 98 Lines • ▼ Show 20 Lines
}		}

void InterleavedAccessInfo::collectConstStridedAccesses(		void InterleavedAccessInfo::collectConstStridedAccesses(
MapVector<Instruction *, StrideDescriptor> &StrideAccesses,		MapVector<Instruction *, StrideDescriptor> &StrideAccesses,
const ValueToValueMap &Strides) {		const ValueToValueMap &Strides) {
// Holds load/store instructions in program order.		// Holds load/store instructions in program order.
SmallVector<Instruction *, 16> AccessList;		SmallVector<Instruction *, 16> AccessList;

for (auto *BB : TheLoop->getBlocks()) {		// Since it's desired that the load/store instructions be maintained in
		// "program order" for the interleaved access analysis, we have to visit the
		// blocks in the loop in reverse postorder (i.e., in a topological order).
		// Such an ordering will ensure that any load/store that may be executed
		// before a second load/store will precede the second load/store in the
		// AccessList.
		LoopBlocksDFS DFS(TheLoop);
		DFS.perform(LI);
		for (LoopBlocksDFS::RPOIterator I = DFS.beginRPO(), E = DFS.endRPO(); I != E;
		++I) {
		BasicBlock *BB = *I;
bool IsPred = LoopAccessInfo::blockNeedsPredication(BB, TheLoop, DT);		bool IsPred = LoopAccessInfo::blockNeedsPredication(BB, TheLoop, DT);

for (auto &I : *BB) {		for (auto &I : *BB) {
if (!isa<LoadInst>(&I) && !isa<StoreInst>(&I))		if (!isa<LoadInst>(&I) && !isa<StoreInst>(&I))
continue;		continue;
// FIXME: Currently we can't handle mixed accesses and predicated accesses		// FIXME: Currently we can't handle mixed accesses and predicated accesses
if (IsPred)		if (IsPred)
return;		return;

AccessList.push_back(&I);		AccessList.push_back(&I);
}		}
}		}

if (AccessList.empty())		if (AccessList.empty())
return;		return;

auto &DL = TheLoop->getHeader()->getModule()->getDataLayout();		auto &DL = TheLoop->getHeader()->getModule()->getDataLayout();
for (auto I : AccessList) {		for (auto I : AccessList) {
LoadInst *LI = dyn_cast<LoadInst>(I);		LoadInst *LI = dyn_cast<LoadInst>(I);
StoreInst *SI = dyn_cast<StoreInst>(I);		StoreInst *SI = dyn_cast<StoreInst>(I);

Value *Ptr = LI ? LI->getPointerOperand() : SI->getPointerOperand();		Value *Ptr = LI ? LI->getPointerOperand() : SI->getPointerOperand();
int Stride = getPtrStride(PSE, Ptr, TheLoop, Strides);		int Stride = getPtrStride(PSE, Ptr, TheLoop, Strides);

// The factor of the corresponding interleave group.
unsigned Factor = std::abs(Stride);

// Ignore the access if the factor is too small or too large.
if (Factor < 2 || Factor > MaxInterleaveGroupFactor)
continue;

const SCEV *Scev = replaceSymbolicStrideSCEV(PSE, Strides, Ptr);		const SCEV *Scev = replaceSymbolicStrideSCEV(PSE, Strides, Ptr);
PointerType *PtrTy = dyn_cast<PointerType>(Ptr->getType());		PointerType *PtrTy = dyn_cast<PointerType>(Ptr->getType());
unsigned Size = DL.getTypeAllocSize(PtrTy->getElementType());		unsigned Size = DL.getTypeAllocSize(PtrTy->getElementType());

// An alignment of 0 means target ABI alignment.		// An alignment of 0 means target ABI alignment.
unsigned Align = LI ? LI->getAlignment() : SI->getAlignment();		unsigned Align = LI ? LI->getAlignment() : SI->getAlignment();
if (!Align)		if (!Align)
Align = DL.getABITypeAlignment(PtrTy->getElementType());		Align = DL.getABITypeAlignment(PtrTy->getElementType());

StrideAccesses[I] = StrideDescriptor(Stride, Scev, Size, Align);		StrideAccesses[I] = StrideDescriptor(Stride, Scev, Size, Align);
}		}
}		}

// Analyze interleaved accesses and collect them into interleave groups.		// Analyze interleaved accesses and collect them into interleaved load and
		// store groups.
//		//
// Notice that the vectorization on interleaved groups will change instruction		// When generating code for an interleaved load group, we effectively hoist all
// orders and may break dependences. But the memory dependence check guarantees		// loads in the group to the location of the first load in program order. When
// that there is no overlap between two pointers of different strides, element		// generating code for an interleaved store group, we sink all stores to the
// sizes or underlying bases.		// location of the last store. This code motion can change the order of load
		// and store instructions and may break dependences.
//		//
// For pointers sharing the same stride, element size and underlying base, no		// The code generation strategy mentioned above ensures that we won't violate
// need to worry about Read-After-Write dependences and Write-After-Read		// any write-after-read (WAR) dependences.
// dependences.		//
		// E.g., for the WAR dependence: a = A[i]; // (1)
		// A[i] = b; // (2)
		//
		// The store group of (2) is always inserted at or below (2), and the load
		// group of (1) is always inserted at or above (1). Thus, the instructions will
		// never be reordered. All other dependences are checked to ensure the
		// correctness of the instruction reordering.
		//
		// The algorithm visits all memory accesses in the loop in bottom-up program
		// order. Program order is established by traversing the blocks in the loop in
		// reverse postorder when collecting the accesses.
//		//
// E.g. The RAW dependence: A[i] = a;		// We visit the memory accesses in bottom-up order because it can simplify the
// b = A[i];		// construction of store groups in the presence of write-after-write (WAW)
// This won't exist as it is a store-load forwarding conflict, which has		// dependences.
// already been checked and forbidden in the dependence check.
//		//
// E.g. The WAR dependence: a = A[i]; // (1)		// E.g., for the WAW dependence: A[i] = a; // (1)
// A[i] = b; // (2)		// A[i] = b; // (2)
// The store group of (2) is always inserted at or below (2), and the load group		// A[i + 1] = c; // (3)
// of (1) is always inserted at or above (1). The dependence is safe.		//
		// We will first create a store group with (3) and (2). (1) can't be added to
		// this group because it and (2) are dependent. However, (1) can be grouped
		// with other accesses that may precede it in program order. Note that a
		// bottom-up order does not imply that WAW dependences should not be checked.
void InterleavedAccessInfo::analyzeInterleaving(		void InterleavedAccessInfo::analyzeInterleaving(
const ValueToValueMap &Strides) {		const ValueToValueMap &Strides) {
DEBUG(dbgs() << "LV: Analyzing interleaved accesses...\n");		DEBUG(dbgs() << "LV: Analyzing interleaved accesses...\n");

// Holds all the stride accesses.		// Holds all the stride accesses.
MapVector<Instruction *, StrideDescriptor> StrideAccesses;		MapVector<Instruction *, StrideDescriptor> StrideAccesses;
collectConstStridedAccesses(StrideAccesses, Strides);		collectConstStridedAccesses(StrideAccesses, Strides);

if (StrideAccesses.empty())		if (StrideAccesses.empty())
return;		return;

		// Collect the dependences in the loop.
		collectDependences();

// Holds all interleaved store groups temporarily.		// Holds all interleaved store groups temporarily.
SmallSetVector<InterleaveGroup *, 4> StoreGroups;		SmallSetVector<InterleaveGroup *, 4> StoreGroups;
// Holds all interleaved load groups temporarily.		// Holds all interleaved load groups temporarily.
SmallSetVector<InterleaveGroup *, 4> LoadGroups;		SmallSetVector<InterleaveGroup *, 4> LoadGroups;

// Search the load-load/write-write pair B-A in bottom-up order and try to		// Search the load-load/write-write pair B-A in bottom-up order and try to
// insert B into the interleave group of A according to 3 rules:		// insert B into the interleave group of A according to 3 rules:
// 1. A and B have the same stride.		// 1. A and B have the same stride.
// 2. A and B have the same memory object size.		// 2. A and B have the same memory object size.
// 3. B belongs to the group according to the distance.		// 3. B belongs to the group according to the distance.
//		for (auto AI = StrideAccesses.rbegin(), E = StrideAccesses.rend(); AI != E;
// The bottom-up order can avoid breaking the Write-After-Write dependences		++AI) {
// between two pointers of the same base.		Instruction *A = AI->first;
// E.g. A[i] = a; (1)		StrideDescriptor DesA = AI->second;
// A[i] = b; (2)
// A[i+1] = c (3)		// Initialize a group for A if it has an allowable stride. Even if we don't
// We form the group (2)+(3) in front, so (1) has to form groups with accesses		// create a group for A, we continue with the bottom-up algorithm to ensure
// above (1), which guarantees that (1) is always above (2).		// we don't break any of A's dependences.
for (auto I = StrideAccesses.rbegin(), E = StrideAccesses.rend(); I != E;		InterleaveGroup *Group = nullptr;
++I) {		if (isStrided(DesA.Stride)) {
Instruction *A = I->first;		Group = getInterleaveGroup(A);
StrideDescriptor DesA = I->second;

InterleaveGroup *Group = getInterleaveGroup(A);
if (!Group) {		if (!Group) {
DEBUG(dbgs() << "LV: Creating an interleave group with:" << *A << '\n');		DEBUG(dbgs() << "LV: Creating an interleave group with:" << *A << '\n');
Group = createInterleaveGroup(A, DesA.Stride, DesA.Align);		Group = createInterleaveGroup(A, DesA.Stride, DesA.Align);
}		}

if (A->mayWriteToMemory())		if (A->mayWriteToMemory())
StoreGroups.insert(Group);		StoreGroups.insert(Group);
else		else
LoadGroups.insert(Group);		LoadGroups.insert(Group);
		}

for (auto II = std::next(I); II != E; ++II) {		for (auto BI = std::next(AI); BI != E; ++BI) {
Instruction *B = II->first;		Instruction *B = BI->first;
StrideDescriptor DesB = II->second;		StrideDescriptor DesB = BI->second;

		// Our code motion strategy implies that we can't have dependences
		// between accesses in an interleaved group and other accesses located
		// between the first and last member of the group. Note that this also
		// means that a group can't have more than one member at a given offset.
		// The accesses in a group can have dependences with other accesses, but
		// we must ensure we don't extend the boundaries of the group such that
		// we encompass those dependent accesses.
		//
		// For example, assume we have the sequence of accesses shown below in a
		// stride-2 loop:
		//
		//   (1, 2) is a group | A[i]   = a;  // (1)
		//                     | A[i-1] = b;  // (2) |
		//                       A[i-3] = c;  // (3)
		//                       A[i]   = d;  // (4) | (2, 4) is not a group
		//
		// Because accesses (2) and (3) are dependent, we can group (2) with (1)
		// but not with (4). If we did, the dependent access (3) would be within
		// the boundaries of the (2, 4) group.
		if (!canReorderMemAccessesForInterleavedGroups(&*BI, &*AI)) {

		// If a dependence exists and B is already in a group, we know that B
		// must be a store since B precedes A and WAR dependences are allowed.
		// Thus, B would be sunk below A. We release B's group to prevent this
		// illegal code motion. B will then be free to form another group with
		// instructions that precede it.
		if (isInterleaved(B)) {
		InterleaveGroup *StoreGroup = getInterleaveGroup(B);
		StoreGroups.remove(StoreGroup);
		releaseGroup(StoreGroup);
		}

		// If a dependence exists and B is not already in a group (or it was
		// and we just released it), A might be hoisted above B (if A is a
		// load) or another store might be sunk below B (if A is a store). In
		// either case, we can't add additional instructions to A's group. A
		// will only form a group with instructions that it precedes.
		break;
		}

		// At this point, we've checked for illegal code motion. If either A or B
		// isn't strided, there's nothing left to do.
		if (!isStrided(DesA.Stride) || !isStrided(DesB.Stride))
		continue;

// Ignore if B is already in a group or B is a different memory operation.		// Ignore if B is already in a group or B is a different memory operation.
if (isInterleaved(B) || A->mayReadFromMemory() != B->mayReadFromMemory())		if (isInterleaved(B) || A->mayReadFromMemory() != B->mayReadFromMemory())
continue;		continue;

// Check the rule 1 and 2.		// Check the rule 1 and 2.
if (DesB.Stride != DesA.Stride || DesB.Size != DesA.Size)		if (DesB.Stride != DesA.Stride || DesB.Size != DesA.Size)
continue;		continue;
▲ Show 20 Lines • Show All 1,244 Lines • Show Last 20 Lines

llvm/trunk/test/Transforms/LoopVectorize/interleaved-accesses.ll

Show First 20 Lines • Show All 549 Lines • ▼ Show 20 Lines	for.body: ; preds = %for.body, %entry
%b = getelementptr inbounds %struct.IntFloat, %struct.IntFloat* %A, i64 %indvars.iv, i32 1		%b = getelementptr inbounds %struct.IntFloat, %struct.IntFloat* %A, i64 %indvars.iv, i32 1
%tmp1 = load float, float* %b, align 4		%tmp1 = load float, float* %b, align 4
%add3 = fadd fast float %SumB.014, %tmp1		%add3 = fadd fast float %SumB.014, %tmp1
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, 1024		%exitcond = icmp eq i64 %indvars.iv.next, 1024
br i1 %exitcond, label %for.cond.cleanup, label %for.body		br i1 %exitcond, label %for.cond.cleanup, label %for.body
}		}

		; Check vectorization of interleaved access groups in the presence of
		; dependences (PR27626). The following tests check that we don't reorder
		; dependent loads and stores when generating code for interleaved access
		; groups. Stores should be scalarized because the required code motion would
		; break dependences, and the remaining interleaved load groups should have
		; gaps.

		; PR27626_0: Ensure a strided store is not moved after a dependent (zero
		; distance) strided load.

		; void PR27626_0(struct pair *p, int z, int n) {
		; for (int i = 0; i < n; i++) {
		; p[i].x = z;
		; p[i].y = p[i].x;
		; }
		; }

		; CHECK-LABEL: @PR27626_0(
		; CHECK: min.iters.checked:
		; CHECK: %n.mod.vf = and i64 %[[N:.+]], 3
		; CHECK: %[[IsZero:[a-zA-Z0-9]+]] = icmp eq i64 %n.mod.vf, 0
		; CHECK: %[[R:[a-zA-Z0-9]+]] = select i1 %[[IsZero]], i64 4, i64 %n.mod.vf
		; CHECK: %n.vec = sub i64 %[[N]], %[[R]]
		; CHECK: vector.body:
		; CHECK: %[[L1:.+]] = load <8 x i32>, <8 x i32>* {{.*}}
		; CHECK: %[[X1:.+]] = extractelement <8 x i32> %[[L1]], i32 0
		; CHECK: store i32 %[[X1]], {{.*}}
		; CHECK: %[[X2:.+]] = extractelement <8 x i32> %[[L1]], i32 2
		; CHECK: store i32 %[[X2]], {{.*}}
		; CHECK: %[[X3:.+]] = extractelement <8 x i32> %[[L1]], i32 4
		; CHECK: store i32 %[[X3]], {{.*}}
		; CHECK: %[[X4:.+]] = extractelement <8 x i32> %[[L1]], i32 6
		; CHECK: store i32 %[[X4]], {{.*}}

		%pair.i32 = type { i32, i32 }
		define void @PR27626_0(%pair.i32 *%p, i32 %z, i64 %n) {
		entry:
		br label %for.body

		for.body:
		%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
		%p_i.x = getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %i, i32 0
		%p_i.y = getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %i, i32 1
		store i32 %z, i32* %p_i.x, align 4
		%0 = load i32, i32* %p_i.x, align 4
		store i32 %0, i32* %p_i.y, align 4
		%i.next = add nuw nsw i64 %i, 1
		%cond = icmp slt i64 %i.next, %n
		br i1 %cond, label %for.body, label %for.end

		for.end:
		ret void
		}
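To make the hazard behind PR27626_0 concrete, the following is a minimal hand-written C model, not the vectorizer's actual output. It assumes a vectorization factor of 4 and a trip count that is a multiple of 4; the names `pr27626_0_scalar`, `pr27626_0_sunk_store`, and `pr27626_0_mismatches` are hypothetical. The second function models what would happen if the store to `x` were sunk into a group with the store to `y`: the wide load of `x` then runs first and reads stale data.

```c
#include <assert.h>

struct pair { int x; int y; };

/* Scalar reference: the loop from the PR27626_0 comment. */
static void pr27626_0_scalar(struct pair *p, int z, int n) {
  for (int i = 0; i < n; i++) {
    p[i].x = z;
    p[i].y = p[i].x;
  }
}

/* Hypothetical unsafe schedule for VF = 4 (an assumption for illustration):
   the x store is sunk to the position of the y store, so the wide load of x
   executes before it. No remainder loop: n must be a multiple of 4. */
static void pr27626_0_sunk_store(struct pair *p, int z, int n) {
  assert(n % 4 == 0);
  for (int i = 0; i < n; i += 4) {
    int old_x[4];
    for (int l = 0; l < 4; l++)
      old_x[l] = p[i + l].x;       /* wide load now precedes the x store */
    for (int l = 0; l < 4; l++) {  /* interleaved store group {x, y} */
      p[i + l].x = z;
      p[i + l].y = old_x[l];
    }
  }
}

/* Counts the elements on which the two schedules disagree. */
int pr27626_0_mismatches(int z) {
  struct pair a[8], b[8];
  int diff = 0;
  for (int i = 0; i < 8; i++) {
    a[i].x = b[i].x = i;
    a[i].y = b[i].y = -1;
  }
  pr27626_0_scalar(a, z, 8);
  pr27626_0_sunk_store(b, z, 8);
  for (int i = 0; i < 8; i++)
    diff += (a[i].x != b[i].x) || (a[i].y != b[i].y);
  return diff;
}
```

With `z = 42`, every `y` field differs between the two schedules, which is why the test expects the stores to be scalarized and kept in place.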

		; PR27626_1: Ensure a strided load is not moved before a dependent (zero
		; distance) strided store.

		; int PR27626_1(struct pair *p, int n) {
		;   int s = 0;
		;   for (int i = 0; i < n; i++) {
		;     p[i].y = p[i].x;
		;     s += p[i].y;
		;   }
		;   return s;
		; }

		; CHECK-LABEL: @PR27626_1(
		; CHECK: min.iters.checked:
		; CHECK: %n.mod.vf = and i64 %[[N:.+]], 3
		; CHECK: %[[IsZero:[a-zA-Z0-9]+]] = icmp eq i64 %n.mod.vf, 0
		; CHECK: %[[R:[a-zA-Z0-9]+]] = select i1 %[[IsZero]], i64 4, i64 %n.mod.vf
		; CHECK: %n.vec = sub i64 %[[N]], %[[R]]
		; CHECK: vector.body:
		; CHECK: %[[Phi:.+]] = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ {{.*}}, %vector.body ]
		; CHECK: %[[L1:.+]] = load <8 x i32>, <8 x i32>* {{.*}}
		; CHECK: %[[X1:.+]] = extractelement <8 x i32> %[[L1]], i32 0
		; CHECK: store i32 %[[X1]], {{.*}}
		; CHECK: %[[X2:.+]] = extractelement <8 x i32> %[[L1]], i32 2
		; CHECK: store i32 %[[X2]], {{.*}}
		; CHECK: %[[X3:.+]] = extractelement <8 x i32> %[[L1]], i32 4
		; CHECK: store i32 %[[X3]], {{.*}}
		; CHECK: %[[X4:.+]] = extractelement <8 x i32> %[[L1]], i32 6
		; CHECK: store i32 %[[X4]], {{.*}}
		; CHECK: %[[L2:.+]] = load <8 x i32>, <8 x i32>* {{.*}}
		; CHECK: %[[S1:.+]] = shufflevector <8 x i32> %[[L2]], <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
		; CHECK: add nsw <4 x i32> %[[S1]], %[[Phi]]

		define i32 @PR27626_1(%pair.i32 *%p, i64 %n) {
		entry:
		br label %for.body

		for.body:
		%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
		%s = phi i32 [ %2, %for.body ], [ 0, %entry ]
		%p_i.x = getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %i, i32 0
		%p_i.y = getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %i, i32 1
		%0 = load i32, i32* %p_i.x, align 4
		store i32 %0, i32* %p_i.y, align 4
		%1 = load i32, i32* %p_i.y, align 4
		%2 = add nsw i32 %1, %s
		%i.next = add nuw nsw i64 %i, 1
		%cond = icmp slt i64 %i.next, %n
		br i1 %cond, label %for.body, label %for.end

		for.end:
		%3 = phi i32 [ %2, %for.body ]
		ret i32 %3
		}
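The load-side hazard in PR27626_1 can be modeled the same way. This is a hand-written sketch under stated assumptions (VF = 4, eight elements, `x` initialized to 1..8 and `y` to 0), and `pr27626_1_sum` is a hypothetical helper. When `hoist_loads` is nonzero it models a load group `{x, y}` emitted at the position of the first load, so `y` is read before the store that should feed it.

```c
#include <assert.h>

struct pair { int x; int y; };

/* Runs the PR27626_1 loop on a fresh array and returns s.
   hoist_loads == 0: scalar program order.
   hoist_loads != 0: hypothetical unsafe VF = 4 schedule in which the
   loads of x and y form one group placed at the first load. */
int pr27626_1_sum(int hoist_loads) {
  enum { N = 8, VF = 4 };
  struct pair p[N];
  int s = 0;

  assert(N % VF == 0);  /* the model has no remainder loop */
  for (int i = 0; i < N; i++) {
    p[i].x = i + 1;
    p[i].y = 0;
  }

  if (!hoist_loads) {
    for (int i = 0; i < N; i++) {
      p[i].y = p[i].x;
      s += p[i].y;
    }
    return s;
  }

  for (int i = 0; i < N; i += VF) {
    int x[VF], y[VF];
    for (int l = 0; l < VF; l++) {  /* load group {x, y} */
      x[l] = p[i + l].x;
      y[l] = p[i + l].y;            /* stale: read before the store below */
    }
    for (int l = 0; l < VF; l++)
      p[i + l].y = x[l];
    for (int l = 0; l < VF; l++)
      s += y[l];
  }
  return s;
}
```

The scalar order sums the freshly stored values, while the hoisted schedule sums the old zeros. This is consistent with the CHECK lines above, which expect a second wide load after the scalarized stores.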

		; PR27626_2: Ensure a strided store is not moved after a dependent (negative
		; distance) strided load.

		; void PR27626_2(struct pair *p, int z, int n) {
		;   for (int i = 0; i < n; i++) {
		;     p[i].x = z;
		;     p[i].y = p[i - 1].x;
		;   }
		; }

		; CHECK-LABEL: @PR27626_2(
		; CHECK: min.iters.checked:
		; CHECK: %n.mod.vf = and i64 %[[N:.+]], 3
		; CHECK: %[[IsZero:[a-zA-Z0-9]+]] = icmp eq i64 %n.mod.vf, 0
		; CHECK: %[[R:[a-zA-Z0-9]+]] = select i1 %[[IsZero]], i64 4, i64 %n.mod.vf
		; CHECK: %n.vec = sub i64 %[[N]], %[[R]]
		; CHECK: vector.body:
		; CHECK: %[[L1:.+]] = load <8 x i32>, <8 x i32>* {{.*}}
		; CHECK: %[[X1:.+]] = extractelement <8 x i32> %[[L1]], i32 0
		; CHECK: store i32 %[[X1]], {{.*}}
		; CHECK: %[[X2:.+]] = extractelement <8 x i32> %[[L1]], i32 2
		; CHECK: store i32 %[[X2]], {{.*}}
		; CHECK: %[[X3:.+]] = extractelement <8 x i32> %[[L1]], i32 4
		; CHECK: store i32 %[[X3]], {{.*}}
		; CHECK: %[[X4:.+]] = extractelement <8 x i32> %[[L1]], i32 6
		; CHECK: store i32 %[[X4]], {{.*}}

		define void @PR27626_2(%pair.i32 *%p, i64 %n, i32 %z) {
		entry:
		br label %for.body

		for.body:
		%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
		%i_minus_1 = add nsw i64 %i, -1
		%p_i.x = getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %i, i32 0
		%p_i_minus_1.x = getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %i_minus_1, i32 0
		%p_i.y = getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %i, i32 1
		store i32 %z, i32* %p_i.x, align 4
		%0 = load i32, i32* %p_i_minus_1.x, align 4
		store i32 %0, i32* %p_i.y, align 4
		%i.next = add nuw nsw i64 %i, 1
		%cond = icmp slt i64 %i.next, %n
		br i1 %cond, label %for.body, label %for.end

		for.end:
		ret void
		}

		; PR27626_3: Ensure a strided load is not moved before a dependent (negative
		; distance) strided store.

		; int PR27626_3(struct pair *p, int z, int n) {
		;   int s = 0;
		;   for (int i = 0; i < n; i++) {
		;     p[i + 1].y = p[i].x;
		;     s += p[i].y;
		;   }
		;   return s;
		; }

		; CHECK-LABEL: @PR27626_3(
		; CHECK: min.iters.checked:
		; CHECK: %n.mod.vf = and i64 %[[N:.+]], 3
		; CHECK: %[[IsZero:[a-zA-Z0-9]+]] = icmp eq i64 %n.mod.vf, 0
		; CHECK: %[[R:[a-zA-Z0-9]+]] = select i1 %[[IsZero]], i64 4, i64 %n.mod.vf
		; CHECK: %n.vec = sub i64 %[[N]], %[[R]]
		; CHECK: vector.body:
		; CHECK: %[[Phi:.+]] = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ {{.*}}, %vector.body ]
		; CHECK: %[[L1:.+]] = load <8 x i32>, <8 x i32>* {{.*}}
		; CHECK: %[[X1:.+]] = extractelement <8 x i32> %[[L1]], i32 0
		; CHECK: store i32 %[[X1]], {{.*}}
		; CHECK: %[[X2:.+]] = extractelement <8 x i32> %[[L1]], i32 2
		; CHECK: store i32 %[[X2]], {{.*}}
		; CHECK: %[[X3:.+]] = extractelement <8 x i32> %[[L1]], i32 4
		; CHECK: store i32 %[[X3]], {{.*}}
		; CHECK: %[[X4:.+]] = extractelement <8 x i32> %[[L1]], i32 6
		; CHECK: store i32 %[[X4]], {{.*}}
		; CHECK: %[[L2:.+]] = load <8 x i32>, <8 x i32>* {{.*}}
		; CHECK: %[[S1:.+]] = shufflevector <8 x i32> %[[L2]], <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
		; CHECK: add nsw <4 x i32> %[[S1]], %[[Phi]]

		define i32 @PR27626_3(%pair.i32 *%p, i64 %n, i32 %z) {
		entry:
		br label %for.body

		for.body:
		%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
		%s = phi i32 [ %2, %for.body ], [ 0, %entry ]
		%i_plus_1 = add nuw nsw i64 %i, 1
		%p_i.x = getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %i, i32 0
		%p_i.y = getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %i, i32 1
		%p_i_plus_1.y = getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %i_plus_1, i32 1
		%0 = load i32, i32* %p_i.x, align 4
		store i32 %0, i32* %p_i_plus_1.y, align 4
		%1 = load i32, i32* %p_i.y, align 4
		%2 = add nsw i32 %1, %s
		%i.next = add nuw nsw i64 %i, 1
		%cond = icmp slt i64 %i.next, %n
		br i1 %cond, label %for.body, label %for.end

		for.end:
		%3 = phi i32 [ %2, %for.body ]
		ret i32 %3
		}

		; PR27626_4: Ensure we form an interleaved group for strided stores in the
		; presence of a write-after-write dependence. We create a group for
		; (2) and (3) while excluding (1).

		; void PR27626_4(int *a, int x, int y, int z, int n) {
		;   for (int i = 0; i < n; i += 2) {
		;     a[i] = x;     // (1)
		;     a[i] = y;     // (2)
		;     a[i + 1] = z; // (3)
		;   }
		; }

		; CHECK-LABEL: @PR27626_4(
		; CHECK: vector.ph:
		; CHECK: %[[INS_Y:.+]] = insertelement <4 x i32> undef, i32 %y, i32 0
		; CHECK: %[[SPLAT_Y:.+]] = shufflevector <4 x i32> %[[INS_Y]], <4 x i32> undef, <4 x i32> zeroinitializer
		; CHECK: %[[INS_Z:.+]] = insertelement <4 x i32> undef, i32 %z, i32 0
		; CHECK: %[[SPLAT_Z:.+]] = shufflevector <4 x i32> %[[INS_Z]], <4 x i32> undef, <4 x i32> zeroinitializer
		; CHECK: vector.body:
		; CHECK: store i32 %x, {{.*}}
		; CHECK: store i32 %x, {{.*}}
		; CHECK: store i32 %x, {{.*}}
		; CHECK: store i32 %x, {{.*}}
		; CHECK: %[[VEC:.+]] = shufflevector <4 x i32> %[[SPLAT_Y]], <4 x i32> %[[SPLAT_Z]], <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
		; CHECK: store <8 x i32> %[[VEC]], {{.*}}

		define void @PR27626_4(i32 *%a, i32 %x, i32 %y, i32 %z, i64 %n) {
		entry:
		br label %for.body

		for.body:
		%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
		%i_plus_1 = add i64 %i, 1
		%a_i = getelementptr inbounds i32, i32* %a, i64 %i
		%a_i_plus_1 = getelementptr inbounds i32, i32* %a, i64 %i_plus_1
		store i32 %x, i32* %a_i, align 4
		store i32 %y, i32* %a_i, align 4
		store i32 %z, i32* %a_i_plus_1, align 4
		%i.next = add nuw nsw i64 %i, 2
		%cond = icmp slt i64 %i.next, %n
		br i1 %cond, label %for.body, label %for.end

		for.end:
		ret void
		}
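The grouping that PR27626_4 expects can be checked against the scalar loop with a small C model. This is a sketch, not generated code: it assumes VF = 4 (eight array elements per vector iteration) and a trip count with no remainder, and the function names are hypothetical. Store (1) stays scalar and keeps its position ahead of the single wide store that covers (2) and (3).

```c
#include <assert.h>

/* Scalar reference: the loop from the PR27626_4 comment. */
static void pr27626_4_scalar(int *a, int x, int y, int z, int n) {
  for (int i = 0; i < n; i += 2) {
    a[i] = x;      /* (1) */
    a[i] = y;      /* (2) */
    a[i + 1] = z;  /* (3) */
  }
}

/* Model of the expected schedule for VF = 4 (an assumption): (1) is
   scalarized first, then (2) and (3) are written by one interleaved
   store. No remainder loop: n must be a multiple of 8. */
static void pr27626_4_grouped(int *a, int x, int y, int z, int n) {
  assert(n % 8 == 0);
  for (int i = 0; i < n; i += 8) {
    for (int l = 0; l < 4; l++)
      a[i + 2 * l] = x;            /* (1), scalarized */
    for (int l = 0; l < 4; l++) {  /* models one <8 x i32> store: y,z,y,z,... */
      a[i + 2 * l] = y;
      a[i + 2 * l + 1] = z;
    }
  }
}

/* Counts the elements on which the two schedules disagree. */
int pr27626_4_mismatches(int x, int y, int z) {
  int a[16] = {0}, b[16] = {0};
  int diff = 0;
  pr27626_4_scalar(a, x, y, z, 16);
  pr27626_4_grouped(b, x, y, z, 16);
  for (int i = 0; i < 16; i++)
    diff += a[i] != b[i];
  return diff;
}
```

Because (1) still executes before the wide store, the write-after-write order on `a[i]` is preserved and the two schedules agree on every element.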

		; PR27626_5: Ensure we do not form an interleaved group for strided stores in
		; the presence of a write-after-write dependence.

		; void PR27626_5(int *a, int x, int y, int z, int n) {
		;   for (int i = 3; i < n; i += 2) {
		;     a[i - 1] = x;
		;     a[i - 3] = y;
		;     a[i] = z;
		;   }
		; }

		; CHECK-LABEL: @PR27626_5(
		; CHECK: vector.body:
		; CHECK: store i32 %x, {{.*}}
		; CHECK: store i32 %x, {{.*}}
		; CHECK: store i32 %x, {{.*}}
		; CHECK: store i32 %x, {{.*}}
		; CHECK: store i32 %y, {{.*}}
		; CHECK: store i32 %y, {{.*}}
		; CHECK: store i32 %y, {{.*}}
		; CHECK: store i32 %y, {{.*}}
		; CHECK: store i32 %z, {{.*}}
		; CHECK: store i32 %z, {{.*}}
		; CHECK: store i32 %z, {{.*}}
		; CHECK: store i32 %z, {{.*}}

		define void @PR27626_5(i32 *%a, i32 %x, i32 %y, i32 %z, i64 %n) {
		entry:
		br label %for.body

		for.body:
		%i = phi i64 [ %i.next, %for.body ], [ 3, %entry ]
		%i_minus_1 = sub i64 %i, 1
		%i_minus_3 = sub i64 %i_minus_1, 2
		%a_i = getelementptr inbounds i32, i32* %a, i64 %i
		%a_i_minus_1 = getelementptr inbounds i32, i32* %a, i64 %i_minus_1
		%a_i_minus_3 = getelementptr inbounds i32, i32* %a, i64 %i_minus_3
		store i32 %x, i32* %a_i_minus_1, align 4
		store i32 %y, i32* %a_i_minus_3, align 4
		store i32 %z, i32* %a_i, align 4
		%i.next = add nuw nsw i64 %i, 2
		%cond = icmp slt i64 %i.next, %n
		br i1 %cond, label %for.body, label %for.end

		for.end:
		ret void
		}
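One plausible reading of why no group may be formed in PR27626_5 is sketched below in C. The model is hand-written and assumes VF = 4; the function names are hypothetical. It groups `a[i - 1]` with `a[i]` and emits them as one wide store at the position of the last store, which moves the `x` stores past the `y` stores that overwrite them in later scalar iterations.

```c
#include <assert.h>

/* Scalar reference: the loop from the PR27626_5 comment. */
static void pr27626_5_scalar(int *a, int x, int y, int z, int n) {
  for (int i = 3; i < n; i += 2) {
    a[i - 1] = x;
    a[i - 3] = y;
    a[i] = z;
  }
}

/* Hypothetical unsafe schedule for VF = 4: the stores to a[i - 1] and
   a[i] are grouped and sunk below the scalarized y stores. No remainder
   loop: the scalar trip count must be a multiple of 4. */
static void pr27626_5_grouped(int *a, int x, int y, int z, int n) {
  int iters = (n - 2) / 2;  /* number of scalar iterations for n > 3 */
  assert(n > 3 && iters % 4 == 0);
  for (int i = 3; i + 6 < n; i += 8) {
    for (int l = 0; l < 4; l++)
      a[i + 2 * l - 3] = y;        /* scalarized y stores */
    for (int l = 0; l < 4; l++) {  /* wide store: x,z,x,z,... */
      a[i + 2 * l - 1] = x;
      a[i + 2 * l] = z;
    }
  }
}

/* Counts the elements on which the two schedules disagree. */
int pr27626_5_mismatches(int x, int y, int z) {
  int a[11] = {0}, b[11] = {0};
  int diff = 0;
  pr27626_5_scalar(a, x, y, z, 11);
  pr27626_5_grouped(b, x, y, z, 11);
  for (int i = 0; i < 11; i++)
    diff += a[i] != b[i];
  return diff;
}
```

In the scalar loop, `a[2]`, `a[4]`, and `a[6]` end up holding `y`, since each is written with `x` in one iteration and with `y` in the next. In the grouped model they end up holding `x`, so the CHECK lines require all twelve stores to remain scalar and in program order.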

attributes #0 = { "unsafe-fp-math"="true" }
