This is an archive of the discontinued LLVM Phabricator instance.

Differential D110235

[LoopVectorize] Support reductions that store intermediary result
ClosedPublic

Authored by igor.kirillov on Sep 22 2021, 5:51 AM.

Details

Reviewers

fhahn
Ayal
david-arm
kmclaughlin
peterwaller-arm
reames

Commits

rG4e5e042d9a4a: [LoopVectorize] Support reductions that store intermediary result

Summary

Adds ability to vectorize loops containing a store to a loop-invariant address as part of a reduction that isn't converted to SSA form due to lack of aliasing info. Runtime checks are generated to ensure the store does not alias any other accesses in the loop.
Ordered fadd reductions are not yet supported.
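
As a rough illustration of the summary, the sketch below models the loop shape in plain C++: a scalar reference that stores the running sum to a loop-invariant address, and a hand-written stand-in for the transformed loop with a runtime alias check and the store sunk below the loop. The names `reduceWithStore` and `reduceWithSunkStore`, the index 42 and the 4-lane partial sums are assumptions made for illustration; the actual patch operates on LLVM IR and generates its own checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference: the running sum is written to the loop-invariant
// address dst[42] on every iteration.
void reduceWithStore(std::uint32_t *dst, const std::uint32_t *src,
                     std::size_t n) {
  std::uint32_t sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += src[i];
    dst[42] = sum;
  }
}

// Hypothetical model of the transformed loop: a runtime check guards a
// 4-lane reduction whose single store is sunk below the loop.
void reduceWithSunkStore(std::uint32_t *dst, const std::uint32_t *src,
                         std::size_t n) {
  assert(dst != nullptr && (src != nullptr || n == 0));
  if (n == 0)
    return; // the scalar loop would not store anything either

  // Runtime check: does &dst[42] lie inside [src, src + n)?
  const auto cell = reinterpret_cast<std::uintptr_t>(dst + 42);
  const auto begin = reinterpret_cast<std::uintptr_t>(src);
  const auto end = reinterpret_cast<std::uintptr_t>(src + n);
  if (cell >= begin && cell < end) {
    reduceWithStore(dst, src, n); // may alias: keep the scalar loop
    return;
  }

  std::uint32_t lanes[4] = {0, 0, 0, 0};
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4)
    for (std::size_t l = 0; l < 4; ++l)
      lanes[l] += src[i + l];
  std::uint32_t sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
  for (; i < n; ++i) // scalar remainder
    sum += src[i];
  dst[42] = sum; // the only store, executed once after the loop
}
```

Unsigned arithmetic is used so that reassociating the additions is always valid; the aliasing fallback matters because a store into `src` would change values read by later iterations.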

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn added inline comments. Oct 11 2021, 2:37 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
935	What about loads to the same address in the loop? At the moment, `LAA` cannot analyze dependences with invariant addresses. But if this limitation gets removed the code here may become incorrect, because it relies on this limitation IIUC?

david-arm added inline comments. Oct 11 2021, 3:51 AM

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
642	I assume here that InvariantStores refers to stores to an invariant (i.e. uniform?) address, rather than storing an invariant value to variant address. If so, perhaps it could be named StoresToInvariantAddress or something like that?

david-arm added inline comments. Oct 11 2021, 6:53 AM

llvm/test/Transforms/LoopVectorize/reduction.ll
471	Is it possible to add a simple floating point test with "fadd fast"?

It seems like the main problem is that we potentially bail out too early at the moment when checking for reductions due to the store, but once we generate runtime checks, sinking the store may become legal (see inline comment about loads to the same address)? If that's the case, ideally we'd just sink any such loads/stores before detecting reductions once we know they can be sunk due to runtime checks, but unfortunately I do not think that's possible with the current structure/ordering.

Is it worth my exploring whether it is possible to do it the other way around, or should we continue with the solution from this patch?

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
642	@fhah Just to make sure we are on the same page: for me uniform and invariant are just synonyms, is that right?
llvm/include/llvm/Transforms/Utils/LoopUtils.h
402 ↗	(On Diff #378503)	What do you think if I make this function a public member of `class ScalarEvolution`? Or is there a better place for it?
llvm/lib/Analysis/IVDescriptors.cpp
315	I checked and only `LoopInterchangeLegality` is using this function and it is not affected by stores. Anyway, I can add a parameter to `RecurrenceDescriptor::isReductionPHI` or a member to `RecurrenceDescriptor` that controls whether stores are handled. What do you think? The comment needs updating, yes.
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
935	Loads from or stores to the same address in the loop? I'm sorry could you clarify what the problem is. As it is I don't understand the message.

Hi @igor.kirillov, is it also possible to get this working for ordered reductions, i.e.

float sum = 0;
for(i=0..N) {
  sum += src[i];
  dst[42] = sum;
}

when building with -O3? I think it might mean updating checkOrderedReductions to look through the store. If it looks too difficult to do as part of this patch we can always follow-up with a patch later.
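
To show why ordered (strict) floating-point reductions need separate handling, here is a small sketch comparing an in-order sum with a reassociated two-lane sum, roughly what a vectorizer may produce when reordering is allowed. The function names and the lane count are illustrative assumptions, not part of the patch.

```cpp
#include <cassert>
#include <cstddef>

// Strict (ordered) reduction: additions happen exactly in source order.
float sumOrdered(const float *src, std::size_t n) {
  assert(src != nullptr || n == 0);
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i)
    sum += src[i];
  return sum;
}

// Reassociated reduction: two interleaved partial sums that are only
// combined after the main loop, as with a vectorization factor of 2.
float sumTwoLanes(const float *src, std::size_t n) {
  assert(src != nullptr || n == 0);
  float lane0 = 0.0f;
  float lane1 = 0.0f;
  std::size_t i = 0;
  for (; i + 2 <= n; i += 2) {
    lane0 += src[i];
    lane1 += src[i + 1];
  }
  float sum = lane0 + lane1;
  for (; i < n; ++i) // scalar remainder
    sum += src[i];
  return sum;
}
```

With IEEE-754 single precision and no excess precision, the input `{1e8f, 1.0f, -1e8f, 1.0f}` gives `1.0f` in order but `2.0f` with two lanes, because `1e8f + 1.0f` rounds back to `1e8f`. An intermediate store of `sum` makes every in-order partial result observable, which is one reason the ordered case is harder.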

Update commit message and comments
Move storeToSameAddress to ScalarEvolution
Add fadd fast test
Do not apply patch for ordered fadd reductions

In D110235#3072725, @david-arm wrote:
Hi @igor.kirillov, is it also possible to get this working for ordered reductions, i.e.
float sum = 0;
for(i=0..N) {
  sum += src[i];
  dst[42] = sum;
}
when building with -O3? I think it might mean updating checkOrderedReductions to look through the store. If it looks too difficult to do as part of this patch we can always follow-up with a patch later.

Yes, I added a check to LoopVectorizationLegality::canVectorizeFPMath so as not to allow this reduction when math is strict. Enabling it requires some work and it is better to do it separately.

llvm/test/Transforms/LoopVectorize/reduction.ll
471	Added! see reduc_store_fadd_fast function

igor.kirillov edited the summary of this revision. Nov 7 2021, 5:02 AM

igor.kirillov marked an inline comment as done.

Harbormaster completed remote builds in B132898: Diff 385336. Nov 7 2021, 5:43 AM

david-arm added inline comments. Nov 8 2021, 6:05 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
984	Maybe this can be folded into the `all_of` case below, i.e.

  return (all_of(getReductionVars(), [&](auto &Reduction) -> bool {
    const RecurrenceDescriptor &RdxDesc = Reduction.second;
    return !RdxDesc.hasExactFPMath() ||
           (RdxDesc.isOrdered() && !RdxDesc.IntermediateStore);
  }));

Also, the problem with this code at the moment is that you could have a mixture of fast and ordered reductions in the same loop. There could be an intermediate store for one of the fast reductions, but not for the ordered ones. At the moment with your code we will just bail out in this case.
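
The per-reduction predicate suggested above can be tried out in isolation. The sketch below is a standalone model, not LLVM code: `ReductionModel` is a hypothetical stand-in for the three `RecurrenceDescriptor` properties the comment mentions, and `canVectorizeFPMathModel` only mirrors the suggested `all_of` expression (the real function has additional early exits).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the RecurrenceDescriptor properties used in
// the review comment; the field names are illustrative.
struct ReductionModel {
  bool HasExactFPMath;       // FP semantics must be preserved
  bool IsOrdered;            // can be vectorized as an in-order reduction
  bool HasIntermediateStore; // value is also stored to an invariant address
};

// A loop is accepted only if every reduction is either free to
// reassociate, or ordered and without an intermediate store.
bool canVectorizeFPMathModel(const std::vector<ReductionModel> &Reductions) {
  return std::all_of(Reductions.begin(), Reductions.end(),
                     [](const ReductionModel &R) {
                       return !R.HasExactFPMath ||
                              (R.IsOrdered && !R.HasIntermediateStore);
                     });
}
```

Checking each reduction separately means a loop that mixes a fast reduction with an intermediate store and an ordered reduction without one is still accepted, which is the mixed case the comment is concerned about.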

Update intermediate store check for ordered fadd vectorization

igor.kirillov added inline comments. Nov 9 2021, 7:24 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
984	You are right! It can be done much simpler and the check is needed only when ordered reduction is present.

Harbormaster completed remote builds in B133249: Diff 385810. Nov 9 2021, 7:46 AM

david-arm added inline comments. Nov 10 2021, 5:40 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
937	Hi @igor.kirillov, I think you're missing a test case here. Suppose we have two stores in the loop to the same invariant address. If the first store is predicated, but the second isn't we should still vectorise because the second store wins and the first one can be removed. At least the code here suggests that. I couldn't find any test that exercised this code path.
945	I think you might be missing a test case for this. Can you make sure this code path is exercised by your existing tests, please?
llvm/test/Transforms/LoopVectorize/reduction.ll
477	Can you put all the CHECK lines in the same place near the top of the function please to be consistent with the existing tests?
517	Hi @igor.kirillov, this doesn't look right. The test is called `@reduc_store_fadd_fast`, but there is no `fast` keyword on the `fadd` instruction. I don't think we should even be vectorising this because it requires ordered reductions, which you haven't enabled for this test. Can you change the name of this to `@reduc_store_fadd_ordered` and investigate why we are vectorising this? Can you also add a separate test called `@reduc_store_fadd_fast` that actually has the `fast` keyword too?

Update tests

Harbormaster completed remote builds in B134854: Diff 388126. Nov 18 2021, 2:03 AM

igor.kirillov marked 2 inline comments as done and an inline comment as not done. Nov 18 2021, 3:06 AM

igor.kirillov added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
937	Added a `reduc_store_final_store_predicated` test that addresses both requests.
945	It is now covered by the new `reduc_store_final_store_predicated` test, and the old test `reduc_store_inside_unrolled` also executes this path.
llvm/test/Transforms/LoopVectorize/reduction.ll
517	I missed the `fast` keyword there. As for why this code gets vectorized: it happens because `Hints->allowReordering()` returns true in `LoopVectorizationLegality::canVectorizeFPMath`. As you can see, the test specifies the vector width (-force-vector-width=4), and in that case LLVM allows ordered instructions to be processed in an unordered manner (see also the `LoopVectorizeHints::allowReordering` function). I added a new test with the `fast` keyword to `AArch64/strict-fadd.ll` and the loop is not vectorized there.

david-arm added inline comments. Nov 18 2021, 3:10 AM

llvm/test/Transforms/LoopVectorize/reduction.ll
656	Can you also add a test where the first store is predicated and the second one isn't? According to the code changes in this patch we should vectorise this case because the second one overrides the first.
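
The "second store overrides the first" argument can be sketched in plain C++. The loop below is a hypothetical model, not the actual `reduc_store_middle_store_predicated` test: it processes elements in pairs, stores conditionally after the first addition and unconditionally after the second, and is compared against a version that stores only once after the loop.

```cpp
#include <cassert>
#include <cstddef>

// Two stores to dst[42] per iteration: the first is predicated, the
// second is unconditional and always executes last.
void storeTwice(int *dst, const int *src, std::size_t n) {
  assert(n % 2 == 0 && "this model processes elements in pairs");
  int sum = 0;
  for (std::size_t i = 0; i < n; i += 2) {
    sum += src[i];
    if (src[i] > 0)
      dst[42] = sum; // predicated store, always overwritten below
    sum += src[i + 1];
    dst[42] = sum; // unconditional store of the latest reduction value
  }
}

// Equivalent form when nothing else in the loop reads dst[42]: drop the
// predicated store and write the final sum once after the loop.
void storeOnceAfterLoop(int *dst, const int *src, std::size_t n) {
  assert(n % 2 == 0 && "this model processes elements in pairs");
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += src[i];
  if (n != 0)
    dst[42] = sum;
}
```

The equivalence relies on the unconditional store being the last write to the address in each iteration and on no load observing the predicated store in between.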

Add test

llvm/test/Transforms/LoopVectorize/reduction.ll
656	Added! See `reduc_store_middle_store_predicated`

fhahn added inline comments. Nov 18 2021, 3:44 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
935	The case I was thinking about was something like the snippet below, where we have a load of the invariant address in the loop (`%lv = load...` in the example below).

  define void @reduc_store(i32* %dst, i32* readonly %src, i32* noalias %dst.2) {
  entry:
    %arrayidx = getelementptr inbounds i32, i32* %dst, i64 42
    store i32 0, i32* %arrayidx, align 4
    br label %for.body

  for.body:
    %0 = phi i32 [ 0, %entry ], [ %add, %for.body ]
    %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
    %arrayidx1 = getelementptr inbounds i32, i32* %src, i64 %indvars.iv
    %1 = load i32, i32* %arrayidx1, align 4
    %add = add nsw i32 %0, %1
    %lv = load i32, i32* %arrayidx
    store i32 %add, i32* %arrayidx, align 4
    %gep.dst.2 = getelementptr inbounds i32, i32* %dst.2, i64 %indvars.iv
    store i32 %lv, i32* %gep.dst.2
    %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
    %exitcond = icmp eq i64 %indvars.iv.next, 1000
    br i1 %exitcond, label %for.cond.cleanup, label %for.body

  for.cond.cleanup:
    ret void
  }
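
A plain C++ rendering of this case may help to see the hazard. The two functions below are hypothetical models of the IR in the comment above: the first follows the scalar semantics, the second shows what would happen if the store were sunk below the loop while the load stayed inside it.

```cpp
#include <cassert>
#include <cstddef>

// Scalar semantics: each iteration loads the value stored by the
// previous iteration before overwriting it.
void loadThenStore(int *dst, const int *src, int *dst2, std::size_t n) {
  assert(dst != nullptr && dst2 != nullptr);
  dst[42] = 0;
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += src[i];
    const int lv = dst[42]; // load from the invariant address
    dst[42] = sum;          // store of the reduction value
    dst2[i] = lv;
  }
}

// Incorrect transformation: with the store sunk below the loop, every
// load sees the initial value instead of the previous running sum.
void loadWithSunkStore(int *dst, const int *src, int *dst2, std::size_t n) {
  assert(dst != nullptr && dst2 != nullptr);
  dst[42] = 0;
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += src[i];
    dst2[i] = dst[42];
  }
  if (n != 0)
    dst[42] = sum;
}
```

Both versions leave the same final value in `dst[42]`, but the values copied to `dst2` differ, so the store cannot be sunk while a load of the same address remains in the loop.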
llvm/test/Transforms/LoopVectorize/reduction.ll
468	Could you add a brief textual explanation of what the test covers?
482	perhaps rename to `%sum` or something like that, to make it a bit easier to read the test?
487	The names for the 2 different GEPs in the tests are very similar. Could you rename them to make it easier to distinguish them (e.g. something like `%gep.src`/`%gep.dst`)?

Hi @igor.kirillov, thanks for all the changes and the patch looks good now! I just had a couple of minor comments.

llvm/lib/Analysis/IVDescriptors.cpp
485	nit: Could you remove an extra level of indentation here before merging the patch? i.e. this can be written like this:

  if (isa<PHINode>(UI))
    PHIs.push_back(UI);
  else if (auto *SI = dyn_cast<StoreInst>(UI)) {
    ...
  } else
    NonPHIs.push_back(UI);
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
994	nit: I think you might be able to simplify the code here by removing `FoundMatchingRecurrence`. Then, lower down, instead of the `break` you can just do

  if (DSI && (DSI == SI)) {
    *IsPredicated = blockNeedsPredication(DSI->getParent());
    return true;
  }

and at the bottom of the function just do:

  return false;

Harbormaster completed remote builds in B134865: Diff 388148. Nov 18 2021, 4:18 AM

Refactoring a bit

igor.kirillov marked 8 inline comments as done. Nov 18 2021, 7:22 AM

igor.kirillov added inline comments.

llvm/lib/Analysis/IVDescriptors.cpp

485

I made an extra step and simplified it even more

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

935

In that case we do not vectorize; the rejection happens in LoopAccessInfo::analyzeLoop when loads are processed:

  for (LoadInst *LD : Loads) {
    Value *Ptr = LD->getPointerOperand();
...
    // See if there is an unsafe dependency between a load to a uniform address and
    // store to the same uniform address.
    if (UniformStores.count(Ptr)) {
      LLVM_DEBUG(dbgs() << "LAA: Found an unsafe dependency between a uniform "
                           "load and uniform store to the same address!\n");
      HasDependenceInvolvingLoopInvariantAddress = true;
    }
...

I added reduc_store_load test anyway
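
The rejection logic quoted above can be approximated outside LLVM. The sketch below is a simplified model under stated assumptions: `AccessModel` and `hasDependenceOnInvariantAddress` are hypothetical names, and pointer identity stands in for LAA's pointer comparison. It records the addresses of uniform stores and flags any load from one of them.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical description of one memory access in a loop.
struct AccessModel {
  const void *Ptr; // address accessed
  bool IsStore;    // store if true, load otherwise
  bool IsUniform;  // address is loop-invariant
};

// Returns true if some load reads an address that a uniform store in
// the same loop writes, mirroring the check in the quoted snippet.
bool hasDependenceOnInvariantAddress(const std::vector<AccessModel> &Accesses) {
  std::set<const void *> UniformStores;
  for (const AccessModel &A : Accesses)
    if (A.IsStore && A.IsUniform)
      UniformStores.insert(A.Ptr);
  for (const AccessModel &A : Accesses)
    if (!A.IsStore && UniformStores.count(A.Ptr) != 0)
      return true;
  return false;
}
```

In this model a loop with a store to `dst[42]` and a load from `dst[42]` is flagged, while the same store combined with loads from unrelated addresses is not.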

Harbormaster completed remote builds in B134896: Diff 388188. Nov 18 2021, 7:47 AM

Add scalable reduction test

Little extension in a new test

Harbormaster completed remote builds in B135418: Diff 388906. Nov 22 2021, 12:39 PM

LGTM! Thanks for making all the changes @igor.kirillov.

llvm/test/Transforms/LoopVectorize/reduction.ll
466	nit: I think this should be `invariant`

This revision is now accepted and ready to land. Nov 23 2021, 2:27 AM

Fix typo

igor.kirillov marked an inline comment as done. Nov 23 2021, 2:48 AM

fhahn added inline comments. Nov 23 2021, 2:48 AM

llvm/include/llvm/Analysis/IVDescriptors.h
267	nit: should be a doc-comment?
llvm/include/llvm/Analysis/LoopAccessAnalysis.h
583	nit: could return `ArrayRef`, not leaking any details about the underlying container.
642	nit: Is there a reason for choosing 5 for the size? if not, nowadays SmallVector can pick a good size automatically.
llvm/include/llvm/Analysis/ScalarEvolution.h
1120 ↗	(On Diff #388906)	Do you anticipate this to be used outside `Transforms/Vectorize`? If not I'm not sure if it makes sense to live here, and extending the API interface of `ScalarEvolution`. Can this instead be defined somewhere in `Transforms/Vectorize`?
llvm/lib/Analysis/IVDescriptors.cpp
248	needs documentation?
315	I guess it would be good to have at least a test case for loop-interchange to make sure it can deal with the change properly?
514	nit: for consistency with other code avoid using `!= nullptr`. This is more in line with the style used in LLVM in general (and you use `==/!= nullptr` in the adjacent code here).
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
935	Yeah, but this is only due to some limitations that currently exist in LAA, right? I think we should at least make this clear somewhere, e.g. in a comment.
941	nit: no need to use `llvm::`
997	nit: redundant `()`.
998	could we instead just do the check at the call site or is there a benefit of doing it here?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3144	We effectively sink the store outside the loop. In that case, I don't think we should create a recipe and also we should not consider its cost.

Harbormaster completed remote builds in B135583: Diff 389138. Nov 23 2021, 5:41 AM

mgabka added a subscriber: mgabka. Nov 29 2021, 1:56 AM

Lots of updates related to the recent comments

igor.kirillov marked 5 inline comments as done. Nov 29 2021, 11:09 AM

igor.kirillov added inline comments.

llvm/lib/Analysis/IVDescriptors.cpp
315	You are right. We actually should not do loop interchange if an invariant address is present. Otherwise it may produce an incorrect result.
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
935	Added comment

igor.kirillov marked 2 inline comments as done. Nov 29 2021, 11:09 AM

Harbormaster completed remote builds in B136491: Diff 390414. Nov 29 2021, 12:18 PM

Hi, @fhahn! Since there have been several quite serious changes in the patch after your last review, I would be happy to receive your approval before merging (even though the status is accepted now).

In D110235#3174384, @igor.kirillov wrote:

Hi, @fhahn! Since there have been several quite serious changes in the patch after your last review, I would be happy to receive your approval before merging (even though the status is accepted now).

sure, I'll try to take another look by end-of-day Monday

@fhahn ping

Thanks for the latest update! I left some more additional comments. It might be good to pre-commit the tests and only include the diff in the patch. Another thing to consider is moving the reduction-store tests to a separate file, as reduction.ll is already quite large.

One thing to note on the overall approach is that it's a bit of a workaround for some current limitations in our modeling, but I don't think there's an alternative in the short term and this is clearly a desirable case to support, so that seems fine to me in general. There's work in progress to model the pre-header and exit block in VPlan, which allows us to do the sinking of the store more easily, so we can improve things further then :)

llvm/include/llvm/Analysis/IVDescriptors.h
181	From an interface perspective, I think it would be better to make `SE` an optional argument and only perform the store analysis when it is passed. This would also require users to explicitly opt in, which would at least partially guard against people ignoring `IntermediateStore`.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9333	`either deleted or go outside the loop` sounds a bit unclear. Aren't they moved to the vector exit block and store the final reduction value?
9335	I think it would also be good to include at least some information in the VPlan that the store is handled as part of the reduction. Perhaps `VPReductionRecipe` should print if the result is stored after the loop? Please add a test case to `vplan-printing.ll`
llvm/test/Transforms/LoopVectorize/reduction.ll
469	FWIW for such compact IR test cases, the pseudo code doesn't add much value in my personal opinion. Better to strive to make the tests as readable/compact as possible and have a comment explaining what it tests when needed.
546	nit: newline.
628	Here (and for the other tests) it would be good to check at least the vector reduction sequence and that the correct value is stored.

fhahn added inline comments. Jan 5 2022, 1:53 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
914	It would probably be good to make clear what is handled here exactly and why we can handle those stores. IIUC this applies only to invariant stores that store reduction results and is safe because runtime checks guarantee that it won't alias with other objects. The store won't get vectorized, but sunk to the exit block during codegen.
llvm/test/Transforms/LoopVectorize/reduction.ll
582	not needed?
585	nit: is this needed? Can just pass `%n` as `i64`
672	move exit block to end of function?
711	move exit block to end of function?
753	move exit block to end of function?
803	move exit block to end of function?

Not sure if you saw, there's a somewhat related bugreport: https://github.com/llvm/llvm-project/issues/50286
Not sure if this already supports that pattern though.

Update tests, move invariant store tests into separate file, add more comments

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9335	I'm not sure how to display this information properly. This is what the output of `VPReductionRecipe::print` looks like now:

  REDUCE ir<%red.next> = ir<%red> + fast reduce.fadd (ir<%lv>)

I could add something to the end but it doesn't seem to fit there. Also I have not found any proper `V.Recipe` where I could place something like `DELETE store .`.
llvm/test/Transforms/LoopVectorize/reduction.ll
469	When I look at pseudo-code I immediately understand what it is about, whereas looking at IR takes at least 30 seconds of cognitive effort. It is also easier to see the difference between, and the purpose of, all those quite similar tests. But I'll delete it, of course, if you insist :)

Harbormaster completed remote builds in B143094: Diff 399570. Jan 13 2022, 2:04 AM

igor.kirillov mentioned this in D117213: [LoopVectorize] Add tests with reductions that are stored in invariant address. Jan 13 2022, 5:05 AM

@fhahn I created the review with tests only here - https://reviews.llvm.org/D117213. Once it is merged I'll update this one.

@lebedev.ri This patch addresses a different problem. Here we handle the case where a reduction value is stored to some address; if that address is loop-invariant and a few other conditions are satisfied, we manage to vectorize. In your example the address where the value is stored is not invariant at all. Nevertheless, the case is interesting.

@fhahn ping

igor.kirillov mentioned this in rGd3932c690d97: [LoopVectorize] Add tests with reductions that are stored in invariant address. Jan 24 2022, 1:29 PM

Update tests

Harbormaster completed remote builds in B145428: Diff 402801. Jan 26 2022, 2:03 PM

Add invariant store information to VPReductionRecipe::print output

Herald added subscribers: vkmr, rogfer01. Jan 27 2022, 1:24 PM

reames resigned from this revision. Jan 27 2022, 1:26 PM

igor.kirillov added inline comments. Jan 27 2022, 1:29 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9335	@fhahn I added some info to VPReductionRecipe::print and a relevant test. What do you think about it?

Harbormaster completed remote builds in B146121: Diff 403776. Jan 27 2022, 2:03 PM

Hi @igor.kirillov, the patch looks good now! I just had one question about an odd-looking CHECK line in one of the test files.

llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
341 ↗	(On Diff #403776)	nit: I think it's fine (and cleaner) to just do %[[PHI:%.]] = %[[ADDR:%.]] = here. Also, can you add the incoming value to the phi just to make sure it's the correct value? I think it should just be phi i32 {{ [[TMP]], %middle.block }}
369 ↗	(On Diff #403776)	This looks a bit odd. I'm not sure why it's needed? If there is a CHECK-LABEL for every function I don't think this should happen.

Thanks for the update!

llvm/lib/Analysis/IVDescriptors.cpp
358	Do we have to make sure that the stored value is feeding the phi again and not an earlier value in the cycle?
llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
170 ↗	(On Diff #403776)	I think it would be good to check the full reduction cycle & store here. I don't think this test case is handled correctly at the moment, as the IR seems out of sync with the pseudo code (which is why personally I think the C pseudo code is a bit distracting). Note that in the IR the reduction cycle is `%sum -> %sum.1 -> %sum` and not `%sum -> %sum.1 -> %sum.2 -> %sum`. When this gets vectorized, the final value written to dst[42] is the final value of %sum.1, not %sum.2 as it should be, I think.
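
The mismatch described above can be reproduced with a small scalar model. In the hypothetical sketch below, the value stored to `dst[42]` (`sum2`) is not the value carried into the next iteration (`sum1`), so replacing the in-loop store with a store of the final carried value changes the result. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Reference semantics: sum2 is stored, but only sum1 feeds the next
// iteration, so the reduction cycle is sum -> sum1 -> sum.
void storeNotFinalValue(int *dst, const int *src, std::size_t n) {
  assert(dst != nullptr);
  int sum = 0;
  for (std::size_t i = 0; i + 1 < n; i += 2) {
    const int sum1 = sum + src[i];
    const int sum2 = sum1 + src[i + 1];
    dst[42] = sum2; // stored value is outside the reduction cycle
    sum = sum1;     // carried value
  }
}

// Incorrect transformation: storing the final carried value after the
// loop writes the last sum1 instead of the last sum2.
void storeCarriedValueAfterLoop(int *dst, const int *src, std::size_t n) {
  assert(dst != nullptr);
  int sum = 0;
  for (std::size_t i = 0; i + 1 < n; i += 2)
    sum += src[i];
  if (n >= 2)
    dst[42] = sum;
}
```

For `src = {1, 2, 3, 4}` the reference leaves 8 in `dst[42]`, while the naive version leaves 4, which is why the stored value has to be the one that feeds the reduction phi.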

Add fix that prevents vectorization when the value stored to a loop-invariant address is not the final reduction value.
Add test for this case.
Remove some unused code and rename a couple of IR variables in the reduction-with-invariant-store.ll test.
Fix incorrect phi node in reduc_store_inside_unrolled and reduc_double_invariant_store tests.

igor.kirillov marked 3 inline comments as done. Feb 4 2022, 6:47 AM

igor.kirillov added inline comments.

llvm/lib/Analysis/IVDescriptors.cpp
358	Yes! I added this check and also a test case (reduc_store_not_final_value)
llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
170 ↗	(On Diff #403776)	The IR was incorrect actually because of the error I made during renaming and the pseudo-code was showing my original plot for the test-case. Luckily it helped to expose a bug, so I added an extra check to `IVDescriptors.cpp` and now `reduc_store_not_final_value` function is testing this case (when we have IntermediateStore but no ExitInstruction and value stored is not actually a final value).

igor.kirillov marked 2 inline comments as done. Feb 4 2022, 6:47 AM

Harbormaster completed remote builds in B147618: Diff 405949. Feb 4 2022, 7:19 AM

fhahn mentioned this in D119078: [LAA,LV] Add initial support for pointer-diff memory checks. Feb 8 2022, 12:43 PM

fhahn added inline comments. Feb 14 2022, 2:16 AM

llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
170 ↗	(On Diff #403776)	I think it would still be good to test that the full reduction cycle is generated correctly (probably worth checking the full vector body + middle.block), at least for some of the tests in the file, to make sure the full sequence is generated correctly.

fhahn added inline comments. Feb 14 2022, 2:18 AM

llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
170 ↗	(On Diff #403776)	"The IR was incorrect actually because of the error I made during renaming and the pseudo-code was showing my original plot for the test-case." This seems to indicate that the pseudo code may be distracting (:

Add full vector.body and middle.block checks for reduc_store and reduc_store_inside_unrolled tests

igor.kirillov added inline comments. Feb 17 2022, 8:46 AM

llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
170 ↗	(On Diff #403776)	@fhahn Added full checks for `reduc_store` and this function. The code generated is overscalarized for `reduc_store_inside_unrolled` but I checked and without invariant stores it is the same.

Harbormaster completed remote builds in B150258: Diff 409674. Feb 17 2022, 9:16 AM

Thanks for the update with the tests! I think there might still be a correctness issue. More details inline.

llvm/include/llvm/Analysis/IVDescriptors.h
186	nit: addresses ?
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
959	I'm not sure if the comment matches the code. There's no guarantee that all stores to the same address come before the store of the reduction AFAICT. E.g. the test below has a store to the same address after the store of the reduction. I think this may get mis-compiled, as the store of 0 gets replaced with the store of the final value of the reduction.

  define void @reduc_store(i32* %dst, i32* readonly %src) {
  entry:
    %gep.dst = getelementptr inbounds i32, i32* %dst, i64 42
    store i32 1, i32* %gep.dst, align 4
    br label %for.body

  for.body:
    %sum = phi i32 [ 0, %entry ], [ %add, %for.body ]
    %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
    %gep.src = getelementptr inbounds i32, i32* %src, i64 %iv
    %0 = load i32, i32* %gep.src, align 4
    %add = add nsw i32 %sum, %0
    store i32 %add, i32* %gep.dst, align 4
    store i32 0, i32* %gep.dst, align 4
    %iv.next = add nuw nsw i64 %iv, 1
    %exitcond = icmp eq i64 %iv.next, 1000
    br i1 %exitcond, label %exit, label %for.body

  exit:
    ret void
  }
996	nit: can drop `DSI`, as `SI` is guaranteed to be non-null?
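
The concern raised for line 959 above can be checked with a scalar model. The two functions below are hypothetical C++ renderings: the first follows the IR in the comment, where a store of 0 comes after the reduction store in each iteration; the second shows the suspected miscompile, where only the final reduction value is written after the loop.

```cpp
#include <cassert>
#include <cstddef>

// Reference semantics: the reduction store is overwritten by a later
// store of 0 in the same iteration.
void reductionStoreThenZero(int *dst, const int *src, std::size_t n) {
  assert(dst != nullptr);
  dst[42] = 1;
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += src[i];
    dst[42] = sum; // store of the reduction value
    dst[42] = 0;   // later store to the same address wins
  }
}

// Suspected miscompile: both in-loop stores are dropped and the final
// reduction value is stored after the loop.
void sunkReductionStoreOnly(int *dst, const int *src, std::size_t n) {
  assert(dst != nullptr);
  dst[42] = 1;
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += src[i];
  if (n != 0)
    dst[42] = sum;
}
```

With `src = {1, 2, 3}` the reference leaves 0 in `dst[42]` and the transformed version leaves 6, so the reduction store may only replace earlier stores to the address, never later ones.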

Fix incorrect handling of the case where a reduction value stored to an invariant address could be overwritten inside the loop

Herald added a project: Restricted Project. Mar 15 2022, 1:28 PM

igor.kirillov marked 4 inline comments as done. Mar 15 2022, 1:33 PM

igor.kirillov added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
959	Yeah, it was processed incorrectly. I added this test along with the fix.

Harbormaster completed remote builds in B154413: Diff 415557. Mar 15 2022, 2:26 PM

georges added a subscriber: georges. Mar 17 2022, 6:46 AM

@fhahn, gentle ping :)

Thanks for the update! I'll do a bit more testing and will take another look soon, but I *think* it should be good now.

@fhahn, @david-arm ping. Also, I ran the LNT test-suite and everything is fine; 17 more loops were vectorized according to -stats.

LGTM! Thanks for addressing all the comments and fixing bugs, adding new tests, etc. I have a couple of nits you can address before merging. I think given that you've run the LLVM test suite without seeing failures and we know more loops are vectorising then that gives us a good level of confidence. :)

llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
256 ↗	(On Diff #415557)	nit: Should be i++, not i+=2.
303 ↗	(On Diff #415557)	nit: Looks like an unnecessary change from %sum.1 -> %sum.2.

LGTM, thanks! I added a few additional comments that can be addressed directly before committing.

FYI work has started to model the exit block in VPlan as well: D123457, D123537. fixReduction and the store sinking can and should now be migrated to be modeled explicitly in VPlan.

llvm/include/llvm/Analysis/IVDescriptors.h
272	nit: would be good to also document what this means, what the properties of such stores are.
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
310	nit: could be more specific, the increment of the reduction is used as stored operand, right? Also, they only handle reductions, right? Then using Reduction instead of Recurrence seems more descriptive, e.g. `isInvariantStoreOfReduction`? (same for `isRecurringInvariantAddress`)
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
431	With opaque pointers, this code may behave differently. It's possible to have

  store i32 0, ptr %x
  store i8 0, ptr %x

In that case the pointer operands will be the same, but the store widths differ. I think we should also check that the types of the stored values match for now, as we use this to remove earlier stores. This should only be correct if the later store writes at least as many bits as the earlier stores.
993	nit: could be any_of
1003	nit: could be any_of.
llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
412 ↗	(On Diff #415557)	Would it be possible to add runtime tests for the mis-compiles fixed to https://github.com/llvm/llvm-test-suite/tree/main/SingleSource/UnitTests/Vectorizer?
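
The store-width concern raised for line 431 above can be demonstrated on a byte buffer. The sketch below is a hypothetical model that uses `std::memcpy` to stand in for `store i32 0, ptr %x` followed by `store i8 0, ptr %x`, and compares it with keeping only the later, narrower store.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Models `store i32 0, ptr %x` followed by `store i8 0, ptr %x`.
void wideThenNarrowStore(unsigned char *x) {
  assert(x != nullptr);
  const std::uint32_t wide = 0;
  const std::uint8_t narrow = 0;
  std::memcpy(x, &wide, sizeof(wide));     // writes 4 bytes
  std::memcpy(x, &narrow, sizeof(narrow)); // writes 1 byte
}

// Incorrect simplification: dropping the earlier store because a later
// store uses the same pointer operand, ignoring the narrower width.
void narrowStoreOnly(unsigned char *x) {
  assert(x != nullptr);
  const std::uint8_t narrow = 0;
  std::memcpy(x, &narrow, sizeof(narrow));
}
```

Starting from a buffer filled with `0xFF`, the first function clears all four bytes while the second clears only the first, so removing an earlier store is only safe when the later store covers at least as many bytes.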

Lots of updates related to the recent comments

igor.kirillov marked an inline comment as done. Apr 27 2022, 6:34 AM

igor.kirillov added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
431	I think if this check is added here, then the purpose of the function would be different (address is still the same even if values have different size). So, instead of that I added a check to a place where those pointers are really processed - see LoopVectorizationLegality.cpp:975. There we now make sure that values stored in an invariant address are of the same type.
llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
303 ↗	(On Diff #415557)	`sum.1` is `sum + src[i]` and `sum.2` is `sum + src[i] + src[i+1]`. Looks like everything is fine to me.

Harbormaster completed remote builds in B161596: Diff 425505. Apr 27 2022, 9:52 AM

igor.kirillov mentioned this in D124609: Add unit test with invariant store for vectorizer memory runtime checks. Apr 28 2022, 5:47 AM

igor.kirillov marked an inline comment as done. Apr 28 2022, 5:54 AM

igor.kirillov added inline comments.

llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
412 ↗	(On Diff #415557)	Added a simple test here - https://reviews.llvm.org/D124609 I don't feel like any other test adds more robustness, but if you have an idea what could be also added I'll gladly implement it!

This revision was landed with ongoing or failed builds. May 3 2022, 2:13 AM

Closed by commit rG4e5e042d9a4a: [LoopVectorize] Support reductions that store intermediary result (authored by igor.kirillov).

This revision was automatically updated to reflect the committed changes.

igor.kirillov added a commit: rG4e5e042d9a4a: [LoopVectorize] Support reductions that store intermediary result.

igor.kirillov mentioned this in rT2a41ecd23309: Add unit test with invariant store for vectorizer memory runtime checks. May 5 2022, 12:18 AM

Hello,

I wrote
https://github.com/llvm/llvm-project/issues/57572
about a verifier complaint/crash that started happening with this patch.

Herald added subscribers: pcwang-thead, shiva0217. Sep 5 2022, 11:25 PM

Revision Contents

Path	Size
llvm/include/llvm/Analysis/IVDescriptors.h	20 lines
llvm/include/llvm/Analysis/LoopAccessAnalysis.h	8 lines
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h	7 lines
llvm/lib/Analysis/IVDescriptors.cpp	131 lines
llvm/lib/Analysis/LoopAccessAnalysis.cpp	5 lines
llvm/lib/Transforms/Scalar/LoopInterchange.cpp	8 lines
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp	100 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	28 lines
llvm/test/Transforms/LoopInterchange/reductions-across-inner-and-outer-loop.ll	38 lines
llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll	35 lines
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll	31 lines
llvm/test/Transforms/LoopVectorize/reduction.ll	356 lines

Diff 390414

llvm/include/llvm/Analysis/IVDescriptors.h

Show All 12 Lines
#ifndef LLVM_ANALYSIS_IVDESCRIPTORS_H		#ifndef LLVM_ANALYSIS_IVDESCRIPTORS_H
#define LLVM_ANALYSIS_IVDESCRIPTORS_H		#define LLVM_ANALYSIS_IVDESCRIPTORS_H

#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/MapVector.h"		#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/IR/InstrTypes.h"		#include "llvm/IR/InstrTypes.h"
		nikic: Drive by note: This new include does not look necessary.
		igor.kirillov (Author): Unfortunately, llvm doesn't compile without this include
		peterwaller-arm: It looks like the needed include is #include "llvm/IR/Instructions.h" for StoreInst, or better forward declare `class StoreInst;` along with the others nearby. https://include-what-you-use.org https://github.com/include-what-you-use/include-what-you-use/blob/master/docs/WhyIWYU.md
		igor.kirillov (Author): Fixed
#include "llvm/IR/Instruction.h"		#include "llvm/IR/Instruction.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Operator.h"		#include "llvm/IR/Operator.h"
#include "llvm/IR/ValueHandle.h"		#include "llvm/IR/ValueHandle.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"

namespace llvm {		namespace llvm {

class DemandedBits;		class DemandedBits;
class AssumptionCache;		class AssumptionCache;
class Loop;		class Loop;
class PredicatedScalarEvolution;		class PredicatedScalarEvolution;
class ScalarEvolution;		class ScalarEvolution;
class SCEV;		class SCEV;
		class StoreInst;
class DominatorTree;		class DominatorTree;

/// These are the kinds of recurrences that we support.		/// These are the kinds of recurrences that we support.
enum class RecurKind {		enum class RecurKind {
None, ///< Not a recurrence.		None, ///< Not a recurrence.
Add, ///< Sum of integers.		Add, ///< Sum of integers.
Mul, ///< Product of integers.		Mul, ///< Product of integers.
Or, ///< Bitwise or logical OR of integers.		Or, ///< Bitwise or logical OR of integers.
Show All 25 Lines
/// special case of chains of recurrences (CR). See ScalarEvolution for CR		/// special case of chains of recurrences (CR). See ScalarEvolution for CR
/// references.		/// references.

/// This struct holds information about recurrence variables.		/// This struct holds information about recurrence variables.
class RecurrenceDescriptor {		class RecurrenceDescriptor {
public:		public:
RecurrenceDescriptor() = default;		RecurrenceDescriptor() = default;

RecurrenceDescriptor(Value *Start, Instruction *Exit, RecurKind K,		RecurrenceDescriptor(Value *Start, Instruction *Exit, StoreInst *Store,
FastMathFlags FMF, Instruction *ExactFP, Type *RT,		RecurKind K, FastMathFlags FMF, Instruction *ExactFP,
bool Signed, bool Ordered,		Type *RT, bool Signed, bool Ordered,
SmallPtrSetImpl<Instruction *> &CI)		SmallPtrSetImpl<Instruction *> &CI)
: StartValue(Start), LoopExitInstr(Exit), Kind(K), FMF(FMF),		: IntermediateStore(Store), StartValue(Start), LoopExitInstr(Exit),
ExactFPMathInst(ExactFP), RecurrenceType(RT), IsSigned(Signed),		Kind(K), FMF(FMF), ExactFPMathInst(ExactFP), RecurrenceType(RT),
IsOrdered(Ordered) {		IsSigned(Signed), IsOrdered(Ordered) {
CastInsts.insert(CI.begin(), CI.end());		CastInsts.insert(CI.begin(), CI.end());
}		}

/// This POD struct holds information about a potential recurrence operation.		/// This POD struct holds information about a potential recurrence operation.
class InstDesc {		class InstDesc {
public:		public:
InstDesc(bool IsRecur, Instruction *I, Instruction *ExactFP = nullptr)		InstDesc(bool IsRecur, Instruction *I, Instruction *ExactFP = nullptr)
: IsRecurrence(IsRecur), PatternLastInst(I),		: IsRecurrence(IsRecur), PatternLastInst(I),
▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	public:
/// Returns the opcode corresponding to the RecurrenceKind.		/// Returns the opcode corresponding to the RecurrenceKind.
static unsigned getOpcode(RecurKind Kind);		static unsigned getOpcode(RecurKind Kind);

/// Returns true if Phi is a reduction of type Kind and adds it to the		/// Returns true if Phi is a reduction of type Kind and adds it to the
/// RecurrenceDescriptor. If either \p DB is non-null or \p AC and \p DT are		/// RecurrenceDescriptor. If either \p DB is non-null or \p AC and \p DT are
/// non-null, the minimal bit width needed to compute the reduction will be		/// non-null, the minimal bit width needed to compute the reduction will be
/// computed.		/// computed.
static bool AddReductionVar(PHINode *Phi, RecurKind Kind, Loop *TheLoop,		static bool AddReductionVar(PHINode *Phi, RecurKind Kind, Loop *TheLoop,
FastMathFlags FuncFMF,		FastMathFlags FuncFMF, ScalarEvolution *SE,
RecurrenceDescriptor &RedDes,		RecurrenceDescriptor &RedDes,
DemandedBits *DB = nullptr,		DemandedBits *DB = nullptr,
AssumptionCache *AC = nullptr,		AssumptionCache *AC = nullptr,
DominatorTree *DT = nullptr);		DominatorTree *DT = nullptr);

/// Returns true if Phi is a reduction in TheLoop. The RecurrenceDescriptor		/// Returns true if Phi is a reduction in TheLoop. The RecurrenceDescriptor
/// is returned in RedDes. If either \p DB is non-null or \p AC and \p DT are		/// is returned in RedDes. If either \p DB is non-null or \p AC and \p DT are
/// non-null, the minimal bit width needed to compute the reduction will be		/// non-null, the minimal bit width needed to compute the reduction will be
/// computed.		/// computed.
static bool isReductionPHI(PHINode *Phi, Loop *TheLoop,		static bool isReductionPHI(PHINode *Phi, Loop *TheLoop, ScalarEvolution *SE,
		fhahn: From an interface perspective, I think it would be better to make `SE` an optional argument and only perform the store analysis when it is passed. This would also require users to explicitly opt in, which would at least partially guard against people ignoring `IntermediateStore`.
RecurrenceDescriptor &RedDes,		RecurrenceDescriptor &RedDes,
DemandedBits *DB = nullptr,		DemandedBits *DB = nullptr,
AssumptionCache *AC = nullptr,		AssumptionCache *AC = nullptr,
DominatorTree *DT = nullptr);		DominatorTree *DT = nullptr);

		fhahn: nit: addresses?
/// Returns true if Phi is a first-order recurrence. A first-order recurrence		/// Returns true if Phi is a first-order recurrence. A first-order recurrence
/// is a non-reduction recurrence relation in which the value of the		/// is a non-reduction recurrence relation in which the value of the
/// recurrence in the current loop iteration equals a value defined in the		/// recurrence in the current loop iteration equals a value defined in the
/// previous iteration. \p SinkAfter includes pairs of instructions where the		/// previous iteration. \p SinkAfter includes pairs of instructions where the
/// first will be rescheduled to appear after the second if/when the loop is		/// first will be rescheduled to appear after the second if/when the loop is
/// vectorized. It may be augmented with additional pairs if needed in order		/// vectorized. It may be augmented with additional pairs if needed in order
/// to handle Phi as a first-order recurrence.		/// to handle Phi as a first-order recurrence.
static bool		static bool
▲ Show 20 Lines • Show All 64 Lines • ▼ Show 20 Lines	public:
bool isOrdered() const { return IsOrdered; }		bool isOrdered() const { return IsOrdered; }

/// Attempts to find a chain of operations from Phi to LoopExitInst that can		/// Attempts to find a chain of operations from Phi to LoopExitInst that can
/// be treated as a set of reductions instructions for in-loop reductions.		/// be treated as a set of reductions instructions for in-loop reductions.
SmallVector<Instruction *, 4> getReductionOpChain(PHINode *Phi,		SmallVector<Instruction *, 4> getReductionOpChain(PHINode *Phi,
Loop *L) const;		Loop *L) const;

/// Returns true if the instruction is a call to the llvm.fmuladd intrinsic.		/// Returns true if the instruction is a call to the llvm.fmuladd intrinsic.
static bool isFMulAddIntrinsic(Instruction *I) {		static bool isFMulAddIntrinsic(Instruction *I) {
		fhahn: nit: should be a doc-comment?
return isa<IntrinsicInst>(I) &&		return isa<IntrinsicInst>(I) &&
cast<IntrinsicInst>(I)->getIntrinsicID() == Intrinsic::fmuladd;		cast<IntrinsicInst>(I)->getIntrinsicID() == Intrinsic::fmuladd;
}		}

		/// Intermediate store of the reduction
		fhahn: nit: would be good to also document what this means and what the properties of such stores are.
		StoreInst *IntermediateStore = nullptr;

private:		private:
// The starting value of the recurrence.		// The starting value of the recurrence.
// It does not have to be zero!		// It does not have to be zero!
TrackingVH<Value> StartValue;		TrackingVH<Value> StartValue;
// The instruction who's value is used outside the loop.		// The instruction who's value is used outside the loop.
Instruction *LoopExitInstr = nullptr;		Instruction *LoopExitInstr = nullptr;
// The kind of the recurrence.		// The kind of the recurrence.
RecurKind Kind = RecurKind::None;		RecurKind Kind = RecurKind::None;
▲ Show 20 Lines • Show All 120 Lines • Show Last 20 Lines
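
As a reading aid for the `IntermediateStore` member added above, here is a minimal sketch of the source-level pattern the revision summary describes: a reduction whose running value is stored to a loop-invariant address on every iteration. The function names and the use of `dst[0]` are hypothetical and are not taken from the patch or its tests. The second function shows the equivalent form that is only valid when the store does not alias the loads, which is the condition the summary says is covered by runtime checks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: the running sum is written to the loop-invariant
// address dst[0] on every iteration. Without aliasing information the sum
// cannot simply be kept in a register, so the store stays inside the loop.
void sum_with_intermediate_store(int *dst, const int *src, std::size_t n) {
  assert(dst != nullptr);
  for (std::size_t i = 0; i < n; ++i)
    dst[0] += src[i];
}

// Equivalent form, assuming dst[0] does not alias src[0..n): accumulate in a
// local variable and store the final value once, after the loop.
void sum_with_sunk_store(int *dst, const int *src, std::size_t n) {
  assert(dst != nullptr);
  int sum = dst[0];
  for (std::size_t i = 0; i < n; ++i)
    sum += src[i];
  dst[0] = sum;
}
```

When `dst` and `src` do not overlap, both functions leave the same value in `dst[0]`; if they may overlap, only the first form is safe.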

llvm/include/llvm/Analysis/LoopAccessAnalysis.h

Show First 20 Lines • Show All 573 Lines • ▼ Show 20 Lines	public:
void print(raw_ostream &OS, unsigned Depth = 0) const;		void print(raw_ostream &OS, unsigned Depth = 0) const;

/// If the loop has memory dependence involving an invariant address, i.e. two		/// If the loop has memory dependence involving an invariant address, i.e. two
/// stores or a store and a load, then return true, else return false.		/// stores or a store and a load, then return true, else return false.
bool hasDependenceInvolvingLoopInvariantAddress() const {		bool hasDependenceInvolvingLoopInvariantAddress() const {
return HasDependenceInvolvingLoopInvariantAddress;		return HasDependenceInvolvingLoopInvariantAddress;
}		}

		/// Return the list of stores to invariant addresses.
		const ArrayRef<StoreInst *> getStoresToInvariantAddresses() const {
		fhahn: nit: could return `ArrayRef`, not leaking any details about the underlying container.
		return StoresToInvariantAddresses;
		}

/// Used to add runtime SCEV checks. Simplifies SCEV expressions and converts		/// Used to add runtime SCEV checks. Simplifies SCEV expressions and converts
/// them to a more usable form. All SCEV expressions during the analysis		/// them to a more usable form. All SCEV expressions during the analysis
/// should be re-written (and therefore simplified) according to PSE.		/// should be re-written (and therefore simplified) according to PSE.
/// A user of LoopAccessAnalysis will need to emit the runtime checks		/// A user of LoopAccessAnalysis will need to emit the runtime checks
/// associated with this predicate.		/// associated with this predicate.
const PredicatedScalarEvolution &getPSE() const { return *PSE; }		const PredicatedScalarEvolution &getPSE() const { return *PSE; }

private:		private:
Show All 38 Lines	private:

/// Cache the result of analyzeLoop.		/// Cache the result of analyzeLoop.
bool CanVecMem;		bool CanVecMem;
bool HasConvergentOp;		bool HasConvergentOp;

/// Indicator that there are non vectorizable stores to a uniform address.		/// Indicator that there are non vectorizable stores to a uniform address.
bool HasDependenceInvolvingLoopInvariantAddress;		bool HasDependenceInvolvingLoopInvariantAddress;

		/// List of stores to invariant addresses.
		SmallVector<StoreInst *> StoresToInvariantAddresses;
		fhahn: Are the stores invariant or to a uniform address? `InvariantStores` implies they are invariant, which may not be the case?
		david-arm: I assume here that InvariantStores refers to stores to an invariant (i.e. uniform?) address, rather than storing an invariant value to a variant address. If so, perhaps it could be named StoresToInvariantAddress or something like that?
		igor.kirillov (Author): @fhahn Just to make sure we are on the same page: for me, uniform and invariant are just synonyms, isn't that so?
		fhahn: nit: Is there a reason for choosing 5 for the size? If not, nowadays SmallVector can pick a good size automatically.

/// The diagnostics report generated for the analysis. E.g. why we		/// The diagnostics report generated for the analysis. E.g. why we
/// couldn't analyze the loop.		/// couldn't analyze the loop.
std::unique_ptr<OptimizationRemarkAnalysis> Report;		std::unique_ptr<OptimizationRemarkAnalysis> Report;

/// If an access has a symbolic strides, this maps the pointer value to		/// If an access has a symbolic strides, this maps the pointer value to
/// the stride symbol.		/// the stride symbol.
ValueToValueMap SymbolicStrides;		ValueToValueMap SymbolicStrides;

▲ Show 20 Lines • Show All 136 Lines • Show Last 20 Lines
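
The naming discussion above turns on the difference between storing an invariant value and storing to an invariant address. The following is a small illustrative sketch of the two cases; the function names are hypothetical and are not part of LoopAccessAnalysis. Only the second case is what `StoresToInvariantAddresses` is described as collecting.

```cpp
#include <cassert>
#include <cstddef>

// Invariant *value*, varying *address*: every iteration writes the same
// value c, but to a different element of out.
void store_invariant_value(int *out, std::size_t n, int c) {
  assert(out != nullptr || n == 0);
  for (std::size_t i = 0; i < n; ++i)
    out[i] = c;
}

// Varying *value*, invariant *address*: the pointer p does not change
// across iterations, but the stored value does.
void store_to_invariant_address(int *p, const int *in, std::size_t n) {
  assert(p != nullptr);
  for (std::size_t i = 0; i < n; ++i)
    *p = in[i];
}
```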

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

Show First 20 Lines • Show All 301 Lines • ▼ Show 20 Lines	public:
RecurrenceSet &getFirstOrderRecurrences() { return FirstOrderRecurrences; }		RecurrenceSet &getFirstOrderRecurrences() { return FirstOrderRecurrences; }

/// Return the set of instructions to sink to handle first-order recurrences.		/// Return the set of instructions to sink to handle first-order recurrences.
MapVector<Instruction *, Instruction *> &getSinkAfter() { return SinkAfter; }		MapVector<Instruction *, Instruction *> &getSinkAfter() { return SinkAfter; }

/// Returns the widest induction type.		/// Returns the widest induction type.
Type *getWidestInductionType() { return WidestIndTy; }		Type *getWidestInductionType() { return WidestIndTy; }

		/// Returns True if given invariant store uses recurrent expression
		peterwaller-arm: nit. 'recurrent'
		fhahn: nit: could be more specific, the increment of the reduction is used as stored operand, right? Also, they only handle reductions, right? Then using Reduction instead of Recurrence seems more descriptive, e.g. `isInvariantStoreOfReduction`? (same for `isRecurringInvariantAddress`)
		bool isRecurringInvariantStore(StoreInst *SI);

		/// Returns True if given address is invariant and is used to store recurrent
		/// expression
		bool isRecurringInvariantAddress(Value *V);

/// Returns True if V is a Phi node of an induction variable in this loop.		/// Returns True if V is a Phi node of an induction variable in this loop.
bool isInductionPhi(const Value *V);		bool isInductionPhi(const Value *V);

/// Returns True if V is a cast that is part of an induction def-use chain,		/// Returns True if V is a cast that is part of an induction def-use chain,
/// and had been proven to be redundant under a runtime guard (in other		/// and had been proven to be redundant under a runtime guard (in other
/// words, the cast has the same SCEV expression as the induction phi).		/// words, the cast has the same SCEV expression as the induction phi).
bool isCastedInductionVariable(const Value *V);		bool isCastedInductionVariable(const Value *V);

▲ Show 20 Lines • Show All 244 Lines • Show Last 20 Lines

llvm/lib/Analysis/IVDescriptors.cpp

Show First 20 Lines • Show All 219 Lines • ▼ Show 20 Lines	if (Kind == RecurKind::FMulAdd && Exit->getOperand(2) != Phi)
return false;		return false;

LLVM_DEBUG(dbgs() << "LV: Found an ordered reduction: Phi: " << *Phi		LLVM_DEBUG(dbgs() << "LV: Found an ordered reduction: Phi: " << *Phi
<< ", ExitInst: " << *Exit << "\n");		<< ", ExitInst: " << *Exit << "\n");

return true;		return true;
}		}

bool RecurrenceDescriptor::AddReductionVar(PHINode *Phi, RecurKind Kind,		bool RecurrenceDescriptor::AddReductionVar(
Loop *TheLoop, FastMathFlags FuncFMF,		PHINode *Phi, RecurKind Kind, Loop *TheLoop, FastMathFlags FuncFMF,
RecurrenceDescriptor &RedDes,		ScalarEvolution *SE, RecurrenceDescriptor &RedDes, DemandedBits *DB,
DemandedBits *DB,		AssumptionCache *AC, DominatorTree *DT) {
AssumptionCache *AC,
DominatorTree *DT) {
if (Phi->getNumIncomingValues() != 2)		if (Phi->getNumIncomingValues() != 2)
return false;		return false;

// Reduction variables are only found in the loop header block.		// Reduction variables are only found in the loop header block.
if (Phi->getParent() != TheLoop->getHeader())		if (Phi->getParent() != TheLoop->getHeader())
return false;		return false;

// Obtain the reduction start value from the value that comes from the loop		// Obtain the reduction start value from the value that comes from the loop
// preheader.		// preheader.
Value *RdxStart = Phi->getIncomingValueForBlock(TheLoop->getLoopPreheader());		Value *RdxStart = Phi->getIncomingValueForBlock(TheLoop->getLoopPreheader());

// ExitInstruction is the single value which is used outside the loop.		// ExitInstruction is the single value which is used outside the loop.
// We only allow for a single reduction value to be used outside the loop.		// We only allow for a single reduction value to be used outside the loop.
// This includes users of the reduction, variables (which form a cycle		// This includes users of the reduction, variables (which form a cycle
// which ends in the phi node).		// which ends in the phi node).
Instruction *ExitInstruction = nullptr;		Instruction *ExitInstruction = nullptr;

		fhahn: needs documentation?
		// Variable to keep last visited store instruction. By the end of the
		// algorithm this variable will be either empty or having intermediate
		// reduction value stored in invariant address.
		StoreInst *IntermediateStore = nullptr;

// Indicates that we found a reduction operation in our scan.		// Indicates that we found a reduction operation in our scan.
bool FoundReduxOp = false;		bool FoundReduxOp = false;

// We start with the PHI node and scan for all of the users of this		// We start with the PHI node and scan for all of the users of this
// instruction. All users must be instructions that can be used as reduction		// instruction. All users must be instructions that can be used as reduction
// variables (such as ADD). We must have a single out-of-block user. The cycle		// variables (such as ADD). We must have a single out-of-block user. The cycle
// must include the original PHI.		// must include the original PHI.
bool FoundStartPHI = false;		bool FoundStartPHI = false;
▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines	bool RecurrenceDescriptor::AddReductionVar(
// - One use of reduction value (safe).		// - One use of reduction value (safe).
// - Multiple use of reduction value (not safe).		// - Multiple use of reduction value (not safe).
// - PHI:		// - PHI:
// - All uses of the PHI must be the reduction (safe).		// - All uses of the PHI must be the reduction (safe).
// - Otherwise, not safe.		// - Otherwise, not safe.
// - By instructions outside of the loop (safe).		// - By instructions outside of the loop (safe).
// * One value may have several outside users, but all outside		// * One value may have several outside users, but all outside
// uses must be of the same value.		// uses must be of the same value.
		// - By store instructions with a loop invariant address (safe with
		// the following restrictions):
		fhahn: What about existing users of `isReductionPHI` which currently may rely on the fact that all instructions in the loop must either be phis or reduction operations? Also, with respect to the store restriction, the important bit is that the final value is also stored, right?
		igor.kirillov (Author): I checked and only `LoopInterchangeLegality` is using this function and it is not affected by stores. Anyway, I can add a parameter to `RecurrenceDescriptor::isReductionPHI` or a member to `RecurrenceDescriptor` allowing or not to handle stores. What do you think about it? The comment is to be updated, yes.
		fhahn: I guess it would be good to have at least a test case for loop-interchange to make sure it can deal with the change properly?
		igor.kirillov (Author): You are right. We actually should not do loop interchange if an invariant address is present. Otherwise it may introduce an incorrect result.
		// * If there are several stores, all must have the same address.
		// * Final value should be stored in that loop invariant address.
// - By an instruction that is not part of the reduction (not safe).		// - By an instruction that is not part of the reduction (not safe).
// This is either:		// This is either:
// * An instruction type other than PHI or the reduction operation.		// * An instruction type other than PHI or the reduction operation.
// * A PHI in the header other than the initial PHI.		// * A PHI in the header other than the initial PHI.
		peterwaller-arm: Does this comment need updating to discuss your intermediate stores?
		igor.kirillov (Author): Yes, indeed
while (!Worklist.empty()) {		while (!Worklist.empty()) {
Instruction *Cur = Worklist.pop_back_val();		Instruction *Cur = Worklist.pop_back_val();

		// Store instructions are allowed iff it is the store of the reduction
		// value to the same loop invariant memory location.
		fhahn: You are only checking for loop-invariant addresses, so should this be `loop invariant memory location`?
		if (auto *SI = dyn_cast<StoreInst>(Cur)) {
		const SCEV *PtrScev = SE->getSCEV(SI->getPointerOperand());
		// Check it is the same address as previous stores
		if (IntermediateStore) {
		const SCEV *OtherScev =
		SE->getSCEV(IntermediateStore->getPointerOperand());

		if (OtherScev != PtrScev) {
		LLVM_DEBUG(dbgs() << "Storing reduction value to different addresses "
		<< "inside the loop: " << *SI->getPointerOperand()
		<< " and "
		<< *IntermediateStore->getPointerOperand() << '\n');
		return false;
		}
		}

		// Check the pointer is loop invariant
		if (!SE->isLoopInvariant(PtrScev, TheLoop)) {
		LLVM_DEBUG(dbgs() << "Storing reduction value to non-uniform address "
		<< "inside the loop: " << *SI->getPointerOperand()
		<< '\n');
		return false;
		}

		// IntermediateStore is always the last store in the loop.
		IntermediateStore = SI;
		continue;
		}

// No Users.		// No Users.
// If the instruction has no users then this is a broken chain and can't be		// If the instruction has no users then this is a broken chain and can't be
// a reduction variable.		// a reduction variable.
		fhahn: Do we have to make sure that the stored value is feeding the phi again and not an earlier value in the cycle?
		igor.kirillov (Author): Yes! I added this check and also a test case (reduc_store_not_final_value)
if (Cur->use_empty())		if (Cur->use_empty())
return false;		return false;

bool IsAPhi = isa<PHINode>(Cur);		bool IsAPhi = isa<PHINode>(Cur);

// A header PHI use other than the original PHI.		// A header PHI use other than the original PHI.
if (Cur != Phi && IsAPhi && Cur->getParent() == Phi->getParent())		if (Cur != Phi && IsAPhi && Cur->getParent() == Phi->getParent())
return false;		return false;
▲ Show 20 Lines • Show All 100 Lines • ▼ Show 20 Lines	for (User *U : Cur->users()) {
continue;		continue;
}		}

// Process instructions only once (termination). Each reduction cycle		// Process instructions only once (termination). Each reduction cycle
// value must only be used once, except by phi nodes and min/max		// value must only be used once, except by phi nodes and min/max
// reductions which are represented as a cmp followed by a select.		// reductions which are represented as a cmp followed by a select.
InstDesc IgnoredVal(false, nullptr);		InstDesc IgnoredVal(false, nullptr);
if (VisitedInsts.insert(UI).second) {		if (VisitedInsts.insert(UI).second) {
if (isa<PHINode>(UI))		if (isa<PHINode>(UI)) {
PHIs.push_back(UI);		PHIs.push_back(UI);
else		} else {
		StoreInst *SI = dyn_cast<StoreInst>(UI);
		if (SI && SI->getPointerOperand() == Cur) {
		// Reduction variable chain can only be stored somewhere but it
		// can't be used as an address.
		return false;
		}
NonPHIs.push_back(UI);		NonPHIs.push_back(UI);
		}
		david-arm: nit: Could you remove an extra level of indentation here before merging the patch? i.e. this can be written like this: if (isa<PHINode>(UI)) PHIs.push_back(UI); else if (auto *SI = dyn_cast<StoreInst>(UI)) { ... } else NonPHIs.push_back(UI);
		igor.kirillov (Author): I went one step further and simplified it even more
} else if (!isa<PHINode>(UI) &&		} else if (!isa<PHINode>(UI) &&
((!isa<FCmpInst>(UI) && !isa<ICmpInst>(UI) &&		((!isa<FCmpInst>(UI) && !isa<ICmpInst>(UI) &&
!isa<SelectInst>(UI)) ||		!isa<SelectInst>(UI)) ||
		peterwaller-arm: Suggestion: Does this warrant a comment? (I only observe that there are lots of comments around here and I spent a moment trying to guess at what this did without arriving at an answer).
		igor.kirillov (Author): I hope this one clarifies. And actually, we should exit if the reduction variable is used as an address (this should be a highly unlikely case, but nevertheless)
(!isConditionalRdxPattern(Kind, UI).isRecurrence() &&		(!isConditionalRdxPattern(Kind, UI).isRecurrence() &&
!isSelectCmpPattern(TheLoop, Phi, UI, IgnoredVal)		!isSelectCmpPattern(TheLoop, Phi, UI, IgnoredVal)
.isRecurrence() &&		.isRecurrence() &&
!isMinMaxPattern(UI, Kind, IgnoredVal).isRecurrence())))		!isMinMaxPattern(UI, Kind, IgnoredVal).isRecurrence())))
return false;		return false;

// Remember that we completed the cycle.		// Remember that we completed the cycle.
if (UI == Phi)		if (UI == Phi)
FoundStartPHI = true;		FoundStartPHI = true;
}		}
Worklist.append(PHIs.begin(), PHIs.end());		Worklist.append(PHIs.begin(), PHIs.end());
Worklist.append(NonPHIs.begin(), NonPHIs.end());		Worklist.append(NonPHIs.begin(), NonPHIs.end());
}		}

// This means we have seen one but not the other instruction of the		// This means we have seen one but not the other instruction of the
// pattern or more than just a select and cmp. Zero implies that we saw a		// pattern or more than just a select and cmp. Zero implies that we saw a
// llvm.min/max instrinsic, which is always OK.		// llvm.min/max instrinsic, which is always OK.
if (isMinMaxRecurrenceKind(Kind) && NumCmpSelectPatternInst != 2 &&		if (isMinMaxRecurrenceKind(Kind) && NumCmpSelectPatternInst != 2 &&
NumCmpSelectPatternInst != 0)		NumCmpSelectPatternInst != 0)
return false;		return false;

if (isSelectCmpRecurrenceKind(Kind) && NumCmpSelectPatternInst != 1)		if (isSelectCmpRecurrenceKind(Kind) && NumCmpSelectPatternInst != 1)
return false;		return false;

		// If there is an intermediate store, it must store the last reduction value.
		if (ExitInstruction && IntermediateStore) {
		fhahn: nit: for consistency with other code avoid using `!= nullptr`. This is more in line with the style used in LLVM in general (and you use `==/!= nullptr` in the adjacent code here).
		if (IntermediateStore->getValueOperand() != ExitInstruction) {
		LLVM_DEBUG(
		dbgs() << "LU: Last store Instruction of reduction value "
		<< "does not store last calculated value of the reduction: "
		<< *IntermediateStore << '\n');
		return false;
		}
		}
		// If all uses are inside the loop (intermediate stores), then the
		// reduction value after the loop will be the one used in the last store.
		if (!ExitInstruction && IntermediateStore)
		ExitInstruction = cast<Instruction>(IntermediateStore->getValueOperand());

if (!FoundStartPHI || !FoundReduxOp || !ExitInstruction)		if (!FoundStartPHI || !FoundReduxOp || !ExitInstruction)
return false;		return false;

const bool IsOrdered = checkOrderedReduction(		const bool IsOrdered = checkOrderedReduction(
Kind, ReduxDesc.getExactFPMathInst(), ExitInstruction, Phi);		Kind, ReduxDesc.getExactFPMathInst(), ExitInstruction, Phi);

if (Start != Phi) {		if (Start != Phi) {
// If the starting value is not the same as the phi node, we speculatively		// If the starting value is not the same as the phi node, we speculatively
▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	bool RecurrenceDescriptor::AddReductionVar(

// We found a reduction var if we have reached the original phi node and we		// We found a reduction var if we have reached the original phi node and we
// only have a single instruction with out-of-loop users.		// only have a single instruction with out-of-loop users.

// The ExitInstruction(Instruction which is allowed to have out-of-loop users)		// The ExitInstruction(Instruction which is allowed to have out-of-loop users)
// is saved as part of the RecurrenceDescriptor.		// is saved as part of the RecurrenceDescriptor.

// Save the description of this reduction variable.		// Save the description of this reduction variable.
RecurrenceDescriptor RD(RdxStart, ExitInstruction, Kind, FMF,		RecurrenceDescriptor RD(RdxStart, ExitInstruction, IntermediateStore, Kind,
ReduxDesc.getExactFPMathInst(), RecurrenceType,		FMF, ReduxDesc.getExactFPMathInst(), RecurrenceType,
IsSigned, IsOrdered, CastInsts);		IsSigned, IsOrdered, CastInsts);
RedDes = RD;		RedDes = RD;

return true;		return true;
}		}

// We are looking for loops that do something like this:		// We are looking for loops that do something like this:
// int r = 0;		// int r = 0;
▲ Show 20 Lines • Show All 206 Lines • ▼ Show 20 Lines	for (const Use &U : I->operands()) {
if (NumUses > MaxNumUses)		if (NumUses > MaxNumUses)
return true;		return true;
}		}

return false;		return false;
}		}

bool RecurrenceDescriptor::isReductionPHI(PHINode *Phi, Loop *TheLoop,		bool RecurrenceDescriptor::isReductionPHI(PHINode *Phi, Loop *TheLoop,
		ScalarEvolution *SE,
RecurrenceDescriptor &RedDes,		RecurrenceDescriptor &RedDes,
DemandedBits *DB, AssumptionCache *AC,		DemandedBits *DB, AssumptionCache *AC,
DominatorTree *DT) {		DominatorTree *DT) {
BasicBlock *Header = TheLoop->getHeader();		BasicBlock *Header = TheLoop->getHeader();
Function &F = *Header->getParent();		Function &F = *Header->getParent();
FastMathFlags FMF;		FastMathFlags FMF;
FMF.setNoNaNs(		FMF.setNoNaNs(
F.getFnAttribute("no-nans-fp-math").getValueAsBool());		F.getFnAttribute("no-nans-fp-math").getValueAsBool());
FMF.setNoSignedZeros(		FMF.setNoSignedZeros(
F.getFnAttribute("no-signed-zeros-fp-math").getValueAsBool());		F.getFnAttribute("no-signed-zeros-fp-math").getValueAsBool());

if (AddReductionVar(Phi, RecurKind::Add, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::Add, TheLoop, FMF, SE, RedDes, DB, AC,
		DT)) {
LLVM_DEBUG(dbgs() << "Found an ADD reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found an ADD reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::Mul, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::Mul, TheLoop, FMF, SE, RedDes, DB, AC,
		DT)) {
LLVM_DEBUG(dbgs() << "Found a MUL reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found a MUL reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::Or, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::Or, TheLoop, FMF, SE, RedDes, DB, AC,
		DT)) {
LLVM_DEBUG(dbgs() << "Found an OR reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found an OR reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::And, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::And, TheLoop, FMF, SE, RedDes, DB, AC,
		DT)) {
LLVM_DEBUG(dbgs() << "Found an AND reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found an AND reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::Xor, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::Xor, TheLoop, FMF, SE, RedDes, DB, AC,
		DT)) {
LLVM_DEBUG(dbgs() << "Found a XOR reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found a XOR reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::SMax, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::SMax, TheLoop, FMF, SE, RedDes, DB, AC,
		DT)) {
LLVM_DEBUG(dbgs() << "Found a SMAX reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found a SMAX reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::SMin, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::SMin, TheLoop, FMF, SE, RedDes, DB, AC,
		DT)) {
LLVM_DEBUG(dbgs() << "Found a SMIN reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found a SMIN reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::UMax, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::UMax, TheLoop, FMF, SE, RedDes, DB, AC,
		DT)) {
LLVM_DEBUG(dbgs() << "Found a UMAX reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found a UMAX reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::UMin, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::UMin, TheLoop, FMF, SE, RedDes, DB, AC,
		DT)) {
LLVM_DEBUG(dbgs() << "Found a UMIN reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found a UMIN reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::SelectICmp, TheLoop, FMF, RedDes, DB, AC,		if (AddReductionVar(Phi, RecurKind::SelectICmp, TheLoop, FMF, SE, RedDes, DB,
DT)) {		AC, DT)) {
LLVM_DEBUG(dbgs() << "Found an integer conditional select reduction PHI."		LLVM_DEBUG(dbgs() << "Found an integer conditional select reduction PHI."
<< *Phi << "\n");		<< *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::FMul, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::FMul, TheLoop, FMF, SE, RedDes, DB, AC,
		DT)) {
LLVM_DEBUG(dbgs() << "Found an FMult reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found an FMult reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::FAdd, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::FAdd, TheLoop, FMF, SE, RedDes, DB, AC,
		DT)) {
LLVM_DEBUG(dbgs() << "Found an FAdd reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found an FAdd reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::FMax, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::FMax, TheLoop, FMF, SE, RedDes, DB, AC,
		DT)) {
LLVM_DEBUG(dbgs() << "Found a float MAX reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found a float MAX reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::FMin, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::FMin, TheLoop, FMF, SE, RedDes, DB, AC,
		DT)) {
LLVM_DEBUG(dbgs() << "Found a float MIN reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found a float MIN reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::SelectFCmp, TheLoop, FMF, RedDes, DB, AC,		if (AddReductionVar(Phi, RecurKind::SelectFCmp, TheLoop, FMF, SE, RedDes, DB,
DT)) {		AC, DT)) {
LLVM_DEBUG(dbgs() << "Found a float conditional select reduction PHI."		LLVM_DEBUG(dbgs() << "Found a float conditional select reduction PHI."
<< " PHI." << *Phi << "\n");		<< " PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::FMulAdd, TheLoop, FMF, RedDes, DB, AC,		if (AddReductionVar(Phi, RecurKind::FMulAdd, TheLoop, FMF, SE, RedDes, DB, AC,
DT)) {		DT)) {
LLVM_DEBUG(dbgs() << "Found an FMulAdd reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found an FMulAdd reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
// Not a reduction of known type.		// Not a reduction of known type.
return false;		return false;
}		}

▲ Show 20 Lines • Show All 587 Lines • Show Last 20 Lines

llvm/lib/Analysis/LoopAccessAnalysis.cpp

Show First 20 Lines • Show All 1,981 Lines • ▼ Show 20 Lines	void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,

// Record uniform store addresses to identify if we have multiple stores		// Record uniform store addresses to identify if we have multiple stores
// to the same address.		// to the same address.
ValueSet UniformStores;		ValueSet UniformStores;

for (StoreInst *ST : Stores) {		for (StoreInst *ST : Stores) {
Value *Ptr = ST->getPointerOperand();		Value *Ptr = ST->getPointerOperand();

if (isUniform(Ptr))		if (isUniform(Ptr)) {
		// Record store instructions to loop invariant addresses
		StoresToInvariantAddresses.push_back(ST);
		peterwaller-arm: Does the word 'variant' add anything? I couldn't find any other uses nearby which elucidate what you're trying to say here. It feels confusing because you are talking about 'InvariantStores', which means invariant in the address, so it looks like a typo. I feel this comment could be better: 'Record store instructions to loop-invariant addresses', in contrast to the comment on UniformStores above? Do you need to say much about the value?
HasDependenceInvolvingLoopInvariantAddress |=		HasDependenceInvolvingLoopInvariantAddress |=
!UniformStores.insert(Ptr).second;		!UniformStores.insert(Ptr).second;
		}

// If we did not see this pointer before, insert it to the read-write		// If we did not see this pointer before, insert it to the read-write
// list. At this phase it is only a 'write' list.		// list. At this phase it is only a 'write' list.
if (Seen.insert(Ptr).second) {		if (Seen.insert(Ptr).second) {
++NumReadWrites;		++NumReadWrites;

MemoryLocation Loc = MemoryLocation::get(ST);		MemoryLocation Loc = MemoryLocation::get(ST);
// The TBAA metadata could have a control dependency on the predication		// The TBAA metadata could have a control dependency on the predication
▲ Show 20 Lines • Show All 368 Lines • Show Last 20 Lines

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

Show First 20 Lines • Show All 725 Lines • ▼ Show 20 Lines	if (!PHI)
return SV;		return SV;

if (PHI->getNumIncomingValues() != 1)		if (PHI->getNumIncomingValues() != 1)
return SV;		return SV;
return followLCSSA(PHI->getIncomingValue(0));		return followLCSSA(PHI->getIncomingValue(0));
}		}

// Check V's users to see if it is involved in a reduction in L.		// Check V's users to see if it is involved in a reduction in L.
static PHINode *findInnerReductionPhi(Loop *L, Value *V) {		static PHINode *findInnerReductionPhi(Loop *L, Value *V, ScalarEvolution *SE) {
// Reduction variables cannot be constants.		// Reduction variables cannot be constants.
if (isa<Constant>(V))		if (isa<Constant>(V))
return nullptr;		return nullptr;

for (Value *User : V->users()) {		for (Value *User : V->users()) {
if (PHINode *PHI = dyn_cast<PHINode>(User)) {		if (PHINode *PHI = dyn_cast<PHINode>(User)) {
if (PHI->getNumIncomingValues() == 1)		if (PHI->getNumIncomingValues() == 1)
continue;		continue;
RecurrenceDescriptor RD;		RecurrenceDescriptor RD;
if (RecurrenceDescriptor::isReductionPHI(PHI, L, RD))		// Recurrence should not have intermediate store.
		if (RecurrenceDescriptor::isReductionPHI(PHI, L, SE, RD) &&
		!RD.IntermediateStore)
return PHI;		return PHI;
return nullptr;		return nullptr;
}		}
}		}

return nullptr;		return nullptr;
}		}

Show All 16 Lines	else {
return false;		return false;
}		}
} else {		} else {
assert(PHI.getNumIncomingValues() == 2 &&		assert(PHI.getNumIncomingValues() == 2 &&
"Phis in loop header should have exactly 2 incoming values");		"Phis in loop header should have exactly 2 incoming values");
// Check if we have a PHI node in the outer loop that has a reduction		// Check if we have a PHI node in the outer loop that has a reduction
// result from the inner loop as an incoming value.		// result from the inner loop as an incoming value.
Value *V = followLCSSA(PHI.getIncomingValueForBlock(L->getLoopLatch()));		Value *V = followLCSSA(PHI.getIncomingValueForBlock(L->getLoopLatch()));
PHINode *InnerRedPhi = findInnerReductionPhi(InnerLoop, V);		PHINode *InnerRedPhi = findInnerReductionPhi(InnerLoop, V, SE);
if (!InnerRedPhi ||		if (!InnerRedPhi ||
!llvm::is_contained(InnerRedPhi->incoming_values(), &PHI)) {		!llvm::is_contained(InnerRedPhi->incoming_values(), &PHI)) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs()		dbgs()
<< "Failed to recognize PHI as an induction or reduction.\n");		<< "Failed to recognize PHI as an induction or reduction.\n");
return false;		return false;
}		}
OuterInnerReductions.insert(&PHI);		OuterInnerReductions.insert(&PHI);
▲ Show 20 Lines • Show All 1,038 Lines • Show Last 20 Lines

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

Show First 20 Lines • Show All 413 Lines • ▼ Show 20 Lines	for (User *U : Inst->users()) {
if (!TheLoop->contains(UI)) {		if (!TheLoop->contains(UI)) {
LLVM_DEBUG(dbgs() << "LV: Found an outside user for : " << *UI << '\n');		LLVM_DEBUG(dbgs() << "LV: Found an outside user for : " << *UI << '\n');
return true;		return true;
}		}
}		}
return false;		return false;
}		}

		/// Returns true if A and B have same pointer operands or same SCEVs addresses
		static bool storeToSameAddress(ScalarEvolution *SE, StoreInst *A,
		StoreInst *B) {
		// Compare store
		if (A == B)
		return true;

		// Otherwise Compare pointers
		Value *APtr = A->getPointerOperand();
		Value *BPtr = B->getPointerOperand();
		fhahn: With opaque pointers, this code may behave differently. It's possible to have
		    store i32 0, ptr %x
		    store i8 0, ptr %x
		In that case the pointer operands will be the same, but the store widths differ. I think we should also check that the types of the stored values match for now, as we use this to remove earlier stores. This should only be correct if the later store writes at least as many bits as the earlier stores.
		igor.kirillov (author): I think if this check is added here, then the purpose of the function would be different (the address is still the same even if the values have different sizes). So, instead of that, I added a check at the place where those pointers are really processed; see LoopVectorizationLegality.cpp:975. There we now make sure that values stored to an invariant address are of the same type.
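The width concern in this thread can be illustrated with a minimal standalone C++ sketch. It is not LLVM code: the helper names `storeWideThenNarrow` and `narrowStoreOnly` and the byte pattern are hypothetical, chosen only for illustration. The sketch emulates a 32-bit store followed by an 8-bit store to the same address and compares the result with what memory would hold if the earlier, wider store had been dropped as dead (assuming, for the comparison, that the location was zero beforehand).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical illustration: emulate `store i32 V, ptr %x` followed by
// `store i8 0, ptr %x` on a 4-byte buffer and return the resulting 32 bits.
inline std::uint32_t storeWideThenNarrow(std::uint32_t V) {
  unsigned char Buf[4];
  std::memcpy(Buf, &V, sizeof(V));       // wide store: writes all 4 bytes
  std::uint8_t Zero = 0;
  std::memcpy(Buf, &Zero, sizeof(Zero)); // narrow store: writes 1 byte only
  std::uint32_t Result;
  std::memcpy(&Result, Buf, sizeof(Result));
  return Result;
}

// Memory contents if the wide store were deleted as "dead" and the buffer
// had been all zeros before the narrow store.
inline std::uint32_t narrowStoreOnly() {
  unsigned char Buf[4] = {0, 0, 0, 0};
  std::uint8_t Zero = 0;
  std::memcpy(Buf, &Zero, sizeof(Zero));
  std::uint32_t Result;
  std::memcpy(&Result, Buf, sizeof(Result));
  return Result;
}

// Usage: three bytes of the wide store survive the narrow store, so the
// earlier store is not dead even though both stores use the same pointer.
inline void demoStoreWidths() {
  assert(storeWideThenNarrow(0xAABBCCDDu) != narrowStoreOnly());
}
```

Which byte the narrow store overwrites depends on endianness, but on either byte order the two results differ, which is why matching stored types (or a later store at least as wide) matters before treating an earlier store as removable.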
		if (APtr == BPtr)
		return true;

		// Otherwise compare address SCEVs
		if (SE->getSCEV(APtr) == SE->getSCEV(BPtr))
		return true;

		return false;
		}

int LoopVectorizationLegality::isConsecutivePtr(Type *AccessTy,		int LoopVectorizationLegality::isConsecutivePtr(Type *AccessTy,
Value *Ptr) const {		Value *Ptr) const {
const ValueToValueMap &Strides =		const ValueToValueMap &Strides =
getSymbolicStrides() ? *getSymbolicStrides() : ValueToValueMap();		getSymbolicStrides() ? *getSymbolicStrides() : ValueToValueMap();

Function *F = TheLoop->getHeader()->getParent();		Function *F = TheLoop->getHeader()->getParent();
bool OptForSize = F->hasOptSize() ||		bool OptForSize = F->hasOptSize() ||
llvm::shouldOptimizeForSize(TheLoop->getHeader(), PSI, BFI,		llvm::shouldOptimizeForSize(TheLoop->getHeader(), PSI, BFI,
▲ Show 20 Lines • Show All 220 Lines • ▼ Show 20 Lines	for (Instruction &I : *BB) {
if (Phi->getNumIncomingValues() != 2) {		if (Phi->getNumIncomingValues() != 2) {
reportVectorizationFailure("Found an invalid PHI",		reportVectorizationFailure("Found an invalid PHI",
"loop control flow is not understood by vectorizer",		"loop control flow is not understood by vectorizer",
"CFGNotUnderstood", ORE, TheLoop, Phi);		"CFGNotUnderstood", ORE, TheLoop, Phi);
return false;		return false;
}		}

RecurrenceDescriptor RedDes;		RecurrenceDescriptor RedDes;
if (RecurrenceDescriptor::isReductionPHI(Phi, TheLoop, RedDes, DB, AC,		if (RecurrenceDescriptor::isReductionPHI(Phi, TheLoop, PSE.getSE(),
DT)) {		RedDes, DB, AC, DT)) {
Requirements->addExactFPMathInst(RedDes.getExactFPMathInst());		Requirements->addExactFPMathInst(RedDes.getExactFPMathInst());
AllowedExit.insert(RedDes.getLoopExitInstr());		AllowedExit.insert(RedDes.getLoopExitInstr());
Reductions[Phi] = RedDes;		Reductions[Phi] = RedDes;
continue;		continue;
}		}

// TODO: Instead of recording the AllowedExit, it would be good to record the		// TODO: Instead of recording the AllowedExit, it would be good to record the
// complementary set: NotAllowedExit. These include (but may not be		// complementary set: NotAllowedExit. These include (but may not be
▲ Show 20 Lines • Show All 218 Lines • ▼ Show 20 Lines	ORE->emit([&]() {
return OptimizationRemarkAnalysis(Hints->vectorizeAnalysisPassName(),		return OptimizationRemarkAnalysis(Hints->vectorizeAnalysisPassName(),
"loop not vectorized: ", *LAR);		"loop not vectorized: ", *LAR);
});		});
}		}

if (!LAI->canVectorizeMemory())		if (!LAI->canVectorizeMemory())
return false;		return false;

if (LAI->hasDependenceInvolvingLoopInvariantAddress()) {		if (!LAI->getStoresToInvariantAddresses().empty()) {
		fhahn: It would probably be good to make clear what is handled here exactly and why we can handle those stores. IIUC this applies only to invariant stores that store reduction results and is safe because runtime checks guarantee that it won't alias with other objects. The store won't get vectorized, but sunk to the exit block during codegen.
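As a rough picture of the transformation described in this comment, here is a minimal standalone C++ sketch. It is plain C++ rather than vectorizer output, and the function names `sumWithIntermediateStore` and `sumWithSunkStore` are hypothetical. It assumes the destination does not alias the source array, which is what the runtime checks are meant to establish.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar form: the running sum is written to the same loop-invariant
// location on every iteration.
inline void sumWithIntermediateStore(const std::vector<int> &Src, int *Dst) {
  int Sum = 0;
  for (std::size_t I = 0; I < Src.size(); ++I) {
    Sum += Src[I];
    *Dst = Sum; // intermediate store of the reduction value
  }
}

// Hand-written stand-in for the transformed loop: the reduction stays in a
// register and a single store is issued after the loop. The guard keeps the
// zero-trip case identical to the scalar form, which never stores then.
inline void sumWithSunkStore(const std::vector<int> &Src, int *Dst) {
  int Sum = 0;
  for (std::size_t I = 0; I < Src.size(); ++I)
    Sum += Src[I];
  if (!Src.empty())
    *Dst = Sum; // single store after the loop
}

// Usage: with a non-aliasing destination both forms leave the same value.
inline void demoSunkStore() {
  const std::vector<int> Src = {1, 2, 3, 4};
  int A = -1, B = -1;
  sumWithIntermediateStore(Src, &A);
  sumWithSunkStore(Src, &B);
  assert(A == B);
}
```

The equivalence only holds under the no-alias assumption: if `Dst` pointed into the array being summed, the per-iteration stores would feed back into later loads and the two forms could diverge.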
reportVectorizationFailure("Stores to a uniform address",		// For each invariant address, check its last stored value is unconditional.
"write to a loop invariant address could not be vectorized",		for (StoreInst *SI : LAI->getStoresToInvariantAddresses()) {
		if (isRecurringInvariantStore(SI) &&
		blockNeedsPredication(SI->getParent())) {
		reportVectorizationFailure(
		"We don't allow storing to uniform addresses",
		"write of conditional recurring variant value to a loop "
		"invariant address could not be vectorized",
"CantVectorizeStoreToLoopInvariantAddress", ORE, TheLoop);		"CantVectorizeStoreToLoopInvariantAddress", ORE, TheLoop);
return false;		return false;
}		}
		}
		}

		if (LAI->hasDependenceInvolvingLoopInvariantAddress()) {
		// For each invariant address, check its last stored value is the result
		// of one of our reductions.
		//
		// We do not check if dependence with loads exists because they are
		// currently rejected earlier in LoopAccessInfo::analyzeLoop. In case this
		// behaviour changes we have to modify this code.
		fhahn: What about loads to the same address in the loop? At the moment, `LAA` cannot analyze dependences with invariant addresses. But if this limitation gets removed the code here may become incorrect, because it relies on this limitation IIUC?
		igor.kirillov (author): Loads from or stores to the same address in the loop? I'm sorry, could you clarify what the problem is? As it is I don't understand the message.
		fhahn: The case I was thinking about was something like the snippet below, where we have a load of the invariant address in the loop (`%lv = load...` in the example below).
		    define void @reduc_store(i32* %dst, i32* readonly %src, i32* noalias %dst.2) {
		    entry:
		      %arrayidx = getelementptr inbounds i32, i32* %dst, i64 42
		      store i32 0, i32* %arrayidx, align 4
		      br label %for.body
		    for.body:
		      %0 = phi i32 [ 0, %entry ], [ %add, %for.body ]
		      %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
		      %arrayidx1 = getelementptr inbounds i32, i32* %src, i64 %indvars.iv
		      %1 = load i32, i32* %arrayidx1, align 4
		      %add = add nsw i32 %0, %1
		      %lv = load i32, i32* %arrayidx
		      store i32 %add, i32* %arrayidx, align 4
		      %gep.dst.2 = getelementptr inbounds i32, i32* %dst.2, i64 %indvars.iv
		      store i32 %lv, i32* %gep.dst.2,
		      %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
		      %exitcond = icmp eq i64 %indvars.iv.next, 1000
		      br i1 %exitcond, label %for.cond.cleanup, label %for.body
		    for.cond.cleanup:
		      ret void
		    }
		igor.kirillov (author): In that case we do not vectorize; the rejection happens in `LoopAccessInfo::analyzeLoop` when loads are processed:
		    for (LoadInst *LD : Loads) {
		      Value *Ptr = LD->getPointerOperand();
		      ...
		      // See if there is an unsafe dependency between a load to a uniform address and
		      // store to the same uniform address.
		      if (UniformStores.count(Ptr)) {
		        LLVM_DEBUG(dbgs() << "LAA: Found an unsafe dependency between a uniform "
		                             "load and uniform store to the same address!\n");
		        HasDependenceInvolvingLoopInvariantAddress = true;
		      }
		      ...
		I added the `reduc_store_load` test anyway.
		fhahn: Yeah, but this is only due to some limitations that currently exist in LAA, right? I think we should at least make this clear somewhere, e.g. in a comment.
		igor.kirillov (author): Added comment.
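To show why a load of the invariant address inside the loop matters, here is a minimal standalone C++ sketch of the pattern in the IR above. It is an illustration only: the function names are hypothetical, `Slot` plays the role of the invariant address, and `Out` plays the role of `%dst.2`. The second function models what would happen if the reduction store were sunk out of the loop while the load stayed in place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar semantics: each iteration loads the invariant slot (seeing the
// previous iteration's stored sum) before storing the new sum to it.
inline std::vector<int> loopWithLoad(const std::vector<int> &Src, int &Slot) {
  std::vector<int> Out(Src.size());
  int Sum = 0;
  for (std::size_t I = 0; I < Src.size(); ++I) {
    Sum += Src[I];
    int Lv = Slot; // load of the invariant address
    Slot = Sum;    // store of the reduction value
    Out[I] = Lv;
  }
  return Out;
}

// Hypothetical (incorrect) result of sinking the store while keeping the
// load in the loop: every load now observes the initial slot value.
inline std::vector<int> loopWithLoadStoreSunk(const std::vector<int> &Src,
                                              int &Slot) {
  std::vector<int> Out(Src.size());
  int Sum = 0;
  for (std::size_t I = 0; I < Src.size(); ++I) {
    Sum += Src[I];
    Out[I] = Slot;
  }
  if (!Src.empty())
    Slot = Sum;
  return Out;
}

// Usage: the slot ends up the same, but the values read inside the loop do not.
inline void demoLoadOfInvariantAddress() {
  const std::vector<int> Src = {1, 2, 3};
  int SlotA = 0, SlotB = 0;
  std::vector<int> Ref = loopWithLoad(Src, SlotA);
  std::vector<int> Bad = loopWithLoadStoreSunk(Src, SlotB);
  assert(SlotA == SlotB);
  assert(Ref != Bad);
}
```

This is the situation the added code comment guards against: the legality check is only sound as long as such loads cause the loop to be rejected earlier.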
		ScalarEvolution *SE = PSE.getSE();
		SmallVector<StoreInst *, 4> UnhandledStores;
		david-arm: Hi @igor.kirillov, I think you're missing a test case here. Suppose we have two stores in the loop to the same invariant address. If the first store is predicated, but the second isn't, we should still vectorise because the second store wins and the first one can be removed. At least the code here suggests that. I couldn't find any test that exercised this code path.
		igor.kirillov (author): Added a `reduc_store_final_store_predicated` test that addresses both requests.
		for (StoreInst *SI : LAI->getStoresToInvariantAddresses()) {
		if (isRecurringInvariantStore(SI)) {
		// Earlier stores to this address are effectively deadcode.
		erase_if(UnhandledStores, [SE, SI](StoreInst *I) {
		fhahn: nit: no need to use `llvm::`
		return storeToSameAddress(SE, SI, I);
		});
		} else if (!isUniform(SI->getValueOperand()))
		UnhandledStores.push_back(SI);
		david-arm: I think you might be missing a test case for this. Can you please make sure this code path is exercised by your existing tests?
		igor.kirillov (author): It is now covered by the new `reduc_store_final_store_predicated` test, and an old test, `reduc_store_inside_unrolled`, also executes this path.
		}

		bool IsOK = UnhandledStores.empty();
		// TODO: we should also validate against InvariantMemSets.
		if (!IsOK) {
		reportVectorizationFailure("We don't allow storing to uniform addresses",
		"write to a loop invariant address could not "
		"be vectorized",
		"CantVectorizeStoreToLoopInvariantAddress",
		ORE, TheLoop);
		return false;
		}
		}

		fhahn: I'm not sure if the comment matches the code. There's no guarantee that all stores to the same address come before the store of the reduction AFAICT. E.g. the test below has a store to the same address after the store of the reduction. I think this may get mis-compiled, as the store of 0 gets replaced with the store of the final value of the reduction.
		    define void @reduc_store(i32* %dst, i32* readonly %src) {
		    entry:
		      %gep.dst = getelementptr inbounds i32, i32* %dst, i64 42
		      store i32 1, i32* %gep.dst, align 4
		      br label %for.body
		    for.body:
		      %sum = phi i32 [ 0, %entry ], [ %add, %for.body ]
		      %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
		      %gep.src = getelementptr inbounds i32, i32* %src, i64 %iv
		      %0 = load i32, i32* %gep.src, align 4
		      %add = add nsw i32 %sum, %0
		      store i32 %add, i32* %gep.dst, align 4
		      store i32 0, i32* %gep.dst, align 4
		      %iv.next = add nuw nsw i64 %iv, 1
		      %exitcond = icmp eq i64 %iv.next, 1000
		      br i1 %exitcond, label %exit, label %for.body
		    exit:
		      ret void
		    }
		igor.kirillov (author): Yeah, it was processed incorrectly. I added this test along with the fix.
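The miscompile described in this thread can be reproduced in miniature with a standalone C++ sketch. The function names are hypothetical and the code only models the semantics of the IR test: a local `Slot` stands in for `%gep.dst`, and the second function models a transformation that treats every other store to the slot as dead and writes the final reduction value after the loop.

```cpp
#include <cassert>
#include <vector>

// Scalar semantics of the test: the reduction is stored first, then a
// constant 0 is stored to the same address, so 0 is what remains.
inline int referenceLoop(const std::vector<int> &Src) {
  int Slot = 1; // `store i32 1` before the loop
  int Sum = 0;
  for (int V : Src) {
    Sum += V;
    Slot = Sum; // store of the reduction value
    Slot = 0;   // later store to the same address
  }
  return Slot;
}

// Hypothetical incorrect transformation: all in-loop stores to the slot are
// dropped and the final reduction value is stored after the loop.
inline int naivelySunkLoop(const std::vector<int> &Src) {
  int Slot = 1;
  int Sum = 0;
  for (int V : Src)
    Sum += V;
  if (!Src.empty())
    Slot = Sum;
  return Slot;
}

// Usage: the two versions disagree, so the later store must block the
// transformation (or be preserved).
inline void demoStoreAfterReductionStore() {
  const std::vector<int> Src = {1, 2, 3};
  assert(referenceLoop(Src) != naivelySunkLoop(Src));
}
```

Only stores that come before the reduction store in program order can be considered overwritten by it; a store that follows it determines the final memory contents instead.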
Requirements->addRuntimePointerChecks(LAI->getNumRuntimePointerChecks());		Requirements->addRuntimePointerChecks(LAI->getNumRuntimePointerChecks());
PSE.addPredicate(LAI->getPSE().getUnionPredicate());		PSE.addPredicate(LAI->getPSE().getUnionPredicate());
return true;		return true;
}		}

bool LoopVectorizationLegality::canVectorizeFPMath(		bool LoopVectorizationLegality::canVectorizeFPMath(
bool EnableStrictReductions) {		bool EnableStrictReductions) {

// First check if there is any ExactFP math or if we allow reassociations		// First check if there is any ExactFP math or if we allow reassociations
if (!Requirements->getExactFPInst() || Hints->allowReordering())		if (!Requirements->getExactFPInst() || Hints->allowReordering())
return true;		return true;

// If the above is false, we have ExactFPMath & do not allow reordering.		// If the above is false, we have ExactFPMath & do not allow reordering.
// If the EnableStrictReductions flag is set, first check if we have any		// If the EnableStrictReductions flag is set, first check if we have any
// Exact FP induction vars, which we cannot vectorize.		// Exact FP induction vars, which we cannot vectorize.
if (!EnableStrictReductions ||		if (!EnableStrictReductions ||
any_of(getInductionVars(), [&](auto &Induction) -> bool {		any_of(getInductionVars(), [&](auto &Induction) -> bool {
InductionDescriptor IndDesc = Induction.second;		InductionDescriptor IndDesc = Induction.second;
return IndDesc.getExactFPMathInst();		return IndDesc.getExactFPMathInst();
}))		}))
return false;		return false;

// We can now only vectorize if all reductions with Exact FP math also		// We can now only vectorize if all reductions with Exact FP math also
// have the isOrdered flag set, which indicates that we can move the		// have the isOrdered flag set, which indicates that we can move the
// reduction operations in-loop.		// reduction operations in-loop, and do not have intermediate store.
		david-arm: Maybe this can be folded into the `all_of` case below, i.e.
		    return (all_of(getReductionVars(), [&](auto &Reduction) -> bool {
		      const RecurrenceDescriptor &RdxDesc = Reduction.second;
		      return !RdxDesc.hasExactFPMath() ||
		             (RdxDesc.isOrdered() && !RdxDesc.IntermediateStore);
		    }));
		Also, the problem with this code at the moment is that you could have a mixture of fast and ordered reductions in the same loop. There could be an intermediate store for one of the fast reductions, but not for the ordered ones. At the moment with your code we will just bail out in this case.
		igor.kirillov (author): You are right! It can be done much more simply, and the check is needed only when an ordered reduction is present.
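The effect of folding the check into the `all_of` predicate can be tried out with a small standalone C++ sketch. The `ReductionInfo` struct and the function name are hypothetical stand-ins, not `llvm::RecurrenceDescriptor` or the real legality code; they only carry the three properties the predicate queries.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the properties queried on each reduction.
struct ReductionInfo {
  bool HasExactFPMath;       // reduction requires exact FP semantics
  bool IsOrdered;            // reduction can be kept in order, in-loop
  bool HasIntermediateStore; // reduction value is stored inside the loop
};

// Folded predicate: a reduction with exact FP math must be ordered and must
// not have an intermediate store; other reductions are unconstrained here.
inline bool allReductionsAllowed(const std::vector<ReductionInfo> &Rs) {
  return std::all_of(Rs.begin(), Rs.end(), [](const ReductionInfo &R) {
    return !R.HasExactFPMath || (R.IsOrdered && !R.HasIntermediateStore);
  });
}

// Usage: a fast reduction with an intermediate store next to an ordered
// reduction without one is accepted, which is the mixed case raised above.
inline void demoMixedReductions() {
  const std::vector<ReductionInfo> Mixed = {
      {false, false, true}, // fast reduction, has intermediate store
      {true, true, false},  // ordered reduction, no intermediate store
  };
  assert(allReductionsAllowed(Mixed));
}
```

Because the intermediate-store condition is attached to the ordered branch only, a loop is rejected solely when an exact-FP reduction itself has an intermediate store or is not ordered.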
return (all_of(getReductionVars(), [&](auto &Reduction) -> bool {		return (all_of(getReductionVars(), [&](auto &Reduction) -> bool {
const RecurrenceDescriptor &RdxDesc = Reduction.second;		const RecurrenceDescriptor &RdxDesc = Reduction.second;
return !RdxDesc.hasExactFPMath() || RdxDesc.isOrdered();		return !RdxDesc.hasExactFPMath() ||
		(RdxDesc.isOrdered() && !RdxDesc.IntermediateStore);
}));		}));
}		}

		bool LoopVectorizationLegality::isRecurringInvariantStore(StoreInst *SI) {
		for (auto &Reduction : Reductions) {
		fhahn: nit: could be any_of
		RecurrenceDescriptor DS = Reduction.second;
		david-arm: nit: I think you might be able to simplify the code here by removing `FoundMatchingRecurrence`. Then, lower down, instead of the `break` you can just do
		    if (DSI && (DSI == SI)) {
		      IsPredicated = blockNeedsPredication(DSI->getParent());
		      return true;
		    }
		and at the bottom of the function just do:
		    return false;
		StoreInst *DSI = DS.IntermediateStore;
		if (DSI && DSI == SI)
		fhahn: nit: can drop `DSI`, as `SI` is guaranteed to be non-null?
		return true;
		fhahn: nit: redundant `()`.
		}
		fhahn: Could we instead just do the check at the call site, or is there a benefit of doing it here?
		return false;
		}

		bool LoopVectorizationLegality::isRecurringInvariantAddress(Value *V) {
		ScalarEvolution *SE = PSE.getSE();
		fhahn: nit: could be any_of.
		for (auto &Reduction : Reductions) {
		RecurrenceDescriptor DS = Reduction.second;
		if (!DS.IntermediateStore)
		continue;
		Value *InvariantAddress = DS.IntermediateStore->getPointerOperand();
		if (V == InvariantAddress ||
		SE->getSCEV(V) == SE->getSCEV(InvariantAddress))
		return true;
		}
		return false;
		}

bool LoopVectorizationLegality::isInductionPhi(const Value *V) {		bool LoopVectorizationLegality::isInductionPhi(const Value *V) {
Value *In0 = const_cast<Value *>(V);		Value *In0 = const_cast<Value *>(V);
PHINode *PN = dyn_cast_or_null<PHINode>(In0);		PHINode *PN = dyn_cast_or_null<PHINode>(In0);
if (!PN)		if (!PN)
return false;		return false;

return Inductions.count(PN);		return Inductions.count(PN);
}		}
▲ Show 20 Lines • Show All 366 Lines • Show Last 20 Lines

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


Show First 20 Lines • Show All 3,135 Lines • ▼ Show 20 Lines	void InnerLoopVectorizer::scalarizeInstruction(Instruction *Instr,
VPReplicateRecipe *RepRecipe,		VPReplicateRecipe *RepRecipe,
const VPIteration &Instance,		const VPIteration &Instance,
bool IfPredicateInstr,		bool IfPredicateInstr,
VPTransformState &State) {		VPTransformState &State) {
assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");		assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");

// llvm.experimental.noalias.scope.decl intrinsics must only be duplicated for		// llvm.experimental.noalias.scope.decl intrinsics must only be duplicated for
// the first lane and part.		// the first lane and part.
if (isa<NoAliasScopeDeclInst>(Instr))		if (isa<NoAliasScopeDeclInst>(Instr))
		fhahn: We effectively sink the store outside the loop. In that case, I don't think we should create a recipe and also we should not consider its cost.
if (!Instance.isFirstIteration())		if (!Instance.isFirstIteration())
return;		return;

setDebugLocFromInst(Instr);		setDebugLocFromInst(Instr);

// Does this instruction return a value ?		// Does this instruction return a value ?
bool IsVoidRetTy = Instr->getType()->isVoidTy();		bool IsVoidRetTy = Instr->getType()->isVoidTy();

▲ Show 20 Lines • Show All 1,381 Lines • ▼ Show 20 Lines	void InnerLoopVectorizer::fixReduction(VPReductionPHIRecipe *PhiR,
// Create a phi node that merges control-flow from the backedge-taken check		// Create a phi node that merges control-flow from the backedge-taken check
// block and the middle block.		// block and the middle block.
PHINode *BCBlockPhi = PHINode::Create(PhiTy, 2, "bc.merge.rdx",		PHINode *BCBlockPhi = PHINode::Create(PhiTy, 2, "bc.merge.rdx",
LoopScalarPreHeader->getTerminator());		LoopScalarPreHeader->getTerminator());
for (unsigned I = 0, E = LoopBypassBlocks.size(); I != E; ++I)		for (unsigned I = 0, E = LoopBypassBlocks.size(); I != E; ++I)
BCBlockPhi->addIncoming(ReductionStartValue, LoopBypassBlocks[I]);		BCBlockPhi->addIncoming(ReductionStartValue, LoopBypassBlocks[I]);
BCBlockPhi->addIncoming(ReducedPartRdx, LoopMiddleBlock);		BCBlockPhi->addIncoming(ReducedPartRdx, LoopMiddleBlock);

		// If there were stores of the reduction value to a uniform memory address
		// inside the loop, create the final store here.
		if (StoreInst *SI = RdxDesc.IntermediateStore) {
		StoreInst *NewSI =
		Builder.CreateStore(ReducedPartRdx, SI->getPointerOperand());
		propagateMetadata(NewSI, SI);

		// If the reduction value is used in other places,
		// then let the code below create PHI's for that.
		}

// Now, we need to fix the users of the reduction variable		// Now, we need to fix the users of the reduction variable
// inside and outside of the scalar remainder loop.		// inside and outside of the scalar remainder loop.

// We know that the loop is in LCSSA form. We need to update the PHI nodes		// We know that the loop is in LCSSA form. We need to update the PHI nodes
// in the exit blocks. See comment on analogous loop in		// in the exit blocks. See comment on analogous loop in
// fixFirstOrderRecurrence for a more complete explaination of the logic.		// fixFirstOrderRecurrence for a more complete explaination of the logic.
if (!Cost->requiresScalarEpilogue(VF))		if (!Cost->requiresScalarEpilogue(VF))
for (PHINode &LCSSAPhi : LoopExitBlock->phis())		for (PHINode &LCSSAPhi : LoopExitBlock->phis())
▲ Show 20 Lines • Show All 3,364 Lines • ▼ Show 20 Lines	if (auto *Ptr = getLoadStorePointerOperand(Inst))
return Legal->isConsecutivePtr(getLoadStoreType(Inst), Ptr);		return Legal->isConsecutivePtr(getLoadStoreType(Inst), Ptr);
return false;		return false;
}		}

void LoopVectorizationCostModel::collectValuesToIgnore() {		void LoopVectorizationCostModel::collectValuesToIgnore() {
// Ignore ephemeral values.		// Ignore ephemeral values.
CodeMetrics::collectEphemeralValues(TheLoop, AC, ValuesToIgnore);		CodeMetrics::collectEphemeralValues(TheLoop, AC, ValuesToIgnore);

		// Find all stores to invariant variables. Since they are going to sink
		// outside the loop we do not need to calculate cost for them.
		for (BasicBlock *BB : TheLoop->blocks())
		for (Instruction &I : *BB) {
		StoreInst *SI;
		if ((SI = dyn_cast<StoreInst>(&I)) &&
		Legal->isRecurringInvariantAddress(SI->getPointerOperand()))
		ValuesToIgnore.insert(&I);
		}

// Ignore type-promoting instructions we identified during reduction		// Ignore type-promoting instructions we identified during reduction
// detection.		// detection.
for (auto &Reduction : Legal->getReductionVars()) {		for (auto &Reduction : Legal->getReductionVars()) {
RecurrenceDescriptor &RedDes = Reduction.second;		RecurrenceDescriptor &RedDes = Reduction.second;
const SmallPtrSetImpl<Instruction *> &Casts = RedDes.getCastInsts();		const SmallPtrSetImpl<Instruction *> &Casts = RedDes.getCastInsts();
VecValuesToIgnore.insert(Casts.begin(), Casts.end());		VecValuesToIgnore.insert(Casts.begin(), Casts.end());
}		}
// Ignore type-casting instructions we identified during induction		// Ignore type-casting instructions we identified during induction
▲ Show 20 Lines • Show All 1,374 Lines • ▼ Show 20 Lines	for (Instruction &I : BB->instructionsWithoutDebug()) {
InductionsToMove.push_back(		InductionsToMove.push_back(
cast<VPWidenIntOrFpInductionRecipe>(Recipe));		cast<VPWidenIntOrFpInductionRecipe>(Recipe));
}		}
RecipeBuilder.setRecipe(Instr, Recipe);		RecipeBuilder.setRecipe(Instr, Recipe);
VPBB->appendRecipe(Recipe);		VPBB->appendRecipe(Recipe);
continue;		continue;
}		}

		// Invariant stores will be either deleted or go outside of loop so there
		fhahn [Done]: `either deleted or go outside the loop` sounds a bit unclear. Aren't they moved to the vector exit block and store the final reduction value?
		// is no need to create a recipe for them.
		StoreInst *SI;
		fhahn [Done]: I think it would also be good to include at least some information in the VPlan that the store is handled as part of the reduction. Perhaps `VPReductionRecipe` should print if the result is stored after the loop? Please add a test case to `vplan-printing.ll`
		igor.kirillov [Author, Done]: I'm not sure how to display this information properly. This is how output of `VReductionRecipe::print` looks like now: REDUCE ir<%red.next> = ir<%red> + fast reduce.fadd (ir<%lv>) I could add something to the end but it doesn't seem to fit there. Also I have not found any proper `V.Recipe` where I could place something like `DELETE store .`.
		igor.kirillov [Author, Done]: @fhahn I added some info to VReductionRecipe::print and a relevant test. What do you think about it?
		if ((SI = dyn_cast<StoreInst>(&I)) &&
		Legal->isRecurringInvariantAddress(SI->getPointerOperand()))
		continue;

// Otherwise, if all widening options failed, Instruction is to be		// Otherwise, if all widening options failed, Instruction is to be
// replicated. This may create a successor for VPBB.		// replicated. This may create a successor for VPBB.
VPBasicBlock *NextVPBB =		VPBasicBlock *NextVPBB =
RecipeBuilder.handleReplication(Instr, Range, VPBB, Plan);		RecipeBuilder.handleReplication(Instr, Range, VPBB, Plan);
if (NextVPBB != VPBB) {		if (NextVPBB != VPBB) {
VPBB = NextVPBB;		VPBB = NextVPBB;
VPBB->setName(BB->hasName() ? BB->getName() + "." + Twine(VPBBsForBB++)		VPBB->setName(BB->hasName() ? BB->getName() + "." + Twine(VPBBsForBB++)
: "");		: "");
▲ Show 20 Lines • Show All 1,469 Lines • Show Last 20 Lines
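The effect of the `fixReduction` change above (one store of the reduced value after the vector loop instead of a store per iteration) can be pictured with a small stand-alone C++ sketch. This is not LLVM code: `reduceStoreScalar`, `reduceStoreSunk`, and the fixed lane count of 4 are hypothetical names and values chosen only for illustration, and the sketch assumes nothing else in the loop reads or writes the destination.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar form of the loop in the tests: the running sum is written to the
// same loop-invariant address on every iteration.
void reduceStoreScalar(const std::vector<int> &Src, int *Dst) {
  int Sum = 0;
  for (std::size_t I = 0; I < Src.size(); ++I) {
    Sum += Src[I];
    *Dst = Sum; // intermediate store to an invariant address
  }
}

// Hand-written stand-in for the vectorized form with 4 "lanes" (assumption).
// The main loop contains no store; the reduced value is stored once after it.
void reduceStoreSunk(const std::vector<int> &Src, int *Dst) {
  constexpr std::size_t VF = 4;
  std::array<int, VF> Lanes{}; // one partial sum per lane
  std::size_t I = 0;
  for (; I + VF <= Src.size(); I += VF)
    for (std::size_t L = 0; L < VF; ++L)
      Lanes[L] += Src[I + L];

  int Sum = 0;
  for (int Partial : Lanes) // horizontal reduction after the main loop
    Sum += Partial;
  if (I != 0)
    *Dst = Sum; // single final store, only if the main loop ran

  for (; I < Src.size(); ++I) { // scalar remainder keeps the original store
    Sum += Src[I];
    *Dst = Sum;
  }
}
```

Both functions leave the same value at `Dst` for integer addition, which is why the per-iteration store needs neither a recipe nor a cost in this sketch's terms.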

llvm/test/Transforms/LoopInterchange/reductions-across-inner-and-outer-loop.ll

Show First 20 Lines • Show All 145 Lines • ▼ Show 20 Lines	for1.inc: ; preds = %for2
%exit2 = icmp eq i64 %indvars.iv.next24, 100		%exit2 = icmp eq i64 %indvars.iv.next24, 100
br i1 %exit2, label %for1.loopexit, label %for1.header		br i1 %exit2, label %for1.loopexit, label %for1.header

for1.loopexit: ; preds = %for1.inc		for1.loopexit: ; preds = %for1.inc
%sum.inc.lcssa2 = phi i64 [ %sum.inc.lcssa, %for1.inc ]		%sum.inc.lcssa2 = phi i64 [ %sum.inc.lcssa, %for1.inc ]
ret i64 %sum.inc.lcssa2		ret i64 %sum.inc.lcssa2
}		}

		; Check that we do not interchange if reduction is stored in an invariant address inside inner loop
		; REMARKS: --- !Missed
		; REMARKS-NEXT: Pass: loop-interchange
		; REMARKS-NEXT: Name: UnsupportedPHIOuter
		; REMARKS-NEXT: Function: test4

		define i64 @test4([100 x [100 x i64]]* %Arr, i64* %dst) {
		entry:
		%gep.dst = getelementptr inbounds i64, i64* %dst, i64 42
		br label %for1.header

		for1.header: ; preds = %for1.inc, %entry
		%indvars.iv23 = phi i64 [ 0, %entry ], [ %indvars.iv.next24, %for1.inc ]
		%sum.outer = phi i64 [ 0, %entry ], [ %sum.inc.lcssa, %for1.inc ]
		br label %for2

		for2: ; preds = %for2, %for1.header
		%indvars.iv = phi i64 [ 0, %for1.header ], [ %indvars.iv.next.3, %for2 ]
		%sum.inner = phi i64 [ %sum.outer, %for1.header ], [ %sum.inc, %for2 ]
		%arrayidx = getelementptr inbounds [100 x [100 x i64]], [100 x [100 x i64]]* %Arr, i64 0, i64 %indvars.iv, i64 %indvars.iv23
		%lv = load i64, i64* %arrayidx, align 4
		%sum.inc = add i64 %sum.inner, %lv
		store i64 %sum.inc, i64* %gep.dst, align 4
		%indvars.iv.next.3 = add nuw nsw i64 %indvars.iv, 1
		%exit1 = icmp eq i64 %indvars.iv.next.3, 100
		br i1 %exit1, label %for1.inc, label %for2

		for1.inc: ; preds = %for2
		%sum.inc.lcssa = phi i64 [ %sum.inc, %for2 ]
		%indvars.iv.next24 = add nuw nsw i64 %indvars.iv23, 1
		%exit2 = icmp eq i64 %indvars.iv.next24, 100
		br i1 %exit2, label %for1.loopexit, label %for1.header

		for1.loopexit: ; preds = %for1.inc
		%sum.inc.lcssa2 = phi i64 [ %sum.inc.lcssa, %for1.inc ]
		ret i64 %sum.inc.lcssa2
		}

; Check that we do not interchange or crash if the PHI in the outer loop gets a		; Check that we do not interchange or crash if the PHI in the outer loop gets a
; constant from the inner loop.		; constant from the inner loop.
; REMARKS: --- !Missed		; REMARKS: --- !Missed
; REMARKS-NEXT: Pass: loop-interchange		; REMARKS-NEXT: Pass: loop-interchange
; REMARKS-NEXT: Name: UnsupportedPHIOuter		; REMARKS-NEXT: Name: UnsupportedPHIOuter
; REMARKS-NEXT: Function: test_constant_inner_loop_res		; REMARKS-NEXT: Function: test_constant_inner_loop_res

define i64 @test_constant_inner_loop_res([100 x [100 x i64]]* %Arr) {		define i64 @test_constant_inner_loop_res([100 x [100 x i64]]* %Arr) {
Show All 30 Lines

llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll

Show First 20 Lines • Show All 312 Lines • ▼ Show 20 Lines	for.body:
%iv.next = add nuw nsw i64 %iv, 1		%iv.next = add nuw nsw i64 %iv, 1
%exitcond.not = icmp eq i64 %iv.next, %n		%exitcond.not = icmp eq i64 %iv.next, %n
br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0		br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

for.end:		for.end:
ret float %.sroa.speculated		ret float %.sroa.speculated
}		}

		; ADD (with reduction stored in invariant address)

		; CHECK-REMARK: vectorized loop (vectorization width: vscale x 4, interleaved count: 2)
		define void @invariant_store(i32* %dst, i32* readonly %src) {
		; CHECK-LABEL: @invariant_store
		; CHECK: vector.body:
		; CHECK: %[[LOAD1:.*]] = load <vscale x 4 x i32>
		; CHECK: %[[LOAD2:.*]] = load <vscale x 4 x i32>
		; CHECK: %[[ADD1:.*]] = add <vscale x 4 x i32> %{{.*}}, %[[LOAD1]]
		; CHECK: %[[ADD2:.*]] = add <vscale x 4 x i32> %{{.*}}, %[[LOAD2]]
		; CHECK: call void @llvm.masked.scatter.nxv4i32.nxv4p0i32(<vscale x 4 x i32> %[[ADD1]]
		; CHECK: call void @llvm.masked.scatter.nxv4i32.nxv4p0i32(<vscale x 4 x i32> %[[ADD2]]
		; CHECK: middle.block:
		; CHECK: %[[ADD:.*]] = add <vscale x 4 x i32> %[[ADD2]], %[[ADD1]]
		; CHECK-NEXT: %[[SUM:.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> %[[ADD]])
		; CHECK-NEXT: store i32 %[[SUM]], i32* %gep.dst, align 4
		entry:
		%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42
		store i32 0, i32* %gep.dst, align 4
		br label %for.body
		for.body:
		%sum = phi i32 [ 0, %entry ], [ %add, %for.body ]
		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
		%gep.src = getelementptr inbounds i32, i32* %src, i64 %indvars.iv
		%0 = load i32, i32* %gep.src, align 4
		%add = add nsw i32 %sum, %0
		store i32 %add, i32* %gep.dst, align 4
		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
		%exitcond = icmp eq i64 %indvars.iv.next, 1000
		br i1 %exitcond, label %for.cond.cleanup, label %for.body

		for.cond.cleanup:
		ret void
		}

; Reduction cannot be vectorized		; Reduction cannot be vectorized

; MUL		; MUL

; CHECK-REMARK: Scalable vectorization not supported for the reduction operations found in this loop.		; CHECK-REMARK: Scalable vectorization not supported for the reduction operations found in this loop.
; CHECK-REMARK: vectorized loop (vectorization width: 4, interleaved count: 2)		; CHECK-REMARK: vectorized loop (vectorization width: 4, interleaved count: 2)
define i32 @mul(i32* nocapture %a, i32* nocapture readonly %b, i64 %n) {		define i32 @mul(i32* nocapture %a, i32* nocapture readonly %b, i64 %n) {
; CHECK-LABEL: @mul		; CHECK-LABEL: @mul
▲ Show 20 Lines • Show All 72 Lines • Show Last 20 Lines

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll

Show First 20 Lines • Show All 1,303 Lines • ▼ Show 20 Lines	for.body:
br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !1		br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !1

for.end:		for.end:
ret float %muladd2		ret float %muladd2
}		}

declare float @llvm.fmuladd.f32(float, float, float)		declare float @llvm.fmuladd.f32(float, float, float)

		; Test case with invariant store where fadd is strict.
		define void @reduction_store_to_invariant_address(float* %dst, float* readonly %src) {
		; CHECK-ORDERED-LABEL: @reduction_store_to_invariant_address(
		; CHECK-ORDERED-NOT: vector.body

		; CHECK-UNORDERED-LABEL: @reduction_store_to_invariant_address(
		; CHECK-UNORDERED-NOT: vector.body

		; CHECK-NOT-VECTORIZED-LABEL: @reduction_store_to_invariant_address(
		; CHECK-NOT-VECTORIZED-NOT: vector.body

		entry:
		%arrayidx = getelementptr inbounds float, float* %dst, i64 42
		store float 0.000000e+00, float* %arrayidx, align 4
		br label %for.body

		for.body:
		%0 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
		%arrayidx1 = getelementptr inbounds float, float* %src, i64 %indvars.iv
		%1 = load float, float* %arrayidx1, align 4
		%add = fadd float %0, %1
		store float %add, float* %arrayidx, align 4
		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
		%exitcond = icmp eq i64 %indvars.iv.next, 1000
		br i1 %exitcond, label %for.cond.cleanup, label %for.body

		for.cond.cleanup:
		ret void
		}

!0 = distinct !{!0, !5, !9, !11}		!0 = distinct !{!0, !5, !9, !11}
!1 = distinct !{!1, !5, !10, !11}		!1 = distinct !{!1, !5, !10, !11}
!2 = distinct !{!2, !6, !9, !11}		!2 = distinct !{!2, !6, !9, !11}
!3 = distinct !{!3, !7, !9, !11, !12}		!3 = distinct !{!3, !7, !9, !11, !12}
!4 = distinct !{!4, !8, !10, !11}		!4 = distinct !{!4, !8, !10, !11}
!5 = !{!"llvm.loop.vectorize.width", i32 8}		!5 = !{!"llvm.loop.vectorize.width", i32 8}
!6 = !{!"llvm.loop.vectorize.width", i32 4}		!6 = !{!"llvm.loop.vectorize.width", i32 4}
!7 = !{!"llvm.loop.vectorize.width", i32 2}		!7 = !{!"llvm.loop.vectorize.width", i32 2}
!8 = !{!"llvm.loop.vectorize.width", i32 1}		!8 = !{!"llvm.loop.vectorize.width", i32 1}
!9 = !{!"llvm.loop.interleave.count", i32 1}		!9 = !{!"llvm.loop.interleave.count", i32 1}
!10 = !{!"llvm.loop.interleave.count", i32 4}		!10 = !{!"llvm.loop.interleave.count", i32 4}
!11 = !{!"llvm.loop.vectorize.enable", i1 true}		!11 = !{!"llvm.loop.vectorize.enable", i1 true}
!12 = !{!"llvm.loop.vectorize.predicate.enable", i1 true}		!12 = !{!"llvm.loop.vectorize.predicate.enable", i1 true}
!13 = distinct !{!13, !6, !9, !11}		!13 = distinct !{!13, !6, !9, !11}

llvm/test/Transforms/LoopVectorize/reduction.ll

Show First 20 Lines • Show All 456 Lines • ▼ Show 20 Lines	for.body:
%exitcond.1 = icmp eq i32 %inc6.1, 22		%exitcond.1 = icmp eq i32 %inc6.1, 22
br i1 %exitcond.1, label %exit, label %for.body		br i1 %exitcond.1, label %exit, label %for.body

exit:		exit:
%inc.2 = add nsw i32 %inc511.1.inc4.1, 2		%inc.2 = add nsw i32 %inc511.1.inc4.1, 2
ret i32 %inc.2		ret i32 %inc.2
}		}

		; This test checks that we can vectorize loop with reduction variable
		; stored in an invariant address.
		david-arm [Done]: nit: I think this should be `invariant`
		;
		; int sum = 0;
		fhahn [Done]: Could you add a brief textual explanation of what the test covers?
		; for(i=0..N) {
		fhahn [Not Done]: FWIW for such compact IR test cases, the pseudo code doesn't add much value in my personal opinion. Better to strive to make the tests as readable/compact as possible and have a comment explaining what it tests when needed.
		igor.kirillov [Author, Not Done]: When I look at pseudo-code I immediately understand what it is about, whereas looking at IR takes at least 30 seconds of cognitive exertions. And it is also easier to see the difference between and purpose of all those quite similar tests. But I delete it, of course, if you insist :)
		; sum += src[i];
		; dst[42] = sum;
		david-arm [Done]: Is it possible to add a simple floating point test with "fadd fast"?
		igor.kirillov [Author, Done]: Added! see reduc_store_fadd_fast function
		; }
		; CHECK-LABEL: @reduc_store

		; CHECK: vector.body:
		; CHECK-NOT: store i32 %{{[0-9]+}}, i32* %gep.dst
		; CHECK: middle.block:
		david-arm [Done]: Can you put all the CHECK lines in the same place near the top of the function please to be consistent with the existing tests?
		; CHECK: store i32 %{{[0-9]+}}, i32* %gep.dst
		define void @reduc_store(i32* %dst, i32* readonly %src) {
		entry:
		%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42
		store i32 0, i32* %gep.dst, align 4
		fhahn [Done]: perhaps rename to `%sum` or something like that, to make it a bit easier to read the test?
		br label %for.body
		for.body:
		%sum = phi i32 [ 0, %entry ], [ %add, %for.body ]
		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
		%gep.src = getelementptr inbounds i32, i32* %src, i64 %indvars.iv
		fhahn [Done]: The names for the 2 different GEPs in the tests are very similar. Could you rename them to make it easier to distinguish them (e.g. something like `%gep.src`/`%gep.dst`)..
		%0 = load i32, i32* %gep.src, align 4
		%add = add nsw i32 %sum, %0
		store i32 %add, i32* %gep.dst, align 4
		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
		%exitcond = icmp eq i64 %indvars.iv.next, 1000
		br i1 %exitcond, label %for.cond.cleanup, label %for.body

		for.cond.cleanup:
		ret void
		}

		; Same as above but with floating point numbers instead.
		;
		; float sum = 0;
		; for(i=0..N) {
		; sum += src[i];
		; dst[42] = sum;
		; }
		; CHECK-LABEL: @reduc_store_fadd_fast
		; CHECK: vector.body:
		; CHECK-NOT: store float %{{[0-9]+}}, float* %gep.dst
		; CHECK: middle.block:
		; CHECK: store float %{{[0-9]+}}, float* %gep.dst
		define void @reduc_store_fadd_fast(float* %dst, float* readonly %src) {
		entry:
		%gep.dst = getelementptr inbounds float, float* %dst, i64 42
		store float 0.000000e+00, float* %gep.dst, align 4
		br label %for.body

		for.body:
		david-arm [Done]: Hi @igor.kirillov, this doesn't look right. The test is called `@reduc_store_fadd_fast`, but there is not `fast` keyword on the `fadd` instruction. I don't think we should even be vectorising this because it requires ordered reductions, which you haven't enabled for this test. Can you change the name of this to `@reduc_store_fadd_ordered` and investigate why we are vectorising this? Can you also add a separate test called `@reduc_store_fadd_fast` that actually has the `fast` keyword too?
		igor.kirillov [Author, Done]: I missed the `fast` keyword there. As for why this code gets vectorized - it happens because of `Hints->allowReordering()` returning true in `LoopVectorizationLegality::canVectorizeFPMath`. As you can see the test specifies vector width (-force-vector-width=4) and llvm allows to process ordered instructions in unordered manner (see also `LoopVectorizeHints::allowReordering` function) in that case. I added a new test with `fast` keyword to `AArch64/strict-fadd.ll` and the loop is not vectorized there.
		%sum = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
		%gep.src = getelementptr inbounds float, float* %src, i64 %indvars.iv
		%0 = load float, float* %gep.src, align 4
		%add = fadd fast float %sum, %0
		store float %add, float* %gep.dst, align 4
		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
		%exitcond = icmp eq i64 %indvars.iv.next, 1000
		br i1 %exitcond, label %for.cond.cleanup, label %for.body

		for.cond.cleanup:
		ret void
		}
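The exchange about the missing `fast` keyword comes down to whether the `fadd` chain may be reassociated. As a rough illustration, the stand-alone C++ sketch below sums the same values in order and with two interleaved accumulators. The function names and the input values are hypothetical, and the sketch assumes IEEE-754 single-precision arithmetic without fast-math compiler options.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-order (strict) floating-point sum, as the scalar loop computes it.
float sumOrdered(const std::vector<float> &Src) {
  float Sum = 0.0f;
  for (float V : Src)
    Sum += V;
  return Sum;
}

// Sum with two interleaved partial accumulators. This is the kind of
// reassociation that needs explicit permission for floating-point values.
float sumTwoLanes(const std::vector<float> &Src) {
  float Lane0 = 0.0f, Lane1 = 0.0f;
  std::size_t I = 0;
  for (; I + 2 <= Src.size(); I += 2) {
    Lane0 += Src[I];
    Lane1 += Src[I + 1];
  }
  float Sum = Lane0 + Lane1; // combine the partial sums
  for (; I < Src.size(); ++I) // leftover element, if any
    Sum += Src[I];
  return Sum;
}
```

With an input such as `{1e8f, 1.0f, -1e8f, 1.0f}` the two functions can disagree, because the small addend is lost to rounding in one order and not in the other. That is the reason a strict `fadd` reduction is treated differently from one marked `fast`.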

		; Check that if we have a read from an invariant address, we do not vectorize.
		;
		; int sum = 0;
		; for(i=0..N) {
		; sum += src[i];
		; dst.2[i] = dst[42];
		; dst[42] = sum;
		; }
		peterwaller-arm [Not Done]: Suggestion for a test: what about testing two different destination addresses modified in the loop body?
		igor.kirillov [Author, Done]: Actually this is not supported so far, only one invariant address is possible. (See llvm/lib/Analysis/IVDescriptors.cpp:318). Do you think we need test that checks that vectorisation is not happening?
		peterwaller-arm [Not Done]: Given that it is not supported, such a test would demonstrate that your bailout logic for this case is working as intended.
		; CHECK-LABEL: @reduc_store_load
		; CHECK-NOT: vector.body:
		define void @reduc_store_load(i32* %dst, i32* readonly %src, i32* noalias %dst.2) {
		entry:
		%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42
		store i32 0, i32* %gep.dst, align 4
		br label %for.body
		fhahn [Done]: nit: newline.
		for.body:
		%sum = phi i32 [ 0, %entry ], [ %add, %for.body ]
		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
		%gep.src = getelementptr inbounds i32, i32* %src, i64 %indvars.iv
		%0 = load i32, i32* %gep.src, align 4
		%add = add nsw i32 %sum, %0
		%lv = load i32, i32* %gep.dst
		%gep.dst.2 = getelementptr inbounds i32, i32* %dst.2, i64 %indvars.iv
		store i32 %lv, i32* %gep.dst.2, align 4
		store i32 %add, i32* %gep.dst, align 4
		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
		%exitcond = icmp eq i64 %indvars.iv.next, 1000
		br i1 %exitcond, label %for.cond.cleanup, label %for.body

		for.cond.cleanup:
		ret void
		}
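The `@reduc_store_load` case above can be read as follows: once the loop also loads from the invariant address, moving the store out of the loop changes what those loads observe. The stand-alone C++ sketch below mirrors the pseudo-code of that test. `withLoadScalar` and `withLoadStoreSunk` are hypothetical names, and the second function is a deliberately incorrect transformation shown only for comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original loop: dst2[i] receives the value currently held at the invariant
// address (Cell stands in for dst[42]), then the running sum is stored there.
std::vector<int> withLoadScalar(const std::vector<int> &Src) {
  int Cell = 0; // initial store of 0 before the loop
  int Sum = 0;
  std::vector<int> Dst2(Src.size());
  for (std::size_t I = 0; I < Src.size(); ++I) {
    Sum += Src[I];
    Dst2[I] = Cell; // load from the invariant address
    Cell = Sum;     // store to the invariant address
  }
  return Dst2;
}

// Same loop with the store naively moved after the loop. Every load now sees
// the initial value, so the contents of Dst2 differ from the original.
std::vector<int> withLoadStoreSunk(const std::vector<int> &Src) {
  int Cell = 0;
  int Sum = 0;
  std::vector<int> Dst2(Src.size());
  for (std::size_t I = 0; I < Src.size(); ++I) {
    Sum += Src[I];
    Dst2[I] = Cell; // always reads the pre-loop value
  }
  Cell = Sum; // sunk store
  (void)Cell;
  return Dst2;
}
```

For an input like `{1, 2, 3}` the first function yields the prefix sums shifted by one element while the second yields all zeros, which is consistent with the test expecting no `vector.body`.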

		; Final value is not guaranteed to be stored in an invariant address.
		; We don't vectorize in that case.
		;
		; int sum = 0;
		; for(i=0..N) {
		; int diff = y[i] - x[i];
		; if (diff > 0) {
		; sum += diff;
		; *t = sum;
		; }
		; }
		; CHECK-LABEL: @reduc_cond_store
		; CHECK-NOT: vector.body
		define void @reduc_cond_store(i32* %t, i32* readonly %x, i32* readonly %y, i32 %n) {
		entry:
		store i32 0, i32* %t, align 4
		%cmp1 = icmp sgt i32 %n, 0
		br i1 %cmp1, label %for.body.preheader, label %for.end
		fhahn [Done]: not needed?

		for.body.preheader: ; preds = %entry
		%wide.trip.count = zext i32 %n to i64
		fhahn [Done]: nit: is this needed? Can just pass `%n` as `i64`
		br label %for.body

		for.body: ; preds = %if.end, %for.body.preheader
		%sum = phi i32 [ 0, %for.body.preheader ], [ %sum.2, %if.end ]
		%indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %if.end ]
		peterwaller-arm [Done]: CHECK-?
		%gep.y = getelementptr inbounds i32, i32* %y, i64 %indvars.iv
		%0 = load i32, i32* %gep.y, align 4
		%gep.x = getelementptr inbounds i32, i32* %x, i64 %indvars.iv
		%1 = load i32, i32* %gep.x, align 4
		%diff = sub nsw i32 %0, %1
		%cmp2 = icmp sgt i32 %diff, 0
		br i1 %cmp2, label %if.then, label %if.end

		if.then: ; preds = %for.body
		%sum.1 = add nsw i32 %diff, %sum
		store i32 %sum.1, i32* %t, align 4
		br label %if.end

		if.end: ; preds = %if.then, %for.body
		%sum.2 = phi i32 [ %sum.1, %if.then ], [ %0, %for.body ]
		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
		%exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count
		br i1 %exitcond, label %for.end, label %for.body

		for.end: ; preds = %if.end, %entry
		ret void
		}

		; Check that we can vectorize code with several stores to an invariant address
		; with condition that final reduction value is stored too.
		;
		; int sum = 0;
		; for(int i=0; i < 1000; i+=2) {
		; sum += src[i];
		; dst[42] = sum;
		; sum += src[i+1];
		; dst[42] = sum;
		; }
		; CHECK-LABEL: @reduc_store_inside_unrolled
		; CHECK: vector.body:
		; CHECK-NOT: store i32 %{{[0-9]+}}, i32* %gep.dst
		; CHECK: middle.block:
		; CHECK: store i32 %{{[0-9]+}}, i32* %gep.dst
		fhahn [Done]: Here (and for the other tests it would be good to check at least the vector reduction sequence and that the correct value is stored.
		; CHECK: ret void
		define void @reduc_store_inside_unrolled(i32* %dst, i32* readonly %src) {
		entry:
		%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42
		br label %for.body

		for.cond.cleanup:
		ret void

		for.body:
		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
		%sum = phi i32 [ 0, %entry ], [ %sum.1, %for.body ]
		%gep.src = getelementptr inbounds i32, i32* %src, i64 %indvars.iv
		%0 = load i32, i32* %gep.src, align 4
		%sum.1 = add nsw i32 %0, %sum
		store i32 %sum.1, i32* %gep.dst, align 4
		%1 = or i64 %indvars.iv, 1
		%gep.src.1 = getelementptr inbounds i32, i32* %src, i64 %1
		%2 = load i32, i32* %gep.src.1, align 4
		%sum.2 = add nsw i32 %2, %sum.1
		store i32 %sum.2, i32* %gep.dst, align 4
		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
		%cmp = icmp slt i64 %indvars.iv.next, 1000
		br i1 %cmp, label %for.body, label %for.cond.cleanup
		}

		; We cannot vectorize if two (or more) invariant stores exist in a loop.
		;
		david-arm [Done]: Can you also add a test where the first store is predicated and the second one isn't? According to the code changes in this patch we should vectorise this case because the second one overrides the first.
		igor.kirillov [Author, Done]: Added! See `reduc_store_middle_store_predicated`
		; int sum = 0;
		; for(int i=0; i < 1000; i+=2) {
		; sum += src[i];
		; dst[42] = sum;
		; sum += src[i+1];
		; other_dst[42] = sum;
		; }
		; CHECK-LABEL: @reduc_double_invariant_store
		; CHECK-NOT: vector.body:
		define void @reduc_double_invariant_store(i32* %dst, i32* %other_dst, i32* readonly %src) {
		entry:
		%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42
		%gep.other_dst = getelementptr inbounds i32, i32* %other_dst, i64 42
		br label %for.body

		for.cond.cleanup:
		fhahn [Done]: move exit block to end of function?
		ret void

		for.body:
		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
		%sum = phi i32 [ 0, %entry ], [ %sum.1, %for.body ]
		%arrayidx = getelementptr inbounds i32, i32* %src, i64 %indvars.iv
		%0 = load i32, i32* %arrayidx, align 4
		%sum.1 = add nsw i32 %0, %sum
		store i32 %sum.1, i32* %gep.dst, align 4
		%1 = or i64 %indvars.iv, 1
		%arrayidx4 = getelementptr inbounds i32, i32* %src, i64 %1
		%2 = load i32, i32* %arrayidx4, align 4
		%sum.2 = add nsw i32 %2, %sum.1
		store i32 %sum.2, i32* %gep.other_dst, align 4
		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
		%cmp = icmp slt i64 %indvars.iv.next, 1000
		br i1 %cmp, label %for.body, label %for.cond.cleanup
		}

		; int sum = 0;
		; for(int i=0; i < 1000; i+=2) {
		; sum += src[i];
		; if (src[i+1] > 0)
		; dst[42] = sum;
		; sum += src[i+1];
		; dst[42] = sum;
		; }
		; CHECK-LABEL: @reduc_store_middle_store_predicated
		; CHECK: vector.body:
		; CHECK-NOT: store i32 %{{[0-9]+}}, i32* %gep.dst
		; CHECK: middle.block:
		; CHECK: store i32 %{{[0-9]+}}, i32* %gep.dst
		; CHECK: ret void
		define void @reduc_store_middle_store_predicated(i32* %dst, i32* readonly %src) {
		entry:
		%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42
		br label %for.body

		for.cond.cleanup: ; preds = %latch
		fhahn [Done]: move exit block to end of function?
		ret void

		for.body: ; preds = %latch, %entry
		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %latch ]
		%sum = phi i32 [ 0, %entry ], [ %sum.2, %latch ]
		%gep.src = getelementptr inbounds i32, i32* %src, i64 %indvars.iv
		%0 = load i32, i32* %gep.src, align 4
		%sum.1 = add nsw i32 %0, %sum
		%cmp = icmp sgt i32 %0, 0
		br i1 %cmp, label %predicated, label %latch

		predicated: ; preds = %for.body
		store i32 %sum.1, i32* %gep.dst, align 4
		br label %latch

		latch: ; preds = %predicated, %for.body
		%1 = or i64 %indvars.iv, 1
		%gep.src.1 = getelementptr inbounds i32, i32* %src, i64 %1
		%2 = load i32, i32* %gep.src.1, align 4
		%sum.2 = add nsw i32 %2, %sum.1
		store i32 %sum.2, i32* %gep.dst, align 4
		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
		%cmp.1 = icmp slt i64 %indvars.iv.next, 1000
		br i1 %cmp.1, label %for.body, label %for.cond.cleanup
		}

		; int sum = 0;
		; for(int i=0; i < 1000; i+=2) {
		; sum += src[i];
		; dst[42] = sum;
		; sum += src[i+1];
		; if (src[i+1] > 0)
		; dst[42] = sum;
		; }
		; CHECK-LABEL: @reduc_store_final_store_predicated
		; CHECK-NOT: vector.body:
		define void @reduc_store_final_store_predicated(i32* %dst, i32* readonly %src) {
		entry:
		%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42
		br label %for.body

		for.cond.cleanup: ; preds = %latch
		fhahn [Done]: move exit block to end of function?
		ret void

		for.body: ; preds = %latch, %entry
		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %latch ]
		%sum = phi i32 [ 0, %entry ], [ %sum.1, %latch ]
		%arrayidx = getelementptr inbounds i32, i32* %src, i64 %indvars.iv
		%0 = load i32, i32* %arrayidx, align 4
		%sum.1 = add nsw i32 %0, %sum
		store i32 %sum.1, i32* %gep.dst, align 4
		%1 = or i64 %indvars.iv, 1
		%gep.src.1 = getelementptr inbounds i32, i32* %src, i64 %1
		%2 = load i32, i32* %gep.src.1, align 4
		%sum.2 = add nsw i32 %2, %sum.1
		%cmp1 = icmp sgt i32 %2, 0
		br i1 %cmp1, label %predicated, label %latch

		predicated: ; preds = %for.body
		store i32 %sum.2, i32* %gep.dst, align 4
		br label %latch

		latch: ; preds = %predicated, %for.body
		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
		%cmp = icmp slt i64 %indvars.iv.next, 1000
		br i1 %cmp, label %for.body, label %for.cond.cleanup
		}
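The two predicated tests above differ only in which store is conditional. A stand-alone C++ sketch of their pseudo-code shows why that matters: when the last store in the iteration is unconditional, the memory cell always ends up equal to the final sum, and when the last store is predicated it may not. The struct and function names are hypothetical, and `InitialMem` stands in for whatever the destination held before the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct LoopResult {
  int Mem; // value left at the invariant address (dst[42] in the tests)
  int Sum; // final value of the reduction
};

// Middle store predicated, last store unconditional: every iteration ends by
// writing the current sum, so Mem equals Sum once at least one iteration ran.
LoopResult middleStorePredicated(const std::vector<int> &Src, int InitialMem) {
  LoopResult R{InitialMem, 0};
  for (std::size_t I = 0; I + 1 < Src.size(); I += 2) {
    R.Sum += Src[I];
    if (Src[I + 1] > 0)
      R.Mem = R.Sum; // overwritten below in the same iteration
    R.Sum += Src[I + 1];
    R.Mem = R.Sum;
  }
  return R;
}

// Last store predicated: the cell can keep an older intermediate value, so a
// single store of the final sum after the loop would not be equivalent.
LoopResult finalStorePredicated(const std::vector<int> &Src, int InitialMem) {
  LoopResult R{InitialMem, 0};
  for (std::size_t I = 0; I + 1 < Src.size(); I += 2) {
    R.Sum += Src[I];
    R.Mem = R.Sum;
    R.Sum += Src[I + 1];
    if (Src[I + 1] > 0)
      R.Mem = R.Sum;
  }
  return R;
}
```

This matches the CHECK lines: the first test expects a `vector.body` with the store in `middle.block`, and the second expects no `vector.body` at all.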

		; Final value used outside of loop does not prevent vectorization
		;
		; int sum = 0;
		; for(int i=0; i < 1000; i++) {
		; sum += src[i];
		; dst[42] = sum;
		; }
		; dst[43] = sum;
		; CHECK-LABEL: @reduc_store_inoutside
		; CHECK: vector.body:
		; CHECK-NOT: store i32 %{{[0-9]+}}, i32* %gep.src
		; CHECK: middle.block:
		; CHECK: store i32 %[[RDX:[0-9]+]], i32* %gep.src
		; CHECK: for.cond.cleanup:
		; CHECK: %[[PHI:[a-zA-Z.0-9]+]] = phi i32 {{.*}} %[[RDX]]
		; CHECK: %[[ADDR:[a-zA-Z.0-9]+]] = getelementptr inbounds i32, i32* %dst, i64 43
		; CHECK: store i32 %[[PHI]], i32* %[[ADDR]]
		; CHECK: ret void
		define void @reduc_store_inoutside(i32* %dst, i32* readonly %src) {
		entry:
		%gep.src = getelementptr inbounds i32, i32* %dst, i64 42
		br label %for.body

		for.cond.cleanup:
		fhahn: move exit block to end of function?
		%sum.lcssa = phi i32 [ %sum.1, %for.body ]
		%gep.src.1 = getelementptr inbounds i32, i32* %dst, i64 43
		store i32 %sum.lcssa, i32* %gep.src.1, align 4
		ret void

		for.body:
		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
		%sum = phi i32 [ 0, %entry ], [ %sum.1, %for.body ]
		%arrayidx = getelementptr inbounds i32, i32* %src, i64 %indvars.iv
		%0 = load i32, i32* %arrayidx, align 4
		%sum.1 = add nsw i32 %0, %sum
		store i32 %sum.1, i32* %gep.src, align 4
		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
		%exitcond = icmp eq i64 %indvars.iv.next, 1000
		br i1 %exitcond, label %for.cond.cleanup, label %for.body
		}

		;CHECK-LABEL: @reduction_sum_multiuse(
		;CHECK: phi <4 x i32>
		;CHECK: load <4 x i32>
		;CHECK: add <4 x i32>
		;CHECK: call i32 @llvm.vector.reduce.add.v4i32(<4 x i32>
		;CHECK: %sum.copy = phi i32 [ %[[SCALAR:.*]], %.lr.ph ], [ %[[VECTOR:.*]], %middle.block ]
		;CHECK: ret i32
		define i32 @reduction_sum_multiuse(i32 %n, i32* noalias nocapture %A, i32* noalias nocapture %B) {

Revision Contents

Diff 390414

llvm/include/llvm/Analysis/IVDescriptors.h
llvm/include/llvm/Analysis/LoopAccessAnalysis.h
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
llvm/lib/Analysis/IVDescriptors.cpp
llvm/lib/Analysis/LoopAccessAnalysis.cpp
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
llvm/test/Transforms/LoopInterchange/reductions-across-inner-and-outer-loop.ll
llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
llvm/test/Transforms/LoopVectorize/reduction.ll