This is an archive of the discontinued LLVM Phabricator instance.

Differential D110235

[LoopVectorize] Support reductions that store intermediary result
ClosedPublic

Authored by igor.kirillov on Sep 22 2021, 5:51 AM.

Details

Reviewers

fhahn
Ayal
david-arm
kmclaughlin
peterwaller-arm
reames

Commits

rG4e5e042d9a4a: [LoopVectorize] Support reductions that store intermediary result

Summary

Adds ability to vectorize loops containing a store to a loop-invariant address as part of a reduction that isn't converted to SSA form due to lack of aliasing info. Runtime checks are generated to ensure the store does not alias any other accesses in the loop.
Ordered fadd reductions are not yet supported.
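
As a rough source-level illustration of the pattern the summary describes, the sketch below contrasts a loop that stores its running sum to `dst[42]` on every iteration with a form that stores once after the loop. This is not code from the patch: the names `reduceScalar`, `storeDoesNotOverlap` and `reduceSunk` are hypothetical, and the overlap test is only a simplified stand-in for the runtime checks the vectorizer generates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Original form: the running sum is written to a loop-invariant address on
// every iteration.
void reduceScalar(int *dst, const int *src, std::size_t n) {
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += src[i];
    dst[42] = sum;
  }
}

// Hypothetical stand-in for the runtime check: true when dst[42] lies outside
// the n elements read through src (addresses compared as integers, both
// pointers assumed to be int-aligned).
bool storeDoesNotOverlap(const int *dst, const int *src, std::size_t n) {
  const auto p = reinterpret_cast<std::uintptr_t>(dst + 42);
  const auto lo = reinterpret_cast<std::uintptr_t>(src);
  const auto hi = reinterpret_cast<std::uintptr_t>(src + n);
  return p < lo || p >= hi;
}

// Transformed form: when the check passes, the loop body is a plain reduction
// and the store happens once, after the loop, with the final value.
void reduceSunk(int *dst, const int *src, std::size_t n) {
  assert(dst && src);
  if (!storeDoesNotOverlap(dst, src, n)) {
    reduceScalar(dst, src, n); // check failed: keep the original loop
    return;
  }
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += src[i];
  if (n > 0)
    dst[42] = sum;
}
```

Once the store is out of the loop body, the remaining loop is an ordinary integer reduction, which is the shape the vectorizer already handles.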

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn added inline comments.Oct 11 2021, 2:37 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
939	What about loads to the same address in the loop? At the moment, `LAA` cannot analyze dependences with invariant addresses. But if this limitation gets removed the code here may become incorrect, because it relies on this limitation IIUC?

david-arm added inline comments.Oct 11 2021, 3:51 AM

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
643	I assume here that InvariantStores refers to stores to an invariant (i.e. uniform?) address, rather than storing an invariant value to variant address. If so, perhaps it could be named StoresToInvariantAddress or something like that?

david-arm added inline comments.Oct 11 2021, 6:53 AM

llvm/test/Transforms/LoopVectorize/reduction.ll
471 ↗	(On Diff #378503)	Is it possible to add a simple floating point test with "fadd fast"?

It seems like the main problem is that we potentially bail out too early at the moment when checking for reductions due to the store, but once we generate runtime checks, sinking the store may become legal (see inline comment about loads to the same address)? If that's the case, ideally we'd just sink any such loads/stores before detecting reductions once we know they can be sunk due to runtime checks, but unfortunately I do not think that's possible with the current structure/ordering.

Is it worth my exploring whether it is possible to do it the other way around, or should we continue with the solution in this patch?

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
643	@fhah Just to make sure we are on the same page: for me, uniform and invariant are just synonyms, aren't they?
llvm/include/llvm/Transforms/Utils/LoopUtils.h
402 ↗	(On Diff #378503)	What do you think if I make this function a public member of `class ScalarEvolution`? Or is there a better place for it?
llvm/lib/Analysis/IVDescriptors.cpp
332	I checked and only `LoopInterchangeLegality` is using this function and it is not affected by stores. Anyway, I can add a parameter to `RecurrenceDescriptor::isReductionPHI` or a member to `RecurrenceDescriptor` allowing or not to handle stores. What do you think about it? The comment is to be updated, yes.
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
939	Loads from or stores to the same address in the loop? I'm sorry could you clarify what the problem is. As it is I don't understand the message.

Hi @igor.kirillov, is it also possible to get this working for ordered reductions, i.e.

float sum = 0;
for(i=0..N) {
  sum += src[i];
  dst[42] = sum;
}

when building with -O3? I think it might mean updating checkOrderedReductions to look through the store. If it looks too difficult to do as part of this patch we can always follow-up with a patch later.

Update commit message and comments
Move storeToSameAddress to ScalarEvolution
Add fadd fast test
Do not apply patch for ordered fadd reductions

In D110235#3072725, @david-arm wrote:
Hi @igor.kirillov, is it also possible to get this working for ordered reductions, i.e.
float sum = 0;
for(i=0..N) {
  sum += src[i];
  dst[42] = sum;
}
when building with -O3? I think it might mean updating checkOrderedReductions to look through the store. If it looks too difficult to do as part of this patch we can always follow-up with a patch later.

Yes, I added a check to LoopVectorizationLegality::canVectorizeFPMath so as not to allow this reduction when math is strict. Enabling it requires some work and it is better to do it separately.
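
Why strict (ordered) `fadd` reductions need separate treatment can be seen in a small sketch: floating-point addition is not associative, so accumulating in separate lanes, as an unordered vectorized reduction would, can change the result. The functions `sumInOrder` and `sumTwoLanes` are hypothetical and only model the two evaluation orders; they are not taken from the patch, and the sketch assumes IEEE-754 single-precision arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Strict, in-order reduction: matches the scalar loop's semantics.
float sumInOrder(const float *src, std::size_t n) {
  assert(src || n == 0);
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i)
    sum += src[i];
  return sum;
}

// Models an unordered reduction with a vector width of 2: even and odd
// elements are accumulated in separate lanes and combined after the loop.
float sumTwoLanes(const float *src, std::size_t n) {
  assert(src || n == 0);
  float lane0 = 0.0f, lane1 = 0.0f;
  std::size_t i = 0;
  for (; i + 1 < n; i += 2) {
    lane0 += src[i];
    lane1 += src[i + 1];
  }
  float sum = lane0 + lane1;
  for (; i < n; ++i) // scalar remainder for odd n
    sum += src[i];
  return sum;
}
```

For the input `{1e8f, 1.0f, -1e8f, 1.0f}` the in-order sum is 1, because `1e8f + 1.0f` rounds back to `1e8f`, while the two-lane sum is 2. A `fast` reduction permits this kind of reassociation; a strict one does not.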

llvm/test/Transforms/LoopVectorize/reduction.ll
471 ↗	(On Diff #378503)	Added! see reduc_store_fadd_fast function

igor.kirillov edited the summary of this revision.Nov 7 2021, 5:02 AM

igor.kirillov marked an inline comment as done.

Harbormaster completed remote builds in B132898: Diff 385336.Nov 7 2021, 5:43 AM

david-arm added inline comments.Nov 8 2021, 6:05 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1008	Maybe this can be folded into the `all_of` case below, i.e.

  return (all_of(getReductionVars(), [&](auto &Reduction) -> bool {
    const RecurrenceDescriptor &RdxDesc = Reduction.second;
    return !RdxDesc.hasExactFPMath() ||
           (RdxDesc.isOrdered() && !RdxDesc.IntermediateStore);
  }));

Also, the problem with this code at the moment is that you could have a mixture of fast and ordered reductions in the same loop. There could be an intermediate store for one of the fast reductions, but not for the ordered ones. At the moment with your code we will just bail out in this case.
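
The effect of the suggested predicate on a mixed loop can be checked with a small stand-alone model. `RdxModel` and `canVectorizeFPMathModel` are hypothetical names; the struct mirrors only the three properties mentioned in the comment above and is not LLVM's `RecurrenceDescriptor`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of the three reduction properties used by the predicate.
struct RdxModel {
  bool ExactFPMath;          // reassociation is not allowed
  bool Ordered;              // can be vectorized as an in-order reduction
  bool HasIntermediateStore; // stores its running value in the loop
};

// Every reduction must either allow reassociation, or be an ordered
// reduction without an intermediate store.
bool canVectorizeFPMathModel(const std::vector<RdxModel> &Reductions) {
  return std::all_of(Reductions.begin(), Reductions.end(),
                     [](const RdxModel &R) {
                       return !R.ExactFPMath ||
                              (R.Ordered && !R.HasIntermediateStore);
                     });
}
```

Under this model a loop with one fast reduction that has an intermediate store and one ordered reduction without a store is accepted, while an ordered reduction with an intermediate store is still rejected.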

Update intermediate store check for ordered fadd vectorization

igor.kirillov added inline comments.Nov 9 2021, 7:24 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1008	You are right! It can be done much simpler and the check is needed only when ordered reduction is present.

Harbormaster completed remote builds in B133249: Diff 385810.Nov 9 2021, 7:46 AM

david-arm added inline comments.Nov 10 2021, 5:40 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
941	Hi @igor.kirillov, I think you're missing a test case here. Suppose we have two stores in the loop to the same invariant address. If the first store is predicated, but the second isn't we should still vectorise because the second store wins and the first one can be removed. At least the code here suggests that. I couldn't find any test that exercised this code path.
949	I think you might be missing a test case for this. Can you make sure this code path is exercised by your existing tests, please?
llvm/test/Transforms/LoopVectorize/reduction.ll
477 ↗	(On Diff #385810)	Can you put all the CHECK lines in the same place near the top of the function please to be consistent with the existing tests?
517 ↗	(On Diff #385810)	Hi @igor.kirillov, this doesn't look right. The test is called `@reduc_store_fadd_fast`, but there is no `fast` keyword on the `fadd` instruction. I don't think we should even be vectorising this because it requires ordered reductions, which you haven't enabled for this test. Can you change the name of this to `@reduc_store_fadd_ordered` and investigate why we are vectorising this? Can you also add a separate test called `@reduc_store_fadd_fast` that actually has the `fast` keyword too?

Update tests

Harbormaster completed remote builds in B134854: Diff 388126.Nov 18 2021, 2:03 AM

igor.kirillov marked 2 inline comments as done and an inline comment as not done.Nov 18 2021, 3:06 AM

igor.kirillov added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
941	Added `reduc_store_final_store_predicated` test that addresses both requests
949	It is now covered by new `reduc_store_final_store_predicated` test and an old test `reduc_store_inside_unrolled` also executes this path
llvm/test/Transforms/LoopVectorize/reduction.ll
517 ↗	(On Diff #385810)	I missed the `fast` keyword there. As for why this code gets vectorized: it happens because `Hints->allowReordering()` returns true in `LoopVectorizationLegality::canVectorizeFPMath`. As you can see, the test specifies the vector width (-force-vector-width=4), and in that case LLVM allows ordered instructions to be processed in an unordered manner (see also the `LoopVectorizeHints::allowReordering` function). I added a new test with `fast` keyword to `AArch64/strict-fadd.ll` and the loop is not vectorized there.

david-arm added inline comments.Nov 18 2021, 3:10 AM

llvm/test/Transforms/LoopVectorize/reduction.ll
656 ↗	(On Diff #388126)	Can you also add a test where the first store is predicated and the second one isn't? According to the code changes in this patch we should vectorise this case because the second one overrides the first.
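
The reasoning behind this request can be sketched at the source level: when a predicated store to the invariant address is followed in the same iteration by an unconditional store to that address, only the unconditional store determines what the address holds after the loop. The function names below are hypothetical, and the sketch assumes `dst[42]` does not alias `src`.

```cpp
#include <cassert>
#include <cstddef>

// Scalar semantics: the first store is predicated, the second always runs
// and overwrites whatever the first one wrote.
void predicatedThenUnconditional(int *dst, const int *src, std::size_t n) {
  assert(dst && src);
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += src[i];
    if (src[i] > 0)
      dst[42] = sum; // predicated store
    dst[42] = sum;   // unconditional store to the same address
  }
}

// Equivalent form when nothing else observes dst[42] during the loop:
// drop the predicated store and write the final value once.
void storeAfterLoop(int *dst, const int *src, std::size_t n) {
  assert(dst && src);
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += src[i];
  if (n > 0)
    dst[42] = sum;
}
```

If the order were reversed, with the predicated store last, the final contents would depend on the condition in the last iteration and this simplification would no longer hold.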

Add test

llvm/test/Transforms/LoopVectorize/reduction.ll
656 ↗	(On Diff #388126)	Added! See `reduc_store_middle_store_predicated`

fhahn added inline comments.Nov 18 2021, 3:44 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
939	The case I was thinking about was something like the snippet below, where we have a load of the invariant address in the loop (`%lv = load...` in the example below).

  define void @reduc_store(i32* %dst, i32* readonly %src, i32* noalias %dst.2) {
  entry:
    %arrayidx = getelementptr inbounds i32, i32* %dst, i64 42
    store i32 0, i32* %arrayidx, align 4
    br label %for.body

  for.body:
    %0 = phi i32 [ 0, %entry ], [ %add, %for.body ]
    %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
    %arrayidx1 = getelementptr inbounds i32, i32* %src, i64 %indvars.iv
    %1 = load i32, i32* %arrayidx1, align 4
    %add = add nsw i32 %0, %1
    %lv = load i32, i32* %arrayidx
    store i32 %add, i32* %arrayidx, align 4
    %gep.dst.2 = getelementptr inbounds i32, i32* %dst.2, i64 %indvars.iv
    store i32 %lv, i32* %gep.dst.2,
    %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
    %exitcond = icmp eq i64 %indvars.iv.next, 1000
    br i1 %exitcond, label %for.cond.cleanup, label %for.body

  for.cond.cleanup:
    ret void
  }
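
A source-level model of this case shows why a load of the invariant address blocks the transformation. Both functions are hypothetical: the first follows the scalar semantics of a loop that reads `dst[42]` before overwriting it, the second shows what naively sinking the store would compute. The sketch assumes `dst`, `src` and `dst2` do not overlap.

```cpp
#include <cassert>
#include <cstddef>

// Scalar semantics: each iteration reads back the value stored by the
// previous iteration before overwriting it.
void loadAndStoreScalar(int *dst, const int *src, int *dst2, std::size_t n) {
  assert(dst && src && dst2);
  dst[42] = 0;
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += src[i];
    int lv = dst[42]; // observes the previous running sum
    dst[42] = sum;
    dst2[i] = lv;
  }
}

// Naive sinking of the store: the load now always sees the initial 0,
// so the values written to dst2 change.
void loadAndStoreNaivelySunk(int *dst, const int *src, int *dst2,
                             std::size_t n) {
  assert(dst && src && dst2);
  dst[42] = 0;
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += src[i];
    dst2[i] = dst[42];
  }
  if (n > 0)
    dst[42] = sum;
}
```

Both versions leave the same final value in `dst[42]`, but the contents of `dst2` differ, so the store cannot be moved while such a load remains in the loop.
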
llvm/test/Transforms/LoopVectorize/reduction.ll
468 ↗	(On Diff #388148)	Could you add a brief textual explanation of what the test covers?
482 ↗	(On Diff #388148)	perhaps rename to `%sum` or something like that, to make it a bit easier to read the test?
487 ↗	(On Diff #388148)	The names for the 2 different GEPs in the tests are very similar. Could you rename them to make it easier to distinguish them (e.g. something like `%gep.src`/`%gep.dst`)..

Hi @igor.kirillov, thanks for all the changes and the patch looks good now! I just had a couple of minor comments.

llvm/lib/Analysis/IVDescriptors.cpp
511	nit: Could you remove an extra level of indentation here before merging the patch? i.e. this can be written like this:

  if (isa<PHINode>(UI))
    PHIs.push_back(UI);
  else if (auto *SI = dyn_cast<StoreInst>(UI)) {
    ...
  } else
    NonPHIs.push_back(UI);
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1018	nit: I think you might be able to simplify the code here by removing `FoundMatchingRecurrence`. Then, lower down, instead of the `break` you can just do

  if (DSI && (DSI == SI)) {
    *IsPredicated = blockNeedsPredication(DSI->getParent());
    return true;
  }

and at the bottom of the function just do:

  return false;

Harbormaster completed remote builds in B134865: Diff 388148.Nov 18 2021, 4:18 AM

Refactoring a bit

igor.kirillov marked 8 inline comments as done.Nov 18 2021, 7:22 AM

igor.kirillov added inline comments.

llvm/lib/Analysis/IVDescriptors.cpp
511	I went a step further and simplified it even more
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
939	In that case we do not vectorize; the rejection happens in LoopAccessInfo::analyzeLoop when loads are processed:

  for (LoadInst *LD : Loads) {
    Value *Ptr = LD->getPointerOperand();
...
    // See if there is an unsafe dependency between a load to a uniform address and
    // store to the same uniform address.
    if (UniformStores.count(Ptr)) {
      LLVM_DEBUG(dbgs() << "LAA: Found an unsafe dependency between a uniform "
                           "load and uniform store to the same address!\n");
      HasDependenceInvolvingLoopInvariantAddress = true;
    }
...

I added reduc_store_load test anyway

Harbormaster completed remote builds in B134896: Diff 388188.Nov 18 2021, 7:47 AM

Add scalable reduction test

Little extension in a new test

Harbormaster completed remote builds in B135418: Diff 388906.Nov 22 2021, 12:39 PM

LGTM! Thanks for making all the changes @igor.kirillov.

llvm/test/Transforms/LoopVectorize/reduction.ll
466 ↗	(On Diff #388188)	nit: I think this should be `invariant`

This revision is now accepted and ready to land.Nov 23 2021, 2:27 AM

Fix typo

igor.kirillov marked an inline comment as done.Nov 23 2021, 2:48 AM

fhahn added inline comments.Nov 23 2021, 2:48 AM

llvm/include/llvm/Analysis/IVDescriptors.h
273	nit: should be a doc-comment?
llvm/include/llvm/Analysis/LoopAccessAnalysis.h
579	nit: could return `ArrayRef`, not leaking any details about the underlying container.
643	nit: Is there a reason for choosing 5 for the size? if not, nowadays SmallVector can pick a good size automatically.
llvm/include/llvm/Analysis/ScalarEvolution.h
1120 ↗	(On Diff #388906)	Do you anticipate this to be used outside `Transforms/Vectorize`? If not I'm not sure if it makes sense to live here, and extending the API interface of `ScalarEvolution`. Can this instead be defined somewhere in `Transforms/Vectorize`?
llvm/lib/Analysis/IVDescriptors.cpp
260	needs documentation?
332	I guess it would be good to have at least a test case for loop-interchange to make sure it can deal with the change properly?
540	nit: for consistency with other code avoid using `!= nullptr`. This is more in line with the style used in LLVM in general (and you use `==/!= nullptr` in the adjacent code here).
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
939	Yeah but this is only due to some limitations that currently exist in LAA, right? I think we should at least make this clear somewhere, e.g. in a comment.
945	nit: no need to use `llvm::`
1021	nit: redundant `()`.
1022	could we instead just do the check at the call site or is there a benefit of doing it here?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2919	We effectively sink the store outside the loop. In that case, I don't think we should create a recipe and also we should not consider its cost.

Harbormaster completed remote builds in B135583: Diff 389138.Nov 23 2021, 5:41 AM

mgabka added a subscriber: mgabka.Nov 29 2021, 1:56 AM

Lots of updates related to the recent comments

igor.kirillov marked 5 inline comments as done.Nov 29 2021, 11:09 AM

igor.kirillov added inline comments.

llvm/lib/Analysis/IVDescriptors.cpp
332	You are right. We actually should not do loop interchange if an invariant address is present. Otherwise it may introduce incorrect results.
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
939	Added comment

igor.kirillov marked 2 inline comments as done.Nov 29 2021, 11:09 AM

Harbormaster completed remote builds in B136491: Diff 390414.Nov 29 2021, 12:18 PM

Hi, @fhahn! Since there have been several quite serious changes in the patch after your last review, I would be happy to receive your approval before merge (even though status is accepted now).

In D110235#3174384, @igor.kirillov wrote:

Hi, @fhahn! Since there have been several quite serious changes in the patch after your last review, I would be happy to receive your approval before merge (even though status is accepted now).

sure, I'll try to take another look by end-of-day Monday

@fhahn ping

Thanks for the latest update! I left some more additional comments. It might be good to pre-commit the tests and only include the diff in the patch. Another thing to consider is moving the reduction-store tests to a separate file, as reduction.ll is already quite large.

One thing to note on the overall approach is that it's a bit of a workaround for some current limitations in our modeling, but I don't think there's an alternative in the short term and this is clearly a desirable case to support, so that seems fine to me in general. There's work in progress to model the pre-header and exit block in VPlan, which allows us to do the sinking of the store more easily, so we can improve things further then :)

llvm/include/llvm/Analysis/IVDescriptors.h
180–181	From an interface perspective, I think it would be better to make `SE` an optional argument and only perform the store analysis when it is passed. This would also require users to explicitly opt in, which would at least partially guard against people ignoring `IntermediateStore`.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9173	`either deleted or go outside the loop` sounds a bit unclear. Aren't they moved to the vector exit block and store the final reduction value?
9175	I think it would also be good to include at least some information in the VPlan that the store is handled as part of the reduction. Perhaps `VPReductionRecipe` should print if the result is stored after the loop? Please add a test case to `vplan-printing.ll`
llvm/test/Transforms/LoopVectorize/reduction.ll
469 ↗	(On Diff #390414)	FWIW for such compact IR test cases, the pseudo code doesn't add much value in my personal opinion. Better to strive to make the tests as readable/compact as possible and have a comment explaining what it tests when needed.
546 ↗	(On Diff #390414)	nit: newline.
628 ↗	(On Diff #390414)	Here (and for the other tests) it would be good to check at least the vector reduction sequence and that the correct value is stored.

fhahn added inline comments.Jan 5 2022, 1:53 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
934	It would probably be good to make clear what is handled here exactly and why we can handle those stores. IIUC this applies only to invariant stores that store reduction results and is safe because runtime checks guarantee that it won't alias with other objects. The store won't get vectorized, but sunk to the exit block during codegen.
llvm/test/Transforms/LoopVectorize/reduction.ll
582 ↗	(On Diff #390414)	not needed?
585 ↗	(On Diff #390414)	nit: is this needed? Can just pass `%n` as `i64`
672 ↗	(On Diff #390414)	move exit block to end of function?
711 ↗	(On Diff #390414)	move exit block to end of function?
753 ↗	(On Diff #390414)	move exit block to end of function?
803 ↗	(On Diff #390414)	move exit block to end of function?

Not sure if you saw, there's a somewhat related bugreport: https://github.com/llvm/llvm-project/issues/50286
Not sure if this already supports that pattern though.

Update tests, move invariant store tests into separate file, add more comments

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9175	I'm not sure how to display this information properly. This is what the output of `VPReductionRecipe::print` looks like now:

  REDUCE ir<%red.next> = ir<%red> + fast reduce.fadd (ir<%lv>)

I could add something to the end but it doesn't seem to fit there. Also I have not found any proper `V.Recipe` where I could place something like `DELETE store .`.
llvm/test/Transforms/LoopVectorize/reduction.ll
469 ↗	(On Diff #390414)	When I look at pseudo-code I immediately understand what it is about, whereas looking at IR takes at least 30 seconds of cognitive exertion. And it is also easier to see the difference between and purpose of all those quite similar tests. But I'll delete it, of course, if you insist :)

Harbormaster completed remote builds in B143094: Diff 399570.Jan 13 2022, 2:04 AM

igor.kirillov mentioned this in D117213: [LoopVectorize] Add tests with reductions that are stored in invariant address.Jan 13 2022, 5:05 AM

@fhahn I created the review with tests only here - https://reviews.llvm.org/D117213. Once it is merged I'll update this one.

@lebedev.ri This patch addresses a different problem. Here we handle the case where a reduction value is stored to some address; if that address is loop-invariant and the other required conditions are satisfied, we manage to vectorize. In your example the address where the value is stored is not invariant at all. Nevertheless, the case is interesting.

@fhahn ping

igor.kirillov mentioned this in rGd3932c690d97: [LoopVectorize] Add tests with reductions that are stored in invariant address.Jan 24 2022, 1:29 PM

Update tests

Harbormaster completed remote builds in B145428: Diff 402801.Jan 26 2022, 2:03 PM

Add invariant store information to VPReductionRecipe::print output

Herald added subscribers: vkmr, rogfer01.Jan 27 2022, 1:24 PM

reames resigned from this revision.Jan 27 2022, 1:26 PM

igor.kirillov added inline comments.Jan 27 2022, 1:29 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9175	@fhahn I added some info to VPReductionRecipe::print and a relevant test. What do you think about it?

Harbormaster completed remote builds in B146121: Diff 403776.Jan 27 2022, 2:03 PM

Hi @igor.kirillov, the patch looks good now! I just had one question about an odd-looking CHECK line in one of the test files.

llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
374	nit: I think it's fine (and cleaner) to just do %[[PHI:%.]] = %[[ADDR:%.]] = here. Also, can you add the incoming value to the phi just to make sure it's the correct value? I think it should just be phi i32 {{ [[TMP]], %middle.block }}
402	This looks a bit odd. I'm not sure why it's needed? If there is a CHECK-LABEL for every function I don't think this should happen.

Thanks for the update!

llvm/lib/Analysis/IVDescriptors.cpp
375	Do we have to make sure that the stored value is feeding the phi again and not an earlier value in the cycle?
llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
170	I think it would be good to check the full reduction cycle & store here. I don't think this test case is handled correctly at the moment, as the IR seems out of sync with the pseudo code (which is why personally I think the C pseudo code is a bit distracting). Note that in the IR the reduction cycle is `%sum -> %sum.1 -> %sum` and not `%sum -> %sum.1 -> %sum.2 -> %sum`. When this gets vectorized, the final value written to dst[42] is the final value of %sum.1, not %sum.2 as it should be I think.

Add a fix that prevents vectorization when the value stored to a loop-invariant address is not the final reduction value.
Add test for this case.
Remove some unused code and rename a couple of IR variables in the reduction-with-invariant-store.ll test.
Fix incorrect phi node in reduc_store_inside_unrolled and reduc_double_invariant_store tests.

igor.kirillov marked 3 inline comments as done.Feb 4 2022, 6:47 AM

igor.kirillov added inline comments.

llvm/lib/Analysis/IVDescriptors.cpp
375	Yes! I added this check and also a test case (reduc_store_not_final_value)
llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
170	The IR was incorrect actually because of the error I made during renaming and the pseudo-code was showing my original plot for the test-case. Luckily it helped to expose a bug, so I added an extra check to `IVDescriptors.cpp` and now `reduc_store_not_final_value` function is testing this case (when we have IntermediateStore but no ExitInstruction and value stored is not actually a final value).
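
The situation that `reduc_store_not_final_value` guards against can be modelled at the source level as follows. This is a sketch with hypothetical function names, not the test itself: the loop consumes two elements per iteration, stores only the first partial sum, and carries the second partial sum to the next iteration.

```cpp
#include <cassert>
#include <cstddef>

// The stored value (sum1) is not the value that feeds the next iteration.
void storeNotFinalScalar(int *dst, const int *src, std::size_t n) {
  assert(dst && src);
  int sum = 0;
  for (std::size_t i = 0; i + 1 < n; i += 2) {
    int sum1 = sum + src[i];
    dst[42] = sum1;          // intermediate value only
    sum = sum1 + src[i + 1]; // value carried by the reduction
  }
}

// The value the reduction itself produces after the loop. Storing this to
// dst[42] instead would change the program's result.
int finalReductionValue(const int *src, std::size_t n) {
  assert(src || n == 0);
  int sum = 0;
  for (std::size_t i = 0; i + 1 < n; i += 2)
    sum = sum + src[i] + src[i + 1];
  return sum;
}
```

For `src = {1, 2, 3, 4}` the scalar loop leaves 6 in `dst[42]` while the reduction's final value is 10, so replacing the in-loop store with a store of the final reduction value would be incorrect here.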

igor.kirillov marked 2 inline comments as done.Feb 4 2022, 6:47 AM

Harbormaster completed remote builds in B147618: Diff 405949.Feb 4 2022, 7:19 AM

fhahn mentioned this in D119078: [LAA,LV] Add initial support for pointer-diff memory checks..Feb 8 2022, 12:43 PM

fhahn added inline comments.Feb 14 2022, 2:16 AM

llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
170	I think it would still be good to test that the full reduction cycle is generated correctly (probably worth checking the full vector body + middle.block), at least for some of the tests in the file, to make sure the full sequence is generated correctly.

fhahn added inline comments.Feb 14 2022, 2:18 AM

llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
170	"The IR was incorrect actually because of the error I made during renaming and the pseudo-code was showing my original plot for the test-case." This seems to indicate that the pseudo code may be distracting (:

Add full vector.body and middle.block checks for reduc_store and reduc_store_inside_unrolled tests

igor.kirillov added inline comments.Feb 17 2022, 8:46 AM

llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
170	@fhahn Added full checks for `reduc_store` and this function. The code generated is overscalarized for `reduc_store_inside_unrolled` but I checked and without invariant stores it is the same.

Harbormaster completed remote builds in B150258: Diff 409674.Feb 17 2022, 9:16 AM

Thanks for the update with the tests! I think there might still be a correctness issue. More details inline.

llvm/include/llvm/Analysis/IVDescriptors.h
182	nit: addresses ?
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
963	I'm not sure if the comment matches the code. There's no guarantee that all stores to the same address come before the store of the reduction AFAICT. E.g. the test below has a store to the same address after the store of the reduction. I think this may get mis-compiled, as the store of 0 gets replaced with the store of the final value of the reduction.

  define void @reduc_store(i32* %dst, i32* readonly %src) {
  entry:
    %gep.dst = getelementptr inbounds i32, i32* %dst, i64 42
    store i32 1, i32* %gep.dst, align 4
    br label %for.body

  for.body:
    %sum = phi i32 [ 0, %entry ], [ %add, %for.body ]
    %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
    %gep.src = getelementptr inbounds i32, i32* %src, i64 %iv
    %0 = load i32, i32* %gep.src, align 4
    %add = add nsw i32 %sum, %0
    store i32 %add, i32* %gep.dst, align 4
    store i32 0, i32* %gep.dst, align 4
    %iv.next = add nuw nsw i64 %iv, 1
    %exitcond = icmp eq i64 %iv.next, 1000
    br i1 %exitcond, label %exit, label %for.body

  exit:
    ret void
  }
1020	nit: can drop `DSI`, as `SI` is guaranteed to be non-null?
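
The mis-compile described in the comment on line 963 can be modelled at the source level. Both functions are hypothetical: the first follows the scalar semantics of a loop in which a constant store comes after the reduction store, the second shows the result if only the reduction store were kept and sunk past the loop.

```cpp
#include <cassert>
#include <cstddef>

// Scalar semantics: the store of 0 is the last store in every iteration,
// so dst[42] is 0 after any loop that runs at least once.
void reductionThenZeroScalar(int *dst, const int *src, std::size_t n) {
  assert(dst && src);
  dst[42] = 1;
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += src[i];
    dst[42] = sum;
    dst[42] = 0;
  }
}

// Incorrect model: the later store is dropped and the final reduction value
// is written after the loop.
void reductionThenZeroMiscompiled(int *dst, const int *src, std::size_t n) {
  assert(dst && src);
  dst[42] = 1;
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += src[i];
  if (n > 0)
    dst[42] = sum;
}
```

The two functions disagree whenever the sum is nonzero, which is why the reduction store has to be the last store to that address in the loop for the sinking to be valid.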

Fix incorrectly processed case where the reduction value stored to an invariant address could be overwritten inside the loop

Herald added a project: Restricted Project.Mar 15 2022, 1:28 PM

igor.kirillov marked 4 inline comments as done.Mar 15 2022, 1:33 PM

igor.kirillov added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
963	Yeah, it was processed incorrectly. I added this test along with the fix.

Harbormaster completed remote builds in B154413: Diff 415557.Mar 15 2022, 2:26 PM

georges added a subscriber: georges.Mar 17 2022, 6:46 AM

@fhahn, gentle ping :)

Thanks for the update! I'll do a bit more testing and will take another look soon, but I *think* it should be good now.

@fhahn, @david-arm ping. Also, I ran the LNT test-suite and everything is fine; 17 more loops were vectorized according to -stats

LGTM! Thanks for addressing all the comments and fixing bugs, adding new tests, etc. I have a couple of nits you can address before merging. I think given that you've run the LLVM test suite without seeing failures and we know more loops are vectorising then that gives us a good level of confidence. :)

llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
204	nit: Should be i++, not i+=2.
252	nit: Looks like an unnecessary change from %sum.1 -> %sum.2.

LGTM, thanks! I added a few additional comments that can be addressed directly before committing.

FYI work has started to model the exit block in VPlan as well: D123457, D123537. fixReduction and the store sinking can and should now be migrated to be modeled explicitly in VPlan.

llvm/include/llvm/Analysis/IVDescriptors.h
278	nit: would be good to also document what this means, what the properties of such stores are.
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
297	nit: could be more specific, the increment of the reduction is used as stored operand, right? Also, they only handle reductions, right? Then using Reduction instead of Recurrence seems more descriptive, e.g. `isInvariantStoreOfReduction`? (same for `isRecurringInvariantAddress`)
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
451	With opaque pointers, this code may behave differently. It's possible to have

  store i32 0, ptr %x
  store i8 0, ptr %x

In that case the pointer operands will be the same, but the store widths will differ. I think we should also check that the types of the stored values match for now, as we use this to remove earlier stores. This should only be correct if the later store writes at least as many bits as the earlier stores.
1017	nit: could be any_of
1027	nit: could be any_of.
llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
361	Would it be possible to add runtime tests for the mis-compiles fixed to https://github.com/llvm/llvm-test-suite/tree/main/SingleSource/UnitTests/Vectorizer?
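
The store-width concern raised for line 451 can be made concrete with a byte-level sketch. The function names are hypothetical, and a nonzero 32-bit value is used (instead of the zero in the comment) so that the difference is visible in memory: removing an earlier, wider store because a later, narrower store hits the same address leaves the remaining bytes unwritten.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Both stores target the same address: a 32-bit store followed by an 8-bit
// store. The narrow store overwrites only the first byte.
void wideThenNarrow(unsigned char *mem) {
  assert(mem);
  const std::uint32_t wide = 0x11223344u;
  std::memcpy(mem, &wide, sizeof wide); // models the i32 store
  mem[0] = 0;                           // models the i8 store
}

// Result if the earlier store were removed because a later store to the
// same pointer exists: bytes 1..3 keep their old contents.
void narrowOnly(unsigned char *mem) {
  assert(mem);
  mem[0] = 0;
}
```

Only when the later store is at least as wide as the earlier one does it fully overwrite it, which matches the condition stated in the comment.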

Lots of updates related to the recent comments

igor.kirillov marked an inline comment as done.Apr 27 2022, 6:34 AM

igor.kirillov added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
451	I think if this check is added here, then the purpose of the function would be different (address is still the same even if values have different size). So, instead of that I added a check to a place where those pointers are really processed - see LoopVectorizationLegality.cpp:975. There we now make sure that values stored in an invariant address are of the same type.
llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
252	`sum.1` is `sum + src[i]` and `sum.2` is `sum + src[i] + src[i+1]`. Looks like everything is fine to me.

igor.kirillov marked an inline comment as done.Apr 27 2022, 6:34 AM

Harbormaster completed remote builds in B161596: Diff 425505.Apr 27 2022, 9:52 AM

igor.kirillov mentioned this in D124609: Add unit test with invariant store for vectorizer memory runtime checks.Apr 28 2022, 5:47 AM

igor.kirillov marked an inline comment as done.Apr 28 2022, 5:54 AM

igor.kirillov added inline comments.

llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
361	Added a simple test here - https://reviews.llvm.org/D124609 I don't feel like any other test adds more robustness, but if you have an idea what could be also added I'll gladly implement it!

igor.kirillov marked an inline comment as done.Apr 28 2022, 5:54 AM

This revision was landed with ongoing or failed builds.May 3 2022, 2:13 AM

Closed by commit rG4e5e042d9a4a: [LoopVectorize] Support reductions that store intermediary result (authored by igor.kirillov).

This revision was automatically updated to reflect the committed changes.

igor.kirillov added a commit: rG4e5e042d9a4a: [LoopVectorize] Support reductions that store intermediary result.

igor.kirillov mentioned this in rT2a41ecd23309: Add unit test with invariant store for vectorizer memory runtime checks.May 5 2022, 12:18 AM

Hello,

I wrote
https://github.com/llvm/llvm-project/issues/57572
about a verifier complaint/crash that started happening with this patch.

Herald added subscribers: pcwang-thead, shiva0217.Sep 5 2022, 11:25 PM

Revision Contents

Path	Size
llvm/include/llvm/Analysis/IVDescriptors.h	39 lines
llvm/include/llvm/Analysis/LoopAccessAnalysis.h	8 lines
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h	7 lines
llvm/lib/Analysis/IVDescriptors.cpp	150 lines
llvm/lib/Analysis/LoopAccessAnalysis.cpp	5 lines
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp	104 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	28 lines
llvm/lib/Transforms/Vectorize/VPlan.cpp	3 lines
llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll	14 lines
llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll	80 lines
llvm/test/Transforms/LoopVectorize/vplan-printing.ll	38 lines

Diff 405949

llvm/include/llvm/Analysis/IVDescriptors.h

Show All 12 Lines
#ifndef LLVM_ANALYSIS_IVDESCRIPTORS_H		#ifndef LLVM_ANALYSIS_IVDESCRIPTORS_H
#define LLVM_ANALYSIS_IVDESCRIPTORS_H		#define LLVM_ANALYSIS_IVDESCRIPTORS_H

#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/MapVector.h"		#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/IR/InstrTypes.h"		#include "llvm/IR/InstrTypes.h"
		nikic: Drive by note: This new include does not look necessary.
		igor.kirillov: Unfortunately, llvm doesn't compile without this include.
		peterwaller-arm: It looks like the needed include is #include "llvm/IR/Instructions.h" for StoreInst, or better forward declare `class StoreInst;` along with the others nearby. https://include-what-you-use.org https://github.com/include-what-you-use/include-what-you-use/blob/master/docs/WhyIWYU.md
		igor.kirillov: Fixed
#include "llvm/IR/Instruction.h"		#include "llvm/IR/Instruction.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Operator.h"		#include "llvm/IR/Operator.h"
#include "llvm/IR/ValueHandle.h"		#include "llvm/IR/ValueHandle.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"

namespace llvm {		namespace llvm {

class DemandedBits;		class DemandedBits;
class AssumptionCache;		class AssumptionCache;
class Loop;		class Loop;
class PredicatedScalarEvolution;		class PredicatedScalarEvolution;
class ScalarEvolution;		class ScalarEvolution;
class SCEV;		class SCEV;
		class StoreInst;
class DominatorTree;		class DominatorTree;

/// These are the kinds of recurrences that we support.		/// These are the kinds of recurrences that we support.
enum class RecurKind {		enum class RecurKind {
None, ///< Not a recurrence.		None, ///< Not a recurrence.
Add, ///< Sum of integers.		Add, ///< Sum of integers.
Mul, ///< Product of integers.		Mul, ///< Product of integers.
Or, ///< Bitwise or logical OR of integers.		Or, ///< Bitwise or logical OR of integers.
Show All 25 Lines
/// special case of chains of recurrences (CR). See ScalarEvolution for CR		/// special case of chains of recurrences (CR). See ScalarEvolution for CR
/// references.		/// references.

/// This struct holds information about recurrence variables.		/// This struct holds information about recurrence variables.
class RecurrenceDescriptor {		class RecurrenceDescriptor {
public:		public:
RecurrenceDescriptor() = default;		RecurrenceDescriptor() = default;

RecurrenceDescriptor(Value *Start, Instruction *Exit, RecurKind K,		RecurrenceDescriptor(Value *Start, Instruction *Exit, StoreInst *Store,
FastMathFlags FMF, Instruction *ExactFP, Type *RT,		RecurKind K, FastMathFlags FMF, Instruction *ExactFP,
bool Signed, bool Ordered,		Type *RT, bool Signed, bool Ordered,
SmallPtrSetImpl<Instruction *> &CI,		SmallPtrSetImpl<Instruction *> &CI,
unsigned MinWidthCastToRecurTy)		unsigned MinWidthCastToRecurTy)
: StartValue(Start), LoopExitInstr(Exit), Kind(K), FMF(FMF),		: IntermediateStore(Store), StartValue(Start), LoopExitInstr(Exit),
ExactFPMathInst(ExactFP), RecurrenceType(RT), IsSigned(Signed),		Kind(K), FMF(FMF), ExactFPMathInst(ExactFP), RecurrenceType(RT),
IsOrdered(Ordered),		IsSigned(Signed), IsOrdered(Ordered),
MinWidthCastToRecurrenceType(MinWidthCastToRecurTy) {		MinWidthCastToRecurrenceType(MinWidthCastToRecurTy) {
CastInsts.insert(CI.begin(), CI.end());		CastInsts.insert(CI.begin(), CI.end());
}		}

/// This POD struct holds information about a potential recurrence operation.		/// This POD struct holds information about a potential recurrence operation.
class InstDesc {		class InstDesc {
public:		public:
InstDesc(bool IsRecur, Instruction *I, Instruction *ExactFP = nullptr)		InstDesc(bool IsRecur, Instruction *I, Instruction *ExactFP = nullptr)
▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	public:

/// Returns the opcode corresponding to the RecurrenceKind.		/// Returns the opcode corresponding to the RecurrenceKind.
static unsigned getOpcode(RecurKind Kind);		static unsigned getOpcode(RecurKind Kind);

/// Returns true if Phi is a reduction of type Kind and adds it to the		/// Returns true if Phi is a reduction of type Kind and adds it to the
/// RecurrenceDescriptor. If either \p DB is non-null or \p AC and \p DT are		/// RecurrenceDescriptor. If either \p DB is non-null or \p AC and \p DT are
/// non-null, the minimal bit width needed to compute the reduction will be		/// non-null, the minimal bit width needed to compute the reduction will be
/// computed.		/// computed.
static bool AddReductionVar(PHINode *Phi, RecurKind Kind, Loop *TheLoop,		static bool
FastMathFlags FuncFMF,		AddReductionVar(PHINode *Phi, RecurKind Kind, Loop *TheLoop,
RecurrenceDescriptor &RedDes,		FastMathFlags FuncFMF, RecurrenceDescriptor &RedDes,
DemandedBits *DB = nullptr,		DemandedBits *DB = nullptr, AssumptionCache *AC = nullptr,
AssumptionCache *AC = nullptr,		DominatorTree *DT = nullptr, ScalarEvolution *SE = nullptr);
DominatorTree *DT = nullptr);

/// Returns true if Phi is a reduction in TheLoop. The RecurrenceDescriptor		/// Returns true if Phi is a reduction in TheLoop. The RecurrenceDescriptor
/// is returned in RedDes. If either \p DB is non-null or \p AC and \p DT are		/// is returned in RedDes. If either \p DB is non-null or \p AC and \p DT are
/// non-null, the minimal bit width needed to compute the reduction will be		/// non-null, the minimal bit width needed to compute the reduction will be
/// computed.		/// computed. If \p SE is non-null, store instructions to loop invariant
		fhahn: From an interface perspective, I think it would be better to make `SE` an optional argument and only perform the store analysis when it is passed. This would also require users to explicitly opt in, which would at least partially guard against people ignoring `IntermediateStore`.
static bool isReductionPHI(PHINode *Phi, Loop *TheLoop,		/// address are processed.
		fhahn: nit: addresses ?
RecurrenceDescriptor &RedDes,		static bool
DemandedBits *DB = nullptr,		isReductionPHI(PHINode *Phi, Loop *TheLoop, RecurrenceDescriptor &RedDes,
AssumptionCache *AC = nullptr,		DemandedBits *DB = nullptr, AssumptionCache *AC = nullptr,
DominatorTree *DT = nullptr);		DominatorTree *DT = nullptr, ScalarEvolution *SE = nullptr);

/// Returns true if Phi is a first-order recurrence. A first-order recurrence		/// Returns true if Phi is a first-order recurrence. A first-order recurrence
/// is a non-reduction recurrence relation in which the value of the		/// is a non-reduction recurrence relation in which the value of the
/// recurrence in the current loop iteration equals a value defined in the		/// recurrence in the current loop iteration equals a value defined in the
/// previous iteration. \p SinkAfter includes pairs of instructions where the		/// previous iteration. \p SinkAfter includes pairs of instructions where the
/// first will be rescheduled to appear after the second if/when the loop is		/// first will be rescheduled to appear after the second if/when the loop is
/// vectorized. It may be augmented with additional pairs if needed in order		/// vectorized. It may be augmented with additional pairs if needed in order
/// to handle Phi as a first-order recurrence.		/// to handle Phi as a first-order recurrence.
▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	public:
bool isOrdered() const { return IsOrdered; }		bool isOrdered() const { return IsOrdered; }

/// Attempts to find a chain of operations from Phi to LoopExitInst that can		/// Attempts to find a chain of operations from Phi to LoopExitInst that can
/// be treated as a set of reductions instructions for in-loop reductions.		/// be treated as a set of reductions instructions for in-loop reductions.
SmallVector<Instruction *, 4> getReductionOpChain(PHINode *Phi,		SmallVector<Instruction *, 4> getReductionOpChain(PHINode *Phi,
Loop *L) const;		Loop *L) const;

/// Returns true if the instruction is a call to the llvm.fmuladd intrinsic.		/// Returns true if the instruction is a call to the llvm.fmuladd intrinsic.
static bool isFMulAddIntrinsic(Instruction *I) {		static bool isFMulAddIntrinsic(Instruction *I) {
		fhahn: nit: should be a doc-comment?
return isa<IntrinsicInst>(I) &&		return isa<IntrinsicInst>(I) &&
cast<IntrinsicInst>(I)->getIntrinsicID() == Intrinsic::fmuladd;		cast<IntrinsicInst>(I)->getIntrinsicID() == Intrinsic::fmuladd;
}		}

		/// Intermediate store of the reduction
		fhahn: nit: would be good to also document what this means, what the properties of such stores are.
		StoreInst *IntermediateStore = nullptr;

private:		private:
// The starting value of the recurrence.		// The starting value of the recurrence.
// It does not have to be zero!		// It does not have to be zero!
TrackingVH<Value> StartValue;		TrackingVH<Value> StartValue;
// The instruction who's value is used outside the loop.		// The instruction who's value is used outside the loop.
Instruction *LoopExitInstr = nullptr;		Instruction *LoopExitInstr = nullptr;
// The kind of the recurrence.		// The kind of the recurrence.
RecurKind Kind = RecurKind::None;		RecurKind Kind = RecurKind::None;
▲ Show 20 Lines • Show All 122 Lines • Show Last 20 Lines
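
As a rough illustration of what an "intermediate store" of a reduction looks like at the source level, the sketch below contrasts a loop that writes the running sum to a fixed address on every iteration with a form that stores only the final value after the loop. The function names are hypothetical and do not appear in the patch; the second function mirrors the idea of sinking the store, not the code the vectorizer actually emits. The two forms agree only when the destination does not alias the input, which is the property the runtime checks described in the revision summary are meant to establish.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: a sum reduction whose running value is stored to a
// loop-invariant address (*dst) on every iteration.
void sumWithIntermediateStore(int *dst, const int *src, std::size_t n) {
  assert(dst != nullptr);
  int sum = *dst;
  for (std::size_t i = 0; i < n; ++i) {
    sum += src[i];
    *dst = sum; // intermediate store: same address on every iteration
  }
}

// Same computation with the store sunk out of the loop. This is only
// equivalent to the function above when dst does not alias src[0..n).
void sumWithSunkStore(int *dst, const int *src, std::size_t n) {
  assert(dst != nullptr);
  int sum = *dst;
  for (std::size_t i = 0; i < n; ++i)
    sum += src[i];
  *dst = sum; // only the final reduction value is stored
}
```

When dst points into src, the first function re-reads the value it has just stored, so the two functions can produce different results; that is why the equivalence has to be guarded.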

llvm/include/llvm/Analysis/LoopAccessAnalysis.h

Show First 20 Lines • Show All 569 Lines • ▼ Show 20 Lines	public:
void print(raw_ostream &OS, unsigned Depth = 0) const;		void print(raw_ostream &OS, unsigned Depth = 0) const;

/// If the loop has memory dependence involving an invariant address, i.e. two		/// If the loop has memory dependence involving an invariant address, i.e. two
/// stores or a store and a load, then return true, else return false.		/// stores or a store and a load, then return true, else return false.
bool hasDependenceInvolvingLoopInvariantAddress() const {		bool hasDependenceInvolvingLoopInvariantAddress() const {
return HasDependenceInvolvingLoopInvariantAddress;		return HasDependenceInvolvingLoopInvariantAddress;
}		}

		/// Return the list of stores to invariant addresses.
		const ArrayRef<StoreInst *> getStoresToInvariantAddresses() const {
		fhahn: nit: could return `ArrayRef`, not leaking any details about the underlying container.
		return StoresToInvariantAddresses;
		}

/// Used to add runtime SCEV checks. Simplifies SCEV expressions and converts		/// Used to add runtime SCEV checks. Simplifies SCEV expressions and converts
/// them to a more usable form. All SCEV expressions during the analysis		/// them to a more usable form. All SCEV expressions during the analysis
/// should be re-written (and therefore simplified) according to PSE.		/// should be re-written (and therefore simplified) according to PSE.
/// A user of LoopAccessAnalysis will need to emit the runtime checks		/// A user of LoopAccessAnalysis will need to emit the runtime checks
/// associated with this predicate.		/// associated with this predicate.
const PredicatedScalarEvolution &getPSE() const { return *PSE; }		const PredicatedScalarEvolution &getPSE() const { return *PSE; }

private:		private:
▲ Show 20 Lines • Show All 43 Lines • ▼ Show 20 Lines	private:

/// Cache the result of analyzeLoop.		/// Cache the result of analyzeLoop.
bool CanVecMem = false;		bool CanVecMem = false;
bool HasConvergentOp = false;		bool HasConvergentOp = false;

/// Indicator that there are non vectorizable stores to a uniform address.		/// Indicator that there are non vectorizable stores to a uniform address.
bool HasDependenceInvolvingLoopInvariantAddress = false;		bool HasDependenceInvolvingLoopInvariantAddress = false;

		/// List of stores to invariant addresses.
		SmallVector<StoreInst *> StoresToInvariantAddresses;
		fhahn: Are the stores invariant or to a uniform address? `InvariantStores` implies they are invariant, which may not be the case?
		david-arm: I assume here that InvariantStores refers to stores to an invariant (i.e. uniform?) address, rather than storing an invariant value to variant address. If so, perhaps it could be named StoresToInvariantAddress or something like that?
		igor.kirillov: @fhahn Just to make sure we are on the same page: for me uniform and invariant are just synonyms, isn't that so?
		fhahn: nit: Is there a reason for choosing 5 for the size? If not, nowadays SmallVector can pick a good size automatically.

/// The diagnostics report generated for the analysis. E.g. why we		/// The diagnostics report generated for the analysis. E.g. why we
/// couldn't analyze the loop.		/// couldn't analyze the loop.
std::unique_ptr<OptimizationRemarkAnalysis> Report;		std::unique_ptr<OptimizationRemarkAnalysis> Report;

/// If an access has a symbolic strides, this maps the pointer value to		/// If an access has a symbolic strides, this maps the pointer value to
/// the stride symbol.		/// the stride symbol.
ValueToValueMap SymbolicStrides;		ValueToValueMap SymbolicStrides;

▲ Show 20 Lines • Show All 136 Lines • Show Last 20 Lines
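
The naming discussion above turns on two different loop shapes. The sketch below, with hypothetical function names that are not part of LLVM, shows a store to an invariant address next to a store of an invariant value to a varying address; it is only meant to make the distinction concrete.

```cpp
#include <cassert>
#include <cstddef>

// Store to an invariant address: the address `slot` is the same on every
// iteration, while the stored value changes.
void storeToInvariantAddress(int *slot, const int *values, std::size_t n) {
  assert(slot != nullptr);
  for (std::size_t i = 0; i < n; ++i)
    *slot = values[i];
}

// Store of an invariant value: `fill` never changes, while the address
// out[i] varies with the induction variable.
void storeInvariantValue(int *out, std::size_t n, int fill) {
  assert(out != nullptr || n == 0);
  for (std::size_t i = 0; i < n; ++i)
    out[i] = fill;
}
```

Only the first shape corresponds to what the renamed `StoresToInvariantAddresses` list is described as holding.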

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

Show First 20 Lines • Show All 288 Lines • ▼ Show 20 Lines	public:
RecurrenceSet &getFirstOrderRecurrences() { return FirstOrderRecurrences; }		RecurrenceSet &getFirstOrderRecurrences() { return FirstOrderRecurrences; }

/// Return the set of instructions to sink to handle first-order recurrences.		/// Return the set of instructions to sink to handle first-order recurrences.
MapVector<Instruction *, Instruction *> &getSinkAfter() { return SinkAfter; }		MapVector<Instruction *, Instruction *> &getSinkAfter() { return SinkAfter; }

/// Returns the widest induction type.		/// Returns the widest induction type.
Type *getWidestInductionType() { return WidestIndTy; }		Type *getWidestInductionType() { return WidestIndTy; }

		/// Returns True if given invariant store uses recurrent expression
		peterwaller-arm: nit. 'recurrent'
		fhahn: nit: could be more specific, the increment of the reduction is used as stored operand, right? Also, they only handle reductions, right? Then using Reduction instead of Recurrence seems more descriptive, e.g. `isInvariantStoreOfReduction`? (same for `isRecurringInvariantAddress`)
		bool isRecurringInvariantStore(StoreInst *SI);

		/// Returns True if given address is invariant and is used to store recurrent
		/// expression
		bool isRecurringInvariantAddress(Value *V);

/// Returns True if V is a Phi node of an induction variable in this loop.		/// Returns True if V is a Phi node of an induction variable in this loop.
bool isInductionPhi(const Value *V) const;		bool isInductionPhi(const Value *V) const;

/// Returns a pointer to the induction descriptor, if \p Phi is an integer or		/// Returns a pointer to the induction descriptor, if \p Phi is an integer or
/// floating point induction.		/// floating point induction.
const InductionDescriptor *getIntOrFpInductionDescriptor(PHINode *Phi) const;		const InductionDescriptor *getIntOrFpInductionDescriptor(PHINode *Phi) const;

/// Returns True if V is a cast that is part of an induction def-use chain,		/// Returns True if V is a cast that is part of an induction def-use chain,
▲ Show 20 Lines • Show All 248 Lines • Show Last 20 Lines

llvm/lib/Analysis/IVDescriptors.cpp

Show First 20 Lines • Show All 231 Lines • ▼ Show 20 Lines	if (Kind == RecurKind::FMulAdd && Exit->getOperand(2) != Phi)
return false;		return false;

LLVM_DEBUG(dbgs() << "LV: Found an ordered reduction: Phi: " << *Phi		LLVM_DEBUG(dbgs() << "LV: Found an ordered reduction: Phi: " << *Phi
<< ", ExitInst: " << *Exit << "\n");		<< ", ExitInst: " << *Exit << "\n");

return true;		return true;
}		}

bool RecurrenceDescriptor::AddReductionVar(PHINode *Phi, RecurKind Kind,		bool RecurrenceDescriptor::AddReductionVar(
Loop *TheLoop, FastMathFlags FuncFMF,		PHINode *Phi, RecurKind Kind, Loop *TheLoop, FastMathFlags FuncFMF,
RecurrenceDescriptor &RedDes,		RecurrenceDescriptor &RedDes, DemandedBits *DB, AssumptionCache *AC,
DemandedBits *DB,		DominatorTree *DT, ScalarEvolution *SE) {
AssumptionCache *AC,
DominatorTree *DT) {
if (Phi->getNumIncomingValues() != 2)		if (Phi->getNumIncomingValues() != 2)
return false;		return false;

// Reduction variables are only found in the loop header block.		// Reduction variables are only found in the loop header block.
if (Phi->getParent() != TheLoop->getHeader())		if (Phi->getParent() != TheLoop->getHeader())
return false;		return false;

// Obtain the reduction start value from the value that comes from the loop		// Obtain the reduction start value from the value that comes from the loop
// preheader.		// preheader.
Value *RdxStart = Phi->getIncomingValueForBlock(TheLoop->getLoopPreheader());		Value *RdxStart = Phi->getIncomingValueForBlock(TheLoop->getLoopPreheader());

// ExitInstruction is the single value which is used outside the loop.		// ExitInstruction is the single value which is used outside the loop.
// We only allow for a single reduction value to be used outside the loop.		// We only allow for a single reduction value to be used outside the loop.
// This includes users of the reduction, variables (which form a cycle		// This includes users of the reduction, variables (which form a cycle
// which ends in the phi node).		// which ends in the phi node).
Instruction *ExitInstruction = nullptr;		Instruction *ExitInstruction = nullptr;

		fhahn: needs documentation?
		// Variable to keep last visited store instruction. By the end of the
		// algorithm this variable will be either empty or having intermediate
		// reduction value stored in invariant address.
		StoreInst *IntermediateStore = nullptr;

// Indicates that we found a reduction operation in our scan.		// Indicates that we found a reduction operation in our scan.
bool FoundReduxOp = false;		bool FoundReduxOp = false;

// We start with the PHI node and scan for all of the users of this		// We start with the PHI node and scan for all of the users of this
// instruction. All users must be instructions that can be used as reduction		// instruction. All users must be instructions that can be used as reduction
// variables (such as ADD). We must have a single out-of-block user. The cycle		// variables (such as ADD). We must have a single out-of-block user. The cycle
// must include the original PHI.		// must include the original PHI.
bool FoundStartPHI = false;		bool FoundStartPHI = false;
▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines	bool RecurrenceDescriptor::AddReductionVar(
// - One use of reduction value (safe).		// - One use of reduction value (safe).
// - Multiple use of reduction value (not safe).		// - Multiple use of reduction value (not safe).
// - PHI:		// - PHI:
// - All uses of the PHI must be the reduction (safe).		// - All uses of the PHI must be the reduction (safe).
// - Otherwise, not safe.		// - Otherwise, not safe.
// - By instructions outside of the loop (safe).		// - By instructions outside of the loop (safe).
// * One value may have several outside users, but all outside		// * One value may have several outside users, but all outside
// uses must be of the same value.		// uses must be of the same value.
		// - By store instructions with a loop invariant address (safe with
		// the following restrictions):
		fhahn: What about existing users of `isReductionPHI` which currently may rely on the fact that all instructions in the loop must either be phis or reduction operations? Also, with respect to the store restriction, the important bit is that the final value is also stored, right?
		igor.kirillov: I checked and only `LoopInterchangeLegality` is using this function and it is not affected by stores. Anyway, I can add a parameter to `RecurrenceDescriptor::isReductionPHI` or a member to `RecurrenceDescriptor` allowing or not to handle stores. What do you think about it? The comment is to be updated, yes.
		fhahn: I guess it would be good to have at least a test case for loop-interchange to make sure it can deal with the change properly?
		igor.kirillov: You are right. We actually should not do loop interchange if an invariant address is present. Otherwise it may introduce incorrect results.
		// * If there are several stores, all must have the same address.
		// * Final value should be stored in that loop invariant address.
// - By an instruction that is not part of the reduction (not safe).		// - By an instruction that is not part of the reduction (not safe).
// This is either:		// This is either:
// * An instruction type other than PHI or the reduction operation.		// * An instruction type other than PHI or the reduction operation.
// * A PHI in the header other than the initial PHI.		// * A PHI in the header other than the initial PHI.
		peterwaller-arm: Does this comment need updating to discuss your intermediate stores?
		igor.kirillov: Yes, indeed
while (!Worklist.empty()) {		while (!Worklist.empty()) {
Instruction *Cur = Worklist.pop_back_val();		Instruction *Cur = Worklist.pop_back_val();

		// Store instructions are allowed iff it is the store of the reduction
		// value to the same loop invariant memory location.
		fhahn: You are only checking for loop-invariant addresses, so should this be `loop invariant memory location`?
		if (auto *SI = dyn_cast<StoreInst>(Cur)) {
		if (!SE) {
		LLVM_DEBUG(dbgs() << "Store instructions are not processed without "
		<< "Scalar Evolution Analysis\n");
		return false;
		}

		const SCEV *PtrScev = SE->getSCEV(SI->getPointerOperand());
		// Check it is the same address as previous stores
		if (IntermediateStore) {
		const SCEV *OtherScev =
		SE->getSCEV(IntermediateStore->getPointerOperand());

		if (OtherScev != PtrScev) {
		LLVM_DEBUG(dbgs() << "Storing reduction value to different addresses "
		<< "inside the loop: " << *SI->getPointerOperand()
		<< " and "
		<< *IntermediateStore->getPointerOperand() << '\n');
		return false;
		}
		}

		// Check the pointer is loop invariant
		if (!SE->isLoopInvariant(PtrScev, TheLoop)) {
		LLVM_DEBUG(dbgs() << "Storing reduction value to non-uniform address "
		<< "inside the loop: " << *SI->getPointerOperand()
		<< '\n');
		return false;
		}

		// IntermediateStore is always the last store in the loop.
		IntermediateStore = SI;
		fhahn: Do we have to make sure that the stored value is feeding the phi again and not an earlier value in the cycle?
		igor.kirillov: Yes! I added this check and also a test case (reduc_store_not_final_value)
		continue;
		}

// No Users.		// No Users.
// If the instruction has no users then this is a broken chain and can't be		// If the instruction has no users then this is a broken chain and can't be
// a reduction variable.		// a reduction variable.
if (Cur->use_empty())		if (Cur->use_empty())
return false;		return false;

bool IsAPhi = isa<PHINode>(Cur);		bool IsAPhi = isa<PHINode>(Cur);

▲ Show 20 Lines • Show All 106 Lines • ▼ Show 20 Lines	for (User *U : Cur->users()) {
continue;		continue;
}		}

// Process instructions only once (termination). Each reduction cycle		// Process instructions only once (termination). Each reduction cycle
// value must only be used once, except by phi nodes and min/max		// value must only be used once, except by phi nodes and min/max
// reductions which are represented as a cmp followed by a select.		// reductions which are represented as a cmp followed by a select.
InstDesc IgnoredVal(false, nullptr);		InstDesc IgnoredVal(false, nullptr);
if (VisitedInsts.insert(UI).second) {		if (VisitedInsts.insert(UI).second) {
if (isa<PHINode>(UI))		if (isa<PHINode>(UI)) {
PHIs.push_back(UI);		PHIs.push_back(UI);
else		} else {
		StoreInst *SI = dyn_cast<StoreInst>(UI);
		if (SI && SI->getPointerOperand() == Cur) {
		// Reduction variable chain can only be stored somewhere but it
		// can't be used as an address.
		return false;
		}
NonPHIs.push_back(UI);		NonPHIs.push_back(UI);
		}
		david-arm: nit: Could you remove an extra level of indentation here before merging the patch? i.e. this can be written like this: `if (isa<PHINode>(UI)) PHIs.push_back(UI); else if (auto *SI = dyn_cast<StoreInst>(UI)) { ... } else NonPHIs.push_back(UI);`
		igor.kirillov: I made an extra step and simplified it even more
} else if (!isa<PHINode>(UI) &&		} else if (!isa<PHINode>(UI) &&
((!isa<FCmpInst>(UI) && !isa<ICmpInst>(UI) &&		((!isa<FCmpInst>(UI) && !isa<ICmpInst>(UI) &&
!isa<SelectInst>(UI)) ||		!isa<SelectInst>(UI)) ||
		peterwaller-arm: Suggestion: Does this warrant a comment? (I only observe that there are lots of comments around here and I spent a moment trying to guess at what this did without arriving at an answer).
		igor.kirillov: I hope this one clarifies. And, actually we should exit if reduction variable is used as an address (this should be highly unlikely case but nevertheless)
(!isConditionalRdxPattern(Kind, UI).isRecurrence() &&		(!isConditionalRdxPattern(Kind, UI).isRecurrence() &&
!isSelectCmpPattern(TheLoop, Phi, UI, IgnoredVal)		!isSelectCmpPattern(TheLoop, Phi, UI, IgnoredVal)
.isRecurrence() &&		.isRecurrence() &&
!isMinMaxPattern(UI, Kind, IgnoredVal).isRecurrence())))		!isMinMaxPattern(UI, Kind, IgnoredVal).isRecurrence())))
return false;		return false;

// Remember that we completed the cycle.		// Remember that we completed the cycle.
if (UI == Phi)		if (UI == Phi)
FoundStartPHI = true;		FoundStartPHI = true;
}		}
Worklist.append(PHIs.begin(), PHIs.end());		Worklist.append(PHIs.begin(), PHIs.end());
Worklist.append(NonPHIs.begin(), NonPHIs.end());		Worklist.append(NonPHIs.begin(), NonPHIs.end());
}		}

// This means we have seen one but not the other instruction of the		// This means we have seen one but not the other instruction of the
// pattern or more than just a select and cmp. Zero implies that we saw a		// pattern or more than just a select and cmp. Zero implies that we saw a
// llvm.min/max instrinsic, which is always OK.		// llvm.min/max instrinsic, which is always OK.
if (isMinMaxRecurrenceKind(Kind) && NumCmpSelectPatternInst != 2 &&		if (isMinMaxRecurrenceKind(Kind) && NumCmpSelectPatternInst != 2 &&
NumCmpSelectPatternInst != 0)		NumCmpSelectPatternInst != 0)
return false;		return false;

if (isSelectCmpRecurrenceKind(Kind) && NumCmpSelectPatternInst != 1)		if (isSelectCmpRecurrenceKind(Kind) && NumCmpSelectPatternInst != 1)
return false;		return false;

		if (IntermediateStore) {
		// Check that stored value goes to the phi node again. This way we make sure
		fhahn: nit: for consistency with other code avoid using `!= nullptr`. This is more in line with the style used in LLVM in general (and you use `==/!= nullptr` in the adjacent code here).
		// that the value stored in IntermediateStore is indeed the final reduction
		// value.
		if (!is_contained(Phi->operands(), IntermediateStore->getValueOperand())) {
		LLVM_DEBUG(dbgs() << "Not a final reduction value stored: "
		<< *IntermediateStore << '\n');
		return false;
		}

		// If there is an exit instruction it's value should be stored in
		// IntermediateStore
		if (ExitInstruction &&
		IntermediateStore->getValueOperand() != ExitInstruction) {
		LLVM_DEBUG(dbgs() << "Last store Instruction of reduction value does not "
		"store last calculated value of the reduction: "
		<< *IntermediateStore << '\n');
		return false;
		}

		// If all uses are inside the loop (intermediate stores), then the
		// reduction value after the loop will be the one used in the last store.
		if (!ExitInstruction)
		ExitInstruction = cast<Instruction>(IntermediateStore->getValueOperand());
		}

if (!FoundStartPHI || !FoundReduxOp || !ExitInstruction)		if (!FoundStartPHI || !FoundReduxOp || !ExitInstruction)
return false;		return false;

const bool IsOrdered =		const bool IsOrdered =
checkOrderedReduction(Kind, ExactFPMathInst, ExitInstruction, Phi);		checkOrderedReduction(Kind, ExactFPMathInst, ExitInstruction, Phi);

if (Start != Phi) {		if (Start != Phi) {
// If the starting value is not the same as the phi node, we speculatively		// If the starting value is not the same as the phi node, we speculatively
▲ Show 20 Lines • Show All 45 Lines • ▼ Show 20 Lines	bool RecurrenceDescriptor::AddReductionVar(

// We found a reduction var if we have reached the original phi node and we		// We found a reduction var if we have reached the original phi node and we
// only have a single instruction with out-of-loop users.		// only have a single instruction with out-of-loop users.

// The ExitInstruction(Instruction which is allowed to have out-of-loop users)		// The ExitInstruction(Instruction which is allowed to have out-of-loop users)
// is saved as part of the RecurrenceDescriptor.		// is saved as part of the RecurrenceDescriptor.

// Save the description of this reduction variable.		// Save the description of this reduction variable.
RecurrenceDescriptor RD(RdxStart, ExitInstruction, Kind, FMF, ExactFPMathInst,		RecurrenceDescriptor RD(RdxStart, ExitInstruction, IntermediateStore, Kind,
RecurrenceType, IsSigned, IsOrdered, CastInsts,		FMF, ExactFPMathInst, RecurrenceType, IsSigned,
MinWidthCastToRecurrenceType);		IsOrdered, CastInsts, MinWidthCastToRecurrenceType);
RedDes = RD;		RedDes = RD;

return true;		return true;
}		}

// We are looking for loops that do something like this:		// We are looking for loops that do something like this:
// int r = 0;		// int r = 0;
// for (int i = 0; i < n; i++) {		// for (int i = 0; i < n; i++) {
▲ Show 20 Lines • Show All 207 Lines • ▼ Show 20 Lines	bool RecurrenceDescriptor::hasMultipleUsesOf(
}		}

return false;		return false;
}		}

bool RecurrenceDescriptor::isReductionPHI(PHINode *Phi, Loop *TheLoop,		bool RecurrenceDescriptor::isReductionPHI(PHINode *Phi, Loop *TheLoop,
RecurrenceDescriptor &RedDes,		RecurrenceDescriptor &RedDes,
DemandedBits *DB, AssumptionCache *AC,		DemandedBits *DB, AssumptionCache *AC,
DominatorTree *DT) {		DominatorTree *DT,
		ScalarEvolution *SE) {
BasicBlock *Header = TheLoop->getHeader();		BasicBlock *Header = TheLoop->getHeader();
Function &F = *Header->getParent();		Function &F = *Header->getParent();
FastMathFlags FMF;		FastMathFlags FMF;
FMF.setNoNaNs(		FMF.setNoNaNs(
F.getFnAttribute("no-nans-fp-math").getValueAsBool());		F.getFnAttribute("no-nans-fp-math").getValueAsBool());
FMF.setNoSignedZeros(		FMF.setNoSignedZeros(
F.getFnAttribute("no-signed-zeros-fp-math").getValueAsBool());		F.getFnAttribute("no-signed-zeros-fp-math").getValueAsBool());

if (AddReductionVar(Phi, RecurKind::Add, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::Add, TheLoop, FMF, RedDes, DB, AC, DT,
		SE)) {
LLVM_DEBUG(dbgs() << "Found an ADD reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found an ADD reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::Mul, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::Mul, TheLoop, FMF, RedDes, DB, AC, DT,
		SE)) {
LLVM_DEBUG(dbgs() << "Found a MUL reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found a MUL reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::Or, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::Or, TheLoop, FMF, RedDes, DB, AC, DT,
		SE)) {
LLVM_DEBUG(dbgs() << "Found an OR reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found an OR reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::And, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::And, TheLoop, FMF, RedDes, DB, AC, DT,
		SE)) {
LLVM_DEBUG(dbgs() << "Found an AND reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found an AND reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::Xor, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::Xor, TheLoop, FMF, RedDes, DB, AC, DT,
		SE)) {
LLVM_DEBUG(dbgs() << "Found a XOR reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found a XOR reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::SMax, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::SMax, TheLoop, FMF, RedDes, DB, AC, DT,
		SE)) {
LLVM_DEBUG(dbgs() << "Found a SMAX reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found a SMAX reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::SMin, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::SMin, TheLoop, FMF, RedDes, DB, AC, DT,
		SE)) {
LLVM_DEBUG(dbgs() << "Found a SMIN reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found a SMIN reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::UMax, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::UMax, TheLoop, FMF, RedDes, DB, AC, DT,
		SE)) {
LLVM_DEBUG(dbgs() << "Found a UMAX reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found a UMAX reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::UMin, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::UMin, TheLoop, FMF, RedDes, DB, AC, DT,
		SE)) {
LLVM_DEBUG(dbgs() << "Found a UMIN reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found a UMIN reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::SelectICmp, TheLoop, FMF, RedDes, DB, AC,		if (AddReductionVar(Phi, RecurKind::SelectICmp, TheLoop, FMF, RedDes, DB, AC,
DT)) {		DT, SE)) {
LLVM_DEBUG(dbgs() << "Found an integer conditional select reduction PHI."		LLVM_DEBUG(dbgs() << "Found an integer conditional select reduction PHI."
<< *Phi << "\n");		<< *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::FMul, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::FMul, TheLoop, FMF, RedDes, DB, AC, DT,
		SE)) {
LLVM_DEBUG(dbgs() << "Found an FMult reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found an FMult reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::FAdd, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::FAdd, TheLoop, FMF, RedDes, DB, AC, DT,
		SE)) {
LLVM_DEBUG(dbgs() << "Found an FAdd reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found an FAdd reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::FMax, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::FMax, TheLoop, FMF, RedDes, DB, AC, DT,
		SE)) {
LLVM_DEBUG(dbgs() << "Found a float MAX reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found a float MAX reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::FMin, TheLoop, FMF, RedDes, DB, AC, DT)) {		if (AddReductionVar(Phi, RecurKind::FMin, TheLoop, FMF, RedDes, DB, AC, DT,
		SE)) {
LLVM_DEBUG(dbgs() << "Found a float MIN reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found a float MIN reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::SelectFCmp, TheLoop, FMF, RedDes, DB, AC,		if (AddReductionVar(Phi, RecurKind::SelectFCmp, TheLoop, FMF, RedDes, DB, AC,
DT)) {		DT, SE)) {
LLVM_DEBUG(dbgs() << "Found a float conditional select reduction PHI."		LLVM_DEBUG(dbgs() << "Found a float conditional select reduction PHI."
<< " PHI." << *Phi << "\n");		<< " PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::FMulAdd, TheLoop, FMF, RedDes, DB, AC,		if (AddReductionVar(Phi, RecurKind::FMulAdd, TheLoop, FMF, RedDes, DB, AC, DT,
DT)) {		SE)) {
LLVM_DEBUG(dbgs() << "Found an FMulAdd reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found an FMulAdd reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
// Not a reduction of known type.		// Not a reduction of known type.
return false;		return false;
}		}

bool RecurrenceDescriptor::isFirstOrderRecurrence(		bool RecurrenceDescriptor::isFirstOrderRecurrence(
▲ Show 20 Lines • Show All 587 Lines • Show Last 20 Lines

llvm/lib/Analysis/LoopAccessAnalysis.cpp

Show First 20 Lines • Show All 1,978 Lines • ▼ Show 20 Lines	void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,

// Record uniform store addresses to identify if we have multiple stores		// Record uniform store addresses to identify if we have multiple stores
// to the same address.		// to the same address.
ValueSet UniformStores;		ValueSet UniformStores;

for (StoreInst *ST : Stores) {		for (StoreInst *ST : Stores) {
Value *Ptr = ST->getPointerOperand();		Value *Ptr = ST->getPointerOperand();

if (isUniform(Ptr))		if (isUniform(Ptr)) {
		// Record store instructions to loop invariant addresses
		StoresToInvariantAddresses.push_back(ST);
		peterwaller-arm: Does the word 'variant' add anything? I couldn't find any other uses nearby which elucidate what you're trying to say here. It feels confusing because you are talking about 'InvariantStores', which means invariant in the address, so it looks like a typo. I feel this comment could be better: 'Record store instructions to loop-invariant addresses', in contrast to the comment on UniformStores above? Do you need to say much about the value?
HasDependenceInvolvingLoopInvariantAddress |=		HasDependenceInvolvingLoopInvariantAddress |=
!UniformStores.insert(Ptr).second;		!UniformStores.insert(Ptr).second;
		}

// If we did not see this pointer before, insert it to the read-write		// If we did not see this pointer before, insert it to the read-write
// list. At this phase it is only a 'write' list.		// list. At this phase it is only a 'write' list.
if (Seen.insert(Ptr).second) {		if (Seen.insert(Ptr).second) {
++NumReadWrites;		++NumReadWrites;

MemoryLocation Loc = MemoryLocation::get(ST);		MemoryLocation Loc = MemoryLocation::get(ST);
// The TBAA metadata could have a control dependency on the predication		// The TBAA metadata could have a control dependency on the predication
▲ Show 20 Lines • Show All 413 Lines • Show Last 20 Lines

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

Show First 20 Lines • Show All 433 Lines • ▼ Show 20 Lines	for (User *U : Inst->users()) {
if (!TheLoop->contains(UI)) {		if (!TheLoop->contains(UI)) {
LLVM_DEBUG(dbgs() << "LV: Found an outside user for : " << *UI << '\n');		LLVM_DEBUG(dbgs() << "LV: Found an outside user for : " << *UI << '\n');
return true;		return true;
}		}
}		}
return false;		return false;
}		}

		/// Returns true if A and B have same pointer operands or same SCEVs addresses
		static bool storeToSameAddress(ScalarEvolution *SE, StoreInst *A,
		StoreInst *B) {
		// Compare store
		if (A == B)
		return true;

		// Otherwise Compare pointers
		Value *APtr = A->getPointerOperand();
		Value *BPtr = B->getPointerOperand();
		fhahn: With opaque pointers, this code may behave differently. It's possible to have `store i32 0, ptr %x` followed by `store i8 0, ptr %x`. In that case the pointer operands will be the same, but the store widths differ. I think we should also check that the types of the stored values match for now, as we use this to remove earlier stores. This should only be correct if the later store writes at least as many bits as the earlier stores.
		igor.kirillov (Author): I think if this check is added here, then the purpose of the function would be different (the address is still the same even if the values have different sizes). So, instead of that, I added a check at the place where those pointers are really processed, see LoopVectorizationLegality.cpp:975. There we now make sure that values stored to an invariant address are of the same type.
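The distinction drawn in this thread (same address versus same address and same stored type) can be sketched outside LLVM. This is a toy model only: `ToyStore`, `sameAddress` and `canReplaceEarlierStore` are hypothetical names invented for illustration and are not part of the patch.

```cpp
#include <string>

// Hypothetical stand-in for a store instruction. Not an LLVM type.
struct ToyStore {
  int AddressId;         // identifies the pointer operand
  std::string ValueType; // type of the stored value, e.g. "i32" or "i8"
};

// Address-only comparison: two stores through the same opaque pointer
// compare equal here even if they write different widths.
bool sameAddress(const ToyStore &A, const ToyStore &B) {
  return A.AddressId == B.AddressId;
}

// Stricter check for the place where earlier stores are dropped:
// additionally require that the stored value types match.
bool canReplaceEarlierStore(const ToyStore &Earlier, const ToyStore &Final) {
  return sameAddress(Earlier, Final) && Earlier.ValueType == Final.ValueType;
}
```

With an `i32` store and an `i8` store to the same pointer, `sameAddress` holds while `canReplaceEarlierStore` does not, which is the case raised in the review comment.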
		if (APtr == BPtr)
		return true;

		// Otherwise compare address SCEVs
		if (SE->getSCEV(APtr) == SE->getSCEV(BPtr))
		return true;

		return false;
		}

int LoopVectorizationLegality::isConsecutivePtr(Type *AccessTy,		int LoopVectorizationLegality::isConsecutivePtr(Type *AccessTy,
Value *Ptr) const {		Value *Ptr) const {
const ValueToValueMap &Strides =		const ValueToValueMap &Strides =
getSymbolicStrides() ? *getSymbolicStrides() : ValueToValueMap();		getSymbolicStrides() ? *getSymbolicStrides() : ValueToValueMap();

Function *F = TheLoop->getHeader()->getParent();		Function *F = TheLoop->getHeader()->getParent();
bool OptForSize = F->hasOptSize() ||		bool OptForSize = F->hasOptSize() ||
llvm::shouldOptimizeForSize(TheLoop->getHeader(), PSI, BFI,		llvm::shouldOptimizeForSize(TheLoop->getHeader(), PSI, BFI,
▲ Show 20 Lines • Show All 221 Lines • ▼ Show 20 Lines	for (Instruction &I : *BB) {
reportVectorizationFailure("Found an invalid PHI",		reportVectorizationFailure("Found an invalid PHI",
"loop control flow is not understood by vectorizer",		"loop control flow is not understood by vectorizer",
"CFGNotUnderstood", ORE, TheLoop, Phi);		"CFGNotUnderstood", ORE, TheLoop, Phi);
return false;		return false;
}		}

RecurrenceDescriptor RedDes;		RecurrenceDescriptor RedDes;
if (RecurrenceDescriptor::isReductionPHI(Phi, TheLoop, RedDes, DB, AC,		if (RecurrenceDescriptor::isReductionPHI(Phi, TheLoop, RedDes, DB, AC,
DT)) {		DT, PSE.getSE())) {
Requirements->addExactFPMathInst(RedDes.getExactFPMathInst());		Requirements->addExactFPMathInst(RedDes.getExactFPMathInst());
AllowedExit.insert(RedDes.getLoopExitInstr());		AllowedExit.insert(RedDes.getLoopExitInstr());
Reductions[Phi] = RedDes;		Reductions[Phi] = RedDes;
continue;		continue;
}		}

// TODO: Instead of recording the AllowedExit, it would be good to record the		// TODO: Instead of recording the AllowedExit, it would be good to record the
// complementary set: NotAllowedExit. These include (but may not be		// complementary set: NotAllowedExit. These include (but may not be
▲ Show 20 Lines • Show All 218 Lines • ▼ Show 20 Lines	ORE->emit([&]() {
return OptimizationRemarkAnalysis(Hints->vectorizeAnalysisPassName(),		return OptimizationRemarkAnalysis(Hints->vectorizeAnalysisPassName(),
"loop not vectorized: ", *LAR);		"loop not vectorized: ", *LAR);
});		});
}		}

if (!LAI->canVectorizeMemory())		if (!LAI->canVectorizeMemory())
return false;		return false;

		// We can vectorize stores to invariant address when final reduction value is
		fhahn: It would probably be good to make clear what is handled here exactly and why we can handle those stores. IIUC this applies only to invariant stores that store reduction results and is safe because runtime checks guarantee that it won't alias with other objects. The store won't get vectorized, but sunk to the exit block during codegen.
		// guaranteed to be stored at the end of the loop. Also, if decision to
		// vectorize loop is made, runtime checks are added so as to make sure that
		// invariant address won't alias with any other objects.
		if (!LAI->getStoresToInvariantAddresses().empty()) {
		// For each invariant address, check its last stored value is unconditional.
		fhahn: What about loads to the same address in the loop? At the moment, `LAA` cannot analyze dependences with invariant addresses. But if this limitation gets removed the code here may become incorrect, because it relies on this limitation IIUC?
		igor.kirillov (Author): Loads from or stores to the same address in the loop? I'm sorry, could you clarify what the problem is? As it is, I don't understand the message.
		fhahn: The case I was thinking about was something like the snippet below, where we have a load of the invariant address in the loop (`%lv = load...` in the example below).

		    define void @reduc_store(i32* %dst, i32* readonly %src, i32* noalias %dst.2) {
		    entry:
		      %arrayidx = getelementptr inbounds i32, i32* %dst, i64 42
		      store i32 0, i32* %arrayidx, align 4
		      br label %for.body

		    for.body:
		      %0 = phi i32 [ 0, %entry ], [ %add, %for.body ]
		      %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
		      %arrayidx1 = getelementptr inbounds i32, i32* %src, i64 %indvars.iv
		      %1 = load i32, i32* %arrayidx1, align 4
		      %add = add nsw i32 %0, %1
		      %lv = load i32, i32* %arrayidx
		      store i32 %add, i32* %arrayidx, align 4
		      %gep.dst.2 = getelementptr inbounds i32, i32* %dst.2, i64 %indvars.iv
		      store i32 %lv, i32* %gep.dst.2,
		      %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
		      %exitcond = icmp eq i64 %indvars.iv.next, 1000
		      br i1 %exitcond, label %for.cond.cleanup, label %for.body

		    for.cond.cleanup:
		      ret void
		    }

		igor.kirillov (Author): In that case we do not vectorize; the rejection happens in `LoopAccessInfo::analyzeLoop` when loads are processed:

		    for (LoadInst *LD : Loads) {
		      Value *Ptr = LD->getPointerOperand();
		      ...
		      // See if there is an unsafe dependency between a load to a uniform address and
		      // store to the same uniform address.
		      if (UniformStores.count(Ptr)) {
		        LLVM_DEBUG(dbgs() << "LAA: Found an unsafe dependency between a uniform "
		                             "load and uniform store to the same address!\n");
		        HasDependenceInvolvingLoopInvariantAddress = true;
		      }
		      ...

		I added the `reduc_store_load` test anyway.
		fhahn: Yeah, but this is only due to some limitations that currently exist in LAA, right? I think we should at least make this clear somewhere, e.g. in a comment.
		igor.kirillov (Author): Added comment.
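For orientation, the source-level pattern these threads discuss can be sketched as follows. This is an illustrative C++ sketch, not code from the patch: the function names are hypothetical, `dst` is assumed not to alias `src` (the property the runtime checks are meant to establish), and no load from `dst` occurs inside the loop (the case LAA currently rejects).

```cpp
#include <cstddef>

// Scalar form: the running sum is written to the loop-invariant address
// dst on every iteration.
void sumWithIntermediateStore(const int *src, std::size_t n, int *dst) {
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += src[i];
    *dst = sum; // intermediate store of the reduction value
  }
}

// Equivalent form with the store sunk out of the loop: only the final
// reduction value is written, once, after the loop.
void sumWithSunkStore(const int *src, std::size_t n, int *dst) {
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += src[i];
  if (n != 0) // the original loop stores nothing when it never runs
    *dst = sum;
}
```

Both functions leave the same value in `*dst` as long as nothing reads `*dst` during the loop. A load of `*dst` inside the loop, as in the IR above, would observe the intermediate values and make the two forms differ.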
		for (StoreInst *SI : LAI->getStoresToInvariantAddresses()) {
		if (isRecurringInvariantStore(SI) &&
		david-arm: Hi @igor.kirillov, I think you're missing a test case here. Suppose we have two stores in the loop to the same invariant address. If the first store is predicated, but the second isn't, we should still vectorise because the second store wins and the first one can be removed. At least the code here suggests that. I couldn't find any test that exercised this code path.
		igor.kirillov (Author): Added a `reduc_store_final_store_predicated` test that addresses both requests.
		blockNeedsPredication(SI->getParent())) {
		reportVectorizationFailure(
		"We don't allow storing to uniform addresses",
		"write of conditional recurring variant value to a loop "
		fhahn: nit: no need to use `llvm::`
		"invariant address could not be vectorized",
		"CantVectorizeStoreToLoopInvariantAddress", ORE, TheLoop);
		return false;
		}
		david-arm: I think you might be missing a test case for this. Can you please make sure this code path is exercised by your existing tests?
		igor.kirillov (Author): It is now covered by the new `reduc_store_final_store_predicated` test, and an old test, `reduc_store_inside_unrolled`, also executes this path.
		}

if (LAI->hasDependenceInvolvingLoopInvariantAddress()) {		if (LAI->hasDependenceInvolvingLoopInvariantAddress()) {
reportVectorizationFailure("Stores to a uniform address",		// For each invariant address, check its last stored value is the result
"write to a loop invariant address could not be vectorized",		// of one of our reductions.
		//
		// We do not check if dependence with loads exists because they are
		// currently rejected earlier in LoopAccessInfo::analyzeLoop. In case this
		// behaviour changes we have to modify this code.
		ScalarEvolution *SE = PSE.getSE();
		SmallVector<StoreInst *, 4> UnhandledStores;
		for (StoreInst *SI : LAI->getStoresToInvariantAddresses()) {
		if (isRecurringInvariantStore(SI)) {
		// Earlier stores to this address are effectively deadcode.
		fhahn: I'm not sure if the comment matches the code. There's no guarantee that all stores to the same address come before the store of the reduction AFAICT. E.g. the test below has a store to the same address after the store of the reduction. I think this may get mis-compiled, as the store of 0 gets replaced with the store of the final value of the reduction.

		    define void @reduc_store(i32* %dst, i32* readonly %src) {
		    entry:
		      %gep.dst = getelementptr inbounds i32, i32* %dst, i64 42
		      store i32 1, i32* %gep.dst, align 4
		      br label %for.body

		    for.body:
		      %sum = phi i32 [ 0, %entry ], [ %add, %for.body ]
		      %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
		      %gep.src = getelementptr inbounds i32, i32* %src, i64 %iv
		      %0 = load i32, i32* %gep.src, align 4
		      %add = add nsw i32 %sum, %0
		      store i32 %add, i32* %gep.dst, align 4
		      store i32 0, i32* %gep.dst, align 4
		      %iv.next = add nuw nsw i64 %iv, 1
		      %exitcond = icmp eq i64 %iv.next, 1000
		      br i1 %exitcond, label %exit, label %for.body

		    exit:
		      ret void
		    }

		igor.kirillov (Author): Yeah, it was processed incorrectly. I added this test along with the fix.
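The order sensitivity of the `UnhandledStores` loop in this hunk can be illustrated with a toy model. This is a sketch under stated assumptions: `ToyInvariantStore` and `allInvariantStoresHandled` are hypothetical names, addresses are compared as plain integers instead of through SCEV, and only the loop as shown in this revision of the diff is modeled (the additional fix mentioned in the reply is not).

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a store to a loop-invariant address.
struct ToyInvariantStore {
  int Address;             // identifies the invariant address
  bool IsReductionStore;   // plays the role of isRecurringInvariantStore(SI)
  bool StoresUniformValue; // plays the role of isUniform(SI->getValueOperand())
};

// Walks the stores in program order. A reduction store erases earlier
// unhandled stores to the same address; any other store of a non-uniform
// value is recorded as unhandled.
bool allInvariantStoresHandled(
    const std::vector<ToyInvariantStore> &StoresInProgramOrder) {
  std::vector<ToyInvariantStore> Unhandled;
  for (const ToyInvariantStore &SI : StoresInProgramOrder) {
    if (SI.IsReductionStore) {
      Unhandled.erase(
          std::remove_if(Unhandled.begin(), Unhandled.end(),
                         [&SI](const ToyInvariantStore &Earlier) {
                           return Earlier.Address == SI.Address;
                         }),
          Unhandled.end());
    } else if (!SI.StoresUniformValue) {
      Unhandled.push_back(SI);
    }
  }
  return Unhandled.empty();
}
```

In this model a non-uniform store that precedes the reduction store to the same address is erased, while the same store placed after the reduction store stays in the list and blocks vectorization. A store of a uniform value is never added to the list by this loop, so any handling of that case has to come from elsewhere.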
		erase_if(UnhandledStores, [SE, SI](StoreInst *I) {
		return storeToSameAddress(SE, SI, I);
		});
		} else if (!isUniform(SI->getValueOperand()))
		UnhandledStores.push_back(SI);
		}

		bool IsOK = UnhandledStores.empty();
		// TODO: we should also validate against InvariantMemSets.
		if (!IsOK) {
		reportVectorizationFailure(
		"We don't allow storing to uniform addresses",
		"write to a loop invariant address could not "
		"be vectorized",
"CantVectorizeStoreToLoopInvariantAddress", ORE, TheLoop);		"CantVectorizeStoreToLoopInvariantAddress", ORE, TheLoop);
return false;		return false;
}		}
		}
		}

Requirements->addRuntimePointerChecks(LAI->getNumRuntimePointerChecks());		Requirements->addRuntimePointerChecks(LAI->getNumRuntimePointerChecks());
PSE.addPredicate(LAI->getPSE().getUnionPredicate());		PSE.addPredicate(LAI->getPSE().getUnionPredicate());
return true;		return true;
}		}

bool LoopVectorizationLegality::canVectorizeFPMath(		bool LoopVectorizationLegality::canVectorizeFPMath(
bool EnableStrictReductions) {		bool EnableStrictReductions) {
Show All 9 Lines	if (!EnableStrictReductions ||
any_of(getInductionVars(), [&](auto &Induction) -> bool {		any_of(getInductionVars(), [&](auto &Induction) -> bool {
InductionDescriptor IndDesc = Induction.second;		InductionDescriptor IndDesc = Induction.second;
return IndDesc.getExactFPMathInst();		return IndDesc.getExactFPMathInst();
}))		}))
return false;		return false;

// We can now only vectorize if all reductions with Exact FP math also		// We can now only vectorize if all reductions with Exact FP math also
// have the isOrdered flag set, which indicates that we can move the		// have the isOrdered flag set, which indicates that we can move the
// reduction operations in-loop.		// reduction operations in-loop, and do not have intermediate store.
		david-arm: Maybe this can be folded into the `all_of` case below, i.e.

		    return (all_of(getReductionVars(), [&](auto &Reduction) -> bool {
		      const RecurrenceDescriptor &RdxDesc = Reduction.second;
		      return !RdxDesc.hasExactFPMath() ||
		             (RdxDesc.isOrdered() && !RdxDesc.IntermediateStore);
		    }));

		Also, the problem with this code at the moment is that you could have a mixture of fast and ordered reductions in the same loop. There could be an intermediate store for one of the fast reductions, but not for the ordered ones. At the moment with your code we will just bail out in this case.
		igor.kirillov (Author): You are right! It can be done much more simply, and the check is needed only when an ordered reduction is present.
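The mixed case described above can be checked with a small standalone model of the per-reduction predicate. This is a sketch only: `ToyReduction` and `canVectorizeFPMathSketch` are hypothetical names, and the earlier bail-out on `EnableStrictReductions` and induction variables is not modeled.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the relevant RecurrenceDescriptor properties.
struct ToyReduction {
  bool HasExactFPMath;       // reduction requires exact FP semantics
  bool IsOrdered;            // reduction can be performed in-loop, in order
  bool HasIntermediateStore; // reduction value is stored inside the loop
};

// A reduction passes if it does not need exact FP math, or if it is
// ordered and has no intermediate store.
bool canVectorizeFPMathSketch(const std::vector<ToyReduction> &Reductions) {
  return std::all_of(Reductions.begin(), Reductions.end(),
                     [](const ToyReduction &R) {
                       return !R.HasExactFPMath ||
                              (R.IsOrdered && !R.HasIntermediateStore);
                     });
}
```

A loop with one fast reduction that has an intermediate store and one ordered reduction without a store passes this predicate, whereas an ordered reduction with an intermediate store does not.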
return (all_of(getReductionVars(), [&](auto &Reduction) -> bool {		return (all_of(getReductionVars(), [&](auto &Reduction) -> bool {
const RecurrenceDescriptor &RdxDesc = Reduction.second;		const RecurrenceDescriptor &RdxDesc = Reduction.second;
return !RdxDesc.hasExactFPMath() || RdxDesc.isOrdered();		return !RdxDesc.hasExactFPMath() ||
		(RdxDesc.isOrdered() && !RdxDesc.IntermediateStore);
}));		}));
}		}

		bool LoopVectorizationLegality::isRecurringInvariantStore(StoreInst *SI) {
		for (auto &Reduction : Reductions) {
		fhahn: nit: could be any_of
		RecurrenceDescriptor DS = Reduction.second;
		david-arm: nit: I think you might be able to simplify the code here by removing `FoundMatchingRecurrence`. Then, lower down, instead of the `break` you can just do

		    if (DSI && (DSI == SI)) {
		      IsPredicated = blockNeedsPredication(DSI->getParent());
		      return true;
		    }

		and at the bottom of the function just do:

		    return false;
		StoreInst *DSI = DS.IntermediateStore;
		if (DSI && DSI == SI)
		fhahn: nit: can drop `DSI`, as `SI` is guaranteed to be non-null?
		return true;
		fhahn: nit: redundant `()`.
		}
		fhahn: Could we instead just do the check at the call site, or is there a benefit of doing it here?
		return false;
		}

		bool LoopVectorizationLegality::isRecurringInvariantAddress(Value *V) {
		ScalarEvolution *SE = PSE.getSE();
		fhahn: nit: could be any_of.
		for (auto &Reduction : Reductions) {
		RecurrenceDescriptor DS = Reduction.second;
		if (!DS.IntermediateStore)
		continue;
		Value *InvariantAddress = DS.IntermediateStore->getPointerOperand();
		if (V == InvariantAddress ||
		SE->getSCEV(V) == SE->getSCEV(InvariantAddress))
		return true;
		}
		return false;
		}

bool LoopVectorizationLegality::isInductionPhi(const Value *V) const {		bool LoopVectorizationLegality::isInductionPhi(const Value *V) const {
Value *In0 = const_cast<Value *>(V);		Value *In0 = const_cast<Value *>(V);
PHINode *PN = dyn_cast_or_null<PHINode>(In0);		PHINode *PN = dyn_cast_or_null<PHINode>(In0);
if (!PN)		if (!PN)
return false;		return false;

return Inductions.count(PN);		return Inductions.count(PN);
}		}
▲ Show 20 Lines • Show All 379 Lines • Show Last 20 Lines

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


Show First 20 Lines • Show All 2,910 Lines • ▼ Show 20 Lines	void InnerLoopVectorizer::scalarizeInstruction(Instruction *Instr,
VPReplicateRecipe *RepRecipe,		VPReplicateRecipe *RepRecipe,
const VPIteration &Instance,		const VPIteration &Instance,
bool IfPredicateInstr,		bool IfPredicateInstr,
VPTransformState &State) {		VPTransformState &State) {
assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");		assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");

// llvm.experimental.noalias.scope.decl intrinsics must only be duplicated for		// llvm.experimental.noalias.scope.decl intrinsics must only be duplicated for
// the first lane and part.		// the first lane and part.
if (isa<NoAliasScopeDeclInst>(Instr))		if (isa<NoAliasScopeDeclInst>(Instr))
		fhahn: We effectively sink the store outside the loop. In that case, I don't think we should create a recipe and also we should not consider its cost.
if (!Instance.isFirstIteration())		if (!Instance.isFirstIteration())
return;		return;

setDebugLocFromInst(Instr);		setDebugLocFromInst(Instr);

// Does this instruction return a value ?		// Does this instruction return a value ?
bool IsVoidRetTy = Instr->getType()->isVoidTy();		bool IsVoidRetTy = Instr->getType()->isVoidTy();

▲ Show 20 Lines • Show All 1,337 Lines • ▼ Show 20 Lines	else if (ResumePhi && llvm::is_contained(ResumePhi->blocks(), Incoming))
Incoming);		Incoming);
else		else
BCBlockPhi->addIncoming(ReductionStartValue, Incoming);		BCBlockPhi->addIncoming(ReductionStartValue, Incoming);
}		}

// Set the resume value for this reduction		// Set the resume value for this reduction
ReductionResumeValues.insert({&RdxDesc, BCBlockPhi});		ReductionResumeValues.insert({&RdxDesc, BCBlockPhi});

		// If there were stores of the reduction value to a uniform memory address
		// inside the loop, create the final store here.
		if (StoreInst *SI = RdxDesc.IntermediateStore) {
		StoreInst *NewSI =
		Builder.CreateStore(ReducedPartRdx, SI->getPointerOperand());
		propagateMetadata(NewSI, SI);

		// If the reduction value is used in other places,
		// then let the code below create PHI's for that.
		}

// Now, we need to fix the users of the reduction variable		// Now, we need to fix the users of the reduction variable
// inside and outside of the scalar remainder loop.		// inside and outside of the scalar remainder loop.

// We know that the loop is in LCSSA form. We need to update the PHI nodes		// We know that the loop is in LCSSA form. We need to update the PHI nodes
// in the exit blocks. See comment on analogous loop in		// in the exit blocks. See comment on analogous loop in
// fixFirstOrderRecurrence for a more complete explaination of the logic.		// fixFirstOrderRecurrence for a more complete explaination of the logic.
if (!Cost->requiresScalarEpilogue(VF))		if (!Cost->requiresScalarEpilogue(VF))
for (PHINode &LCSSAPhi : LoopExitBlock->phis())		for (PHINode &LCSSAPhi : LoopExitBlock->phis())
▲ Show 20 Lines • Show All 3,399 Lines • ▼ Show 20 Lines	if (auto *Ptr = getLoadStorePointerOperand(Inst))
return Legal->isConsecutivePtr(getLoadStoreType(Inst), Ptr);		return Legal->isConsecutivePtr(getLoadStoreType(Inst), Ptr);
return false;		return false;
}		}

void LoopVectorizationCostModel::collectValuesToIgnore() {		void LoopVectorizationCostModel::collectValuesToIgnore() {
// Ignore ephemeral values.		// Ignore ephemeral values.
CodeMetrics::collectEphemeralValues(TheLoop, AC, ValuesToIgnore);		CodeMetrics::collectEphemeralValues(TheLoop, AC, ValuesToIgnore);

		// Find all stores to invariant variables. Since they are going to sink
		// outside the loop we do not need calculate cost for them.
		for (BasicBlock *BB : TheLoop->blocks())
		for (Instruction &I : *BB) {
		StoreInst *SI;
		if ((SI = dyn_cast<StoreInst>(&I)) &&
		Legal->isRecurringInvariantAddress(SI->getPointerOperand()))
		ValuesToIgnore.insert(&I);
		}

// Ignore type-promoting instructions we identified during reduction		// Ignore type-promoting instructions we identified during reduction
// detection.		// detection.
for (auto &Reduction : Legal->getReductionVars()) {		for (auto &Reduction : Legal->getReductionVars()) {
const RecurrenceDescriptor &RedDes = Reduction.second;		const RecurrenceDescriptor &RedDes = Reduction.second;
const SmallPtrSetImpl<Instruction *> &Casts = RedDes.getCastInsts();		const SmallPtrSetImpl<Instruction *> &Casts = RedDes.getCastInsts();
VecValuesToIgnore.insert(Casts.begin(), Casts.end());		VecValuesToIgnore.insert(Casts.begin(), Casts.end());
}		}
// Ignore type-casting instructions we identified during induction		// Ignore type-casting instructions we identified during induction
[1,448 lines hidden]	for (Instruction &I : BB->instructionsWithoutDebug()) {
InductionsToMove.push_back(		InductionsToMove.push_back(
cast<VPWidenIntOrFpInductionRecipe>(Recipe));		cast<VPWidenIntOrFpInductionRecipe>(Recipe));
}		}
RecipeBuilder.setRecipe(Instr, Recipe);		RecipeBuilder.setRecipe(Instr, Recipe);
VPBB->appendRecipe(Recipe);		VPBB->appendRecipe(Recipe);
continue;		continue;
}		}

		// Invariant stores inside loop will be deleted and a single store
		fhahn: `either deleted or go outside the loop` sounds a bit unclear. Aren't they moved to the vector exit block and store the final reduction value?
		// with the final reduction value will be added to the exit block
		StoreInst *SI;
		fhahn: I think it would also be good to include at least some information in the VPlan that the store is handled as part of the reduction. Perhaps `VPReductionRecipe` should print if the result is stored after the loop? Please add a test case to `vplan-printing.ll`.
		igor.kirillov (author): I'm not sure how to display this information properly. This is what the output of `VPReductionRecipe::print` looks like now: `REDUCE ir<%red.next> = ir<%red> + fast reduce.fadd (ir<%lv>)`. I could add something to the end but it doesn't seem to fit there. Also I have not found any proper `V.Recipe` where I could place something like `DELETE store .`.
		igor.kirillov (author): @fhahn I added some info to `VPReductionRecipe::print` and a relevant test. What do you think about it?
		if ((SI = dyn_cast<StoreInst>(&I)) &&
		Legal->isRecurringInvariantAddress(SI->getPointerOperand()))
		continue;

// Otherwise, if all widening options failed, Instruction is to be		// Otherwise, if all widening options failed, Instruction is to be
// replicated. This may create a successor for VPBB.		// replicated. This may create a successor for VPBB.
VPBasicBlock *NextVPBB =		VPBasicBlock *NextVPBB =
RecipeBuilder.handleReplication(Instr, Range, VPBB, Plan);		RecipeBuilder.handleReplication(Instr, Range, VPBB, Plan);
if (NextVPBB != VPBB) {		if (NextVPBB != VPBB) {
VPBB = NextVPBB;		VPBB = NextVPBB;
VPBB->setName(BB->hasName() ? BB->getName() + "." + Twine(VPBBsForBB++)		VPBB->setName(BB->hasName() ? BB->getName() + "." + Twine(VPBBsForBB++)
: "");		: "");
[1,651 lines hidden]

llvm/lib/Transforms/Vectorize/VPlan.cpp

[1,334 lines hidden]	if (isa<FPMathOperator>(getUnderlyingInstr()))
O << getUnderlyingInstr()->getFastMathFlags();		O << getUnderlyingInstr()->getFastMathFlags();
O << " reduce." << Instruction::getOpcodeName(RdxDesc->getOpcode()) << " (";		O << " reduce." << Instruction::getOpcodeName(RdxDesc->getOpcode()) << " (";
getVecOp()->printAsOperand(O, SlotTracker);		getVecOp()->printAsOperand(O, SlotTracker);
if (getCondOp()) {		if (getCondOp()) {
O << ", ";		O << ", ";
getCondOp()->printAsOperand(O, SlotTracker);		getCondOp()->printAsOperand(O, SlotTracker);
}		}
O << ")";		O << ")";
		if (RdxDesc->IntermediateStore)
		O << " (with final reduction value stored in invariant address sank "
		"outside of loop)";
}		}

void VPReplicateRecipe::print(raw_ostream &O, const Twine &Indent,		void VPReplicateRecipe::print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const {		VPSlotTracker &SlotTracker) const {
O << Indent << (IsUniform ? "CLONE " : "REPLICATE ");		O << Indent << (IsUniform ? "CLONE " : "REPLICATE ");

if (!getUnderlyingInstr()->getType()->isVoidTy()) {		if (!getUnderlyingInstr()->getType()->isVoidTy()) {
printAsOperand(O, SlotTracker);		printAsOperand(O, SlotTracker);
[307 lines hidden]

llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll

[314 lines hidden]	for.body:
br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0		br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

for.end:		for.end:
ret float %.sroa.speculated		ret float %.sroa.speculated
}		}

; ADD (with reduction stored in invariant address)		; ADD (with reduction stored in invariant address)

; CHECK-REMARK: loop not vectorized: value that could not be identified as reduction is used outside the loop		; CHECK-REMARK: vectorized loop (vectorization width: vscale x 4, interleaved count: 2)
define void @invariant_store(i32* %dst, i32* readonly %src) {		define void @invariant_store(i32* %dst, i32* readonly %src) {
; CHECK-LABEL: @invariant_store		; CHECK-LABEL: @invariant_store
; CHECK-NOT: vector.body		; CHECK: vector.body:
		; CHECK: %[[LOAD1:.*]] = load <vscale x 4 x i32>
		; CHECK: %[[LOAD2:.*]] = load <vscale x 4 x i32>
		; CHECK: %[[ADD1:.*]] = add <vscale x 4 x i32> %{{.*}}, %[[LOAD1]]
		; CHECK: %[[ADD2:.*]] = add <vscale x 4 x i32> %{{.*}}, %[[LOAD2]]
		; CHECK: call void @llvm.masked.scatter.nxv4i32.nxv4p0i32(<vscale x 4 x i32> %[[ADD1]]
		; CHECK: call void @llvm.masked.scatter.nxv4i32.nxv4p0i32(<vscale x 4 x i32> %[[ADD2]]
		; CHECK: middle.block:
		; CHECK: %[[ADD:.*]] = add <vscale x 4 x i32> %[[ADD2]], %[[ADD1]]
		; CHECK-NEXT: %[[SUM:.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> %[[ADD]])
		; CHECK-NEXT: store i32 %[[SUM]], i32* %gep.dst, align 4
entry:		entry:
%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42		%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42
store i32 0, i32* %gep.dst, align 4		store i32 0, i32* %gep.dst, align 4
br label %for.body		br label %for.body
for.body:		for.body:
%sum = phi i32 [ 0, %entry ], [ %add, %for.body ]		%sum = phi i32 [ 0, %entry ], [ %add, %for.body ]
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%gep.src = getelementptr inbounds i32, i32* %src, i64 %indvars.iv		%gep.src = getelementptr inbounds i32, i32* %src, i64 %indvars.iv
[91 lines hidden]

llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll

; RUN: opt < %s -passes="loop-vectorize" -force-vector-interleave=1 -force-vector-width=4 -S | FileCheck %s		; RUN: opt < %s -passes="loop-vectorize" -force-vector-interleave=1 -force-vector-width=4 -S | FileCheck %s

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"		target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

; This test checks that we can vectorize loop with reduction variable		; This test checks that we can vectorize loop with reduction variable
; stored in an invariant address.		; stored in an invariant address.
;		;
; int sum = 0;		; int sum = 0;
; for(i=0..N) {		; for(i=0..N) {
; sum += src[i];		; sum += src[i];
; dst[42] = sum;		; dst[42] = sum;
; }		; }
; CHECK-LABEL: @reduc_store		; CHECK-LABEL: @reduc_store
; CHECK-NOT: vector.body		; CHECK: vector.body:
		; CHECK: phi <4 x i32>
		; CHECK: load <4 x i32>
		; CHECK: add <4 x i32>
		; CHECK-NOT: store i32 %{{[0-9]+}}, i32* %gep.dst
		; CHECK: middle.block:
		; CHECK-NEXT: [[TMP:%.*]] = call i32 @llvm.vector.reduce.add.v4i32
		; CHECK-NEXT: store i32 [[TMP]], i32* %gep.dst
define void @reduc_store(i32* %dst, i32* readonly %src) {		define void @reduc_store(i32* %dst, i32* readonly %src) {
entry:		entry:
%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42		%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42
store i32 0, i32* %gep.dst, align 4		store i32 0, i32* %gep.dst, align 4
br label %for.body		br label %for.body

for.body:		for.body:
%sum = phi i32 [ 0, %entry ], [ %add, %for.body ]		%sum = phi i32 [ 0, %entry ], [ %add, %for.body ]
[13 lines hidden]
; Same as above but with floating point numbers instead.		; Same as above but with floating point numbers instead.
;		;
; float sum = 0;		; float sum = 0;
; for(i=0..N) {		; for(i=0..N) {
; sum += src[i];		; sum += src[i];
; dst[42] = sum;		; dst[42] = sum;
; }		; }
; CHECK-LABEL: @reduc_store_fadd_fast		; CHECK-LABEL: @reduc_store_fadd_fast
; CHECK-NOT: vector.body		; CHECK: vector.body:
		; CHECK: phi <4 x float>
		; CHECK: load <4 x float>
		; CHECK: fadd fast <4 x float>
		; CHECK-NOT: store float %{{[0-9]+}}, float* %gep.dst
		; CHECK: middle.block:
		; CHECK-NEXT: [[TMP:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32
		; CHECK-NEXT: store float %{{[0-9]+}}, float* %gep.dst
define void @reduc_store_fadd_fast(float* %dst, float* readonly %src) {		define void @reduc_store_fadd_fast(float* %dst, float* readonly %src) {
entry:		entry:
%gep.dst = getelementptr inbounds float, float* %dst, i64 42		%gep.dst = getelementptr inbounds float, float* %dst, i64 42
store float 0.000000e+00, float* %gep.dst, align 4		store float 0.000000e+00, float* %gep.dst, align 4
br label %for.body		br label %for.body

for.body:		for.body:
%sum = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]		%sum = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
[94 lines hidden]
; int sum = 0;		; int sum = 0;
; for(int i=0; i < 1000; i+=2) {		; for(int i=0; i < 1000; i+=2) {
; sum += src[i];		; sum += src[i];
; dst[42] = sum;		; dst[42] = sum;
; sum += src[i+1];		; sum += src[i+1];
; dst[42] = sum;		; dst[42] = sum;
; }		; }
; CHECK-LABEL: @reduc_store_inside_unrolled		; CHECK-LABEL: @reduc_store_inside_unrolled
; CHECK-NOT: vector.body		; CHECK: vector.body:
		; CHECK-NOT: store i32 %{{[0-9]+}}, i32* %gep.dst
		fhahn: I think it would be good to check the full reduction cycle & store here. I don't think this test case is handled correctly at the moment as the IR seems out of sync with the pseudo code (which is why personally I think the C pseudo code is a bit distracting). Note that in the IR the reduction cycle is `%sum -> %sum.1 -> %sum` and not `%sum -> %sum.1 -> %sum.2 -> %sum`. When this gets vectorized, the final value written to dst[42] is the final value of %sum.1, not %sum.2 as it should be I think.
		igor.kirillov (author): The IR was incorrect actually because of the error I made during renaming and the pseudo-code was showing my original plot for the test-case. Luckily it helped to expose a bug, so I added an extra check to `IVDescriptors.cpp` and now `reduc_store_not_final_value` function is testing this case (when we have IntermediateStore but no ExitInstruction and value stored is not actually a final value).
		fhahn: I think it would still be good to test that the full reduction cycle is generated correctly (probably worth checking the full vector body + middle.block), at least for some of the tests in the file, to make sure the full sequence is generated correctly.
		fhahn: > The IR was incorrect actually because of the error I made during renaming and the pseudo-code was showing my original plot for the test-case.
		This seems to indicate that the pseudo code may be distracting (:
		igor.kirillov (author): @fhahn Added full checks for `reduc_store` and this function. The code generated is overscalarized for `reduc_store_inside_unrolled` but I checked and without invariant stores it is the same.
		; CHECK: middle.block:
		; CHECK-NEXT: [[TMP:%.*]] = call i32 @llvm.vector.reduce.add.v4i32
		; CHECK-NEXT: store i32 [[TMP]], i32* %gep.dst
		; CHECK: ret void
define void @reduc_store_inside_unrolled(i32* %dst, i32* readonly %src) {		define void @reduc_store_inside_unrolled(i32* %dst, i32* readonly %src) {
entry:		entry:
%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42		%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42
br label %for.body		br label %for.body

for.body:		for.body:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
%sum = phi i32 [ 0, %entry ], [ %sum.1, %for.body ]		%sum = phi i32 [ 0, %entry ], [ %sum.2, %for.body ]
%gep.src = getelementptr inbounds i32, i32* %src, i64 %iv		%gep.src = getelementptr inbounds i32, i32* %src, i64 %iv
%0 = load i32, i32* %gep.src, align 4		%0 = load i32, i32* %gep.src, align 4
%sum.1 = add nsw i32 %0, %sum		%sum.1 = add nsw i32 %0, %sum
store i32 %sum.1, i32* %gep.dst, align 4		store i32 %sum.1, i32* %gep.dst, align 4
%1 = or i64 %iv, 1		%1 = or i64 %iv, 1
%gep.src.1 = getelementptr inbounds i32, i32* %src, i64 %1		%gep.src.1 = getelementptr inbounds i32, i32* %src, i64 %1
%2 = load i32, i32* %gep.src.1, align 4		%2 = load i32, i32* %gep.src.1, align 4
%sum.2 = add nsw i32 %2, %sum.1		%sum.2 = add nsw i32 %2, %sum.1
store i32 %sum.2, i32* %gep.dst, align 4		store i32 %sum.2, i32* %gep.dst, align 4
%iv.next = add nuw nsw i64 %iv, 2		%iv.next = add nuw nsw i64 %iv, 2
%cmp = icmp slt i64 %iv.next, 1000		%cmp = icmp slt i64 %iv.next, 1000
br i1 %cmp, label %for.body, label %exit		br i1 %cmp, label %for.body, label %exit

exit:		exit:
ret void		ret void
}		}
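
The point raised in the review thread on this test can be illustrated outside of LLVM. The following is a minimal C++ sketch, not code from the patch: the function names `scalarUnrolled`, `sunkStore` and `sunkStoreWrongValue` are hypothetical, and plain scalar loops stand in for the vectorized code. It shows that when the loop body stores both `sum.1` and `sum.2` to `dst[42]`, the single store placed after the loop has to use the final value of `sum.2`; using the final value of `sum.1` leaves a different value in memory.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Reference behaviour: the unrolled loop from the test's pseudo-code,
// storing to dst[42] after each of the two additions.
void scalarUnrolled(const std::vector<int> &src, std::array<int, 64> &dst) {
  int sum = 0;
  for (std::size_t i = 0; i + 1 < src.size(); i += 2) {
    int sum1 = sum + src[i];      // corresponds to %sum.1
    dst[42] = sum1;
    int sum2 = sum1 + src[i + 1]; // corresponds to %sum.2
    dst[42] = sum2;
    sum = sum2;                   // the reduction phi takes %sum.2
  }
}

// Stores removed from the loop; one store of the final reduction value
// (the last %sum.2) is placed after the loop.
void sunkStore(const std::vector<int> &src, std::array<int, 64> &dst) {
  int sum = 0;
  bool ranOnce = false;
  for (std::size_t i = 0; i + 1 < src.size(); i += 2) {
    sum += src[i] + src[i + 1];
    ranOnce = true;
  }
  if (ranOnce) // the original loop stores nothing if it never runs
    dst[42] = sum;
}

// Hypothetical incorrect variant: sinks the store but uses the last
// value of %sum.1 instead of the last value of %sum.2.
void sunkStoreWrongValue(const std::vector<int> &src,
                         std::array<int, 64> &dst) {
  int sum = 0;
  int lastSum1 = 0;
  bool ranOnce = false;
  for (std::size_t i = 0; i + 1 < src.size(); i += 2) {
    lastSum1 = sum + src[i];
    sum = lastSum1 + src[i + 1];
    ranOnce = true;
  }
  if (ranOnce)
    dst[42] = lastSum1;
}
```

For `src = {1, 2, 3, 4}`, the first two functions leave 10 in `dst[42]`, while the last one leaves 6, which is the kind of mismatch the extra check in `IVDescriptors.cpp` is meant to rule out.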

		; Check that we cannot vectorize code if stored value is not the final reduction
		; value
		;
		; int sum = 0;
		; for(int i=0; i < 1000; i+=2) {
		david-arm: nit: Should be i++, not i+=2.
		; sum += src[i];
		; dst[42] = sum + 1;
		; }
		; CHECK-LABEL: @reduc_store_not_final_value
		; CHECK-NOT: vector.body:
		define void @reduc_store_not_final_value(i32* %dst, i32* readonly %src) {
		entry:
		%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42
		store i32 0, i32* %gep.dst, align 4
		br label %for.body

		for.body:
		%sum = phi i32 [ 0, %entry ], [ %add, %for.body ]
		%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
		%gep.src = getelementptr inbounds i32, i32* %src, i64 %iv
		%0 = load i32, i32* %gep.src, align 4
		%add = add nsw i32 %sum, %0
		%sum_plus_one = add i32 %add, 1
		store i32 %sum_plus_one, i32* %gep.dst, align 4
		%iv.next = add nuw nsw i64 %iv, 1
		%exitcond = icmp eq i64 %iv.next, 1000
		br i1 %exitcond, label %exit, label %for.body

		exit:
		ret void
		}


; We cannot vectorize if two (or more) invariant stores exist in a loop.		; We cannot vectorize if two (or more) invariant stores exist in a loop.
;		;
; int sum = 0;		; int sum = 0;
; for(int i=0; i < 1000; i+=2) {		; for(int i=0; i < 1000; i+=2) {
; sum += src[i];		; sum += src[i];
; dst[42] = sum;		; dst[42] = sum;
; sum += src[i+1];		; sum += src[i+1];
; other_dst[42] = sum;		; other_dst[42] = sum;
; }		; }
; CHECK-LABEL: @reduc_double_invariant_store		; CHECK-LABEL: @reduc_double_invariant_store
; CHECK-NOT: vector.body:		; CHECK-NOT: vector.body:
define void @reduc_double_invariant_store(i32* %dst, i32* %other_dst, i32* readonly %src) {		define void @reduc_double_invariant_store(i32* %dst, i32* %other_dst, i32* readonly %src) {
entry:		entry:
%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42		%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42
%gep.other_dst = getelementptr inbounds i32, i32* %other_dst, i64 42		%gep.other_dst = getelementptr inbounds i32, i32* %other_dst, i64 42
br label %for.body		br label %for.body

for.body:		for.body:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
%sum = phi i32 [ 0, %entry ], [ %sum.1, %for.body ]		%sum = phi i32 [ 0, %entry ], [ %sum.2, %for.body ]
		david-arm: nit: Looks like an unnecessary change from %sum.1 -> %sum.2.
		igor.kirillov (author): `sum.1` is `sum + src[i]` and `sum.2` is `sum + src[i] + src[i+1]`. Looks like everything is fine to me.
%arrayidx = getelementptr inbounds i32, i32* %src, i64 %iv		%arrayidx = getelementptr inbounds i32, i32* %src, i64 %iv
%0 = load i32, i32* %arrayidx, align 4		%0 = load i32, i32* %arrayidx, align 4
%sum.1 = add nsw i32 %0, %sum		%sum.1 = add nsw i32 %0, %sum
store i32 %sum.1, i32* %gep.dst, align 4		store i32 %sum.1, i32* %gep.dst, align 4
%1 = or i64 %iv, 1		%1 = or i64 %iv, 1
%arrayidx4 = getelementptr inbounds i32, i32* %src, i64 %1		%arrayidx4 = getelementptr inbounds i32, i32* %src, i64 %1
%2 = load i32, i32* %arrayidx4, align 4		%2 = load i32, i32* %arrayidx4, align 4
%sum.2 = add nsw i32 %2, %sum.1		%sum.2 = add nsw i32 %2, %sum.1
[10 lines hidden]
; for(int i=0; i < 1000; i+=2) {		; for(int i=0; i < 1000; i+=2) {
; sum += src[i];		; sum += src[i];
; if (src[i+1] > 0)		; if (src[i+1] > 0)
; dst[42] = sum;		; dst[42] = sum;
; sum += src[i+1];		; sum += src[i+1];
; dst[42] = sum;		; dst[42] = sum;
; }		; }
; CHECK-LABEL: @reduc_store_middle_store_predicated		; CHECK-LABEL: @reduc_store_middle_store_predicated
; CHECK-NOT: vector.body		; CHECK: vector.body:
		; CHECK-NOT: store i32 %{{[0-9]+}}, i32* %gep.dst
		; CHECK: middle.block:
		; CHECK-NEXT: [[TMP:%.*]] = call i32 @llvm.vector.reduce.add.v4i32
		; CHECK-NEXT: store i32 [[TMP]], i32* %gep.dst
		; CHECK: ret void
define void @reduc_store_middle_store_predicated(i32* %dst, i32* readonly %src) {		define void @reduc_store_middle_store_predicated(i32* %dst, i32* readonly %src) {
entry:		entry:
%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42		%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42
br label %for.body		br label %for.body

for.body: ; preds = %latch, %entry		for.body: ; preds = %latch, %entry
%iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
%sum = phi i32 [ 0, %entry ], [ %sum.2, %latch ]		%sum = phi i32 [ 0, %entry ], [ %sum.2, %latch ]
[60 lines hidden]	latch: ; preds = %predicated, %for.body
br i1 %cmp, label %for.body, label %exit		br i1 %cmp, label %for.body, label %exit

exit: ; preds = %latch		exit: ; preds = %latch
ret void		ret void
}		}

; Final value used outside of loop does not prevent vectorization		; Final value used outside of loop does not prevent vectorization
;		;
; int sum = 0;		; int sum = 0;
		fhahn: Would it be possible to add runtime tests for the mis-compiles fixed to https://github.com/llvm/llvm-test-suite/tree/main/SingleSource/UnitTests/Vectorizer?
		igor.kirillov (author): Added a simple test here: https://reviews.llvm.org/D124609. I don't feel like any other test adds more robustness, but if you have an idea what could be also added I'll gladly implement it!
; for(int i=0; i < 1000; i++) {		; for(int i=0; i < 1000; i++) {
; sum += src[i];		; sum += src[i];
; dst[42] = sum;		; dst[42] = sum;
; }		; }
; dst[43] = sum;		; dst[43] = sum;
; CHECK-LABEL: @reduc_store_inoutside		; CHECK-LABEL: @reduc_store_inoutside
; CHECK-NOT: vector.body		; CHECK: vector.body:
		; CHECK-NOT: store i32 %{{[0-9]+}}, i32* %gep.src
		; CHECK: middle.block:
		; CHECK-NEXT: [[TMP:%.*]] = call i32 @llvm.vector.reduce.add.v4i32
		; CHECK-NEXT: store i32 [[TMP]], i32* %gep.dst
		; CHECK: exit:
		; CHECK: [[PHI:%.*]] = phi i32 [ [[TMP1:%.*]], %for.body ], [ [[TMP2:%.*]], %middle.block ]
		david-arm: nit: I think it's fine (and cleaner) to just do %[[PHI:%.*]] = %[[ADDR:%.*]] = here. Also, can you add the incoming value to the phi just to make sure it's the correct value? I think it should just be phi i32 {{ [[TMP]], %middle.block }}
		; CHECK: [[ADDR:%.*]] = getelementptr inbounds i32, i32* %dst, i64 43
		; CHECK: store i32 [[PHI]], i32* [[ADDR]]
		; CHECK: ret void
define void @reduc_store_inoutside(i32* %dst, i32* readonly %src) {		define void @reduc_store_inoutside(i32* %dst, i32* readonly %src) {
entry:		entry:
%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42		%gep.dst = getelementptr inbounds i32, i32* %dst, i64 42
br label %for.body		br label %for.body

for.body:		for.body:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
%sum = phi i32 [ 0, %entry ], [ %sum.1, %for.body ]		%sum = phi i32 [ 0, %entry ], [ %sum.1, %for.body ]
%arrayidx = getelementptr inbounds i32, i32* %src, i64 %iv		%arrayidx = getelementptr inbounds i32, i32* %src, i64 %iv
%0 = load i32, i32* %arrayidx, align 4		%0 = load i32, i32* %arrayidx, align 4
%sum.1 = add nsw i32 %0, %sum		%sum.1 = add nsw i32 %0, %sum
store i32 %sum.1, i32* %gep.dst, align 4		store i32 %sum.1, i32* %gep.dst, align 4
%iv.next = add nuw nsw i64 %iv, 1		%iv.next = add nuw nsw i64 %iv, 1
%exitcond = icmp eq i64 %iv.next, 1000		%exitcond = icmp eq i64 %iv.next, 1000
br i1 %exitcond, label %exit, label %for.body		br i1 %exitcond, label %exit, label %for.body

exit:		exit:
%sum.lcssa = phi i32 [ %sum.1, %for.body ]		%sum.lcssa = phi i32 [ %sum.1, %for.body ]
%gep.dst.1 = getelementptr inbounds i32, i32* %dst, i64 43		%gep.dst.1 = getelementptr inbounds i32, i32* %dst, i64 43
store i32 %sum.lcssa, i32* %gep.dst.1, align 4		store i32 %sum.lcssa, i32* %gep.dst.1, align 4
ret void		ret void
}		}
		david-arm: This looks a bit odd. I'm not sure why it's needed? If there is a CHECK-LABEL for every function I don't think this should happen.
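
As a rough illustration of what the CHECK lines in this file expect, the sketch below models the transformation in plain C++. It is an assumption-laden model rather than the vectorizer's implementation: `VF`, `scalarLoop` and `vectorStyleLoop` are hypothetical names, four partial sums stand in for a `<4 x i32>` accumulator, and `src` and `dst` are separate objects, so the aliasing that the runtime checks guard against cannot occur.

```cpp
#include <array>
#include <cstddef>
#include <numeric>
#include <vector>

constexpr std::size_t VF = 4; // assumed vector width, as in the RUN line

// Original loop shape from the pseudo-code of reduc_store_inoutside:
// store the running sum every iteration, then use it after the loop.
void scalarLoop(const std::vector<int> &src, std::array<int, 64> &dst) {
  int sum = 0;
  for (int v : src) {
    sum += v;
    dst[42] = sum;
  }
  dst[43] = sum;
}

// Model of the vectorized shape: no store in the vector body, one store
// of the reduced value in the "middle block", then a scalar remainder.
void vectorStyleLoop(const std::vector<int> &src, std::array<int, 64> &dst) {
  int sum = 0;
  std::size_t i = 0;
  if (src.size() >= VF) {
    std::array<int, VF> lanes{}; // per-lane partial sums
    for (; i + VF <= src.size(); i += VF)
      for (std::size_t l = 0; l < VF; ++l)
        lanes[l] += src[i + l];
    // "middle block": horizontal reduction followed by the sunk store.
    sum = std::accumulate(lanes.begin(), lanes.end(), 0);
    dst[42] = sum;
  }
  // Scalar remainder keeps the original per-iteration store.
  for (; i < src.size(); ++i) {
    sum += src[i];
    dst[42] = sum;
  }
  dst[43] = sum; // use of the final value outside the loop
}
```

Both functions leave the same values in `dst[42]` and `dst[43]` for any input whose sum does not overflow, because integer addition can be reassociated across lanes; this is also why the summary treats ordered `fadd` reductions separately.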

llvm/test/Transforms/LoopVectorize/vplan-printing.ll

[121 lines hidden]	for.body: ; preds = %entry, %for.body
%iv.next = add i64 %iv, 1		%iv.next = add i64 %iv, 1
%exitcond = icmp eq i64 %iv.next, %n		%exitcond = icmp eq i64 %iv.next, %n
br i1 %exitcond, label %for.end, label %for.body		br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body, %entry		for.end: ; preds = %for.body, %entry
ret float %red.next		ret float %red.next
}		}

		define void @print_reduction_with_invariant_store(i64 %n, float* noalias %y, float* noalias %dst) {
		; CHECK-LABEL: Checking a loop in "print_reduction_with_invariant_store"
		; CHECK: VPlan 'Initial VPlan for VF={4},UF>=1' {
		; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
		; CHECK-EMPTY:
		; CHECK-NEXT: <x1> vector loop: {
		; CHECK-NEXT: for.body:
		; CHECK-NEXT: EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION
		; CHECK-NEXT: WIDEN-INDUCTION %iv = phi %iv.next, 0
		; CHECK-NEXT: WIDEN-REDUCTION-PHI ir<%red> = phi ir<0.000000e+00>, ir<%red.next>
		; CHECK-NEXT: CLONE ir<%arrayidx> = getelementptr ir<%y>, ir<%iv>
		; CHECK-NEXT: WIDEN ir<%lv> = load ir<%arrayidx>
		; CHECK-NEXT: REDUCE ir<%red.next> = ir<%red> + fast reduce.fadd (ir<%lv>) (with final reduction value stored in invariant address sank outside of loop)
		; CHECK-NEXT: EMIT vp<[[CAN_IV_NEXT:%.+]]> = VF * UF +(nuw) vp<[[CAN_IV]]>
		; CHECK-NEXT: EMIT branch-on-count vp<[[CAN_IV_NEXT]]> vp<[[VEC_TC]]>
		; CHECK-NEXT: No successors
		; CHECK-NEXT: }
		; CHECK-NEXT: No successors
		; CHECK-NEXT: }
		;
		entry:
		br label %for.body

		for.body: ; preds = %entry, %for.body
		%iv = phi i64 [ %iv.next, %for.body ], [ 0, %entry ]
		%red = phi float [ %red.next, %for.body ], [ 0.0, %entry ]
		%arrayidx = getelementptr inbounds float, float* %y, i64 %iv
		%lv = load float, float* %arrayidx, align 4
		%red.next = fadd fast float %lv, %red
		store float %red.next, float* %dst, align 4
		%iv.next = add i64 %iv, 1
		%exitcond = icmp eq i64 %iv.next, %n
		br i1 %exitcond, label %for.end, label %for.body

		for.end: ; preds = %for.body, %entry
		ret void
		}

define void @print_replicate_predicated_phi(i64 %n, i64* %x) {		define void @print_replicate_predicated_phi(i64 %n, i64* %x) {
; CHECK-LABEL: Checking a loop in "print_replicate_predicated_phi"		; CHECK-LABEL: Checking a loop in "print_replicate_predicated_phi"
; CHECK: VPlan 'Initial VPlan for VF={4},UF>=1' {		; CHECK: VPlan 'Initial VPlan for VF={4},UF>=1' {
; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count		; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: <x1> vector loop: {		; CHECK-NEXT: <x1> vector loop: {
; CHECK-NEXT: for.body:		; CHECK-NEXT: for.body:
; CHECK-NEXT: EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION		; CHECK-NEXT: EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION
[273 lines hidden]


Revision Contents

Diff 405949
