This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Support reductions that store intermediary result
ClosedPublic

Authored by igor.kirillov on Sep 22 2021, 5:51 AM.

Details

Summary

Adds ability to vectorize loops containing a store to a loop-invariant address as part of a reduction that isn't converted to SSA form due to lack of aliasing info. Runtime checks are generated to ensure the store does not alias any other accesses in the loop.
Ordered fadd reductions are not yet supported.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
fhahn added inline comments.Oct 11 2021, 2:37 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
900

What about loads to the same address in the loop? At the moment, LAA cannot analyze dependences with invariant addresses. But if this limitation gets removed the code here may become incorrect, because it relies on this limitation IIUC?

david-arm added inline comments.Oct 11 2021, 3:51 AM
llvm/include/llvm/Analysis/LoopAccessAnalysis.h
642

I assume here that InvariantStores refers to stores to an invariant (i.e. uniform?) address, rather than storing an invariant value to variant address. If so, perhaps it could be named StoresToInvariantAddress or something like that?

david-arm added inline comments.Oct 11 2021, 6:53 AM
llvm/test/Transforms/LoopVectorize/reduction.ll
471

Is it possible to add a simple floating point test with "fadd fast"?

It seems like the main problem is that we potentially bail out too early at the moment when checking for reductions due to the store, but once we generate runtime checks, sinking the store may become legal (see inline comment about loads to the same address)? If that's the case, ideally we'd just sink any such loads/stores before detecting reductions once we know they can be sunk due to runtime checks, but unfortunately I do not think that's possible with the current structure/ordering.

Is it worth for me to explore if it possible to do other way around or we should work on with the solution from this merge?

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
642

@fhah Just to make sure we are on the same line - for me uniform and invariant are just synonyms, isn't it so?

llvm/include/llvm/Transforms/Utils/LoopUtils.h
402 ↗(On Diff #378503)

What do you think if I make this function a public member of class ScalarEvolution? Or is there a better place for it?

llvm/lib/Analysis/IVDescriptors.cpp
300

I checked and only LoopInterchangeLegality is using this function and it is not affected by stores. Anyway, I can add a parameter to RecurrenceDescriptor::isReductionPHI or a member to RecurrenceDescriptor allowing or not to handle stores. What do you think about it?

The comment is to be updated, yes.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
900

Loads from or stores to the same address in the loop? I'm sorry could you clarify what the problem is. As it is I don't understand the message.

Hi @igor.kirillov, is it also possible to get this working for ordered reductions, i.e.

float sum = 0;
for(i=0..N) {
  sum += src[i];
  dst[42] = sum;
}

when building with -O3? I think it might mean updating checkOrderedReductions to look through the store. If it looks too difficult to do as part of this patch we can always follow-up with a patch later.

igor.kirillov marked 4 inline comments as done.

Update commit message and comments
Move storeToSameAddress to ScalarEvolution
Add fadd fast test
Do not apply patch for ordered fadd reductions

igor.kirillov marked an inline comment as done.Nov 7 2021, 5:01 AM

Hi @igor.kirillov, is it also possible to get this working for ordered reductions, i.e.

float sum = 0;
for(i=0..N) {
  sum += src[i];
  dst[42] = sum;
}

when building with -O3? I think it might mean updating checkOrderedReductions to look through the store. If it looks too difficult to do as part of this patch we can always follow-up with a patch later.

Yes, I added a check to LoopVectorizationLegality::canVectorizeFPMath so as not to allow this reduction when math is strict. Enabling it requires some work and it is better to do it separately.

llvm/test/Transforms/LoopVectorize/reduction.ll
471

Added! see reduc_store_fadd_fast function

igor.kirillov edited the summary of this revision. (Show Details)Nov 7 2021, 5:02 AM
igor.kirillov marked an inline comment as done.
david-arm added inline comments.Nov 8 2021, 6:05 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
964

Maybe this can be folded into the all_of case below, i.e.

return (all_of(getReductionVars(), [&](auto &Reduction) -> bool {
  const RecurrenceDescriptor &RdxDesc = Reduction.second;
  return !RdxDesc.hasExactFPMath() ||
         (RdxDesc.isOrdered() && !RdxDesc.IntermediateStore);
}));

Also, the problem with this code at the moment is that you could have a mixture of fast and ordered reductions in the same loop. There could be an intermediate store for one of the fast reductions, but not for the ordered ones. At the moment with your code we will just bail out in this case.

Update intermediate store check for ordered fadd vectorization

igor.kirillov added inline comments.Nov 9 2021, 7:24 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
964

You are right! It can be done much simpler and the check is needed only when ordered reduction is present.

david-arm added inline comments.Nov 10 2021, 5:40 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
902

Hi @igor.kirillov, I think you're missing a test case here. Suppose we have two stores in the loop to the same invariant address. If the first store is predicated, but the second isn't we should still vectorise because the second store wins and the first one can be removed. At least the code here suggests that. I couldn't find any test that exercised this code path.

910

I think you might be missing a test case for this. Can you make sure this code path is exercised please by your existing tests please?

llvm/test/Transforms/LoopVectorize/reduction.ll
477

Can you put all the CHECK lines in the same place near the top of the function please to be consistent with the existing tests?

517

Hi @igor.kirillov, this doesn't look right. The test is called @reduc_store_fadd_fast, but there is not fast keyword on the fadd instruction. I don't think we should even be vectorising this because it requires ordered reductions, which you haven't enabled for this test. Can you change the name of this to @reduc_store_fadd_ordered and investigate why we are vectorising this?

Can you also add a separate test called @reduc_store_fadd_fast that actually has the fast keyword too?

igor.kirillov marked 2 inline comments as done and an inline comment as not done.Nov 18 2021, 3:06 AM
igor.kirillov added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
902

Added reduc_store_final_store_predicated test that address both requests

910

It is now covered by new reduc_store_final_store_predicated test and an old test reduc_store_inside_unrolled also executes this path

llvm/test/Transforms/LoopVectorize/reduction.ll
517

I missed the fast keyword there. As for why this code gets vectorized - it happens because of Hints->allowReordering() returning true in LoopVectorizationLegality::canVectorizeFPMath. As you can see the test specifies vector width (-force-vector-width=4) and llvm allows to process ordered instructions in unordered manner (see also LoopVectorizeHints::allowReordering function) in that case.

I added a new test with fast keyword to AArch64/strict-fadd.ll and the loop is not vectorized there.

david-arm added inline comments.Nov 18 2021, 3:10 AM
llvm/test/Transforms/LoopVectorize/reduction.ll
656

Can you also add a test where the first store is predicated and the second one isn't? According to the code changes in this patch we should vectorise this case because the second one overrides the first.

Add test

llvm/test/Transforms/LoopVectorize/reduction.ll
656

Added! See reduc_store_middle_store_predicated

fhahn added inline comments.Nov 18 2021, 3:44 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
900

The case I was thinking about was something like the snippet below, where we have a load of the invariant address in the loop (%lv = load... in the example below).

define void @reduc_store(i32* %dst, i32* readonly %src, i32* noalias %dst.2) {
entry:
  %arrayidx = getelementptr inbounds i32, i32* %dst, i64 42
  store i32 0, i32* %arrayidx, align 4
  br label %for.body
for.body:
  %0 = phi i32 [ 0, %entry ], [ %add, %for.body ]
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %arrayidx1 = getelementptr inbounds i32, i32* %src, i64 %indvars.iv
  %1 = load i32, i32* %arrayidx1, align 4
  %add = add nsw i32 %0, %1
  %lv = load i32, i32* %arrayidx
  store i32 %add, i32* %arrayidx, align 4
  
  %gep.dst.2  = getelementptr inbounds i32, i32* %dst.2, i64 %indvars.iv
  store i32 %lv, i32* %gep.dst.2,

  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv.next, 1000
  br i1 %exitcond, label %for.cond.cleanup, label %for.body

for.cond.cleanup:
  ret void
}
llvm/test/Transforms/LoopVectorize/reduction.ll
468

Could you add a brief textual explanation of what the test covers?

482

perhaps rename to %sum or something like that, to make it a bit easier to read the test?

487

The names for the 2 different GEPs in the tests are very similar. Could you rename them to make it easier to distinguish them (e.g. something like %gep.src/%gep.dst)..

Hi @igor.kirillov, thanks for all the changes and the patch looks good now! I just had a couple of minor comments.

llvm/lib/Analysis/IVDescriptors.cpp
457

nit: Could you remove an extra level of indentation here before merging the patch?

i.e. this can be written like this:

if (isa<PHINode>(UI))
  PHIs.push_back(UI);
else if (auto *SI = dyn_cast<StoreInst>(UI)) {
  ...
} else
  NonPHIs.push_back(UI);
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
974

nit: I think you can you might be able to simplify the code here by removing FoundMatchingRecurrence . Then, lower down instead of the break you can just do

if (DSI && (DSI == SI)) {
  *IsPredicated = blockNeedsPredication(DSI->getParent());
  return true;
}

and at the bottom of the function just do:

return false;
igor.kirillov marked an inline comment as done.

Refactoring a bit

igor.kirillov marked 8 inline comments as done.Nov 18 2021, 7:22 AM
igor.kirillov added inline comments.
llvm/lib/Analysis/IVDescriptors.cpp
457

I made an extra step and simplified it even more

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
900

In that case we do not vectorize, the rejection happens in LoopAccessInfo::analyzeLoop when loads ape processed:

  for (LoadInst *LD : Loads) {
    Value *Ptr = LD->getPointerOperand();
...
    // See if there is an unsafe dependency between a load to a uniform address and
    // store to the same uniform address.
    if (UniformStores.count(Ptr)) {
      LLVM_DEBUG(dbgs() << "LAA: Found an unsafe dependency between a uniform "
                           "load and uniform store to the same address!\n");
      HasDependenceInvolvingLoopInvariantAddress = true;
    }
...

I added reduc_store_load test anyway

igor.kirillov marked an inline comment as done.

Add scalable reduction test

Little extension in a new test

david-arm accepted this revision.Nov 23 2021, 2:27 AM

LGTM! Thanks for making all the changes @igor.kirillov.

llvm/test/Transforms/LoopVectorize/reduction.ll
466

nit: I think this should be invariant

This revision is now accepted and ready to land.Nov 23 2021, 2:27 AM
igor.kirillov marked an inline comment as done.Nov 23 2021, 2:48 AM
fhahn added inline comments.Nov 23 2021, 2:48 AM
llvm/include/llvm/Analysis/IVDescriptors.h
265

nit: should be a doc-comment?

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
583

nit: could return ArrayRef, not leaking any details about the underlying container.

642

nit: Is there a reason for choosing 5 for the size? if not, nowadays SmallVector can pick a good size automatically.

llvm/include/llvm/Analysis/ScalarEvolution.h
1120

Do you anticipate this to be used outside Transforms/Vectorize? If not I'm not sure if it makes sense to live here, and extending the API interface of ScalarEvolution. Can this instead be defined somewhere in Transforms/Vectorize?

llvm/lib/Analysis/IVDescriptors.cpp
237

needs documentation?

300

I guess it would be good to have at least a test case for loop-interchange to make sure it can deal with the change properly?

495

nit: for consistency with other code avoid using != nullptr. This is more in line with the style use in LLVM in general (and you use ==/!= nullptr in the adjacent code here).

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
900

Yeah but this is only due to some limitations that currently existing in LAA, right? I think we should at least make this clear somewhere, e.g. in a comment.

906

nit: no need to use llvm::

977

nit: redundant ().

978

could we instead just do the check at the call site or is there a benefit of doing it here?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3054

We effectively sink the store outside the loop. In that case, I don't think we should create a recipe and also we should not consider its cost.

mgabka added a subscriber: mgabka.Nov 29 2021, 1:56 AM
igor.kirillov marked 9 inline comments as done.

Lots of updates related to the recent comments

igor.kirillov marked 5 inline comments as done.Nov 29 2021, 11:09 AM
igor.kirillov added inline comments.
llvm/lib/Analysis/IVDescriptors.cpp
300

You are right. We actually should not do loop interchange is an invariant address is present. Otherwise it may introduce incorrect result.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
900

Added comment

igor.kirillov marked 2 inline comments as done.Nov 29 2021, 11:09 AM

Hi, @fhahn! Since there have been several quite serious changes in the patch after your last review, I would be happy to receive you approval before merge (even though status is accepted now).

fhahn added a comment.Dec 10 2021, 1:34 AM

Hi, @fhahn! Since there have been several quite serious changes in the patch after your last review, I would be happy to receive you approval before merge (even though status is accepted now).

sure, I'll try to take another look by end-of-day Monday

fhahn added a comment.Jan 5 2022, 1:53 AM

Thanks for the latest update! I left some more additional comments. It might be good to pre-commit the tests and only include the diff in the patch. Another thing to consider is moving the reduction-store tests to a separate file, as reduction.ll is already quite large.

One thing to note on the overall approach is that it's a bit of a workaround some current limitations in our modeling, but I don't think there's an alternative in the short term and this is clearly a desirable case to support, so that seems fine to me in general. There's work in progress to model the pre-header and exit block in VPlan, which allows us to do the sinking of the store more easily, so we can improve things further then :)

llvm/include/llvm/Analysis/IVDescriptors.h
179

From an interface perspective, I think it would be better to make SE an optional argument and only perform the the store analysis when it is passed. This would also require users to explicitly opt-in, which would at least partially guarding against people ignoring IntermediateStore.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9440

either deleted or go outside the loop sounds a bit unclear. Aren't they moved to the vector exit block and store the final reduction value?

9442

I think it would also be good to include at least some information in the VPlan that the store is handled as part of the reduction. Perhaps VPReductionRecipe should print if the result is stored after the loop? Please add a test case to vplan-printing.ll

llvm/test/Transforms/LoopVectorize/reduction.ll
469

FWIW for such compact IR test cases, the pseudo code doesn't add much value in my personal opinion. Better to strive to make the tests as readable/compact as possible and have a comment explaining what it tests when needed.

546

nit: newline.

628

Here (and for the other tests it would be good to check at least the vector reduction sequence and that the correct value is stored.

fhahn added inline comments.Jan 5 2022, 1:53 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
894

It would probably be good to make clear what is handled here exactly and why we can handle those stores. IIUC this applies only to invariant stores that store reduction results and is safe because runtime checks guarantee that it won't alias with other objects. The store won't get vectorized, but sunk to the exit block during codegen.

llvm/test/Transforms/LoopVectorize/reduction.ll
582

not needed?

585

nit: is this needed? Can just pass %n as i64

672

move exit block to end of function?

711

move exit block to end of function?

753

move exit block to end of function?

803

move exit block to end of function?

Not sure if you saw, there's a somewhat related bugreport: https://github.com/llvm/llvm-project/issues/50286
Not sure if this already supports that pattern though.

igor.kirillov marked 11 inline comments as done.

Update tests, move invariant store tests into separate file, add more comments

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9442

I'm not sure how to display this information properly. This is how output of VReductionRecipe::print looks like now:

REDUCE ir<%red.next> = ir<%red> + fast reduce.fadd (ir<%lv>)

I could add something to the end but it doesn't seem to fit there. Also I have not found any proper V.*Recipe where I could place something like DELETE store .*.

llvm/test/Transforms/LoopVectorize/reduction.ll
469

When I look at pseudo-code I immediately understand what it is about, whereas looking at IR takes at least 30 seconds of cognitive exertions. And it is also easier to see the difference between and purpose of all those quite similar tests. But I delete it, of course, if you insist :)

@fhahn I created the review with tests only here - https://reviews.llvm.org/D117213. Once it is merged I'll update this one.

@lebedev.ri This patch addresses a different problem. Here we try to handle when a reduction value is stored in some address and if this address is invariant plus other lucky conditions are satisfied we manage to vectorize. In your example address where value is stored is not invariant at all. Nevertheless, the case is interesting.

Add invariant store information to VReductionRecipe::print output

reames resigned from this revision.Jan 27 2022, 1:26 PM
igor.kirillov added inline comments.Jan 27 2022, 1:29 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9442

@fhahn I added some info to VReductionRecipe::print and a relevant test. What do you think about it?

Hi @igor.kirillov, the patch looks good now! I just had one question about an odd-looking CHECK line in one of the test files.

llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
341 ↗(On Diff #403776)

nit: I think it's fine (and cleaner) to just do

%[[PHI:%.*]] =
%[[ADDR:%.*]] =

here. Also, can you add the incoming value to the phi just to make sure it's the correct value? I think it should just be

phi i32 {{ [[TMP]], %middle.block }}
369 ↗(On Diff #403776)

This looks a bit odd. I'm not sure why it's needed? If there is a CHECK-LABEL for every function I don't think this should happen.

fhahn added a comment.Feb 3 2022, 8:41 AM

Thanks for the update!

llvm/lib/Analysis/IVDescriptors.cpp
343

Do we have to make sure that the stored value is feeding the phi again and not an earlier value in the cycle?

llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
170 ↗(On Diff #403776)

I think it would be good to check the full reduction cycle & store here. I don't think this test case is handled correctly at the moment as the IR seems out of sync with the pseudo code (which is why personally I think the C pseudo code is a bit distracting)

Note that in the IR the reduction cycle is %sum -> sum.1 -> %sum and not %sum -> %sum.1 -> %sum.2 -> %sum. When this gets vectorized, the final value written to dst[42] is the final value of %sum.1, not %sum.2 as it should be I think.

igor.kirillov marked 2 inline comments as done.

Add fix that prevents vectorization for the case when not a final reduction value stored in a loop invariant address.
Add test for this case.
Remove some unused code and rename a couple of IR variables in the reduction-with-invariant-store.ll test.
Fix incorrect phi node in reduc_store_inside_unrolled and reduc_double_invariant_store tests.

igor.kirillov marked 3 inline comments as done.Feb 4 2022, 6:47 AM
igor.kirillov added inline comments.
llvm/lib/Analysis/IVDescriptors.cpp
343

Yes! I added this check and also a test case (reduc_store_not_final_value)

llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
170 ↗(On Diff #403776)

The IR was incorrect actually because of the error I made during renaming and the pseudo-code was showing my original plot for the test-case. Luckily it helped to expose a bug, so I added an extra check to IVDescriptors.cpp and now reduc_store_not_final_value function is testing this case (when we have IntermediateStore but no ExitInstruction and value stored is not actually a final value).

igor.kirillov marked 2 inline comments as done.Feb 4 2022, 6:47 AM
fhahn added inline comments.Feb 14 2022, 2:16 AM
llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
170 ↗(On Diff #403776)

I think it would still be good to test that the full reduction cycle is generated correctly (probably worth checking the full vector body + middle.block), at least for some of the tests in the file, to make sure the full sequence is generated correctly.

fhahn added inline comments.Feb 14 2022, 2:18 AM
llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
170 ↗(On Diff #403776)

The IR was incorrect actually because of the error I made during renaming and the pseudo-code was showing my original plot for the test-case.

This seems to indicate that the pseudo code may be distracting (:

Add full vector.body and middle.block checks for reduc_store and reduc_store_inside_unrolled tests

igor.kirillov added inline comments.Feb 17 2022, 8:46 AM
llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
170 ↗(On Diff #403776)

@fhahn Added full checks for reduc_store and this function. The code generated is overscalarized for reduc_store_inside_unrolled but I checked and without invariant stores it is the same.

Thanks for the update with the tests! I think there might still be a correctness issue. More details inline.

llvm/include/llvm/Analysis/IVDescriptors.h
184

nit: addresses ?

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
924

I'm not sure if the comment matches the code. There's no guarantee that all stores to the same address come before the store of the reduction AFAICT. E.g. the test below has a store to the same address after the store of the reduction. I think this may get mis-compiled, as the store to 0 gets replaced with the store of the final value of the reduction.

define void @reduc_store(i32* %dst, i32* readonly %src) {
entry:
  %gep.dst = getelementptr inbounds i32, i32* %dst, i64 42
  store i32 1, i32* %gep.dst, align 4
  br label %for.body

for.body:
  %sum = phi i32 [ 0, %entry ], [ %add, %for.body ]
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
  %gep.src = getelementptr inbounds i32, i32* %src, i64 %iv
  %0 = load i32, i32* %gep.src, align 4
  %add = add nsw i32 %sum, %0
  store i32 %add, i32* %gep.dst, align 4
  store i32 0, i32* %gep.dst, align 4
  %iv.next = add nuw nsw i64 %iv, 1
  %exitcond = icmp eq i64 %iv.next, 1000
  br i1 %exitcond, label %exit, label %for.body

exit:
  ret void
}
976

nit: can drop DSI, as SI is guaranteed to be non-null?

Fix incorrectly processed case when reduction value stored in invariant could be overwritten inside loop

Herald added a project: Restricted Project. · View Herald TranscriptMar 15 2022, 1:28 PM
igor.kirillov marked 4 inline comments as done.Mar 15 2022, 1:33 PM
igor.kirillov added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
924

Yeah, it was processed incorrectly. I added this test along with the fix.

fhahn added a comment.Apr 4 2022, 2:23 PM

Thanks for the update! I'll do a bit more testing and will take another look soon, but I *think* it should be good now.

@fhahn, @david-arm ping. Also I run lnt test-suite and everything is fine and 17 more loops were vectorized according to -stats

david-arm accepted this revision.Apr 25 2022, 7:01 AM

LGTM! Thanks for addressing all the comments and fixing bugs, adding new tests, etc. I have a couple of nits you can address before merging. I think given that you've run the LLVM test suite without seeing failures and we know more loops are vectorising then that gives us a good level of confidence. :)

llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
256 ↗(On Diff #415557)

nit: Should be i++, not i+=2.

303 ↗(On Diff #415557)

nit: Looks like an unnecessary change from %sum.1 -> %sum.2.

fhahn accepted this revision.Apr 25 2022, 10:31 AM

LGTM, thanks! I added a few additional comments that can be addressed directly before committing.

FYI work has started to model the exit block in VPlan as well: D123457, D123537. fixReduction and the store sinking can and should now be migrated to be modeled explicitly in VPlan.

llvm/include/llvm/Analysis/IVDescriptors.h
264

nit: would be good to also document what this means, what the properties of such stores are.

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
310

nit: could be more specific, the increment of the reduction is used as stored operand, right?

Also, they only handle reductions right? Then using Reduction instead of Recurrence seems more descriptive, .e.g. isInvariantStoreOfReduction? (same for isRecurringInvariantAddress

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
431

With opaque pointers, this code may behave differently.

It's possible to have

store i32 0, ptr %x
store i8 0, ptr %x

In that case the pointer operands will be the same, but different store widths. I *think* we should also check that the types of the stored values match for now, as we use this to remove earlier stores. This should only be correct if the later store writes at least as many bits as the earlier stores.

973

nit: could be any_of

983

nit: could be any_of.

llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
412 ↗(On Diff #415557)

Would it be possible to add runtime tests for the mis-compiles fixed to https://github.com/llvm/llvm-test-suite/tree/main/SingleSource/UnitTests/Vectorizer?

igor.kirillov marked 7 inline comments as done.

Lots of updates related to the recent comments

igor.kirillov marked an inline comment as done.Apr 27 2022, 6:34 AM
igor.kirillov added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
431

I think if this check is added here, then the purpose of the function would be different (address is still the same even if values have different size). So, instead of that I added a check to a place where those pointers are really processed - see LoopVectorizationLegality.cpp:975. There we now make sure that values stored in an invariant address are of the same type.

llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
303 ↗(On Diff #415557)

sum.1 is sum + src[i] and sum.2 is sum + src[i] + src[i+1]. Looks like everything is fine for me.

igor.kirillov marked an inline comment as done.Apr 27 2022, 6:34 AM
igor.kirillov marked an inline comment as done.Apr 28 2022, 5:54 AM
igor.kirillov added inline comments.
llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll
412 ↗(On Diff #415557)

Added a simple test here - https://reviews.llvm.org/D124609
I don't feel like any other test adds more robustness, but if you have an idea what could be also added I'll gladly implement it!

igor.kirillov marked an inline comment as done.Apr 28 2022, 5:54 AM
This revision was landed with ongoing or failed builds.May 3 2022, 2:13 AM
This revision was automatically updated to reflect the committed changes.

Hello,

I wrote
https://github.com/llvm/llvm-project/issues/57572
about a verifier complaint/crash that started happening with this patch.