Page MenuHomePhabricator

[LoopVectorize] Support reductions that store intermediary result
Needs ReviewPublic

Authored by igor.kirillov on Wed, Sep 22, 5:51 AM.

Details

Summary

Adds ability to vectorize loops containing a store to a loop-invariant
address as part of a reduction that isn't converted to SSA form due to
lack of aliasing info

Diff Detail

Event Timeline

igor.kirillov created this revision.Wed, Sep 22, 5:51 AM
igor.kirillov requested review of this revision.Wed, Sep 22, 5:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptWed, Sep 22, 5:51 AM
nikic added a subscriber: nikic.Wed, Sep 22, 6:11 AM
nikic added inline comments.
llvm/include/llvm/Analysis/IVDescriptors.h
21

Drive by note: This new include does not look necessary.

igor.kirillov added inline comments.Wed, Sep 22, 6:30 AM
llvm/include/llvm/Analysis/IVDescriptors.h
21

Unfortunately, llvm doesn't compile without this include

I'm in the process of increasing my knowledge of this kind of thing, so not an authoritative review from me. Have passed it through my eyeballs, hopefully can add some comments for useful general improvements.

llvm/include/llvm/Analysis/IVDescriptors.h
21

It looks like the needed include is #include "llvm/IR/Instructions.h" for StoreInst, or better forward declare class StoreInst; along with the others nearby.

https://include-what-you-use.org
https://github.com/include-what-you-use/include-what-you-use/blob/master/docs/WhyIWYU.md

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
310

nit. 'recurrent'

llvm/lib/Analysis/IVDescriptors.cpp
305

Does this comment need updating to discuss your intermediate stores?

457

Suggestion: Does this warrant a comment? (I only observe that there are lots of comments around here and I spent a moment trying to guess at what this did without arriving at an answer).

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1992

Does the word 'variant' add anything? I couldn't find any other uses nearby which elucidate what you're trying to say here. It feels confusing because you are talking about 'InvariantStores' which means invariant in the address, so it looks like a typo. I feel this comment could be better: 'Record stores instructions to loop-invariant addresses', in contrast to the comment on UniformStores above? Do you need to say much about the value?

llvm/test/Transforms/LoopVectorize/reduction.ll
539

Suggestion for a test: what about testing two different destination addresses modified in the loop body?

590

CHECK-?

Update comment

igor.kirillov marked 7 inline comments as done.Sun, Oct 10, 6:42 AM
igor.kirillov added inline comments.
llvm/include/llvm/Analysis/IVDescriptors.h
21

Fixed

llvm/lib/Analysis/IVDescriptors.cpp
305

Yes, indeed

457

I hope this one clarifies. And, actually we should exit if reduction variable is used as an address (this should be highly unlikely case but nevertheless)

llvm/test/Transforms/LoopVectorize/reduction.ll
539

Actually this is not supported so far, only one invariant address is possible. (See llvm/lib/Analysis/IVDescriptors.cpp:318). Do you think we need test that checks that vectorisation is not happening?

igor.kirillov marked 3 inline comments as done.Sun, Oct 10, 6:42 AM

Tagging some more potential reviewers.

llvm/test/Transforms/LoopVectorize/reduction.ll
539

Given that it is not supported, such a test would demonstrate that your bailout logic for this case is working as intended.

fhahn added a comment.Mon, Oct 11, 2:37 AM

to SSA form due to lack of aliasing info

It would be good to mention *why* those particular stores can be sunk during loop vectorization. IIUC the patch relies on the runtime checks generated to ensure the store does not alias other accesses in the loop.

It seems like the main problem is that we potentially bail out too early at the moment when checking for reductions due to the store, but once we generate runtime checks, sinking the store may become legal (see inline comment about loads to the same address)? If that's the case, ideally we'd just sink any such loads/stores before detecting reductions once we know they can be sunk due to runtime checks, but unfortunately I do not think that's possible with the current structure/ordering.

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
642

Are the stores invariant or to a uniform address? InvariantStores implies they are invariant, which may not be the case?

llvm/include/llvm/Transforms/Utils/LoopUtils.h
402

Is LoopUtils.h the right place for this? Is there anything loop related?

llvm/lib/Analysis/IVDescriptors.cpp
301

What about existing users of isReductionPHI which currently may rely on the fact that all instruction in the loop must either be phis or reduction operations? Also, with respect to the store restriction, the important bit is that the final value is also stored, right?

310

You are only checking for loop-invariant addresses, so should this be loop invariant memory location?

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
900

What about loads to the same address in the loop? At the moment, LAA cannot analyze dependences with invariant addresses. But if this limitation gets removed the code here may become incorrect, because it relies on this limitation IIUC?

david-arm added inline comments.Mon, Oct 11, 3:51 AM
llvm/include/llvm/Analysis/LoopAccessAnalysis.h
642

I assume here that InvariantStores refers to stores to an invariant (i.e. uniform?) address, rather than storing an invariant value to variant address. If so, perhaps it could be named StoresToInvariantAddress or something like that?

david-arm added inline comments.Mon, Oct 11, 6:53 AM
llvm/test/Transforms/LoopVectorize/reduction.ll
471

Is it possible to add a simple floating point test with "fadd fast"?

It seems like the main problem is that we potentially bail out too early at the moment when checking for reductions due to the store, but once we generate runtime checks, sinking the store may become legal (see inline comment about loads to the same address)? If that's the case, ideally we'd just sink any such loads/stores before detecting reductions once we know they can be sunk due to runtime checks, but unfortunately I do not think that's possible with the current structure/ordering.

Is it worth for me to explore if it possible to do other way around or we should work on with the solution from this merge?

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
642

@fhah Just to make sure we are on the same line - for me uniform and invariant are just synonyms, isn't it so?

llvm/include/llvm/Transforms/Utils/LoopUtils.h
402

What do you think if I make this function a public member of class ScalarEvolution? Or is there a better place for it?

llvm/lib/Analysis/IVDescriptors.cpp
301

I checked and only LoopInterchangeLegality is using this function and it is not affected by stores. Anyway, I can add a parameter to RecurrenceDescriptor::isReductionPHI or a member to RecurrenceDescriptor allowing or not to handle stores. What do you think about it?

The comment is to be updated, yes.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
900

Loads from or stores to the same address in the loop? I'm sorry could you clarify what the problem is. As it is I don't understand the message.

Hi @igor.kirillov, is it also possible to get this working for ordered reductions, i.e.

float sum = 0;
for(i=0..N) {
  sum += src[i];
  dst[42] = sum;
}

when building with -O3? I think it might mean updating checkOrderedReductions to look through the store. If it looks too difficult to do as part of this patch we can always follow-up with a patch later.