Page MenuHomePhabricator

[SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!")
Needs ReviewPublic

Authored by ABataev on Feb 7 2017, 7:41 AM.

Details

Summary

Initially SLP vectorizer replaced all going-to-be-vectorized
instructions with Undef values. It may break ScalarEvaluation and may
cause a crash.
Reworked SLP vectorizer so that it does not replace vectorized
instructions by UndefValue anymore. Instead vectorized instructions are
marked for deletion inside if BoUpSLP class and deleted upon class
destruction.

Diff Detail

Event Timeline

ABataev created this revision.Feb 7 2017, 7:41 AM
mkuper edited edge metadata.Feb 7 2017, 1:24 PM

Test case?

Also, this looks a bit weird. Could you explain what exactly is going wrong?
I'm asking because this somehow seems like it's the wrong granularity for this - after any change, you invalidate the disposition of the innermost loop that contains the basic block in which the root/seed lives. I'm not sure why this is the right thing to do.

majnemer added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
5221–5229

In general, we don't use this sort of overloading in LLVM...

5229

Probably should mark this as explicit.

sanjoy added a comment.Feb 8 2017, 5:29 PM

Typically SCEV dispositions need to be changed if a loop variant value is made loop invariant (i.e. a LICM like transform, perhaps via makeLoopInvariant or something like that). As @mkuper said on the email thread, it is difficult to diagnose what's going on unless we have a reproducible test case, so I'd focus on that as a first priority.

ABataev updated this revision to Diff 87795.Feb 9 2017, 5:24 AM

Address comments + test

Added the test that crashes for me without the changes in this patch

So, do we know what exactly is going wrong here? Which disposition we're changing and why?

Also, any chance to (bugpoint-?)reduce the test case, or is this as small as it gets?

Also, any chance to (bugpoint-?)reduce the test case, or is this as small as it gets?

That's the smallest test case that crashes for me, could not reduce it more.

ABataev added a comment.EditedFeb 14 2017, 5:44 AM

So, do we know what exactly is going wrong here? Which disposition we're changing and why?

We're not changing the disposition but seems to me some of the SCEV nodes are returned to cache but without clearing of their dispositions. After that, during some SCEV internal optimizations, SCEV tries to simplify some of SCEV nodes, which are LoopInvariant. During this reassociation, it creates a new SCEV node, which is not actually created but is taken from the cache of previously used SCEV nodes. But this new SCEV node for some reasons has the disposition LoopVariant (it was used before, then freed, but not tracked that the disposition for this node should be cleared). And we have a situation where from 2 LoopInvariant nodes we're getting 1 LoopVariant node, while the code expects only LoopInvariants, of course.
Seems to me, we're breaking SCEV during vectorization of select instructions, will try to check it somehow

ABataev added a comment.EditedFeb 14 2017, 9:39 AM

Here is what happens with this test:
Initially we have 2 SCEVs (-1 * %cond) and (-1 *%cond14), they are combined into (-1 * (%cond + %cond14))<nsw>. All these SCEVs have LoopVariant dispositions. After vectorization scalars %cond and %cond14 are replaced by undef. SCEVs for these undefs have the same addresses, just like for original instructions (they are reused), but their dispositions are cleared and recalculated to LoopInvariant. But after combining of these SCEVs we get SCEV, previously used as (-1 * (%cond + %cond14))<nsw> (because FoldingSetNodeID, used as key, is rebuilt with the same args - identifier (scMulExpr and 2 addresses of the SCEVs, that were LoopVariant, but now are LoopInvariant)). But we did not clear the loop disposition for this SCEV, because ScalarEvolution is unable to do this. So, for two SCEVs with LoopInvariant dispositions, we're getting the resulting SCEV (-1 * (undef + undef))<nsw>, which has disposition LoopVariant. That's why I believe we should clear loop dispositions after each successful attempt of vectorization.

I believe this is exactly the situation @sanjoy described: LoopVariants are transformed to LoopInvariants.

You can't replace the operands of an instruction in a way that it computes a new value while still preserving SCEV, since SCEV keys off of the Instruction *. Loop disposition is only one of the things that this breaks. e.g. if you have 1 + %KnownPositive and you replace the %KnownPositive operand with %MayBeNegative ("in place"), that breaks the getRange cache.

Can the loop vectorizer do without replacing %cond and %cond14 to undef?

ABataev updated this revision to Diff 88731.Feb 16 2017, 7:27 AM

Reworked SLP vectorizer to not replace instructions to be deleted with UndefValue.

ABataev edited the summary of this revision. (Show Details)Feb 16 2017, 7:29 AM
hans added a subscriber: hans.Feb 22 2017, 2:07 PM

Ping, everyone :-)

This more or less makes sense to me, although I'm not sure it's the right solution.
One other option would be to actually delete things on the fly, and solve the AliasCache option in a different way. (Why does the SLP vectorizer even have its own alias cache?)
Yet another, probably more realistic, option would be to keep the WeakVHs to solve the reallocation issue, but, actually "delete" everything we need to delete, instead of leaving hanging undefs.

Regardless, I believe Hans wants to merge this into 4.0 to fix PR31847. This doesn't look at all safe, and I don't feel confident enough in this to endorse it for the branch, without baking in-tree for a while. And another pair of eyes would be great.

Hal?

lib/Transforms/Vectorize/SLPVectorizer.cpp
1516–1517

Is this change (and the others turning instructions into insertelementinst) related?
If not, can they go into a separate patch?

4319

This comment no longer makes sense.
Does the whole check?

6406

Is there any interaction with the "extra values" stuff?

6984

Again, this is an unrelated change, right?

ABataev updated this revision to Diff 90319.Mar 2 2017, 5:07 AM
ABataev marked an inline comment as done.
ABataev edited the summary of this revision. (Show Details)

Address Michael's comments

ABataev added inline comments.Mar 6 2017, 11:55 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4319

I'll fix the comment, but the check itself is important.

6406

I reworked this part. Extra args are not removed, but the users of extra args are stored as ReductionOps and will be automatically marked for deletion

anemet added a subscriber: anemet.Apr 16 2017, 11:56 AM
Gerolf added a subscriber: Gerolf.Apr 16 2017, 2:37 PM

Hi,

I would like to understand this better. Could somebody explain what assumptions SCEV makes about its clients? Which assumption(s) is broken by SLP? It seems to me that this issue potentially touches fundamental design decisions/questions and I don't see any verifiers in place.

Thanks
Gerolf

test/Transforms/SLPVectorizer/X86/crash-SCEV.ll
1 ↗(On Diff #90319)

What is this supposed to test? I'm wondering if there can be a smaller test for what this one is supposed to do. As is at the least it looks hard to maintain.

Hi,

I would like to understand this better. Could somebody explain what assumptions SCEV makes about its clients? Which assumption(s) is broken by SLP? It seems to me that this issue potentially touches fundamental design decisions/questions and I don't see any verifiers in place.

Thanks
Gerolf

Hi Gerolf, sorry for the delay with the answer.
I suppose Sanjoy answered your question already:

You can't replace the operands of an instruction in a way that it computes a new value while still preserving SCEV, since SCEV keys off of the Instruction *. Loop disposition is only one of the things that this breaks. e.g. if you have 1 + %KnownPositive and you replace the %KnownPositive operand with %MayBeNegative ("in place"), that breaks the getRange cache.

Can the loop vectorizer do without replacing %cond and %cond14 to undef?

So, I think you can't replace instructions with one disposition by the instruction with another disposition. You can just remove it.

test/Transforms/SLPVectorizer/X86/crash-SCEV.ll
1 ↗(On Diff #90319)

This is a reproducer, that currently crashes the compiler for sure. I tried to make it as small as possible, but could not reduce it anymore.

hans added a comment.Aug 3 2017, 9:04 AM

What's the status here?

In D29641#830653, @hans wrote:

What's the status here?

It is still under review

RKSimon edited edge metadata.Sep 10 2017, 3:24 AM

What state is this in? PR31847 was cleared as a 5.0.0 blocker, so it probably won't need to be merged for 5.0.1 either

test/Transforms/SLPVectorizer/X86/crash-SCEV.ll
1 ↗(On Diff #90319)

Possibly rename the test file pr31847.ll? Test called 'crash' etc. aren't that useful imo.

ABataev updated this revision to Diff 114895.Sep 12 2017, 1:34 PM

Update after review

RKSimon added inline comments.Oct 29 2017, 5:08 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
2076

What do we gain from using std::for_each instead of basic range for loops here and in ~BoUpSLP()?

5757

Would it be better to move the return into each min/max case, drop the RK_None case and just have the llvm_unreachable at the end of the function?

Similar question for initReductionOps / addReductionOps below

5981–5984

Remove the break.

5983

Don't have a break after return - some compilers will shout at you.

ABataev marked 2 inline comments as done.Oct 31 2017, 9:18 AM
ABataev added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
2076

Nothing special, chnged it.

5757

Ok, reworked.

ABataev updated this revision to Diff 120996.Oct 31 2017, 9:25 AM

Update after review

Herald added a project: Restricted Project. · View Herald TranscriptWed, Sep 4, 1:10 PM
RKSimon added inline comments.Thu, Sep 5, 8:29 AM
include/llvm/Transforms/Vectorize/SLPVectorizer.h
27

Do we need this include any more? Or can it be moved to SLPVectorizer.cpp?

lib/Transforms/Vectorize/SLPVectorizer.cpp
5875

Why is this change necessary?

5895

Why is this change necessary?