Initially SLP vectorizer replaced all going-to-be-vectorized
instructions with Undef values. It may break ScalarEvaluation and may
cause a crash.
Reworked SLP vectorizer so that it does not replace vectorized
instructions by UndefValue anymore. Instead vectorized instructions are
marked for deletion inside if BoUpSLP class and deleted upon class
destruction.
Details
- Reviewers
- mzolotukhin - mkuper - hfinkel - RKSimon - davide - spatel 
- Commits
- rG8b1eeafb9133: [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) &&…
 rL373166: [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) &&…
 rG6a278d9073bd: [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) &&…
 rL372626: [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) &&…
Diff Detail
- Build Status
- Buildable 11658 - Build 11658: arc lint + arc unit 
Event Timeline
Test case?
Also, this looks a bit weird. Could you explain what exactly is going wrong?
I'm asking because this somehow seems like it's the wrong granularity for this - after any change, you invalidate the disposition of the innermost loop that contains the basic block in which the root/seed lives. I'm not sure why this is the right thing to do.
Typically SCEV dispositions need to be changed if a loop variant value is made loop invariant (i.e. a LICM like transform, perhaps via makeLoopInvariant or something like that). As @mkuper said on the email thread, it is difficult to diagnose what's going on unless we have a reproducible test case, so I'd focus on that as a first priority.
So, do we know what exactly is going wrong here? Which disposition we're changing and why?
Also, any chance to (bugpoint-?)reduce the test case, or is this as small as it gets?
We're not changing the disposition but seems to me some of the SCEV nodes are returned to cache but without clearing of their dispositions. After that, during some SCEV internal optimizations, SCEV tries to simplify some of SCEV nodes, which are LoopInvariant. During this reassociation, it creates a new SCEV node, which is not actually created but is taken from the cache of previously used SCEV nodes. But this new SCEV node for some reasons has the disposition LoopVariant (it was used before, then freed, but not tracked that the disposition for this node should be cleared). And we have a situation where from 2 LoopInvariant nodes we're getting 1 LoopVariant node, while the code expects only LoopInvariants, of course.
Seems to me, we're breaking SCEV during vectorization of select instructions, will try to check it somehow
Here is what happens with this test:
Initially we have 2 SCEVs (-1 * %cond) and (-1 *%cond14), they are combined into (-1 * (%cond + %cond14))<nsw>. All these SCEVs have LoopVariant dispositions. After vectorization scalars %cond and %cond14 are replaced by undef. SCEVs for these undefs have the same addresses, just like for original instructions (they are reused), but their dispositions are cleared and recalculated to LoopInvariant. But after combining of these SCEVs we get SCEV, previously used as (-1 * (%cond + %cond14))<nsw> (because FoldingSetNodeID, used as key, is rebuilt with the same args - identifier (scMulExpr and 2 addresses of the SCEVs, that were LoopVariant, but now are LoopInvariant)). But we did not clear the loop disposition for this SCEV, because ScalarEvolution is unable to do this. So, for two SCEVs with LoopInvariant dispositions, we're getting the resulting SCEV (-1 * (undef + undef))<nsw>, which has disposition LoopVariant. That's why I believe we should clear loop dispositions after each successful attempt of vectorization.
I believe this is exactly the situation @sanjoy described: LoopVariants are transformed to LoopInvariants.
You can't replace the operands of an instruction in a way that it computes a new value while still preserving SCEV, since SCEV keys off of the Instruction *. Loop disposition is only one of the things that this breaks. e.g. if you have 1 + %KnownPositive and you replace the %KnownPositive operand with %MayBeNegative ("in place"), that breaks the getRange cache.
Can the loop vectorizer do without replacing %cond and %cond14 to undef?
This more or less makes sense to me, although I'm not sure it's the right solution.
One other option would be to actually delete things on the fly, and solve the AliasCache option in a different way. (Why does the SLP vectorizer even have its own alias cache?)
Yet another, probably more realistic, option would be to keep the WeakVHs to solve the reallocation issue, but, actually "delete" everything we need to delete, instead of leaving hanging undefs.
Regardless, I believe Hans wants to merge this into 4.0 to fix PR31847. This doesn't look at all safe, and I don't feel confident enough in this to endorse it for the branch, without baking in-tree for a while. And another pair of eyes would be great.
Hal?
| lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
|---|---|---|
| 851–852 | Is this change (and the others turning instructions into insertelementinst) related? | |
| 3319 | This comment no longer makes sense. | |
| 5291 | Is there any interaction with the "extra values" stuff? | |
| 5863 | Again, this is an unrelated change, right? | |
Hi,
I would like to understand this better. Could somebody explain what assumptions SCEV makes about its clients? Which assumption(s) is broken by SLP? It seems to me that this issue potentially touches fundamental design decisions/questions and I don't see any verifiers in place.
Thanks
Gerolf
| test/Transforms/SLPVectorizer/X86/crash-SCEV.ll | ||
|---|---|---|
| 1 ↗ | (On Diff #90319) | What is this supposed to test? I'm wondering if there can be a smaller test for what this one is supposed to do. As is at the least it looks hard to maintain. | 
Hi Gerolf, sorry for the delay with the answer.
I suppose Sanjoy answered your question already:
You can't replace the operands of an instruction in a way that it computes a new value while still preserving SCEV, since SCEV keys off of the Instruction *. Loop disposition is only one of the things that this breaks. e.g. if you have 1 + %KnownPositive and you replace the %KnownPositive operand with %MayBeNegative ("in place"), that breaks the getRange cache.
Can the loop vectorizer do without replacing %cond and %cond14 to undef?
So, I think you can't replace instructions with one disposition by the instruction with another disposition. You can just remove it.
| test/Transforms/SLPVectorizer/X86/crash-SCEV.ll | ||
|---|---|---|
| 1 ↗ | (On Diff #90319) | This is a reproducer, that currently crashes the compiler for sure. I tried to make it as small as possible, but could not reduce it anymore. | 
What state is this in? PR31847 was cleared as a 5.0.0 blocker, so it probably won't need to be merged for 5.0.1 either
| test/Transforms/SLPVectorizer/X86/crash-SCEV.ll | ||
|---|---|---|
| 1 ↗ | (On Diff #90319) | Possibly rename the test file pr31847.ll? Test called 'crash' etc. aren't that useful imo. | 
| lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
|---|---|---|
| 1338 | What do we gain from using std::for_each instead of basic range for loops here and in ~BoUpSLP()? | |
| 4740 | Would it be better to move the return into each min/max case, drop the RK_None case and just have the llvm_unreachable at the end of the function? Similar question for initReductionOps / addReductionOps below | |
| 4966 | Don't have a break after return - some compilers will shout at you. | |
| 4969–4993 | Remove the break. | |
| lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
|---|---|---|
| 4878 | Restored original code | |
Do we need this include any more? Or can it be moved to SLPVectorizer.cpp?