If the ExtractElement instructions can be optimized out during the
vectorization and we need to reshuffle the parent vector, this
ShuffleInstruction may be inserted in the wrong place causing compiler
to produce incorrect code.
Details
Diff Detail
- Repository
- rL LLVM
- Build Status
Buildable 20802 Build 20802: arc lint + arc unit
Event Timeline
Someone more familiar with SLP should take a look, but this seems ok to me.
The testcase can be simplified to a single BB:
define void @f1(<2 x i16> %x, i16* %a) { %t2 = extractelement <2 x i16> %x, i32 0 %t3 = extractelement <2 x i16> %x, i32 1 %ptr0 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0 %ptr1 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1 %ptr2 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2 %ptr3 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3 store i16 %t2, i16* %a store i16 %t2, i16* %ptr0 store i16 %t3, i16* %ptr1 store i16 %t3, i16* %ptr2 store i16 %t2, i16* %ptr3 ret void }
llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
3116 ↗ | (On Diff #158246) | I don't think std::next(I->getIterator()) is a valid insertion point in all cases. V could be a phi, or an invoke, or some exception-handling instruction. |
llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
3119 ↗ | (On Diff #158246) | Also, you can't just stick the instruction at the beginning of the function in this case; it's illegal to hoist a reference to a canTrap() constant. |
llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
3116 ↗ | (On Diff #158246) | What if something like this: auto It = std::next(I->getIterator()); for (auto E = I->getParent()->end(); It != E; ++It) { if (!isa<PHINode>(&*It)) { if (It->isEHPad()) ++It; break; } } Builder.SetInsertPoint(I->getParent(), It); ? |
llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
3116 ↗ | (On Diff #158246) | That doesn't really help... among other things, there are some blocks which have no legal insertion points. Maybe there's some constraint on SLP vectorization which would help choose a better insertion point? Why can't we just use VL0 as the insertion point? |
llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
3116 ↗ | (On Diff #158246) | Yes, maybe that's the best option here. Will fix this. |