The cost model for the InsertSubvector shuffle kind is incomplete and
in many cases ends up as just the extract + insert costs. Added
a workaround that represents it as a generic PermuteSingleSrc, which is
still pessimistic but better than InsertSubvector.
Repository: rG LLVM Github Monorepo
Event Timeline
LGTM - I'm wondering if SK_InsertSubvector is actually usable in most cases as a shuffle kind or whether we're better off replacing it with something like SK_WidenSubvector + SK_ConcatSubvectors patterns?
I'm not sure I understand correctly, but I think SK_InsertSubvector is the better fit here; it just needs a proper cost.
My main annoyance with SK_InsertSubvector is that it doesn't match an actual pattern we can create with a single shufflevector instruction.
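To illustrate that point: shufflevector takes two operands of the same vector type, so inserting a narrower subvector takes two shuffles, one to widen the subvector and one to blend it into the base vector. Below is a minimal sketch of the two masks. The helper names (widenMask, insertMask, applyShuffle) are hypothetical and not LLVM APIs, and -1 stands for an undefined lane.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: mask that widens a subLen-element vector to vecLen
// lanes. Lanes past the subvector are undefined (-1).
std::vector<int> widenMask(int subLen, int vecLen) {
    std::vector<int> mask(vecLen, -1);
    for (int i = 0; i < subLen; ++i)
        mask[i] = i;
    return mask;
}

// Hypothetical helper: two-source mask that keeps the base vector's lanes
// except for [index, index + subLen), which come from the widened subvector.
// Indices 0..vecLen-1 select the first operand, vecLen..2*vecLen-1 the second.
std::vector<int> insertMask(int vecLen, int subLen, int index) {
    assert(index >= 0 && index + subLen <= vecLen);
    std::vector<int> mask(vecLen);
    for (int i = 0; i < vecLen; ++i) {
        bool inSub = i >= index && i < index + subLen;
        mask[i] = inSub ? vecLen + (i - index) : i;
    }
    return mask;
}

// Applies a shufflevector-style mask to two equally sized operands.
// Undefined lanes are filled with undefValue for illustration only.
std::vector<int> applyShuffle(const std::vector<int>& a,
                              const std::vector<int>& b,
                              const std::vector<int>& mask,
                              int undefValue) {
    assert(a.size() == b.size());
    int n = static_cast<int>(a.size());
    std::vector<int> out;
    out.reserve(mask.size());
    for (int m : mask) {
        assert(m >= -1 && m < 2 * n);
        if (m < 0)
            out.push_back(undefValue);
        else if (m < n)
            out.push_back(a[m]);
        else
            out.push_back(b[m - n]);
    }
    return out;
}
```

For an 8-lane base vector and a 2-lane subvector inserted at index 4, the widening mask is 0, 1, -1, -1, -1, -1, -1, -1 and the blend mask is 0, 1, 2, 3, 8, 9, 6, 7.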
Thanks for the reproducer. I don't think this patch causes it directly (it just adjusts the cost model, nothing else); most probably it reveals a deeper issue in the compiler. I will check what's happening.
Investigated it. With the attached reproducer, llc actually hangs in X86 DAG->DAG Instruction Selection on the function _ZN12SpinningCube6UpdateEv, even without the SLP vectorizer. It hangs with 12.0.0 and trunk. 11.0.1 does not recognize poison but passes if I replace poison with undef; 12.0.0 and trunk still hang even in that case. The command to reproduce is llc reduced.ll -o /dev/null. It looks like the bug was introduced somewhere between LLVM 11 and 12. The hang is in the loop at lines 1523-1598 of lib/CodeGen/SelectionDAG/DAGCombiner.cpp.
Looks like an endless X86 shuffle combine loop:
Legalizing: t89: v4f32 = X86ISD::SHUFP t87, t80, TargetConstant:i8<36>
Legal node: nothing to do
Combining: t89: v4f32 = X86ISD::SHUFP t87, t80, TargetConstant:i8<36>
Creating fp constant: t87: v4f32 = BUILD_VECTOR ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>
Creating fp constant: t87: v4f32 = BUILD_VECTOR ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>
Creating new node: t1580: v4f32 = undef
Creating fp constant: t87: v4f32 = BUILD_VECTOR ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>
Creating new node: t1581: v4f32 = X86ISD::UNPCKL t68, undef:v4f32
Replacing.2 t1578: v4f32 = X86ISD::SHUFP t87, t68, TargetConstant:i8<-44>
With: t1581: v4f32 = X86ISD::UNPCKL t68, undef:v4f32
Legalizing: t1581: v4f32 = X86ISD::UNPCKL t68, undef:v4f32
Legal node: nothing to do
Combining: t1581: v4f32 = X86ISD::UNPCKL t68, undef:v4f32
Creating fp constant: t87: v4f32 = BUILD_VECTOR ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>
Creating constant: t1582: i8 = TargetConstant<-44>
Creating new node: t1583: v4f32 = X86ISD::SHUFP t87, t68, TargetConstant:i8<-44>
...
into: t1583: v4f32 = X86ISD::SHUFP t87, t68, TargetConstant:i8<-44>
Legalizing: t1583: v4f32 = X86ISD::SHUFP t87, t68, TargetConstant:i8<-44>
Legal node: nothing to do
Combining: t1583: v4f32 = X86ISD::SHUFP t87, t68, TargetConstant:i8<-44>
Creating new node: t1584: v4f32 = undef
Creating fp constant: t87: v4f32 = BUILD_VECTOR ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>
Legalizing: t1582: i8 = TargetConstant<-44>
Combining: t1582: i8 = TargetConstant<-44>
Legalizing: t89: v4f32 = X86ISD::SHUFP t87, t80, TargetConstant:i8<36>
Legal node: nothing to do
Combining: t89: v4f32 = X86ISD::SHUFP t87, t80, TargetConstant:i8<36>
<here we go again>
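The shape of the problem in the log above is two rewrites that undo each other: a SHUFP is replaced by an UNPCKL, which is then combined back into a SHUFP. The toy model below shows that ping-pong and one way to detect it. It is a sketch only: combineOnce and runCombiner are hypothetical names, nodes are modelled as plain strings, and none of this reflects how DAGCombiner is actually implemented.

```cpp
#include <set>
#include <string>

// Hypothetical rewrite step: two rules that invert each other, mirroring
// the SHUFP -> UNPCKL -> SHUFP sequence seen in the log.
std::string combineOnce(const std::string& node) {
    if (node == "SHUFP")
        return "UNPCKL";
    if (node == "UNPCKL")
        return "SHUFP";
    return node; // no rule applies: fixed point
}

struct CombineResult {
    bool converged; // true if a fixed point was reached
    int steps;      // number of rewrites performed before stopping
};

// Repeatedly applies combineOnce, stopping at a fixed point, when a
// previously seen node reappears (a cycle), or after maxSteps rewrites.
CombineResult runCombiner(std::string node, int maxSteps) {
    std::set<std::string> seen;
    for (int step = 0; step < maxSteps; ++step) {
        if (!seen.insert(node).second)
            return {false, step}; // node revisited: the rules are cycling
        std::string next = combineOnce(node);
        if (next == node)
            return {true, step};
        node = next;
    }
    return {false, maxSteps};
}
```

Starting from "SHUFP", the model reports a cycle after two rewrites, whereas a node that no rule matches converges immediately.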