This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Workaround for InsertSubVector cost.
ClosedPublic

Authored by ABataev on Jul 12 2021, 11:06 AM.

Details

Summary

The cost of the InsertSubvector shuffle kind cost is not complete and
may end up with just extracts + inserts costs in many cases. Added
a workaround to represent it as a generic PermuteSingleSrc, which is
still pessimistic but better than InsertSubvector.

Diff Detail

Event Timeline

ABataev created this revision.Jul 12 2021, 11:06 AM
ABataev requested review of this revision.Jul 12 2021, 11:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2021, 11:06 AM
RKSimon accepted this revision.Jul 13 2021, 2:08 PM

LGTM - I'm wondering if SK_InsertSubvector is actually usable in most cases as a shuffle kind or whether we're better off replacing it with something like SK_WidenSubvector + SK_ConcatSubvectors patterns?

This revision is now accepted and ready to land.Jul 13 2021, 2:08 PM

LGTM - I'm wondering if SK_InsertSubvector is actually usable in most cases as a shuffle kind or whether we're better off replacing it with something like SK_WidenSubvector + SK_ConcatSubvectors patterns?

Not sure about replacing, but there's definitively room for SK_WidenSubvector.

LGTM - I'm wondering if SK_InsertSubvector is actually usable in most cases as a shuffle kind or whether we're better off replacing it with something like SK_WidenSubvector + SK_ConcatSubvectors patterns?

Not sure if I understand it correctly but I think SK_InsertSubvector is better here, just need a proper cost for it.

LGTM - I'm wondering if SK_InsertSubvector is actually usable in most cases as a shuffle kind or whether we're better off replacing it with something like SK_WidenSubvector + SK_ConcatSubvectors patterns?

Not sure if I understand it correctly but I think SK_InsertSubvector is better here, just need a proper cost for it.

My main annoyance with SK_InsertSubvector is that it doesn't match an actual pattern we can create with a single shufflevector instruction.

This revision was landed with ongoing or failed builds.Jul 14 2021, 8:09 AM
This revision was automatically updated to reflect the committed changes.

This is causing hangs
$ clang++ -O2 -c reduced2.ll -o /dev/null

This is causing hangs
$ clang++ -O2 -c reduced2.ll -o /dev/null

Thanks for the reproducer. I don't think this patch causes it directly (it just adjusts cost model, nothing else), most probably it reveals some deeper issue in the compiler. Will check what's happening.

ABataev added a comment.EditedJul 15 2021, 1:58 PM

This is causing hangs
$ clang++ -O2 -c reduced2.ll -o /dev/null

Investigated it, actually hangs in X86 DAG->DAG Instruction Selection on function (_ZN12SpinningCube6UpdateEv) even without SLP vectorizer with the attached reproducer (hangs with 12.0.0 and trunk, 11.0.1 does not recognize poison but passes if I replace poison by undef. 12.0.0 and trunk still hangs even in this case). Command to reproduce is llc reduced.ll -o /dev/null. Looks like the bug was introduced somewhat between llvm 11 and 12. Hangs in the loop in lines 1523-1598, lib/CodeGen/SelectionDAG/DAGCombiner.cpp

This is causing hangs
$ clang++ -O2 -c reduced2.ll -o /dev/null

Investigated it, actually hangs in X86 DAG->DAG Instruction Selection on function (_ZN12SpinningCube6UpdateEv) even without SLP vectorizer with the attached reproducer (hangs with 12.0.0 and trunk, 11.0.1 does not recognize poison but passes if I replace poison by undef. 12.0.0 and trunk still hangs even in this case). Command to reproduce is llc reduced.ll -o /dev/null. Looks like the bug was introduced somewhat between llvm 11 and 12. Hangs in the loop in lines 1523-1598, lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Looks like endless X86 shuffle combine loop

Legalizing: t89: v4f32 = X86ISD::SHUFP t87, t80, TargetConstant:i8<36>
Legal node: nothing to do

Combining: t89: v4f32 = X86ISD::SHUFP t87, t80, TargetConstant:i8<36>
Creating fp constant: t87: v4f32 = BUILD_VECTOR ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>
Creating fp constant: t87: v4f32 = BUILD_VECTOR ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>
Creating new node: t1580: v4f32 = undef
Creating fp constant: t87: v4f32 = BUILD_VECTOR ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>
Creating new node: t1581: v4f32 = X86ISD::UNPCKL t68, undef:v4f32

Replacing.2 t1578: v4f32 = X86ISD::SHUFP t87, t68, TargetConstant:i8<-44>

With: t1581: v4f32 = X86ISD::UNPCKL t68, undef:v4f32


Legalizing: t1581: v4f32 = X86ISD::UNPCKL t68, undef:v4f32
Legal node: nothing to do

Combining: t1581: v4f32 = X86ISD::UNPCKL t68, undef:v4f32
Creating fp constant: t87: v4f32 = BUILD_VECTOR ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>
Creating constant: t1582: i8 = TargetConstant<-44>
Creating new node: t1583: v4f32 = X86ISD::SHUFP t87, t68, TargetConstant:i8<-44>
 ... into: t1583: v4f32 = X86ISD::SHUFP t87, t68, TargetConstant:i8<-44>

Legalizing: t1583: v4f32 = X86ISD::SHUFP t87, t68, TargetConstant:i8<-44>
Legal node: nothing to do

Combining: t1583: v4f32 = X86ISD::SHUFP t87, t68, TargetConstant:i8<-44>
Creating new node: t1584: v4f32 = undef
Creating fp constant: t87: v4f32 = BUILD_VECTOR ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<0.000000e+00>

Legalizing: t1582: i8 = TargetConstant<-44>

Combining: t1582: i8 = TargetConstant<-44>

Legalizing: t89: v4f32 = X86ISD::SHUFP t87, t80, TargetConstant:i8<36>
Legal node: nothing to do

Combining: t89: v4f32 = X86ISD::SHUFP t87, t80, TargetConstant:i8<36>
<here we go again>

Thanks for the reduced test case - looking at this now

Thanks for the reduced test case - looking at this now

Hi @RKSimon, thanks for the quick fix, going to reland the patch again.

+1, thanks for the investigation and fix!