This is an archive of the discontinued LLVM Phabricator instance.

[TTI] Add basic SK_InsertSubvector shuffle mask recognition
ClosedPublic

Authored by RKSimon on Jul 31 2021, 11:47 AM.

Details

Summary

This patch adds an initial ShuffleVectorInst::isInsertSubvectorMask helper to recognize 2-op shuffles where the lowest elements of one of the sources are being inserted into the "in-place" other operand, this includes "concat_vectors" patterns as can be seen in the Arm shuffle cost changes. This also helped fix a x86 issue with irregular/length-changing SK_InsertSubvector costs - I'm hoping this will help with D107188

This doesn't currently attempt to work with 1-op shuffles that could either be a "widening" shuffle or a self-insertion. The widening case probably needs addressing separately as in most cases this should have a zero cost. The self-insertion case is trickier, but we currently always match this with the existing SK_PermuteSingleSrc logic.

Masks with a high number of undef elts will still struggle to match optimal subvector widths - its currently bounded by minimum-width possible insertion, whilst some cases would benefit from wider subvectors.

Diff Detail

Event Timeline

RKSimon created this revision.Jul 31 2021, 11:47 AM
RKSimon requested review of this revision.Jul 31 2021, 11:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2021, 11:47 AM
ABataev added inline comments.Jul 31 2021, 12:02 PM
llvm/test/Analysis/CostModel/X86/shuffle-insert_subvector.ll
323

-1?

RKSimon added inline comments.Jul 31 2021, 12:43 PM
llvm/test/Analysis/CostModel/X86/shuffle-insert_subvector.ll
323

This is an example of subvector widening that I was wondering whether we should just set cost = 0 - at the moment the shuffle mask matcher doesn't recognise it at all (hence the -1).

ABataev added inline comments.Jul 31 2021, 12:46 PM
llvm/test/Analysis/CostModel/X86/shuffle-insert_subvector.ll
323

Yes, I think better to make it 0.

RKSimon updated this revision to Diff 363318.Jul 31 2021, 1:17 PM
RKSimon added a reviewer: foad.

Treat a 'subvector widening' shuffle as free - this introduces what I think are benign changes to a amdgpu test

ABataev accepted this revision.Jul 31 2021, 1:34 PM

LG with a nit

llvm/lib/IR/Instructions.cpp
2282

I

This revision is now accepted and ready to land.Jul 31 2021, 1:34 PM
foad added a subscriber: dfukalov.Aug 2 2021, 1:50 AM

Treat a 'subvector widening' shuffle as free - this introduces what I think are benign changes to a amdgpu test

Adding @dfukalov for awareness.

Treat a 'subvector widening' shuffle as free - this introduces what I think are benign changes to a amdgpu test

Adding @dfukalov for awareness.

BTW - I'm intending to commit the widening cost = 0 patch separately so if it does cause any problems we can address them without affecting the SK_InsertSubvector recogniser. If it proves necessary we can add a SK_WidenSubvector class to handle those cases.

This revision was landed with ongoing or failed builds.Aug 2 2021, 3:25 AM
This revision was automatically updated to reflect the committed changes.

Just a FYI.... I have ported an ARM test (that was changed by this patch) to AArch64 in rG46a861af3d1c.

Just a FYI.... I have ported an ARM test (that was changed by this patch) to AArch64 in rG46a861af3d1c.

Thanks - now that we at least have basic recognition of the insert_subvector pattern in analysis I'm hoping we can actually have useful test coverage!

Note that we still could use TTI::SK_WidenVector to use it instead of TTI::SK_InsertSubvector when upper bits don't need to be preserved.