This is an archive of the discontinued LLVM Phabricator instance.

[TTI][X86] Add SSE2 sub-128bit vXi16/32 and v2i64 stride 2 interleaved load costs
ClosedPublic

Authored by RKSimon on Oct 16 2021, 6:59 AM.

Details

Summary

These cases use the same codegen as AVX2 (pshuflw/pshufd) for the sub-128bit vector deinterleaving, and unpcklqdq for v2i64.
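For readers unfamiliar with the term, a stride 2 interleaved load reads one wide vector and splits it into its even-indexed and odd-indexed elements. The following is only a portable scalar sketch of that semantics, not the code from this patch and not the X86 lowering; the name `deinterleaveStride2` is hypothetical and chosen for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>

// Scalar model of a stride 2 interleaved load group (hypothetical helper,
// not part of LLVM). Given a wide vector {a0, b0, a1, b1, ...} it returns
// {a0, a1, ...} and {b0, b1, ...}. On X86 the backend performs this split
// with shuffle/unpack instructions; this model makes no claim about which
// instructions are selected for a given type or vectorization factor.
template <typename T, std::size_t N>
std::pair<std::array<T, N / 2>, std::array<T, N / 2>>
deinterleaveStride2(const std::array<T, N> &Wide) {
  static_assert(N % 2 == 0, "stride 2 needs an even number of elements");
  std::array<T, N / 2> Even{};
  std::array<T, N / 2> Odd{};
  for (std::size_t I = 0; I < N / 2; ++I) {
    Even[I] = Wide[2 * I];     // elements at index 0, 2, 4, ...
    Odd[I] = Wide[2 * I + 1];  // elements at index 1, 3, 5, ...
  }
  return {Even, Odd};
}
```

With a four-element i16 input (a VF2 case in the terms used here), `deinterleaveStride2(std::array<std::uint16_t, 4>{10, 20, 11, 21})` yields `{10, 11}` and `{20, 21}`. The cost table entries in this patch estimate how expensive that split is for each element type and vectorization factor.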

It's going to take a while to add full interleaved cost coverage, but since these costs are the same from SSE2 through AVX2, this should be an easy win.

Fixes PR47437

Event Timeline

RKSimon created this revision. Oct 16 2021, 6:59 AM
RKSimon requested review of this revision. Oct 16 2021, 6:59 AM
Herald added a project: Restricted Project. Oct 16 2021, 6:59 AM
lebedev.ri added inline comments. Oct 16 2021, 7:24 AM
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
5224

Looking at llvm-project/llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-2.ll,
VF4 codegen is really different between SSE2 and AVX2.

5230

Looking at llvm-project/llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-2.ll,
@load_i32_stride2_vf4 also seems to match.

RKSimon added inline comments. Oct 16 2021, 8:11 AM
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
5224

nice catch!

5230

every little helps :)

RKSimon updated this revision to Diff 380186. Oct 16 2021, 8:14 AM

Address review comment

This revision is now accepted and ready to land. Oct 16 2021, 8:18 AM
This revision was landed with ongoing or failed builds. Oct 16 2021, 8:22 AM
This revision was automatically updated to reflect the committed changes.