Continuation of https://reviews.llvm.org/D149071.
Splitting the operands with unpack instructions and shifting at double the element bit width may improve performance, according to tests run with https://uica.uops.info.
- No cross-lane shuffles
- No dirtying double-width registers
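As a rough illustration of the idea in the summary, here is a scalar C++ model of a single 8-bit lane. It is only a sketch, not the lowering code from this patch: the names `concat8`, `fshl8`, `fshr8`, the `_ref` helpers, and `self_check` are hypothetical, and the assumption is that the unpack step places `x` in the high half and `y` in the low half of each double-width lane. One double-width shift then yields the funnel-shift result in the high half (fshl) or the low half (fshr).

```cpp
#include <cassert>
#include <cstdint>

// Models what an unpack (interleave) would produce for one lane:
// a 16-bit value with x in the high byte and y in the low byte.
// (Assumed operand order; the actual patch may differ.)
constexpr uint16_t concat8(uint8_t x, uint8_t y) {
    return static_cast<uint16_t>((x << 8) | y);
}

// fshl: shift the 16-bit lane left, keep the high byte.
constexpr uint8_t fshl8(uint8_t x, uint8_t y, uint8_t amt) {
    uint16_t wide = concat8(x, y);
    wide = static_cast<uint16_t>(wide << (amt & 7)); // amount is modulo 8
    return static_cast<uint8_t>(wide >> 8);
}

// fshr: shift the 16-bit lane right, keep the low byte.
constexpr uint8_t fshr8(uint8_t x, uint8_t y, uint8_t amt) {
    uint16_t wide = concat8(x, y);
    return static_cast<uint8_t>(wide >> (amt & 7));
}

// Reference definitions built from two separate shifts and an OR.
constexpr uint8_t fshl8_ref(uint8_t x, uint8_t y, uint8_t amt) {
    unsigned s = amt & 7;
    return s == 0 ? x : static_cast<uint8_t>((x << s) | (y >> (8 - s)));
}

constexpr uint8_t fshr8_ref(uint8_t x, uint8_t y, uint8_t amt) {
    unsigned s = amt & 7;
    return s == 0 ? y : static_cast<uint8_t>((x << (8 - s)) | (y >> s));
}

// Exhaustively compares the double-width model with the reference.
inline void self_check() {
    for (unsigned x = 0; x < 256; ++x)
        for (unsigned y = 0; y < 256; ++y)
            for (unsigned amt = 0; amt < 16; ++amt) {
                assert(fshl8(x, y, amt) == fshl8_ref(x, y, amt));
                assert(fshr8(x, y, amt) == fshr8_ref(x, y, amt));
            }
}
```

In a vector lowering the same pattern would presumably apply per lane, with the low and high halves of the inputs unpacked separately and the results packed back together, which stays within each 128-bit lane.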
Differential D149463
[X86] LowerFunnelShift: prefer unpack-based algorithm
Needs Review, Public
Authored by Nekotekina on Apr 28 2023, 9:54 AM.
Diff Detail
Revision Contents
Diff 517965
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/test/CodeGen/X86/vector-fshl-128.ll
llvm/test/CodeGen/X86/vector-fshl-256.ll
llvm/test/CodeGen/X86/vector-fshl-512.ll
llvm/test/CodeGen/X86/vector-fshr-128.ll
llvm/test/CodeGen/X86/vector-fshr-256.ll
llvm/test/CodeGen/X86/vector-fshr-512.ll