Similar to what we've done for other ops, this patch widens VPTERNLOG to a 512-bit op for non-VLX targets.
Fixes regressions in D113192
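Why widening is safe can be illustrated with a small standalone model. This is only a sketch, not the code from the patch: it treats vectors as arrays of 64-bit lanes, and the names `ternlog64`, `ternlog`, `widen`, `extractLow` and `ternlogViaWiden` are hypothetical helpers invented for illustration. It assumes the usual VPTERNLOG truth-table convention, where the three source bits form an index into the 8-bit immediate. Because the operation is purely bitwise, whatever sits in the upper lanes of the widened operands cannot affect the low lanes that are extracted afterwards.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One 64-bit lane of ternary logic: for every bit position, the bits of
// a, b and c form an index (a<<2 | b<<1 | c) into the 8-bit immediate.
inline uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm) {
  uint64_t r = 0;
  for (unsigned bit = 0; bit < 64; ++bit) {
    unsigned idx = static_cast<unsigned>((((a >> bit) & 1) << 2) |
                                         (((b >> bit) & 1) << 1) |
                                         ((c >> bit) & 1));
    r |= static_cast<uint64_t>((imm >> idx) & 1u) << bit;
  }
  return r;
}

// Model: a vector is N 64-bit lanes (N = 2 -> 128-bit, 4 -> 256-bit, 8 -> 512-bit).
template <std::size_t N> using Vec = std::array<uint64_t, N>;

// Lane-wise ternary logic at the vector's native width.
template <std::size_t N>
Vec<N> ternlog(const Vec<N> &a, const Vec<N> &b, const Vec<N> &c, uint8_t imm) {
  Vec<N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = ternlog64(a[i], b[i], c[i], imm);
  return r;
}

// Hypothetical helper: put a narrow vector into the low lanes of a 512-bit
// vector. `filler` stands in for the unspecified contents of the upper lanes.
template <std::size_t N>
Vec<8> widen(const Vec<N> &v, uint64_t filler) {
  static_assert(N <= 8, "source must fit in 512 bits");
  Vec<8> w;
  w.fill(filler);
  for (std::size_t i = 0; i < N; ++i)
    w[i] = v[i];
  return w;
}

// Hypothetical helper: take the low N lanes back out of a 512-bit vector.
template <std::size_t N>
Vec<N> extractLow(const Vec<8> &w) {
  static_assert(N <= 8, "result must fit in 512 bits");
  Vec<N> v{};
  for (std::size_t i = 0; i < N; ++i)
    v[i] = w[i];
  return v;
}

// Widen the operands, run the 512-bit operation, keep only the low lanes.
template <std::size_t N>
Vec<N> ternlogViaWiden(const Vec<N> &a, const Vec<N> &b, const Vec<N> &c,
                       uint8_t imm, uint64_t filler) {
  Vec<8> wide = ternlog<8>(widen<N>(a, filler), widen<N>(b, filler),
                           widen<N>(c, filler), imm);
  return extractLow<N>(wide);
}

// Small self-check: the widened path matches the narrow one for a 128-bit input.
inline void checkWidening() {
  Vec<2> a{0xF0F0F0F0F0F0F0F0ULL, 0x0123456789ABCDEFULL};
  Vec<2> b{0x00FF00FF00FF00FFULL, 0xFFFFFFFF00000000ULL};
  Vec<2> c{0x3333333333333333ULL, 0x5555555555555555ULL};
  assert(ternlogViaWiden<2>(a, b, c, 0xCA, 0xDEADBEEFDEADBEEFULL) ==
         ternlog<2>(a, b, c, 0xCA));
}
```

In this model the immediate `0xCA` acts as a bitwise select (`a ? b : c`), and `ternlogViaWiden` returns the same lanes as `ternlog` regardless of the filler value.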
Differential D113827
[X86] Widen 128/256-bit VPTERNLOG patterns to 512-bit on non-VLX targets
Closed, Public. Authored by RKSimon on Nov 13 2021, 8:18 AM.
Event Timeline
I think the title is misleading: we widen to a 512-bit VPTERNLOG rather than using the 128/256-bit forms. Also, wouldn't it be better to mention the broadcastable work there?
RKSimon retitled this revision from "[X86] Enable use of 128/256-bit VPTERNLOG on non-VLX targets" to "[X86] Widen 128/256-bit VPTERNLOG patterns to 512-bit on non-VLX targets". Nov 14 2021, 2:38 AM
This revision is now accepted and ready to land. Nov 14 2021, 4:59 AM
Closed by commit rGf4143ffed76e: [X86] Widen 128/256-bit VPTERNLOG patterns to 512-bit on non-VLX targets (authored by RKSimon). Nov 14 2021, 5:41 AM
This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 387026
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/test/CodeGen/X86/avx512fp16-arith.ll
llvm/test/CodeGen/X86/combine-bitselect.ll
llvm/test/CodeGen/X86/min-legal-vector-width.ll
llvm/test/CodeGen/X86/vector-fshl-128.ll
llvm/test/CodeGen/X86/vector-fshl-256.ll
llvm/test/CodeGen/X86/vector-fshl-512.ll
llvm/test/CodeGen/X86/vector-fshl-rot-128.ll
llvm/test/CodeGen/X86/vector-fshl-rot-256.ll
llvm/test/CodeGen/X86/vector-fshl-rot-512.ll
llvm/test/CodeGen/X86/vector-fshr-128.ll
llvm/test/CodeGen/X86/vector-fshr-256.ll
llvm/test/CodeGen/X86/vector-fshr-512.ll
llvm/test/CodeGen/X86/vector-fshr-rot-128.ll
llvm/test/CodeGen/X86/vector-fshr-rot-256.ll
llvm/test/CodeGen/X86/vector-fshr-rot-512.ll
llvm/test/CodeGen/X86/vector-rotate-128.ll
llvm/test/CodeGen/X86/vector-rotate-256.ll
llvm/test/CodeGen/X86/vector-rotate-512.ll
Why do we need to do this? Do we now have VT types other than i32/i64? Or could the previous code already handle them?