This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

-
llvm/
-
lib/Target/X86/
-
Target/
-
X86/
-
CMakeLists.txt
1
X86.h
25
X86FixupVectorConstants.cpp
-
X86InstrFoldTables.h
2
X86InstrFoldTables.cpp
-
X86TargetMachine.cpp
-
test/CodeGen/X86/
-
CodeGen/
-
X86/
-
avx512-calling-conv.ll
-
avx512-ext.ll
-
avx512-logic.ll
-
avx512fp16-cvt-ph-w-vl-intrinsics.ll
-
avx512vl-logic.ll
-
bitcast-vector-bool.ll
-
combine-and.ll
-
combine-sdiv.ll
-
dpbusd_const.ll
-
dpbusd_i4.ll
-
gfni-funnel-shifts.ll
-
gfni-rotates.ll
-
gfni-shifts.ll
-
horizontal-reduce-smax.ll
-
horizontal-reduce-smin.ll
-
i64-to-float.ll
-
icmp-pow2-diff.ll
-
midpoint-int-vec-128.ll
-
midpoint-int-vec-256.ll
-
midpoint-int-vec-512.ll
-
min-legal-vector-width.ll
-
movmsk-cmp.ll
-
opt-pipeline.ll
-
paddus.ll
-
prefer-avx256-lzcnt.ll
-
prefer-avx256-mulo.ll
-
prefer-avx256-shift.ll
-
prefer-avx256-trunc.ll
-
prefer-avx256-wide-mul.ll
-
psubus.ll
-
rotate-extract-vector.ll
-
rotate_vec.ll
-
sadd_sat_vec.ll
-
srem-seteq-vec-nonsplat.ll
-
ssub_sat_vec.ll
-
usub_sat_vec.ll
-
vec-strict-inttofp-128-fp16.ll
-
vec-strict-inttofp-256-fp16.ll
-
vec-strict-inttofp-256.ll
-
vec-strict-inttofp-512-fp16.ll
-
vector-fshl-128.ll
-
vector-fshl-256.ll
-
vector-fshl-512.ll
-
vector-fshl-rot-128.ll
-
vector-fshl-rot-256.ll
-
vector-fshl-rot-512.ll
-
vector-fshr-128.ll
-
vector-fshr-256.ll
-
vector-fshr-512.ll
-
vector-fshr-rot-128.ll
-
vector-fshr-rot-256.ll
-
vector-fshr-rot-512.ll
-
vector-idiv-sdiv-512.ll
-
vector-idiv-udiv-512.ll
-
vector-lzcnt-128.ll
-
vector-lzcnt-256.ll
-
vector-lzcnt-512.ll
-
vector-mul.ll
-
vector-pack-128.ll
-
vector-pack-256.ll
-
vector-pack-512.ll
-
vector-pcmp.ll
-
vector-reduce-add-mask.ll
-
vector-reduce-or-bool.ll
-
vector-reduce-or-cmp.ll
-
vector-reduce-smax.ll
-
vector-reduce-smin.ll
-
vector-rotate-128.ll
-
vector-rotate-256.ll
-
vector-rotate-512.ll
-
vector-shift-ashr-128.ll
-
vector-shift-ashr-256.ll
-
vector-shift-ashr-512.ll
-
vector-shift-ashr-sub128.ll
-
vector-shift-lshr-128.ll
-
vector-shift-lshr-256.ll
-
vector-shift-lshr-512.ll
-
vector-shift-lshr-sub128.ll
-
vector-shift-shl-128.ll
-
vector-shift-shl-256.ll
-
vector-shift-shl-512.ll
-
vector-shift-shl-sub128.ll
-
vector-shuffle-512-v16.ll
-
vselect-pcmp.ll

Differential D150526

[X86] Add X86FixupVectorConstantsPass to re-fold AVX512 vector load folds as broadcast folds
Closed, Public

Authored by RKSimon on May 14 2023, 8:20 AM.

Details

Reviewers

pengfei
goldstein.w.n
craig.topper
skan
yubing
LuoYuanke

Commits

rG0b91de5ea32d: [X86] Add X86FixupVectorConstantsPass to re-fold AVX512 vector load folds as…

Summary

This patch analyzes AVX512 instructions for full vector width folded loads from the constant pool and attempts to determine whether each load can be replaced with a smaller broadcast folded variant. Typically the broadcast opportunities were missed because of type-width mismatches or multiuse limitations which have been removed in later passes.

As well as introducing broadcast fold tables (which can hopefully be extended/automated in the future), this also handles mismatches in the AND/ANDN/OR/XOR/TERNLOG type-widths, catching additional missed opportunities.

This patch is pulled from the ongoing work based on D150143, but without removing the existing DAG constant broadcast lowering code. For now it is a late stage cleanup only.

The intention is to add additional broadcast/extension handling of constants in future patches, but it turned out that AVX512 broadcast handling was the easiest to start with.
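As a rough illustration of the search described in the summary, the standalone C++ sketch below looks for the narrowest width at which a constant's bytes repeat. It is not code from the patch: `smallestSplatWidth` and its byte-vector representation are hypothetical stand-ins for the pass's APInt-based logic, and undef elements are not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: returns the smallest candidate width (in bytes) at
// which `Bytes` is a repetition of its first `Width` bytes, or std::nullopt
// if no candidate fits. `Candidates` is expected in ascending order, which
// mirrors trying the narrowest broadcast first.
std::optional<std::size_t>
smallestSplatWidth(const std::vector<std::uint8_t> &Bytes,
                   const std::vector<std::size_t> &Candidates) {
  for (std::size_t Width : Candidates) {
    // Skip widths that cannot tile the constant or would not shrink it.
    if (Width == 0 || Width >= Bytes.size() || Bytes.size() % Width != 0)
      continue;
    bool IsSplat = true;
    for (std::size_t I = Width; I != Bytes.size() && IsSplat; ++I)
      IsSplat = Bytes[I] == Bytes[I - Width];
    if (IsSplat)
      return Width;
  }
  return std::nullopt;
}
```

For a 64-byte constant holding sixteen copies of one 32-bit value, calling this with candidate widths of 4 and 8 bytes reports 4, which corresponds to preferring the 32-bit broadcast form over the 64-bit one.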

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

RKSimon created this revision. May 14 2023, 8:20 AM

Herald added a project: Restricted Project. May 14 2023, 8:20 AM

Herald added a subscriber: hiraditya.

RKSimon requested review of this revision. May 14 2023, 8:20 AM

Herald added a project: Restricted Project. May 14 2023, 8:20 AM

Harbormaster completed remote builds in B231866: Diff 522004. May 14 2023, 8:54 AM

goldstein.w.n added inline comments. May 14 2023, 8:55 AM

llvm/lib/Target/X86/X86FixupVectorConstants.cpp
39	Since this is AVX512 only, maybe it should be "X86 Fixup Vector AVX512 Constants"? Likewise for the filename.
266	Is it only for logic ops? Why can't we do something like `add`, `sub`, etc... iff we can `rebuildSplatableConstant` with the correct `BitWidth`?
268	Can we convert `rm` ops that aren't in `evex` encoding? Or is that dangerous in some way?
371	Why no attempt for `OpBcst256/128/16/8` on any of these? If they are truly unused can we remove this to kill the dead code?
373	Do all the `OpBcst\d+` opcodes not always match? Seems to be the same for all of them. If so, I think it would be a lot simpler to try all `OpBcst` sizes < Native_BitWidth. You could have an overload if you want to support only checking a specific width later.

RKSimon added inline comments. May 14 2023, 9:19 AM

llvm/lib/Target/X86/X86FixupVectorConstants.cpp
39	No, as I said in the summary, the intention is to expand this to handle broadcast/extensions in future patches.
266	The broadcast width matches the element width, so we can't replace a VPADDD with a VPADDQ. Bitwise ops don't care, as long as we're not using the predicate mask, which we aren't for these particular cases.
268	This pass is before the X86EvexToVex pass so we shouldn't encounter such cases.
371	There are no OpBcst256/128/16/8 equivalent VPTERNLOG ops - we'd have to pull out the broadcast into a separate op, which isn't what we're after. I can remove the dead code if you think it important, but it will be re-added in a follow up patch very soon.
373	I had wondered about trying to handle this inside lookupBroadcastFoldTable but I wasn't sure if it'd get confusing to have a memop for one element width returning a bcstop for another element width.
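The point about element width can be checked with plain integers. The sketch below is a standalone illustration and not part of the patch; the helper names are made up for the example. It treats a 64-bit value as two independent 32-bit lanes: AND gives the same bits as a single 64-bit operation, while ADD does not, because carries stop at the lane boundary.

```cpp
#include <cassert>
#include <cstdint>

// Rejoin two 32-bit lanes into one 64-bit value (Lo is the low half).
std::uint64_t joinLanes(std::uint32_t Lo, std::uint32_t Hi) {
  return (static_cast<std::uint64_t>(Hi) << 32) | Lo;
}

// Bitwise AND performed independently on each 32-bit lane.
std::uint64_t andAs2x32(std::uint64_t A, std::uint64_t B) {
  return joinLanes(
      static_cast<std::uint32_t>(A) & static_cast<std::uint32_t>(B),
      static_cast<std::uint32_t>(A >> 32) & static_cast<std::uint32_t>(B >> 32));
}

// Wrapping ADD performed independently on each 32-bit lane; a carry out of
// the low lane is discarded instead of propagating into the high lane.
std::uint64_t addAs2x32(std::uint64_t A, std::uint64_t B) {
  return joinLanes(
      static_cast<std::uint32_t>(A) + static_cast<std::uint32_t>(B),
      static_cast<std::uint32_t>(A >> 32) + static_cast<std::uint32_t>(B >> 32));
}
```

With `A = 0x00000001FFFFFFFF` and a constant `B = 0x0000000100000001` (a splat of the 32-bit value 1), `andAs2x32(A, B)` equals `A & B`, but `addAs2x32(A, B)` differs from `A + B`. That is the reason a constant can be re-broadcast at a different element width only for the bitwise cases.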

goldstein.w.n added inline comments. May 14 2023, 10:48 AM

llvm/lib/Target/X86/X86FixupVectorConstants.cpp
373	Think it would be clearer. Having 7 arguments to a function, where most of them are magic numbers to a degree, isn't particularly clear either. Also seems less bug prone (could imagine easily accidentally doing `ConvertToBroadcast(0, X86::Opcode, ...)` instead of `ConvertToBroadcast(0, 0, X86::Opcode, ...)`).

Move bit logic broadcasts into broadcast fold tables

Add additional custom broadcast fold tables with element size mismatch

Harbormaster completed remote builds in B231880: Diff 522021. May 14 2023, 2:11 PM

ping?

Will it increase compile time much?

llvm/lib/Target/X86/X86.h
67	a pass
llvm/lib/Target/X86/X86FixupVectorConstants.cpp
105	Should check `isBFloatTy` too?
201	SclTy->is16bitFPTy()
219	Can we use `if (!SclTy->isIntegerTy())` so that we can unify and put them in one loop?
274	Do we have a chance to pass more than one OpBcstN at the same time? Otherwise, we can simply pass `BitWidth` and a single `OpBcst`.
296	Why not `MachineInstr &MI : MBB`?
llvm/lib/Target/X86/X86InstrFoldTables.cpp
37	Missing `X86::VANDNPSZrr`

RKSimon added inline comments. May 18 2023, 7:47 AM

llvm/lib/Target/X86/X86FixupVectorConstants.cpp
219	ConstantDataVector get/getFP helpers use the width of the raw bits to help determine the type so we wouldn't be able to merge the loops.
274	I'll see if I can refactor the loop so that we only have a single call to ConvertToBroadcast. We're going to need this layout of ConvertToBroadcast for followup patches which add some of the basic load -> broadcast support from D150143, so I'd prefer not to change it unless you think it absolutely necessary?
llvm/lib/Target/X86/X86InstrFoldTables.cpp
37	Good catch!
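To picture the remark about raw-bit width selecting the type, the standalone sketch below fixes the element width through the element type of the returned buffer. It is only an analogy under stated assumptions: `splitSplat` is a hypothetical helper, it handles splats of at most 64 bits, and it does not call any LLVM API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: split the low `SplatBits` bits of `Splat` into
// elements of type T, least-significant element first. sizeof(T) fixes the
// element width at compile time, so each width yields a distinct buffer type.
template <typename T>
std::vector<T> splitSplat(std::uint64_t Splat, unsigned SplatBits) {
  constexpr unsigned EltBits = sizeof(T) * 8;
  assert(SplatBits <= 64 && SplatBits % EltBits == 0 &&
         "element width must tile the splat");
  std::vector<T> Elts;
  for (unsigned Off = 0; Off != SplatBits; Off += EltBits)
    Elts.push_back(static_cast<T>(Splat >> Off));
  return Elts;
}
```

`splitSplat<std::uint8_t>` and `splitSplat<std::uint32_t>` share one loop body in source form but still produce differently typed buffers, which is the property that prevents a single runtime loop from serving every width.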

Address comments from @pengfei

Will it increase compile time much?

Iterating the basic blocks is very cheap; the key is to avoid extracting the constant raw bits data too frequently. I think I've addressed this by pulling out the getConstantFromPool call, but only trying to find a splat (and generate the new constant) if we have a suitable opcode.

The cost of the lookupBroadcastFoldTable calls is reduced by sorting the tables on first use (as we already do for the other folding tables) and using a lower_bound search.
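The sort-on-first-use plus `lower_bound` approach can be sketched in isolation. The code below is a minimal illustration, not the LLVM implementation: `FoldEntry`, `BroadcastTable`, and the opcode numbers in the usage note are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical table entry: maps a memory-fold opcode to a broadcast opcode.
struct FoldEntry {
  unsigned KeyOp;
  unsigned DstOp;
};

class BroadcastTable {
public:
  explicit BroadcastTable(std::vector<FoldEntry> Init)
      : Entries(std::move(Init)) {}

  // Returns the entry for `Op`, or nullptr if the opcode has no mapping.
  const FoldEntry *lookup(unsigned Op) {
    if (!Sorted) {
      // Sort once, on the first query, so later lookups are O(log n).
      std::sort(Entries.begin(), Entries.end(),
                [](const FoldEntry &L, const FoldEntry &R) {
                  return L.KeyOp < R.KeyOp;
                });
      Sorted = true;
    }
    auto It = std::lower_bound(
        Entries.begin(), Entries.end(), Op,
        [](const FoldEntry &E, unsigned Key) { return E.KeyOp < Key; });
    if (It != Entries.end() && It->KeyOp == Op)
      return &*It;
    return nullptr;
  }

private:
  std::vector<FoldEntry> Entries;
  bool Sorted = false;
};
```

A table built from the unsorted pairs (30, 300), (10, 100), (20, 200) returns the entry with `DstOp == 200` for key 20 and `nullptr` for key 15. The sort cost is paid once, so functions that never hit a candidate opcode only pay for the binary search.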

Harbormaster completed remote builds in B232881: Diff 523398. May 18 2023, 10:00 AM

goldstein.w.n added inline comments. May 18 2023, 10:36 AM

llvm/lib/Target/X86/X86FixupVectorConstants.cpp
130	Do we need a check that Bits->getBitWidth() >= SplatBitWidth?
232	Is there a way to check that MI is VEC type? If so maybe check that too before going to the foldingtable lookup?

RKSimon added inline comments. May 18 2023, 10:45 AM

llvm/lib/Target/X86/X86FixupVectorConstants.cpp
130	Adding an assert at the top would probably be enough, checking C->getPrimitiveSizeInBits % SplatBitWidth == 0.
232	Yes, we can probably do a check using the MCInstrDesc flags
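The divisibility precondition suggested here, together with the undef-tolerant matching that `getSplatableConstant` performs in the patch, can be sketched outside LLVM. The code below is an illustration only: `Element`, `splatSequenceIgnoringUndef`, and the use of `std::nullopt` to stand in for undef elements are assumptions, not code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// An element is either a known 32-bit value or undef (std::nullopt).
using Element = std::optional<std::uint32_t>;

// Hypothetical helper: if `Elts` repeats with period `Period` once undef
// elements are allowed to match anything, return the repeating sequence with
// remaining undefs left as zero. Otherwise return std::nullopt.
std::optional<std::vector<std::uint32_t>>
splatSequenceIgnoringUndef(const std::vector<Element> &Elts,
                           std::size_t Period) {
  // Precondition, analogous to the assert discussed in the review.
  assert(Period != 0 && Elts.size() % Period == 0 && "Illegal splat width");

  std::vector<Element> Sequence(Period);
  for (std::size_t Idx = 0; Idx != Elts.size(); ++Idx) {
    if (!Elts[Idx])
      continue; // undef matches any value
    Element &Slot = Sequence[Idx % Period];
    if (Slot && *Slot != *Elts[Idx])
      return std::nullopt; // two defined values disagree
    Slot = Elts[Idx];
  }

  std::vector<std::uint32_t> Result;
  for (const Element &Slot : Sequence)
    Result.push_back(Slot.value_or(0));
  return Result;
}
```

Checking the precondition once at the top keeps the matching loop free of bounds concerns, which is the effect the proposed assert aims for.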

Address @goldstein.w.n suggestions

Harbormaster completed remote builds in B232990: Diff 523529. May 18 2023, 2:26 PM

LuoYuanke added inline comments. May 18 2023, 7:17 PM

llvm/lib/Target/X86/X86FixupVectorConstants.cpp
78	I'm not quite familiar with MachineConstantPoolValue, so I don't understand why we should bail for MachineConstantPoolEntry. What's the difference between MachineConstantPoolValue and Constant?
128	It looks to me like `isSplatableConstant` should return a bool value, but it returns an APInt value. Does it make sense to rename it to `getSplatableConstant`?

RKSimon added inline comments. May 19 2023, 2:04 AM

llvm/lib/Target/X86/X86FixupVectorConstants.cpp
78	MachineConstantPoolValue is an abstract class that is difficult to extract any data from as it's not supposed to be manipulated. Constant entries however, which X86 nearly always uses, are trivial to work with. class MachineConstantPoolEntry { public: union { const Constant *ConstVal; MachineConstantPoolValue *MachineCPVal; } Val;
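The bail-out on the opaque kind of pool entry can be pictured with a small stand-in. The sketch below is an analogy rather than LLVM code: it uses `std::variant` instead of the raw union quoted above, and `PlainConstant`, `OpaqueTargetValue`, and `getPlainConstant` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <variant>
#include <vector>

// A plain constant whose bytes can be inspected directly.
struct PlainConstant {
  std::vector<std::uint8_t> Bytes;
};

// A target-specific value that is treated as opaque.
struct OpaqueTargetValue {
  int Id;
};

using PoolEntry = std::variant<PlainConstant, OpaqueTargetValue>;

// Hypothetical lookup: only a zero-offset reference to a plain constant is
// usable; anything else yields nullptr so the caller leaves the instruction
// untouched.
const PlainConstant *getPlainConstant(const std::vector<PoolEntry> &Pool,
                                      std::size_t Index, long Offset) {
  if (Index >= Pool.size() || Offset != 0)
    return nullptr;
  return std::get_if<PlainConstant>(&Pool[Index]);
}
```

Returning a null pointer for the opaque alternative keeps the rest of the analysis working only with data it can read, which matches the reasoning given for bailing out early.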

Replace isSplatableConstant with getSplatableConstant

Harbormaster completed remote builds in B233122: Diff 523699. May 19 2023, 2:53 AM

LGTM.

Thanks @LuoYuanke - any more comments?

In D150526#4356290, @RKSimon wrote:

Thanks @LuoYuanke - any more comments?

No more comments from me.

LGTM.

This revision is now accepted and ready to land. May 19 2023, 8:33 AM

Matt added a subscriber: Matt. May 22 2023, 2:31 PM

This revision was landed with ongoing or failed builds. May 23 2023, 3:01 AM

Closed by commit rG0b91de5ea32d: [X86] Add X86FixupVectorConstantsPass to re-fold AVX512 vector load folds as… (authored by RKSimon).

This revision was automatically updated to reflect the committed changes.

RKSimon added a commit: rG0b91de5ea32d: [X86] Add X86FixupVectorConstantsPass to re-fold AVX512 vector load folds as….

RKSimon mentioned this in rG0f8e0f422880: [X86] lowerBuildVectorAsBroadcast - broadcast Constant of original…. May 27 2023, 6:30 AM

Revision Contents

Path	Size

llvm/lib/Target/X86/
CMakeLists.txt	1 line
X86.h	6 lines
X86FixupVectorConstants.cpp	309 lines
X86InstrFoldTables.h	5 lines
X86InstrFoldTables.cpp	178 lines
X86TargetMachine.cpp	1 line

llvm/test/CodeGen/X86/
avx512-calling-conv.ll	4 lines
avx512-ext.ll	10 lines
avx512-logic.ll	24 lines
avx512fp16-cvt-ph-w-vl-intrinsics.ll	2 lines
avx512vl-logic.ll	48 lines
bitcast-vector-bool.ll	2 lines
combine-and.ll	26 lines
combine-sdiv.ll	22 lines
dpbusd_const.ll	2 lines
dpbusd_i4.ll	6 lines
gfni-funnel-shifts.ll	4 lines
gfni-rotates.ll	4 lines
gfni-shifts.ll	48 lines
horizontal-reduce-smax.ll	214 lines
horizontal-reduce-smin.ll	214 lines
i64-to-float.ll	8 lines
icmp-pow2-diff.ll	4 lines
midpoint-int-vec-128.ll	20 lines
midpoint-int-vec-256.ll	20 lines
midpoint-int-vec-512.ll	30 lines
min-legal-vector-width.ll	6 lines
movmsk-cmp.ll	8 lines
opt-pipeline.ll	3 lines
paddus.ll	12 lines
prefer-avx256-lzcnt.ll	4 lines
prefer-avx256-mulo.ll	4 lines
prefer-avx256-shift.ll	30 lines
prefer-avx256-trunc.ll	2 lines
prefer-avx256-wide-mul.ll	28 lines
psubus.ll	38 lines
rotate-extract-vector.ll	6 lines
rotate_vec.ll	6 lines
sadd_sat_vec.ll	2 lines
srem-seteq-vec-nonsplat.ll	2 lines
ssub_sat_vec.ll	4 lines
usub_sat_vec.ll	2 lines
vec-strict-inttofp-128-fp16.ll	4 lines
vec-strict-inttofp-256-fp16.ll	4 lines
vec-strict-inttofp-256.ll	8 lines
vec-strict-inttofp-512-fp16.ll	4 lines
vector-fshl-128.ll	8 lines
vector-fshl-256.ll	14 lines
vector-fshl-512.ll	16 lines
vector-fshl-rot-128.ll	10 lines
vector-fshl-rot-256.ll	10 lines
vector-fshl-rot-512.ll	12 lines
vector-fshr-128.ll	6 lines
vector-fshr-256.ll	16 lines
vector-fshr-512.ll	16 lines
vector-fshr-rot-128.ll	10 lines
vector-fshr-rot-256.ll	8 lines
vector-fshr-rot-512.ll	12 lines
vector-idiv-sdiv-512.ll	14 lines
vector-idiv-udiv-512.ll	12 lines
vector-lzcnt-128.ll	502 lines
vector-lzcnt-256.ll	32 lines
vector-lzcnt-512.ll	8 lines
vector-mul.ll	14 lines
vector-pack-128.ll	106 lines
vector-pack-256.ll	18 lines
vector-pack-512.ll	16 lines
vector-pcmp.ll	24 lines
vector-reduce-add-mask.ll	82 lines
vector-reduce-or-bool.ll	4 lines
vector-reduce-or-cmp.ll	2 lines
vector-reduce-smax.ll	308 lines
vector-reduce-smin.ll	308 lines
vector-rotate-128.ll	18 lines
vector-rotate-256.ll	18 lines
vector-rotate-512.ll	36 lines
vector-shift-ashr-128.ll	6 lines
vector-shift-ashr-256.ll	4 lines
vector-shift-ashr-512.ll	2 lines
vector-shift-ashr-sub128.ll	6 lines
vector-shift-lshr-128.ll	6 lines
vector-shift-lshr-256.ll	10 lines
vector-shift-lshr-512.ll	10 lines
vector-shift-lshr-sub128.ll	6 lines
vector-shift-shl-128.ll	6 lines
vector-shift-shl-256.ll	8 lines
vector-shift-shl-512.ll	8 lines
vector-shift-shl-sub128.ll	6 lines
vector-shuffle-512-v16.ll	2 lines
vselect-pcmp.ll	12 lines

Diff 524633

llvm/lib/Target/X86/CMakeLists.txt

[39 lines not shown]
  set(sources
    X86FastPreTileConfig.cpp
    X86FastTileConfig.cpp
    X86PreTileConfig.cpp
    X86ExpandPseudo.cpp
    X86FastISel.cpp
    X86FixupBWInsts.cpp
    X86FixupLEAs.cpp
    X86FixupInstTuning.cpp
+   X86FixupVectorConstants.cpp
    X86AvoidStoreForwardingBlocks.cpp
    X86DynAllocaExpander.cpp
    X86FixupSetCC.cpp
    X86FlagsCopyLowering.cpp
    X86FloatingPoint.cpp
    X86FrameLowering.cpp
    X86InstructionSelector.cpp
    X86ISelDAGToDAG.cpp
[64 lines not shown]

llvm/lib/Target/X86/X86.h

[54 lines not shown]
  /// This will prevent a stall when returning on the Atom.
  FunctionPass *createX86PadShortFunctions();

  /// Return a pass that selectively replaces certain instructions (like add,
  /// sub, inc, dec, some shifts, and some multiplies) by equivalent LEA
  /// instructions, in order to eliminate execution delays in some processors.
  FunctionPass *createX86FixupLEAs();

- /// Return as pass that replaces equivilent slower instructions with faster
+ /// Return a pass that replaces equivalent slower instructions with faster
  /// ones.
  FunctionPass *createX86FixupInstTuning();

+ /// Return a pass that reduces the size of vector constant pool loads.
+ FunctionPass *createX86FixupVectorConstants();
+
  /// Return a pass that removes redundant LEA instructions and redundant address
  /// recalculations.
  FunctionPass *createX86OptimizeLEAs();

  /// Return a pass that transforms setcc + movzx pairs into xor + setcc.
  FunctionPass *createX86FixupSetCC();

  /// Return a pass that avoids creating store forward block issues in the hardware.
[91 lines not shown]
  FunctionPass *createX86ArgumentStackSlotPass();

  void initializeEvexToVexInstPassPass(PassRegistry &);
  void initializeFPSPass(PassRegistry &);
  void initializeFixupBWInstPassPass(PassRegistry &);
  void initializeFixupLEAPassPass(PassRegistry &);
  void initializeX86ArgumentStackSlotPassPass(PassRegistry &);
  void initializeX86FixupInstTuningPassPass(PassRegistry &);
+ void initializeX86FixupVectorConstantsPassPass(PassRegistry &);
  void initializeWinEHStatePassPass(PassRegistry &);
  void initializeX86AvoidSFBPassPass(PassRegistry &);
  void initializeX86AvoidTrailingCallPassPass(PassRegistry &);
  void initializeX86CallFrameOptimizationPass(PassRegistry &);
  void initializeX86CmovConverterPassPass(PassRegistry &);
  void initializeX86DAGToDAGISelPass(PassRegistry &);
  void initializeX86DomainReassignmentPass(PassRegistry &);
  void initializeX86ExecutionDomainFixPass(PassRegistry &);
[33 lines not shown]

llvm/lib/Target/X86/X86FixupVectorConstants.cpp

This file was added.

//===-- X86FixupVectorConstants.cpp - optimize constant generation -------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file examines all full size vector constant pool loads and attempts to
// replace them with smaller constant pool entries, including:
// * Converting AVX512 memory-fold instructions to their broadcast-fold form
// * TODO: Broadcasting of full width loads.
// * TODO: Sign/Zero extension of full width loads.
//
//===----------------------------------------------------------------------===//

#include "X86.h"
#include "X86InstrFoldTables.h"
#include "X86InstrInfo.h"
#include "X86Subtarget.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/CodeGen/MachineConstantPool.h"

using namespace llvm;

#define DEBUG_TYPE "x86-fixup-vector-constants"

STATISTIC(NumInstChanges, "Number of instructions changes");

namespace {
class X86FixupVectorConstantsPass : public MachineFunctionPass {
public:
  static char ID;

  X86FixupVectorConstantsPass() : MachineFunctionPass(ID) {}

  StringRef getPassName() const override {
    return "X86 Fixup Vector Constants";
  }

  bool runOnMachineFunction(MachineFunction &MF) override;
  bool processInstruction(MachineFunction &MF, MachineBasicBlock &MBB,
                          MachineInstr &MI);

  // This pass runs after regalloc and doesn't support VReg operands.
  MachineFunctionProperties getRequiredProperties() const override {
    return MachineFunctionProperties().set(
        MachineFunctionProperties::Property::NoVRegs);
  }

private:
  const X86InstrInfo *TII = nullptr;
  const X86Subtarget *ST = nullptr;
  const MCSchedModel *SM = nullptr;
};
} // end anonymous namespace

char X86FixupVectorConstantsPass::ID = 0;

INITIALIZE_PASS(X86FixupVectorConstantsPass, DEBUG_TYPE, DEBUG_TYPE, false, false)

FunctionPass *llvm::createX86FixupVectorConstants() {
  return new X86FixupVectorConstantsPass();
}

static const Constant *getConstantFromPool(const MachineInstr &MI,
                                           const MachineOperand &Op) {
  if (!Op.isCPI() || Op.getOffset() != 0)
    return nullptr;

  ArrayRef<MachineConstantPoolEntry> Constants =
      MI.getParent()->getParent()->getConstantPool()->getConstants();
  const MachineConstantPoolEntry &ConstantEntry = Constants[Op.getIndex()];

  // Bail if this is a machine constant pool entry, we won't be able to dig out
  // anything useful.
  if (ConstantEntry.isMachineConstantPoolEntry())
    return nullptr;

  return ConstantEntry.Val.ConstVal;
}

// Attempt to extract the full width of bits data from the constant.
static std::optional<APInt> extractConstantBits(const Constant *C) {
  unsigned NumBits = C->getType()->getPrimitiveSizeInBits();

  if (auto *CInt = dyn_cast<ConstantInt>(C))
    return CInt->getValue();

  if (auto *CFP = dyn_cast<ConstantFP>(C))
    return CFP->getValue().bitcastToAPInt();

  if (auto *CV = dyn_cast<ConstantVector>(C)) {
    if (auto *CVSplat = CV->getSplatValue(/*AllowUndefs*/ true)) {
      if (std::optional<APInt> Bits = extractConstantBits(CVSplat)) {
        assert((NumBits % Bits->getBitWidth()) == 0 && "Illegal splat");
        return APInt::getSplat(NumBits, *Bits);
      }
    }
  }

  if (auto *CDS = dyn_cast<ConstantDataSequential>(C)) {
    bool IsInteger = CDS->getElementType()->isIntegerTy();
    bool IsFloat = CDS->getElementType()->isHalfTy() ||
                   CDS->getElementType()->isBFloatTy() ||
                   CDS->getElementType()->isFloatTy() ||
                   CDS->getElementType()->isDoubleTy();
    if (IsInteger || IsFloat) {
      APInt Bits = APInt::getZero(NumBits);
      unsigned EltBits = CDS->getElementType()->getPrimitiveSizeInBits();
      for (unsigned I = 0, E = CDS->getNumElements(); I != E; ++I) {
        if (IsInteger)
          Bits.insertBits(CDS->getElementAsAPInt(I), I * EltBits);
        else
          Bits.insertBits(CDS->getElementAsAPFloat(I).bitcastToAPInt(),
                          I * EltBits);
      }
      return Bits;
    }
  }

  return std::nullopt;
}

// Attempt to compute the splat width of bits data by normalizing the splat to
// remove undefs.
static std::optional<APInt> getSplatableConstant(const Constant *C,
                                                 unsigned SplatBitWidth) {
  const Type *Ty = C->getType();
  assert((Ty->getPrimitiveSizeInBits() % SplatBitWidth) == 0 &&
         "Illegal splat width");

  if (std::optional<APInt> Bits = extractConstantBits(C))
    if (Bits->isSplat(SplatBitWidth))
      return Bits->trunc(SplatBitWidth);

  // Detect general splats with undefs.
  // TODO: Do we need to handle NumEltsBits > SplatBitWidth splitting?
  if (auto *CV = dyn_cast<ConstantVector>(C)) {
    unsigned NumOps = CV->getNumOperands();
    unsigned NumEltsBits = Ty->getScalarSizeInBits();
    unsigned NumScaleOps = SplatBitWidth / NumEltsBits;
    if ((SplatBitWidth % NumEltsBits) == 0) {
      // Collect the elements and ensure that within the repeated splat sequence
      // they either match or are undef.
      SmallVector<Constant *, 16> Sequence(NumScaleOps, nullptr);
      for (unsigned Idx = 0; Idx != NumOps; ++Idx) {
        if (Constant *Elt = CV->getAggregateElement(Idx)) {
          if (isa<UndefValue>(Elt))
            continue;
          unsigned SplatIdx = Idx % NumScaleOps;
          if (!Sequence[SplatIdx] || Sequence[SplatIdx] == Elt) {
            Sequence[SplatIdx] = Elt;
            continue;
          }
        }
        return std::nullopt;
      }
      // Extract the constant bits forming the splat and insert into the bits
      // data, leave undef as zero.
      APInt SplatBits = APInt::getZero(SplatBitWidth);
      for (unsigned I = 0; I != NumScaleOps; ++I) {
        if (!Sequence[I])
          continue;
        if (std::optional<APInt> Bits = extractConstantBits(Sequence[I])) {
          SplatBits.insertBits(*Bits, I * Bits->getBitWidth());
          continue;
        }
        return std::nullopt;
      }
      return SplatBits;
    }
  }

  return std::nullopt;
}

// Attempt to rebuild a normalized splat vector constant of the requested splat
// width, built up of potentially smaller scalar values.
// NOTE: We don't always bother converting to scalars if the vector length is 1.
static Constant *rebuildSplatableConstant(const Constant *C,
                                          unsigned SplatBitWidth) {
  std::optional<APInt> Splat = getSplatableConstant(C, SplatBitWidth);
  if (!Splat)
    return nullptr;

  // Determine scalar size to use for the constant splat vector, clamping as we
  // might have found a splat smaller than the original constant data.
  const Type *OriginalType = C->getType();
  Type *SclTy = OriginalType->getScalarType();
  unsigned NumSclBits = SclTy->getPrimitiveSizeInBits();
  NumSclBits = std::min<unsigned>(NumSclBits, SplatBitWidth);

  if (NumSclBits == 8) {
    SmallVector<uint8_t> RawBits;
    for (unsigned I = 0; I != SplatBitWidth; I += 8)
      RawBits.push_back(Splat->extractBits(8, I).getZExtValue());
    return ConstantDataVector::get(OriginalType->getContext(), RawBits);
  }

  if (NumSclBits == 16) {
    SmallVector<uint16_t> RawBits;
    for (unsigned I = 0; I != SplatBitWidth; I += 16)
      RawBits.push_back(Splat->extractBits(16, I).getZExtValue());
    if (SclTy->is16bitFPTy())
      return ConstantDataVector::getFP(SclTy, RawBits);
    return ConstantDataVector::get(OriginalType->getContext(), RawBits);
  }

  if (NumSclBits == 32) {
    SmallVector<uint32_t> RawBits;
    for (unsigned I = 0; I != SplatBitWidth; I += 32)
      RawBits.push_back(Splat->extractBits(32, I).getZExtValue());
    if (SclTy->isFloatTy())
      return ConstantDataVector::getFP(SclTy, RawBits);
    return ConstantDataVector::get(OriginalType->getContext(), RawBits);
  }

  // Fallback to i64 / double.
  SmallVector<uint64_t> RawBits;
  for (unsigned I = 0; I != SplatBitWidth; I += 64)
    RawBits.push_back(Splat->extractBits(64, I).getZExtValue());
  if (SclTy->isDoubleTy())
    return ConstantDataVector::getFP(SclTy, RawBits);
  return ConstantDataVector::get(OriginalType->getContext(), RawBits);
}

				bool X86FixupVectorConstantsPass::processInstruction(MachineFunction &MF,
				MachineBasicBlock &MBB,
				MachineInstr &MI) {
				unsigned Opc = MI.getOpcode();
				MachineConstantPool *CP = MI.getParent()->getParent()->getConstantPool();
				goldstein.w.nUnsubmitted Not Done Reply Inline Actions Is there a way to check that MI is VEC type? If so maybe check that too before going to the foldingtable lookup? goldstein.w.n: Is there a way to check that MI is VEC type? If so maybe check that too before going to the…
				RKSimonAuthorUnsubmitted Not Done Reply Inline Actions Yes, we can probably do a check using the MCInstrDesc flags RKSimon: Yes, we can probably do a check using the MCInstrDesc flags

				auto ConvertToBroadcast = [&](unsigned OpBcst256, unsigned OpBcst128,
				unsigned OpBcst64, unsigned OpBcst32,
				unsigned OpBcst16, unsigned OpBcst8,
				unsigned OperandNo) {
				assert(MI.getNumOperands() >= (OperandNo + X86::AddrNumOperands) &&
				"Unexpected number of operands!");

				MachineOperand &CstOp = MI.getOperand(OperandNo + X86::AddrDisp);
				if (auto *C = getConstantFromPool(MI, CstOp)) {
				// Attempt to detect a suitable splat from increasing splat widths.
				std::pair<unsigned, unsigned> Broadcasts[] = {
				{8, OpBcst8}, {16, OpBcst16}, {32, OpBcst32},
				{64, OpBcst64}, {128, OpBcst128}, {256, OpBcst256},
				};
				for (auto [BitWidth, OpBcst] : Broadcasts) {
				if (OpBcst) {
				// Construct a suitable splat constant and adjust the MI to
				// use the new constant pool entry.
				if (Constant *NewCst = rebuildSplatableConstant(C, BitWidth)) {
				unsigned NewCPI =
				CP->getConstantPoolIndex(NewCst, Align(BitWidth / 8));
				MI.setDesc(TII->get(OpBcst));
				CstOp.setIndex(NewCPI);
				return true;
				}
				}
				}
				}
				return false;
				};

				// Attempt to find a AVX512 mapping from a full width memory-fold instruction
				// to a broadcast-fold instruction variant.
				goldstein.w.nUnsubmitted Not Done Reply Inline Actions Is it only for logic ops? Why can't we do something like `add`, `sub`, etc... iff we can `rebuildSplatableConstant` with the correct `BitWidth`? goldstein.w.n: Is it only for logic ops? Why can't we do something like `add`, `sub`, etc... iff we can…
				RKSimonAuthorUnsubmitted Not Done Reply Inline Actions The broadcast width matches the element width, we can't replace a VPADDD with a VPADDQ - but bitwise ops don't care (as long as we're not using the predicate mask which we aren't for these particular cases). RKSimon: The broadcast width matches the element width, we can't replace a VPADDD with a VPADDQ - but…
				if ((MI.getDesc().TSFlags & X86II::EncodingMask) == X86II::EVEX) {
				unsigned OpBcst32 = 0, OpBcst64 = 0;
				goldstein.w.n: Can we convert `rm` ops that aren't in `evex` encoding? Or is that dangerous in some way?
				RKSimon (author): This pass runs before the X86EvexToVex pass, so we shouldn't encounter such cases.
				unsigned OpNoBcst32 = 0, OpNoBcst64 = 0;
				if (const X86MemoryFoldTableEntry *Mem2Bcst =
				llvm::lookupBroadcastFoldTable(Opc, 32)) {
				OpBcst32 = Mem2Bcst->DstOp;
				OpNoBcst32 = Mem2Bcst->Flags & TB_INDEX_MASK;
				}
				pengfei: Do we ever need to pass more than one OpBcstN at the same time? Otherwise, we can simply pass `BitWidth` and a single `OpBcst`.
				RKSimon (author): I'll see if I can refactor the loop so that we only have a single call to ConvertToBroadcast. We're going to need this layout of ConvertToBroadcast for follow-up patches which add some of the basic load -> broadcast support from D150143, so I'd prefer not to change it unless you think it absolutely necessary?
				if (const X86MemoryFoldTableEntry *Mem2Bcst =
				llvm::lookupBroadcastFoldTable(Opc, 64)) {
				OpBcst64 = Mem2Bcst->DstOp;
				OpNoBcst64 = Mem2Bcst->Flags & TB_INDEX_MASK;
				}
				assert(((OpBcst32 == 0) || (OpBcst64 == 0) || (OpNoBcst32 == OpNoBcst64)) &&
				"OperandNo mismatch");

				if (OpBcst32 || OpBcst64) {
				unsigned OpNo = OpBcst32 == 0 ? OpNoBcst64 : OpNoBcst32;
				return ConvertToBroadcast(0, 0, OpBcst64, OpBcst32, 0, 0, OpNo);
				}
				}

				return false;
				}

				bool X86FixupVectorConstantsPass::runOnMachineFunction(MachineFunction &MF) {
				LLVM_DEBUG(dbgs() << "Start X86FixupVectorConstants\n";);
				bool Changed = false;
				ST = &MF.getSubtarget<X86Subtarget>();
				TII = ST->getInstrInfo();
				pengfei: Why not `MachineInstr &MI : MBB`?
				SM = &ST->getSchedModel();

				for (MachineBasicBlock &MBB : MF) {
				for (MachineInstr &MI : MBB) {
				if (processInstruction(MF, MBB, MI)) {
				++NumInstChanges;
				Changed = true;
				}
				}
				}
				LLVM_DEBUG(dbgs() << "End X86FixupVectorConstants\n";);
				return Changed;
				}
				goldstein.w.n: Do all the `OpBcst\d+` opcodes not always match? It seems to be the same for all of them. If so, I think it would be a lot simpler to try all `OpBcst` sizes < Native_BitWidth. You could add an overload if you want to support checking only a specific width later.
				RKSimon (author): I had wondered about trying to handle this inside lookupBroadcastFoldTable, but I wasn't sure if it'd get confusing to have a memop for one element width returning a bcstop for another element width.
				goldstein.w.n: I think it would be clearer. Having 7 arguments to a function, where most of them are magic numbers to a degree, isn't particularly clear either. It also seems less bug-prone (I could imagine easily doing `ConvertToBroadcast(0, X86::Opcode, ...)` by accident instead of `ConvertToBroadcast(0, 0, X86::Opcode, ...)`).
				goldstein.w.n: Why no attempt for `OpBcst256/128/16/8` on any of these? If they are truly unused, can we remove this to kill the dead code?
				RKSimon (author): There are no OpBcst256/128/16/8 equivalent VPTERNLOG ops; we'd have to pull out the broadcast into a separate op, which isn't what we're after. I can remove the dead code if you think it important, but it will be re-added in a follow-up patch very soon.
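
The exchange above about logic ops versus `add`/`sub` can be illustrated with a small standalone sketch. It is not code from the patch: `isSplatOfWidth`, `smallestSplatWidth`, `and32x2` and `add32x2` are hypothetical helpers that model a constant as raw bytes and a 64-bit element as two 32-bit lanes. It shows, under those assumptions, why a constant that splats at 32 bits can be re-used at a different lane width for a bitwise op but not for an add, where carries across the 32-bit boundary are lost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: true if `bytes` is a repetition of its first
// `widthBits / 8` bytes, i.e. the constant is a splat at that width.
bool isSplatOfWidth(const std::vector<uint8_t> &bytes, unsigned widthBits) {
  assert(widthBits % 8 == 0 && "width must be a whole number of bytes");
  const std::size_t w = widthBits / 8;
  if (w == 0 || bytes.size() % w != 0)
    return false;
  for (std::size_t i = w; i < bytes.size(); ++i)
    if (bytes[i] != bytes[i % w])
      return false;
  return true;
}

// Hypothetical helper: first candidate width (tried in the given order)
// at which the constant splats, or 0 if none does.
unsigned smallestSplatWidth(const std::vector<uint8_t> &bytes,
                            const std::vector<unsigned> &candidates) {
  for (unsigned width : candidates)
    if (isSplatOfWidth(bytes, width))
      return width;
  return 0;
}

// Bitwise AND applied independently to the two 32-bit halves of x.
uint64_t and32x2(uint64_t x, uint32_t c) {
  uint32_t lo = static_cast<uint32_t>(x) & c;
  uint32_t hi = static_cast<uint32_t>(x >> 32) & c;
  return (static_cast<uint64_t>(hi) << 32) | lo;
}

// Addition applied independently to the two 32-bit halves of x;
// each lane wraps on overflow and no carry reaches the upper lane.
uint64_t add32x2(uint64_t x, uint32_t c) {
  uint32_t lo = static_cast<uint32_t>(x) + c;
  uint32_t hi = static_cast<uint32_t>(x >> 32) + c;
  return (static_cast<uint64_t>(hi) << 32) | lo;
}
```

With `x = 0x00000000FFFFFFFF` and the 32-bit splat of `1`, the lane-wise AND agrees with the 64-bit AND against `0x0000000100000001`, while the lane-wise add does not agree with the 64-bit add, which is the distinction RKSimon draws between bitwise ops and VPADDD/VPADDQ.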

llvm/lib/Target/X86/X86InstrFoldTables.h

	[... 41 lines not shown ...]

	// Look up the memory folding table entry for folding a load or store with			// Look up the memory folding table entry for folding a load or store with
	// operand OpNum.			// operand OpNum.
	const X86MemoryFoldTableEntry *lookupFoldTable(unsigned RegOp, unsigned OpNum);			const X86MemoryFoldTableEntry *lookupFoldTable(unsigned RegOp, unsigned OpNum);

	// Look up the memory unfolding table entry for this instruction.			// Look up the memory unfolding table entry for this instruction.
	const X86MemoryFoldTableEntry *lookupUnfoldTable(unsigned MemOp);			const X86MemoryFoldTableEntry *lookupUnfoldTable(unsigned MemOp);

				// Look up the broadcast memory folding table entry for this instruction from
				// the regular memory instruction.
				const X86MemoryFoldTableEntry *lookupBroadcastFoldTable(unsigned MemOp,
				unsigned BroadcastBits);

	} // namespace llvm			} // namespace llvm

	#endif			#endif

llvm/lib/Target/X86/X86InstrFoldTables.cpp

[... 24 lines not shown ...]
#include "X86GenFoldTables.inc"		#include "X86GenFoldTables.inc"
static const X86MemoryFoldTableEntry BroadcastFoldTable2[] = {		static const X86MemoryFoldTableEntry BroadcastFoldTable2[] = {
{ X86::VADDPDZ128rr, X86::VADDPDZ128rmb, TB_BCAST_SD },		{ X86::VADDPDZ128rr, X86::VADDPDZ128rmb, TB_BCAST_SD },
{ X86::VADDPDZ256rr, X86::VADDPDZ256rmb, TB_BCAST_SD },		{ X86::VADDPDZ256rr, X86::VADDPDZ256rmb, TB_BCAST_SD },
{ X86::VADDPDZrr, X86::VADDPDZrmb, TB_BCAST_SD },		{ X86::VADDPDZrr, X86::VADDPDZrmb, TB_BCAST_SD },
{ X86::VADDPSZ128rr, X86::VADDPSZ128rmb, TB_BCAST_SS },		{ X86::VADDPSZ128rr, X86::VADDPSZ128rmb, TB_BCAST_SS },
{ X86::VADDPSZ256rr, X86::VADDPSZ256rmb, TB_BCAST_SS },		{ X86::VADDPSZ256rr, X86::VADDPSZ256rmb, TB_BCAST_SS },
{ X86::VADDPSZrr, X86::VADDPSZrmb, TB_BCAST_SS },		{ X86::VADDPSZrr, X86::VADDPSZrmb, TB_BCAST_SS },
		{ X86::VANDNPDZ128rr, X86::VANDNPDZ128rmb, TB_BCAST_SD },
		{ X86::VANDNPDZ256rr, X86::VANDNPDZ256rmb, TB_BCAST_SD },
		{ X86::VANDNPDZrr, X86::VANDNPDZrmb, TB_BCAST_SD },
		{ X86::VANDNPSZ128rr, X86::VANDNPSZ128rmb, TB_BCAST_SS },
		{ X86::VANDNPSZ256rr, X86::VANDNPSZ256rmb, TB_BCAST_SS },
		pengfei: Missing `X86::VANDNPSZrr`
		RKSimon (author): Good catch!
		{ X86::VANDNPSZrr, X86::VANDNPSZrmb, TB_BCAST_SS },
		{ X86::VANDPDZ128rr, X86::VANDPDZ128rmb, TB_BCAST_SD },
		{ X86::VANDPDZ256rr, X86::VANDPDZ256rmb, TB_BCAST_SD },
		{ X86::VANDPDZrr, X86::VANDPDZrmb, TB_BCAST_SD },
		{ X86::VANDPSZ128rr, X86::VANDPSZ128rmb, TB_BCAST_SS },
		{ X86::VANDPSZ256rr, X86::VANDPSZ256rmb, TB_BCAST_SS },
		{ X86::VANDPSZrr, X86::VANDPSZrmb, TB_BCAST_SS },
{ X86::VCMPPDZ128rri, X86::VCMPPDZ128rmbi, TB_BCAST_SD },		{ X86::VCMPPDZ128rri, X86::VCMPPDZ128rmbi, TB_BCAST_SD },
{ X86::VCMPPDZ256rri, X86::VCMPPDZ256rmbi, TB_BCAST_SD },		{ X86::VCMPPDZ256rri, X86::VCMPPDZ256rmbi, TB_BCAST_SD },
{ X86::VCMPPDZrri, X86::VCMPPDZrmbi, TB_BCAST_SD },		{ X86::VCMPPDZrri, X86::VCMPPDZrmbi, TB_BCAST_SD },
{ X86::VCMPPSZ128rri, X86::VCMPPSZ128rmbi, TB_BCAST_SS },		{ X86::VCMPPSZ128rri, X86::VCMPPSZ128rmbi, TB_BCAST_SS },
{ X86::VCMPPSZ256rri, X86::VCMPPSZ256rmbi, TB_BCAST_SS },		{ X86::VCMPPSZ256rri, X86::VCMPPSZ256rmbi, TB_BCAST_SS },
{ X86::VCMPPSZrri, X86::VCMPPSZrmbi, TB_BCAST_SS },		{ X86::VCMPPSZrri, X86::VCMPPSZrmbi, TB_BCAST_SS },
{ X86::VDIVPDZ128rr, X86::VDIVPDZ128rmb, TB_BCAST_SD },		{ X86::VDIVPDZ128rr, X86::VDIVPDZ128rmb, TB_BCAST_SD },
{ X86::VDIVPDZ256rr, X86::VDIVPDZ256rmb, TB_BCAST_SD },		{ X86::VDIVPDZ256rr, X86::VDIVPDZ256rmb, TB_BCAST_SD },
[... 26 lines not shown; still inside BroadcastFoldTable2 ...]
{ X86::VMINPSZ256rr, X86::VMINPSZ256rmb, TB_BCAST_SS },		{ X86::VMINPSZ256rr, X86::VMINPSZ256rmb, TB_BCAST_SS },
{ X86::VMINPSZrr, X86::VMINPSZrmb, TB_BCAST_SS },		{ X86::VMINPSZrr, X86::VMINPSZrmb, TB_BCAST_SS },
{ X86::VMULPDZ128rr, X86::VMULPDZ128rmb, TB_BCAST_SD },		{ X86::VMULPDZ128rr, X86::VMULPDZ128rmb, TB_BCAST_SD },
{ X86::VMULPDZ256rr, X86::VMULPDZ256rmb, TB_BCAST_SD },		{ X86::VMULPDZ256rr, X86::VMULPDZ256rmb, TB_BCAST_SD },
{ X86::VMULPDZrr, X86::VMULPDZrmb, TB_BCAST_SD },		{ X86::VMULPDZrr, X86::VMULPDZrmb, TB_BCAST_SD },
{ X86::VMULPSZ128rr, X86::VMULPSZ128rmb, TB_BCAST_SS },		{ X86::VMULPSZ128rr, X86::VMULPSZ128rmb, TB_BCAST_SS },
{ X86::VMULPSZ256rr, X86::VMULPSZ256rmb, TB_BCAST_SS },		{ X86::VMULPSZ256rr, X86::VMULPSZ256rmb, TB_BCAST_SS },
{ X86::VMULPSZrr, X86::VMULPSZrmb, TB_BCAST_SS },		{ X86::VMULPSZrr, X86::VMULPSZrmb, TB_BCAST_SS },
		{ X86::VORPDZ128rr, X86::VORPDZ128rmb, TB_BCAST_SD },
		{ X86::VORPDZ256rr, X86::VORPDZ256rmb, TB_BCAST_SD },
		{ X86::VORPDZrr, X86::VORPDZrmb, TB_BCAST_SD },
		{ X86::VORPSZ128rr, X86::VORPSZ128rmb, TB_BCAST_SS },
		{ X86::VORPSZ256rr, X86::VORPSZ256rmb, TB_BCAST_SS },
		{ X86::VORPSZrr, X86::VORPSZrmb, TB_BCAST_SS },
{ X86::VPADDDZ128rr, X86::VPADDDZ128rmb, TB_BCAST_D },		{ X86::VPADDDZ128rr, X86::VPADDDZ128rmb, TB_BCAST_D },
{ X86::VPADDDZ256rr, X86::VPADDDZ256rmb, TB_BCAST_D },		{ X86::VPADDDZ256rr, X86::VPADDDZ256rmb, TB_BCAST_D },
{ X86::VPADDDZrr, X86::VPADDDZrmb, TB_BCAST_D },		{ X86::VPADDDZrr, X86::VPADDDZrmb, TB_BCAST_D },
{ X86::VPADDQZ128rr, X86::VPADDQZ128rmb, TB_BCAST_Q },		{ X86::VPADDQZ128rr, X86::VPADDQZ128rmb, TB_BCAST_Q },
{ X86::VPADDQZ256rr, X86::VPADDQZ256rmb, TB_BCAST_Q },		{ X86::VPADDQZ256rr, X86::VPADDQZ256rmb, TB_BCAST_Q },
{ X86::VPADDQZrr, X86::VPADDQZrmb, TB_BCAST_Q },		{ X86::VPADDQZrr, X86::VPADDQZrmb, TB_BCAST_Q },
{ X86::VPANDDZ128rr, X86::VPANDDZ128rmb, TB_BCAST_D },		{ X86::VPANDDZ128rr, X86::VPANDDZ128rmb, TB_BCAST_D },
{ X86::VPANDDZ256rr, X86::VPANDDZ256rmb, TB_BCAST_D },		{ X86::VPANDDZ256rr, X86::VPANDDZ256rmb, TB_BCAST_D },
[... 86 lines not shown; still inside BroadcastFoldTable2 ...]
{ X86::VPXORQZ256rr, X86::VPXORQZ256rmb, TB_BCAST_Q },		{ X86::VPXORQZ256rr, X86::VPXORQZ256rmb, TB_BCAST_Q },
{ X86::VPXORQZrr, X86::VPXORQZrmb, TB_BCAST_Q },		{ X86::VPXORQZrr, X86::VPXORQZrmb, TB_BCAST_Q },
{ X86::VSUBPDZ128rr, X86::VSUBPDZ128rmb, TB_BCAST_SD },		{ X86::VSUBPDZ128rr, X86::VSUBPDZ128rmb, TB_BCAST_SD },
{ X86::VSUBPDZ256rr, X86::VSUBPDZ256rmb, TB_BCAST_SD },		{ X86::VSUBPDZ256rr, X86::VSUBPDZ256rmb, TB_BCAST_SD },
{ X86::VSUBPDZrr, X86::VSUBPDZrmb, TB_BCAST_SD },		{ X86::VSUBPDZrr, X86::VSUBPDZrmb, TB_BCAST_SD },
{ X86::VSUBPSZ128rr, X86::VSUBPSZ128rmb, TB_BCAST_SS },		{ X86::VSUBPSZ128rr, X86::VSUBPSZ128rmb, TB_BCAST_SS },
{ X86::VSUBPSZ256rr, X86::VSUBPSZ256rmb, TB_BCAST_SS },		{ X86::VSUBPSZ256rr, X86::VSUBPSZ256rmb, TB_BCAST_SS },
{ X86::VSUBPSZrr, X86::VSUBPSZrmb, TB_BCAST_SS },		{ X86::VSUBPSZrr, X86::VSUBPSZrmb, TB_BCAST_SS },
		{ X86::VXORPDZ128rr, X86::VXORPDZ128rmb, TB_BCAST_SD },
		{ X86::VXORPDZ256rr, X86::VXORPDZ256rmb, TB_BCAST_SD },
		{ X86::VXORPDZrr, X86::VXORPDZrmb, TB_BCAST_SD },
		{ X86::VXORPSZ128rr, X86::VXORPSZ128rmb, TB_BCAST_SS },
		{ X86::VXORPSZ256rr, X86::VXORPSZ256rmb, TB_BCAST_SS },
		{ X86::VXORPSZrr, X86::VXORPSZrmb, TB_BCAST_SS },
};		};

static const X86MemoryFoldTableEntry BroadcastFoldTable3[] = {		static const X86MemoryFoldTableEntry BroadcastFoldTable3[] = {
{ X86::VFMADD132PDZ128r, X86::VFMADD132PDZ128mb, TB_BCAST_SD },		{ X86::VFMADD132PDZ128r, X86::VFMADD132PDZ128mb, TB_BCAST_SD },
{ X86::VFMADD132PDZ256r, X86::VFMADD132PDZ256mb, TB_BCAST_SD },		{ X86::VFMADD132PDZ256r, X86::VFMADD132PDZ256mb, TB_BCAST_SD },
{ X86::VFMADD132PDZr, X86::VFMADD132PDZmb, TB_BCAST_SD },		{ X86::VFMADD132PDZr, X86::VFMADD132PDZmb, TB_BCAST_SD },
{ X86::VFMADD132PSZ128r, X86::VFMADD132PSZ128mb, TB_BCAST_SS },		{ X86::VFMADD132PSZ128r, X86::VFMADD132PSZ128mb, TB_BCAST_SS },
{ X86::VFMADD132PSZ256r, X86::VFMADD132PSZ256mb, TB_BCAST_SS },		{ X86::VFMADD132PSZ256r, X86::VFMADD132PSZ256mb, TB_BCAST_SS },
[... 103 lines not shown; still inside BroadcastFoldTable3 ...]
{ X86::VPTERNLOGDZ128rri, X86::VPTERNLOGDZ128rmbi, TB_BCAST_D },		{ X86::VPTERNLOGDZ128rri, X86::VPTERNLOGDZ128rmbi, TB_BCAST_D },
{ X86::VPTERNLOGDZ256rri, X86::VPTERNLOGDZ256rmbi, TB_BCAST_D },		{ X86::VPTERNLOGDZ256rri, X86::VPTERNLOGDZ256rmbi, TB_BCAST_D },
{ X86::VPTERNLOGDZrri, X86::VPTERNLOGDZrmbi, TB_BCAST_D },		{ X86::VPTERNLOGDZrri, X86::VPTERNLOGDZrmbi, TB_BCAST_D },
{ X86::VPTERNLOGQZ128rri, X86::VPTERNLOGQZ128rmbi, TB_BCAST_Q },		{ X86::VPTERNLOGQZ128rri, X86::VPTERNLOGQZ128rmbi, TB_BCAST_Q },
{ X86::VPTERNLOGQZ256rri, X86::VPTERNLOGQZ256rmbi, TB_BCAST_Q },		{ X86::VPTERNLOGQZ256rri, X86::VPTERNLOGQZ256rmbi, TB_BCAST_Q },
{ X86::VPTERNLOGQZrri, X86::VPTERNLOGQZrmbi, TB_BCAST_Q },		{ X86::VPTERNLOGQZrri, X86::VPTERNLOGQZrmbi, TB_BCAST_Q },
};		};

		// Table to map instructions safe to broadcast using a different width from the
		// element width.
		static const X86MemoryFoldTableEntry BroadcastSizeFoldTable2[] = {
		{ X86::VANDNPDZ128rr, X86::VANDNPSZ128rmb, TB_BCAST_SS },
		{ X86::VANDNPDZ256rr, X86::VANDNPSZ256rmb, TB_BCAST_SS },
		{ X86::VANDNPDZrr, X86::VANDNPSZrmb, TB_BCAST_SS },
		{ X86::VANDNPSZ128rr, X86::VANDNPDZ128rmb, TB_BCAST_SD },
		{ X86::VANDNPSZ256rr, X86::VANDNPDZ256rmb, TB_BCAST_SD },
		{ X86::VANDNPSZrr, X86::VANDNPDZrmb, TB_BCAST_SD },
		{ X86::VANDPDZ128rr, X86::VANDPSZ128rmb, TB_BCAST_SS },
		{ X86::VANDPDZ256rr, X86::VANDPSZ256rmb, TB_BCAST_SS },
		{ X86::VANDPDZrr, X86::VANDPSZrmb, TB_BCAST_SS },
		{ X86::VANDPSZ128rr, X86::VANDPDZ128rmb, TB_BCAST_SD },
		{ X86::VANDPSZ256rr, X86::VANDPDZ256rmb, TB_BCAST_SD },
		{ X86::VANDPSZrr, X86::VANDPDZrmb, TB_BCAST_SD },
		{ X86::VORPDZ128rr, X86::VORPSZ128rmb, TB_BCAST_SS },
		{ X86::VORPDZ256rr, X86::VORPSZ256rmb, TB_BCAST_SS },
		{ X86::VORPDZrr, X86::VORPSZrmb, TB_BCAST_SS },
		{ X86::VORPSZ128rr, X86::VORPDZ128rmb, TB_BCAST_SD },
		{ X86::VORPSZ256rr, X86::VORPDZ256rmb, TB_BCAST_SD },
		{ X86::VORPSZrr, X86::VORPDZrmb, TB_BCAST_SD },
		{ X86::VPANDDZ128rr, X86::VPANDQZ128rmb, TB_BCAST_Q },
		{ X86::VPANDDZ256rr, X86::VPANDQZ256rmb, TB_BCAST_Q },
		{ X86::VPANDDZrr, X86::VPANDQZrmb, TB_BCAST_Q },
		{ X86::VPANDNDZ128rr, X86::VPANDNQZ128rmb, TB_BCAST_Q },
		{ X86::VPANDNDZ256rr, X86::VPANDNQZ256rmb, TB_BCAST_Q },
		{ X86::VPANDNDZrr, X86::VPANDNQZrmb, TB_BCAST_Q },
		{ X86::VPANDNQZ128rr, X86::VPANDNDZ128rmb, TB_BCAST_D },
		{ X86::VPANDNQZ256rr, X86::VPANDNDZ256rmb, TB_BCAST_D },
		{ X86::VPANDNQZrr, X86::VPANDNDZrmb, TB_BCAST_D },
		{ X86::VPANDQZ128rr, X86::VPANDDZ128rmb, TB_BCAST_D },
		{ X86::VPANDQZ256rr, X86::VPANDDZ256rmb, TB_BCAST_D },
		{ X86::VPANDQZrr, X86::VPANDDZrmb, TB_BCAST_D },
		{ X86::VPORDZ128rr, X86::VPORQZ128rmb, TB_BCAST_Q },
		{ X86::VPORDZ256rr, X86::VPORQZ256rmb, TB_BCAST_Q },
		{ X86::VPORDZrr, X86::VPORQZrmb, TB_BCAST_Q },
		{ X86::VPORQZ128rr, X86::VPORDZ128rmb, TB_BCAST_D },
		{ X86::VPORQZ256rr, X86::VPORDZ256rmb, TB_BCAST_D },
		{ X86::VPORQZrr, X86::VPORDZrmb, TB_BCAST_D },
		{ X86::VPXORDZ128rr, X86::VPXORQZ128rmb, TB_BCAST_Q },
		{ X86::VPXORDZ256rr, X86::VPXORQZ256rmb, TB_BCAST_Q },
		{ X86::VPXORDZrr, X86::VPXORQZrmb, TB_BCAST_Q },
		{ X86::VPXORQZ128rr, X86::VPXORDZ128rmb, TB_BCAST_D },
		{ X86::VPXORQZ256rr, X86::VPXORDZ256rmb, TB_BCAST_D },
		{ X86::VPXORQZrr, X86::VPXORDZrmb, TB_BCAST_D },
		{ X86::VXORPDZ128rr, X86::VXORPSZ128rmb, TB_BCAST_SS },
		{ X86::VXORPDZ256rr, X86::VXORPSZ256rmb, TB_BCAST_SS },
		{ X86::VXORPDZrr, X86::VXORPSZrmb, TB_BCAST_SS },
		{ X86::VXORPSZ128rr, X86::VXORPDZ128rmb, TB_BCAST_SD },
		{ X86::VXORPSZ256rr, X86::VXORPDZ256rmb, TB_BCAST_SD },
		{ X86::VXORPSZrr, X86::VXORPDZrmb, TB_BCAST_SD },
		};

		static const X86MemoryFoldTableEntry BroadcastSizeFoldTable3[] = {
		{ X86::VPTERNLOGDZ128rri, X86::VPTERNLOGQZ128rmbi, TB_BCAST_Q },
		{ X86::VPTERNLOGDZ256rri, X86::VPTERNLOGQZ256rmbi, TB_BCAST_Q },
		{ X86::VPTERNLOGDZrri, X86::VPTERNLOGQZrmbi, TB_BCAST_Q },
		{ X86::VPTERNLOGQZ128rri, X86::VPTERNLOGDZ128rmbi, TB_BCAST_D },
		{ X86::VPTERNLOGQZ256rri, X86::VPTERNLOGDZ256rmbi, TB_BCAST_D },
		{ X86::VPTERNLOGQZrri, X86::VPTERNLOGDZrmbi, TB_BCAST_D },
		};

static const X86MemoryFoldTableEntry *		static const X86MemoryFoldTableEntry *
lookupFoldTableImpl(ArrayRef<X86MemoryFoldTableEntry> Table, unsigned RegOp) {		lookupFoldTableImpl(ArrayRef<X86MemoryFoldTableEntry> Table, unsigned RegOp) {
#ifndef NDEBUG		#ifndef NDEBUG
// Make sure the tables are sorted.		// Make sure the tables are sorted.
static std::atomic<bool> FoldTablesChecked(false);		static std::atomic<bool> FoldTablesChecked(false);
if (!FoldTablesChecked.load(std::memory_order_relaxed)) {		if (!FoldTablesChecked.load(std::memory_order_relaxed)) {
assert(llvm::is_sorted(MemoryFoldTable2Addr) &&		assert(llvm::is_sorted(MemoryFoldTable2Addr) &&
std::adjacent_find(std::begin(MemoryFoldTable2Addr),		std::adjacent_find(std::begin(MemoryFoldTable2Addr),
[... 30 lines not shown; continuing the BroadcastFoldTable2 sortedness assert ...]
std::end(BroadcastFoldTable2)) ==		std::end(BroadcastFoldTable2)) ==
std::end(BroadcastFoldTable2) &&		std::end(BroadcastFoldTable2) &&
"BroadcastFoldTable2 is not sorted and unique!");		"BroadcastFoldTable2 is not sorted and unique!");
assert(llvm::is_sorted(BroadcastFoldTable3) &&		assert(llvm::is_sorted(BroadcastFoldTable3) &&
std::adjacent_find(std::begin(BroadcastFoldTable3),		std::adjacent_find(std::begin(BroadcastFoldTable3),
std::end(BroadcastFoldTable3)) ==		std::end(BroadcastFoldTable3)) ==
std::end(BroadcastFoldTable3) &&		std::end(BroadcastFoldTable3) &&
"BroadcastFoldTable3 is not sorted and unique!");		"BroadcastFoldTable3 is not sorted and unique!");
		assert(llvm::is_sorted(BroadcastSizeFoldTable2) &&
		std::adjacent_find(std::begin(BroadcastSizeFoldTable2),
		std::end(BroadcastSizeFoldTable2)) ==
		std::end(BroadcastSizeFoldTable2) &&
		"BroadcastSizeFoldTable2 is not sorted and unique!");
		assert(llvm::is_sorted(BroadcastSizeFoldTable3) &&
		std::adjacent_find(std::begin(BroadcastSizeFoldTable3),
		std::end(BroadcastSizeFoldTable3)) ==
		std::end(BroadcastSizeFoldTable3) &&
		"BroadcastSizeFoldTable3 is not sorted and unique!");
FoldTablesChecked.store(true, std::memory_order_relaxed);		FoldTablesChecked.store(true, std::memory_order_relaxed);
}		}
#endif		#endif

const X86MemoryFoldTableEntry *Data = llvm::lower_bound(Table, RegOp);		const X86MemoryFoldTableEntry *Data = llvm::lower_bound(Table, RegOp);
if (Data != Table.end() && Data->KeyOp == RegOp &&		if (Data != Table.end() && Data->KeyOp == RegOp &&
!(Data->Flags & TB_NO_FORWARD))		!(Data->Flags & TB_NO_FORWARD))
return Data;		return Data;
[... 89 lines not shown; inside llvm::lookupUnfoldTable(unsigned MemOp) ...]
static X86MemUnfoldTable MemUnfoldTable;		static X86MemUnfoldTable MemUnfoldTable;
auto &Table = MemUnfoldTable.Table;		auto &Table = MemUnfoldTable.Table;
auto I = llvm::lower_bound(Table, MemOp);		auto I = llvm::lower_bound(Table, MemOp);
if (I != Table.end() && I->KeyOp == MemOp)		if (I != Table.end() && I->KeyOp == MemOp)
return &*I;		return &*I;
return nullptr;		return nullptr;
}		}

		namespace {

		// This class stores the memory -> broadcast folding tables. It is instantiated
		// as a function scope static variable to lazily init the folding table.
		struct X86MemBroadcastFoldTable {
		// Stores memory broadcast folding tables entries sorted by opcode.
		std::vector<X86MemoryFoldTableEntry> Table;

		X86MemBroadcastFoldTable() {
		// Broadcast tables.
		for (const X86MemoryFoldTableEntry &Reg2Bcst : BroadcastFoldTable2) {
		unsigned RegOp = Reg2Bcst.KeyOp;
		unsigned BcstOp = Reg2Bcst.DstOp;
		if (const X86MemoryFoldTableEntry *Reg2Mem = lookupFoldTable(RegOp, 2)) {
		unsigned MemOp = Reg2Mem->DstOp;
		uint16_t Flags = Reg2Mem->Flags | Reg2Bcst.Flags | TB_INDEX_2 |
		TB_FOLDED_LOAD | TB_FOLDED_BCAST;
		Table.push_back({MemOp, BcstOp, Flags});
		}
		}
		for (const X86MemoryFoldTableEntry &Reg2Bcst : BroadcastSizeFoldTable2) {
		unsigned RegOp = Reg2Bcst.KeyOp;
		unsigned BcstOp = Reg2Bcst.DstOp;
		if (const X86MemoryFoldTableEntry *Reg2Mem = lookupFoldTable(RegOp, 2)) {
		unsigned MemOp = Reg2Mem->DstOp;
		uint16_t Flags = Reg2Mem->Flags | Reg2Bcst.Flags | TB_INDEX_2 |
		TB_FOLDED_LOAD | TB_FOLDED_BCAST;
		Table.push_back({MemOp, BcstOp, Flags});
		}
		}

		for (const X86MemoryFoldTableEntry &Reg2Bcst : BroadcastFoldTable3) {
		unsigned RegOp = Reg2Bcst.KeyOp;
		unsigned BcstOp = Reg2Bcst.DstOp;
		if (const X86MemoryFoldTableEntry *Reg2Mem = lookupFoldTable(RegOp, 3)) {
		unsigned MemOp = Reg2Mem->DstOp;
		uint16_t Flags = Reg2Mem->Flags | Reg2Bcst.Flags | TB_INDEX_3 |
		TB_FOLDED_LOAD | TB_FOLDED_BCAST;
		Table.push_back({MemOp, BcstOp, Flags});
		}
		}
		for (const X86MemoryFoldTableEntry &Reg2Bcst : BroadcastSizeFoldTable3) {
		unsigned RegOp = Reg2Bcst.KeyOp;
		unsigned BcstOp = Reg2Bcst.DstOp;
		if (const X86MemoryFoldTableEntry *Reg2Mem = lookupFoldTable(RegOp, 3)) {
		unsigned MemOp = Reg2Mem->DstOp;
		uint16_t Flags = Reg2Mem->Flags | Reg2Bcst.Flags | TB_INDEX_3 |
		TB_FOLDED_LOAD | TB_FOLDED_BCAST;
		Table.push_back({MemOp, BcstOp, Flags});
		}
		}

		// Sort the memory->broadcast fold table.
		array_pod_sort(Table.begin(), Table.end());
		}
		};
		} // namespace

		static bool matchBroadcastSize(const X86MemoryFoldTableEntry &Entry,
		unsigned BroadcastBits) {
		switch (Entry.Flags & TB_BCAST_MASK) {
		case TB_BCAST_SD:
		case TB_BCAST_Q:
		return BroadcastBits == 64;
		case TB_BCAST_SS:
		case TB_BCAST_D:
		return BroadcastBits == 32;
		}
		return false;
		}

		const X86MemoryFoldTableEntry *
		llvm::lookupBroadcastFoldTable(unsigned MemOp, unsigned BroadcastBits) {
		static X86MemBroadcastFoldTable MemBroadcastFoldTable;
		auto &Table = MemBroadcastFoldTable.Table;
		for (auto I = llvm::lower_bound(Table, MemOp);
		I != Table.end() && I->KeyOp == MemOp; ++I) {
		if (matchBroadcastSize(*I, BroadcastBits))
		return &*I;
		}
		return nullptr;
		}
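
The table construction and lookup above follow a common pattern: compose a reg -> mem mapping with a reg -> broadcast mapping into a mem -> broadcast table, sort it by key, then binary-search to the first entry for a key and scan the run of equal keys for the requested broadcast width. The following is a minimal standalone sketch of that pattern only. `FoldEntry`, `buildMemToBcst` and `lookupBcst` are hypothetical stand-ins using plain integers for opcodes, not the LLVM types or functions, and the flag handling of the real tables is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical, simplified stand-in for a fold table entry.
struct FoldEntry {
  unsigned KeyOp;     // opcode used as the lookup key
  unsigned DstOp;     // opcode the key maps to
  unsigned BcastBits; // broadcast element width in bits
};

static bool byKey(const FoldEntry &a, const FoldEntry &b) {
  return a.KeyOp < b.KeyOp;
}

// Compose reg->mem with reg->bcst into a mem->bcst table sorted by key.
// Entries whose register form has no memory-folded form are skipped.
std::vector<FoldEntry>
buildMemToBcst(const std::map<unsigned, unsigned> &regToMem,
               const std::vector<FoldEntry> &regToBcst) {
  std::vector<FoldEntry> table;
  for (const FoldEntry &e : regToBcst) {
    auto it = regToMem.find(e.KeyOp);
    if (it == regToMem.end())
      continue;
    table.push_back({it->second, e.DstOp, e.BcastBits});
  }
  std::stable_sort(table.begin(), table.end(), byKey);
  return table;
}

// Several entries may share a key (one per broadcast width), so jump to
// the first with lower_bound and scan the run for the matching width.
const FoldEntry *lookupBcst(const std::vector<FoldEntry> &table,
                            unsigned memOp, unsigned bits) {
  assert(std::is_sorted(table.begin(), table.end(), byKey) &&
         "table must be sorted by key");
  auto it = std::lower_bound(
      table.begin(), table.end(), memOp,
      [](const FoldEntry &e, unsigned key) { return e.KeyOp < key; });
  for (; it != table.end() && it->KeyOp == memOp; ++it)
    if (it->BcastBits == bits)
      return &*it;
  return nullptr;
}
```

Keeping one entry per (memory opcode, broadcast width) pair is what lets a single memory opcode resolve to either a 32-bit or a 64-bit broadcast form, at the cost of a short linear scan after the binary search.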

llvm/lib/Target/X86/X86TargetMachine.cpp

[... 568 lines not shown; inside X86PassConfig::addPreEmitPass() ...]

addPass(createX86IssueVZeroUpperPass());		addPass(createX86IssueVZeroUpperPass());

if (getOptLevel() != CodeGenOpt::None) {		if (getOptLevel() != CodeGenOpt::None) {
addPass(createX86FixupBWInsts());		addPass(createX86FixupBWInsts());
addPass(createX86PadShortFunctions());		addPass(createX86PadShortFunctions());
addPass(createX86FixupLEAs());		addPass(createX86FixupLEAs());
addPass(createX86FixupInstTuning());		addPass(createX86FixupInstTuning());
		addPass(createX86FixupVectorConstants());
}		}
addPass(createX86EvexToVexInsts());		addPass(createX86EvexToVexInsts());
addPass(createX86DiscriminateMemOpsPass());		addPass(createX86DiscriminateMemOpsPass());
addPass(createX86InsertPrefetchPass());		addPass(createX86InsertPrefetchPass());
addPass(createX86InsertX87waitPass());		addPass(createX86InsertX87waitPass());
}		}

void X86PassConfig::addPreEmitPass2() {		void X86PassConfig::addPreEmitPass2() {
[... 77 lines not shown ...]

llvm/test/CodeGen/X86/avx512-calling-conv.ll

	[... 280 lines not shown ...]
	; SKX-LABEL: test7a:			; SKX-LABEL: test7a:
	; SKX: ## %bb.0:			; SKX: ## %bb.0:
	; SKX-NEXT: pushq %rax			; SKX-NEXT: pushq %rax
	; SKX-NEXT: .cfi_def_cfa_offset 16			; SKX-NEXT: .cfi_def_cfa_offset 16
	; SKX-NEXT: vpcmpgtd %ymm1, %ymm0, %k0			; SKX-NEXT: vpcmpgtd %ymm1, %ymm0, %k0
	; SKX-NEXT: vpmovm2w %k0, %xmm0			; SKX-NEXT: vpmovm2w %k0, %xmm0
	; SKX-NEXT: vzeroupper			; SKX-NEXT: vzeroupper
	; SKX-NEXT: callq _func8xi1			; SKX-NEXT: callq _func8xi1
	; SKX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; SKX-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; SKX-NEXT: popq %rax			; SKX-NEXT: popq %rax
	; SKX-NEXT: retq			; SKX-NEXT: retq
	;			;
	; KNL_X32-LABEL: test7a:			; KNL_X32-LABEL: test7a:
	; KNL_X32: ## %bb.0:			; KNL_X32: ## %bb.0:
	; KNL_X32-NEXT: subl $12, %esp			; KNL_X32-NEXT: subl $12, %esp
	; KNL_X32-NEXT: .cfi_def_cfa_offset 16			; KNL_X32-NEXT: .cfi_def_cfa_offset 16
	; KNL_X32-NEXT: vpcmpgtd %ymm1, %ymm0, %ymm0			; KNL_X32-NEXT: vpcmpgtd %ymm1, %ymm0, %ymm0
	; KNL_X32-NEXT: vpmovdw %zmm0, %ymm0			; KNL_X32-NEXT: vpmovdw %zmm0, %ymm0
	; KNL_X32-NEXT: ## kill: def $xmm0 killed $xmm0 killed $ymm0			; KNL_X32-NEXT: ## kill: def $xmm0 killed $xmm0 killed $ymm0
	; KNL_X32-NEXT: calll _func8xi1			; KNL_X32-NEXT: calll _func8xi1
	; KNL_X32-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0			; KNL_X32-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
	; KNL_X32-NEXT: addl $12, %esp			; KNL_X32-NEXT: addl $12, %esp
	; KNL_X32-NEXT: retl			; KNL_X32-NEXT: retl
	;			;
	; FASTISEL-LABEL: test7a:			; FASTISEL-LABEL: test7a:
	; FASTISEL: ## %bb.0:			; FASTISEL: ## %bb.0:
	; FASTISEL-NEXT: pushq %rax			; FASTISEL-NEXT: pushq %rax
	; FASTISEL-NEXT: .cfi_def_cfa_offset 16			; FASTISEL-NEXT: .cfi_def_cfa_offset 16
	; FASTISEL-NEXT: vpcmpgtd %ymm1, %ymm0, %k0			; FASTISEL-NEXT: vpcmpgtd %ymm1, %ymm0, %k0
	; FASTISEL-NEXT: vpmovm2w %k0, %xmm0			; FASTISEL-NEXT: vpmovm2w %k0, %xmm0
	; FASTISEL-NEXT: vzeroupper			; FASTISEL-NEXT: vzeroupper
	; FASTISEL-NEXT: callq _func8xi1			; FASTISEL-NEXT: callq _func8xi1
	; FASTISEL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; FASTISEL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; FASTISEL-NEXT: popq %rax			; FASTISEL-NEXT: popq %rax
	; FASTISEL-NEXT: retq			; FASTISEL-NEXT: retq
	%cmpRes = icmp sgt <8 x i32>%a, %b			%cmpRes = icmp sgt <8 x i32>%a, %b
	%resi = call <8 x i1> @func8xi1(<8 x i1> %cmpRes)			%resi = call <8 x i1> @func8xi1(<8 x i1> %cmpRes)
	%res = and <8 x i1>%resi, <i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false>			%res = and <8 x i1>%resi, <i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false>
	ret <8 x i1> %res			ret <8 x i1> %res
	}			}

	[... 3,866 lines not shown ...]
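
In the updated CHECK lines, the `{1toN}` decoration on the memory operand gives the number of copies of the broadcast element needed to fill the vector register, so N is the vector width divided by the broadcast element width. As a small worked sketch (the helper names `broadcastCount` and `broadcastSuffix` are hypothetical and not part of LLVM), a 32-bit broadcast into a 128-bit xmm register gives `{1to4}`:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: number of broadcast elements needed to fill a
// vector of `vectorBits` bits with elements of `elementBits` bits.
unsigned broadcastCount(unsigned vectorBits, unsigned elementBits) {
  assert(elementBits != 0 && vectorBits % elementBits == 0 &&
         "element width must evenly divide the vector width");
  return vectorBits / elementBits;
}

// Hypothetical helper: the assembly decoration for that broadcast.
std::string broadcastSuffix(unsigned vectorBits, unsigned elementBits) {
  return "{1to" + std::to_string(broadcastCount(vectorBits, elementBits)) +
         "}";
}
```

The same arithmetic gives `{1to8}` for a 256-bit ymm register and `{1to16}` for a 512-bit zmm register with 32-bit elements.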

llvm/test/CodeGen/X86/avx512-ext.ll

	[... 2,889 lines not shown ...]
	define <64 x i8> @zext_64xi1_to_64xi8(<64 x i8> %x, <64 x i8> %y) #0 {			define <64 x i8> @zext_64xi1_to_64xi8(<64 x i8> %x, <64 x i8> %y) #0 {
	; KNL-LABEL: zext_64xi1_to_64xi8:			; KNL-LABEL: zext_64xi1_to_64xi8:
	; KNL: # %bb.0:			; KNL: # %bb.0:
	; KNL-NEXT: vextracti64x4 $1, %zmm1, %ymm2			; KNL-NEXT: vextracti64x4 $1, %zmm1, %ymm2
	; KNL-NEXT: vextracti64x4 $1, %zmm0, %ymm3			; KNL-NEXT: vextracti64x4 $1, %zmm0, %ymm3
	; KNL-NEXT: vpcmpeqb %ymm2, %ymm3, %ymm2			; KNL-NEXT: vpcmpeqb %ymm2, %ymm3, %ymm2
	; KNL-NEXT: vpcmpeqb %ymm1, %ymm0, %ymm0			; KNL-NEXT: vpcmpeqb %ymm1, %ymm0, %ymm0
	; KNL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; KNL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; KNL-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; KNL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; KNL-NEXT: retq			; KNL-NEXT: retq
	;			;
	; SKX-LABEL: zext_64xi1_to_64xi8:			; SKX-LABEL: zext_64xi1_to_64xi8:
	; SKX: # %bb.0:			; SKX: # %bb.0:
	; SKX-NEXT: vpcmpeqb %zmm1, %zmm0, %k1			; SKX-NEXT: vpcmpeqb %zmm1, %zmm0, %k1
	; SKX-NEXT: vmovdqu8 {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0 {%k1} {z}			; SKX-NEXT: vmovdqu8 {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0 {%k1} {z}
	; SKX-NEXT: retq			; SKX-NEXT: retq
	;			;
	; AVX512DQNOBW-LABEL: zext_64xi1_to_64xi8:			; AVX512DQNOBW-LABEL: zext_64xi1_to_64xi8:
	; AVX512DQNOBW: # %bb.0:			; AVX512DQNOBW: # %bb.0:
	; AVX512DQNOBW-NEXT: vextracti64x4 $1, %zmm1, %ymm2			; AVX512DQNOBW-NEXT: vextracti64x4 $1, %zmm1, %ymm2
	; AVX512DQNOBW-NEXT: vextracti64x4 $1, %zmm0, %ymm3			; AVX512DQNOBW-NEXT: vextracti64x4 $1, %zmm0, %ymm3
	; AVX512DQNOBW-NEXT: vpcmpeqb %ymm2, %ymm3, %ymm2			; AVX512DQNOBW-NEXT: vpcmpeqb %ymm2, %ymm3, %ymm2
	; AVX512DQNOBW-NEXT: vpcmpeqb %ymm1, %ymm0, %ymm0			; AVX512DQNOBW-NEXT: vpcmpeqb %ymm1, %ymm0, %ymm0
	; AVX512DQNOBW-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; AVX512DQNOBW-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; AVX512DQNOBW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512DQNOBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512DQNOBW-NEXT: retq			; AVX512DQNOBW-NEXT: retq
	%mask = icmp eq <64 x i8> %x, %y			%mask = icmp eq <64 x i8> %x, %y
	%1 = zext <64 x i1> %mask to <64 x i8>			%1 = zext <64 x i1> %mask to <64 x i8>
	ret <64 x i8> %1			ret <64 x i8> %1
	}			}

	define <32 x i16> @zext_32xi1_to_32xi16(<32 x i16> %x, <32 x i16> %y) #0 {			define <32 x i16> @zext_32xi1_to_32xi16(<32 x i16> %x, <32 x i16> %y) #0 {
	; KNL-LABEL: zext_32xi1_to_32xi16:			; KNL-LABEL: zext_32xi1_to_32xi16:
	; KNL: # %bb.0:			; KNL: # %bb.0:
	; KNL-NEXT: vextracti64x4 $1, %zmm1, %ymm2			; KNL-NEXT: vextracti64x4 $1, %zmm1, %ymm2
	; KNL-NEXT: vextracti64x4 $1, %zmm0, %ymm3			; KNL-NEXT: vextracti64x4 $1, %zmm0, %ymm3
	; KNL-NEXT: vpcmpeqw %ymm2, %ymm3, %ymm2			; KNL-NEXT: vpcmpeqw %ymm2, %ymm3, %ymm2
	; KNL-NEXT: vpcmpeqw %ymm1, %ymm0, %ymm0			; KNL-NEXT: vpcmpeqw %ymm1, %ymm0, %ymm0
	; KNL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; KNL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; KNL-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; KNL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; KNL-NEXT: retq			; KNL-NEXT: retq
	;			;
	; SKX-LABEL: zext_32xi1_to_32xi16:			; SKX-LABEL: zext_32xi1_to_32xi16:
	; SKX: # %bb.0:			; SKX: # %bb.0:
	; SKX-NEXT: vpcmpeqw %zmm1, %zmm0, %k0			; SKX-NEXT: vpcmpeqw %zmm1, %zmm0, %k0
	; SKX-NEXT: vpmovm2w %k0, %zmm0			; SKX-NEXT: vpmovm2w %k0, %zmm0
	; SKX-NEXT: vpsrlw $15, %zmm0, %zmm0			; SKX-NEXT: vpsrlw $15, %zmm0, %zmm0
	; SKX-NEXT: retq			; SKX-NEXT: retq
	;			;
	; AVX512DQNOBW-LABEL: zext_32xi1_to_32xi16:			; AVX512DQNOBW-LABEL: zext_32xi1_to_32xi16:
	; AVX512DQNOBW: # %bb.0:			; AVX512DQNOBW: # %bb.0:
	; AVX512DQNOBW-NEXT: vextracti64x4 $1, %zmm1, %ymm2			; AVX512DQNOBW-NEXT: vextracti64x4 $1, %zmm1, %ymm2
	; AVX512DQNOBW-NEXT: vextracti64x4 $1, %zmm0, %ymm3			; AVX512DQNOBW-NEXT: vextracti64x4 $1, %zmm0, %ymm3
	; AVX512DQNOBW-NEXT: vpcmpeqw %ymm2, %ymm3, %ymm2			; AVX512DQNOBW-NEXT: vpcmpeqw %ymm2, %ymm3, %ymm2
	; AVX512DQNOBW-NEXT: vpcmpeqw %ymm1, %ymm0, %ymm0			; AVX512DQNOBW-NEXT: vpcmpeqw %ymm1, %ymm0, %ymm0
	; AVX512DQNOBW-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; AVX512DQNOBW-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; AVX512DQNOBW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512DQNOBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512DQNOBW-NEXT: retq			; AVX512DQNOBW-NEXT: retq
	%mask = icmp eq <32 x i16> %x, %y			%mask = icmp eq <32 x i16> %x, %y
	%1 = zext <32 x i1> %mask to <32 x i16>			%1 = zext <32 x i1> %mask to <32 x i16>
	ret <32 x i16> %1			ret <32 x i16> %1
	}			}

	define <16 x i16> @zext_16xi1_to_16xi16(<16 x i16> %x, <16 x i16> %y) #0 {			define <16 x i16> @zext_16xi1_to_16xi16(<16 x i16> %x, <16 x i16> %y) #0 {
	; ALL-LABEL: zext_16xi1_to_16xi16:			; ALL-LABEL: zext_16xi1_to_16xi16:
	[... 34 lines not shown ...]
	; AVX512DQNOBW-NEXT: vextracti64x4 $1, %zmm0, %ymm3			; AVX512DQNOBW-NEXT: vextracti64x4 $1, %zmm0, %ymm3
	; AVX512DQNOBW-NEXT: vpcmpeqw %ymm2, %ymm3, %ymm2			; AVX512DQNOBW-NEXT: vpcmpeqw %ymm2, %ymm3, %ymm2
	; AVX512DQNOBW-NEXT: vpcmpeqw %ymm1, %ymm0, %ymm0			; AVX512DQNOBW-NEXT: vpcmpeqw %ymm1, %ymm0, %ymm0
	; AVX512DQNOBW-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero			; AVX512DQNOBW-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero
	; AVX512DQNOBW-NEXT: vpmovdb %zmm0, %xmm0			; AVX512DQNOBW-NEXT: vpmovdb %zmm0, %xmm0
	; AVX512DQNOBW-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero			; AVX512DQNOBW-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero
	; AVX512DQNOBW-NEXT: vpmovdb %zmm1, %xmm1			; AVX512DQNOBW-NEXT: vpmovdb %zmm1, %xmm1
	; AVX512DQNOBW-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0			; AVX512DQNOBW-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
	; AVX512DQNOBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512DQNOBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX512DQNOBW-NEXT: retq			; AVX512DQNOBW-NEXT: retq
	%mask = icmp eq <32 x i16> %x, %y			%mask = icmp eq <32 x i16> %x, %y
	%1 = zext <32 x i1> %mask to <32 x i8>			%1 = zext <32 x i1> %mask to <32 x i8>
	ret <32 x i8> %1			ret <32 x i8> %1
	}			}

	define <4 x i32> @zext_4xi1_to_4x32(<4 x i8> %x, <4 x i8> %y) #0 {			define <4 x i32> @zext_4xi1_to_4x32(<4 x i8> %x, <4 x i8> %y) #0 {
	; KNL-LABEL: zext_4xi1_to_4x32:			; KNL-LABEL: zext_4xi1_to_4x32:
	[50 lines not shown]
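
The right-hand column of the hunks above replaces full-width constant-pool operands with embedded-broadcast operands (`{1to16}`, `{1to8}`), including for the `<32 x i16>` and `<32 x i8>` results that are masked through `vpandd`. A minimal sketch of the underlying idea, assuming the constant is viewed as raw little-endian bytes and counts as broadcastable when that byte pattern repeats every 32 or 64 bits. The helper name `broadcast_suffix` is hypothetical, and this is not the code from the patch.

```python
def broadcast_suffix(values, elt_bits, scalar_bits):
    """Return the '{1toN}' suffix if the constant vector is a repetition of
    one scalar_bits-wide chunk, else None. Hypothetical helper, illustration only."""
    elt_bytes = elt_bits // 8
    elt_mask = (1 << elt_bits) - 1
    # Serialize the vector as little-endian bytes; masking handles negative values.
    raw = b"".join((v & elt_mask).to_bytes(elt_bytes, "little") for v in values)
    chunk = scalar_bits // 8
    if len(raw) % chunk != 0:
        return None
    first = raw[:chunk]
    # Every scalar_bits-wide chunk must equal the first one.
    if any(raw[i:i + chunk] != first for i in range(0, len(raw), chunk)):
        return None
    return "{1to%d}" % (len(raw) // chunk)
```

Under these assumptions, a `<32 x i16>` splat of 1 viewed as 32-bit scalars gives `{1to16}`, and a 256-bit `<32 x i8>` splat of 1 gives `{1to8}`, which agrees with the suffixes in the updated checks. A vector whose bytes do not repeat at the chosen width returns `None`.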

llvm/test/CodeGen/X86/avx512-logic.ll

[883 lines not shown]
%b = and <16 x i32> %y, %a		%b = and <16 x i32> %y, %a
%c = xor <16 x i32> %b, %z		%c = xor <16 x i32> %b, %z
ret <16 x i32> %c		ret <16 x i32> %c
}		}

define <16 x i32> @ternlog_or_and_mask(<16 x i32> %x, <16 x i32> %y) {		define <16 x i32> @ternlog_or_and_mask(<16 x i32> %x, <16 x i32> %y) {
; ALL-LABEL: ternlog_or_and_mask:		; ALL-LABEL: ternlog_or_and_mask:
; ALL: ## %bb.0:		; ALL: ## %bb.0:
; ALL-NEXT: vpternlogd $236, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm0		; ALL-NEXT: vpternlogd $236, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm0
; ALL-NEXT: retq		; ALL-NEXT: retq
%a = and <16 x i32> %x, <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>		%a = and <16 x i32> %x, <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>
%b = or <16 x i32> %a, %y		%b = or <16 x i32> %a, %y
ret <16 x i32> %b		ret <16 x i32> %b
}		}

define <8 x i64> @ternlog_xor_and_mask(<8 x i64> %x, <8 x i64> %y) {		define <8 x i64> @ternlog_xor_and_mask(<8 x i64> %x, <8 x i64> %y) {
; ALL-LABEL: ternlog_xor_and_mask:		; ALL-LABEL: ternlog_xor_and_mask:
; ALL: ## %bb.0:		; ALL: ## %bb.0:
; ALL-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm0		; ALL-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm1, %zmm0
; ALL-NEXT: retq		; ALL-NEXT: retq
%a = and <8 x i64> %x, <i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295>		%a = and <8 x i64> %x, <i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295>
%b = xor <8 x i64> %a, %y		%b = xor <8 x i64> %a, %y
ret <8 x i64> %b		ret <8 x i64> %b
}		}

define <16 x i32> @ternlog_maskz_or_and_mask(<16 x i32> %x, <16 x i32> %y, <16 x i32> %mask) {		define <16 x i32> @ternlog_maskz_or_and_mask(<16 x i32> %x, <16 x i32> %y, <16 x i32> %mask) {
; ALL-LABEL: ternlog_maskz_or_and_mask:		; ALL-LABEL: ternlog_maskz_or_and_mask:
; ALL: ## %bb.0:		; ALL: ## %bb.0:
; ALL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm3		; ALL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm3
; ALL-NEXT: vpsrad $31, %zmm2, %zmm0		; ALL-NEXT: vpsrad $31, %zmm2, %zmm0
; ALL-NEXT: vpternlogd $224, %zmm1, %zmm3, %zmm0		; ALL-NEXT: vpternlogd $224, %zmm1, %zmm3, %zmm0
; ALL-NEXT: retq		; ALL-NEXT: retq
%m = icmp slt <16 x i32> %mask, zeroinitializer		%m = icmp slt <16 x i32> %mask, zeroinitializer
%a = and <16 x i32> %x, <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>		%a = and <16 x i32> %x, <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>
%b = or <16 x i32> %a, %y		%b = or <16 x i32> %a, %y
%c = select <16 x i1> %m, <16 x i32> %b, <16 x i32> zeroinitializer		%c = select <16 x i1> %m, <16 x i32> %b, <16 x i32> zeroinitializer
ret <16 x i32> %c		ret <16 x i32> %c
}		}

define <8 x i64> @ternlog_maskz_xor_and_mask(<8 x i64> %x, <8 x i64> %y, <8 x i64> %mask) {		define <8 x i64> @ternlog_maskz_xor_and_mask(<8 x i64> %x, <8 x i64> %y, <8 x i64> %mask) {
; ALL-LABEL: ternlog_maskz_xor_and_mask:		; ALL-LABEL: ternlog_maskz_xor_and_mask:
; ALL: ## %bb.0:		; ALL: ## %bb.0:
; ALL-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm3		; ALL-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm0, %zmm3
; ALL-NEXT: vpsraq $63, %zmm2, %zmm0		; ALL-NEXT: vpsraq $63, %zmm2, %zmm0
; ALL-NEXT: vpternlogq $96, %zmm1, %zmm3, %zmm0		; ALL-NEXT: vpternlogq $96, %zmm1, %zmm3, %zmm0
; ALL-NEXT: retq		; ALL-NEXT: retq
%m = icmp slt <8 x i64> %mask, zeroinitializer		%m = icmp slt <8 x i64> %mask, zeroinitializer
%a = and <8 x i64> %x, <i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295>		%a = and <8 x i64> %x, <i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295>
%b = xor <8 x i64> %a, %y		%b = xor <8 x i64> %a, %y
%c = select <8 x i1> %m, <8 x i64> %b, <8 x i64> zeroinitializer		%c = select <8 x i1> %m, <8 x i64> %b, <8 x i64> zeroinitializer
ret <8 x i64> %c		ret <8 x i64> %c
}		}

define <16 x i32> @ternlog_maskx_or_and_mask(<16 x i32> %x, <16 x i32> %y, <16 x i32> %mask) {		define <16 x i32> @ternlog_maskx_or_and_mask(<16 x i32> %x, <16 x i32> %y, <16 x i32> %mask) {
; KNL-LABEL: ternlog_maskx_or_and_mask:		; KNL-LABEL: ternlog_maskx_or_and_mask:
; KNL: ## %bb.0:		; KNL: ## %bb.0:
; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3		; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3
; KNL-NEXT: vpcmpgtd %zmm2, %zmm3, %k1		; KNL-NEXT: vpcmpgtd %zmm2, %zmm3, %k1
; KNL-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm2		; KNL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm2
; KNL-NEXT: vpord %zmm1, %zmm2, %zmm0 {%k1}		; KNL-NEXT: vpord %zmm1, %zmm2, %zmm0 {%k1}
; KNL-NEXT: retq		; KNL-NEXT: retq
;		;
; SKX-LABEL: ternlog_maskx_or_and_mask:		; SKX-LABEL: ternlog_maskx_or_and_mask:
; SKX: ## %bb.0:		; SKX: ## %bb.0:
; SKX-NEXT: vpmovd2m %zmm2, %k1		; SKX-NEXT: vpmovd2m %zmm2, %k1
; SKX-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm2		; SKX-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm2
; SKX-NEXT: vorps %zmm1, %zmm2, %zmm0 {%k1}		; SKX-NEXT: vorps %zmm1, %zmm2, %zmm0 {%k1}
; SKX-NEXT: retq		; SKX-NEXT: retq
%m = icmp slt <16 x i32> %mask, zeroinitializer		%m = icmp slt <16 x i32> %mask, zeroinitializer
%a = and <16 x i32> %x, <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>		%a = and <16 x i32> %x, <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>
%b = or <16 x i32> %a, %y		%b = or <16 x i32> %a, %y
%c = select <16 x i1> %m, <16 x i32> %b, <16 x i32> %x		%c = select <16 x i1> %m, <16 x i32> %b, <16 x i32> %x
ret <16 x i32> %c		ret <16 x i32> %c
}		}

define <16 x i32> @ternlog_masky_or_and_mask(<16 x i32> %x, <16 x i32> %y, <16 x i32> %mask) {		define <16 x i32> @ternlog_masky_or_and_mask(<16 x i32> %x, <16 x i32> %y, <16 x i32> %mask) {
; KNL-LABEL: ternlog_masky_or_and_mask:		; KNL-LABEL: ternlog_masky_or_and_mask:
; KNL: ## %bb.0:		; KNL: ## %bb.0:
; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3		; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3
; KNL-NEXT: vpcmpgtd %zmm2, %zmm3, %k1		; KNL-NEXT: vpcmpgtd %zmm2, %zmm3, %k1
; KNL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0		; KNL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
; KNL-NEXT: vpord %zmm1, %zmm0, %zmm1 {%k1}		; KNL-NEXT: vpord %zmm1, %zmm0, %zmm1 {%k1}
; KNL-NEXT: vmovdqa64 %zmm1, %zmm0		; KNL-NEXT: vmovdqa64 %zmm1, %zmm0
; KNL-NEXT: retq		; KNL-NEXT: retq
;		;
; SKX-LABEL: ternlog_masky_or_and_mask:		; SKX-LABEL: ternlog_masky_or_and_mask:
; SKX: ## %bb.0:		; SKX: ## %bb.0:
; SKX-NEXT: vpmovd2m %zmm2, %k1		; SKX-NEXT: vpmovd2m %zmm2, %k1
; SKX-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0		; SKX-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
; SKX-NEXT: vorps %zmm1, %zmm0, %zmm1 {%k1}		; SKX-NEXT: vorps %zmm1, %zmm0, %zmm1 {%k1}
; SKX-NEXT: vmovaps %zmm1, %zmm0		; SKX-NEXT: vmovaps %zmm1, %zmm0
; SKX-NEXT: retq		; SKX-NEXT: retq
%m = icmp slt <16 x i32> %mask, zeroinitializer		%m = icmp slt <16 x i32> %mask, zeroinitializer
%a = and <16 x i32> %x, <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>		%a = and <16 x i32> %x, <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>
%b = or <16 x i32> %a, %y		%b = or <16 x i32> %a, %y
%c = select <16 x i1> %m, <16 x i32> %b, <16 x i32> %y		%c = select <16 x i1> %m, <16 x i32> %b, <16 x i32> %y
ret <16 x i32> %c		ret <16 x i32> %c
}		}

define <8 x i64> @ternlog_maskx_xor_and_mask(<8 x i64> %x, <8 x i64> %y, <8 x i64> %mask) {		define <8 x i64> @ternlog_maskx_xor_and_mask(<8 x i64> %x, <8 x i64> %y, <8 x i64> %mask) {
; KNL-LABEL: ternlog_maskx_xor_and_mask:		; KNL-LABEL: ternlog_maskx_xor_and_mask:
; KNL: ## %bb.0:		; KNL: ## %bb.0:
; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3		; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3
; KNL-NEXT: vpcmpgtq %zmm2, %zmm3, %k1		; KNL-NEXT: vpcmpgtq %zmm2, %zmm3, %k1
; KNL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm2		; KNL-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm0, %zmm2
; KNL-NEXT: vpxorq %zmm1, %zmm2, %zmm0 {%k1}		; KNL-NEXT: vpxorq %zmm1, %zmm2, %zmm0 {%k1}
; KNL-NEXT: retq		; KNL-NEXT: retq
;		;
; SKX-LABEL: ternlog_maskx_xor_and_mask:		; SKX-LABEL: ternlog_maskx_xor_and_mask:
; SKX: ## %bb.0:		; SKX: ## %bb.0:
; SKX-NEXT: vpmovq2m %zmm2, %k1		; SKX-NEXT: vpmovq2m %zmm2, %k1
; SKX-NEXT: vandpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm2		; SKX-NEXT: vandpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm0, %zmm2
; SKX-NEXT: vxorpd %zmm1, %zmm2, %zmm0 {%k1}		; SKX-NEXT: vxorpd %zmm1, %zmm2, %zmm0 {%k1}
; SKX-NEXT: retq		; SKX-NEXT: retq
%m = icmp slt <8 x i64> %mask, zeroinitializer		%m = icmp slt <8 x i64> %mask, zeroinitializer
%a = and <8 x i64> %x, <i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295>		%a = and <8 x i64> %x, <i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295>
%b = xor <8 x i64> %a, %y		%b = xor <8 x i64> %a, %y
%c = select <8 x i1> %m, <8 x i64> %b, <8 x i64> %x		%c = select <8 x i1> %m, <8 x i64> %b, <8 x i64> %x
ret <8 x i64> %c		ret <8 x i64> %c
}		}

define <8 x i64> @ternlog_masky_xor_and_mask(<8 x i64> %x, <8 x i64> %y, <8 x i64> %mask) {		define <8 x i64> @ternlog_masky_xor_and_mask(<8 x i64> %x, <8 x i64> %y, <8 x i64> %mask) {
; KNL-LABEL: ternlog_masky_xor_and_mask:		; KNL-LABEL: ternlog_masky_xor_and_mask:
; KNL: ## %bb.0:		; KNL: ## %bb.0:
; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3		; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3
; KNL-NEXT: vpcmpgtq %zmm2, %zmm3, %k1		; KNL-NEXT: vpcmpgtq %zmm2, %zmm3, %k1
; KNL-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0		; KNL-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm0, %zmm0
; KNL-NEXT: vpxorq %zmm1, %zmm0, %zmm1 {%k1}		; KNL-NEXT: vpxorq %zmm1, %zmm0, %zmm1 {%k1}
; KNL-NEXT: vmovdqa64 %zmm1, %zmm0		; KNL-NEXT: vmovdqa64 %zmm1, %zmm0
; KNL-NEXT: retq		; KNL-NEXT: retq
;		;
; SKX-LABEL: ternlog_masky_xor_and_mask:		; SKX-LABEL: ternlog_masky_xor_and_mask:
; SKX: ## %bb.0:		; SKX: ## %bb.0:
; SKX-NEXT: vpmovq2m %zmm2, %k1		; SKX-NEXT: vpmovq2m %zmm2, %k1
; SKX-NEXT: vandpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0		; SKX-NEXT: vandpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm0, %zmm0
; SKX-NEXT: vxorpd %zmm1, %zmm0, %zmm1 {%k1}		; SKX-NEXT: vxorpd %zmm1, %zmm0, %zmm1 {%k1}
; SKX-NEXT: vmovapd %zmm1, %zmm0		; SKX-NEXT: vmovapd %zmm1, %zmm0
; SKX-NEXT: retq		; SKX-NEXT: retq
%m = icmp slt <8 x i64> %mask, zeroinitializer		%m = icmp slt <8 x i64> %mask, zeroinitializer
%a = and <8 x i64> %x, <i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295>		%a = and <8 x i64> %x, <i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295>
%b = xor <8 x i64> %a, %y		%b = xor <8 x i64> %a, %y
%c = select <8 x i1> %m, <8 x i64> %b, <8 x i64> %y		%c = select <8 x i1> %m, <8 x i64> %b, <8 x i64> %y
ret <8 x i64> %c		ret <8 x i64> %c
}		}
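
The `vpternlogd`/`vpternlogq` immediates in the checks above (`$236`, `$108`, `$224`, `$96`) are 8-bit truth tables. A minimal sketch for decoding them, assuming the usual convention that each result bit is `imm8[(A << 2) | (B << 1) | C]`, where A is the destination register (the last operand in AT&T syntax), B is the middle operand, and C is the first source operand after the immediate. The function name `ternlog` is illustrative.

```python
def ternlog(imm8, a, b, c, bits=32):
    """Bitwise ternary logic: for each bit position, look up imm8 at
    index (a_bit << 2) | (b_bit << 1) | c_bit."""
    out = 0
    for i in range(bits):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        out |= ((imm8 >> idx) & 1) << i
    return out
```

With that convention, `$236` computes `B | (A & C)` and `$108` computes `B ^ (A & C)`, matching the `and`+`or` and `and`+`xor` IR in `ternlog_or_and_mask` and `ternlog_xor_and_mask`. `$224` and `$96` compute `A & (B | C)` and `A & (B ^ C)`, which fits the `maskz` variants where A holds the sign-extended mask.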

llvm/test/CodeGen/X86/avx512fp16-cvt-ph-w-vl-intrinsics.ll

[734 lines not shown]
%res = sitofp <2 x i8> %arg0 to <2 x half>		%res = sitofp <2 x i8> %arg0 to <2 x half>
ret <2 x half> %res		ret <2 x half> %res
}		}

define <2 x half> @test_u1tofp2(<2 x i1> %arg0) {		define <2 x half> @test_u1tofp2(<2 x i1> %arg0) {
; CHECK-LABEL: test_u1tofp2:		; CHECK-LABEL: test_u1tofp2:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vpmovqw %xmm0, %xmm0		; CHECK-NEXT: vpmovqw %xmm0, %xmm0
; CHECK-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; CHECK-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
; CHECK-NEXT: vcvtuw2ph %xmm0, %xmm0		; CHECK-NEXT: vcvtuw2ph %xmm0, %xmm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%res = uitofp <2 x i1> %arg0 to <2 x half>		%res = uitofp <2 x i1> %arg0 to <2 x half>
ret <2 x half> %res		ret <2 x half> %res
}		}

define <4 x half> @test_s17tofp4(<4 x i17> %arg0) {		define <4 x half> @test_s17tofp4(<4 x i17> %arg0) {
; CHECK-LABEL: test_s17tofp4:		; CHECK-LABEL: test_s17tofp4:
[36 lines not shown]
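
The `{{...}}` blocks in the CHECK lines are FileCheck regular expressions, and `\.?LCPI[0-9]+_[0-9]+` matches a constant-pool label with or without a leading dot. A quick sketch of what the pattern accepts, using Python's `re` module as a stand-in for FileCheck's regex engine (an assumption). The sample labels are illustrative.

```python
import re

# Same pattern as in the CHECK lines, without the surrounding {{ }}.
LCPI = re.compile(r"\.?LCPI[0-9]+_[0-9]+")

def is_constant_pool_label(s):
    """True if s is exactly a label of the form [.]LCPI<n>_<m>."""
    return LCPI.fullmatch(s) is not None
```

Because the label is matched by pattern rather than by exact name, the checks stay valid when constant-pool numbering changes; only the operand suffix such as `{1to4}` differs between the two columns.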

llvm/test/CodeGen/X86/avx512vl-logic.ll

[1,033 lines not shown]
%b = and <4 x i32> %y, %a		%b = and <4 x i32> %y, %a
%c = xor <4 x i32> %b, %z		%c = xor <4 x i32> %b, %z
ret <4 x i32> %c		ret <4 x i32> %c
}		}

define <4 x i32> @ternlog_or_and_mask(<4 x i32> %x, <4 x i32> %y) {		define <4 x i32> @ternlog_or_and_mask(<4 x i32> %x, <4 x i32> %y) {
; CHECK-LABEL: ternlog_or_and_mask:		; CHECK-LABEL: ternlog_or_and_mask:
; CHECK: ## %bb.0:		; CHECK: ## %bb.0:
; CHECK-NEXT: vpternlogd $236, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0		; CHECK-NEXT: vpternlogd $236, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%a = and <4 x i32> %x, <i32 255, i32 255, i32 255, i32 255>		%a = and <4 x i32> %x, <i32 255, i32 255, i32 255, i32 255>
%b = or <4 x i32> %a, %y		%b = or <4 x i32> %a, %y
ret <4 x i32> %b		ret <4 x i32> %b
}		}

define <8 x i32> @ternlog_or_and_mask_ymm(<8 x i32> %x, <8 x i32> %y) {		define <8 x i32> @ternlog_or_and_mask_ymm(<8 x i32> %x, <8 x i32> %y) {
; CHECK-LABEL: ternlog_or_and_mask_ymm:		; CHECK-LABEL: ternlog_or_and_mask_ymm:
; CHECK: ## %bb.0:		; CHECK: ## %bb.0:
; CHECK-NEXT: vpternlogd $236, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm0		; CHECK-NEXT: vpternlogd $236, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%a = and <8 x i32> %x, <i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216>		%a = and <8 x i32> %x, <i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216>
%b = or <8 x i32> %a, %y		%b = or <8 x i32> %a, %y
ret <8 x i32> %b		ret <8 x i32> %b
}		}

define <2 x i64> @ternlog_xor_and_mask(<2 x i64> %x, <2 x i64> %y) {		define <2 x i64> @ternlog_xor_and_mask(<2 x i64> %x, <2 x i64> %y) {
; CHECK-LABEL: ternlog_xor_and_mask:		; CHECK-LABEL: ternlog_xor_and_mask:
; CHECK: ## %bb.0:		; CHECK: ## %bb.0:
; CHECK-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0		; CHECK-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to2}, %xmm1, %xmm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%a = and <2 x i64> %x, <i64 1099511627775, i64 1099511627775>		%a = and <2 x i64> %x, <i64 1099511627775, i64 1099511627775>
%b = xor <2 x i64> %a, %y		%b = xor <2 x i64> %a, %y
ret <2 x i64> %b		ret <2 x i64> %b
}		}

define <4 x i64> @ternlog_xor_and_mask_ymm(<4 x i64> %x, <4 x i64> %y) {		define <4 x i64> @ternlog_xor_and_mask_ymm(<4 x i64> %x, <4 x i64> %y) {
; CHECK-LABEL: ternlog_xor_and_mask_ymm:		; CHECK-LABEL: ternlog_xor_and_mask_ymm:
; CHECK: ## %bb.0:		; CHECK: ## %bb.0:
; CHECK-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm0		; CHECK-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm1, %ymm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%a = and <4 x i64> %x, <i64 72057594037927935, i64 72057594037927935, i64 72057594037927935, i64 72057594037927935>		%a = and <4 x i64> %x, <i64 72057594037927935, i64 72057594037927935, i64 72057594037927935, i64 72057594037927935>
%b = xor <4 x i64> %a, %y		%b = xor <4 x i64> %a, %y
ret <4 x i64> %b		ret <4 x i64> %b
}		}

define <4 x i32> @ternlog_maskz_or_and_mask(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z, <4 x i32> %mask) {		define <4 x i32> @ternlog_maskz_or_and_mask(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z, <4 x i32> %mask) {
; CHECK-LABEL: ternlog_maskz_or_and_mask:		; CHECK-LABEL: ternlog_maskz_or_and_mask:
; CHECK: ## %bb.0:		; CHECK: ## %bb.0:
; CHECK-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm2		; CHECK-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm2
; CHECK-NEXT: vpsrad $31, %xmm3, %xmm0		; CHECK-NEXT: vpsrad $31, %xmm3, %xmm0
; CHECK-NEXT: vpternlogd $224, %xmm1, %xmm2, %xmm0		; CHECK-NEXT: vpternlogd $224, %xmm1, %xmm2, %xmm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%m = icmp slt <4 x i32> %mask, zeroinitializer		%m = icmp slt <4 x i32> %mask, zeroinitializer
%a = and <4 x i32> %x, <i32 255, i32 255, i32 255, i32 255>		%a = and <4 x i32> %x, <i32 255, i32 255, i32 255, i32 255>
%b = or <4 x i32> %a, %y		%b = or <4 x i32> %a, %y
%c = select <4 x i1> %m, <4 x i32> %b, <4 x i32> zeroinitializer		%c = select <4 x i1> %m, <4 x i32> %b, <4 x i32> zeroinitializer
ret <4 x i32> %c		ret <4 x i32> %c
}		}

define <8 x i32> @ternlog_maskz_or_and_mask_ymm(<8 x i32> %x, <8 x i32> %y, <8 x i32> %mask) {		define <8 x i32> @ternlog_maskz_or_and_mask_ymm(<8 x i32> %x, <8 x i32> %y, <8 x i32> %mask) {
; CHECK-LABEL: ternlog_maskz_or_and_mask_ymm:		; CHECK-LABEL: ternlog_maskz_or_and_mask_ymm:
; CHECK: ## %bb.0:		; CHECK: ## %bb.0:
; CHECK-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm3		; CHECK-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm3
; CHECK-NEXT: vpsrad $31, %ymm2, %ymm0		; CHECK-NEXT: vpsrad $31, %ymm2, %ymm0
; CHECK-NEXT: vpternlogd $224, %ymm1, %ymm3, %ymm0		; CHECK-NEXT: vpternlogd $224, %ymm1, %ymm3, %ymm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%m = icmp slt <8 x i32> %mask, zeroinitializer		%m = icmp slt <8 x i32> %mask, zeroinitializer
%a = and <8 x i32> %x, <i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216>		%a = and <8 x i32> %x, <i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216>
%b = or <8 x i32> %a, %y		%b = or <8 x i32> %a, %y
%c = select <8 x i1> %m, <8 x i32> %b, <8 x i32> zeroinitializer		%c = select <8 x i1> %m, <8 x i32> %b, <8 x i32> zeroinitializer
ret <8 x i32> %c		ret <8 x i32> %c
}		}

define <2 x i64> @ternlog_maskz_xor_and_mask(<2 x i64> %x, <2 x i64> %y, <2 x i64> %mask) {		define <2 x i64> @ternlog_maskz_xor_and_mask(<2 x i64> %x, <2 x i64> %y, <2 x i64> %mask) {
; CHECK-LABEL: ternlog_maskz_xor_and_mask:		; CHECK-LABEL: ternlog_maskz_xor_and_mask:
; CHECK: ## %bb.0:		; CHECK: ## %bb.0:
; CHECK-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm3		; CHECK-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to2}, %xmm0, %xmm3
; CHECK-NEXT: vpsraq $63, %xmm2, %xmm0		; CHECK-NEXT: vpsraq $63, %xmm2, %xmm0
; CHECK-NEXT: vpternlogq $96, %xmm1, %xmm3, %xmm0		; CHECK-NEXT: vpternlogq $96, %xmm1, %xmm3, %xmm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%m = icmp slt <2 x i64> %mask, zeroinitializer		%m = icmp slt <2 x i64> %mask, zeroinitializer
%a = and <2 x i64> %x, <i64 1099511627775, i64 1099511627775>		%a = and <2 x i64> %x, <i64 1099511627775, i64 1099511627775>
%b = xor <2 x i64> %a, %y		%b = xor <2 x i64> %a, %y
%c = select <2 x i1> %m, <2 x i64> %b, <2 x i64> zeroinitializer		%c = select <2 x i1> %m, <2 x i64> %b, <2 x i64> zeroinitializer
ret <2 x i64> %c		ret <2 x i64> %c
}		}

define <4 x i64> @ternlog_maskz_xor_and_mask_ymm(<4 x i64> %x, <4 x i64> %y, <4 x i64> %mask) {		define <4 x i64> @ternlog_maskz_xor_and_mask_ymm(<4 x i64> %x, <4 x i64> %y, <4 x i64> %mask) {
; CHECK-LABEL: ternlog_maskz_xor_and_mask_ymm:		; CHECK-LABEL: ternlog_maskz_xor_and_mask_ymm:
; CHECK: ## %bb.0:		; CHECK: ## %bb.0:
; CHECK-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm3		; CHECK-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm0, %ymm3
; CHECK-NEXT: vpsraq $63, %ymm2, %ymm0		; CHECK-NEXT: vpsraq $63, %ymm2, %ymm0
; CHECK-NEXT: vpternlogq $96, %ymm1, %ymm3, %ymm0		; CHECK-NEXT: vpternlogq $96, %ymm1, %ymm3, %ymm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%m = icmp slt <4 x i64> %mask, zeroinitializer		%m = icmp slt <4 x i64> %mask, zeroinitializer
%a = and <4 x i64> %x, <i64 72057594037927935, i64 72057594037927935, i64 72057594037927935, i64 72057594037927935>		%a = and <4 x i64> %x, <i64 72057594037927935, i64 72057594037927935, i64 72057594037927935, i64 72057594037927935>
%b = xor <4 x i64> %a, %y		%b = xor <4 x i64> %a, %y
%c = select <4 x i1> %m, <4 x i64> %b, <4 x i64> zeroinitializer		%c = select <4 x i1> %m, <4 x i64> %b, <4 x i64> zeroinitializer
ret <4 x i64> %c		ret <4 x i64> %c
}		}

define <4 x i32> @ternlog_maskx_or_and_mask(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z, <4 x i32> %mask) {		define <4 x i32> @ternlog_maskx_or_and_mask(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z, <4 x i32> %mask) {
; KNL-LABEL: ternlog_maskx_or_and_mask:		; KNL-LABEL: ternlog_maskx_or_and_mask:
; KNL: ## %bb.0:		; KNL: ## %bb.0:
; KNL-NEXT: vpxor %xmm2, %xmm2, %xmm2		; KNL-NEXT: vpxor %xmm2, %xmm2, %xmm2
; KNL-NEXT: vpcmpgtd %xmm3, %xmm2, %k1		; KNL-NEXT: vpcmpgtd %xmm3, %xmm2, %k1
; KNL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm2		; KNL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm2
; KNL-NEXT: vpord %xmm1, %xmm2, %xmm0 {%k1}		; KNL-NEXT: vpord %xmm1, %xmm2, %xmm0 {%k1}
; KNL-NEXT: retq		; KNL-NEXT: retq
;		;
; SKX-LABEL: ternlog_maskx_or_and_mask:		; SKX-LABEL: ternlog_maskx_or_and_mask:
; SKX: ## %bb.0:		; SKX: ## %bb.0:
; SKX-NEXT: vpmovd2m %xmm3, %k1		; SKX-NEXT: vpmovd2m %xmm3, %k1
; SKX-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm2		; SKX-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm2
; SKX-NEXT: vorps %xmm1, %xmm2, %xmm0 {%k1}		; SKX-NEXT: vorps %xmm1, %xmm2, %xmm0 {%k1}
; SKX-NEXT: retq		; SKX-NEXT: retq
%m = icmp slt <4 x i32> %mask, zeroinitializer		%m = icmp slt <4 x i32> %mask, zeroinitializer
%a = and <4 x i32> %x, <i32 255, i32 255, i32 255, i32 255>		%a = and <4 x i32> %x, <i32 255, i32 255, i32 255, i32 255>
%b = or <4 x i32> %a, %y		%b = or <4 x i32> %a, %y
%c = select <4 x i1> %m, <4 x i32> %b, <4 x i32> %x		%c = select <4 x i1> %m, <4 x i32> %b, <4 x i32> %x
ret <4 x i32> %c		ret <4 x i32> %c
}		}

define <8 x i32> @ternlog_maskx_or_and_mask_ymm(<8 x i32> %x, <8 x i32> %y, <8 x i32> %mask) {		define <8 x i32> @ternlog_maskx_or_and_mask_ymm(<8 x i32> %x, <8 x i32> %y, <8 x i32> %mask) {
; KNL-LABEL: ternlog_maskx_or_and_mask_ymm:		; KNL-LABEL: ternlog_maskx_or_and_mask_ymm:
; KNL: ## %bb.0:		; KNL: ## %bb.0:
; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3		; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3
; KNL-NEXT: vpcmpgtd %ymm2, %ymm3, %k1		; KNL-NEXT: vpcmpgtd %ymm2, %ymm3, %k1
; KNL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm2		; KNL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm2
; KNL-NEXT: vpord %ymm1, %ymm2, %ymm0 {%k1}		; KNL-NEXT: vpord %ymm1, %ymm2, %ymm0 {%k1}
; KNL-NEXT: retq		; KNL-NEXT: retq
;		;
; SKX-LABEL: ternlog_maskx_or_and_mask_ymm:		; SKX-LABEL: ternlog_maskx_or_and_mask_ymm:
; SKX: ## %bb.0:		; SKX: ## %bb.0:
; SKX-NEXT: vpmovd2m %ymm2, %k1		; SKX-NEXT: vpmovd2m %ymm2, %k1
; SKX-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm2		; SKX-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm2
; SKX-NEXT: vorps %ymm1, %ymm2, %ymm0 {%k1}		; SKX-NEXT: vorps %ymm1, %ymm2, %ymm0 {%k1}
; SKX-NEXT: retq		; SKX-NEXT: retq
%m = icmp slt <8 x i32> %mask, zeroinitializer		%m = icmp slt <8 x i32> %mask, zeroinitializer
%a = and <8 x i32> %x, <i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216>		%a = and <8 x i32> %x, <i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216>
%b = or <8 x i32> %a, %y		%b = or <8 x i32> %a, %y
%c = select <8 x i1> %m, <8 x i32> %b, <8 x i32> %x		%c = select <8 x i1> %m, <8 x i32> %b, <8 x i32> %x
ret <8 x i32> %c		ret <8 x i32> %c
}		}

define <2 x i64> @ternlog_maskx_xor_and_mask(<2 x i64> %x, <2 x i64> %y, <2 x i64> %mask) {		define <2 x i64> @ternlog_maskx_xor_and_mask(<2 x i64> %x, <2 x i64> %y, <2 x i64> %mask) {
; KNL-LABEL: ternlog_maskx_xor_and_mask:		; KNL-LABEL: ternlog_maskx_xor_and_mask:
; KNL: ## %bb.0:		; KNL: ## %bb.0:
; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3		; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3
; KNL-NEXT: vpcmpgtq %xmm2, %xmm3, %k1		; KNL-NEXT: vpcmpgtq %xmm2, %xmm3, %k1
; KNL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm2		; KNL-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to2}, %xmm0, %xmm2
; KNL-NEXT: vpxorq %xmm1, %xmm2, %xmm0 {%k1}		; KNL-NEXT: vpxorq %xmm1, %xmm2, %xmm0 {%k1}
; KNL-NEXT: retq		; KNL-NEXT: retq
;		;
; SKX-LABEL: ternlog_maskx_xor_and_mask:		; SKX-LABEL: ternlog_maskx_xor_and_mask:
; SKX: ## %bb.0:		; SKX: ## %bb.0:
; SKX-NEXT: vpmovq2m %xmm2, %k1		; SKX-NEXT: vpmovq2m %xmm2, %k1
; SKX-NEXT: vandpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm2		; SKX-NEXT: vandpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to2}, %xmm0, %xmm2
; SKX-NEXT: vxorpd %xmm1, %xmm2, %xmm0 {%k1}		; SKX-NEXT: vxorpd %xmm1, %xmm2, %xmm0 {%k1}
; SKX-NEXT: retq		; SKX-NEXT: retq
%m = icmp slt <2 x i64> %mask, zeroinitializer		%m = icmp slt <2 x i64> %mask, zeroinitializer
%a = and <2 x i64> %x, <i64 1099511627775, i64 1099511627775>		%a = and <2 x i64> %x, <i64 1099511627775, i64 1099511627775>
%b = xor <2 x i64> %a, %y		%b = xor <2 x i64> %a, %y
%c = select <2 x i1> %m, <2 x i64> %b, <2 x i64> %x		%c = select <2 x i1> %m, <2 x i64> %b, <2 x i64> %x
ret <2 x i64> %c		ret <2 x i64> %c
}		}

define <4 x i64> @ternlog_maskx_xor_and_mask_ymm(<4 x i64> %x, <4 x i64> %y, <4 x i64> %mask) {		define <4 x i64> @ternlog_maskx_xor_and_mask_ymm(<4 x i64> %x, <4 x i64> %y, <4 x i64> %mask) {
; KNL-LABEL: ternlog_maskx_xor_and_mask_ymm:		; KNL-LABEL: ternlog_maskx_xor_and_mask_ymm:
; KNL: ## %bb.0:		; KNL: ## %bb.0:
; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3		; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3
; KNL-NEXT: vpcmpgtq %ymm2, %ymm3, %k1		; KNL-NEXT: vpcmpgtq %ymm2, %ymm3, %k1
; KNL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm2		; KNL-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm0, %ymm2
; KNL-NEXT: vpxorq %ymm1, %ymm2, %ymm0 {%k1}		; KNL-NEXT: vpxorq %ymm1, %ymm2, %ymm0 {%k1}
; KNL-NEXT: retq		; KNL-NEXT: retq
;		;
; SKX-LABEL: ternlog_maskx_xor_and_mask_ymm:		; SKX-LABEL: ternlog_maskx_xor_and_mask_ymm:
; SKX: ## %bb.0:		; SKX: ## %bb.0:
; SKX-NEXT: vpmovq2m %ymm2, %k1		; SKX-NEXT: vpmovq2m %ymm2, %k1
; SKX-NEXT: vandpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm2		; SKX-NEXT: vandpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm0, %ymm2
; SKX-NEXT: vxorpd %ymm1, %ymm2, %ymm0 {%k1}		; SKX-NEXT: vxorpd %ymm1, %ymm2, %ymm0 {%k1}
; SKX-NEXT: retq		; SKX-NEXT: retq
%m = icmp slt <4 x i64> %mask, zeroinitializer		%m = icmp slt <4 x i64> %mask, zeroinitializer
%a = and <4 x i64> %x, <i64 72057594037927935, i64 72057594037927935, i64 72057594037927935, i64 72057594037927935>		%a = and <4 x i64> %x, <i64 72057594037927935, i64 72057594037927935, i64 72057594037927935, i64 72057594037927935>
%b = xor <4 x i64> %a, %y		%b = xor <4 x i64> %a, %y
%c = select <4 x i1> %m, <4 x i64> %b, <4 x i64> %x		%c = select <4 x i1> %m, <4 x i64> %b, <4 x i64> %x
ret <4 x i64> %c		ret <4 x i64> %c
}		}

define <4 x i32> @ternlog_masky_or_and_mask(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z, <4 x i32> %mask) {		define <4 x i32> @ternlog_masky_or_and_mask(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z, <4 x i32> %mask) {
; KNL-LABEL: ternlog_masky_or_and_mask:		; KNL-LABEL: ternlog_masky_or_and_mask:
; KNL: ## %bb.0:		; KNL: ## %bb.0:
; KNL-NEXT: vpxor %xmm2, %xmm2, %xmm2		; KNL-NEXT: vpxor %xmm2, %xmm2, %xmm2
; KNL-NEXT: vpcmpgtd %xmm3, %xmm2, %k1		; KNL-NEXT: vpcmpgtd %xmm3, %xmm2, %k1
; KNL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; KNL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
; KNL-NEXT: vpord %xmm1, %xmm0, %xmm1 {%k1}		; KNL-NEXT: vpord %xmm1, %xmm0, %xmm1 {%k1}
; KNL-NEXT: vmovdqa %xmm1, %xmm0		; KNL-NEXT: vmovdqa %xmm1, %xmm0
; KNL-NEXT: retq		; KNL-NEXT: retq
;		;
; SKX-LABEL: ternlog_masky_or_and_mask:		; SKX-LABEL: ternlog_masky_or_and_mask:
; SKX: ## %bb.0:		; SKX: ## %bb.0:
; SKX-NEXT: vpmovd2m %xmm3, %k1		; SKX-NEXT: vpmovd2m %xmm3, %k1
; SKX-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; SKX-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
; SKX-NEXT: vorps %xmm1, %xmm0, %xmm1 {%k1}		; SKX-NEXT: vorps %xmm1, %xmm0, %xmm1 {%k1}
; SKX-NEXT: vmovaps %xmm1, %xmm0		; SKX-NEXT: vmovaps %xmm1, %xmm0
; SKX-NEXT: retq		; SKX-NEXT: retq
%m = icmp slt <4 x i32> %mask, zeroinitializer		%m = icmp slt <4 x i32> %mask, zeroinitializer
%a = and <4 x i32> %x, <i32 255, i32 255, i32 255, i32 255>		%a = and <4 x i32> %x, <i32 255, i32 255, i32 255, i32 255>
%b = or <4 x i32> %a, %y		%b = or <4 x i32> %a, %y
%c = select <4 x i1> %m, <4 x i32> %b, <4 x i32> %y		%c = select <4 x i1> %m, <4 x i32> %b, <4 x i32> %y
ret <4 x i32> %c		ret <4 x i32> %c
}		}

define <8 x i32> @ternlog_masky_or_and_mask_ymm(<8 x i32> %x, <8 x i32> %y, <8 x i32> %mask) {		define <8 x i32> @ternlog_masky_or_and_mask_ymm(<8 x i32> %x, <8 x i32> %y, <8 x i32> %mask) {
; KNL-LABEL: ternlog_masky_or_and_mask_ymm:		; KNL-LABEL: ternlog_masky_or_and_mask_ymm:
; KNL: ## %bb.0:		; KNL: ## %bb.0:
; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3		; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3
; KNL-NEXT: vpcmpgtd %ymm2, %ymm3, %k1		; KNL-NEXT: vpcmpgtd %ymm2, %ymm3, %k1
; KNL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm2		; KNL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm2
; KNL-NEXT: vpord %ymm1, %ymm2, %ymm0 {%k1}		; KNL-NEXT: vpord %ymm1, %ymm2, %ymm0 {%k1}
; KNL-NEXT: retq		; KNL-NEXT: retq
;		;
; SKX-LABEL: ternlog_masky_or_and_mask_ymm:		; SKX-LABEL: ternlog_masky_or_and_mask_ymm:
; SKX: ## %bb.0:		; SKX: ## %bb.0:
; SKX-NEXT: vpmovd2m %ymm2, %k1		; SKX-NEXT: vpmovd2m %ymm2, %k1
; SKX-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm2		; SKX-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm2
; SKX-NEXT: vorps %ymm1, %ymm2, %ymm0 {%k1}		; SKX-NEXT: vorps %ymm1, %ymm2, %ymm0 {%k1}
; SKX-NEXT: retq		; SKX-NEXT: retq
%m = icmp slt <8 x i32> %mask, zeroinitializer		%m = icmp slt <8 x i32> %mask, zeroinitializer
%a = and <8 x i32> %x, <i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216>		%a = and <8 x i32> %x, <i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216>
%b = or <8 x i32> %a, %y		%b = or <8 x i32> %a, %y
%c = select <8 x i1> %m, <8 x i32> %b, <8 x i32> %x		%c = select <8 x i1> %m, <8 x i32> %b, <8 x i32> %x
ret <8 x i32> %c		ret <8 x i32> %c
}		}

define <2 x i64> @ternlog_masky_xor_and_mask(<2 x i64> %x, <2 x i64> %y, <2 x i64> %mask) {		define <2 x i64> @ternlog_masky_xor_and_mask(<2 x i64> %x, <2 x i64> %y, <2 x i64> %mask) {
; KNL-LABEL: ternlog_masky_xor_and_mask:		; KNL-LABEL: ternlog_masky_xor_and_mask:
; KNL: ## %bb.0:		; KNL: ## %bb.0:
; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3		; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3
; KNL-NEXT: vpcmpgtq %xmm2, %xmm3, %k1		; KNL-NEXT: vpcmpgtq %xmm2, %xmm3, %k1
; KNL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; KNL-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to2}, %xmm0, %xmm0
; KNL-NEXT: vpxorq %xmm1, %xmm0, %xmm1 {%k1}		; KNL-NEXT: vpxorq %xmm1, %xmm0, %xmm1 {%k1}
; KNL-NEXT: vmovdqa %xmm1, %xmm0		; KNL-NEXT: vmovdqa %xmm1, %xmm0
; KNL-NEXT: retq		; KNL-NEXT: retq
;		;
; SKX-LABEL: ternlog_masky_xor_and_mask:		; SKX-LABEL: ternlog_masky_xor_and_mask:
; SKX: ## %bb.0:		; SKX: ## %bb.0:
; SKX-NEXT: vpmovq2m %xmm2, %k1		; SKX-NEXT: vpmovq2m %xmm2, %k1
; SKX-NEXT: vandpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; SKX-NEXT: vandpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to2}, %xmm0, %xmm0
; SKX-NEXT: vxorpd %xmm1, %xmm0, %xmm1 {%k1}		; SKX-NEXT: vxorpd %xmm1, %xmm0, %xmm1 {%k1}
; SKX-NEXT: vmovapd %xmm1, %xmm0		; SKX-NEXT: vmovapd %xmm1, %xmm0
; SKX-NEXT: retq		; SKX-NEXT: retq
%m = icmp slt <2 x i64> %mask, zeroinitializer		%m = icmp slt <2 x i64> %mask, zeroinitializer
%a = and <2 x i64> %x, <i64 1099511627775, i64 1099511627775>		%a = and <2 x i64> %x, <i64 1099511627775, i64 1099511627775>
%b = xor <2 x i64> %a, %y		%b = xor <2 x i64> %a, %y
%c = select <2 x i1> %m, <2 x i64> %b, <2 x i64> %y		%c = select <2 x i1> %m, <2 x i64> %b, <2 x i64> %y
ret <2 x i64> %c		ret <2 x i64> %c
}		}

define <4 x i64> @ternlog_masky_xor_and_mask_ymm(<4 x i64> %x, <4 x i64> %y, <4 x i64> %mask) {		define <4 x i64> @ternlog_masky_xor_and_mask_ymm(<4 x i64> %x, <4 x i64> %y, <4 x i64> %mask) {
; KNL-LABEL: ternlog_masky_xor_and_mask_ymm:		; KNL-LABEL: ternlog_masky_xor_and_mask_ymm:
; KNL: ## %bb.0:		; KNL: ## %bb.0:
; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3		; KNL-NEXT: vpxor %xmm3, %xmm3, %xmm3
; KNL-NEXT: vpcmpgtq %ymm2, %ymm3, %k1		; KNL-NEXT: vpcmpgtq %ymm2, %ymm3, %k1
; KNL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0		; KNL-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm0, %ymm0
; KNL-NEXT: vpxorq %ymm1, %ymm0, %ymm1 {%k1}		; KNL-NEXT: vpxorq %ymm1, %ymm0, %ymm1 {%k1}
; KNL-NEXT: vmovdqa %ymm1, %ymm0		; KNL-NEXT: vmovdqa %ymm1, %ymm0
; KNL-NEXT: retq		; KNL-NEXT: retq
;		;
; SKX-LABEL: ternlog_masky_xor_and_mask_ymm:		; SKX-LABEL: ternlog_masky_xor_and_mask_ymm:
; SKX: ## %bb.0:		; SKX: ## %bb.0:
; SKX-NEXT: vpmovq2m %ymm2, %k1		; SKX-NEXT: vpmovq2m %ymm2, %k1
; SKX-NEXT: vandpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0		; SKX-NEXT: vandpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm0, %ymm0
; SKX-NEXT: vxorpd %ymm1, %ymm0, %ymm1 {%k1}		; SKX-NEXT: vxorpd %ymm1, %ymm0, %ymm1 {%k1}
; SKX-NEXT: vmovapd %ymm1, %ymm0		; SKX-NEXT: vmovapd %ymm1, %ymm0
; SKX-NEXT: retq		; SKX-NEXT: retq
%m = icmp slt <4 x i64> %mask, zeroinitializer		%m = icmp slt <4 x i64> %mask, zeroinitializer
%a = and <4 x i64> %x, <i64 72057594037927935, i64 72057594037927935, i64 72057594037927935, i64 72057594037927935>		%a = and <4 x i64> %x, <i64 72057594037927935, i64 72057594037927935, i64 72057594037927935, i64 72057594037927935>
%b = xor <4 x i64> %a, %y		%b = xor <4 x i64> %a, %y
%c = select <4 x i1> %m, <4 x i64> %b, <4 x i64> %y		%c = select <4 x i1> %m, <4 x i64> %b, <4 x i64> %y
ret <4 x i64> %c		ret <4 x i64> %c

llvm/test/CodeGen/X86/bitcast-vector-bool.ll

	; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0			; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: vptest {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0			; AVX2-NEXT: vptest {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0
	; AVX2-NEXT: setne %al			; AVX2-NEXT: setne %al
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: trunc_v64i8_cmp:			; AVX512-LABEL: trunc_v64i8_cmp:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vptestmd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %k0			; AVX512-NEXT: vptestmd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %k0
	; AVX512-NEXT: kortestw %k0, %k0			; AVX512-NEXT: kortestw %k0, %k0
	; AVX512-NEXT: setne %al			; AVX512-NEXT: setne %al
	; AVX512-NEXT: vzeroupper			; AVX512-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%1 = trunc <64 x i8> %a0 to <64 x i1>			%1 = trunc <64 x i8> %a0 to <64 x i1>
	%2 = bitcast <64 x i1> %1 to i64			%2 = bitcast <64 x i1> %1 to i64
	%3 = icmp ne i64 %2, 0			%3 = icmp ne i64 %2, 0
	ret i1 %3			ret i1 %3

llvm/test/CodeGen/X86/combine-and.ll

	define <16 x i8> @PR34620(<16 x i8> %a0, <16 x i8> %a1) {			define <16 x i8> @PR34620(<16 x i8> %a0, <16 x i8> %a1) {
	; SSE-LABEL: PR34620:			; SSE-LABEL: PR34620:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: psrlw $1, %xmm0			; SSE-NEXT: psrlw $1, %xmm0
	; SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0			; SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; SSE-NEXT: paddb %xmm1, %xmm0			; SSE-NEXT: paddb %xmm1, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: PR34620:			; AVX1-LABEL: PR34620:
	; AVX: # %bb.0:			; AVX1: # %bb.0:
	; AVX-NEXT: vpsrlw $1, %xmm0, %xmm0			; AVX1-NEXT: vpsrlw $1, %xmm0, %xmm0
	; AVX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX1-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX-NEXT: vpaddb %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpaddb %xmm1, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: PR34620:
				; AVX2: # %bb.0:
				; AVX2-NEXT: vpsrlw $1, %xmm0, %xmm0
				; AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX2-NEXT: vpaddb %xmm1, %xmm0, %xmm0
				; AVX2-NEXT: retq
				;
				; AVX512-LABEL: PR34620:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vpsrlw $1, %xmm0, %xmm0
				; AVX512-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512-NEXT: vpaddb %xmm1, %xmm0, %xmm0
				; AVX512-NEXT: retq
	%1 = lshr <16 x i8> %a0, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%1 = lshr <16 x i8> %a0, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%2 = and <16 x i8> %1, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%2 = and <16 x i8> %1, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%3 = add <16 x i8> %2, %a1			%3 = add <16 x i8> %2, %a1
	ret <16 x i8> %3			ret <16 x i8> %3
	}			}

	;			;
	; Simplify and with a broadcasted negated scalar			; Simplify and with a broadcasted negated scalar

llvm/test/CodeGen/X86/combine-sdiv.ll

	; AVX1-NEXT: vpsrld $4, %xmm0, %xmm1			; AVX1-NEXT: vpsrld $4, %xmm0, %xmm1
	; AVX1-NEXT: vpsrld $2, %xmm0, %xmm2			; AVX1-NEXT: vpsrld $2, %xmm0, %xmm2
	; AVX1-NEXT: vpblendw {{.*#+}} xmm1 = xmm2[0,1,2,3],xmm1[4,5,6,7]			; AVX1-NEXT: vpblendw {{.*#+}} xmm1 = xmm2[0,1,2,3],xmm1[4,5,6,7]
	; AVX1-NEXT: vpsrld $3, %xmm0, %xmm2			; AVX1-NEXT: vpsrld $3, %xmm0, %xmm2
	; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm2[4,5,6,7]			; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm2[4,5,6,7]
	; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3],xmm0[4,5],xmm1[6,7]			; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3],xmm0[4,5],xmm1[6,7]
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2ORLATER-LABEL: combine_vec_sdiv_by_pos1:			; AVX2-LABEL: combine_vec_sdiv_by_pos1:
	; AVX2ORLATER: # %bb.0:			; AVX2: # %bb.0:
	; AVX2ORLATER-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX2ORLATER-NEXT: vpsrlvd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX2-NEXT: vpsrlvd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX2ORLATER-NEXT: retq			; AVX2-NEXT: retq
				;
				; AVX512F-LABEL: combine_vec_sdiv_by_pos1:
				; AVX512F: # %bb.0:
				; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX512F-NEXT: vpsrlvd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX512F-NEXT: retq
				;
				; AVX512BW-LABEL: combine_vec_sdiv_by_pos1:
				; AVX512BW: # %bb.0:
				; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512BW-NEXT: vpsrlvd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX512BW-NEXT: retq
	;			;
	; XOP-LABEL: combine_vec_sdiv_by_pos1:			; XOP-LABEL: combine_vec_sdiv_by_pos1:
	; XOP: # %bb.0:			; XOP: # %bb.0:
	; XOP-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; XOP-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; XOP-NEXT: vpshld {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; XOP-NEXT: vpshld {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; XOP-NEXT: retq			; XOP-NEXT: retq
	%1 = and <4 x i32> %x, <i32 255, i32 255, i32 255, i32 255>			%1 = and <4 x i32> %x, <i32 255, i32 255, i32 255, i32 255>
	%2 = sdiv <4 x i32> %1, <i32 1, i32 4, i32 8, i32 16>			%2 = sdiv <4 x i32> %1, <i32 1, i32 4, i32 8, i32 16>

llvm/test/CodeGen/X86/dpbusd_const.ll

	; AVX512VNNI-NEXT: vmovd %xmm1, %eax			; AVX512VNNI-NEXT: vmovd %xmm1, %eax
	; AVX512VNNI-NEXT: addl %edi, %eax			; AVX512VNNI-NEXT: addl %edi, %eax
	; AVX512VNNI-NEXT: vzeroupper			; AVX512VNNI-NEXT: vzeroupper
	; AVX512VNNI-NEXT: retq			; AVX512VNNI-NEXT: retq
	;			;
	; AVX512VLVNNI-LABEL: mul_4xi4_cz:			; AVX512VLVNNI-LABEL: mul_4xi4_cz:
	; AVX512VLVNNI: # %bb.0: # %entry			; AVX512VLVNNI: # %bb.0: # %entry
	; AVX512VLVNNI-NEXT: vpmovdb %xmm0, %xmm0			; AVX512VLVNNI-NEXT: vpmovdb %xmm0, %xmm0
	; AVX512VLVNNI-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512VLVNNI-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; AVX512VLVNNI-NEXT: vpxor %xmm1, %xmm1, %xmm1			; AVX512VLVNNI-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; AVX512VLVNNI-NEXT: vpdpbusd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm1			; AVX512VLVNNI-NEXT: vpdpbusd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm1
	; AVX512VLVNNI-NEXT: vmovd %xmm1, %eax			; AVX512VLVNNI-NEXT: vmovd %xmm1, %eax
	; AVX512VLVNNI-NEXT: addl %edi, %eax			; AVX512VLVNNI-NEXT: addl %edi, %eax
	; AVX512VLVNNI-NEXT: retq			; AVX512VLVNNI-NEXT: retq
	entry:			entry:
	%0 = zext <4 x i4> %a to <4 x i32>			%0 = zext <4 x i4> %a to <4 x i32>
	%1 = mul nsw <4 x i32> <i32 0, i32 1, i32 2, i32 127>, %0			%1 = mul nsw <4 x i32> <i32 0, i32 1, i32 2, i32 127>, %0

llvm/test/CodeGen/X86/dpbusd_i4.ll

entry:
%4 = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %3)		%4 = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %3)
%op.extra = add nsw i32 %4, %c		%op.extra = add nsw i32 %4, %c
ret i32 %op.extra		ret i32 %op.extra
}		}

define i32 @mul_i4i8(<16 x i4> %a, <16 x i8> %b, i32 %c) {		define i32 @mul_i4i8(<16 x i4> %a, <16 x i8> %b, i32 %c) {
; CHECK-LABEL: mul_i4i8:		; CHECK-LABEL: mul_i4i8:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; CHECK-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
; CHECK-NEXT: vpxor %xmm2, %xmm2, %xmm2		; CHECK-NEXT: vpxor %xmm2, %xmm2, %xmm2
; CHECK-NEXT: vpdpbusd %xmm1, %xmm0, %xmm2		; CHECK-NEXT: vpdpbusd %xmm1, %xmm0, %xmm2
; CHECK-NEXT: vpshufd {{.*#+}} xmm0 = xmm2[2,3,2,3]		; CHECK-NEXT: vpshufd {{.*#+}} xmm0 = xmm2[2,3,2,3]
; CHECK-NEXT: vpaddd %xmm0, %xmm2, %xmm0		; CHECK-NEXT: vpaddd %xmm0, %xmm2, %xmm0
; CHECK-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[1,1,1,1]		; CHECK-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[1,1,1,1]
; CHECK-NEXT: vpaddd %xmm1, %xmm0, %xmm0		; CHECK-NEXT: vpaddd %xmm1, %xmm0, %xmm0
; CHECK-NEXT: vmovd %xmm0, %eax		; CHECK-NEXT: vmovd %xmm0, %eax
; CHECK-NEXT: addl %edi, %eax		; CHECK-NEXT: addl %edi, %eax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
entry:		entry:
%0 = zext <16 x i4> %a to <16 x i32>		%0 = zext <16 x i4> %a to <16 x i32>
%1 = sext <16 x i8> %b to <16 x i32>		%1 = sext <16 x i8> %b to <16 x i32>
%2 = mul nsw <16 x i32> %0, %1		%2 = mul nsw <16 x i32> %0, %1
%3 = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %2)		%3 = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %2)
%op.extra = add nsw i32 %3, %c		%op.extra = add nsw i32 %3, %c
ret i32 %op.extra		ret i32 %op.extra
}		}

define i32 @mul_i4i4(<16 x i4> %a, <16 x i4> %b, i32 %c) {		define i32 @mul_i4i4(<16 x i4> %a, <16 x i4> %b, i32 %c) {
; CHECK-LABEL: mul_i4i4:		; CHECK-LABEL: mul_i4i4:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vpsllw $4, %xmm1, %xmm1		; CHECK-NEXT: vpsllw $4, %xmm1, %xmm1
; CHECK-NEXT: vpsrlw $4, %xmm1, %xmm1		; CHECK-NEXT: vpsrlw $4, %xmm1, %xmm1
; CHECK-NEXT: vmovdqa {{.*#+}} xmm2 = [8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8]		; CHECK-NEXT: vmovdqa {{.*#+}} xmm2 = [8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8]
; CHECK-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm1		; CHECK-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm2, %xmm1
; CHECK-NEXT: vpsubb %xmm2, %xmm1, %xmm1		; CHECK-NEXT: vpsubb %xmm2, %xmm1, %xmm1
; CHECK-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; CHECK-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
; CHECK-NEXT: vpxor %xmm2, %xmm2, %xmm2		; CHECK-NEXT: vpxor %xmm2, %xmm2, %xmm2
; CHECK-NEXT: vpdpbusd %xmm1, %xmm0, %xmm2		; CHECK-NEXT: vpdpbusd %xmm1, %xmm0, %xmm2
; CHECK-NEXT: vpshufd {{.*#+}} xmm0 = xmm2[2,3,2,3]		; CHECK-NEXT: vpshufd {{.*#+}} xmm0 = xmm2[2,3,2,3]
; CHECK-NEXT: vpaddd %xmm0, %xmm2, %xmm0		; CHECK-NEXT: vpaddd %xmm0, %xmm2, %xmm0
; CHECK-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[1,1,1,1]		; CHECK-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[1,1,1,1]
; CHECK-NEXT: vpaddd %xmm1, %xmm0, %xmm0		; CHECK-NEXT: vpaddd %xmm1, %xmm0, %xmm0
; CHECK-NEXT: vmovd %xmm0, %eax		; CHECK-NEXT: vmovd %xmm0, %eax
; CHECK-NEXT: addl %edi, %eax		; CHECK-NEXT: addl %edi, %eax

llvm/test/CodeGen/X86/gfni-funnel-shifts.ll

	; GFNIAVX1OR2-NEXT: vpaddb %xmm0, %xmm0, %xmm0			; GFNIAVX1OR2-NEXT: vpaddb %xmm0, %xmm0, %xmm0
	; GFNIAVX1OR2-NEXT: vpor %xmm1, %xmm0, %xmm0			; GFNIAVX1OR2-NEXT: vpor %xmm1, %xmm0, %xmm0
	; GFNIAVX1OR2-NEXT: retq			; GFNIAVX1OR2-NEXT: retq
	;			;
	; GFNIAVX512-LABEL: splatconstant_fshr_v16i8:			; GFNIAVX512-LABEL: splatconstant_fshr_v16i8:
	; GFNIAVX512: # %bb.0:			; GFNIAVX512: # %bb.0:
	; GFNIAVX512-NEXT: vpsrlw $7, %xmm1, %xmm1			; GFNIAVX512-NEXT: vpsrlw $7, %xmm1, %xmm1
	; GFNIAVX512-NEXT: vpaddb %xmm0, %xmm0, %xmm0			; GFNIAVX512-NEXT: vpaddb %xmm0, %xmm0, %xmm0
	; GFNIAVX512-NEXT: vpternlogq $248, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0			; GFNIAVX512-NEXT: vpternlogd $248, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm0
	; GFNIAVX512-NEXT: retq			; GFNIAVX512-NEXT: retq
	%res = call <16 x i8> @llvm.fshr.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>)			%res = call <16 x i8> @llvm.fshr.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>)
	ret <16 x i8> %res			ret <16 x i8> %res
	}			}
	declare <16 x i8> @llvm.fshr.v16i8(<16 x i8>, <16 x i8>, <16 x i8>)			declare <16 x i8> @llvm.fshr.v16i8(<16 x i8>, <16 x i8>, <16 x i8>)

	;			;
	; 256 Bit Vector Funnel Shifts			; 256 Bit Vector Funnel Shifts
	; GFNIAVX2-NEXT: vpaddb %ymm1, %ymm1, %ymm1			; GFNIAVX2-NEXT: vpaddb %ymm1, %ymm1, %ymm1
	; GFNIAVX2-NEXT: vpor %ymm2, %ymm1, %ymm1			; GFNIAVX2-NEXT: vpor %ymm2, %ymm1, %ymm1
	; GFNIAVX2-NEXT: retq			; GFNIAVX2-NEXT: retq
	;			;
	; GFNIAVX512-LABEL: splatconstant_fshl_v64i8:			; GFNIAVX512-LABEL: splatconstant_fshl_v64i8:
	; GFNIAVX512: # %bb.0:			; GFNIAVX512: # %bb.0:
	; GFNIAVX512-NEXT: vpsrlw $7, %zmm1, %zmm1			; GFNIAVX512-NEXT: vpsrlw $7, %zmm1, %zmm1
	; GFNIAVX512-NEXT: vpaddb %zmm0, %zmm0, %zmm0			; GFNIAVX512-NEXT: vpaddb %zmm0, %zmm0, %zmm0
	; GFNIAVX512-NEXT: vpternlogq $248, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm0			; GFNIAVX512-NEXT: vpternlogd $248, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm0
	; GFNIAVX512-NEXT: retq			; GFNIAVX512-NEXT: retq
	%res = call <64 x i8> @llvm.fshl.v64i8(<64 x i8> %a, <64 x i8> %b, <64 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>)			%res = call <64 x i8> @llvm.fshl.v64i8(<64 x i8> %a, <64 x i8> %b, <64 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>)
	ret <64 x i8> %res			ret <64 x i8> %res
	}			}
	declare <64 x i8> @llvm.fshl.v64i8(<64 x i8>, <64 x i8>, <64 x i8>)			declare <64 x i8> @llvm.fshl.v64i8(<64 x i8>, <64 x i8>, <64 x i8>)

	define <64 x i8> @splatconstant_fshr_v64i8(<64 x i8> %a, <64 x i8> %b) nounwind {			define <64 x i8> @splatconstant_fshr_v64i8(<64 x i8> %a, <64 x i8> %b) nounwind {
	; GFNISSE-LABEL: splatconstant_fshr_v64i8:			; GFNISSE-LABEL: splatconstant_fshr_v64i8:

llvm/test/CodeGen/X86/gfni-rotates.ll

	; GFNIAVX1OR2-NEXT: vpaddb %xmm0, %xmm0, %xmm0			; GFNIAVX1OR2-NEXT: vpaddb %xmm0, %xmm0, %xmm0
	; GFNIAVX1OR2-NEXT: vpor %xmm1, %xmm0, %xmm0			; GFNIAVX1OR2-NEXT: vpor %xmm1, %xmm0, %xmm0
	; GFNIAVX1OR2-NEXT: retq			; GFNIAVX1OR2-NEXT: retq
	;			;
	; GFNIAVX512-LABEL: splatconstant_rotr_v16i8:			; GFNIAVX512-LABEL: splatconstant_rotr_v16i8:
	; GFNIAVX512: # %bb.0:			; GFNIAVX512: # %bb.0:
	; GFNIAVX512-NEXT: vpsrlw $7, %xmm0, %xmm1			; GFNIAVX512-NEXT: vpsrlw $7, %xmm0, %xmm1
	; GFNIAVX512-NEXT: vpaddb %xmm0, %xmm0, %xmm0			; GFNIAVX512-NEXT: vpaddb %xmm0, %xmm0, %xmm0
	; GFNIAVX512-NEXT: vpternlogq $248, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0			; GFNIAVX512-NEXT: vpternlogd $248, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm0
	; GFNIAVX512-NEXT: retq			; GFNIAVX512-NEXT: retq
	%res = call <16 x i8> @llvm.fshr.v16i8(<16 x i8> %a, <16 x i8> %a, <16 x i8> <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>)			%res = call <16 x i8> @llvm.fshr.v16i8(<16 x i8> %a, <16 x i8> %a, <16 x i8> <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>)
	ret <16 x i8> %res			ret <16 x i8> %res
	}			}
	declare <16 x i8> @llvm.fshr.v16i8(<16 x i8>, <16 x i8>, <16 x i8>)			declare <16 x i8> @llvm.fshr.v16i8(<16 x i8>, <16 x i8>, <16 x i8>)

	;			;
	; 256 Bit Vector Rotates			; 256 Bit Vector Rotates
	; GFNIAVX2-NEXT: vpaddb %ymm1, %ymm1, %ymm1			; GFNIAVX2-NEXT: vpaddb %ymm1, %ymm1, %ymm1
	; GFNIAVX2-NEXT: vpor %ymm2, %ymm1, %ymm1			; GFNIAVX2-NEXT: vpor %ymm2, %ymm1, %ymm1
	; GFNIAVX2-NEXT: retq			; GFNIAVX2-NEXT: retq
	;			;
	; GFNIAVX512-LABEL: splatconstant_rotl_v64i8:			; GFNIAVX512-LABEL: splatconstant_rotl_v64i8:
	; GFNIAVX512: # %bb.0:			; GFNIAVX512: # %bb.0:
	; GFNIAVX512-NEXT: vpsrlw $7, %zmm0, %zmm1			; GFNIAVX512-NEXT: vpsrlw $7, %zmm0, %zmm1
	; GFNIAVX512-NEXT: vpaddb %zmm0, %zmm0, %zmm0			; GFNIAVX512-NEXT: vpaddb %zmm0, %zmm0, %zmm0
	; GFNIAVX512-NEXT: vpternlogq $248, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm0			; GFNIAVX512-NEXT: vpternlogd $248, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm0
	; GFNIAVX512-NEXT: retq			; GFNIAVX512-NEXT: retq
	%res = call <64 x i8> @llvm.fshl.v64i8(<64 x i8> %a, <64 x i8> %a, <64 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>)			%res = call <64 x i8> @llvm.fshl.v64i8(<64 x i8> %a, <64 x i8> %a, <64 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>)
	ret <64 x i8> %res			ret <64 x i8> %res
	}			}
	declare <64 x i8> @llvm.fshl.v64i8(<64 x i8>, <64 x i8>, <64 x i8>)			declare <64 x i8> @llvm.fshl.v64i8(<64 x i8>, <64 x i8>, <64 x i8>)

	define <64 x i8> @splatconstant_rotr_v64i8(<64 x i8> %a) nounwind {			define <64 x i8> @splatconstant_rotr_v64i8(<64 x i8> %a) nounwind {
	; GFNISSE-LABEL: splatconstant_rotr_v64i8:			; GFNISSE-LABEL: splatconstant_rotr_v64i8:

llvm/test/CodeGen/X86/gfni-shifts.ll


	define <16 x i8> @splatconstant_shl_v16i8(<16 x i8> %a) nounwind {			define <16 x i8> @splatconstant_shl_v16i8(<16 x i8> %a) nounwind {
	; GFNISSE-LABEL: splatconstant_shl_v16i8:			; GFNISSE-LABEL: splatconstant_shl_v16i8:
	; GFNISSE: # %bb.0:			; GFNISSE: # %bb.0:
	; GFNISSE-NEXT: psllw $3, %xmm0			; GFNISSE-NEXT: psllw $3, %xmm0
	; GFNISSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0			; GFNISSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; GFNISSE-NEXT: retq			; GFNISSE-NEXT: retq
	;			;
	; GFNIAVX-LABEL: splatconstant_shl_v16i8:			; GFNIAVX1OR2-LABEL: splatconstant_shl_v16i8:
	; GFNIAVX: # %bb.0:			; GFNIAVX1OR2: # %bb.0:
	; GFNIAVX-NEXT: vpsllw $3, %xmm0, %xmm0			; GFNIAVX1OR2-NEXT: vpsllw $3, %xmm0, %xmm0
	; GFNIAVX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; GFNIAVX1OR2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; GFNIAVX-NEXT: retq			; GFNIAVX1OR2-NEXT: retq
				;
				; GFNIAVX512-LABEL: splatconstant_shl_v16i8:
				; GFNIAVX512: # %bb.0:
				; GFNIAVX512-NEXT: vpsllw $3, %xmm0, %xmm0
				; GFNIAVX512-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; GFNIAVX512-NEXT: retq
	%shift = shl <16 x i8> %a, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>			%shift = shl <16 x i8> %a, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
	ret <16 x i8> %shift			ret <16 x i8> %shift
	}			}

	define <16 x i8> @splatconstant_lshr_v16i8(<16 x i8> %a) nounwind {			define <16 x i8> @splatconstant_lshr_v16i8(<16 x i8> %a) nounwind {
	; GFNISSE-LABEL: splatconstant_lshr_v16i8:			; GFNISSE-LABEL: splatconstant_lshr_v16i8:
	; GFNISSE: # %bb.0:			; GFNISSE: # %bb.0:
	; GFNISSE-NEXT: psrlw $7, %xmm0			; GFNISSE-NEXT: psrlw $7, %xmm0
	; GFNISSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0			; GFNISSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; GFNISSE-NEXT: retq			; GFNISSE-NEXT: retq
	;			;
	; GFNIAVX-LABEL: splatconstant_lshr_v16i8:			; GFNIAVX1OR2-LABEL: splatconstant_lshr_v16i8:
	; GFNIAVX: # %bb.0:			; GFNIAVX1OR2: # %bb.0:
	; GFNIAVX-NEXT: vpsrlw $7, %xmm0, %xmm0			; GFNIAVX1OR2-NEXT: vpsrlw $7, %xmm0, %xmm0
	; GFNIAVX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; GFNIAVX1OR2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; GFNIAVX-NEXT: retq			; GFNIAVX1OR2-NEXT: retq
				;
				; GFNIAVX512-LABEL: splatconstant_lshr_v16i8:
				; GFNIAVX512: # %bb.0:
				; GFNIAVX512-NEXT: vpsrlw $7, %xmm0, %xmm0
				; GFNIAVX512-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; GFNIAVX512-NEXT: retq
	%shift = lshr <16 x i8> %a, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>			%shift = lshr <16 x i8> %a, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
	ret <16 x i8> %shift			ret <16 x i8> %shift
	}			}

	define <16 x i8> @splatconstant_ashr_v16i8(<16 x i8> %a) nounwind {			define <16 x i8> @splatconstant_ashr_v16i8(<16 x i8> %a) nounwind {
	; GFNISSE-LABEL: splatconstant_ashr_v16i8:			; GFNISSE-LABEL: splatconstant_ashr_v16i8:
	; GFNISSE: # %bb.0:			; GFNISSE: # %bb.0:
	; GFNISSE-NEXT: psrlw $4, %xmm0			; GFNISSE-NEXT: psrlw $4, %xmm0
	; GFNIAVX1OR2-NEXT: vpxor %xmm1, %xmm0, %xmm0			; GFNIAVX1OR2-NEXT: vpxor %xmm1, %xmm0, %xmm0
	; GFNIAVX1OR2-NEXT: vpsubb %xmm1, %xmm0, %xmm0			; GFNIAVX1OR2-NEXT: vpsubb %xmm1, %xmm0, %xmm0
	; GFNIAVX1OR2-NEXT: retq			; GFNIAVX1OR2-NEXT: retq
	;			;
	; GFNIAVX512-LABEL: splatconstant_ashr_v16i8:			; GFNIAVX512-LABEL: splatconstant_ashr_v16i8:
	; GFNIAVX512: # %bb.0:			; GFNIAVX512: # %bb.0:
	; GFNIAVX512-NEXT: vpsrlw $4, %xmm0, %xmm0			; GFNIAVX512-NEXT: vpsrlw $4, %xmm0, %xmm0
	; GFNIAVX512-NEXT: vmovdqa {{.*#+}} xmm1 = [8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8]			; GFNIAVX512-NEXT: vmovdqa {{.*#+}} xmm1 = [8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8]
	; GFNIAVX512-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0			; GFNIAVX512-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm0
	; GFNIAVX512-NEXT: vpsubb %xmm1, %xmm0, %xmm0			; GFNIAVX512-NEXT: vpsubb %xmm1, %xmm0, %xmm0
	; GFNIAVX512-NEXT: retq			; GFNIAVX512-NEXT: retq
	%shift = ashr <16 x i8> %a, <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>			%shift = ashr <16 x i8> %a, <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>
	ret <16 x i8> %shift			ret <16 x i8> %shift
	}			}

	;			;
	; 256 Bit Vector Shifts			; 256 Bit Vector Shifts
	; GFNIAVX2: # %bb.0:			; GFNIAVX2: # %bb.0:
	; GFNIAVX2-NEXT: vpsllw $6, %ymm0, %ymm0			; GFNIAVX2-NEXT: vpsllw $6, %ymm0, %ymm0
	; GFNIAVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; GFNIAVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; GFNIAVX2-NEXT: retq			; GFNIAVX2-NEXT: retq
	;			;
	; GFNIAVX512-LABEL: splatconstant_shl_v32i8:			; GFNIAVX512-LABEL: splatconstant_shl_v32i8:
	; GFNIAVX512: # %bb.0:			; GFNIAVX512: # %bb.0:
	; GFNIAVX512-NEXT: vpsllw $6, %ymm0, %ymm0			; GFNIAVX512-NEXT: vpsllw $6, %ymm0, %ymm0
	; GFNIAVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; GFNIAVX512-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; GFNIAVX512-NEXT: retq			; GFNIAVX512-NEXT: retq
	%shift = shl <32 x i8> %a, <i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6>			%shift = shl <32 x i8> %a, <i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6, i8 6>
	ret <32 x i8> %shift			ret <32 x i8> %shift
	}			}

	define <32 x i8> @splatconstant_lshr_v32i8(<32 x i8> %a) nounwind {			define <32 x i8> @splatconstant_lshr_v32i8(<32 x i8> %a) nounwind {
	; GFNISSE-LABEL: splatconstant_lshr_v32i8:			; GFNISSE-LABEL: splatconstant_lshr_v32i8:
	; GFNISSE: # %bb.0:			; GFNISSE: # %bb.0:
	; GFNIAVX2: # %bb.0:			; GFNIAVX2: # %bb.0:
	; GFNIAVX2-NEXT: vpsrlw $1, %ymm0, %ymm0			; GFNIAVX2-NEXT: vpsrlw $1, %ymm0, %ymm0
	; GFNIAVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; GFNIAVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; GFNIAVX2-NEXT: retq			; GFNIAVX2-NEXT: retq
	;			;
	; GFNIAVX512-LABEL: splatconstant_lshr_v32i8:			; GFNIAVX512-LABEL: splatconstant_lshr_v32i8:
	; GFNIAVX512: # %bb.0:			; GFNIAVX512: # %bb.0:
	; GFNIAVX512-NEXT: vpsrlw $1, %ymm0, %ymm0			; GFNIAVX512-NEXT: vpsrlw $1, %ymm0, %ymm0
	; GFNIAVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; GFNIAVX512-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; GFNIAVX512-NEXT: retq			; GFNIAVX512-NEXT: retq
	%shift = lshr <32 x i8> %a, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%shift = lshr <32 x i8> %a, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	ret <32 x i8> %shift			ret <32 x i8> %shift
	}			}

	define <32 x i8> @splatconstant_ashr_v32i8(<32 x i8> %a) nounwind {			define <32 x i8> @splatconstant_ashr_v32i8(<32 x i8> %a) nounwind {
	; GFNISSE-LABEL: splatconstant_ashr_v32i8:			; GFNISSE-LABEL: splatconstant_ashr_v32i8:
	; GFNISSE: # %bb.0:			; GFNISSE: # %bb.0:
	; GFNIAVX2-NEXT: vpxor %ymm1, %ymm0, %ymm0			; GFNIAVX2-NEXT: vpxor %ymm1, %ymm0, %ymm0
	; GFNIAVX2-NEXT: vpsubb %ymm1, %ymm0, %ymm0			; GFNIAVX2-NEXT: vpsubb %ymm1, %ymm0, %ymm0
	; GFNIAVX2-NEXT: retq			; GFNIAVX2-NEXT: retq
	;			;
	; GFNIAVX512-LABEL: splatconstant_ashr_v32i8:			; GFNIAVX512-LABEL: splatconstant_ashr_v32i8:
	; GFNIAVX512: # %bb.0:			; GFNIAVX512: # %bb.0:
	; GFNIAVX512-NEXT: vpsrlw $2, %ymm0, %ymm0			; GFNIAVX512-NEXT: vpsrlw $2, %ymm0, %ymm0
	; GFNIAVX512-NEXT: vmovdqa {{.*#+}} ymm1 = [32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32]			; GFNIAVX512-NEXT: vmovdqa {{.*#+}} ymm1 = [32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32]
	; GFNIAVX512-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm0			; GFNIAVX512-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm0
	; GFNIAVX512-NEXT: vpsubb %ymm1, %ymm0, %ymm0			; GFNIAVX512-NEXT: vpsubb %ymm1, %ymm0, %ymm0
	; GFNIAVX512-NEXT: retq			; GFNIAVX512-NEXT: retq
	%shift = ashr <32 x i8> %a, <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>			%shift = ashr <32 x i8> %a, <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>
	ret <32 x i8> %shift			ret <32 x i8> %shift
	}			}

	;			;
	; 512 Bit Vector Shifts			; 512 Bit Vector Shifts
	... (37 lines not shown) ...
	; GFNIAVX2-NEXT: vpand %ymm2, %ymm0, %ymm0			; GFNIAVX2-NEXT: vpand %ymm2, %ymm0, %ymm0
	; GFNIAVX2-NEXT: vpsllw $5, %ymm1, %ymm1			; GFNIAVX2-NEXT: vpsllw $5, %ymm1, %ymm1
	; GFNIAVX2-NEXT: vpand %ymm2, %ymm1, %ymm1			; GFNIAVX2-NEXT: vpand %ymm2, %ymm1, %ymm1
	; GFNIAVX2-NEXT: retq			; GFNIAVX2-NEXT: retq
	;			;
	; GFNIAVX512-LABEL: splatconstant_shl_v64i8:			; GFNIAVX512-LABEL: splatconstant_shl_v64i8:
	; GFNIAVX512: # %bb.0:			; GFNIAVX512: # %bb.0:
	; GFNIAVX512-NEXT: vpsllw $5, %zmm0, %zmm0			; GFNIAVX512-NEXT: vpsllw $5, %zmm0, %zmm0
	; GFNIAVX512-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; GFNIAVX512-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; GFNIAVX512-NEXT: retq			; GFNIAVX512-NEXT: retq
	%shift = shl <64 x i8> %a, <i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5>			%shift = shl <64 x i8> %a, <i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5, i8 5>
	ret <64 x i8> %shift			ret <64 x i8> %shift
	}			}

	define <64 x i8> @splatconstant_lshr_v64i8(<64 x i8> %a) nounwind {			define <64 x i8> @splatconstant_lshr_v64i8(<64 x i8> %a) nounwind {
	; GFNISSE-LABEL: splatconstant_lshr_v64i8:			; GFNISSE-LABEL: splatconstant_lshr_v64i8:
	; GFNISSE: # %bb.0:			; GFNISSE: # %bb.0:
	... (32 lines not shown) ...
	; GFNIAVX2-NEXT: vpand %ymm2, %ymm0, %ymm0			; GFNIAVX2-NEXT: vpand %ymm2, %ymm0, %ymm0
	; GFNIAVX2-NEXT: vpsrlw $7, %ymm1, %ymm1			; GFNIAVX2-NEXT: vpsrlw $7, %ymm1, %ymm1
	; GFNIAVX2-NEXT: vpand %ymm2, %ymm1, %ymm1			; GFNIAVX2-NEXT: vpand %ymm2, %ymm1, %ymm1
	; GFNIAVX2-NEXT: retq			; GFNIAVX2-NEXT: retq
	;			;
	; GFNIAVX512-LABEL: splatconstant_lshr_v64i8:			; GFNIAVX512-LABEL: splatconstant_lshr_v64i8:
	; GFNIAVX512: # %bb.0:			; GFNIAVX512: # %bb.0:
	; GFNIAVX512-NEXT: vpsrlw $7, %zmm0, %zmm0			; GFNIAVX512-NEXT: vpsrlw $7, %zmm0, %zmm0
	; GFNIAVX512-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; GFNIAVX512-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; GFNIAVX512-NEXT: retq			; GFNIAVX512-NEXT: retq
	%shift = lshr <64 x i8> %a, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>			%shift = lshr <64 x i8> %a, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
	ret <64 x i8> %shift			ret <64 x i8> %shift
	}			}

	define <64 x i8> @splatconstant_ashr_v64i8(<64 x i8> %a) nounwind {			define <64 x i8> @splatconstant_ashr_v64i8(<64 x i8> %a) nounwind {
	; GFNISSE-LABEL: splatconstant_ashr_v64i8:			; GFNISSE-LABEL: splatconstant_ashr_v64i8:
	; GFNISSE: # %bb.0:			; GFNISSE: # %bb.0:
	... (56 lines not shown) ...
	; GFNIAVX2-NEXT: vpxor %ymm3, %ymm1, %ymm1			; GFNIAVX2-NEXT: vpxor %ymm3, %ymm1, %ymm1
	; GFNIAVX2-NEXT: vpsubb %ymm3, %ymm1, %ymm1			; GFNIAVX2-NEXT: vpsubb %ymm3, %ymm1, %ymm1
	; GFNIAVX2-NEXT: retq			; GFNIAVX2-NEXT: retq
	;			;
	; GFNIAVX512-LABEL: splatconstant_ashr_v64i8:			; GFNIAVX512-LABEL: splatconstant_ashr_v64i8:
	; GFNIAVX512: # %bb.0:			; GFNIAVX512: # %bb.0:
	; GFNIAVX512-NEXT: vpsrlw $1, %zmm0, %zmm0			; GFNIAVX512-NEXT: vpsrlw $1, %zmm0, %zmm0
	; GFNIAVX512-NEXT: vmovdqa64 {{.*#+}} zmm1 = [64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64]			; GFNIAVX512-NEXT: vmovdqa64 {{.*#+}} zmm1 = [64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64]
	; GFNIAVX512-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm0			; GFNIAVX512-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm0
	; GFNIAVX512-NEXT: vpsubb %zmm1, %zmm0, %zmm0			; GFNIAVX512-NEXT: vpsubb %zmm1, %zmm0, %zmm0
	; GFNIAVX512-NEXT: retq			; GFNIAVX512-NEXT: retq
	%shift = ashr <64 x i8> %a, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%shift = ashr <64 x i8> %a, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	ret <64 x i8> %shift			ret <64 x i8> %shift
	}			}
				;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
				; GFNIAVX: {{.*}}

llvm/test/CodeGen/X86/horizontal-reduce-smax.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+sse2 | FileCheck %s --check-prefix=X86-SSE2			; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+sse2 | FileCheck %s --check-prefix=X86-SSE2
	; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+sse4.2 | FileCheck %s --check-prefix=X86-SSE42			; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+sse4.2 | FileCheck %s --check-prefix=X86-SSE42
	; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+avx | FileCheck %s --check-prefixes=X86-AVX,X86-AVX1			; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+avx | FileCheck %s --check-prefixes=X86-AVX,X86-AVX1
	; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+avx2 | FileCheck %s --check-prefixes=X86-AVX,X86-AVX2			; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+avx2 | FileCheck %s --check-prefixes=X86-AVX,X86-AVX2
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+sse2 | FileCheck %s --check-prefix=X64-SSE2			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+sse2 | FileCheck %s --check-prefix=X64-SSE2
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+sse4.2 | FileCheck %s --check-prefix=X64-SSE42			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+sse4.2 | FileCheck %s --check-prefix=X64-SSE42
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx | FileCheck %s --check-prefixes=X64-AVX,X64-AVX1			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx | FileCheck %s --check-prefixes=X64-AVX,X64-AVX1OR2,X64-AVX1
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx2 | FileCheck %s --check-prefixes=X64-AVX,X64-AVX2			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx2 | FileCheck %s --check-prefixes=X64-AVX,X64-AVX1OR2,X64-AVX2
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx512f,+avx512bw,+avx512dq,+avx512vl | FileCheck %s --check-prefixes=X64-AVX,X64-AVX512			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx512f,+avx512bw,+avx512dq,+avx512vl | FileCheck %s --check-prefixes=X64-AVX,X64-AVX512

	;			;
	; 128-bit Vectors			; 128-bit Vectors
	;			;

	define i64 @test_reduce_v2i64(<2 x i64> %a0) {			define i64 @test_reduce_v2i64(<2 x i64> %a0) {
	; X86-SSE2-LABEL: test_reduce_v2i64:			; X86-SSE2-LABEL: test_reduce_v2i64:
	... (63 lines not shown) ...
	; X64-SSE42: ## %bb.0:			; X64-SSE42: ## %bb.0:
	; X64-SSE42-NEXT: movdqa %xmm0, %xmm1			; X64-SSE42-NEXT: movdqa %xmm0, %xmm1
	; X64-SSE42-NEXT: pshufd {{.*#+}} xmm2 = xmm0[2,3,2,3]			; X64-SSE42-NEXT: pshufd {{.*#+}} xmm2 = xmm0[2,3,2,3]
	; X64-SSE42-NEXT: pcmpgtq %xmm2, %xmm0			; X64-SSE42-NEXT: pcmpgtq %xmm2, %xmm0
	; X64-SSE42-NEXT: blendvpd %xmm0, %xmm1, %xmm2			; X64-SSE42-NEXT: blendvpd %xmm0, %xmm1, %xmm2
	; X64-SSE42-NEXT: movq %xmm2, %rax			; X64-SSE42-NEXT: movq %xmm2, %rax
	; X64-SSE42-NEXT: retq			; X64-SSE42-NEXT: retq
	;			;
	; X64-AVX1-LABEL: test_reduce_v2i64:			; X64-AVX1OR2-LABEL: test_reduce_v2i64:
	; X64-AVX1: ## %bb.0:			; X64-AVX1OR2: ## %bb.0:
	; X64-AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]			; X64-AVX1OR2-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
	; X64-AVX1-NEXT: vpcmpgtq %xmm1, %xmm0, %xmm2			; X64-AVX1OR2-NEXT: vpcmpgtq %xmm1, %xmm0, %xmm2
	; X64-AVX1-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0			; X64-AVX1OR2-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
	; X64-AVX1-NEXT: vmovq %xmm0, %rax			; X64-AVX1OR2-NEXT: vmovq %xmm0, %rax
	; X64-AVX1-NEXT: retq			; X64-AVX1OR2-NEXT: retq
	;
	; X64-AVX2-LABEL: test_reduce_v2i64:
	; X64-AVX2: ## %bb.0:
	; X64-AVX2-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
	; X64-AVX2-NEXT: vpcmpgtq %xmm1, %xmm0, %xmm2
	; X64-AVX2-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
	; X64-AVX2-NEXT: vmovq %xmm0, %rax
	; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512-LABEL: test_reduce_v2i64:			; X64-AVX512-LABEL: test_reduce_v2i64:
	; X64-AVX512: ## %bb.0:			; X64-AVX512: ## %bb.0:
	; X64-AVX512-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]			; X64-AVX512-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
	; X64-AVX512-NEXT: vpmaxsq %xmm1, %xmm0, %xmm0			; X64-AVX512-NEXT: vpmaxsq %xmm1, %xmm0, %xmm0
	; X64-AVX512-NEXT: vmovq %xmm0, %rax			; X64-AVX512-NEXT: vmovq %xmm0, %rax
	; X64-AVX512-NEXT: retq			; X64-AVX512-NEXT: retq
	%1 = shufflevector <2 x i64> %a0, <2 x i64> undef, <2 x i32> <i32 1, i32 undef>			%1 = shufflevector <2 x i64> %a0, <2 x i64> undef, <2 x i32> <i32 1, i32 undef>
	... (132 lines not shown) ...
	; X64-SSE42: ## %bb.0:			; X64-SSE42: ## %bb.0:
	; X64-SSE42-NEXT: pxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0			; X64-SSE42-NEXT: pxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; X64-SSE42-NEXT: phminposuw %xmm0, %xmm0			; X64-SSE42-NEXT: phminposuw %xmm0, %xmm0
	; X64-SSE42-NEXT: movd %xmm0, %eax			; X64-SSE42-NEXT: movd %xmm0, %eax
	; X64-SSE42-NEXT: xorl $32767, %eax ## imm = 0x7FFF			; X64-SSE42-NEXT: xorl $32767, %eax ## imm = 0x7FFF
	; X64-SSE42-NEXT: ## kill: def $ax killed $ax killed $eax			; X64-SSE42-NEXT: ## kill: def $ax killed $ax killed $eax
	; X64-SSE42-NEXT: retq			; X64-SSE42-NEXT: retq
	;			;
	; X64-AVX-LABEL: test_reduce_v8i16:			; X64-AVX1OR2-LABEL: test_reduce_v8i16:
	; X64-AVX: ## %bb.0:			; X64-AVX1OR2: ## %bb.0:
	; X64-AVX-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; X64-AVX-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX-NEXT: vmovd %xmm0, %eax			; X64-AVX1OR2-NEXT: vmovd %xmm0, %eax
	; X64-AVX-NEXT: xorl $32767, %eax ## imm = 0x7FFF			; X64-AVX1OR2-NEXT: xorl $32767, %eax ## imm = 0x7FFF
	; X64-AVX-NEXT: ## kill: def $ax killed $ax killed $eax			; X64-AVX1OR2-NEXT: ## kill: def $ax killed $ax killed $eax
	; X64-AVX-NEXT: retq			; X64-AVX1OR2-NEXT: retq
				;
				; X64-AVX512-LABEL: test_reduce_v8i16:
				; X64-AVX512: ## %bb.0:
				; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
				; X64-AVX512-NEXT: vmovd %xmm0, %eax
				; X64-AVX512-NEXT: xorl $32767, %eax ## imm = 0x7FFF
				; X64-AVX512-NEXT: ## kill: def $ax killed $ax killed $eax
				; X64-AVX512-NEXT: retq
	%1 = shufflevector <8 x i16> %a0, <8 x i16> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <8 x i16> %a0, <8 x i16> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	%2 = icmp sgt <8 x i16> %a0, %1			%2 = icmp sgt <8 x i16> %a0, %1
	%3 = select <8 x i1> %2, <8 x i16> %a0, <8 x i16> %1			%3 = select <8 x i1> %2, <8 x i16> %a0, <8 x i16> %1
	%4 = shufflevector <8 x i16> %3, <8 x i16> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%4 = shufflevector <8 x i16> %3, <8 x i16> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%5 = icmp sgt <8 x i16> %3, %4			%5 = icmp sgt <8 x i16> %3, %4
	%6 = select <8 x i1> %5, <8 x i16> %3, <8 x i16> %4			%6 = select <8 x i1> %5, <8 x i16> %3, <8 x i16> %4
	%7 = shufflevector <8 x i16> %6, <8 x i16> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%7 = shufflevector <8 x i16> %6, <8 x i16> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%8 = icmp sgt <8 x i16> %6, %7			%8 = icmp sgt <8 x i16> %6, %7
	... (97 lines not shown) ...
	; X64-SSE42-NEXT: psrlw $8, %xmm1			; X64-SSE42-NEXT: psrlw $8, %xmm1
	; X64-SSE42-NEXT: pminub %xmm0, %xmm1			; X64-SSE42-NEXT: pminub %xmm0, %xmm1
	; X64-SSE42-NEXT: phminposuw %xmm1, %xmm0			; X64-SSE42-NEXT: phminposuw %xmm1, %xmm0
	; X64-SSE42-NEXT: movd %xmm0, %eax			; X64-SSE42-NEXT: movd %xmm0, %eax
	; X64-SSE42-NEXT: xorb $127, %al			; X64-SSE42-NEXT: xorb $127, %al
	; X64-SSE42-NEXT: ## kill: def $al killed $al killed $eax			; X64-SSE42-NEXT: ## kill: def $al killed $al killed $eax
	; X64-SSE42-NEXT: retq			; X64-SSE42-NEXT: retq
	;			;
	; X64-AVX-LABEL: test_reduce_v16i8:			; X64-AVX1OR2-LABEL: test_reduce_v16i8:
	; X64-AVX: ## %bb.0:			; X64-AVX1OR2: ## %bb.0:
	; X64-AVX-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; X64-AVX-NEXT: vpsrlw $8, %xmm0, %xmm1			; X64-AVX1OR2-NEXT: vpsrlw $8, %xmm0, %xmm1
	; X64-AVX-NEXT: vpminub %xmm1, %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; X64-AVX-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX-NEXT: vmovd %xmm0, %eax			; X64-AVX1OR2-NEXT: vmovd %xmm0, %eax
	; X64-AVX-NEXT: xorb $127, %al			; X64-AVX1OR2-NEXT: xorb $127, %al
	; X64-AVX-NEXT: ## kill: def $al killed $al killed $eax			; X64-AVX1OR2-NEXT: ## kill: def $al killed $al killed $eax
	; X64-AVX-NEXT: retq			; X64-AVX1OR2-NEXT: retq
				;
				; X64-AVX512-LABEL: test_reduce_v16i8:
				; X64-AVX512: ## %bb.0:
				; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; X64-AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1
				; X64-AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0
				; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
				; X64-AVX512-NEXT: vmovd %xmm0, %eax
				; X64-AVX512-NEXT: xorb $127, %al
				; X64-AVX512-NEXT: ## kill: def $al killed $al killed $eax
				; X64-AVX512-NEXT: retq
	%1 = shufflevector <16 x i8> %a0, <16 x i8> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <16 x i8> %a0, <16 x i8> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%2 = icmp sgt <16 x i8> %a0, %1			%2 = icmp sgt <16 x i8> %a0, %1
	%3 = select <16 x i1> %2, <16 x i8> %a0, <16 x i8> %1			%3 = select <16 x i1> %2, <16 x i8> %a0, <16 x i8> %1
	%4 = shufflevector <16 x i8> %3, <16 x i8> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%4 = shufflevector <16 x i8> %3, <16 x i8> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%5 = icmp sgt <16 x i8> %3, %4			%5 = icmp sgt <16 x i8> %3, %4
	%6 = select <16 x i1> %5, <16 x i8> %3, <16 x i8> %4			%6 = select <16 x i1> %5, <16 x i8> %3, <16 x i8> %4
	%7 = shufflevector <16 x i8> %6, <16 x i8> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%7 = shufflevector <16 x i8> %6, <16 x i8> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%8 = icmp sgt <16 x i8> %6, %7			%8 = icmp sgt <16 x i8> %6, %7
	... (411 lines not shown) ...
	; X64-AVX2-NEXT: ## kill: def $ax killed $ax killed $eax			; X64-AVX2-NEXT: ## kill: def $ax killed $ax killed $eax
	; X64-AVX2-NEXT: vzeroupper			; X64-AVX2-NEXT: vzeroupper
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512-LABEL: test_reduce_v16i16:			; X64-AVX512-LABEL: test_reduce_v16i16:
	; X64-AVX512: ## %bb.0:			; X64-AVX512: ## %bb.0:
	; X64-AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; X64-AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1
	; X64-AVX512-NEXT: vpmaxsw %xmm1, %xmm0, %xmm0			; X64-AVX512-NEXT: vpmaxsw %xmm1, %xmm0, %xmm0
	; X64-AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX512-NEXT: vmovd %xmm0, %eax			; X64-AVX512-NEXT: vmovd %xmm0, %eax
	; X64-AVX512-NEXT: xorl $32767, %eax ## imm = 0x7FFF			; X64-AVX512-NEXT: xorl $32767, %eax ## imm = 0x7FFF
	; X64-AVX512-NEXT: ## kill: def $ax killed $ax killed $eax			; X64-AVX512-NEXT: ## kill: def $ax killed $ax killed $eax
	; X64-AVX512-NEXT: vzeroupper			; X64-AVX512-NEXT: vzeroupper
	; X64-AVX512-NEXT: retq			; X64-AVX512-NEXT: retq
	%1 = shufflevector <16 x i16> %a0, <16 x i16> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <16 x i16> %a0, <16 x i16> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%2 = icmp sgt <16 x i16> %a0, %1			%2 = icmp sgt <16 x i16> %a0, %1
	... (167 lines not shown) ...
	; X64-AVX2-NEXT: ## kill: def $al killed $al killed $eax			; X64-AVX2-NEXT: ## kill: def $al killed $al killed $eax
	; X64-AVX2-NEXT: vzeroupper			; X64-AVX2-NEXT: vzeroupper
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512-LABEL: test_reduce_v32i8:			; X64-AVX512-LABEL: test_reduce_v32i8:
	; X64-AVX512: ## %bb.0:			; X64-AVX512: ## %bb.0:
	; X64-AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; X64-AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1
	; X64-AVX512-NEXT: vpmaxsb %xmm1, %xmm0, %xmm0			; X64-AVX512-NEXT: vpmaxsb %xmm1, %xmm0, %xmm0
	; X64-AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; X64-AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1			; X64-AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1
	; X64-AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0			; X64-AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX512-NEXT: vmovd %xmm0, %eax			; X64-AVX512-NEXT: vmovd %xmm0, %eax
	; X64-AVX512-NEXT: xorb $127, %al			; X64-AVX512-NEXT: xorb $127, %al
	; X64-AVX512-NEXT: ## kill: def $al killed $al killed $eax			; X64-AVX512-NEXT: ## kill: def $al killed $al killed $eax
	; X64-AVX512-NEXT: vzeroupper			; X64-AVX512-NEXT: vzeroupper
	; X64-AVX512-NEXT: retq			; X64-AVX512-NEXT: retq
	... (570 lines not shown) ...
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512-LABEL: test_reduce_v32i16:			; X64-AVX512-LABEL: test_reduce_v32i16:
	; X64-AVX512: ## %bb.0:			; X64-AVX512: ## %bb.0:
	; X64-AVX512-NEXT: vextracti64x4 $1, %zmm0, %ymm1			; X64-AVX512-NEXT: vextracti64x4 $1, %zmm0, %ymm1
	; X64-AVX512-NEXT: vpmaxsw %ymm1, %ymm0, %ymm0			; X64-AVX512-NEXT: vpmaxsw %ymm1, %ymm0, %ymm0
	; X64-AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; X64-AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1
	; X64-AVX512-NEXT: vpmaxsw %xmm1, %xmm0, %xmm0			; X64-AVX512-NEXT: vpmaxsw %xmm1, %xmm0, %xmm0
	; X64-AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX512-NEXT: vmovd %xmm0, %eax			; X64-AVX512-NEXT: vmovd %xmm0, %eax
	; X64-AVX512-NEXT: xorl $32767, %eax ## imm = 0x7FFF			; X64-AVX512-NEXT: xorl $32767, %eax ## imm = 0x7FFF
	; X64-AVX512-NEXT: ## kill: def $ax killed $ax killed $eax			; X64-AVX512-NEXT: ## kill: def $ax killed $ax killed $eax
	; X64-AVX512-NEXT: vzeroupper			; X64-AVX512-NEXT: vzeroupper
	; X64-AVX512-NEXT: retq			; X64-AVX512-NEXT: retq
	%1 = shufflevector <32 x i16> %a0, <32 x i16> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <32 x i16> %a0, <32 x i16> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%2 = icmp sgt <32 x i16> %a0, %1			%2 = icmp sgt <32 x i16> %a0, %1
	... (204 lines not shown) ...
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512-LABEL: test_reduce_v64i8:			; X64-AVX512-LABEL: test_reduce_v64i8:
	; X64-AVX512: ## %bb.0:			; X64-AVX512: ## %bb.0:
	; X64-AVX512-NEXT: vextracti64x4 $1, %zmm0, %ymm1			; X64-AVX512-NEXT: vextracti64x4 $1, %zmm0, %ymm1
	; X64-AVX512-NEXT: vpmaxsb %ymm1, %ymm0, %ymm0			; X64-AVX512-NEXT: vpmaxsb %ymm1, %ymm0, %ymm0
	; X64-AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; X64-AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1
	; X64-AVX512-NEXT: vpmaxsb %xmm1, %xmm0, %xmm0			; X64-AVX512-NEXT: vpmaxsb %xmm1, %xmm0, %xmm0
	; X64-AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; X64-AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1			; X64-AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1
	; X64-AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0			; X64-AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX512-NEXT: vmovd %xmm0, %eax			; X64-AVX512-NEXT: vmovd %xmm0, %eax
	; X64-AVX512-NEXT: xorb $127, %al			; X64-AVX512-NEXT: xorb $127, %al
	; X64-AVX512-NEXT: ## kill: def $al killed $al killed $eax			; X64-AVX512-NEXT: ## kill: def $al killed $al killed $eax
	; X64-AVX512-NEXT: vzeroupper			; X64-AVX512-NEXT: vzeroupper
	; X64-AVX512-NEXT: retq			; X64-AVX512-NEXT: retq
	... (73 lines not shown) ...
	; X64-SSE42: ## %bb.0:			; X64-SSE42: ## %bb.0:
	; X64-SSE42-NEXT: pxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0			; X64-SSE42-NEXT: pxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; X64-SSE42-NEXT: phminposuw %xmm0, %xmm0			; X64-SSE42-NEXT: phminposuw %xmm0, %xmm0
	; X64-SSE42-NEXT: movd %xmm0, %eax			; X64-SSE42-NEXT: movd %xmm0, %eax
	; X64-SSE42-NEXT: xorl $32767, %eax ## imm = 0x7FFF			; X64-SSE42-NEXT: xorl $32767, %eax ## imm = 0x7FFF
	; X64-SSE42-NEXT: ## kill: def $ax killed $ax killed $eax			; X64-SSE42-NEXT: ## kill: def $ax killed $ax killed $eax
	; X64-SSE42-NEXT: retq			; X64-SSE42-NEXT: retq
	;			;
	; X64-AVX-LABEL: test_reduce_v16i16_v8i16:			; X64-AVX1OR2-LABEL: test_reduce_v16i16_v8i16:
	; X64-AVX: ## %bb.0:			; X64-AVX1OR2: ## %bb.0:
	; X64-AVX-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; X64-AVX-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX-NEXT: vmovd %xmm0, %eax			; X64-AVX1OR2-NEXT: vmovd %xmm0, %eax
	; X64-AVX-NEXT: xorl $32767, %eax ## imm = 0x7FFF			; X64-AVX1OR2-NEXT: xorl $32767, %eax ## imm = 0x7FFF
	; X64-AVX-NEXT: ## kill: def $ax killed $ax killed $eax			; X64-AVX1OR2-NEXT: ## kill: def $ax killed $ax killed $eax
	; X64-AVX-NEXT: vzeroupper			; X64-AVX1OR2-NEXT: vzeroupper
	; X64-AVX-NEXT: retq			; X64-AVX1OR2-NEXT: retq
				;
				; X64-AVX512-LABEL: test_reduce_v16i16_v8i16:
				; X64-AVX512: ## %bb.0:
				; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
				; X64-AVX512-NEXT: vmovd %xmm0, %eax
				; X64-AVX512-NEXT: xorl $32767, %eax ## imm = 0x7FFF
				; X64-AVX512-NEXT: ## kill: def $ax killed $ax killed $eax
				; X64-AVX512-NEXT: vzeroupper
				; X64-AVX512-NEXT: retq
	%1 = shufflevector <16 x i16> %a0, <16 x i16> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <16 x i16> %a0, <16 x i16> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%2 = icmp sgt <16 x i16> %a0, %1			%2 = icmp sgt <16 x i16> %a0, %1
	%3 = select <16 x i1> %2, <16 x i16> %a0, <16 x i16> %1			%3 = select <16 x i1> %2, <16 x i16> %a0, <16 x i16> %1
	%4 = shufflevector <16 x i16> %3, <16 x i16> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%4 = shufflevector <16 x i16> %3, <16 x i16> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%5 = icmp sgt <16 x i16> %3, %4			%5 = icmp sgt <16 x i16> %3, %4
	%6 = select <16 x i1> %5, <16 x i16> %3, <16 x i16> %4			%6 = select <16 x i1> %5, <16 x i16> %3, <16 x i16> %4
	%7 = shufflevector <16 x i16> %6, <16 x i16> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%7 = shufflevector <16 x i16> %6, <16 x i16> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%8 = icmp sgt <16 x i16> %6, %7			%8 = icmp sgt <16 x i16> %6, %7
	... (52 lines not shown) ...
	; X64-SSE42: ## %bb.0:			; X64-SSE42: ## %bb.0:
	; X64-SSE42-NEXT: pxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0			; X64-SSE42-NEXT: pxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; X64-SSE42-NEXT: phminposuw %xmm0, %xmm0			; X64-SSE42-NEXT: phminposuw %xmm0, %xmm0
	; X64-SSE42-NEXT: movd %xmm0, %eax			; X64-SSE42-NEXT: movd %xmm0, %eax
	; X64-SSE42-NEXT: xorl $32767, %eax ## imm = 0x7FFF			; X64-SSE42-NEXT: xorl $32767, %eax ## imm = 0x7FFF
	; X64-SSE42-NEXT: ## kill: def $ax killed $ax killed $eax			; X64-SSE42-NEXT: ## kill: def $ax killed $ax killed $eax
	; X64-SSE42-NEXT: retq			; X64-SSE42-NEXT: retq
	;			;
	; X64-AVX-LABEL: test_reduce_v32i16_v8i16:			; X64-AVX1OR2-LABEL: test_reduce_v32i16_v8i16:
	; X64-AVX: ## %bb.0:			; X64-AVX1OR2: ## %bb.0:
	; X64-AVX-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; X64-AVX-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX-NEXT: vmovd %xmm0, %eax			; X64-AVX1OR2-NEXT: vmovd %xmm0, %eax
	; X64-AVX-NEXT: xorl $32767, %eax ## imm = 0x7FFF			; X64-AVX1OR2-NEXT: xorl $32767, %eax ## imm = 0x7FFF
	; X64-AVX-NEXT: ## kill: def $ax killed $ax killed $eax			; X64-AVX1OR2-NEXT: ## kill: def $ax killed $ax killed $eax
	; X64-AVX-NEXT: vzeroupper			; X64-AVX1OR2-NEXT: vzeroupper
	; X64-AVX-NEXT: retq			; X64-AVX1OR2-NEXT: retq
				;
				; X64-AVX512-LABEL: test_reduce_v32i16_v8i16:
				; X64-AVX512: ## %bb.0:
				; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
				; X64-AVX512-NEXT: vmovd %xmm0, %eax
				; X64-AVX512-NEXT: xorl $32767, %eax ## imm = 0x7FFF
				; X64-AVX512-NEXT: ## kill: def $ax killed $ax killed $eax
				; X64-AVX512-NEXT: vzeroupper
				; X64-AVX512-NEXT: retq
	%1 = shufflevector <32 x i16> %a0, <32 x i16> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <32 x i16> %a0, <32 x i16> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%2 = icmp sgt <32 x i16> %a0, %1			%2 = icmp sgt <32 x i16> %a0, %1
	%3 = select <32 x i1> %2, <32 x i16> %a0, <32 x i16> %1			%3 = select <32 x i1> %2, <32 x i16> %a0, <32 x i16> %1
	%4 = shufflevector <32 x i16> %3, <32 x i16> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%4 = shufflevector <32 x i16> %3, <32 x i16> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%5 = icmp sgt <32 x i16> %3, %4			%5 = icmp sgt <32 x i16> %3, %4
	%6 = select <32 x i1> %5, <32 x i16> %3, <32 x i16> %4			%6 = select <32 x i1> %5, <32 x i16> %3, <32 x i16> %4
	%7 = shufflevector <32 x i16> %6, <32 x i16> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%7 = shufflevector <32 x i16> %6, <32 x i16> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%8 = icmp sgt <32 x i16> %6, %7			%8 = icmp sgt <32 x i16> %6, %7
	[98 lines not shown]
	; X64-SSE42-NEXT: psrlw $8, %xmm1			; X64-SSE42-NEXT: psrlw $8, %xmm1
	; X64-SSE42-NEXT: pminub %xmm0, %xmm1			; X64-SSE42-NEXT: pminub %xmm0, %xmm1
	; X64-SSE42-NEXT: phminposuw %xmm1, %xmm0			; X64-SSE42-NEXT: phminposuw %xmm1, %xmm0
	; X64-SSE42-NEXT: movd %xmm0, %eax			; X64-SSE42-NEXT: movd %xmm0, %eax
	; X64-SSE42-NEXT: xorb $127, %al			; X64-SSE42-NEXT: xorb $127, %al
	; X64-SSE42-NEXT: ## kill: def $al killed $al killed $eax			; X64-SSE42-NEXT: ## kill: def $al killed $al killed $eax
	; X64-SSE42-NEXT: retq			; X64-SSE42-NEXT: retq
	;			;
	; X64-AVX-LABEL: test_reduce_v32i8_v16i8:			; X64-AVX1OR2-LABEL: test_reduce_v32i8_v16i8:
	; X64-AVX: ## %bb.0:			; X64-AVX1OR2: ## %bb.0:
	; X64-AVX-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; X64-AVX-NEXT: vpsrlw $8, %xmm0, %xmm1			; X64-AVX1OR2-NEXT: vpsrlw $8, %xmm0, %xmm1
	; X64-AVX-NEXT: vpminub %xmm1, %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; X64-AVX-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX-NEXT: vmovd %xmm0, %eax			; X64-AVX1OR2-NEXT: vmovd %xmm0, %eax
	; X64-AVX-NEXT: xorb $127, %al			; X64-AVX1OR2-NEXT: xorb $127, %al
	; X64-AVX-NEXT: ## kill: def $al killed $al killed $eax			; X64-AVX1OR2-NEXT: ## kill: def $al killed $al killed $eax
	; X64-AVX-NEXT: vzeroupper			; X64-AVX1OR2-NEXT: vzeroupper
	; X64-AVX-NEXT: retq			; X64-AVX1OR2-NEXT: retq
				;
				; X64-AVX512-LABEL: test_reduce_v32i8_v16i8:
				; X64-AVX512: ## %bb.0:
				; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; X64-AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1
				; X64-AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0
				; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
				; X64-AVX512-NEXT: vmovd %xmm0, %eax
				; X64-AVX512-NEXT: xorb $127, %al
				; X64-AVX512-NEXT: ## kill: def $al killed $al killed $eax
				; X64-AVX512-NEXT: vzeroupper
				; X64-AVX512-NEXT: retq
	%1 = shufflevector <32 x i8> %a0, <32 x i8> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <32 x i8> %a0, <32 x i8> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%2 = icmp sgt <32 x i8> %a0, %1			%2 = icmp sgt <32 x i8> %a0, %1
	%3 = select <32 x i1> %2, <32 x i8> %a0, <32 x i8> %1			%3 = select <32 x i1> %2, <32 x i8> %a0, <32 x i8> %1
	%4 = shufflevector <32 x i8> %3, <32 x i8> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%4 = shufflevector <32 x i8> %3, <32 x i8> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%5 = icmp sgt <32 x i8> %3, %4			%5 = icmp sgt <32 x i8> %3, %4
	%6 = select <32 x i1> %5, <32 x i8> %3, <32 x i8> %4			%6 = select <32 x i1> %5, <32 x i8> %3, <32 x i8> %4
	%7 = shufflevector <32 x i8> %6, <32 x i8> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%7 = shufflevector <32 x i8> %6, <32 x i8> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%8 = icmp sgt <32 x i8> %6, %7			%8 = icmp sgt <32 x i8> %6, %7
	[101 lines not shown]
	; X64-SSE42-NEXT: psrlw $8, %xmm1			; X64-SSE42-NEXT: psrlw $8, %xmm1
	; X64-SSE42-NEXT: pminub %xmm0, %xmm1			; X64-SSE42-NEXT: pminub %xmm0, %xmm1
	; X64-SSE42-NEXT: phminposuw %xmm1, %xmm0			; X64-SSE42-NEXT: phminposuw %xmm1, %xmm0
	; X64-SSE42-NEXT: movd %xmm0, %eax			; X64-SSE42-NEXT: movd %xmm0, %eax
	; X64-SSE42-NEXT: xorb $127, %al			; X64-SSE42-NEXT: xorb $127, %al
	; X64-SSE42-NEXT: ## kill: def $al killed $al killed $eax			; X64-SSE42-NEXT: ## kill: def $al killed $al killed $eax
	; X64-SSE42-NEXT: retq			; X64-SSE42-NEXT: retq
	;			;
	; X64-AVX-LABEL: test_reduce_v64i8_v16i8:			; X64-AVX1OR2-LABEL: test_reduce_v64i8_v16i8:
	; X64-AVX: ## %bb.0:			; X64-AVX1OR2: ## %bb.0:
	; X64-AVX-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; X64-AVX-NEXT: vpsrlw $8, %xmm0, %xmm1			; X64-AVX1OR2-NEXT: vpsrlw $8, %xmm0, %xmm1
	; X64-AVX-NEXT: vpminub %xmm1, %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; X64-AVX-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX-NEXT: vmovd %xmm0, %eax			; X64-AVX1OR2-NEXT: vmovd %xmm0, %eax
	; X64-AVX-NEXT: xorb $127, %al			; X64-AVX1OR2-NEXT: xorb $127, %al
	; X64-AVX-NEXT: ## kill: def $al killed $al killed $eax			; X64-AVX1OR2-NEXT: ## kill: def $al killed $al killed $eax
	; X64-AVX-NEXT: vzeroupper			; X64-AVX1OR2-NEXT: vzeroupper
	; X64-AVX-NEXT: retq			; X64-AVX1OR2-NEXT: retq
				;
				; X64-AVX512-LABEL: test_reduce_v64i8_v16i8:
				; X64-AVX512: ## %bb.0:
				; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; X64-AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1
				; X64-AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0
				; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
				; X64-AVX512-NEXT: vmovd %xmm0, %eax
				; X64-AVX512-NEXT: xorb $127, %al
				; X64-AVX512-NEXT: ## kill: def $al killed $al killed $eax
				; X64-AVX512-NEXT: vzeroupper
				; X64-AVX512-NEXT: retq
	%1 = shufflevector <64 x i8> %a0, <64 x i8> undef, <64 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <64 x i8> %a0, <64 x i8> undef, <64 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%2 = icmp sgt <64 x i8> %a0, %1			%2 = icmp sgt <64 x i8> %a0, %1
	%3 = select <64 x i1> %2, <64 x i8> %a0, <64 x i8> %1			%3 = select <64 x i1> %2, <64 x i8> %a0, <64 x i8> %1
	%4 = shufflevector <64 x i8> %3, <64 x i8> undef, <64 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%4 = shufflevector <64 x i8> %3, <64 x i8> undef, <64 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%5 = icmp sgt <64 x i8> %3, %4			%5 = icmp sgt <64 x i8> %3, %4
	%6 = select <64 x i1> %5, <64 x i8> %3, <64 x i8> %4			%6 = select <64 x i1> %5, <64 x i8> %3, <64 x i8> %4
	%7 = shufflevector <64 x i8> %6, <64 x i8> undef, <64 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%7 = shufflevector <64 x i8> %6, <64 x i8> undef, <64 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%8 = icmp sgt <64 x i8> %6, %7			%8 = icmp sgt <64 x i8> %6, %7
	%9 = select <64 x i1> %8, <64 x i8> %6, <64 x i8> %7			%9 = select <64 x i1> %8, <64 x i8> %6, <64 x i8> %7
	%10 = shufflevector <64 x i8> %9, <64 x i8> undef, <64 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%10 = shufflevector <64 x i8> %9, <64 x i8> undef, <64 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%11 = icmp sgt <64 x i8> %9, %10			%11 = icmp sgt <64 x i8> %9, %10
	%12 = select <64 x i1> %11, <64 x i8> %9, <64 x i8> %10			%12 = select <64 x i1> %11, <64 x i8> %9, <64 x i8> %10
	%13 = extractelement <64 x i8> %12, i32 0			%13 = extractelement <64 x i8> %12, i32 0
	ret i8 %13			ret i8 %13
	}			}

llvm/test/CodeGen/X86/horizontal-reduce-smin.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+sse2 | FileCheck %s --check-prefix=X86-SSE2			; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+sse2 | FileCheck %s --check-prefix=X86-SSE2
	; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+sse4.2 | FileCheck %s --check-prefix=X86-SSE42			; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+sse4.2 | FileCheck %s --check-prefix=X86-SSE42
	; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+avx | FileCheck %s --check-prefixes=X86-AVX,X86-AVX1			; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+avx | FileCheck %s --check-prefixes=X86-AVX,X86-AVX1
	; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+avx2 | FileCheck %s --check-prefixes=X86-AVX,X86-AVX2			; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+avx2 | FileCheck %s --check-prefixes=X86-AVX,X86-AVX2
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+sse2 | FileCheck %s --check-prefix=X64-SSE2			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+sse2 | FileCheck %s --check-prefix=X64-SSE2
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+sse4.2 | FileCheck %s --check-prefix=X64-SSE42			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+sse4.2 | FileCheck %s --check-prefix=X64-SSE42
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx | FileCheck %s --check-prefixes=X64-AVX,X64-AVX1			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx | FileCheck %s --check-prefixes=X64-AVX,X64-AVX1OR2,X64-AVX1
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx2 | FileCheck %s --check-prefixes=X64-AVX,X64-AVX2			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx2 | FileCheck %s --check-prefixes=X64-AVX,X64-AVX1OR2,X64-AVX2
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx512f,+avx512bw,+avx512dq,+avx512vl | FileCheck %s --check-prefixes=X64-AVX,X64-AVX512			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx512f,+avx512bw,+avx512dq,+avx512vl | FileCheck %s --check-prefixes=X64-AVX,X64-AVX512

	;			;
	; 128-bit Vectors			; 128-bit Vectors
	;			;

	define i64 @test_reduce_v2i64(<2 x i64> %a0) {			define i64 @test_reduce_v2i64(<2 x i64> %a0) {
	; X86-SSE2-LABEL: test_reduce_v2i64:			; X86-SSE2-LABEL: test_reduce_v2i64:
	[65 lines not shown]
	; X64-SSE42-NEXT: movdqa %xmm0, %xmm1			; X64-SSE42-NEXT: movdqa %xmm0, %xmm1
	; X64-SSE42-NEXT: pshufd {{.*#+}} xmm2 = xmm0[2,3,2,3]			; X64-SSE42-NEXT: pshufd {{.*#+}} xmm2 = xmm0[2,3,2,3]
	; X64-SSE42-NEXT: movdqa %xmm2, %xmm0			; X64-SSE42-NEXT: movdqa %xmm2, %xmm0
	; X64-SSE42-NEXT: pcmpgtq %xmm1, %xmm0			; X64-SSE42-NEXT: pcmpgtq %xmm1, %xmm0
	; X64-SSE42-NEXT: blendvpd %xmm0, %xmm1, %xmm2			; X64-SSE42-NEXT: blendvpd %xmm0, %xmm1, %xmm2
	; X64-SSE42-NEXT: movq %xmm2, %rax			; X64-SSE42-NEXT: movq %xmm2, %rax
	; X64-SSE42-NEXT: retq			; X64-SSE42-NEXT: retq
	;			;
	; X64-AVX1-LABEL: test_reduce_v2i64:			; X64-AVX1OR2-LABEL: test_reduce_v2i64:
	; X64-AVX1: ## %bb.0:			; X64-AVX1OR2: ## %bb.0:
	; X64-AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]			; X64-AVX1OR2-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
	; X64-AVX1-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm2			; X64-AVX1OR2-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm2
	; X64-AVX1-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0			; X64-AVX1OR2-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
	; X64-AVX1-NEXT: vmovq %xmm0, %rax			; X64-AVX1OR2-NEXT: vmovq %xmm0, %rax
	; X64-AVX1-NEXT: retq			; X64-AVX1OR2-NEXT: retq
	;
	; X64-AVX2-LABEL: test_reduce_v2i64:
	; X64-AVX2: ## %bb.0:
	; X64-AVX2-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
	; X64-AVX2-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm2
	; X64-AVX2-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
	; X64-AVX2-NEXT: vmovq %xmm0, %rax
	; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512-LABEL: test_reduce_v2i64:			; X64-AVX512-LABEL: test_reduce_v2i64:
	; X64-AVX512: ## %bb.0:			; X64-AVX512: ## %bb.0:
	; X64-AVX512-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]			; X64-AVX512-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
	; X64-AVX512-NEXT: vpminsq %xmm1, %xmm0, %xmm0			; X64-AVX512-NEXT: vpminsq %xmm1, %xmm0, %xmm0
	; X64-AVX512-NEXT: vmovq %xmm0, %rax			; X64-AVX512-NEXT: vmovq %xmm0, %rax
	; X64-AVX512-NEXT: retq			; X64-AVX512-NEXT: retq
	%1 = shufflevector <2 x i64> %a0, <2 x i64> undef, <2 x i32> <i32 1, i32 undef>			%1 = shufflevector <2 x i64> %a0, <2 x i64> undef, <2 x i32> <i32 1, i32 undef>
	[132 lines not shown]
	; X64-SSE42: ## %bb.0:			; X64-SSE42: ## %bb.0:
	; X64-SSE42-NEXT: pxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0			; X64-SSE42-NEXT: pxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; X64-SSE42-NEXT: phminposuw %xmm0, %xmm0			; X64-SSE42-NEXT: phminposuw %xmm0, %xmm0
	; X64-SSE42-NEXT: movd %xmm0, %eax			; X64-SSE42-NEXT: movd %xmm0, %eax
	; X64-SSE42-NEXT: xorl $32768, %eax ## imm = 0x8000			; X64-SSE42-NEXT: xorl $32768, %eax ## imm = 0x8000
	; X64-SSE42-NEXT: ## kill: def $ax killed $ax killed $eax			; X64-SSE42-NEXT: ## kill: def $ax killed $ax killed $eax
	; X64-SSE42-NEXT: retq			; X64-SSE42-NEXT: retq
	;			;
	; X64-AVX-LABEL: test_reduce_v8i16:			; X64-AVX1OR2-LABEL: test_reduce_v8i16:
	; X64-AVX: ## %bb.0:			; X64-AVX1OR2: ## %bb.0:
	; X64-AVX-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; X64-AVX-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX-NEXT: vmovd %xmm0, %eax			; X64-AVX1OR2-NEXT: vmovd %xmm0, %eax
	; X64-AVX-NEXT: xorl $32768, %eax ## imm = 0x8000			; X64-AVX1OR2-NEXT: xorl $32768, %eax ## imm = 0x8000
	; X64-AVX-NEXT: ## kill: def $ax killed $ax killed $eax			; X64-AVX1OR2-NEXT: ## kill: def $ax killed $ax killed $eax
	; X64-AVX-NEXT: retq			; X64-AVX1OR2-NEXT: retq
				;
				; X64-AVX512-LABEL: test_reduce_v8i16:
				; X64-AVX512: ## %bb.0:
				; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
				; X64-AVX512-NEXT: vmovd %xmm0, %eax
				; X64-AVX512-NEXT: xorl $32768, %eax ## imm = 0x8000
				; X64-AVX512-NEXT: ## kill: def $ax killed $ax killed $eax
				; X64-AVX512-NEXT: retq
	%1 = shufflevector <8 x i16> %a0, <8 x i16> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <8 x i16> %a0, <8 x i16> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	%2 = icmp slt <8 x i16> %a0, %1			%2 = icmp slt <8 x i16> %a0, %1
	%3 = select <8 x i1> %2, <8 x i16> %a0, <8 x i16> %1			%3 = select <8 x i1> %2, <8 x i16> %a0, <8 x i16> %1
	%4 = shufflevector <8 x i16> %3, <8 x i16> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%4 = shufflevector <8 x i16> %3, <8 x i16> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%5 = icmp slt <8 x i16> %3, %4			%5 = icmp slt <8 x i16> %3, %4
	%6 = select <8 x i1> %5, <8 x i16> %3, <8 x i16> %4			%6 = select <8 x i1> %5, <8 x i16> %3, <8 x i16> %4
	%7 = shufflevector <8 x i16> %6, <8 x i16> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%7 = shufflevector <8 x i16> %6, <8 x i16> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%8 = icmp slt <8 x i16> %6, %7			%8 = icmp slt <8 x i16> %6, %7
	[97 lines not shown]
	; X64-SSE42-NEXT: psrlw $8, %xmm1			; X64-SSE42-NEXT: psrlw $8, %xmm1
	; X64-SSE42-NEXT: pminub %xmm0, %xmm1			; X64-SSE42-NEXT: pminub %xmm0, %xmm1
	; X64-SSE42-NEXT: phminposuw %xmm1, %xmm0			; X64-SSE42-NEXT: phminposuw %xmm1, %xmm0
	; X64-SSE42-NEXT: movd %xmm0, %eax			; X64-SSE42-NEXT: movd %xmm0, %eax
	; X64-SSE42-NEXT: addb $-128, %al			; X64-SSE42-NEXT: addb $-128, %al
	; X64-SSE42-NEXT: ## kill: def $al killed $al killed $eax			; X64-SSE42-NEXT: ## kill: def $al killed $al killed $eax
	; X64-SSE42-NEXT: retq			; X64-SSE42-NEXT: retq
	;			;
	; X64-AVX-LABEL: test_reduce_v16i8:			; X64-AVX1OR2-LABEL: test_reduce_v16i8:
	; X64-AVX: ## %bb.0:			; X64-AVX1OR2: ## %bb.0:
	; X64-AVX-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; X64-AVX-NEXT: vpsrlw $8, %xmm0, %xmm1			; X64-AVX1OR2-NEXT: vpsrlw $8, %xmm0, %xmm1
	; X64-AVX-NEXT: vpminub %xmm1, %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; X64-AVX-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX-NEXT: vmovd %xmm0, %eax			; X64-AVX1OR2-NEXT: vmovd %xmm0, %eax
	; X64-AVX-NEXT: addb $-128, %al			; X64-AVX1OR2-NEXT: addb $-128, %al
	; X64-AVX-NEXT: ## kill: def $al killed $al killed $eax			; X64-AVX1OR2-NEXT: ## kill: def $al killed $al killed $eax
	; X64-AVX-NEXT: retq			; X64-AVX1OR2-NEXT: retq
				;
				; X64-AVX512-LABEL: test_reduce_v16i8:
				; X64-AVX512: ## %bb.0:
				; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; X64-AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1
				; X64-AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0
				; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
				; X64-AVX512-NEXT: vmovd %xmm0, %eax
				; X64-AVX512-NEXT: addb $-128, %al
				; X64-AVX512-NEXT: ## kill: def $al killed $al killed $eax
				; X64-AVX512-NEXT: retq
	%1 = shufflevector <16 x i8> %a0, <16 x i8> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <16 x i8> %a0, <16 x i8> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%2 = icmp slt <16 x i8> %a0, %1			%2 = icmp slt <16 x i8> %a0, %1
	%3 = select <16 x i1> %2, <16 x i8> %a0, <16 x i8> %1			%3 = select <16 x i1> %2, <16 x i8> %a0, <16 x i8> %1
	%4 = shufflevector <16 x i8> %3, <16 x i8> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%4 = shufflevector <16 x i8> %3, <16 x i8> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%5 = icmp slt <16 x i8> %3, %4			%5 = icmp slt <16 x i8> %3, %4
	%6 = select <16 x i1> %5, <16 x i8> %3, <16 x i8> %4			%6 = select <16 x i1> %5, <16 x i8> %3, <16 x i8> %4
	%7 = shufflevector <16 x i8> %6, <16 x i8> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%7 = shufflevector <16 x i8> %6, <16 x i8> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%8 = icmp slt <16 x i8> %6, %7			%8 = icmp slt <16 x i8> %6, %7
	[413 lines not shown]
	; X64-AVX2-NEXT: ## kill: def $ax killed $ax killed $eax			; X64-AVX2-NEXT: ## kill: def $ax killed $ax killed $eax
	; X64-AVX2-NEXT: vzeroupper			; X64-AVX2-NEXT: vzeroupper
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512-LABEL: test_reduce_v16i16:			; X64-AVX512-LABEL: test_reduce_v16i16:
	; X64-AVX512: ## %bb.0:			; X64-AVX512: ## %bb.0:
	; X64-AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; X64-AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1
	; X64-AVX512-NEXT: vpminsw %xmm1, %xmm0, %xmm0			; X64-AVX512-NEXT: vpminsw %xmm1, %xmm0, %xmm0
	; X64-AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX512-NEXT: vmovd %xmm0, %eax			; X64-AVX512-NEXT: vmovd %xmm0, %eax
	; X64-AVX512-NEXT: xorl $32768, %eax ## imm = 0x8000			; X64-AVX512-NEXT: xorl $32768, %eax ## imm = 0x8000
	; X64-AVX512-NEXT: ## kill: def $ax killed $ax killed $eax			; X64-AVX512-NEXT: ## kill: def $ax killed $ax killed $eax
	; X64-AVX512-NEXT: vzeroupper			; X64-AVX512-NEXT: vzeroupper
	; X64-AVX512-NEXT: retq			; X64-AVX512-NEXT: retq
	%1 = shufflevector <16 x i16> %a0, <16 x i16> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <16 x i16> %a0, <16 x i16> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%2 = icmp slt <16 x i16> %a0, %1			%2 = icmp slt <16 x i16> %a0, %1
	[167 lines not shown]
	; X64-AVX2-NEXT: ## kill: def $al killed $al killed $eax			; X64-AVX2-NEXT: ## kill: def $al killed $al killed $eax
	; X64-AVX2-NEXT: vzeroupper			; X64-AVX2-NEXT: vzeroupper
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512-LABEL: test_reduce_v32i8:			; X64-AVX512-LABEL: test_reduce_v32i8:
	; X64-AVX512: ## %bb.0:			; X64-AVX512: ## %bb.0:
	; X64-AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; X64-AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1
	; X64-AVX512-NEXT: vpminsb %xmm1, %xmm0, %xmm0			; X64-AVX512-NEXT: vpminsb %xmm1, %xmm0, %xmm0
	; X64-AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; X64-AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1			; X64-AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1
	; X64-AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0			; X64-AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX512-NEXT: vmovd %xmm0, %eax			; X64-AVX512-NEXT: vmovd %xmm0, %eax
	; X64-AVX512-NEXT: addb $-128, %al			; X64-AVX512-NEXT: addb $-128, %al
	; X64-AVX512-NEXT: ## kill: def $al killed $al killed $eax			; X64-AVX512-NEXT: ## kill: def $al killed $al killed $eax
	; X64-AVX512-NEXT: vzeroupper			; X64-AVX512-NEXT: vzeroupper
	; X64-AVX512-NEXT: retq			; X64-AVX512-NEXT: retq
	[570 lines not shown]
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512-LABEL: test_reduce_v32i16:			; X64-AVX512-LABEL: test_reduce_v32i16:
	; X64-AVX512: ## %bb.0:			; X64-AVX512: ## %bb.0:
	; X64-AVX512-NEXT: vextracti64x4 $1, %zmm0, %ymm1			; X64-AVX512-NEXT: vextracti64x4 $1, %zmm0, %ymm1
	; X64-AVX512-NEXT: vpminsw %ymm1, %ymm0, %ymm0			; X64-AVX512-NEXT: vpminsw %ymm1, %ymm0, %ymm0
	; X64-AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; X64-AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1
	; X64-AVX512-NEXT: vpminsw %xmm1, %xmm0, %xmm0			; X64-AVX512-NEXT: vpminsw %xmm1, %xmm0, %xmm0
	; X64-AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX512-NEXT: vmovd %xmm0, %eax			; X64-AVX512-NEXT: vmovd %xmm0, %eax
	; X64-AVX512-NEXT: xorl $32768, %eax ## imm = 0x8000			; X64-AVX512-NEXT: xorl $32768, %eax ## imm = 0x8000
	; X64-AVX512-NEXT: ## kill: def $ax killed $ax killed $eax			; X64-AVX512-NEXT: ## kill: def $ax killed $ax killed $eax
	; X64-AVX512-NEXT: vzeroupper			; X64-AVX512-NEXT: vzeroupper
	; X64-AVX512-NEXT: retq			; X64-AVX512-NEXT: retq
	%1 = shufflevector <32 x i16> %a0, <32 x i16> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <32 x i16> %a0, <32 x i16> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%2 = icmp slt <32 x i16> %a0, %1			%2 = icmp slt <32 x i16> %a0, %1
	[204 lines not shown]
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512-LABEL: test_reduce_v64i8:			; X64-AVX512-LABEL: test_reduce_v64i8:
	; X64-AVX512: ## %bb.0:			; X64-AVX512: ## %bb.0:
	; X64-AVX512-NEXT: vextracti64x4 $1, %zmm0, %ymm1			; X64-AVX512-NEXT: vextracti64x4 $1, %zmm0, %ymm1
	; X64-AVX512-NEXT: vpminsb %ymm1, %ymm0, %ymm0			; X64-AVX512-NEXT: vpminsb %ymm1, %ymm0, %ymm0
	; X64-AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; X64-AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1
	; X64-AVX512-NEXT: vpminsb %xmm1, %xmm0, %xmm0			; X64-AVX512-NEXT: vpminsb %xmm1, %xmm0, %xmm0
	; X64-AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; X64-AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1			; X64-AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1
	; X64-AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0			; X64-AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX512-NEXT: vmovd %xmm0, %eax			; X64-AVX512-NEXT: vmovd %xmm0, %eax
	; X64-AVX512-NEXT: addb $-128, %al			; X64-AVX512-NEXT: addb $-128, %al
	; X64-AVX512-NEXT: ## kill: def $al killed $al killed $eax			; X64-AVX512-NEXT: ## kill: def $al killed $al killed $eax
	; X64-AVX512-NEXT: vzeroupper			; X64-AVX512-NEXT: vzeroupper
	; X64-AVX512-NEXT: retq			; X64-AVX512-NEXT: retq
	; X64-SSE42: ## %bb.0:			; X64-SSE42: ## %bb.0:
	; X64-SSE42-NEXT: pxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0			; X64-SSE42-NEXT: pxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; X64-SSE42-NEXT: phminposuw %xmm0, %xmm0			; X64-SSE42-NEXT: phminposuw %xmm0, %xmm0
	; X64-SSE42-NEXT: movd %xmm0, %eax			; X64-SSE42-NEXT: movd %xmm0, %eax
	; X64-SSE42-NEXT: xorl $32768, %eax ## imm = 0x8000			; X64-SSE42-NEXT: xorl $32768, %eax ## imm = 0x8000
	; X64-SSE42-NEXT: ## kill: def $ax killed $ax killed $eax			; X64-SSE42-NEXT: ## kill: def $ax killed $ax killed $eax
	; X64-SSE42-NEXT: retq			; X64-SSE42-NEXT: retq
	;			;
	; X64-AVX-LABEL: test_reduce_v16i16_v8i16:			; X64-AVX1OR2-LABEL: test_reduce_v16i16_v8i16:
	; X64-AVX: ## %bb.0:			; X64-AVX1OR2: ## %bb.0:
	; X64-AVX-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; X64-AVX-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX-NEXT: vmovd %xmm0, %eax			; X64-AVX1OR2-NEXT: vmovd %xmm0, %eax
	; X64-AVX-NEXT: xorl $32768, %eax ## imm = 0x8000			; X64-AVX1OR2-NEXT: xorl $32768, %eax ## imm = 0x8000
	; X64-AVX-NEXT: ## kill: def $ax killed $ax killed $eax			; X64-AVX1OR2-NEXT: ## kill: def $ax killed $ax killed $eax
	; X64-AVX-NEXT: vzeroupper			; X64-AVX1OR2-NEXT: vzeroupper
	; X64-AVX-NEXT: retq			; X64-AVX1OR2-NEXT: retq
				;
				; X64-AVX512-LABEL: test_reduce_v16i16_v8i16:
				; X64-AVX512: ## %bb.0:
				; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
				; X64-AVX512-NEXT: vmovd %xmm0, %eax
				; X64-AVX512-NEXT: xorl $32768, %eax ## imm = 0x8000
				; X64-AVX512-NEXT: ## kill: def $ax killed $ax killed $eax
				; X64-AVX512-NEXT: vzeroupper
				; X64-AVX512-NEXT: retq
	%1 = shufflevector <16 x i16> %a0, <16 x i16> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <16 x i16> %a0, <16 x i16> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%2 = icmp slt <16 x i16> %a0, %1			%2 = icmp slt <16 x i16> %a0, %1
	%3 = select <16 x i1> %2, <16 x i16> %a0, <16 x i16> %1			%3 = select <16 x i1> %2, <16 x i16> %a0, <16 x i16> %1
	%4 = shufflevector <16 x i16> %3, <16 x i16> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%4 = shufflevector <16 x i16> %3, <16 x i16> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%5 = icmp slt <16 x i16> %3, %4			%5 = icmp slt <16 x i16> %3, %4
	%6 = select <16 x i1> %5, <16 x i16> %3, <16 x i16> %4			%6 = select <16 x i1> %5, <16 x i16> %3, <16 x i16> %4
	%7 = shufflevector <16 x i16> %6, <16 x i16> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%7 = shufflevector <16 x i16> %6, <16 x i16> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%8 = icmp slt <16 x i16> %6, %7			%8 = icmp slt <16 x i16> %6, %7
	; X64-SSE42: ## %bb.0:			; X64-SSE42: ## %bb.0:
	; X64-SSE42-NEXT: pxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0			; X64-SSE42-NEXT: pxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; X64-SSE42-NEXT: phminposuw %xmm0, %xmm0			; X64-SSE42-NEXT: phminposuw %xmm0, %xmm0
	; X64-SSE42-NEXT: movd %xmm0, %eax			; X64-SSE42-NEXT: movd %xmm0, %eax
	; X64-SSE42-NEXT: xorl $32768, %eax ## imm = 0x8000			; X64-SSE42-NEXT: xorl $32768, %eax ## imm = 0x8000
	; X64-SSE42-NEXT: ## kill: def $ax killed $ax killed $eax			; X64-SSE42-NEXT: ## kill: def $ax killed $ax killed $eax
	; X64-SSE42-NEXT: retq			; X64-SSE42-NEXT: retq
	;			;
	; X64-AVX-LABEL: test_reduce_v32i16_v8i16:			; X64-AVX1OR2-LABEL: test_reduce_v32i16_v8i16:
	; X64-AVX: ## %bb.0:			; X64-AVX1OR2: ## %bb.0:
	; X64-AVX-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; X64-AVX-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX-NEXT: vmovd %xmm0, %eax			; X64-AVX1OR2-NEXT: vmovd %xmm0, %eax
	; X64-AVX-NEXT: xorl $32768, %eax ## imm = 0x8000			; X64-AVX1OR2-NEXT: xorl $32768, %eax ## imm = 0x8000
	; X64-AVX-NEXT: ## kill: def $ax killed $ax killed $eax			; X64-AVX1OR2-NEXT: ## kill: def $ax killed $ax killed $eax
	; X64-AVX-NEXT: vzeroupper			; X64-AVX1OR2-NEXT: vzeroupper
	; X64-AVX-NEXT: retq			; X64-AVX1OR2-NEXT: retq
				;
				; X64-AVX512-LABEL: test_reduce_v32i16_v8i16:
				; X64-AVX512: ## %bb.0:
				; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
				; X64-AVX512-NEXT: vmovd %xmm0, %eax
				; X64-AVX512-NEXT: xorl $32768, %eax ## imm = 0x8000
				; X64-AVX512-NEXT: ## kill: def $ax killed $ax killed $eax
				; X64-AVX512-NEXT: vzeroupper
				; X64-AVX512-NEXT: retq
	%1 = shufflevector <32 x i16> %a0, <32 x i16> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <32 x i16> %a0, <32 x i16> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%2 = icmp slt <32 x i16> %a0, %1			%2 = icmp slt <32 x i16> %a0, %1
	%3 = select <32 x i1> %2, <32 x i16> %a0, <32 x i16> %1			%3 = select <32 x i1> %2, <32 x i16> %a0, <32 x i16> %1
	%4 = shufflevector <32 x i16> %3, <32 x i16> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%4 = shufflevector <32 x i16> %3, <32 x i16> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%5 = icmp slt <32 x i16> %3, %4			%5 = icmp slt <32 x i16> %3, %4
	%6 = select <32 x i1> %5, <32 x i16> %3, <32 x i16> %4			%6 = select <32 x i1> %5, <32 x i16> %3, <32 x i16> %4
	%7 = shufflevector <32 x i16> %6, <32 x i16> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%7 = shufflevector <32 x i16> %6, <32 x i16> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%8 = icmp slt <32 x i16> %6, %7			%8 = icmp slt <32 x i16> %6, %7
	; X64-SSE42-NEXT: psrlw $8, %xmm1			; X64-SSE42-NEXT: psrlw $8, %xmm1
	; X64-SSE42-NEXT: pminub %xmm0, %xmm1			; X64-SSE42-NEXT: pminub %xmm0, %xmm1
	; X64-SSE42-NEXT: phminposuw %xmm1, %xmm0			; X64-SSE42-NEXT: phminposuw %xmm1, %xmm0
	; X64-SSE42-NEXT: movd %xmm0, %eax			; X64-SSE42-NEXT: movd %xmm0, %eax
	; X64-SSE42-NEXT: addb $-128, %al			; X64-SSE42-NEXT: addb $-128, %al
	; X64-SSE42-NEXT: ## kill: def $al killed $al killed $eax			; X64-SSE42-NEXT: ## kill: def $al killed $al killed $eax
	; X64-SSE42-NEXT: retq			; X64-SSE42-NEXT: retq
	;			;
	; X64-AVX-LABEL: test_reduce_v32i8_v16i8:			; X64-AVX1OR2-LABEL: test_reduce_v32i8_v16i8:
	; X64-AVX: ## %bb.0:			; X64-AVX1OR2: ## %bb.0:
	; X64-AVX-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; X64-AVX-NEXT: vpsrlw $8, %xmm0, %xmm1			; X64-AVX1OR2-NEXT: vpsrlw $8, %xmm0, %xmm1
	; X64-AVX-NEXT: vpminub %xmm1, %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; X64-AVX-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX-NEXT: vmovd %xmm0, %eax			; X64-AVX1OR2-NEXT: vmovd %xmm0, %eax
	; X64-AVX-NEXT: addb $-128, %al			; X64-AVX1OR2-NEXT: addb $-128, %al
	; X64-AVX-NEXT: ## kill: def $al killed $al killed $eax			; X64-AVX1OR2-NEXT: ## kill: def $al killed $al killed $eax
	; X64-AVX-NEXT: vzeroupper			; X64-AVX1OR2-NEXT: vzeroupper
	; X64-AVX-NEXT: retq			; X64-AVX1OR2-NEXT: retq
				;
				; X64-AVX512-LABEL: test_reduce_v32i8_v16i8:
				; X64-AVX512: ## %bb.0:
				; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; X64-AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1
				; X64-AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0
				; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
				; X64-AVX512-NEXT: vmovd %xmm0, %eax
				; X64-AVX512-NEXT: addb $-128, %al
				; X64-AVX512-NEXT: ## kill: def $al killed $al killed $eax
				; X64-AVX512-NEXT: vzeroupper
				; X64-AVX512-NEXT: retq
	%1 = shufflevector <32 x i8> %a0, <32 x i8> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <32 x i8> %a0, <32 x i8> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%2 = icmp slt <32 x i8> %a0, %1			%2 = icmp slt <32 x i8> %a0, %1
	%3 = select <32 x i1> %2, <32 x i8> %a0, <32 x i8> %1			%3 = select <32 x i1> %2, <32 x i8> %a0, <32 x i8> %1
	%4 = shufflevector <32 x i8> %3, <32 x i8> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%4 = shufflevector <32 x i8> %3, <32 x i8> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%5 = icmp slt <32 x i8> %3, %4			%5 = icmp slt <32 x i8> %3, %4
	%6 = select <32 x i1> %5, <32 x i8> %3, <32 x i8> %4			%6 = select <32 x i1> %5, <32 x i8> %3, <32 x i8> %4
	%7 = shufflevector <32 x i8> %6, <32 x i8> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%7 = shufflevector <32 x i8> %6, <32 x i8> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%8 = icmp slt <32 x i8> %6, %7			%8 = icmp slt <32 x i8> %6, %7
	; X64-SSE42-NEXT: psrlw $8, %xmm1			; X64-SSE42-NEXT: psrlw $8, %xmm1
	; X64-SSE42-NEXT: pminub %xmm0, %xmm1			; X64-SSE42-NEXT: pminub %xmm0, %xmm1
	; X64-SSE42-NEXT: phminposuw %xmm1, %xmm0			; X64-SSE42-NEXT: phminposuw %xmm1, %xmm0
	; X64-SSE42-NEXT: movd %xmm0, %eax			; X64-SSE42-NEXT: movd %xmm0, %eax
	; X64-SSE42-NEXT: addb $-128, %al			; X64-SSE42-NEXT: addb $-128, %al
	; X64-SSE42-NEXT: ## kill: def $al killed $al killed $eax			; X64-SSE42-NEXT: ## kill: def $al killed $al killed $eax
	; X64-SSE42-NEXT: retq			; X64-SSE42-NEXT: retq
	;			;
	; X64-AVX-LABEL: test_reduce_v64i8_v16i8:			; X64-AVX1OR2-LABEL: test_reduce_v64i8_v16i8:
	; X64-AVX: ## %bb.0:			; X64-AVX1OR2: ## %bb.0:
	; X64-AVX-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; X64-AVX-NEXT: vpsrlw $8, %xmm0, %xmm1			; X64-AVX1OR2-NEXT: vpsrlw $8, %xmm0, %xmm1
	; X64-AVX-NEXT: vpminub %xmm1, %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; X64-AVX-NEXT: vphminposuw %xmm0, %xmm0			; X64-AVX1OR2-NEXT: vphminposuw %xmm0, %xmm0
	; X64-AVX-NEXT: vmovd %xmm0, %eax			; X64-AVX1OR2-NEXT: vmovd %xmm0, %eax
	; X64-AVX-NEXT: addb $-128, %al			; X64-AVX1OR2-NEXT: addb $-128, %al
	; X64-AVX-NEXT: ## kill: def $al killed $al killed $eax			; X64-AVX1OR2-NEXT: ## kill: def $al killed $al killed $eax
	; X64-AVX-NEXT: vzeroupper			; X64-AVX1OR2-NEXT: vzeroupper
	; X64-AVX-NEXT: retq			; X64-AVX1OR2-NEXT: retq
				;
				; X64-AVX512-LABEL: test_reduce_v64i8_v16i8:
				; X64-AVX512: ## %bb.0:
				; X64-AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; X64-AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1
				; X64-AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0
				; X64-AVX512-NEXT: vphminposuw %xmm0, %xmm0
				; X64-AVX512-NEXT: vmovd %xmm0, %eax
				; X64-AVX512-NEXT: addb $-128, %al
				; X64-AVX512-NEXT: ## kill: def $al killed $al killed $eax
				; X64-AVX512-NEXT: vzeroupper
				; X64-AVX512-NEXT: retq
	%1 = shufflevector <64 x i8> %a0, <64 x i8> undef, <64 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <64 x i8> %a0, <64 x i8> undef, <64 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%2 = icmp slt <64 x i8> %a0, %1			%2 = icmp slt <64 x i8> %a0, %1
	%3 = select <64 x i1> %2, <64 x i8> %a0, <64 x i8> %1			%3 = select <64 x i1> %2, <64 x i8> %a0, <64 x i8> %1
	%4 = shufflevector <64 x i8> %3, <64 x i8> undef, <64 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%4 = shufflevector <64 x i8> %3, <64 x i8> undef, <64 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%5 = icmp slt <64 x i8> %3, %4			%5 = icmp slt <64 x i8> %3, %4
	%6 = select <64 x i1> %5, <64 x i8> %3, <64 x i8> %4			%6 = select <64 x i1> %5, <64 x i8> %3, <64 x i8> %4
	%7 = shufflevector <64 x i8> %6, <64 x i8> undef, <64 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%7 = shufflevector <64 x i8> %6, <64 x i8> undef, <64 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%8 = icmp slt <64 x i8> %6, %7			%8 = icmp slt <64 x i8> %6, %7
	%9 = select <64 x i1> %8, <64 x i8> %6, <64 x i8> %7			%9 = select <64 x i1> %8, <64 x i8> %6, <64 x i8> %7
	%10 = shufflevector <64 x i8> %9, <64 x i8> undef, <64 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%10 = shufflevector <64 x i8> %9, <64 x i8> undef, <64 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%11 = icmp slt <64 x i8> %9, %10			%11 = icmp slt <64 x i8> %9, %10
	%12 = select <64 x i1> %11, <64 x i8> %9, <64 x i8> %10			%12 = select <64 x i1> %11, <64 x i8> %9, <64 x i8> %10
	%13 = extractelement <64 x i8> %12, i32 0			%13 = extractelement <64 x i8> %12, i32 0
	ret i8 %13			ret i8 %13
	}			}

llvm/test/CodeGen/X86/i64-to-float.ll

	; X86-AVX-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm2			; X86-AVX-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm2
	; X86-AVX-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0			; X86-AVX-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
	; X86-AVX-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,2,2,3]			; X86-AVX-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
	; X86-AVX-NEXT: vcvtdq2pd %xmm0, %xmm0			; X86-AVX-NEXT: vcvtdq2pd %xmm0, %xmm0
	; X86-AVX-NEXT: retl			; X86-AVX-NEXT: retl
	;			;
	; X86-AVX512F-LABEL: clamp_sitofp_2i64_2f64:			; X86-AVX512F-LABEL: clamp_sitofp_2i64_2f64:
	; X86-AVX512F: # %bb.0:			; X86-AVX512F: # %bb.0:
	; X86-AVX512F-NEXT: vpmaxsq {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0			; X86-AVX512F-NEXT: vpmaxsq {{\.?LCPI[0-9]+_[0-9]+}}{1to2}, %xmm0, %xmm0
	; X86-AVX512F-NEXT: vpminsq {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0			; X86-AVX512F-NEXT: vpminsq {{\.?LCPI[0-9]+_[0-9]+}}{1to2}, %xmm0, %xmm0
	; X86-AVX512F-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]			; X86-AVX512F-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
	; X86-AVX512F-NEXT: vcvtdq2pd %xmm0, %xmm0			; X86-AVX512F-NEXT: vcvtdq2pd %xmm0, %xmm0
	; X86-AVX512F-NEXT: retl			; X86-AVX512F-NEXT: retl
	;			;
	; X86-AVX512DQ-LABEL: clamp_sitofp_2i64_2f64:			; X86-AVX512DQ-LABEL: clamp_sitofp_2i64_2f64:
	; X86-AVX512DQ: # %bb.0:			; X86-AVX512DQ: # %bb.0:
	; X86-AVX512DQ-NEXT: vpmaxsq {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0			; X86-AVX512DQ-NEXT: vpmaxsq {{\.?LCPI[0-9]+_[0-9]+}}{1to2}, %xmm0, %xmm0
	; X86-AVX512DQ-NEXT: vpminsq {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0			; X86-AVX512DQ-NEXT: vpminsq {{\.?LCPI[0-9]+_[0-9]+}}{1to2}, %xmm0, %xmm0
	; X86-AVX512DQ-NEXT: vcvtqq2pd %xmm0, %xmm0			; X86-AVX512DQ-NEXT: vcvtqq2pd %xmm0, %xmm0
	; X86-AVX512DQ-NEXT: retl			; X86-AVX512DQ-NEXT: retl
	;			;
	; X64-SSE-LABEL: clamp_sitofp_2i64_2f64:			; X64-SSE-LABEL: clamp_sitofp_2i64_2f64:
	; X64-SSE: # %bb.0:			; X64-SSE: # %bb.0:
	; X64-SSE-NEXT: movdqa {{.*#+}} xmm1 = [2147483648,2147483648]			; X64-SSE-NEXT: movdqa {{.*#+}} xmm1 = [2147483648,2147483648]
	; X64-SSE-NEXT: movdqa %xmm0, %xmm2			; X64-SSE-NEXT: movdqa %xmm0, %xmm2
	; X64-SSE-NEXT: pxor %xmm1, %xmm2			; X64-SSE-NEXT: pxor %xmm1, %xmm2

llvm/test/CodeGen/X86/icmp-pow2-diff.ll

; SSE-NEXT: retq
%cmp2 = icmp ne <8 x i16> %x, <i16 -16385, i16 -257, i16 -33, i16 -8193, i16 -16385, i16 -257, i16 -33, i16 -8193>		%cmp2 = icmp ne <8 x i16> %x, <i16 -16385, i16 -257, i16 -33, i16 -8193, i16 -16385, i16 -257, i16 -33, i16 -8193>
%r = and <8 x i1> %cmp1, %cmp2		%r = and <8 x i1> %cmp1, %cmp2
ret <8 x i1> %r		ret <8 x i1> %r
}		}

define <8 x i1> @andnot_ne_v8i16(<8 x i16> %x) nounwind {		define <8 x i1> @andnot_ne_v8i16(<8 x i16> %x) nounwind {
; AVX512-LABEL: andnot_ne_v8i16:		; AVX512-LABEL: andnot_ne_v8i16:
; AVX512: # %bb.0:		; AVX512: # %bb.0:
; AVX512-NEXT: vpandn {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; AVX512-NEXT: vpandnd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
; AVX512-NEXT: vpxor %xmm1, %xmm1, %xmm1		; AVX512-NEXT: vpxor %xmm1, %xmm1, %xmm1
; AVX512-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0		; AVX512-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
; AVX512-NEXT: vpternlogq $15, %xmm0, %xmm0, %xmm0		; AVX512-NEXT: vpternlogq $15, %xmm0, %xmm0, %xmm0
; AVX512-NEXT: retq		; AVX512-NEXT: retq
;		;
; AVX2-LABEL: andnot_ne_v8i16:		; AVX2-LABEL: andnot_ne_v8i16:
; AVX2: # %bb.0:		; AVX2: # %bb.0:
; AVX2-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1		; AVX2-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1
; SSE-NEXT: retq
%cmp2 = icmp ne <16 x i8> %x, <i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127>		%cmp2 = icmp ne <16 x i8> %x, <i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127>
%r = and <16 x i1> %cmp1, %cmp2		%r = and <16 x i1> %cmp1, %cmp2
ret <16 x i1> %r		ret <16 x i1> %r
}		}

define <16 x i1> @andnot_ne_v16i8(<16 x i8> %x) nounwind {		define <16 x i1> @andnot_ne_v16i8(<16 x i8> %x) nounwind {
; AVX512-LABEL: andnot_ne_v16i8:		; AVX512-LABEL: andnot_ne_v16i8:
; AVX512: # %bb.0:		; AVX512: # %bb.0:
; AVX512-NEXT: vpandn {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; AVX512-NEXT: vpandnd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
; AVX512-NEXT: vpxor %xmm1, %xmm1, %xmm1		; AVX512-NEXT: vpxor %xmm1, %xmm1, %xmm1
; AVX512-NEXT: vpcmpeqb %xmm1, %xmm0, %xmm0		; AVX512-NEXT: vpcmpeqb %xmm1, %xmm0, %xmm0
; AVX512-NEXT: vpternlogq $15, %xmm0, %xmm0, %xmm0		; AVX512-NEXT: vpternlogq $15, %xmm0, %xmm0, %xmm0
; AVX512-NEXT: retq		; AVX512-NEXT: retq
;		;
; AVX2-LABEL: andnot_ne_v16i8:		; AVX2-LABEL: andnot_ne_v16i8:
; AVX2: # %bb.0:		; AVX2: # %bb.0:
; AVX2-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1		; AVX2-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1

llvm/test/CodeGen/X86/midpoint-int-vec-128.ll

	;			;
	; AVX512VL-FALLBACK-LABEL: vec128_i8_signed_reg_reg:			; AVX512VL-FALLBACK-LABEL: vec128_i8_signed_reg_reg:
	; AVX512VL-FALLBACK: # %bb.0:			; AVX512VL-FALLBACK: # %bb.0:
	; AVX512VL-FALLBACK-NEXT: vpminsb %xmm1, %xmm0, %xmm2			; AVX512VL-FALLBACK-NEXT: vpminsb %xmm1, %xmm0, %xmm2
	; AVX512VL-FALLBACK-NEXT: vpmaxsb %xmm1, %xmm0, %xmm3			; AVX512VL-FALLBACK-NEXT: vpmaxsb %xmm1, %xmm0, %xmm3
	; AVX512VL-FALLBACK-NEXT: vpsubb %xmm2, %xmm3, %xmm2			; AVX512VL-FALLBACK-NEXT: vpsubb %xmm2, %xmm3, %xmm2
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %xmm2, %xmm2			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %xmm2, %xmm2
	; AVX512VL-FALLBACK-NEXT: vpcmpgtb %xmm1, %xmm0, %xmm1			; AVX512VL-FALLBACK-NEXT: vpcmpgtb %xmm1, %xmm0, %xmm1
	; AVX512VL-FALLBACK-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm2			; AVX512VL-FALLBACK-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm2
	; AVX512VL-FALLBACK-NEXT: vpsubb %xmm1, %xmm2, %xmm1			; AVX512VL-FALLBACK-NEXT: vpsubb %xmm1, %xmm2, %xmm1
	; AVX512VL-FALLBACK-NEXT: vpaddb %xmm0, %xmm1, %xmm0			; AVX512VL-FALLBACK-NEXT: vpaddb %xmm0, %xmm1, %xmm0
	; AVX512VL-FALLBACK-NEXT: retq			; AVX512VL-FALLBACK-NEXT: retq
	;			;
	; AVX512BW-FALLBACK-LABEL: vec128_i8_signed_reg_reg:			; AVX512BW-FALLBACK-LABEL: vec128_i8_signed_reg_reg:
	; AVX512BW-FALLBACK: # %bb.0:			; AVX512BW-FALLBACK: # %bb.0:
	; AVX512BW-FALLBACK-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1			; AVX512BW-FALLBACK-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
	; AVX512BW-FALLBACK-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; AVX512BW-FALLBACK-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	;			;
	; AVX512VLBW-LABEL: vec128_i8_signed_reg_reg:			; AVX512VLBW-LABEL: vec128_i8_signed_reg_reg:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpcmpgtb %xmm1, %xmm0, %k1			; AVX512VLBW-NEXT: vpcmpgtb %xmm1, %xmm0, %k1
	; AVX512VLBW-NEXT: vpminsb %xmm1, %xmm0, %xmm2			; AVX512VLBW-NEXT: vpminsb %xmm1, %xmm0, %xmm2
	; AVX512VLBW-NEXT: vpmaxsb %xmm1, %xmm0, %xmm1			; AVX512VLBW-NEXT: vpmaxsb %xmm1, %xmm0, %xmm1
	; AVX512VLBW-NEXT: vpsubb %xmm2, %xmm1, %xmm1			; AVX512VLBW-NEXT: vpsubb %xmm2, %xmm1, %xmm1
	; AVX512VLBW-NEXT: vpsrlw $1, %xmm1, %xmm1			; AVX512VLBW-NEXT: vpsrlw $1, %xmm1, %xmm1
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpsubb %xmm1, %xmm2, %xmm1 {%k1}			; AVX512VLBW-NEXT: vpsubb %xmm1, %xmm2, %xmm1 {%k1}
	; AVX512VLBW-NEXT: vpaddb %xmm0, %xmm1, %xmm0			; AVX512VLBW-NEXT: vpaddb %xmm0, %xmm1, %xmm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	%t3 = icmp sgt <16 x i8> %a1, %a2 ; signed			%t3 = icmp sgt <16 x i8> %a1, %a2 ; signed
	%t4 = select <16 x i1> %t3, <16 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <16 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%t4 = select <16 x i1> %t3, <16 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <16 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%t5 = select <16 x i1> %t3, <16 x i8> %a2, <16 x i8> %a1			%t5 = select <16 x i1> %t3, <16 x i8> %a2, <16 x i8> %a1
	%t6 = select <16 x i1> %t3, <16 x i8> %a1, <16 x i8> %a2			%t6 = select <16 x i1> %t3, <16 x i8> %a1, <16 x i8> %a2
	; AVX512VL-FALLBACK-LABEL: vec128_i8_unsigned_reg_reg:			; AVX512VL-FALLBACK-LABEL: vec128_i8_unsigned_reg_reg:
	; AVX512VL-FALLBACK: # %bb.0:			; AVX512VL-FALLBACK: # %bb.0:
	; AVX512VL-FALLBACK-NEXT: vpminub %xmm1, %xmm0, %xmm2			; AVX512VL-FALLBACK-NEXT: vpminub %xmm1, %xmm0, %xmm2
	; AVX512VL-FALLBACK-NEXT: vpmaxub %xmm1, %xmm0, %xmm1			; AVX512VL-FALLBACK-NEXT: vpmaxub %xmm1, %xmm0, %xmm1
	; AVX512VL-FALLBACK-NEXT: vpsubb %xmm2, %xmm1, %xmm1			; AVX512VL-FALLBACK-NEXT: vpsubb %xmm2, %xmm1, %xmm1
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %xmm1, %xmm1			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %xmm1, %xmm1
	; AVX512VL-FALLBACK-NEXT: vpcmpeqb %xmm2, %xmm0, %xmm2			; AVX512VL-FALLBACK-NEXT: vpcmpeqb %xmm2, %xmm0, %xmm2
	; AVX512VL-FALLBACK-NEXT: vpternlogq $15, %xmm2, %xmm2, %xmm2			; AVX512VL-FALLBACK-NEXT: vpternlogq $15, %xmm2, %xmm2, %xmm2
	; AVX512VL-FALLBACK-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm1			; AVX512VL-FALLBACK-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm2, %xmm1
	; AVX512VL-FALLBACK-NEXT: vpsubb %xmm2, %xmm1, %xmm1			; AVX512VL-FALLBACK-NEXT: vpsubb %xmm2, %xmm1, %xmm1
	; AVX512VL-FALLBACK-NEXT: vpaddb %xmm0, %xmm1, %xmm0			; AVX512VL-FALLBACK-NEXT: vpaddb %xmm0, %xmm1, %xmm0
	; AVX512VL-FALLBACK-NEXT: retq			; AVX512VL-FALLBACK-NEXT: retq
	;			;
	; AVX512BW-FALLBACK-LABEL: vec128_i8_unsigned_reg_reg:			; AVX512BW-FALLBACK-LABEL: vec128_i8_unsigned_reg_reg:
	; AVX512BW-FALLBACK: # %bb.0:			; AVX512BW-FALLBACK: # %bb.0:
	; AVX512BW-FALLBACK-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1			; AVX512BW-FALLBACK-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
	; AVX512BW-FALLBACK-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; AVX512BW-FALLBACK-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	[16 lines hidden]
	;			;
	; AVX512VLBW-LABEL: vec128_i8_unsigned_reg_reg:			; AVX512VLBW-LABEL: vec128_i8_unsigned_reg_reg:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpcmpnleub %xmm1, %xmm0, %k1			; AVX512VLBW-NEXT: vpcmpnleub %xmm1, %xmm0, %k1
	; AVX512VLBW-NEXT: vpminub %xmm1, %xmm0, %xmm2			; AVX512VLBW-NEXT: vpminub %xmm1, %xmm0, %xmm2
	; AVX512VLBW-NEXT: vpmaxub %xmm1, %xmm0, %xmm1			; AVX512VLBW-NEXT: vpmaxub %xmm1, %xmm0, %xmm1
	; AVX512VLBW-NEXT: vpsubb %xmm2, %xmm1, %xmm1			; AVX512VLBW-NEXT: vpsubb %xmm2, %xmm1, %xmm1
	; AVX512VLBW-NEXT: vpsrlw $1, %xmm1, %xmm1			; AVX512VLBW-NEXT: vpsrlw $1, %xmm1, %xmm1
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpsubb %xmm1, %xmm2, %xmm1 {%k1}			; AVX512VLBW-NEXT: vpsubb %xmm1, %xmm2, %xmm1 {%k1}
	; AVX512VLBW-NEXT: vpaddb %xmm0, %xmm1, %xmm0			; AVX512VLBW-NEXT: vpaddb %xmm0, %xmm1, %xmm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	%t3 = icmp ugt <16 x i8> %a1, %a2			%t3 = icmp ugt <16 x i8> %a1, %a2
	%t4 = select <16 x i1> %t3, <16 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <16 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%t4 = select <16 x i1> %t3, <16 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <16 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%t5 = select <16 x i1> %t3, <16 x i8> %a2, <16 x i8> %a1			%t5 = select <16 x i1> %t3, <16 x i8> %a2, <16 x i8> %a1
	%t6 = select <16 x i1> %t3, <16 x i8> %a1, <16 x i8> %a2			%t6 = select <16 x i1> %t3, <16 x i8> %a1, <16 x i8> %a2
	[192 lines hidden]
	; AVX512VL-FALLBACK-LABEL: vec128_i8_signed_mem_reg:			; AVX512VL-FALLBACK-LABEL: vec128_i8_signed_mem_reg:
	; AVX512VL-FALLBACK: # %bb.0:			; AVX512VL-FALLBACK: # %bb.0:
	; AVX512VL-FALLBACK-NEXT: vmovdqa (%rdi), %xmm1			; AVX512VL-FALLBACK-NEXT: vmovdqa (%rdi), %xmm1
	; AVX512VL-FALLBACK-NEXT: vpminsb %xmm0, %xmm1, %xmm2			; AVX512VL-FALLBACK-NEXT: vpminsb %xmm0, %xmm1, %xmm2
	; AVX512VL-FALLBACK-NEXT: vpmaxsb %xmm0, %xmm1, %xmm3			; AVX512VL-FALLBACK-NEXT: vpmaxsb %xmm0, %xmm1, %xmm3
	; AVX512VL-FALLBACK-NEXT: vpsubb %xmm2, %xmm3, %xmm2			; AVX512VL-FALLBACK-NEXT: vpsubb %xmm2, %xmm3, %xmm2
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %xmm2, %xmm2			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %xmm2, %xmm2
	; AVX512VL-FALLBACK-NEXT: vpcmpgtb %xmm0, %xmm1, %xmm0			; AVX512VL-FALLBACK-NEXT: vpcmpgtb %xmm0, %xmm1, %xmm0
	; AVX512VL-FALLBACK-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm2			; AVX512VL-FALLBACK-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm2
	; AVX512VL-FALLBACK-NEXT: vpsubb %xmm0, %xmm2, %xmm0			; AVX512VL-FALLBACK-NEXT: vpsubb %xmm0, %xmm2, %xmm0
	; AVX512VL-FALLBACK-NEXT: vpaddb %xmm1, %xmm0, %xmm0			; AVX512VL-FALLBACK-NEXT: vpaddb %xmm1, %xmm0, %xmm0
	; AVX512VL-FALLBACK-NEXT: retq			; AVX512VL-FALLBACK-NEXT: retq
	;			;
	; AVX512BW-FALLBACK-LABEL: vec128_i8_signed_mem_reg:			; AVX512BW-FALLBACK-LABEL: vec128_i8_signed_mem_reg:
	; AVX512BW-FALLBACK: # %bb.0:			; AVX512BW-FALLBACK: # %bb.0:
	; AVX512BW-FALLBACK-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; AVX512BW-FALLBACK-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; AVX512BW-FALLBACK-NEXT: vmovdqa (%rdi), %xmm1			; AVX512BW-FALLBACK-NEXT: vmovdqa (%rdi), %xmm1
	[17 lines hidden]
	; AVX512VLBW-LABEL: vec128_i8_signed_mem_reg:			; AVX512VLBW-LABEL: vec128_i8_signed_mem_reg:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vmovdqa (%rdi), %xmm1			; AVX512VLBW-NEXT: vmovdqa (%rdi), %xmm1
	; AVX512VLBW-NEXT: vpcmpgtb %xmm0, %xmm1, %k1			; AVX512VLBW-NEXT: vpcmpgtb %xmm0, %xmm1, %k1
	; AVX512VLBW-NEXT: vpminsb %xmm0, %xmm1, %xmm2			; AVX512VLBW-NEXT: vpminsb %xmm0, %xmm1, %xmm2
	; AVX512VLBW-NEXT: vpmaxsb %xmm0, %xmm1, %xmm0			; AVX512VLBW-NEXT: vpmaxsb %xmm0, %xmm1, %xmm0
	; AVX512VLBW-NEXT: vpsubb %xmm2, %xmm0, %xmm0			; AVX512VLBW-NEXT: vpsubb %xmm2, %xmm0, %xmm0
	; AVX512VLBW-NEXT: vpsrlw $1, %xmm0, %xmm0			; AVX512VLBW-NEXT: vpsrlw $1, %xmm0, %xmm0
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpsubb %xmm0, %xmm2, %xmm0 {%k1}			; AVX512VLBW-NEXT: vpsubb %xmm0, %xmm2, %xmm0 {%k1}
	; AVX512VLBW-NEXT: vpaddb %xmm1, %xmm0, %xmm0			; AVX512VLBW-NEXT: vpaddb %xmm1, %xmm0, %xmm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	%a1 = load <16 x i8>, ptr %a1_addr			%a1 = load <16 x i8>, ptr %a1_addr
	%t3 = icmp sgt <16 x i8> %a1, %a2 ; signed			%t3 = icmp sgt <16 x i8> %a1, %a2 ; signed
	%t4 = select <16 x i1> %t3, <16 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <16 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%t4 = select <16 x i1> %t3, <16 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <16 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%t5 = select <16 x i1> %t3, <16 x i8> %a2, <16 x i8> %a1			%t5 = select <16 x i1> %t3, <16 x i8> %a2, <16 x i8> %a1
	[189 lines hidden]
	; AVX512VL-FALLBACK-LABEL: vec128_i8_signed_reg_mem:			; AVX512VL-FALLBACK-LABEL: vec128_i8_signed_reg_mem:
	; AVX512VL-FALLBACK: # %bb.0:			; AVX512VL-FALLBACK: # %bb.0:
	; AVX512VL-FALLBACK-NEXT: vmovdqa (%rdi), %xmm1			; AVX512VL-FALLBACK-NEXT: vmovdqa (%rdi), %xmm1
	; AVX512VL-FALLBACK-NEXT: vpminsb %xmm1, %xmm0, %xmm2			; AVX512VL-FALLBACK-NEXT: vpminsb %xmm1, %xmm0, %xmm2
	; AVX512VL-FALLBACK-NEXT: vpmaxsb %xmm1, %xmm0, %xmm3			; AVX512VL-FALLBACK-NEXT: vpmaxsb %xmm1, %xmm0, %xmm3
	; AVX512VL-FALLBACK-NEXT: vpsubb %xmm2, %xmm3, %xmm2			; AVX512VL-FALLBACK-NEXT: vpsubb %xmm2, %xmm3, %xmm2
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %xmm2, %xmm2			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %xmm2, %xmm2
	; AVX512VL-FALLBACK-NEXT: vpcmpgtb %xmm1, %xmm0, %xmm1			; AVX512VL-FALLBACK-NEXT: vpcmpgtb %xmm1, %xmm0, %xmm1
	; AVX512VL-FALLBACK-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm2			; AVX512VL-FALLBACK-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm2
	; AVX512VL-FALLBACK-NEXT: vpsubb %xmm1, %xmm2, %xmm1			; AVX512VL-FALLBACK-NEXT: vpsubb %xmm1, %xmm2, %xmm1
	; AVX512VL-FALLBACK-NEXT: vpaddb %xmm0, %xmm1, %xmm0			; AVX512VL-FALLBACK-NEXT: vpaddb %xmm0, %xmm1, %xmm0
	; AVX512VL-FALLBACK-NEXT: retq			; AVX512VL-FALLBACK-NEXT: retq
	;			;
	; AVX512BW-FALLBACK-LABEL: vec128_i8_signed_reg_mem:			; AVX512BW-FALLBACK-LABEL: vec128_i8_signed_reg_mem:
	; AVX512BW-FALLBACK: # %bb.0:			; AVX512BW-FALLBACK: # %bb.0:
	; AVX512BW-FALLBACK-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; AVX512BW-FALLBACK-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; AVX512BW-FALLBACK-NEXT: vmovdqa (%rdi), %xmm1			; AVX512BW-FALLBACK-NEXT: vmovdqa (%rdi), %xmm1
	[17 lines hidden]
	; AVX512VLBW-LABEL: vec128_i8_signed_reg_mem:			; AVX512VLBW-LABEL: vec128_i8_signed_reg_mem:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vmovdqa (%rdi), %xmm1			; AVX512VLBW-NEXT: vmovdqa (%rdi), %xmm1
	; AVX512VLBW-NEXT: vpcmpgtb %xmm1, %xmm0, %k1			; AVX512VLBW-NEXT: vpcmpgtb %xmm1, %xmm0, %k1
	; AVX512VLBW-NEXT: vpminsb %xmm1, %xmm0, %xmm2			; AVX512VLBW-NEXT: vpminsb %xmm1, %xmm0, %xmm2
	; AVX512VLBW-NEXT: vpmaxsb %xmm1, %xmm0, %xmm1			; AVX512VLBW-NEXT: vpmaxsb %xmm1, %xmm0, %xmm1
	; AVX512VLBW-NEXT: vpsubb %xmm2, %xmm1, %xmm1			; AVX512VLBW-NEXT: vpsubb %xmm2, %xmm1, %xmm1
	; AVX512VLBW-NEXT: vpsrlw $1, %xmm1, %xmm1			; AVX512VLBW-NEXT: vpsrlw $1, %xmm1, %xmm1
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpsubb %xmm1, %xmm2, %xmm1 {%k1}			; AVX512VLBW-NEXT: vpsubb %xmm1, %xmm2, %xmm1 {%k1}
	; AVX512VLBW-NEXT: vpaddb %xmm0, %xmm1, %xmm0			; AVX512VLBW-NEXT: vpaddb %xmm0, %xmm1, %xmm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	%a2 = load <16 x i8>, ptr %a2_addr			%a2 = load <16 x i8>, ptr %a2_addr
	%t3 = icmp sgt <16 x i8> %a1, %a2 ; signed			%t3 = icmp sgt <16 x i8> %a1, %a2 ; signed
	%t4 = select <16 x i1> %t3, <16 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <16 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%t4 = select <16 x i1> %t3, <16 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <16 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%t5 = select <16 x i1> %t3, <16 x i8> %a2, <16 x i8> %a1			%t5 = select <16 x i1> %t3, <16 x i8> %a2, <16 x i8> %a1
	[198 lines hidden]
	; AVX512VL-FALLBACK: # %bb.0:			; AVX512VL-FALLBACK: # %bb.0:
	; AVX512VL-FALLBACK-NEXT: vmovdqa (%rdi), %xmm0			; AVX512VL-FALLBACK-NEXT: vmovdqa (%rdi), %xmm0
	; AVX512VL-FALLBACK-NEXT: vmovdqa (%rsi), %xmm1			; AVX512VL-FALLBACK-NEXT: vmovdqa (%rsi), %xmm1
	; AVX512VL-FALLBACK-NEXT: vpminsb %xmm1, %xmm0, %xmm2			; AVX512VL-FALLBACK-NEXT: vpminsb %xmm1, %xmm0, %xmm2
	; AVX512VL-FALLBACK-NEXT: vpmaxsb %xmm1, %xmm0, %xmm3			; AVX512VL-FALLBACK-NEXT: vpmaxsb %xmm1, %xmm0, %xmm3
	; AVX512VL-FALLBACK-NEXT: vpsubb %xmm2, %xmm3, %xmm2			; AVX512VL-FALLBACK-NEXT: vpsubb %xmm2, %xmm3, %xmm2
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %xmm2, %xmm2			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %xmm2, %xmm2
	; AVX512VL-FALLBACK-NEXT: vpcmpgtb %xmm1, %xmm0, %xmm1			; AVX512VL-FALLBACK-NEXT: vpcmpgtb %xmm1, %xmm0, %xmm1
	; AVX512VL-FALLBACK-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm2			; AVX512VL-FALLBACK-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm2
	; AVX512VL-FALLBACK-NEXT: vpsubb %xmm1, %xmm2, %xmm1			; AVX512VL-FALLBACK-NEXT: vpsubb %xmm1, %xmm2, %xmm1
	; AVX512VL-FALLBACK-NEXT: vpaddb %xmm0, %xmm1, %xmm0			; AVX512VL-FALLBACK-NEXT: vpaddb %xmm0, %xmm1, %xmm0
	; AVX512VL-FALLBACK-NEXT: retq			; AVX512VL-FALLBACK-NEXT: retq
	;			;
	; AVX512BW-FALLBACK-LABEL: vec128_i8_signed_mem_mem:			; AVX512BW-FALLBACK-LABEL: vec128_i8_signed_mem_mem:
	; AVX512BW-FALLBACK: # %bb.0:			; AVX512BW-FALLBACK: # %bb.0:
	; AVX512BW-FALLBACK-NEXT: vmovdqa (%rdi), %xmm0			; AVX512BW-FALLBACK-NEXT: vmovdqa (%rdi), %xmm0
	; AVX512BW-FALLBACK-NEXT: vmovdqa (%rsi), %xmm1			; AVX512BW-FALLBACK-NEXT: vmovdqa (%rsi), %xmm1
	[18 lines hidden]
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vmovdqa (%rdi), %xmm0			; AVX512VLBW-NEXT: vmovdqa (%rdi), %xmm0
	; AVX512VLBW-NEXT: vmovdqa (%rsi), %xmm1			; AVX512VLBW-NEXT: vmovdqa (%rsi), %xmm1
	; AVX512VLBW-NEXT: vpcmpgtb %xmm1, %xmm0, %k1			; AVX512VLBW-NEXT: vpcmpgtb %xmm1, %xmm0, %k1
	; AVX512VLBW-NEXT: vpminsb %xmm1, %xmm0, %xmm2			; AVX512VLBW-NEXT: vpminsb %xmm1, %xmm0, %xmm2
	; AVX512VLBW-NEXT: vpmaxsb %xmm1, %xmm0, %xmm1			; AVX512VLBW-NEXT: vpmaxsb %xmm1, %xmm0, %xmm1
	; AVX512VLBW-NEXT: vpsubb %xmm2, %xmm1, %xmm1			; AVX512VLBW-NEXT: vpsubb %xmm2, %xmm1, %xmm1
	; AVX512VLBW-NEXT: vpsrlw $1, %xmm1, %xmm1			; AVX512VLBW-NEXT: vpsrlw $1, %xmm1, %xmm1
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpsubb %xmm1, %xmm2, %xmm1 {%k1}			; AVX512VLBW-NEXT: vpsubb %xmm1, %xmm2, %xmm1 {%k1}
	; AVX512VLBW-NEXT: vpaddb %xmm0, %xmm1, %xmm0			; AVX512VLBW-NEXT: vpaddb %xmm0, %xmm1, %xmm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	%a1 = load <16 x i8>, ptr %a1_addr			%a1 = load <16 x i8>, ptr %a1_addr
	%a2 = load <16 x i8>, ptr %a2_addr			%a2 = load <16 x i8>, ptr %a2_addr
	%t3 = icmp sgt <16 x i8> %a1, %a2 ; signed			%t3 = icmp sgt <16 x i8> %a1, %a2 ; signed
	%t4 = select <16 x i1> %t3, <16 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <16 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%t4 = select <16 x i1> %t3, <16 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <16 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%t5 = select <16 x i1> %t3, <16 x i8> %a2, <16 x i8> %a1			%t5 = select <16 x i1> %t3, <16 x i8> %a2, <16 x i8> %a1
	%t6 = select <16 x i1> %t3, <16 x i8> %a1, <16 x i8> %a2			%t6 = select <16 x i1> %t3, <16 x i8> %a1, <16 x i8> %a2
	%t7 = sub <16 x i8> %t6, %t5			%t7 = sub <16 x i8> %t6, %t5
	%t8 = lshr <16 x i8> %t7, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%t8 = lshr <16 x i8> %t7, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%t9 = mul nsw <16 x i8> %t8, %t4 ; signed			%t9 = mul nsw <16 x i8> %t8, %t4 ; signed
	%a10 = add nsw <16 x i8> %t9, %a1 ; signed			%a10 = add nsw <16 x i8> %t9, %a1 ; signed
	ret <16 x i8> %a10			ret <16 x i8> %a10
	}			}

llvm/test/CodeGen/X86/midpoint-int-vec-256.ll

	[2,020 lines hidden]
	;			;
	; AVX512VL-FALLBACK-LABEL: vec256_i8_signed_reg_reg:			; AVX512VL-FALLBACK-LABEL: vec256_i8_signed_reg_reg:
	; AVX512VL-FALLBACK: # %bb.0:			; AVX512VL-FALLBACK: # %bb.0:
	; AVX512VL-FALLBACK-NEXT: vpminsb %ymm1, %ymm0, %ymm2			; AVX512VL-FALLBACK-NEXT: vpminsb %ymm1, %ymm0, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm1, %ymm0, %ymm3			; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm1, %ymm0, %ymm3
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm2, %ymm3, %ymm2			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm2, %ymm3, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm2, %ymm2			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm2, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpcmpgtb %ymm1, %ymm0, %ymm1			; AVX512VL-FALLBACK-NEXT: vpcmpgtb %ymm1, %ymm0, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm2			; AVX512VL-FALLBACK-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm1, %ymm2, %ymm1			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm1, %ymm2, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpaddb %ymm0, %ymm1, %ymm0			; AVX512VL-FALLBACK-NEXT: vpaddb %ymm0, %ymm1, %ymm0
	; AVX512VL-FALLBACK-NEXT: retq			; AVX512VL-FALLBACK-NEXT: retq
	;			;
	; AVX512BW-FALLBACK-LABEL: vec256_i8_signed_reg_reg:			; AVX512BW-FALLBACK-LABEL: vec256_i8_signed_reg_reg:
	; AVX512BW-FALLBACK: # %bb.0:			; AVX512BW-FALLBACK: # %bb.0:
	; AVX512BW-FALLBACK-NEXT: # kill: def $ymm1 killed $ymm1 def $zmm1			; AVX512BW-FALLBACK-NEXT: # kill: def $ymm1 killed $ymm1 def $zmm1
	; AVX512BW-FALLBACK-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0			; AVX512BW-FALLBACK-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0
	[15 lines hidden]
	;			;
	; AVX512VLBW-LABEL: vec256_i8_signed_reg_reg:			; AVX512VLBW-LABEL: vec256_i8_signed_reg_reg:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpcmpgtb %ymm1, %ymm0, %k1			; AVX512VLBW-NEXT: vpcmpgtb %ymm1, %ymm0, %k1
	; AVX512VLBW-NEXT: vpminsb %ymm1, %ymm0, %ymm2			; AVX512VLBW-NEXT: vpminsb %ymm1, %ymm0, %ymm2
	; AVX512VLBW-NEXT: vpmaxsb %ymm1, %ymm0, %ymm1			; AVX512VLBW-NEXT: vpmaxsb %ymm1, %ymm0, %ymm1
	; AVX512VLBW-NEXT: vpsubb %ymm2, %ymm1, %ymm1			; AVX512VLBW-NEXT: vpsubb %ymm2, %ymm1, %ymm1
	; AVX512VLBW-NEXT: vpsrlw $1, %ymm1, %ymm1			; AVX512VLBW-NEXT: vpsrlw $1, %ymm1, %ymm1
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpsubb %ymm1, %ymm2, %ymm1 {%k1}			; AVX512VLBW-NEXT: vpsubb %ymm1, %ymm2, %ymm1 {%k1}
	; AVX512VLBW-NEXT: vpaddb %ymm0, %ymm1, %ymm0			; AVX512VLBW-NEXT: vpaddb %ymm0, %ymm1, %ymm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	%t3 = icmp sgt <32 x i8> %a1, %a2 ; signed			%t3 = icmp sgt <32 x i8> %a1, %a2 ; signed
	%t4 = select <32 x i1> %t3, <32 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <32 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%t4 = select <32 x i1> %t3, <32 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <32 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%t5 = select <32 x i1> %t3, <32 x i8> %a2, <32 x i8> %a1			%t5 = select <32 x i1> %t3, <32 x i8> %a2, <32 x i8> %a1
	%t6 = select <32 x i1> %t3, <32 x i8> %a1, <32 x i8> %a2			%t6 = select <32 x i1> %t3, <32 x i8> %a1, <32 x i8> %a2
	[140 lines hidden]
	; AVX512VL-FALLBACK-LABEL: vec256_i8_unsigned_reg_reg:			; AVX512VL-FALLBACK-LABEL: vec256_i8_unsigned_reg_reg:
	; AVX512VL-FALLBACK: # %bb.0:			; AVX512VL-FALLBACK: # %bb.0:
	; AVX512VL-FALLBACK-NEXT: vpminub %ymm1, %ymm0, %ymm2			; AVX512VL-FALLBACK-NEXT: vpminub %ymm1, %ymm0, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpmaxub %ymm1, %ymm0, %ymm1			; AVX512VL-FALLBACK-NEXT: vpmaxub %ymm1, %ymm0, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm2, %ymm1, %ymm1			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm2, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm1, %ymm1			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpcmpeqb %ymm2, %ymm0, %ymm2			; AVX512VL-FALLBACK-NEXT: vpcmpeqb %ymm2, %ymm0, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpternlogq $15, %ymm2, %ymm2, %ymm2			; AVX512VL-FALLBACK-NEXT: vpternlogq $15, %ymm2, %ymm2, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm1			; AVX512VL-FALLBACK-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm2, %ymm1, %ymm1			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm2, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpaddb %ymm0, %ymm1, %ymm0			; AVX512VL-FALLBACK-NEXT: vpaddb %ymm0, %ymm1, %ymm0
	; AVX512VL-FALLBACK-NEXT: retq			; AVX512VL-FALLBACK-NEXT: retq
	;			;
	; AVX512BW-FALLBACK-LABEL: vec256_i8_unsigned_reg_reg:			; AVX512BW-FALLBACK-LABEL: vec256_i8_unsigned_reg_reg:
	; AVX512BW-FALLBACK: # %bb.0:			; AVX512BW-FALLBACK: # %bb.0:
	; AVX512BW-FALLBACK-NEXT: # kill: def $ymm1 killed $ymm1 def $zmm1			; AVX512BW-FALLBACK-NEXT: # kill: def $ymm1 killed $ymm1 def $zmm1
	; AVX512BW-FALLBACK-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0			; AVX512BW-FALLBACK-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0
	[15 lines hidden]
	;			;
	; AVX512VLBW-LABEL: vec256_i8_unsigned_reg_reg:			; AVX512VLBW-LABEL: vec256_i8_unsigned_reg_reg:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpcmpnleub %ymm1, %ymm0, %k1			; AVX512VLBW-NEXT: vpcmpnleub %ymm1, %ymm0, %k1
	; AVX512VLBW-NEXT: vpminub %ymm1, %ymm0, %ymm2			; AVX512VLBW-NEXT: vpminub %ymm1, %ymm0, %ymm2
	; AVX512VLBW-NEXT: vpmaxub %ymm1, %ymm0, %ymm1			; AVX512VLBW-NEXT: vpmaxub %ymm1, %ymm0, %ymm1
	; AVX512VLBW-NEXT: vpsubb %ymm2, %ymm1, %ymm1			; AVX512VLBW-NEXT: vpsubb %ymm2, %ymm1, %ymm1
	; AVX512VLBW-NEXT: vpsrlw $1, %ymm1, %ymm1			; AVX512VLBW-NEXT: vpsrlw $1, %ymm1, %ymm1
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpsubb %ymm1, %ymm2, %ymm1 {%k1}			; AVX512VLBW-NEXT: vpsubb %ymm1, %ymm2, %ymm1 {%k1}
	; AVX512VLBW-NEXT: vpaddb %ymm0, %ymm1, %ymm0			; AVX512VLBW-NEXT: vpaddb %ymm0, %ymm1, %ymm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	%t3 = icmp ugt <32 x i8> %a1, %a2			%t3 = icmp ugt <32 x i8> %a1, %a2
	%t4 = select <32 x i1> %t3, <32 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <32 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%t4 = select <32 x i1> %t3, <32 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <32 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%t5 = select <32 x i1> %t3, <32 x i8> %a2, <32 x i8> %a1			%t5 = select <32 x i1> %t3, <32 x i8> %a2, <32 x i8> %a1
	%t6 = select <32 x i1> %t3, <32 x i8> %a1, <32 x i8> %a2			%t6 = select <32 x i1> %t3, <32 x i8> %a1, <32 x i8> %a2
	[140 lines hidden]
	; AVX512VL-FALLBACK-LABEL: vec256_i8_signed_mem_reg:			; AVX512VL-FALLBACK-LABEL: vec256_i8_signed_mem_reg:
	; AVX512VL-FALLBACK: # %bb.0:			; AVX512VL-FALLBACK: # %bb.0:
	; AVX512VL-FALLBACK-NEXT: vmovdqa (%rdi), %ymm1			; AVX512VL-FALLBACK-NEXT: vmovdqa (%rdi), %ymm1
	; AVX512VL-FALLBACK-NEXT: vpminsb %ymm0, %ymm1, %ymm2			; AVX512VL-FALLBACK-NEXT: vpminsb %ymm0, %ymm1, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm0, %ymm1, %ymm3			; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm0, %ymm1, %ymm3
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm2, %ymm3, %ymm2			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm2, %ymm3, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm2, %ymm2			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm2, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpcmpgtb %ymm0, %ymm1, %ymm0			; AVX512VL-FALLBACK-NEXT: vpcmpgtb %ymm0, %ymm1, %ymm0
	; AVX512VL-FALLBACK-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm2			; AVX512VL-FALLBACK-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm0, %ymm2, %ymm0			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm0, %ymm2, %ymm0
	; AVX512VL-FALLBACK-NEXT: vpaddb %ymm1, %ymm0, %ymm0			; AVX512VL-FALLBACK-NEXT: vpaddb %ymm1, %ymm0, %ymm0
	; AVX512VL-FALLBACK-NEXT: retq			; AVX512VL-FALLBACK-NEXT: retq
	;			;
	; AVX512BW-FALLBACK-LABEL: vec256_i8_signed_mem_reg:			; AVX512BW-FALLBACK-LABEL: vec256_i8_signed_mem_reg:
	; AVX512BW-FALLBACK: # %bb.0:			; AVX512BW-FALLBACK: # %bb.0:
	; AVX512BW-FALLBACK-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0			; AVX512BW-FALLBACK-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0
	; AVX512BW-FALLBACK-NEXT: vmovdqa (%rdi), %ymm1			; AVX512BW-FALLBACK-NEXT: vmovdqa (%rdi), %ymm1
	[16 lines hidden]
	; AVX512VLBW-LABEL: vec256_i8_signed_mem_reg:			; AVX512VLBW-LABEL: vec256_i8_signed_mem_reg:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vmovdqa (%rdi), %ymm1			; AVX512VLBW-NEXT: vmovdqa (%rdi), %ymm1
	; AVX512VLBW-NEXT: vpcmpgtb %ymm0, %ymm1, %k1			; AVX512VLBW-NEXT: vpcmpgtb %ymm0, %ymm1, %k1
	; AVX512VLBW-NEXT: vpminsb %ymm0, %ymm1, %ymm2			; AVX512VLBW-NEXT: vpminsb %ymm0, %ymm1, %ymm2
	; AVX512VLBW-NEXT: vpmaxsb %ymm0, %ymm1, %ymm0			; AVX512VLBW-NEXT: vpmaxsb %ymm0, %ymm1, %ymm0
	; AVX512VLBW-NEXT: vpsubb %ymm2, %ymm0, %ymm0			; AVX512VLBW-NEXT: vpsubb %ymm2, %ymm0, %ymm0
	; AVX512VLBW-NEXT: vpsrlw $1, %ymm0, %ymm0			; AVX512VLBW-NEXT: vpsrlw $1, %ymm0, %ymm0
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpsubb %ymm0, %ymm2, %ymm0 {%k1}			; AVX512VLBW-NEXT: vpsubb %ymm0, %ymm2, %ymm0 {%k1}
	; AVX512VLBW-NEXT: vpaddb %ymm1, %ymm0, %ymm0			; AVX512VLBW-NEXT: vpaddb %ymm1, %ymm0, %ymm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	%a1 = load <32 x i8>, ptr %a1_addr			%a1 = load <32 x i8>, ptr %a1_addr
	%t3 = icmp sgt <32 x i8> %a1, %a2 ; signed			%t3 = icmp sgt <32 x i8> %a1, %a2 ; signed
	%t4 = select <32 x i1> %t3, <32 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <32 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%t4 = select <32 x i1> %t3, <32 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <32 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%t5 = select <32 x i1> %t3, <32 x i8> %a2, <32 x i8> %a1			%t5 = select <32 x i1> %t3, <32 x i8> %a2, <32 x i8> %a1
	[139 lines hidden]
	; AVX512VL-FALLBACK-LABEL: vec256_i8_signed_reg_mem:			; AVX512VL-FALLBACK-LABEL: vec256_i8_signed_reg_mem:
	; AVX512VL-FALLBACK: # %bb.0:			; AVX512VL-FALLBACK: # %bb.0:
	; AVX512VL-FALLBACK-NEXT: vmovdqa (%rdi), %ymm1			; AVX512VL-FALLBACK-NEXT: vmovdqa (%rdi), %ymm1
	; AVX512VL-FALLBACK-NEXT: vpminsb %ymm1, %ymm0, %ymm2			; AVX512VL-FALLBACK-NEXT: vpminsb %ymm1, %ymm0, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm1, %ymm0, %ymm3			; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm1, %ymm0, %ymm3
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm2, %ymm3, %ymm2			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm2, %ymm3, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm2, %ymm2			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm2, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpcmpgtb %ymm1, %ymm0, %ymm1			; AVX512VL-FALLBACK-NEXT: vpcmpgtb %ymm1, %ymm0, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm2			; AVX512VL-FALLBACK-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm1, %ymm2, %ymm1			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm1, %ymm2, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpaddb %ymm0, %ymm1, %ymm0			; AVX512VL-FALLBACK-NEXT: vpaddb %ymm0, %ymm1, %ymm0
	; AVX512VL-FALLBACK-NEXT: retq			; AVX512VL-FALLBACK-NEXT: retq
	;			;
	; AVX512BW-FALLBACK-LABEL: vec256_i8_signed_reg_mem:			; AVX512BW-FALLBACK-LABEL: vec256_i8_signed_reg_mem:
	; AVX512BW-FALLBACK: # %bb.0:			; AVX512BW-FALLBACK: # %bb.0:
	; AVX512BW-FALLBACK-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0			; AVX512BW-FALLBACK-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0
	; AVX512BW-FALLBACK-NEXT: vmovdqa (%rdi), %ymm1			; AVX512BW-FALLBACK-NEXT: vmovdqa (%rdi), %ymm1
	[16 lines hidden]
	; AVX512VLBW-LABEL: vec256_i8_signed_reg_mem:			; AVX512VLBW-LABEL: vec256_i8_signed_reg_mem:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vmovdqa (%rdi), %ymm1			; AVX512VLBW-NEXT: vmovdqa (%rdi), %ymm1
	; AVX512VLBW-NEXT: vpcmpgtb %ymm1, %ymm0, %k1			; AVX512VLBW-NEXT: vpcmpgtb %ymm1, %ymm0, %k1
	; AVX512VLBW-NEXT: vpminsb %ymm1, %ymm0, %ymm2			; AVX512VLBW-NEXT: vpminsb %ymm1, %ymm0, %ymm2
	; AVX512VLBW-NEXT: vpmaxsb %ymm1, %ymm0, %ymm1			; AVX512VLBW-NEXT: vpmaxsb %ymm1, %ymm0, %ymm1
	; AVX512VLBW-NEXT: vpsubb %ymm2, %ymm1, %ymm1			; AVX512VLBW-NEXT: vpsubb %ymm2, %ymm1, %ymm1
	; AVX512VLBW-NEXT: vpsrlw $1, %ymm1, %ymm1			; AVX512VLBW-NEXT: vpsrlw $1, %ymm1, %ymm1
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpsubb %ymm1, %ymm2, %ymm1 {%k1}			; AVX512VLBW-NEXT: vpsubb %ymm1, %ymm2, %ymm1 {%k1}
	; AVX512VLBW-NEXT: vpaddb %ymm0, %ymm1, %ymm0			; AVX512VLBW-NEXT: vpaddb %ymm0, %ymm1, %ymm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	%a2 = load <32 x i8>, ptr %a2_addr			%a2 = load <32 x i8>, ptr %a2_addr
	%t3 = icmp sgt <32 x i8> %a1, %a2 ; signed			%t3 = icmp sgt <32 x i8> %a1, %a2 ; signed
	%t4 = select <32 x i1> %t3, <32 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <32 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%t4 = select <32 x i1> %t3, <32 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <32 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%t5 = select <32 x i1> %t3, <32 x i8> %a2, <32 x i8> %a1			%t5 = select <32 x i1> %t3, <32 x i8> %a2, <32 x i8> %a1
	; AVX512VL-FALLBACK: # %bb.0:			; AVX512VL-FALLBACK: # %bb.0:
	; AVX512VL-FALLBACK-NEXT: vmovdqa (%rdi), %ymm0			; AVX512VL-FALLBACK-NEXT: vmovdqa (%rdi), %ymm0
	; AVX512VL-FALLBACK-NEXT: vmovdqa (%rsi), %ymm1			; AVX512VL-FALLBACK-NEXT: vmovdqa (%rsi), %ymm1
	; AVX512VL-FALLBACK-NEXT: vpminsb %ymm1, %ymm0, %ymm2			; AVX512VL-FALLBACK-NEXT: vpminsb %ymm1, %ymm0, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm1, %ymm0, %ymm3			; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm1, %ymm0, %ymm3
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm2, %ymm3, %ymm2			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm2, %ymm3, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm2, %ymm2			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm2, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpcmpgtb %ymm1, %ymm0, %ymm1			; AVX512VL-FALLBACK-NEXT: vpcmpgtb %ymm1, %ymm0, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm2			; AVX512VL-FALLBACK-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm1, %ymm2, %ymm1			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm1, %ymm2, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpaddb %ymm0, %ymm1, %ymm0			; AVX512VL-FALLBACK-NEXT: vpaddb %ymm0, %ymm1, %ymm0
	; AVX512VL-FALLBACK-NEXT: retq			; AVX512VL-FALLBACK-NEXT: retq
	;			;
	; AVX512BW-FALLBACK-LABEL: vec256_i8_signed_mem_mem:			; AVX512BW-FALLBACK-LABEL: vec256_i8_signed_mem_mem:
	; AVX512BW-FALLBACK: # %bb.0:			; AVX512BW-FALLBACK: # %bb.0:
	; AVX512BW-FALLBACK-NEXT: vmovdqa (%rdi), %ymm0			; AVX512BW-FALLBACK-NEXT: vmovdqa (%rdi), %ymm0
	; AVX512BW-FALLBACK-NEXT: vmovdqa (%rsi), %ymm1			; AVX512BW-FALLBACK-NEXT: vmovdqa (%rsi), %ymm1
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vmovdqa (%rdi), %ymm0			; AVX512VLBW-NEXT: vmovdqa (%rdi), %ymm0
	; AVX512VLBW-NEXT: vmovdqa (%rsi), %ymm1			; AVX512VLBW-NEXT: vmovdqa (%rsi), %ymm1
	; AVX512VLBW-NEXT: vpcmpgtb %ymm1, %ymm0, %k1			; AVX512VLBW-NEXT: vpcmpgtb %ymm1, %ymm0, %k1
	; AVX512VLBW-NEXT: vpminsb %ymm1, %ymm0, %ymm2			; AVX512VLBW-NEXT: vpminsb %ymm1, %ymm0, %ymm2
	; AVX512VLBW-NEXT: vpmaxsb %ymm1, %ymm0, %ymm1			; AVX512VLBW-NEXT: vpmaxsb %ymm1, %ymm0, %ymm1
	; AVX512VLBW-NEXT: vpsubb %ymm2, %ymm1, %ymm1			; AVX512VLBW-NEXT: vpsubb %ymm2, %ymm1, %ymm1
	; AVX512VLBW-NEXT: vpsrlw $1, %ymm1, %ymm1			; AVX512VLBW-NEXT: vpsrlw $1, %ymm1, %ymm1
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpsubb %ymm1, %ymm2, %ymm1 {%k1}			; AVX512VLBW-NEXT: vpsubb %ymm1, %ymm2, %ymm1 {%k1}
	; AVX512VLBW-NEXT: vpaddb %ymm0, %ymm1, %ymm0			; AVX512VLBW-NEXT: vpaddb %ymm0, %ymm1, %ymm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	%a1 = load <32 x i8>, ptr %a1_addr			%a1 = load <32 x i8>, ptr %a1_addr
	%a2 = load <32 x i8>, ptr %a2_addr			%a2 = load <32 x i8>, ptr %a2_addr
	%t3 = icmp sgt <32 x i8> %a1, %a2 ; signed			%t3 = icmp sgt <32 x i8> %a1, %a2 ; signed
	%t4 = select <32 x i1> %t3, <32 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <32 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%t4 = select <32 x i1> %t3, <32 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <32 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%t5 = select <32 x i1> %t3, <32 x i8> %a2, <32 x i8> %a1			%t5 = select <32 x i1> %t3, <32 x i8> %a2, <32 x i8> %a1
	%t6 = select <32 x i1> %t3, <32 x i8> %a1, <32 x i8> %a2			%t6 = select <32 x i1> %t3, <32 x i8> %a1, <32 x i8> %a2
	%t7 = sub <32 x i8> %t6, %t5			%t7 = sub <32 x i8> %t6, %t5
	%t8 = lshr <32 x i8> %t7, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%t8 = lshr <32 x i8> %t7, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%t9 = mul nsw <32 x i8> %t8, %t4 ; signed			%t9 = mul nsw <32 x i8> %t8, %t4 ; signed
	%a10 = add nsw <32 x i8> %t9, %a1 ; signed			%a10 = add nsw <32 x i8> %t9, %a1 ; signed
	ret <32 x i8> %a10			ret <32 x i8> %a10
	}			}

llvm/test/CodeGen/X86/midpoint-int-vec-512.ll

	; AVX512F-NEXT: vpmaxsb %ymm2, %ymm3, %ymm2			; AVX512F-NEXT: vpmaxsb %ymm2, %ymm3, %ymm2
	; AVX512F-NEXT: vpsubb %ymm5, %ymm2, %ymm2			; AVX512F-NEXT: vpsubb %ymm5, %ymm2, %ymm2
	; AVX512F-NEXT: vpminsb %ymm1, %ymm0, %ymm5			; AVX512F-NEXT: vpminsb %ymm1, %ymm0, %ymm5
	; AVX512F-NEXT: vpmaxsb %ymm1, %ymm0, %ymm1			; AVX512F-NEXT: vpmaxsb %ymm1, %ymm0, %ymm1
	; AVX512F-NEXT: vpsubb %ymm5, %ymm1, %ymm1			; AVX512F-NEXT: vpsubb %ymm5, %ymm1, %ymm1
	; AVX512F-NEXT: vpsrlw $1, %ymm1, %ymm1			; AVX512F-NEXT: vpsrlw $1, %ymm1, %ymm1
	; AVX512F-NEXT: vpsrlw $1, %ymm2, %ymm2			; AVX512F-NEXT: vpsrlw $1, %ymm2, %ymm2
	; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm1, %zmm1			; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm1, %zmm1
	; AVX512F-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512F-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm2			; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm2
	; AVX512F-NEXT: vpxor %xmm5, %xmm5, %xmm5			; AVX512F-NEXT: vpxor %xmm5, %xmm5, %xmm5
	; AVX512F-NEXT: vpsubb %ymm2, %ymm5, %ymm2			; AVX512F-NEXT: vpsubb %ymm2, %ymm5, %ymm2
	; AVX512F-NEXT: vpsubb %ymm1, %ymm5, %ymm5			; AVX512F-NEXT: vpsubb %ymm1, %ymm5, %ymm5
	; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm5, %zmm2			; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm5, %zmm2
	; AVX512F-NEXT: vpternlogq $226, %zmm1, %zmm4, %zmm2			; AVX512F-NEXT: vpternlogq $226, %zmm1, %zmm4, %zmm2
	; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm1			; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm1
	; AVX512F-NEXT: vpaddb %ymm3, %ymm1, %ymm1			; AVX512F-NEXT: vpaddb %ymm3, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm2, %ymm3, %ymm2			; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm2, %ymm3, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm5, %ymm2, %ymm2			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm5, %ymm2, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpminsb %ymm1, %ymm0, %ymm5			; AVX512VL-FALLBACK-NEXT: vpminsb %ymm1, %ymm0, %ymm5
	; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm1, %ymm0, %ymm1			; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm1, %ymm0, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm5, %ymm1, %ymm1			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm5, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm1, %ymm1			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm2, %ymm2			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm2, %ymm2
	; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm2, %zmm1, %zmm1			; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm2, %zmm1, %zmm1
	; AVX512VL-FALLBACK-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512VL-FALLBACK-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm1, %ymm2			; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm1, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpxor %xmm5, %xmm5, %xmm5			; AVX512VL-FALLBACK-NEXT: vpxor %xmm5, %xmm5, %xmm5
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm2, %ymm5, %ymm2			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm2, %ymm5, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm1, %ymm5, %ymm5			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm1, %ymm5, %ymm5
	; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm2, %zmm5, %zmm2			; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm2, %zmm5, %zmm2
	; AVX512VL-FALLBACK-NEXT: vpternlogq $226, %zmm1, %zmm4, %zmm2			; AVX512VL-FALLBACK-NEXT: vpternlogq $226, %zmm1, %zmm4, %zmm2
	; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm2, %ymm1			; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm2, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpaddb %ymm3, %ymm1, %ymm1			; AVX512VL-FALLBACK-NEXT: vpaddb %ymm3, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpaddb %ymm0, %ymm2, %ymm0			; AVX512VL-FALLBACK-NEXT: vpaddb %ymm0, %ymm2, %ymm0
	; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0			; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
	; AVX512VL-FALLBACK-NEXT: retq			; AVX512VL-FALLBACK-NEXT: retq
	;			;
	; AVX512BW-LABEL: vec512_i8_signed_reg_reg:			; AVX512BW-LABEL: vec512_i8_signed_reg_reg:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpcmpgtb %zmm1, %zmm0, %k1			; AVX512BW-NEXT: vpcmpgtb %zmm1, %zmm0, %k1
	; AVX512BW-NEXT: vpminsb %zmm1, %zmm0, %zmm2			; AVX512BW-NEXT: vpminsb %zmm1, %zmm0, %zmm2
	; AVX512BW-NEXT: vpmaxsb %zmm1, %zmm0, %zmm1			; AVX512BW-NEXT: vpmaxsb %zmm1, %zmm0, %zmm1
	; AVX512BW-NEXT: vpsubb %zmm2, %zmm1, %zmm1			; AVX512BW-NEXT: vpsubb %zmm2, %zmm1, %zmm1
	; AVX512BW-NEXT: vpsrlw $1, %zmm1, %zmm1			; AVX512BW-NEXT: vpsrlw $1, %zmm1, %zmm1
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512BW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512BW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512BW-NEXT: vpsubb %zmm1, %zmm2, %zmm1 {%k1}			; AVX512BW-NEXT: vpsubb %zmm1, %zmm2, %zmm1 {%k1}
	; AVX512BW-NEXT: vpaddb %zmm0, %zmm1, %zmm0			; AVX512BW-NEXT: vpaddb %zmm0, %zmm1, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%t3 = icmp sgt <64 x i8> %a1, %a2 ; signed			%t3 = icmp sgt <64 x i8> %a1, %a2 ; signed
	%t4 = select <64 x i1> %t3, <64 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <64 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%t4 = select <64 x i1> %t3, <64 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <64 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%t5 = select <64 x i1> %t3, <64 x i8> %a2, <64 x i8> %a1			%t5 = select <64 x i1> %t3, <64 x i8> %a2, <64 x i8> %a1
	%t6 = select <64 x i1> %t3, <64 x i8> %a1, <64 x i8> %a2			%t6 = select <64 x i1> %t3, <64 x i8> %a1, <64 x i8> %a2
	; AVX512F-NEXT: vinserti64x4 $1, %ymm5, %zmm7, %zmm5			; AVX512F-NEXT: vinserti64x4 $1, %ymm5, %zmm7, %zmm5
	; AVX512F-NEXT: vpmaxub %ymm2, %ymm3, %ymm2			; AVX512F-NEXT: vpmaxub %ymm2, %ymm3, %ymm2
	; AVX512F-NEXT: vpsubb %ymm4, %ymm2, %ymm2			; AVX512F-NEXT: vpsubb %ymm4, %ymm2, %ymm2
	; AVX512F-NEXT: vpmaxub %ymm1, %ymm0, %ymm1			; AVX512F-NEXT: vpmaxub %ymm1, %ymm0, %ymm1
	; AVX512F-NEXT: vpsubb %ymm6, %ymm1, %ymm1			; AVX512F-NEXT: vpsubb %ymm6, %ymm1, %ymm1
	; AVX512F-NEXT: vpsrlw $1, %ymm1, %ymm1			; AVX512F-NEXT: vpsrlw $1, %ymm1, %ymm1
	; AVX512F-NEXT: vpsrlw $1, %ymm2, %ymm2			; AVX512F-NEXT: vpsrlw $1, %ymm2, %ymm2
	; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm1, %zmm1			; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm1, %zmm1
	; AVX512F-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512F-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm2			; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm2
	; AVX512F-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512F-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512F-NEXT: vpsubb %ymm2, %ymm4, %ymm2			; AVX512F-NEXT: vpsubb %ymm2, %ymm4, %ymm2
	; AVX512F-NEXT: vpsubb %ymm1, %ymm4, %ymm4			; AVX512F-NEXT: vpsubb %ymm1, %ymm4, %ymm4
	; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm4, %zmm2			; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm4, %zmm2
	; AVX512F-NEXT: vpternlogq $216, %zmm5, %zmm1, %zmm2			; AVX512F-NEXT: vpternlogq $216, %zmm5, %zmm1, %zmm2
	; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm1			; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm1
	; AVX512F-NEXT: vpaddb %ymm3, %ymm1, %ymm1			; AVX512F-NEXT: vpaddb %ymm3, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm5, %zmm7, %zmm5			; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm5, %zmm7, %zmm5
	; AVX512VL-FALLBACK-NEXT: vpmaxub %ymm2, %ymm3, %ymm2			; AVX512VL-FALLBACK-NEXT: vpmaxub %ymm2, %ymm3, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm4, %ymm2, %ymm2			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm4, %ymm2, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpmaxub %ymm1, %ymm0, %ymm1			; AVX512VL-FALLBACK-NEXT: vpmaxub %ymm1, %ymm0, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm6, %ymm1, %ymm1			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm6, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm1, %ymm1			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm2, %ymm2			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm2, %ymm2
	; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm2, %zmm1, %zmm1			; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm2, %zmm1, %zmm1
	; AVX512VL-FALLBACK-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512VL-FALLBACK-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm1, %ymm2			; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm1, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512VL-FALLBACK-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm2, %ymm4, %ymm2			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm2, %ymm4, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm1, %ymm4, %ymm4			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm1, %ymm4, %ymm4
	; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm2, %zmm4, %zmm2			; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm2, %zmm4, %zmm2
	; AVX512VL-FALLBACK-NEXT: vpternlogq $216, %zmm5, %zmm1, %zmm2			; AVX512VL-FALLBACK-NEXT: vpternlogq $216, %zmm5, %zmm1, %zmm2
	; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm2, %ymm1			; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm2, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpaddb %ymm3, %ymm1, %ymm1			; AVX512VL-FALLBACK-NEXT: vpaddb %ymm3, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpaddb %ymm0, %ymm2, %ymm0			; AVX512VL-FALLBACK-NEXT: vpaddb %ymm0, %ymm2, %ymm0
	; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0			; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
	; AVX512VL-FALLBACK-NEXT: retq			; AVX512VL-FALLBACK-NEXT: retq
	;			;
	; AVX512BW-LABEL: vec512_i8_unsigned_reg_reg:			; AVX512BW-LABEL: vec512_i8_unsigned_reg_reg:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpcmpnleub %zmm1, %zmm0, %k1			; AVX512BW-NEXT: vpcmpnleub %zmm1, %zmm0, %k1
	; AVX512BW-NEXT: vpminub %zmm1, %zmm0, %zmm2			; AVX512BW-NEXT: vpminub %zmm1, %zmm0, %zmm2
	; AVX512BW-NEXT: vpmaxub %zmm1, %zmm0, %zmm1			; AVX512BW-NEXT: vpmaxub %zmm1, %zmm0, %zmm1
	; AVX512BW-NEXT: vpsubb %zmm2, %zmm1, %zmm1			; AVX512BW-NEXT: vpsubb %zmm2, %zmm1, %zmm1
	; AVX512BW-NEXT: vpsrlw $1, %zmm1, %zmm1			; AVX512BW-NEXT: vpsrlw $1, %zmm1, %zmm1
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512BW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512BW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512BW-NEXT: vpsubb %zmm1, %zmm2, %zmm1 {%k1}			; AVX512BW-NEXT: vpsubb %zmm1, %zmm2, %zmm1 {%k1}
	; AVX512BW-NEXT: vpaddb %zmm0, %zmm1, %zmm0			; AVX512BW-NEXT: vpaddb %zmm0, %zmm1, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%t3 = icmp ugt <64 x i8> %a1, %a2			%t3 = icmp ugt <64 x i8> %a1, %a2
	%t4 = select <64 x i1> %t3, <64 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <64 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%t4 = select <64 x i1> %t3, <64 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <64 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%t5 = select <64 x i1> %t3, <64 x i8> %a2, <64 x i8> %a1			%t5 = select <64 x i1> %t3, <64 x i8> %a2, <64 x i8> %a1
	%t6 = select <64 x i1> %t3, <64 x i8> %a1, <64 x i8> %a2			%t6 = select <64 x i1> %t3, <64 x i8> %a1, <64 x i8> %a2
	; AVX512F-NEXT: vpmaxsb %ymm1, %ymm3, %ymm1			; AVX512F-NEXT: vpmaxsb %ymm1, %ymm3, %ymm1
	; AVX512F-NEXT: vpsubb %ymm5, %ymm1, %ymm1			; AVX512F-NEXT: vpsubb %ymm5, %ymm1, %ymm1
	; AVX512F-NEXT: vpminsb %ymm0, %ymm2, %ymm5			; AVX512F-NEXT: vpminsb %ymm0, %ymm2, %ymm5
	; AVX512F-NEXT: vpmaxsb %ymm0, %ymm2, %ymm0			; AVX512F-NEXT: vpmaxsb %ymm0, %ymm2, %ymm0
	; AVX512F-NEXT: vpsubb %ymm5, %ymm0, %ymm0			; AVX512F-NEXT: vpsubb %ymm5, %ymm0, %ymm0
	; AVX512F-NEXT: vpsrlw $1, %ymm0, %ymm0			; AVX512F-NEXT: vpsrlw $1, %ymm0, %ymm0
	; AVX512F-NEXT: vpsrlw $1, %ymm1, %ymm1			; AVX512F-NEXT: vpsrlw $1, %ymm1, %ymm1
	; AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0			; AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
	; AVX512F-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512F-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm1			; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm1
	; AVX512F-NEXT: vpxor %xmm5, %xmm5, %xmm5			; AVX512F-NEXT: vpxor %xmm5, %xmm5, %xmm5
	; AVX512F-NEXT: vpsubb %ymm1, %ymm5, %ymm1			; AVX512F-NEXT: vpsubb %ymm1, %ymm5, %ymm1
	; AVX512F-NEXT: vpsubb %ymm0, %ymm5, %ymm5			; AVX512F-NEXT: vpsubb %ymm0, %ymm5, %ymm5
	; AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm5, %zmm1			; AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm5, %zmm1
	; AVX512F-NEXT: vpternlogq $226, %zmm0, %zmm4, %zmm1			; AVX512F-NEXT: vpternlogq $226, %zmm0, %zmm4, %zmm1
	; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm0			; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm0
	; AVX512F-NEXT: vpaddb %ymm3, %ymm0, %ymm0			; AVX512F-NEXT: vpaddb %ymm3, %ymm0, %ymm0
	; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm1, %ymm3, %ymm1			; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm1, %ymm3, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm5, %ymm1, %ymm1			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm5, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpminsb %ymm0, %ymm2, %ymm5			; AVX512VL-FALLBACK-NEXT: vpminsb %ymm0, %ymm2, %ymm5
	; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm0, %ymm2, %ymm0			; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm0, %ymm2, %ymm0
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm5, %ymm0, %ymm0			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm5, %ymm0, %ymm0
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm0, %ymm0			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm0, %ymm0
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm1, %ymm1			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0			; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
	; AVX512VL-FALLBACK-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512VL-FALLBACK-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm0, %ymm1			; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm0, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpxor %xmm5, %xmm5, %xmm5			; AVX512VL-FALLBACK-NEXT: vpxor %xmm5, %xmm5, %xmm5
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm1, %ymm5, %ymm1			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm1, %ymm5, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm0, %ymm5, %ymm5			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm0, %ymm5, %ymm5
	; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm1, %zmm5, %zmm1			; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm1, %zmm5, %zmm1
	; AVX512VL-FALLBACK-NEXT: vpternlogq $226, %zmm0, %zmm4, %zmm1			; AVX512VL-FALLBACK-NEXT: vpternlogq $226, %zmm0, %zmm4, %zmm1
	; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm1, %ymm0			; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm1, %ymm0
	; AVX512VL-FALLBACK-NEXT: vpaddb %ymm3, %ymm0, %ymm0			; AVX512VL-FALLBACK-NEXT: vpaddb %ymm3, %ymm0, %ymm0
	; AVX512VL-FALLBACK-NEXT: vpaddb %ymm2, %ymm1, %ymm1			; AVX512VL-FALLBACK-NEXT: vpaddb %ymm2, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0			; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0
	; AVX512VL-FALLBACK-NEXT: retq			; AVX512VL-FALLBACK-NEXT: retq
	;			;
	; AVX512BW-LABEL: vec512_i8_signed_mem_reg:			; AVX512BW-LABEL: vec512_i8_signed_mem_reg:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vmovdqa64 (%rdi), %zmm1			; AVX512BW-NEXT: vmovdqa64 (%rdi), %zmm1
	; AVX512BW-NEXT: vpcmpgtb %zmm0, %zmm1, %k1			; AVX512BW-NEXT: vpcmpgtb %zmm0, %zmm1, %k1
	; AVX512BW-NEXT: vpminsb %zmm0, %zmm1, %zmm2			; AVX512BW-NEXT: vpminsb %zmm0, %zmm1, %zmm2
	; AVX512BW-NEXT: vpmaxsb %zmm0, %zmm1, %zmm0			; AVX512BW-NEXT: vpmaxsb %zmm0, %zmm1, %zmm0
	; AVX512BW-NEXT: vpsubb %zmm2, %zmm0, %zmm0			; AVX512BW-NEXT: vpsubb %zmm2, %zmm0, %zmm0
	; AVX512BW-NEXT: vpsrlw $1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlw $1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512BW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512BW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512BW-NEXT: vpsubb %zmm0, %zmm2, %zmm0 {%k1}			; AVX512BW-NEXT: vpsubb %zmm0, %zmm2, %zmm0 {%k1}
	; AVX512BW-NEXT: vpaddb %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpaddb %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%a1 = load <64 x i8>, ptr %a1_addr			%a1 = load <64 x i8>, ptr %a1_addr
	%t3 = icmp sgt <64 x i8> %a1, %a2 ; signed			%t3 = icmp sgt <64 x i8> %a1, %a2 ; signed
	%t4 = select <64 x i1> %t3, <64 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <64 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%t4 = select <64 x i1> %t3, <64 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <64 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%t5 = select <64 x i1> %t3, <64 x i8> %a2, <64 x i8> %a1			%t5 = select <64 x i1> %t3, <64 x i8> %a2, <64 x i8> %a1
	; AVX512F-NEXT: vpmaxsb %ymm2, %ymm3, %ymm2			; AVX512F-NEXT: vpmaxsb %ymm2, %ymm3, %ymm2
	; AVX512F-NEXT: vpsubb %ymm5, %ymm2, %ymm2			; AVX512F-NEXT: vpsubb %ymm5, %ymm2, %ymm2
	; AVX512F-NEXT: vpminsb %ymm1, %ymm0, %ymm5			; AVX512F-NEXT: vpminsb %ymm1, %ymm0, %ymm5
	; AVX512F-NEXT: vpmaxsb %ymm1, %ymm0, %ymm1			; AVX512F-NEXT: vpmaxsb %ymm1, %ymm0, %ymm1
	; AVX512F-NEXT: vpsubb %ymm5, %ymm1, %ymm1			; AVX512F-NEXT: vpsubb %ymm5, %ymm1, %ymm1
	; AVX512F-NEXT: vpsrlw $1, %ymm1, %ymm1			; AVX512F-NEXT: vpsrlw $1, %ymm1, %ymm1
	; AVX512F-NEXT: vpsrlw $1, %ymm2, %ymm2			; AVX512F-NEXT: vpsrlw $1, %ymm2, %ymm2
	; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm1, %zmm1			; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm1, %zmm1
	; AVX512F-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512F-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm2			; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm2
	; AVX512F-NEXT: vpxor %xmm5, %xmm5, %xmm5			; AVX512F-NEXT: vpxor %xmm5, %xmm5, %xmm5
	; AVX512F-NEXT: vpsubb %ymm2, %ymm5, %ymm2			; AVX512F-NEXT: vpsubb %ymm2, %ymm5, %ymm2
	; AVX512F-NEXT: vpsubb %ymm1, %ymm5, %ymm5			; AVX512F-NEXT: vpsubb %ymm1, %ymm5, %ymm5
	; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm5, %zmm2			; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm5, %zmm2
	; AVX512F-NEXT: vpternlogq $226, %zmm1, %zmm4, %zmm2			; AVX512F-NEXT: vpternlogq $226, %zmm1, %zmm4, %zmm2
	; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm1			; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm1
	; AVX512F-NEXT: vpaddb %ymm3, %ymm1, %ymm1			; AVX512F-NEXT: vpaddb %ymm3, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm2, %ymm3, %ymm2			; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm2, %ymm3, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm5, %ymm2, %ymm2			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm5, %ymm2, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpminsb %ymm1, %ymm0, %ymm5			; AVX512VL-FALLBACK-NEXT: vpminsb %ymm1, %ymm0, %ymm5
	; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm1, %ymm0, %ymm1			; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm1, %ymm0, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm5, %ymm1, %ymm1			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm5, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm1, %ymm1			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm2, %ymm2			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm2, %ymm2
	; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm2, %zmm1, %zmm1			; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm2, %zmm1, %zmm1
	; AVX512VL-FALLBACK-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512VL-FALLBACK-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm1, %ymm2			; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm1, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpxor %xmm5, %xmm5, %xmm5			; AVX512VL-FALLBACK-NEXT: vpxor %xmm5, %xmm5, %xmm5
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm2, %ymm5, %ymm2			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm2, %ymm5, %ymm2
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm1, %ymm5, %ymm5			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm1, %ymm5, %ymm5
	; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm2, %zmm5, %zmm2			; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm2, %zmm5, %zmm2
	; AVX512VL-FALLBACK-NEXT: vpternlogq $226, %zmm1, %zmm4, %zmm2			; AVX512VL-FALLBACK-NEXT: vpternlogq $226, %zmm1, %zmm4, %zmm2
	; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm2, %ymm1			; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm2, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpaddb %ymm3, %ymm1, %ymm1			; AVX512VL-FALLBACK-NEXT: vpaddb %ymm3, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpaddb %ymm0, %ymm2, %ymm0			; AVX512VL-FALLBACK-NEXT: vpaddb %ymm0, %ymm2, %ymm0
	; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0			; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
	; AVX512VL-FALLBACK-NEXT: retq			; AVX512VL-FALLBACK-NEXT: retq
	;			;
	; AVX512BW-LABEL: vec512_i8_signed_reg_mem:			; AVX512BW-LABEL: vec512_i8_signed_reg_mem:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vmovdqa64 (%rdi), %zmm1			; AVX512BW-NEXT: vmovdqa64 (%rdi), %zmm1
	; AVX512BW-NEXT: vpcmpgtb %zmm1, %zmm0, %k1			; AVX512BW-NEXT: vpcmpgtb %zmm1, %zmm0, %k1
	; AVX512BW-NEXT: vpminsb %zmm1, %zmm0, %zmm2			; AVX512BW-NEXT: vpminsb %zmm1, %zmm0, %zmm2
	; AVX512BW-NEXT: vpmaxsb %zmm1, %zmm0, %zmm1			; AVX512BW-NEXT: vpmaxsb %zmm1, %zmm0, %zmm1
	; AVX512BW-NEXT: vpsubb %zmm2, %zmm1, %zmm1			; AVX512BW-NEXT: vpsubb %zmm2, %zmm1, %zmm1
	; AVX512BW-NEXT: vpsrlw $1, %zmm1, %zmm1			; AVX512BW-NEXT: vpsrlw $1, %zmm1, %zmm1
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512BW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512BW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512BW-NEXT: vpsubb %zmm1, %zmm2, %zmm1 {%k1}			; AVX512BW-NEXT: vpsubb %zmm1, %zmm2, %zmm1 {%k1}
	; AVX512BW-NEXT: vpaddb %zmm0, %zmm1, %zmm0			; AVX512BW-NEXT: vpaddb %zmm0, %zmm1, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%a2 = load <64 x i8>, ptr %a2_addr			%a2 = load <64 x i8>, ptr %a2_addr
	%t3 = icmp sgt <64 x i8> %a1, %a2 ; signed			%t3 = icmp sgt <64 x i8> %a1, %a2 ; signed
	%t4 = select <64 x i1> %t3, <64 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <64 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%t4 = select <64 x i1> %t3, <64 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <64 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%t5 = select <64 x i1> %t3, <64 x i8> %a2, <64 x i8> %a1			%t5 = select <64 x i1> %t3, <64 x i8> %a2, <64 x i8> %a1
	; AVX512F-NEXT: vpmaxsb %ymm1, %ymm3, %ymm1			; AVX512F-NEXT: vpmaxsb %ymm1, %ymm3, %ymm1
	; AVX512F-NEXT: vpsubb %ymm5, %ymm1, %ymm1			; AVX512F-NEXT: vpsubb %ymm5, %ymm1, %ymm1
	; AVX512F-NEXT: vpminsb %ymm0, %ymm2, %ymm5			; AVX512F-NEXT: vpminsb %ymm0, %ymm2, %ymm5
	; AVX512F-NEXT: vpmaxsb %ymm0, %ymm2, %ymm0			; AVX512F-NEXT: vpmaxsb %ymm0, %ymm2, %ymm0
	; AVX512F-NEXT: vpsubb %ymm5, %ymm0, %ymm0			; AVX512F-NEXT: vpsubb %ymm5, %ymm0, %ymm0
	; AVX512F-NEXT: vpsrlw $1, %ymm0, %ymm0			; AVX512F-NEXT: vpsrlw $1, %ymm0, %ymm0
	; AVX512F-NEXT: vpsrlw $1, %ymm1, %ymm1			; AVX512F-NEXT: vpsrlw $1, %ymm1, %ymm1
	; AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0			; AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
	; AVX512F-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512F-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm1			; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm1
	; AVX512F-NEXT: vpxor %xmm5, %xmm5, %xmm5			; AVX512F-NEXT: vpxor %xmm5, %xmm5, %xmm5
	; AVX512F-NEXT: vpsubb %ymm1, %ymm5, %ymm1			; AVX512F-NEXT: vpsubb %ymm1, %ymm5, %ymm1
	; AVX512F-NEXT: vpsubb %ymm0, %ymm5, %ymm5			; AVX512F-NEXT: vpsubb %ymm0, %ymm5, %ymm5
	; AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm5, %zmm1			; AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm5, %zmm1
	; AVX512F-NEXT: vpternlogq $226, %zmm0, %zmm4, %zmm1			; AVX512F-NEXT: vpternlogq $226, %zmm0, %zmm4, %zmm1
	; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm0			; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm0
	; AVX512F-NEXT: vpaddb %ymm3, %ymm0, %ymm0			; AVX512F-NEXT: vpaddb %ymm3, %ymm0, %ymm0
	; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm1, %ymm3, %ymm1			; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm1, %ymm3, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm5, %ymm1, %ymm1			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm5, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpminsb %ymm0, %ymm2, %ymm5			; AVX512VL-FALLBACK-NEXT: vpminsb %ymm0, %ymm2, %ymm5
	; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm0, %ymm2, %ymm0			; AVX512VL-FALLBACK-NEXT: vpmaxsb %ymm0, %ymm2, %ymm0
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm5, %ymm0, %ymm0			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm5, %ymm0, %ymm0
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm0, %ymm0			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm0, %ymm0
	; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm1, %ymm1			; AVX512VL-FALLBACK-NEXT: vpsrlw $1, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0			; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
	; AVX512VL-FALLBACK-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512VL-FALLBACK-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm0, %ymm1			; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm0, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpxor %xmm5, %xmm5, %xmm5			; AVX512VL-FALLBACK-NEXT: vpxor %xmm5, %xmm5, %xmm5
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm1, %ymm5, %ymm1			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm1, %ymm5, %ymm1
	; AVX512VL-FALLBACK-NEXT: vpsubb %ymm0, %ymm5, %ymm5			; AVX512VL-FALLBACK-NEXT: vpsubb %ymm0, %ymm5, %ymm5
	; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm1, %zmm5, %zmm1			; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm1, %zmm5, %zmm1
	; AVX512VL-FALLBACK-NEXT: vpternlogq $226, %zmm0, %zmm4, %zmm1			; AVX512VL-FALLBACK-NEXT: vpternlogq $226, %zmm0, %zmm4, %zmm1
	; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm1, %ymm0			; AVX512VL-FALLBACK-NEXT: vextracti64x4 $1, %zmm1, %ymm0
	; AVX512VL-FALLBACK-NEXT: vpaddb %ymm3, %ymm0, %ymm0			; AVX512VL-FALLBACK-NEXT: vpaddb %ymm3, %ymm0, %ymm0
	; AVX512VL-FALLBACK-NEXT: vpaddb %ymm2, %ymm1, %ymm1			; AVX512VL-FALLBACK-NEXT: vpaddb %ymm2, %ymm1, %ymm1
	; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0			; AVX512VL-FALLBACK-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0
	; AVX512VL-FALLBACK-NEXT: retq			; AVX512VL-FALLBACK-NEXT: retq
	;			;
	; AVX512BW-LABEL: vec512_i8_signed_mem_mem:			; AVX512BW-LABEL: vec512_i8_signed_mem_mem:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vmovdqa64 (%rdi), %zmm0			; AVX512BW-NEXT: vmovdqa64 (%rdi), %zmm0
	; AVX512BW-NEXT: vmovdqa64 (%rsi), %zmm1			; AVX512BW-NEXT: vmovdqa64 (%rsi), %zmm1
	; AVX512BW-NEXT: vpcmpgtb %zmm1, %zmm0, %k1			; AVX512BW-NEXT: vpcmpgtb %zmm1, %zmm0, %k1
	; AVX512BW-NEXT: vpminsb %zmm1, %zmm0, %zmm2			; AVX512BW-NEXT: vpminsb %zmm1, %zmm0, %zmm2
	; AVX512BW-NEXT: vpmaxsb %zmm1, %zmm0, %zmm1			; AVX512BW-NEXT: vpmaxsb %zmm1, %zmm0, %zmm1
	; AVX512BW-NEXT: vpsubb %zmm2, %zmm1, %zmm1			; AVX512BW-NEXT: vpsubb %zmm2, %zmm1, %zmm1
	; AVX512BW-NEXT: vpsrlw $1, %zmm1, %zmm1			; AVX512BW-NEXT: vpsrlw $1, %zmm1, %zmm1
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512BW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512BW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512BW-NEXT: vpsubb %zmm1, %zmm2, %zmm1 {%k1}			; AVX512BW-NEXT: vpsubb %zmm1, %zmm2, %zmm1 {%k1}
	; AVX512BW-NEXT: vpaddb %zmm0, %zmm1, %zmm0			; AVX512BW-NEXT: vpaddb %zmm0, %zmm1, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%a1 = load <64 x i8>, ptr %a1_addr			%a1 = load <64 x i8>, ptr %a1_addr
	%a2 = load <64 x i8>, ptr %a2_addr			%a2 = load <64 x i8>, ptr %a2_addr
	%t3 = icmp sgt <64 x i8> %a1, %a2 ; signed			%t3 = icmp sgt <64 x i8> %a1, %a2 ; signed
	%t4 = select <64 x i1> %t3, <64 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <64 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%t4 = select <64 x i1> %t3, <64 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <64 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%t5 = select <64 x i1> %t3, <64 x i8> %a2, <64 x i8> %a1			%t5 = select <64 x i1> %t3, <64 x i8> %a2, <64 x i8> %a1
	%t6 = select <64 x i1> %t3, <64 x i8> %a1, <64 x i8> %a2			%t6 = select <64 x i1> %t3, <64 x i8> %a1, <64 x i8> %a2
	%t7 = sub <64 x i8> %t6, %t5			%t7 = sub <64 x i8> %t6, %t5
	%t8 = lshr <64 x i8> %t7, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%t8 = lshr <64 x i8> %t7, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%t9 = mul nsw <64 x i8> %t8, %t4 ; signed			%t9 = mul nsw <64 x i8> %t8, %t4 ; signed
	%a10 = add nsw <64 x i8> %t9, %a1 ; signed			%a10 = add nsw <64 x i8> %t9, %a1 ; signed
	ret <64 x i8> %a10			ret <64 x i8> %a10
	}			}

llvm/test/CodeGen/X86/min-legal-vector-width.ll

; CHECK-NEXT: retq		; CHECK-NEXT: retq
%ext = zext <8 x i1> %cmp to <8 x i64>		%ext = zext <8 x i1> %cmp to <8 x i64>
store <8 x i64> %ext, <8 x i64>* %zptr		store <8 x i64> %ext, <8 x i64>* %zptr
ret void		ret void
}		}

define <16 x i8> @var_rotate_v16i8(<16 x i8> %a, <16 x i8> %b) nounwind "min-legal-vector-width"="256" {		define <16 x i8> @var_rotate_v16i8(<16 x i8> %a, <16 x i8> %b) nounwind "min-legal-vector-width"="256" {
; CHECK-LABEL: var_rotate_v16i8:		; CHECK-LABEL: var_rotate_v16i8:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1		; CHECK-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
; CHECK-NEXT: vpxor %xmm2, %xmm2, %xmm2		; CHECK-NEXT: vpxor %xmm2, %xmm2, %xmm2
; CHECK-NEXT: vpunpckhbw {{.*#+}} xmm2 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]		; CHECK-NEXT: vpunpckhbw {{.*#+}} xmm2 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]
; CHECK-NEXT: vpunpckhbw {{.*#+}} xmm3 = xmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; CHECK-NEXT: vpunpckhbw {{.*#+}} xmm3 = xmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]
; CHECK-NEXT: vpsllvw %xmm2, %xmm3, %xmm2		; CHECK-NEXT: vpsllvw %xmm2, %xmm3, %xmm2
; CHECK-NEXT: vpsrlw $8, %xmm2, %xmm2		; CHECK-NEXT: vpsrlw $8, %xmm2, %xmm2
; CHECK-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; CHECK-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; CHECK-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero		; CHECK-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero
; CHECK-NEXT: vpsllvw %xmm1, %xmm0, %xmm0		; CHECK-NEXT: vpsllvw %xmm1, %xmm0, %xmm0
; CHECK-NEXT: vpsrlw $8, %xmm0, %xmm0		; CHECK-NEXT: vpsrlw $8, %xmm0, %xmm0
; CHECK-NEXT: vpackuswb %xmm2, %xmm0, %xmm0		; CHECK-NEXT: vpackuswb %xmm2, %xmm0, %xmm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%b8 = sub <16 x i8> <i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8>, %b		%b8 = sub <16 x i8> <i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8, i8 8>, %b
%shl = shl <16 x i8> %a, %b		%shl = shl <16 x i8> %a, %b
%lshr = lshr <16 x i8> %a, %b8		%lshr = lshr <16 x i8> %a, %b8
%or = or <16 x i8> %shl, %lshr		%or = or <16 x i8> %shl, %lshr
ret <16 x i8> %or		ret <16 x i8> %or
}		}

define <32 x i8> @var_rotate_v32i8(<32 x i8> %a, <32 x i8> %b) nounwind "min-legal-vector-width"="256" {		define <32 x i8> @var_rotate_v32i8(<32 x i8> %a, <32 x i8> %b) nounwind "min-legal-vector-width"="256" {
; CHECK-LABEL: var_rotate_v32i8:		; CHECK-LABEL: var_rotate_v32i8:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1		; CHECK-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
; CHECK-NEXT: vpxor %xmm2, %xmm2, %xmm2		; CHECK-NEXT: vpxor %xmm2, %xmm2, %xmm2
; CHECK-NEXT: vpunpckhbw {{.*#+}} ymm3 = ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15],ymm1[24],ymm2[24],ymm1[25],ymm2[25],ymm1[26],ymm2[26],ymm1[27],ymm2[27],ymm1[28],ymm2[28],ymm1[29],ymm2[29],ymm1[30],ymm2[30],ymm1[31],ymm2[31]		; CHECK-NEXT: vpunpckhbw {{.*#+}} ymm3 = ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15],ymm1[24],ymm2[24],ymm1[25],ymm2[25],ymm1[26],ymm2[26],ymm1[27],ymm2[27],ymm1[28],ymm2[28],ymm1[29],ymm2[29],ymm1[30],ymm2[30],ymm1[31],ymm2[31]
; CHECK-NEXT: vpunpckhbw {{.*#+}} ymm4 = ymm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31]		; CHECK-NEXT: vpunpckhbw {{.*#+}} ymm4 = ymm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31]
; CHECK-NEXT: vpsllvw %ymm3, %ymm4, %ymm3		; CHECK-NEXT: vpsllvw %ymm3, %ymm4, %ymm3
; CHECK-NEXT: vpsrlw $8, %ymm3, %ymm3		; CHECK-NEXT: vpsrlw $8, %ymm3, %ymm3
; CHECK-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[16],ymm2[16],ymm1[17],ymm2[17],ymm1[18],ymm2[18],ymm1[19],ymm2[19],ymm1[20],ymm2[20],ymm1[21],ymm2[21],ymm1[22],ymm2[22],ymm1[23],ymm2[23]		; CHECK-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[16],ymm2[16],ymm1[17],ymm2[17],ymm1[18],ymm2[18],ymm1[19],ymm2[19],ymm1[20],ymm2[20],ymm1[21],ymm2[21],ymm1[22],ymm2[22],ymm1[23],ymm2[23]
; CHECK-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]		; CHECK-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]
; CHECK-NEXT: vpsllvw %ymm1, %ymm0, %ymm0		; CHECK-NEXT: vpsllvw %ymm1, %ymm0, %ymm0
}		}

define <32 x i8> @splatconstant_rotate_mask_v32i8(<32 x i8> %a) nounwind "min-legal-vector-width"="256" {		define <32 x i8> @splatconstant_rotate_mask_v32i8(<32 x i8> %a) nounwind "min-legal-vector-width"="256" {
; CHECK-LABEL: splatconstant_rotate_mask_v32i8:		; CHECK-LABEL: splatconstant_rotate_mask_v32i8:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vpsllw $4, %ymm0, %ymm1		; CHECK-NEXT: vpsllw $4, %ymm0, %ymm1
; CHECK-NEXT: vpsrlw $4, %ymm0, %ymm0		; CHECK-NEXT: vpsrlw $4, %ymm0, %ymm0
; CHECK-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm1, %ymm0		; CHECK-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm1, %ymm0
; CHECK-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0		; CHECK-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%shl = shl <32 x i8> %a, <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>		%shl = shl <32 x i8> %a, <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>
%lshr = lshr <32 x i8> %a, <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>		%lshr = lshr <32 x i8> %a, <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>
%rmask = and <32 x i8> %lshr, <i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55>		%rmask = and <32 x i8> %lshr, <i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55>
%lmask = and <32 x i8> %shl, <i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33>		%lmask = and <32 x i8> %shl, <i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33>
%or = or <32 x i8> %lmask, %rmask		%or = or <32 x i8> %lmask, %rmask
ret <32 x i8> %or		ret <32 x i8> %or
}		}
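
The changed CHECK lines in these tests swap a full-width constant-pool operand for an embedded broadcast (`{1to4}`, `{1to8}`, `{1to16}`). The following is a minimal Python sketch of the arithmetic behind that suffix, not the code in X86FixupVectorConstants.cpp. It assumes the constant is available as raw bytes and that only 32-bit and 64-bit broadcast elements are considered; the helper names `splat_element_bits` and `broadcast_suffix` are hypothetical.

```python
def splat_element_bits(constant: bytes, candidates=(32, 64)):
    """Return the first candidate element width (in bits) at which
    `constant` is one element repeated at least twice, else None."""
    for bits in candidates:
        step = bits // 8
        # The constant must divide evenly into at least two elements.
        if len(constant) % step != 0 or len(constant) <= step:
            continue
        first = constant[:step]
        if all(constant[i:i + step] == first
               for i in range(0, len(constant), step)):
            return bits
    return None


def broadcast_suffix(constant: bytes, candidates=(32, 64)):
    """Return the '{1toN}' suffix for a splat constant, or None when the
    constant does not repeat at any candidate element width."""
    bits = splat_element_bits(constant, candidates)
    if bits is None:
        return None
    # N is the number of elements in the full vector.
    return "{1to%d}" % (len(constant) * 8 // bits)


if __name__ == "__main__":
    print(broadcast_suffix(bytes([0x37]) * 32))                    # ymm, 32-bit elements
    print(broadcast_suffix(bytes([0x37]) * 32, candidates=(64,)))  # ymm, 64-bit elements
    print(broadcast_suffix(bytes(range(16))))                      # not a splat
```

A 32-byte splat of a single byte repeats every 32 bits, so the sketch yields `{1to8}`, the suffix on the ymm `vpandd` line above. Instructions with 64-bit elements, such as `vpternlogq`, use the 64-bit element count instead (`{1to4}` for ymm), which the `candidates=(64,)` call models.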

llvm/test/CodeGen/X86/movmsk-cmp.ll

	; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0			; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: vptest {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0			; AVX2-NEXT: vptest {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0
	; AVX2-NEXT: sete %al			; AVX2-NEXT: sete %al
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: allzeros_v64i8_and1:			; AVX512-LABEL: allzeros_v64i8_and1:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vptestmd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %k0			; AVX512-NEXT: vptestmd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %k0
	; AVX512-NEXT: kortestw %k0, %k0			; AVX512-NEXT: kortestw %k0, %k0
	; AVX512-NEXT: sete %al			; AVX512-NEXT: sete %al
	; AVX512-NEXT: vzeroupper			; AVX512-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%tmp = and <64 x i8> %arg, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%tmp = and <64 x i8> %arg, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%tmp1 = icmp ne <64 x i8> %tmp, zeroinitializer			%tmp1 = icmp ne <64 x i8> %tmp, zeroinitializer
	%tmp2 = bitcast <64 x i1> %tmp1 to i64			%tmp2 = bitcast <64 x i1> %tmp1 to i64
	%tmp3 = icmp eq i64 %tmp2, 0			%tmp3 = icmp eq i64 %tmp2, 0
	; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0			; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: vptest {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0			; AVX2-NEXT: vptest {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0
	; AVX2-NEXT: sete %al			; AVX2-NEXT: sete %al
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: allzeros_v32i16_and1:			; AVX512-LABEL: allzeros_v32i16_and1:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vptestmd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %k0			; AVX512-NEXT: vptestmd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %k0
	; AVX512-NEXT: kortestw %k0, %k0			; AVX512-NEXT: kortestw %k0, %k0
	; AVX512-NEXT: sete %al			; AVX512-NEXT: sete %al
	; AVX512-NEXT: vzeroupper			; AVX512-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%tmp = and <32 x i16> %arg, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>			%tmp = and <32 x i16> %arg, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
	%tmp1 = icmp ne <32 x i16> %tmp, zeroinitializer			%tmp1 = icmp ne <32 x i16> %tmp, zeroinitializer
	%tmp2 = bitcast <32 x i1> %tmp1 to i32			%tmp2 = bitcast <32 x i1> %tmp1 to i32
	%tmp3 = icmp eq i32 %tmp2, 0			%tmp3 = icmp eq i32 %tmp2, 0
	; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0			; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: vptest {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0			; AVX2-NEXT: vptest {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0
	; AVX2-NEXT: sete %al			; AVX2-NEXT: sete %al
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: allzeros_v64i8_and4:			; AVX512-LABEL: allzeros_v64i8_and4:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vptestmd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %k0			; AVX512-NEXT: vptestmd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %k0
	; AVX512-NEXT: kortestw %k0, %k0			; AVX512-NEXT: kortestw %k0, %k0
	; AVX512-NEXT: sete %al			; AVX512-NEXT: sete %al
	; AVX512-NEXT: vzeroupper			; AVX512-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%tmp = and <64 x i8> %arg, <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>			%tmp = and <64 x i8> %arg, <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>
	%tmp1 = icmp ne <64 x i8> %tmp, zeroinitializer			%tmp1 = icmp ne <64 x i8> %tmp, zeroinitializer
	%tmp2 = bitcast <64 x i1> %tmp1 to i64			%tmp2 = bitcast <64 x i1> %tmp1 to i64
	%tmp3 = icmp eq i64 %tmp2, 0			%tmp3 = icmp eq i64 %tmp2, 0
	; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0			; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: vptest {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0			; AVX2-NEXT: vptest {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0
	; AVX2-NEXT: sete %al			; AVX2-NEXT: sete %al
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: allzeros_v32i16_and4:			; AVX512-LABEL: allzeros_v32i16_and4:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vptestmd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %k0			; AVX512-NEXT: vptestmd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %k0
	; AVX512-NEXT: kortestw %k0, %k0			; AVX512-NEXT: kortestw %k0, %k0
	; AVX512-NEXT: sete %al			; AVX512-NEXT: sete %al
	; AVX512-NEXT: vzeroupper			; AVX512-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%tmp = and <32 x i16> %arg, <i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4>			%tmp = and <32 x i16> %arg, <i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4>
	%tmp1 = icmp ne <32 x i16> %tmp, zeroinitializer			%tmp1 = icmp ne <32 x i16> %tmp, zeroinitializer
	%tmp2 = bitcast <32 x i1> %tmp1 to i32			%tmp2 = bitcast <32 x i1> %tmp1 to i32
	%tmp3 = icmp eq i32 %tmp2, 0			%tmp3 = icmp eq i32 %tmp2, 0

llvm/test/CodeGen/X86/opt-pipeline.ll

	; CHECK-NEXT: Assignment Tracking Analysis			; CHECK-NEXT: Assignment Tracking Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: X86 DAG->DAG Instruction Selection			; CHECK-NEXT: X86 DAG->DAG Instruction Selection
	; CHECK-NEXT: MachineDominator Tree Construction			; CHECK-NEXT: MachineDominator Tree Construction
	; CHECK-NEXT: Local Dynamic TLS Access Clean-up			; CHECK-NEXT: Local Dynamic TLS Access Clean-up
	; CHECK-NEXT: X86 PIC Global Base Reg Initialization			; CHECK-NEXT: X86 PIC Global Base Reg Initialization
	; CHECK-NEXT: Argument Stack Rebase			; CHECK-NEXT: Argument Stack Rebase
	; CHECK-NEXT: Finalize ISel and expand pseudo-instructions			; CHECK-NEXT: Finalize ISel and expand pseudo-instructions
	; CHECK-NEXT: X86 Domain Reassignment Pass			; CHECK-NEXT: X86 Domain Reassignment Pass
	; CHECK-NEXT: Lazy Machine Block Frequency Analysis			; CHECK-NEXT: Lazy Machine Block Frequency Analysis
	; CHECK-NEXT: Early Tail Duplication			; CHECK-NEXT: Early Tail Duplication
	; CHECK-NEXT: Optimize machine instruction PHIs			; CHECK-NEXT: Optimize machine instruction PHIs
	; CHECK-NEXT: Slot index numbering			; CHECK-NEXT: Slot index numbering
	; CHECK-NEXT: Merge disjoint stack slots			; CHECK-NEXT: Merge disjoint stack slots
	; CHECK-NEXT: Local Stack Slot Allocation			; CHECK-NEXT: Local Stack Slot Allocation
	; CHECK-NEXT: Remove dead machine instructions			; CHECK-NEXT: Remove dead machine instructions
	; CHECK-NEXT: MachineDominator Tree Construction			; CHECK-NEXT: MachineDominator Tree Construction
	; CHECK-NEXT: Machine Natural Loop Construction			; CHECK-NEXT: Machine Natural Loop Construction
	; CHECK-NEXT: Lazy Machine Block Frequency Analysis			; CHECK-NEXT: Lazy Machine Block Frequency Analysis
	; CHECK-NEXT: X86 Byte/Word Instruction Fixup			; CHECK-NEXT: X86 Byte/Word Instruction Fixup
	; CHECK-NEXT: Lazy Machine Block Frequency Analysis			; CHECK-NEXT: Lazy Machine Block Frequency Analysis
	; CHECK-NEXT: X86 Atom pad short functions			; CHECK-NEXT: X86 Atom pad short functions
	; CHECK-NEXT: X86 LEA Fixup			; CHECK-NEXT: X86 LEA Fixup
	; CHECK-NEXT: X86 Fixup Inst Tuning			; CHECK-NEXT: X86 Fixup Inst Tuning
				; CHECK-NEXT: X86 Fixup Vector Constants
	; CHECK-NEXT: Compressing EVEX instrs to VEX encoding when possible			; CHECK-NEXT: Compressing EVEX instrs to VEX encoding when possible
	; CHECK-NEXT: X86 Discriminate Memory Operands			; CHECK-NEXT: X86 Discriminate Memory Operands
	; CHECK-NEXT: X86 Insert Cache Prefetches			; CHECK-NEXT: X86 Insert Cache Prefetches
	; CHECK-NEXT: X86 insert wait instruction			; CHECK-NEXT: X86 insert wait instruction
	; CHECK-NEXT: Contiguously Lay Out Funclets			; CHECK-NEXT: Contiguously Lay Out Funclets
	; CHECK-NEXT: StackMap Liveness Analysis			; CHECK-NEXT: StackMap Liveness Analysis
	; CHECK-NEXT: Live DEBUG_VALUE analysis			; CHECK-NEXT: Live DEBUG_VALUE analysis
	; CHECK-NEXT: Machine Sanitizer Binary Metadata			; CHECK-NEXT: Machine Sanitizer Binary Metadata

llvm/test/CodeGen/X86/paddus.ll

	; AVX2-NEXT: vpcmpeqb %xmm0, %xmm1, %xmm0			; AVX2-NEXT: vpcmpeqb %xmm0, %xmm1, %xmm0
	; AVX2-NEXT: vpcmpeqd %xmm2, %xmm2, %xmm2			; AVX2-NEXT: vpcmpeqd %xmm2, %xmm2, %xmm2
	; AVX2-NEXT: vpxor %xmm2, %xmm0, %xmm0			; AVX2-NEXT: vpxor %xmm2, %xmm0, %xmm0
	; AVX2-NEXT: vpor %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpor %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test5:			; AVX512-LABEL: test5:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm1			; AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm1
	; AVX512-NEXT: vpcmpltub %xmm0, %xmm1, %k1			; AVX512-NEXT: vpcmpltub %xmm0, %xmm1, %k1
	; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX512-NEXT: vmovdqu8 %xmm0, %xmm1 {%k1}			; AVX512-NEXT: vmovdqu8 %xmm0, %xmm1 {%k1}
	; AVX512-NEXT: vmovdqa %xmm1, %xmm0			; AVX512-NEXT: vmovdqa %xmm1, %xmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%1 = xor <16 x i8> %x, <i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128>			%1 = xor <16 x i8> %x, <i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128>
	%2 = icmp ult <16 x i8> %1, %x			%2 = icmp ult <16 x i8> %1, %x
	%3 = select <16 x i1> %2, <16 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <16 x i8> %1			%3 = select <16 x i1> %2, <16 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <16 x i8> %1
	; AVX2-NEXT: vpcmpeqb %ymm0, %ymm1, %ymm0			; AVX2-NEXT: vpcmpeqb %ymm0, %ymm1, %ymm0
	; AVX2-NEXT: vpcmpeqd %ymm2, %ymm2, %ymm2			; AVX2-NEXT: vpcmpeqd %ymm2, %ymm2, %ymm2
	; AVX2-NEXT: vpxor %ymm2, %ymm0, %ymm0			; AVX2-NEXT: vpxor %ymm2, %ymm0, %ymm0
	; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0			; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test11:			; AVX512-LABEL: test11:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm1			; AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm1
	; AVX512-NEXT: vpcmpltub %ymm0, %ymm1, %k1			; AVX512-NEXT: vpcmpltub %ymm0, %ymm1, %k1
	; AVX512-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0			; AVX512-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0
	; AVX512-NEXT: vmovdqu8 %ymm0, %ymm1 {%k1}			; AVX512-NEXT: vmovdqu8 %ymm0, %ymm1 {%k1}
	; AVX512-NEXT: vmovdqa %ymm1, %ymm0			; AVX512-NEXT: vmovdqa %ymm1, %ymm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%1 = xor <32 x i8> %x, <i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128>			%1 = xor <32 x i8> %x, <i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128>
	%2 = icmp ult <32 x i8> %1, %x			%2 = icmp ult <32 x i8> %1, %x
	%3 = select <32 x i1> %2, <32 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <32 x i8> %1			%3 = select <32 x i1> %2, <32 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <32 x i8> %1
	; AVX2-NEXT: vpcmpeqb %ymm0, %ymm3, %ymm0			; AVX2-NEXT: vpcmpeqb %ymm0, %ymm3, %ymm0
	; AVX2-NEXT: vpxor %ymm4, %ymm0, %ymm0			; AVX2-NEXT: vpxor %ymm4, %ymm0, %ymm0
	; AVX2-NEXT: vpor %ymm3, %ymm0, %ymm0			; AVX2-NEXT: vpor %ymm3, %ymm0, %ymm0
	; AVX2-NEXT: vpor %ymm2, %ymm1, %ymm1			; AVX2-NEXT: vpor %ymm2, %ymm1, %ymm1
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test17:			; AVX512-LABEL: test17:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpxorq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm1			; AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm1
	; AVX512-NEXT: vpcmpltub %zmm0, %zmm1, %k1			; AVX512-NEXT: vpcmpltub %zmm0, %zmm1, %k1
	; AVX512-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0			; AVX512-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0
	; AVX512-NEXT: vmovdqu8 %zmm0, %zmm1 {%k1}			; AVX512-NEXT: vmovdqu8 %zmm0, %zmm1 {%k1}
	; AVX512-NEXT: vmovdqa64 %zmm1, %zmm0			; AVX512-NEXT: vmovdqa64 %zmm1, %zmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%1 = xor <64 x i8> %x, <i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128>			%1 = xor <64 x i8> %x, <i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128>
	%2 = icmp ult <64 x i8> %1, %x			%2 = icmp ult <64 x i8> %1, %x
	%3 = select <64 x i1> %2, <64 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <64 x i8> %1			%3 = select <64 x i1> %2, <64 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <64 x i8> %1
	[... 177 lines not shown ...]
	; AVX2-NEXT: vpcmpeqw %xmm0, %xmm1, %xmm0			; AVX2-NEXT: vpcmpeqw %xmm0, %xmm1, %xmm0
	; AVX2-NEXT: vpcmpeqd %xmm2, %xmm2, %xmm2			; AVX2-NEXT: vpcmpeqd %xmm2, %xmm2, %xmm2
	; AVX2-NEXT: vpxor %xmm2, %xmm0, %xmm0			; AVX2-NEXT: vpxor %xmm2, %xmm0, %xmm0
	; AVX2-NEXT: vpor %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpor %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test23:			; AVX512-LABEL: test23:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm1			; AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm1
	; AVX512-NEXT: vpcmpltuw %xmm0, %xmm1, %k1			; AVX512-NEXT: vpcmpltuw %xmm0, %xmm1, %k1
	; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX512-NEXT: vmovdqu16 %xmm0, %xmm1 {%k1}			; AVX512-NEXT: vmovdqu16 %xmm0, %xmm1 {%k1}
	; AVX512-NEXT: vmovdqa %xmm1, %xmm0			; AVX512-NEXT: vmovdqa %xmm1, %xmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%1 = xor <8 x i16> %x, <i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768>			%1 = xor <8 x i16> %x, <i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768>
	%2 = icmp ult <8 x i16> %1, %x			%2 = icmp ult <8 x i16> %1, %x
	%3 = select <8 x i1> %2, <8 x i16> <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>, <8 x i16> %1			%3 = select <8 x i1> %2, <8 x i16> <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>, <8 x i16> %1
	[... 234 lines not shown ...]
	; AVX2-NEXT: vpcmpeqw %ymm0, %ymm1, %ymm0			; AVX2-NEXT: vpcmpeqw %ymm0, %ymm1, %ymm0
	; AVX2-NEXT: vpcmpeqd %ymm2, %ymm2, %ymm2			; AVX2-NEXT: vpcmpeqd %ymm2, %ymm2, %ymm2
	; AVX2-NEXT: vpxor %ymm2, %ymm0, %ymm0			; AVX2-NEXT: vpxor %ymm2, %ymm0, %ymm0
	; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0			; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test29:			; AVX512-LABEL: test29:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm1			; AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm1
	; AVX512-NEXT: vpcmpltuw %ymm0, %ymm1, %k1			; AVX512-NEXT: vpcmpltuw %ymm0, %ymm1, %k1
	; AVX512-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0			; AVX512-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0
	; AVX512-NEXT: vmovdqu16 %ymm0, %ymm1 {%k1}			; AVX512-NEXT: vmovdqu16 %ymm0, %ymm1 {%k1}
	; AVX512-NEXT: vmovdqa %ymm1, %ymm0			; AVX512-NEXT: vmovdqa %ymm1, %ymm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%1 = xor <16 x i16> %x, <i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768>			%1 = xor <16 x i16> %x, <i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768>
	%2 = icmp ult <16 x i16> %1, %x			%2 = icmp ult <16 x i16> %1, %x
	%3 = select <16 x i1> %2, <16 x i16> <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>, <16 x i16> %1			%3 = select <16 x i1> %2, <16 x i16> <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>, <16 x i16> %1
	[... 347 lines not shown ...]
	; AVX2-NEXT: vpcmpeqw %ymm0, %ymm3, %ymm0			; AVX2-NEXT: vpcmpeqw %ymm0, %ymm3, %ymm0
	; AVX2-NEXT: vpxor %ymm4, %ymm0, %ymm0			; AVX2-NEXT: vpxor %ymm4, %ymm0, %ymm0
	; AVX2-NEXT: vpor %ymm3, %ymm0, %ymm0			; AVX2-NEXT: vpor %ymm3, %ymm0, %ymm0
	; AVX2-NEXT: vpor %ymm2, %ymm1, %ymm1			; AVX2-NEXT: vpor %ymm2, %ymm1, %ymm1
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test35:			; AVX512-LABEL: test35:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpxorq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm1			; AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm1
	; AVX512-NEXT: vpcmpltuw %zmm0, %zmm1, %k1			; AVX512-NEXT: vpcmpltuw %zmm0, %zmm1, %k1
	; AVX512-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0			; AVX512-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0
	; AVX512-NEXT: vmovdqu16 %zmm0, %zmm1 {%k1}			; AVX512-NEXT: vmovdqu16 %zmm0, %zmm1 {%k1}
	; AVX512-NEXT: vmovdqa64 %zmm1, %zmm0			; AVX512-NEXT: vmovdqa64 %zmm1, %zmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%1 = xor <32 x i16> %x, <i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768>			%1 = xor <32 x i16> %x, <i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768>
	%2 = icmp ult <32 x i16> %1, %x			%2 = icmp ult <32 x i16> %1, %x
	%3 = select <32 x i1> %2, <32 x i16> <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>, <32 x i16> %1			%3 = select <32 x i1> %2, <32 x i16> <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>, <32 x i16> %1
	[... 193 lines not shown ...]
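The AVX512 check lines above replace a full-width constant-pool operand (`vpxor`/`vpxorq` on `LCPI…(%rip)`) with an embedded 32-bit broadcast (`vpxord … {1to4}`, `{1to8}`, `{1to16}`). A minimal sketch of the underlying condition, assuming the fold applies when the constant's bytes repeat with a 32-bit period. The helper names `splat_bytes` and `broadcast_element` are hypothetical and are not taken from X86FixupVectorConstants.cpp:

```python
def splat_bytes(value: int, elem_bits: int, count: int) -> bytes:
    """Little-endian bytes of a vector holding `count` copies of `value`."""
    mask = (1 << elem_bits) - 1
    return (value & mask).to_bytes(elem_bits // 8, "little") * count


def broadcast_element(constant: bytes, bcast_bits: int = 32):
    """Return the element that rebuilds `constant` when repeated, or None.

    Hypothetical check: the constant qualifies only if it is an exact
    repetition of its first `bcast_bits` bits.
    """
    step = bcast_bits // 8
    if len(constant) % step:
        return None
    elem = constant[:step]
    if elem * (len(constant) // step) != constant:
        return None
    return elem


# test23: <8 x i16> splat of -32768 held in a 128-bit xmm register.
xmm_const = splat_bytes(-32768, 16, 8)
elem = broadcast_element(xmm_const)
lanes = len(xmm_const) // len(elem)  # number of 32-bit lanes, as in {1toN}
```

For the `<8 x i16>` constant in test23 this gives four 32-bit lanes, which lines up with the `{1to4}` suffix in the updated check line; a constant that is not periodic at 32 bits would return `None` and keep the full-width load in this sketch.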

llvm/test/CodeGen/X86/prefer-avx256-lzcnt.ll

	[... 35 lines not shown ...]
	}			}

	define <16 x i8> @testv16i8(<16 x i8> %in) {			define <16 x i8> @testv16i8(<16 x i8> %in) {
	; AVX256-LABEL: testv16i8:			; AVX256-LABEL: testv16i8:
	; AVX256: # %bb.0:			; AVX256: # %bb.0:
	; AVX256-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX256-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX256-NEXT: vpshufb %xmm0, %xmm1, %xmm2			; AVX256-NEXT: vpshufb %xmm0, %xmm1, %xmm2
	; AVX256-NEXT: vpsrlw $4, %xmm0, %xmm0			; AVX256-NEXT: vpsrlw $4, %xmm0, %xmm0
	; AVX256-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX256-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; AVX256-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX256-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX256-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm3			; AVX256-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm3
	; AVX256-NEXT: vpand %xmm3, %xmm2, %xmm2			; AVX256-NEXT: vpand %xmm3, %xmm2, %xmm2
	; AVX256-NEXT: vpshufb %xmm0, %xmm1, %xmm0			; AVX256-NEXT: vpshufb %xmm0, %xmm1, %xmm0
	; AVX256-NEXT: vpaddb %xmm0, %xmm2, %xmm0			; AVX256-NEXT: vpaddb %xmm0, %xmm2, %xmm0
	; AVX256-NEXT: retq			; AVX256-NEXT: retq
	;			;
	; AVX512-LABEL: testv16i8:			; AVX512-LABEL: testv16i8:
	[... 34 lines not shown ...]
	}			}

	define <32 x i8> @testv32i8(<32 x i8> %in) {			define <32 x i8> @testv32i8(<32 x i8> %in) {
	; AVX256-LABEL: testv32i8:			; AVX256-LABEL: testv32i8:
	; AVX256: # %bb.0:			; AVX256: # %bb.0:
	; AVX256-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX256-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX256-NEXT: vpshufb %ymm0, %ymm1, %ymm2			; AVX256-NEXT: vpshufb %ymm0, %ymm1, %ymm2
	; AVX256-NEXT: vpsrlw $4, %ymm0, %ymm0			; AVX256-NEXT: vpsrlw $4, %ymm0, %ymm0
	; AVX256-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX256-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX256-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX256-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX256-NEXT: vpcmpeqb %ymm3, %ymm0, %ymm3			; AVX256-NEXT: vpcmpeqb %ymm3, %ymm0, %ymm3
	; AVX256-NEXT: vpand %ymm3, %ymm2, %ymm2			; AVX256-NEXT: vpand %ymm3, %ymm2, %ymm2
	; AVX256-NEXT: vpshufb %ymm0, %ymm1, %ymm0			; AVX256-NEXT: vpshufb %ymm0, %ymm1, %ymm0
	; AVX256-NEXT: vpaddb %ymm0, %ymm2, %ymm0			; AVX256-NEXT: vpaddb %ymm0, %ymm2, %ymm0
	; AVX256-NEXT: retq			; AVX256-NEXT: retq
	;			;
	; AVX512-LABEL: testv32i8:			; AVX512-LABEL: testv32i8:
	[... 19 lines not shown ...]

llvm/test/CodeGen/X86/prefer-avx256-mulo.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl,+prefer-256-bit | FileCheck %s --check-prefix=AVX256			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl,+prefer-256-bit | FileCheck %s --check-prefix=AVX256
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl,-prefer-256-bit | FileCheck %s --check-prefix=AVX512			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl,-prefer-256-bit | FileCheck %s --check-prefix=AVX512

	define <16 x i1> @smulo_v16i8(<16 x i8> %a0, <16 x i8> %a1, ptr %p2) nounwind {			define <16 x i1> @smulo_v16i8(<16 x i8> %a0, <16 x i8> %a1, ptr %p2) nounwind {
	; AVX256-LABEL: smulo_v16i8:			; AVX256-LABEL: smulo_v16i8:
	; AVX256: # %bb.0:			; AVX256: # %bb.0:
	; AVX256-NEXT: vpmovsxbw %xmm1, %ymm1			; AVX256-NEXT: vpmovsxbw %xmm1, %ymm1
	; AVX256-NEXT: vpmovsxbw %xmm0, %ymm0			; AVX256-NEXT: vpmovsxbw %xmm0, %ymm0
	; AVX256-NEXT: vpmullw %ymm1, %ymm0, %ymm0			; AVX256-NEXT: vpmullw %ymm1, %ymm0, %ymm0
	; AVX256-NEXT: vpsrlw $8, %ymm0, %ymm1			; AVX256-NEXT: vpsrlw $8, %ymm0, %ymm1
	; AVX256-NEXT: vextracti128 $1, %ymm1, %xmm2			; AVX256-NEXT: vextracti128 $1, %ymm1, %xmm2
	; AVX256-NEXT: vpackuswb %xmm2, %xmm1, %xmm1			; AVX256-NEXT: vpackuswb %xmm2, %xmm1, %xmm1
	; AVX256-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX256-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX256-NEXT: vextracti128 $1, %ymm0, %xmm2			; AVX256-NEXT: vextracti128 $1, %ymm0, %xmm2
	; AVX256-NEXT: vpackuswb %xmm2, %xmm0, %xmm0			; AVX256-NEXT: vpackuswb %xmm2, %xmm0, %xmm0
	; AVX256-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX256-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX256-NEXT: vpcmpgtb %xmm0, %xmm2, %xmm2			; AVX256-NEXT: vpcmpgtb %xmm0, %xmm2, %xmm2
	; AVX256-NEXT: vpcmpeqb %xmm1, %xmm2, %xmm1			; AVX256-NEXT: vpcmpeqb %xmm1, %xmm2, %xmm1
	; AVX256-NEXT: vpternlogq $15, %xmm1, %xmm1, %xmm1			; AVX256-NEXT: vpternlogq $15, %xmm1, %xmm1, %xmm1
	; AVX256-NEXT: vpshufd {{.*#+}} xmm2 = xmm1[2,3,2,3]			; AVX256-NEXT: vpshufd {{.*#+}} xmm2 = xmm1[2,3,2,3]
	; AVX256-NEXT: vpmovsxbd %xmm2, %ymm2			; AVX256-NEXT: vpmovsxbd %xmm2, %ymm2
	[... 47 lines not shown ...]
	; AVX256-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX256-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX256-NEXT: vpcmpeqb %xmm2, %xmm1, %xmm1			; AVX256-NEXT: vpcmpeqb %xmm2, %xmm1, %xmm1
	; AVX256-NEXT: vpternlogq $15, %xmm1, %xmm1, %xmm1			; AVX256-NEXT: vpternlogq $15, %xmm1, %xmm1, %xmm1
	; AVX256-NEXT: vpshufd {{.*#+}} xmm2 = xmm1[2,3,2,3]			; AVX256-NEXT: vpshufd {{.*#+}} xmm2 = xmm1[2,3,2,3]
	; AVX256-NEXT: vpmovsxbd %xmm2, %ymm2			; AVX256-NEXT: vpmovsxbd %xmm2, %ymm2
	; AVX256-NEXT: vptestmd %ymm2, %ymm2, %k1			; AVX256-NEXT: vptestmd %ymm2, %ymm2, %k1
	; AVX256-NEXT: vpmovsxbd %xmm1, %ymm1			; AVX256-NEXT: vpmovsxbd %xmm1, %ymm1
	; AVX256-NEXT: vptestmd %ymm1, %ymm1, %k2			; AVX256-NEXT: vptestmd %ymm1, %ymm1, %k2
	; AVX256-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX256-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX256-NEXT: vextracti128 $1, %ymm0, %xmm1			; AVX256-NEXT: vextracti128 $1, %ymm0, %xmm1
	; AVX256-NEXT: vpackuswb %xmm1, %xmm0, %xmm0			; AVX256-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
	; AVX256-NEXT: vmovdqa %xmm0, (%rdi)			; AVX256-NEXT: vmovdqa %xmm0, (%rdi)
	; AVX256-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0			; AVX256-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0
	; AVX256-NEXT: vmovdqa32 %ymm0, %ymm1 {%k2} {z}			; AVX256-NEXT: vmovdqa32 %ymm0, %ymm1 {%k2} {z}
	; AVX256-NEXT: vpmovdw %ymm1, %xmm1			; AVX256-NEXT: vpmovdw %ymm1, %xmm1
	; AVX256-NEXT: vmovdqa32 %ymm0, %ymm0 {%k1} {z}			; AVX256-NEXT: vmovdqa32 %ymm0, %ymm0 {%k1} {z}
	; AVX256-NEXT: vpmovdw %ymm0, %xmm0			; AVX256-NEXT: vpmovdw %ymm0, %xmm0
	[... 25 lines not shown ...]

llvm/test/CodeGen/X86/prefer-avx256-shift.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,+avx512vl,+prefer-256-bit | FileCheck %s --check-prefixes=AVX256,AVX256BW		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,+avx512vl,+prefer-256-bit | FileCheck %s --check-prefixes=AVX256,AVX256BW
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,+avx512vl,-prefer-256-bit | FileCheck %s --check-prefixes=AVX512BW,AVX512BWVL		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,+avx512vl,-prefer-256-bit | FileCheck %s --check-prefixes=AVX512BW,AVX512BWVL
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl,+prefer-256-bit | FileCheck %s --check-prefixes=AVX256,AVX256VL		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl,+prefer-256-bit | FileCheck %s --check-prefixes=AVX256,AVX256VL
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl,-prefer-256-bit | FileCheck %s --check-prefix=AVX512VL		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl,-prefer-256-bit | FileCheck %s --check-prefix=AVX512VL
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,+prefer-256-bit | FileCheck %s --check-prefixes=AVX512BW,AVX512BWNOVL		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,+prefer-256-bit | FileCheck %s --check-prefixes=AVX512BW,AVX512BWNOVL
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,-prefer-256-bit | FileCheck %s --check-prefixes=AVX512BW,AVX512BWNOVL		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,-prefer-256-bit | FileCheck %s --check-prefixes=AVX512BW,AVX512BWNOVL

define <32 x i8> @var_shl_v32i8(<32 x i8> %a, <32 x i8> %b) {		define <32 x i8> @var_shl_v32i8(<32 x i8> %a, <32 x i8> %b) {
; AVX256-LABEL: var_shl_v32i8:		; AVX256-LABEL: var_shl_v32i8:
; AVX256: # %bb.0:		; AVX256: # %bb.0:
; AVX256-NEXT: vpsllw $5, %ymm1, %ymm1		; AVX256-NEXT: vpsllw $5, %ymm1, %ymm1
; AVX256-NEXT: vpsllw $4, %ymm0, %ymm2		; AVX256-NEXT: vpsllw $4, %ymm0, %ymm2
; AVX256-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2		; AVX256-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm2
; AVX256-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0		; AVX256-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0
; AVX256-NEXT: vpsllw $2, %ymm0, %ymm2		; AVX256-NEXT: vpsllw $2, %ymm0, %ymm2
; AVX256-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2		; AVX256-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm2
; AVX256-NEXT: vpaddb %ymm1, %ymm1, %ymm1		; AVX256-NEXT: vpaddb %ymm1, %ymm1, %ymm1
; AVX256-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0		; AVX256-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0
; AVX256-NEXT: vpaddb %ymm0, %ymm0, %ymm2		; AVX256-NEXT: vpaddb %ymm0, %ymm0, %ymm2
; AVX256-NEXT: vpaddb %ymm1, %ymm1, %ymm1		; AVX256-NEXT: vpaddb %ymm1, %ymm1, %ymm1
; AVX256-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0		; AVX256-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0
; AVX256-NEXT: retq		; AVX256-NEXT: retq
;		;
; AVX512BW-LABEL: var_shl_v32i8:		; AVX512BW-LABEL: var_shl_v32i8:
; AVX512BW: # %bb.0:		; AVX512BW: # %bb.0:
; AVX512BW-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero,ymm1[16],zero,ymm1[17],zero,ymm1[18],zero,ymm1[19],zero,ymm1[20],zero,ymm1[21],zero,ymm1[22],zero,ymm1[23],zero,ymm1[24],zero,ymm1[25],zero,ymm1[26],zero,ymm1[27],zero,ymm1[28],zero,ymm1[29],zero,ymm1[30],zero,ymm1[31],zero		; AVX512BW-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero,ymm1[16],zero,ymm1[17],zero,ymm1[18],zero,ymm1[19],zero,ymm1[20],zero,ymm1[21],zero,ymm1[22],zero,ymm1[23],zero,ymm1[24],zero,ymm1[25],zero,ymm1[26],zero,ymm1[27],zero,ymm1[28],zero,ymm1[29],zero,ymm1[30],zero,ymm1[31],zero
; AVX512BW-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero		; AVX512BW-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero
; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0		; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0		; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0
; AVX512BW-NEXT: retq		; AVX512BW-NEXT: retq
;		;
; AVX512VL-LABEL: var_shl_v32i8:		; AVX512VL-LABEL: var_shl_v32i8:
; AVX512VL: # %bb.0:		; AVX512VL: # %bb.0:
; AVX512VL-NEXT: vpsllw $5, %ymm1, %ymm1		; AVX512VL-NEXT: vpsllw $5, %ymm1, %ymm1
; AVX512VL-NEXT: vpsllw $4, %ymm0, %ymm2		; AVX512VL-NEXT: vpsllw $4, %ymm0, %ymm2
; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2		; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm2
; AVX512VL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0		; AVX512VL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0
; AVX512VL-NEXT: vpsllw $2, %ymm0, %ymm2		; AVX512VL-NEXT: vpsllw $2, %ymm0, %ymm2
; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2		; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm2
; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1		; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1
; AVX512VL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0		; AVX512VL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0
; AVX512VL-NEXT: vpaddb %ymm0, %ymm0, %ymm2		; AVX512VL-NEXT: vpaddb %ymm0, %ymm0, %ymm2
; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1		; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1
; AVX512VL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0		; AVX512VL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0
; AVX512VL-NEXT: retq		; AVX512VL-NEXT: retq
%shift = shl <32 x i8> %a, %b		%shift = shl <32 x i8> %a, %b
ret <32 x i8> %shift		ret <32 x i8> %shift
[... 61 lines not shown ...]
; AVX512BWVL-NEXT: vpmovwb %ymm0, %xmm0		; AVX512BWVL-NEXT: vpmovwb %ymm0, %xmm0
; AVX512BWVL-NEXT: vzeroupper		; AVX512BWVL-NEXT: vzeroupper
; AVX512BWVL-NEXT: retq		; AVX512BWVL-NEXT: retq
;		;
; AVX256VL-LABEL: var_shl_v16i8:		; AVX256VL-LABEL: var_shl_v16i8:
; AVX256VL: # %bb.0:		; AVX256VL: # %bb.0:
; AVX256VL-NEXT: vpsllw $5, %xmm1, %xmm1		; AVX256VL-NEXT: vpsllw $5, %xmm1, %xmm1
; AVX256VL-NEXT: vpsllw $4, %xmm0, %xmm2		; AVX256VL-NEXT: vpsllw $4, %xmm0, %xmm2
; AVX256VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm2		; AVX256VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm2, %xmm2
; AVX256VL-NEXT: vpblendvb %xmm1, %xmm2, %xmm0, %xmm0		; AVX256VL-NEXT: vpblendvb %xmm1, %xmm2, %xmm0, %xmm0
; AVX256VL-NEXT: vpsllw $2, %xmm0, %xmm2		; AVX256VL-NEXT: vpsllw $2, %xmm0, %xmm2
; AVX256VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm2		; AVX256VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm2, %xmm2
; AVX256VL-NEXT: vpaddb %xmm1, %xmm1, %xmm1		; AVX256VL-NEXT: vpaddb %xmm1, %xmm1, %xmm1
; AVX256VL-NEXT: vpblendvb %xmm1, %xmm2, %xmm0, %xmm0		; AVX256VL-NEXT: vpblendvb %xmm1, %xmm2, %xmm0, %xmm0
; AVX256VL-NEXT: vpaddb %xmm0, %xmm0, %xmm2		; AVX256VL-NEXT: vpaddb %xmm0, %xmm0, %xmm2
; AVX256VL-NEXT: vpaddb %xmm1, %xmm1, %xmm1		; AVX256VL-NEXT: vpaddb %xmm1, %xmm1, %xmm1
; AVX256VL-NEXT: vpblendvb %xmm1, %xmm2, %xmm0, %xmm0		; AVX256VL-NEXT: vpblendvb %xmm1, %xmm2, %xmm0, %xmm0
; AVX256VL-NEXT: retq		; AVX256VL-NEXT: retq
;		;
; AVX512VL-LABEL: var_shl_v16i8:		; AVX512VL-LABEL: var_shl_v16i8:
[... 18 lines not shown ...]
; AVX512BWNOVL-NEXT: retq		; AVX512BWNOVL-NEXT: retq
ret <16 x i8> %shift		ret <16 x i8> %shift
}		}

define <32 x i8> @var_lshr_v32i8(<32 x i8> %a, <32 x i8> %b) {		define <32 x i8> @var_lshr_v32i8(<32 x i8> %a, <32 x i8> %b) {
; AVX256-LABEL: var_lshr_v32i8:		; AVX256-LABEL: var_lshr_v32i8:
; AVX256: # %bb.0:		; AVX256: # %bb.0:
; AVX256-NEXT: vpsllw $5, %ymm1, %ymm1		; AVX256-NEXT: vpsllw $5, %ymm1, %ymm1
; AVX256-NEXT: vpsrlw $4, %ymm0, %ymm2		; AVX256-NEXT: vpsrlw $4, %ymm0, %ymm2
; AVX256-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2		; AVX256-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm2
; AVX256-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0		; AVX256-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0
; AVX256-NEXT: vpsrlw $2, %ymm0, %ymm2		; AVX256-NEXT: vpsrlw $2, %ymm0, %ymm2
; AVX256-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2		; AVX256-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm2
; AVX256-NEXT: vpaddb %ymm1, %ymm1, %ymm1		; AVX256-NEXT: vpaddb %ymm1, %ymm1, %ymm1
; AVX256-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0		; AVX256-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0
; AVX256-NEXT: vpsrlw $1, %ymm0, %ymm2		; AVX256-NEXT: vpsrlw $1, %ymm0, %ymm2
; AVX256-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2		; AVX256-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm2
; AVX256-NEXT: vpaddb %ymm1, %ymm1, %ymm1		; AVX256-NEXT: vpaddb %ymm1, %ymm1, %ymm1
; AVX256-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0		; AVX256-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0
; AVX256-NEXT: retq		; AVX256-NEXT: retq
;		;
; AVX512BW-LABEL: var_lshr_v32i8:		; AVX512BW-LABEL: var_lshr_v32i8:
; AVX512BW: # %bb.0:		; AVX512BW: # %bb.0:
; AVX512BW-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero,ymm1[16],zero,ymm1[17],zero,ymm1[18],zero,ymm1[19],zero,ymm1[20],zero,ymm1[21],zero,ymm1[22],zero,ymm1[23],zero,ymm1[24],zero,ymm1[25],zero,ymm1[26],zero,ymm1[27],zero,ymm1[28],zero,ymm1[29],zero,ymm1[30],zero,ymm1[31],zero		; AVX512BW-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero,ymm1[16],zero,ymm1[17],zero,ymm1[18],zero,ymm1[19],zero,ymm1[20],zero,ymm1[21],zero,ymm1[22],zero,ymm1[23],zero,ymm1[24],zero,ymm1[25],zero,ymm1[26],zero,ymm1[27],zero,ymm1[28],zero,ymm1[29],zero,ymm1[30],zero,ymm1[31],zero
; AVX512BW-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero		; AVX512BW-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero
; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0		; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0
; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0		; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0
; AVX512BW-NEXT: retq		; AVX512BW-NEXT: retq
;		;
; AVX512VL-LABEL: var_lshr_v32i8:		; AVX512VL-LABEL: var_lshr_v32i8:
; AVX512VL: # %bb.0:		; AVX512VL: # %bb.0:
; AVX512VL-NEXT: vpsllw $5, %ymm1, %ymm1		; AVX512VL-NEXT: vpsllw $5, %ymm1, %ymm1
; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm2		; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm2
; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2		; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm2
; AVX512VL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0		; AVX512VL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0
; AVX512VL-NEXT: vpsrlw $2, %ymm0, %ymm2		; AVX512VL-NEXT: vpsrlw $2, %ymm0, %ymm2
; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2		; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm2
; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1		; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1
; AVX512VL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0		; AVX512VL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0
; AVX512VL-NEXT: vpsrlw $1, %ymm0, %ymm2		; AVX512VL-NEXT: vpsrlw $1, %ymm0, %ymm2
; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2		; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm2
; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1		; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1
; AVX512VL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0		; AVX512VL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0
; AVX512VL-NEXT: retq		; AVX512VL-NEXT: retq
%shift = lshr <32 x i8> %a, %b		%shift = lshr <32 x i8> %a, %b
ret <32 x i8> %shift		ret <32 x i8> %shift
}		}

define <16 x i16> @var_lshr_v16i16(<16 x i16> %a, <16 x i16> %b) {		define <16 x i16> @var_lshr_v16i16(<16 x i16> %a, <16 x i16> %b) {
[... 58 lines not shown ...]
; AVX512BWVL-NEXT: vpmovwb %ymm0, %xmm0		; AVX512BWVL-NEXT: vpmovwb %ymm0, %xmm0
; AVX512BWVL-NEXT: vzeroupper		; AVX512BWVL-NEXT: vzeroupper
; AVX512BWVL-NEXT: retq		; AVX512BWVL-NEXT: retq
;		;
; AVX256VL-LABEL: var_lshr_v16i8:		; AVX256VL-LABEL: var_lshr_v16i8:
; AVX256VL: # %bb.0:		; AVX256VL: # %bb.0:
; AVX256VL-NEXT: vpsllw $5, %xmm1, %xmm1		; AVX256VL-NEXT: vpsllw $5, %xmm1, %xmm1
; AVX256VL-NEXT: vpsrlw $4, %xmm0, %xmm2		; AVX256VL-NEXT: vpsrlw $4, %xmm0, %xmm2
; AVX256VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm2		; AVX256VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm2, %xmm2
; AVX256VL-NEXT: vpblendvb %xmm1, %xmm2, %xmm0, %xmm0		; AVX256VL-NEXT: vpblendvb %xmm1, %xmm2, %xmm0, %xmm0
; AVX256VL-NEXT: vpsrlw $2, %xmm0, %xmm2		; AVX256VL-NEXT: vpsrlw $2, %xmm0, %xmm2
; AVX256VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm2		; AVX256VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm2, %xmm2
; AVX256VL-NEXT: vpaddb %xmm1, %xmm1, %xmm1		; AVX256VL-NEXT: vpaddb %xmm1, %xmm1, %xmm1
; AVX256VL-NEXT: vpblendvb %xmm1, %xmm2, %xmm0, %xmm0		; AVX256VL-NEXT: vpblendvb %xmm1, %xmm2, %xmm0, %xmm0
; AVX256VL-NEXT: vpsrlw $1, %xmm0, %xmm2		; AVX256VL-NEXT: vpsrlw $1, %xmm0, %xmm2
; AVX256VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm2		; AVX256VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm2, %xmm2
; AVX256VL-NEXT: vpaddb %xmm1, %xmm1, %xmm1		; AVX256VL-NEXT: vpaddb %xmm1, %xmm1, %xmm1
; AVX256VL-NEXT: vpblendvb %xmm1, %xmm2, %xmm0, %xmm0		; AVX256VL-NEXT: vpblendvb %xmm1, %xmm2, %xmm0, %xmm0
; AVX256VL-NEXT: retq		; AVX256VL-NEXT: retq
;		;
; AVX512VL-LABEL: var_lshr_v16i8:		; AVX512VL-LABEL: var_lshr_v16i8:
; AVX512VL: # %bb.0:		; AVX512VL: # %bb.0:
; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero,xmm1[4],zero,zero,zero,xmm1[5],zero,zero,zero,xmm1[6],zero,zero,zero,xmm1[7],zero,zero,zero,xmm1[8],zero,zero,zero,xmm1[9],zero,zero,zero,xmm1[10],zero,zero,zero,xmm1[11],zero,zero,zero,xmm1[12],zero,zero,zero,xmm1[13],zero,zero,zero,xmm1[14],zero,zero,zero,xmm1[15],zero,zero,zero		; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero,xmm1[4],zero,zero,zero,xmm1[5],zero,zero,zero,xmm1[6],zero,zero,zero,xmm1[7],zero,zero,zero,xmm1[8],zero,zero,zero,xmm1[9],zero,zero,zero,xmm1[10],zero,zero,zero,xmm1[11],zero,zero,zero,xmm1[12],zero,zero,zero,xmm1[13],zero,zero,zero,xmm1[14],zero,zero,zero,xmm1[15],zero,zero,zero
; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero,xmm0[8],zero,zero,zero,xmm0[9],zero,zero,zero,xmm0[10],zero,zero,zero,xmm0[11],zero,zero,zero,xmm0[12],zero,zero,zero,xmm0[13],zero,zero,zero,xmm0[14],zero,zero,zero,xmm0[15],zero,zero,zero		; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero,xmm0[8],zero,zero,zero,xmm0[9],zero,zero,zero,xmm0[10],zero,zero,zero,xmm0[11],zero,zero,zero,xmm0[12],zero,zero,zero,xmm0[13],zero,zero,zero,xmm0[14],zero,zero,zero,xmm0[15],zero,zero,zero
(198 lines not shown)

llvm/test/CodeGen/X86/prefer-avx256-trunc.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl,+prefer-256-bit \| FileCheck %s --check-prefix=AVX256NOBW			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl,+prefer-256-bit \| FileCheck %s --check-prefix=AVX256NOBW
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl,-prefer-256-bit \| FileCheck %s --check-prefix=AVX512NOBW			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl,-prefer-256-bit \| FileCheck %s --check-prefix=AVX512NOBW
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+prefer-256-bit \| FileCheck %s --check-prefix=AVX512NOBW			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+prefer-256-bit \| FileCheck %s --check-prefix=AVX512NOBW
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,-prefer-256-bit \| FileCheck %s --check-prefix=AVX512NOBW			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,-prefer-256-bit \| FileCheck %s --check-prefix=AVX512NOBW
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,+prefer-256-bit \| FileCheck %s --check-prefix=AVX512BW			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,+prefer-256-bit \| FileCheck %s --check-prefix=AVX512BW
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,-prefer-256-bit \| FileCheck %s --check-prefix=AVX512BW			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,-prefer-256-bit \| FileCheck %s --check-prefix=AVX512BW
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,+avx512vl,+prefer-256-bit \| FileCheck %s --check-prefix=AVX256BWVL			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,+avx512vl,+prefer-256-bit \| FileCheck %s --check-prefix=AVX256BWVL
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,+avx512vl,-prefer-256-bit \| FileCheck %s --check-prefix=AVX512BWVL			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,+avx512vl,-prefer-256-bit \| FileCheck %s --check-prefix=AVX512BWVL

	define <16 x i8> @testv16i16_trunc_v16i8(<16 x i16> %x) {			define <16 x i8> @testv16i16_trunc_v16i8(<16 x i16> %x) {
	; AVX256NOBW-LABEL: testv16i16_trunc_v16i8:			; AVX256NOBW-LABEL: testv16i16_trunc_v16i8:
	; AVX256NOBW: # %bb.0:			; AVX256NOBW: # %bb.0:
	; AVX256NOBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX256NOBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX256NOBW-NEXT: vextracti128 $1, %ymm0, %xmm1			; AVX256NOBW-NEXT: vextracti128 $1, %ymm0, %xmm1
	; AVX256NOBW-NEXT: vpackuswb %xmm1, %xmm0, %xmm0			; AVX256NOBW-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
	; AVX256NOBW-NEXT: vzeroupper			; AVX256NOBW-NEXT: vzeroupper
	; AVX256NOBW-NEXT: retq			; AVX256NOBW-NEXT: retq
	;			;
	; AVX512NOBW-LABEL: testv16i16_trunc_v16i8:			; AVX512NOBW-LABEL: testv16i16_trunc_v16i8:
	; AVX512NOBW: # %bb.0:			; AVX512NOBW: # %bb.0:
	; AVX512NOBW-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero			; AVX512NOBW-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero
	(26 lines not shown)

llvm/test/CodeGen/X86/prefer-avx256-wide-mul.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl,+avx512bw,+prefer-256-bit \| FileCheck %s --check-prefix=AVX256BW			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl,+avx512bw,+prefer-256-bit \| FileCheck %s --check-prefix=AVX256BW
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl,+avx512bw,-prefer-256-bit \| FileCheck %s --check-prefix=AVX512BW			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl,+avx512bw,-prefer-256-bit \| FileCheck %s --check-prefix=AVX512BWVL
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,+prefer-256-bit \| FileCheck %s --check-prefix=AVX512BW			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,+prefer-256-bit \| FileCheck %s --check-prefix=AVX512BW
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,-prefer-256-bit \| FileCheck %s --check-prefix=AVX512BW			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,-prefer-256-bit \| FileCheck %s --check-prefix=AVX512BW

	define <32 x i8> @test_div7_32i8(<32 x i8> %a) {			define <32 x i8> @test_div7_32i8(<32 x i8> %a) {
	; AVX256BW-LABEL: test_div7_32i8:			; AVX256BW-LABEL: test_div7_32i8:
	; AVX256BW: # %bb.0:			; AVX256BW: # %bb.0:
	; AVX256BW-NEXT: vpxor %xmm1, %xmm1, %xmm1			; AVX256BW-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; AVX256BW-NEXT: vpunpckhbw {{.*#+}} ymm2 = ymm0[8],ymm1[8],ymm0[9],ymm1[9],ymm0[10],ymm1[10],ymm0[11],ymm1[11],ymm0[12],ymm1[12],ymm0[13],ymm1[13],ymm0[14],ymm1[14],ymm0[15],ymm1[15],ymm0[24],ymm1[24],ymm0[25],ymm1[25],ymm0[26],ymm1[26],ymm0[27],ymm1[27],ymm0[28],ymm1[28],ymm0[29],ymm1[29],ymm0[30],ymm1[30],ymm0[31],ymm1[31]			; AVX256BW-NEXT: vpunpckhbw {{.*#+}} ymm2 = ymm0[8],ymm1[8],ymm0[9],ymm1[9],ymm0[10],ymm1[10],ymm0[11],ymm1[11],ymm0[12],ymm1[12],ymm0[13],ymm1[13],ymm0[14],ymm1[14],ymm0[15],ymm1[15],ymm0[24],ymm1[24],ymm0[25],ymm1[25],ymm0[26],ymm1[26],ymm0[27],ymm1[27],ymm0[28],ymm1[28],ymm0[29],ymm1[29],ymm0[30],ymm1[30],ymm0[31],ymm1[31]
	; AVX256BW-NEXT: vmovdqa {{.*#+}} ymm3 = [37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37]			; AVX256BW-NEXT: vmovdqa {{.*#+}} ymm3 = [37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37]
	; AVX256BW-NEXT: vpmullw %ymm3, %ymm2, %ymm2			; AVX256BW-NEXT: vpmullw %ymm3, %ymm2, %ymm2
	; AVX256BW-NEXT: vpsrlw $8, %ymm2, %ymm2			; AVX256BW-NEXT: vpsrlw $8, %ymm2, %ymm2
	; AVX256BW-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm0[0],ymm1[0],ymm0[1],ymm1[1],ymm0[2],ymm1[2],ymm0[3],ymm1[3],ymm0[4],ymm1[4],ymm0[5],ymm1[5],ymm0[6],ymm1[6],ymm0[7],ymm1[7],ymm0[16],ymm1[16],ymm0[17],ymm1[17],ymm0[18],ymm1[18],ymm0[19],ymm1[19],ymm0[20],ymm1[20],ymm0[21],ymm1[21],ymm0[22],ymm1[22],ymm0[23],ymm1[23]			; AVX256BW-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm0[0],ymm1[0],ymm0[1],ymm1[1],ymm0[2],ymm1[2],ymm0[3],ymm1[3],ymm0[4],ymm1[4],ymm0[5],ymm1[5],ymm0[6],ymm1[6],ymm0[7],ymm1[7],ymm0[16],ymm1[16],ymm0[17],ymm1[17],ymm0[18],ymm1[18],ymm0[19],ymm1[19],ymm0[20],ymm1[20],ymm0[21],ymm1[21],ymm0[22],ymm1[22],ymm0[23],ymm1[23]
	; AVX256BW-NEXT: vpmullw %ymm3, %ymm1, %ymm1			; AVX256BW-NEXT: vpmullw %ymm3, %ymm1, %ymm1
	; AVX256BW-NEXT: vpsrlw $8, %ymm1, %ymm1			; AVX256BW-NEXT: vpsrlw $8, %ymm1, %ymm1
	; AVX256BW-NEXT: vpackuswb %ymm2, %ymm1, %ymm1			; AVX256BW-NEXT: vpackuswb %ymm2, %ymm1, %ymm1
	; AVX256BW-NEXT: vpsubb %ymm1, %ymm0, %ymm0			; AVX256BW-NEXT: vpsubb %ymm1, %ymm0, %ymm0
	; AVX256BW-NEXT: vpsrlw $1, %ymm0, %ymm0			; AVX256BW-NEXT: vpsrlw $1, %ymm0, %ymm0
	; AVX256BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX256BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX256BW-NEXT: vpaddb %ymm1, %ymm0, %ymm0			; AVX256BW-NEXT: vpaddb %ymm1, %ymm0, %ymm0
	; AVX256BW-NEXT: vpsrlw $2, %ymm0, %ymm0			; AVX256BW-NEXT: vpsrlw $2, %ymm0, %ymm0
	; AVX256BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX256BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX256BW-NEXT: retq			; AVX256BW-NEXT: retq
	;			;
				; AVX512BWVL-LABEL: test_div7_32i8:
				; AVX512BWVL: # %bb.0:
				; AVX512BWVL-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero
				; AVX512BWVL-NEXT: vpmullw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1
				; AVX512BWVL-NEXT: vpsrlw $8, %zmm1, %zmm1
				; AVX512BWVL-NEXT: vpmovwb %zmm1, %ymm1
				; AVX512BWVL-NEXT: vpsubb %ymm1, %ymm0, %ymm0
				; AVX512BWVL-NEXT: vpsrlw $1, %ymm0, %ymm0
				; AVX512BWVL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
				; AVX512BWVL-NEXT: vpaddb %ymm1, %ymm0, %ymm0
				; AVX512BWVL-NEXT: vpsrlw $2, %ymm0, %ymm0
				; AVX512BWVL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
				; AVX512BWVL-NEXT: retq
				;
	; AVX512BW-LABEL: test_div7_32i8:			; AVX512BW-LABEL: test_div7_32i8:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero			; AVX512BW-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero
	; AVX512BW-NEXT: vpmullw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpmullw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1
	; AVX512BW-NEXT: vpsrlw $8, %zmm1, %zmm1			; AVX512BW-NEXT: vpsrlw $8, %zmm1, %zmm1
	; AVX512BW-NEXT: vpmovwb %zmm1, %ymm1			; AVX512BW-NEXT: vpmovwb %zmm1, %ymm1
	; AVX512BW-NEXT: vpsubb %ymm1, %ymm0, %ymm0			; AVX512BW-NEXT: vpsubb %ymm1, %ymm0, %ymm0
	; AVX512BW-NEXT: vpsrlw $1, %ymm0, %ymm0			; AVX512BW-NEXT: vpsrlw $1, %ymm0, %ymm0
	(16 lines not shown)
	; AVX256BW-NEXT: vpand %ymm3, %ymm2, %ymm2			; AVX256BW-NEXT: vpand %ymm3, %ymm2, %ymm2
	; AVX256BW-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]			; AVX256BW-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]
	; AVX256BW-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]			; AVX256BW-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]
	; AVX256BW-NEXT: vpmullw %ymm1, %ymm0, %ymm0			; AVX256BW-NEXT: vpmullw %ymm1, %ymm0, %ymm0
	; AVX256BW-NEXT: vpand %ymm3, %ymm0, %ymm0			; AVX256BW-NEXT: vpand %ymm3, %ymm0, %ymm0
	; AVX256BW-NEXT: vpackuswb %ymm2, %ymm0, %ymm0			; AVX256BW-NEXT: vpackuswb %ymm2, %ymm0, %ymm0
	; AVX256BW-NEXT: retq			; AVX256BW-NEXT: retq
	;			;
				; AVX512BWVL-LABEL: test_mul_32i8:
				; AVX512BWVL: # %bb.0:
				; AVX512BWVL-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero,ymm1[16],zero,ymm1[17],zero,ymm1[18],zero,ymm1[19],zero,ymm1[20],zero,ymm1[21],zero,ymm1[22],zero,ymm1[23],zero,ymm1[24],zero,ymm1[25],zero,ymm1[26],zero,ymm1[27],zero,ymm1[28],zero,ymm1[29],zero,ymm1[30],zero,ymm1[31],zero
				; AVX512BWVL-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero
				; AVX512BWVL-NEXT: vpmullw %zmm1, %zmm0, %zmm0
				; AVX512BWVL-NEXT: vpmovwb %zmm0, %ymm0
				; AVX512BWVL-NEXT: retq
				;
	; AVX512BW-LABEL: test_mul_32i8:			; AVX512BW-LABEL: test_mul_32i8:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero,ymm1[16],zero,ymm1[17],zero,ymm1[18],zero,ymm1[19],zero,ymm1[20],zero,ymm1[21],zero,ymm1[22],zero,ymm1[23],zero,ymm1[24],zero,ymm1[25],zero,ymm1[26],zero,ymm1[27],zero,ymm1[28],zero,ymm1[29],zero,ymm1[30],zero,ymm1[31],zero			; AVX512BW-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero,ymm1[16],zero,ymm1[17],zero,ymm1[18],zero,ymm1[19],zero,ymm1[20],zero,ymm1[21],zero,ymm1[22],zero,ymm1[23],zero,ymm1[24],zero,ymm1[25],zero,ymm1[26],zero,ymm1[27],zero,ymm1[28],zero,ymm1[29],zero,ymm1[30],zero,ymm1[31],zero
	; AVX512BW-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero			; AVX512BW-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero
	; AVX512BW-NEXT: vpmullw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpmullw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0			; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%res = mul <32 x i8> %a, %b			%res = mul <32 x i8> %a, %b
	ret <32 x i8> %res			ret <32 x i8> %res
	}			}

llvm/test/CodeGen/X86/psubus.ll

	(70 lines not shown)
	; SSE-NEXT: pxor %xmm1, %xmm1			; SSE-NEXT: pxor %xmm1, %xmm1
	; SSE-NEXT: pcmpgtb %xmm0, %xmm1			; SSE-NEXT: pcmpgtb %xmm0, %xmm1
	; SSE-NEXT: movdqa %xmm1, (%rdi)			; SSE-NEXT: movdqa %xmm1, (%rdi)
	; SSE-NEXT: pxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0			; SSE-NEXT: pxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; SSE-NEXT: movdqa %xmm0, (%rsi)			; SSE-NEXT: movdqa %xmm0, (%rsi)
	; SSE-NEXT: pand %xmm1, %xmm0			; SSE-NEXT: pand %xmm1, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: ashr_xor_and_commute_uses:			; AVX1-LABEL: ashr_xor_and_commute_uses:
	; AVX: # %bb.0:			; AVX1: # %bb.0:
	; AVX-NEXT: vpxor %xmm1, %xmm1, %xmm1			; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; AVX-NEXT: vpcmpgtb %xmm0, %xmm1, %xmm1			; AVX1-NEXT: vpcmpgtb %xmm0, %xmm1, %xmm1
	; AVX-NEXT: vmovdqa %xmm1, (%rdi)			; AVX1-NEXT: vmovdqa %xmm1, (%rdi)
	; AVX-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX1-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, (%rsi)			; AVX1-NEXT: vmovdqa %xmm0, (%rsi)
	; AVX-NEXT: vpand %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpand %xmm1, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: ashr_xor_and_commute_uses:
				; AVX2: # %bb.0:
				; AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1
				; AVX2-NEXT: vpcmpgtb %xmm0, %xmm1, %xmm1
				; AVX2-NEXT: vmovdqa %xmm1, (%rdi)
				; AVX2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX2-NEXT: vmovdqa %xmm0, (%rsi)
				; AVX2-NEXT: vpand %xmm1, %xmm0, %xmm0
				; AVX2-NEXT: retq
				;
				; AVX512-LABEL: ashr_xor_and_commute_uses:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vpxor %xmm1, %xmm1, %xmm1
				; AVX512-NEXT: vpcmpgtb %xmm0, %xmm1, %xmm1
				; AVX512-NEXT: vmovdqa %xmm1, (%rdi)
				; AVX512-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512-NEXT: vmovdqa %xmm0, (%rsi)
				; AVX512-NEXT: vpand %xmm1, %xmm0, %xmm0
				; AVX512-NEXT: retq
	%signsplat = ashr <16 x i8> %x, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>			%signsplat = ashr <16 x i8> %x, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
	store <16 x i8> %signsplat, ptr %p1			store <16 x i8> %signsplat, ptr %p1
	%flipsign = xor <16 x i8> %x, <i8 undef, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128>			%flipsign = xor <16 x i8> %x, <i8 undef, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128, i8 128>
	store <16 x i8> %flipsign, ptr %p2			store <16 x i8> %flipsign, ptr %p2
	%res = and <16 x i8> %flipsign, %signsplat			%res = and <16 x i8> %flipsign, %signsplat
	ret <16 x i8> %res			ret <16 x i8> %res
	}			}

	(3,012 lines not shown)

llvm/test/CodeGen/X86/rotate-extract-vector.ll

(99 lines not shown)
; X64-NEXT: retq		; X64-NEXT: retq
ret <2 x i64> %out		ret <2 x i64> %out
}		}

define <4 x i32> @vrolw_extract_mul_with_mask(<4 x i32> %i) nounwind {		define <4 x i32> @vrolw_extract_mul_with_mask(<4 x i32> %i) nounwind {
; X86-LABEL: vrolw_extract_mul_with_mask:		; X86-LABEL: vrolw_extract_mul_with_mask:
; X86: # %bb.0:		; X86: # %bb.0:
; X86-NEXT: vpmulld {{\.?LCPI[0-9]+_[0-9]+}}{1to4}, %xmm0, %xmm0		; X86-NEXT: vpmulld {{\.?LCPI[0-9]+_[0-9]+}}{1to4}, %xmm0, %xmm0
; X86-NEXT: vprold $7, %xmm0, %xmm0		; X86-NEXT: vprold $7, %xmm0, %xmm0
; X86-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0		; X86-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}{1to4}, %xmm0, %xmm0
; X86-NEXT: retl		; X86-NEXT: retl
;		;
; X64-LABEL: vrolw_extract_mul_with_mask:		; X64-LABEL: vrolw_extract_mul_with_mask:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: vpmulld {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0		; X64-NEXT: vpmulld {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
; X64-NEXT: vprold $7, %xmm0, %xmm0		; X64-NEXT: vprold $7, %xmm0, %xmm0
; X64-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; X64-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
; X64-NEXT: retq		; X64-NEXT: retq
%lhs_mul = mul <4 x i32> %i, <i32 1152, i32 1152, i32 1152, i32 1152>		%lhs_mul = mul <4 x i32> %i, <i32 1152, i32 1152, i32 1152, i32 1152>
%rhs_mul = mul <4 x i32> %i, <i32 9, i32 9, i32 9, i32 9>		%rhs_mul = mul <4 x i32> %i, <i32 9, i32 9, i32 9, i32 9>
%lhs_and = and <4 x i32> %lhs_mul, <i32 160, i32 160, i32 160, i32 160>		%lhs_and = and <4 x i32> %lhs_mul, <i32 160, i32 160, i32 160, i32 160>
%rhs_shift = lshr <4 x i32> %rhs_mul, <i32 25, i32 25, i32 25, i32 25>		%rhs_shift = lshr <4 x i32> %rhs_mul, <i32 25, i32 25, i32 25, i32 25>
%out = or <4 x i32> %lhs_and, %rhs_shift		%out = or <4 x i32> %lhs_and, %rhs_shift
ret <4 x i32> %out		ret <4 x i32> %out
}		}
(22 lines not shown)
}		}

; Result would undershift		; Result would undershift
define <4 x i64> @no_extract_shl(<4 x i64> %i) nounwind {		define <4 x i64> @no_extract_shl(<4 x i64> %i) nounwind {
; X86-LABEL: no_extract_shl:		; X86-LABEL: no_extract_shl:
; X86: # %bb.0:		; X86: # %bb.0:
; X86-NEXT: vpsllq $24, %ymm0, %ymm1		; X86-NEXT: vpsllq $24, %ymm0, %ymm1
; X86-NEXT: vpsrlq $39, %ymm0, %ymm0		; X86-NEXT: vpsrlq $39, %ymm0, %ymm0
; X86-NEXT: vpternlogq $236, {{\.?LCPI[0-9]+_[0-9]+}}, %ymm1, %ymm0		; X86-NEXT: vpternlogq $236, {{\.?LCPI[0-9]+_[0-9]+}}{1to4}, %ymm1, %ymm0
; X86-NEXT: retl		; X86-NEXT: retl
;		;
; X64-LABEL: no_extract_shl:		; X64-LABEL: no_extract_shl:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: vpsllq $24, %ymm0, %ymm1		; X64-NEXT: vpsllq $24, %ymm0, %ymm1
; X64-NEXT: vpsrlq $39, %ymm0, %ymm0		; X64-NEXT: vpsrlq $39, %ymm0, %ymm0
; X64-NEXT: vpternlogq $236, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm1, %ymm0		; X64-NEXT: vpternlogq $236, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm1, %ymm0
; X64-NEXT: retq		; X64-NEXT: retq
(165 lines not shown)

llvm/test/CodeGen/X86/rotate_vec.ll

	(39 lines not shown)
	; XOP: # %bb.0:			; XOP: # %bb.0:
	; XOP-NEXT: vprotd $31, %xmm0, %xmm0			; XOP-NEXT: vprotd $31, %xmm0, %xmm0
	; XOP-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; XOP-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; XOP-NEXT: retq			; XOP-NEXT: retq
	;			;
	; AVX512-LABEL: rot_v4i32_splat_2masks:			; AVX512-LABEL: rot_v4i32_splat_2masks:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vprold $31, %xmm0, %xmm0			; AVX512-NEXT: vprold $31, %xmm0, %xmm0
	; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to2}, %xmm0, %xmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%1 = lshr <4 x i32> %x, <i32 1, i32 1, i32 1, i32 1>			%1 = lshr <4 x i32> %x, <i32 1, i32 1, i32 1, i32 1>
	%2 = and <4 x i32> %1, <i32 4294901760, i32 4294901760, i32 4294901760, i32 4294901760>			%2 = and <4 x i32> %1, <i32 4294901760, i32 4294901760, i32 4294901760, i32 4294901760>

	%3 = shl <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>			%3 = shl <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
	%4 = and <4 x i32> %3, <i32 0, i32 4294901760, i32 0, i32 4294901760>			%4 = and <4 x i32> %3, <i32 0, i32 4294901760, i32 0, i32 4294901760>
	%5 = or <4 x i32> %2, %4			%5 = or <4 x i32> %2, %4
	ret <4 x i32> %5			ret <4 x i32> %5
	(61 lines not shown)
	; XOPAVX2: # %bb.0:			; XOPAVX2: # %bb.0:
	; XOPAVX2-NEXT: vpsravd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; XOPAVX2-NEXT: vpsravd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; XOPAVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; XOPAVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; XOPAVX2-NEXT: retq			; XOPAVX2-NEXT: retq
	;			;
	; AVX512-LABEL: rot_v4i32_mask_ashr0:			; AVX512-LABEL: rot_v4i32_mask_ashr0:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpsravd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512-NEXT: vpsravd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to2}, %xmm0, %xmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%1 = ashr <4 x i32> %a0, <i32 25, i32 26, i32 27, i32 28>			%1 = ashr <4 x i32> %a0, <i32 25, i32 26, i32 27, i32 28>
	%2 = call <4 x i32> @llvm.fshl.v4i32(<4 x i32> %1, <4 x i32> %1, <4 x i32> <i32 1, i32 1, i32 1, i32 1>)			%2 = call <4 x i32> @llvm.fshl.v4i32(<4 x i32> %1, <4 x i32> %1, <4 x i32> <i32 1, i32 1, i32 1, i32 1>)
	%3 = ashr <4 x i32> %2, <i32 1, i32 2, i32 3, i32 4>			%3 = ashr <4 x i32> %2, <i32 1, i32 2, i32 3, i32 4>
	%4 = and <4 x i32> %3, <i32 -32768, i32 -65536, i32 -32768, i32 -65536>			%4 = and <4 x i32> %3, <i32 -32768, i32 -65536, i32 -32768, i32 -65536>
	ret <4 x i32> %4			ret <4 x i32> %4
	}			}

	(11 lines not shown)
	; XOPAVX2-NEXT: vpbroadcastd %xmm0, %xmm0			; XOPAVX2-NEXT: vpbroadcastd %xmm0, %xmm0
	; XOPAVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; XOPAVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; XOPAVX2-NEXT: retq			; XOPAVX2-NEXT: retq
	;			;
	; AVX512-LABEL: rot_v4i32_mask_ashr1:			; AVX512-LABEL: rot_v4i32_mask_ashr1:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpsrad $25, %xmm0, %xmm0			; AVX512-NEXT: vpsrad $25, %xmm0, %xmm0
	; AVX512-NEXT: vpbroadcastd %xmm0, %xmm0			; AVX512-NEXT: vpbroadcastd %xmm0, %xmm0
	; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to2}, %xmm0, %xmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%1 = ashr <4 x i32> %a0, <i32 25, i32 26, i32 27, i32 28>			%1 = ashr <4 x i32> %a0, <i32 25, i32 26, i32 27, i32 28>
	%2 = call <4 x i32> @llvm.fshl.v4i32(<4 x i32> %1, <4 x i32> %1, <4 x i32> <i32 1, i32 2, i32 3, i32 4>)			%2 = call <4 x i32> @llvm.fshl.v4i32(<4 x i32> %1, <4 x i32> %1, <4 x i32> <i32 1, i32 2, i32 3, i32 4>)
	%3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> zeroinitializer			%3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> zeroinitializer
	%4 = ashr <4 x i32> %3, <i32 1, i32 2, i32 3, i32 4>			%4 = ashr <4 x i32> %3, <i32 1, i32 2, i32 3, i32 4>
	%5 = and <4 x i32> %4, <i32 -4096, i32 -8192, i32 -4096, i32 -8192>			%5 = and <4 x i32> %4, <i32 -4096, i32 -8192, i32 -4096, i32 -8192>
	ret <4 x i32> %5			ret <4 x i32> %5
	}			}
	(85 lines not shown)

llvm/test/CodeGen/X86/sadd_sat_vec.ll

	(561 lines not shown)
	; AVX512BW-NEXT: vpsllw $4, %xmm1, %xmm1			; AVX512BW-NEXT: vpsllw $4, %xmm1, %xmm1
	; AVX512BW-NEXT: vmovdqa {{.*#+}} xmm2 = [240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240]			; AVX512BW-NEXT: vmovdqa {{.*#+}} xmm2 = [240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240]
	; AVX512BW-NEXT: vpand %xmm2, %xmm1, %xmm1			; AVX512BW-NEXT: vpand %xmm2, %xmm1, %xmm1
	; AVX512BW-NEXT: vpsllw $4, %xmm0, %xmm0			; AVX512BW-NEXT: vpsllw $4, %xmm0, %xmm0
	; AVX512BW-NEXT: vpand %xmm2, %xmm0, %xmm0			; AVX512BW-NEXT: vpand %xmm2, %xmm0, %xmm0
	; AVX512BW-NEXT: vpaddsb %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpaddsb %xmm1, %xmm0, %xmm0
	; AVX512BW-NEXT: vpsrlw $4, %xmm0, %xmm0			; AVX512BW-NEXT: vpsrlw $4, %xmm0, %xmm0
	; AVX512BW-NEXT: vmovdqa {{.*#+}} xmm1 = [8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8]			; AVX512BW-NEXT: vmovdqa {{.*#+}} xmm1 = [8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8]
	; AVX512BW-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0			; AVX512BW-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm0
	; AVX512BW-NEXT: vpsubb %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpsubb %xmm1, %xmm0, %xmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%z = call <16 x i4> @llvm.sadd.sat.v16i4(<16 x i4> %x, <16 x i4> %y)			%z = call <16 x i4> @llvm.sadd.sat.v16i4(<16 x i4> %x, <16 x i4> %y)
	ret <16 x i4> %z			ret <16 x i4> %z
	}			}

	define <16 x i1> @v16i1(<16 x i1> %x, <16 x i1> %y) nounwind {			define <16 x i1> @v16i1(<16 x i1> %x, <16 x i1> %y) nounwind {
	; SSE-LABEL: v16i1:			; SSE-LABEL: v16i1:
	(1,272 lines not shown)

llvm/test/CodeGen/X86/srem-seteq-vec-nonsplat.ll

	(2,483 lines not shown)
	; CHECK-AVX512VL-NEXT: vpunpcklbw {{.*#+}} ymm2 = ymm2[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]			; CHECK-AVX512VL-NEXT: vpunpcklbw {{.*#+}} ymm2 = ymm2[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]
	; CHECK-AVX512VL-NEXT: vpmullw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2			; CHECK-AVX512VL-NEXT: vpmullw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2
	; CHECK-AVX512VL-NEXT: vpsrlw $8, %ymm2, %ymm2			; CHECK-AVX512VL-NEXT: vpsrlw $8, %ymm2, %ymm2
	; CHECK-AVX512VL-NEXT: vpackuswb %ymm3, %ymm2, %ymm2			; CHECK-AVX512VL-NEXT: vpackuswb %ymm3, %ymm2, %ymm2
	; CHECK-AVX512VL-NEXT: vpminub {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm3			; CHECK-AVX512VL-NEXT: vpminub {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm3
	; CHECK-AVX512VL-NEXT: vpcmpeqb %ymm3, %ymm2, %ymm2			; CHECK-AVX512VL-NEXT: vpcmpeqb %ymm3, %ymm2, %ymm2
	; CHECK-AVX512VL-NEXT: vmovdqa {{.*#+}} ymm3 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0,255,255,255,255,255,255,255,255,255,255]			; CHECK-AVX512VL-NEXT: vmovdqa {{.*#+}} ymm3 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0,255,255,255,255,255,255,255,255,255,255]
	; CHECK-AVX512VL-NEXT: vpandn %ymm3, %ymm2, %ymm2			; CHECK-AVX512VL-NEXT: vpandn %ymm3, %ymm2, %ymm2
	; CHECK-AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; CHECK-AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; CHECK-AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4			; CHECK-AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; CHECK-AVX512VL-NEXT: vpcmpgtb %ymm4, %ymm0, %ymm0			; CHECK-AVX512VL-NEXT: vpcmpgtb %ymm4, %ymm0, %ymm0
	; CHECK-AVX512VL-NEXT: vpandn %ymm0, %ymm3, %ymm3			; CHECK-AVX512VL-NEXT: vpandn %ymm0, %ymm3, %ymm3
	; CHECK-AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm1, %ymm0			; CHECK-AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm1, %ymm0
	; CHECK-AVX512VL-NEXT: vpternlogq $14, %ymm3, %ymm2, %ymm0			; CHECK-AVX512VL-NEXT: vpternlogq $14, %ymm3, %ymm2, %ymm0
	; CHECK-AVX512VL-NEXT: retq			; CHECK-AVX512VL-NEXT: retq
	%rem = srem <32 x i8> %x, <i8 13, i8 5, i8 19, i8 34, i8 2, i8 8, i8 2, i8 88, i8 62, i8 62, i8 5, i8 7, i8 97, i8 2, i8 3, i8 60, i8 3, i8 87, i8 7, i8 6, i8 84, i8 -128, i8 127, i8 56, i8 114, i8 1, i8 50, i8 7, i8 2, i8 8, i8 97, i8 117>			%rem = srem <32 x i8> %x, <i8 13, i8 5, i8 19, i8 34, i8 2, i8 8, i8 2, i8 88, i8 62, i8 62, i8 5, i8 7, i8 97, i8 2, i8 3, i8 60, i8 3, i8 87, i8 7, i8 6, i8 84, i8 -128, i8 127, i8 56, i8 114, i8 1, i8 50, i8 7, i8 2, i8 8, i8 97, i8 117>
	%cmp = icmp ne <32 x i8> %rem, zeroinitializer			%cmp = icmp ne <32 x i8> %rem, zeroinitializer
	%cmp4 = icmp ne <32 x i8> %y, zeroinitializer			%cmp4 = icmp ne <32 x i8> %y, zeroinitializer
	%cmpres = and <32 x i1> %cmp4, %cmp			%cmpres = and <32 x i1> %cmp4, %cmp
	ret <32 x i1> %cmpres			ret <32 x i1> %cmpres
	}			}

llvm/test/CodeGen/X86/ssub_sat_vec.ll

	(561 lines not shown)
	; AVX512BW-NEXT: vpsllw $4, %xmm1, %xmm1			; AVX512BW-NEXT: vpsllw $4, %xmm1, %xmm1
	; AVX512BW-NEXT: vmovdqa {{.*#+}} xmm2 = [240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240]			; AVX512BW-NEXT: vmovdqa {{.*#+}} xmm2 = [240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240]
	; AVX512BW-NEXT: vpand %xmm2, %xmm1, %xmm1			; AVX512BW-NEXT: vpand %xmm2, %xmm1, %xmm1
	; AVX512BW-NEXT: vpsllw $4, %xmm0, %xmm0			; AVX512BW-NEXT: vpsllw $4, %xmm0, %xmm0
	; AVX512BW-NEXT: vpand %xmm2, %xmm0, %xmm0			; AVX512BW-NEXT: vpand %xmm2, %xmm0, %xmm0
	; AVX512BW-NEXT: vpsubsb %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpsubsb %xmm1, %xmm0, %xmm0
	; AVX512BW-NEXT: vpsrlw $4, %xmm0, %xmm0			; AVX512BW-NEXT: vpsrlw $4, %xmm0, %xmm0
	; AVX512BW-NEXT: vmovdqa {{.*#+}} xmm1 = [8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8]			; AVX512BW-NEXT: vmovdqa {{.*#+}} xmm1 = [8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8]
	; AVX512BW-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0			; AVX512BW-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm0
	; AVX512BW-NEXT: vpsubb %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpsubb %xmm1, %xmm0, %xmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%z = call <16 x i4> @llvm.ssub.sat.v16i4(<16 x i4> %x, <16 x i4> %y)			%z = call <16 x i4> @llvm.ssub.sat.v16i4(<16 x i4> %x, <16 x i4> %y)
	ret <16 x i4> %z			ret <16 x i4> %z
	}			}

	define <16 x i1> @v16i1(<16 x i1> %x, <16 x i1> %y) nounwind {			define <16 x i1> @v16i1(<16 x i1> %x, <16 x i1> %y) nounwind {
	; SSE-LABEL: v16i1:			; SSE-LABEL: v16i1:
	; AVX512F-LABEL: v16i1:			; AVX512F-LABEL: v16i1:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vxorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512F-NEXT: vxorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
	; AVX512F-NEXT: vandps %xmm1, %xmm0, %xmm0			; AVX512F-NEXT: vandps %xmm1, %xmm0, %xmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BW-LABEL: v16i1:			; AVX512BW-LABEL: v16i1:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpternlogq $96, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0			; AVX512BW-NEXT: vpternlogd $96, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%z = call <16 x i1> @llvm.ssub.sat.v16i1(<16 x i1> %x, <16 x i1> %y)			%z = call <16 x i1> @llvm.ssub.sat.v16i1(<16 x i1> %x, <16 x i1> %y)
	ret <16 x i1> %z			ret <16 x i1> %z
	}			}

	; Expanded			; Expanded

	define <2 x i32> @v2i32(<2 x i32> %x, <2 x i32> %y) nounwind {			define <2 x i32> @v2i32(<2 x i32> %x, <2 x i32> %y) nounwind {

llvm/test/CodeGen/X86/usub_sat_vec.ll

	; AVX512F-LABEL: v16i1:			; AVX512F-LABEL: v16i1:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vxorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512F-NEXT: vxorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
	; AVX512F-NEXT: vandps %xmm1, %xmm0, %xmm0			; AVX512F-NEXT: vandps %xmm1, %xmm0, %xmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BW-LABEL: v16i1:			; AVX512BW-LABEL: v16i1:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpternlogq $96, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0			; AVX512BW-NEXT: vpternlogd $96, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%z = call <16 x i1> @llvm.usub.sat.v16i1(<16 x i1> %x, <16 x i1> %y)			%z = call <16 x i1> @llvm.usub.sat.v16i1(<16 x i1> %x, <16 x i1> %y)
	ret <16 x i1> %z			ret <16 x i1> %z
	}			}

	; Expanded			; Expanded

	define <2 x i32> @v2i32(<2 x i32> %x, <2 x i32> %y) nounwind {			define <2 x i32> @v2i32(<2 x i32> %x, <2 x i32> %y) nounwind {

llvm/test/CodeGen/X86/vec-strict-inttofp-128-fp16.ll

%result = call <8 x half> @llvm.experimental.constrained.sitofp.v8f16.v8i1(<8 x i1> %x,		%result = call <8 x half> @llvm.experimental.constrained.sitofp.v8f16.v8i1(<8 x i1> %x,
metadata !"round.dynamic",		metadata !"round.dynamic",
metadata !"fpexcept.strict") #0		metadata !"fpexcept.strict") #0
ret <8 x half> %result		ret <8 x half> %result
}		}

define <8 x half> @uitofp_v8i1_v8f16(<8 x i1> %x) #0 {		define <8 x half> @uitofp_v8i1_v8f16(<8 x i1> %x) #0 {
; X86-LABEL: uitofp_v8i1_v8f16:		; X86-LABEL: uitofp_v8i1_v8f16:
; X86: # %bb.0:		; X86: # %bb.0:
; X86-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0		; X86-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}{1to4}, %xmm0, %xmm0
; X86-NEXT: vcvtuw2ph %xmm0, %xmm0		; X86-NEXT: vcvtuw2ph %xmm0, %xmm0
; X86-NEXT: retl		; X86-NEXT: retl
;		;
; X64-LABEL: uitofp_v8i1_v8f16:		; X64-LABEL: uitofp_v8i1_v8f16:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; X64-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
; X64-NEXT: vcvtuw2ph %xmm0, %xmm0		; X64-NEXT: vcvtuw2ph %xmm0, %xmm0
; X64-NEXT: retq		; X64-NEXT: retq
%result = call <8 x half> @llvm.experimental.constrained.uitofp.v8f16.v8i1(<8 x i1> %x,		%result = call <8 x half> @llvm.experimental.constrained.uitofp.v8f16.v8i1(<8 x i1> %x,
metadata !"round.dynamic",		metadata !"round.dynamic",
metadata !"fpexcept.strict") #0		metadata !"fpexcept.strict") #0
ret <8 x half> %result		ret <8 x half> %result
}		}


llvm/test/CodeGen/X86/vec-strict-inttofp-256-fp16.ll

%result = call <16 x half> @llvm.experimental.constrained.sitofp.v16f16.v16i1(<16 x i1> %x,		%result = call <16 x half> @llvm.experimental.constrained.sitofp.v16f16.v16i1(<16 x i1> %x,
metadata !"round.dynamic",		metadata !"round.dynamic",
metadata !"fpexcept.strict") #0		metadata !"fpexcept.strict") #0
ret <16 x half> %result		ret <16 x half> %result
}		}

define <16 x half> @uitofp_v16i1_v16f16(<16 x i1> %x) #0 {		define <16 x half> @uitofp_v16i1_v16f16(<16 x i1> %x) #0 {
; X86-LABEL: uitofp_v16i1_v16f16:		; X86-LABEL: uitofp_v16i1_v16f16:
; X86: # %bb.0:		; X86: # %bb.0:
; X86-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0		; X86-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}{1to4}, %xmm0, %xmm0
; X86-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero		; X86-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero
; X86-NEXT: vcvtuw2ph %ymm0, %ymm0		; X86-NEXT: vcvtuw2ph %ymm0, %ymm0
; X86-NEXT: retl		; X86-NEXT: retl
;		;
; X64-LABEL: uitofp_v16i1_v16f16:		; X64-LABEL: uitofp_v16i1_v16f16:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; X64-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
; X64-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero		; X64-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero
; X64-NEXT: vcvtuw2ph %ymm0, %ymm0		; X64-NEXT: vcvtuw2ph %ymm0, %ymm0
; X64-NEXT: retq		; X64-NEXT: retq
%result = call <16 x half> @llvm.experimental.constrained.uitofp.v16f16.v16i1(<16 x i1> %x,		%result = call <16 x half> @llvm.experimental.constrained.uitofp.v16f16.v16i1(<16 x i1> %x,
metadata !"round.dynamic",		metadata !"round.dynamic",
metadata !"fpexcept.strict") #0		metadata !"fpexcept.strict") #0
ret <16 x half> %result		ret <16 x half> %result
}		}

llvm/test/CodeGen/X86/vec-strict-inttofp-256.ll

	; AVX512F-64: # %bb.0:			; AVX512F-64: # %bb.0:
	; AVX512F-64-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512F-64-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512F-64-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX512F-64-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX512F-64-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX512F-64-NEXT: vcvtdq2ps %ymm0, %ymm0
	; AVX512F-64-NEXT: retq			; AVX512F-64-NEXT: retq
	;			;
	; AVX512VL-32-LABEL: uitofp_v8i1_v8f32:			; AVX512VL-32-LABEL: uitofp_v8i1_v8f32:
	; AVX512VL-32: # %bb.0:			; AVX512VL-32: # %bb.0:
	; AVX512VL-32-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0			; AVX512VL-32-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}{1to4}, %xmm0, %xmm0
	; AVX512VL-32-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX512VL-32-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX512VL-32-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX512VL-32-NEXT: vcvtdq2ps %ymm0, %ymm0
	; AVX512VL-32-NEXT: retl			; AVX512VL-32-NEXT: retl
	;			;
	; AVX512VL-64-LABEL: uitofp_v8i1_v8f32:			; AVX512VL-64-LABEL: uitofp_v8i1_v8f32:
	; AVX512VL-64: # %bb.0:			; AVX512VL-64: # %bb.0:
	; AVX512VL-64-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512VL-64-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; AVX512VL-64-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX512VL-64-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX512VL-64-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX512VL-64-NEXT: vcvtdq2ps %ymm0, %ymm0
	; AVX512VL-64-NEXT: retq			; AVX512VL-64-NEXT: retq
	;			;
	; AVX512DQ-32-LABEL: uitofp_v8i1_v8f32:			; AVX512DQ-32-LABEL: uitofp_v8i1_v8f32:
	; AVX512DQ-32: # %bb.0:			; AVX512DQ-32: # %bb.0:
	; AVX512DQ-32-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0			; AVX512DQ-32-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
	; AVX512DQ-32-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX512DQ-32-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX512DQ-32-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX512DQ-32-NEXT: vcvtdq2ps %ymm0, %ymm0
	; AVX512DQ-32-NEXT: retl			; AVX512DQ-32-NEXT: retl
	;			;
	; AVX512DQ-64-LABEL: uitofp_v8i1_v8f32:			; AVX512DQ-64-LABEL: uitofp_v8i1_v8f32:
	; AVX512DQ-64: # %bb.0:			; AVX512DQ-64: # %bb.0:
	; AVX512DQ-64-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512DQ-64-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512DQ-64-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX512DQ-64-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX512DQ-64-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX512DQ-64-NEXT: vcvtdq2ps %ymm0, %ymm0
	; AVX512DQ-64-NEXT: retq			; AVX512DQ-64-NEXT: retq
	;			;
	; AVX512DQVL-32-LABEL: uitofp_v8i1_v8f32:			; AVX512DQVL-32-LABEL: uitofp_v8i1_v8f32:
	; AVX512DQVL-32: # %bb.0:			; AVX512DQVL-32: # %bb.0:
	; AVX512DQVL-32-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0			; AVX512DQVL-32-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}{1to4}, %xmm0, %xmm0
	; AVX512DQVL-32-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX512DQVL-32-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX512DQVL-32-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX512DQVL-32-NEXT: vcvtdq2ps %ymm0, %ymm0
	; AVX512DQVL-32-NEXT: retl			; AVX512DQVL-32-NEXT: retl
	;			;
	; AVX512DQVL-64-LABEL: uitofp_v8i1_v8f32:			; AVX512DQVL-64-LABEL: uitofp_v8i1_v8f32:
	; AVX512DQVL-64: # %bb.0:			; AVX512DQVL-64: # %bb.0:
	; AVX512DQVL-64-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512DQVL-64-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; AVX512DQVL-64-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX512DQVL-64-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX512DQVL-64-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX512DQVL-64-NEXT: vcvtdq2ps %ymm0, %ymm0
	; AVX512DQVL-64-NEXT: retq			; AVX512DQVL-64-NEXT: retq
	%result = call <8 x float> @llvm.experimental.constrained.uitofp.v8f32.v8i1(<8 x i1> %x,			%result = call <8 x float> @llvm.experimental.constrained.uitofp.v8f32.v8i1(<8 x i1> %x,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict") #0			metadata !"fpexcept.strict") #0
	ret <8 x float> %result			ret <8 x float> %result
	}			}

llvm/test/CodeGen/X86/vec-strict-inttofp-512-fp16.ll

%result = call <32 x half> @llvm.experimental.constrained.sitofp.v32f16.v32i1(<32 x i1> %x,		%result = call <32 x half> @llvm.experimental.constrained.sitofp.v32f16.v32i1(<32 x i1> %x,
metadata !"round.dynamic",		metadata !"round.dynamic",
metadata !"fpexcept.strict") #0		metadata !"fpexcept.strict") #0
ret <32 x half> %result		ret <32 x half> %result
}		}

define <32 x half> @uitofp_v32i1_v32f16(<32 x i1> %x) #0 {		define <32 x half> @uitofp_v32i1_v32f16(<32 x i1> %x) #0 {
; X86-LABEL: uitofp_v32i1_v32f16:		; X86-LABEL: uitofp_v32i1_v32f16:
; X86: # %bb.0:		; X86: # %bb.0:
; X86-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}, %ymm0, %ymm0		; X86-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}{1to8}, %ymm0, %ymm0
; X86-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero		; X86-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero
; X86-NEXT: vcvtuw2ph %zmm0, %zmm0		; X86-NEXT: vcvtuw2ph %zmm0, %zmm0
; X86-NEXT: retl		; X86-NEXT: retl
;		;
; X64-LABEL: uitofp_v32i1_v32f16:		; X64-LABEL: uitofp_v32i1_v32f16:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0		; X64-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
; X64-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero		; X64-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero
; X64-NEXT: vcvtuw2ph %zmm0, %zmm0		; X64-NEXT: vcvtuw2ph %zmm0, %zmm0
; X64-NEXT: retq		; X64-NEXT: retq
%result = call <32 x half> @llvm.experimental.constrained.uitofp.v32f16.v32i1(<32 x i1> %x,		%result = call <32 x half> @llvm.experimental.constrained.uitofp.v32f16.v32i1(<32 x i1> %x,
metadata !"round.dynamic",		metadata !"round.dynamic",
metadata !"fpexcept.strict") #0		metadata !"fpexcept.strict") #0
ret <32 x half> %result		ret <32 x half> %result
}		}

llvm/test/CodeGen/X86/vector-fshl-128.ll

	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: var_funnnel_v8i16:			; AVX512VL-LABEL: var_funnnel_v8i16:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} ymm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} ymm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX512VL-NEXT: vpslld $16, %ymm0, %ymm0			; AVX512VL-NEXT: vpslld $16, %ymm0, %ymm0
	; AVX512VL-NEXT: vpblendw {{.*#+}} ymm0 = ymm1[0],ymm0[1],ymm1[2],ymm0[3],ymm1[4],ymm0[5],ymm1[6],ymm0[7],ymm1[8],ymm0[9],ymm1[10],ymm0[11],ymm1[12],ymm0[13],ymm1[14],ymm0[15]			; AVX512VL-NEXT: vpblendw {{.*#+}} ymm0 = ymm1[0],ymm0[1],ymm1[2],ymm0[3],ymm1[4],ymm0[5],ymm1[6],ymm0[7],ymm1[8],ymm0[9],ymm1[10],ymm0[11],ymm1[12],ymm0[13],ymm1[14],ymm0[15]
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm1			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm2, %xmm1
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} ymm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} ymm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero
	; AVX512VL-NEXT: vpsllvd %ymm1, %ymm0, %ymm0			; AVX512VL-NEXT: vpsllvd %ymm1, %ymm0, %ymm0
	; AVX512VL-NEXT: vpsrld $16, %ymm0, %ymm0			; AVX512VL-NEXT: vpsrld $16, %ymm0, %ymm0
	; AVX512VL-NEXT: vpmovdw %ymm0, %xmm0			; AVX512VL-NEXT: vpmovdw %ymm0, %xmm0
	; AVX512VL-NEXT: vzeroupper			; AVX512VL-NEXT: vzeroupper
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_funnnel_v8i16:			; AVX512BW-LABEL: var_funnnel_v8i16:
	; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm3 = [7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7]			; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm3 = [7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7]
	; AVX512VL-NEXT: vpand %xmm3, %xmm2, %xmm4			; AVX512VL-NEXT: vpand %xmm3, %xmm2, %xmm4
	; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm4 = xmm4[0],zero,zero,zero,xmm4[1],zero,zero,zero,xmm4[2],zero,zero,zero,xmm4[3],zero,zero,zero,xmm4[4],zero,zero,zero,xmm4[5],zero,zero,zero,xmm4[6],zero,zero,zero,xmm4[7],zero,zero,zero,xmm4[8],zero,zero,zero,xmm4[9],zero,zero,zero,xmm4[10],zero,zero,zero,xmm4[11],zero,zero,zero,xmm4[12],zero,zero,zero,xmm4[13],zero,zero,zero,xmm4[14],zero,zero,zero,xmm4[15],zero,zero,zero			; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm4 = xmm4[0],zero,zero,zero,xmm4[1],zero,zero,zero,xmm4[2],zero,zero,zero,xmm4[3],zero,zero,zero,xmm4[4],zero,zero,zero,xmm4[5],zero,zero,zero,xmm4[6],zero,zero,zero,xmm4[7],zero,zero,zero,xmm4[8],zero,zero,zero,xmm4[9],zero,zero,zero,xmm4[10],zero,zero,zero,xmm4[11],zero,zero,zero,xmm4[12],zero,zero,zero,xmm4[13],zero,zero,zero,xmm4[14],zero,zero,zero,xmm4[15],zero,zero,zero
	; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero,xmm0[8],zero,zero,zero,xmm0[9],zero,zero,zero,xmm0[10],zero,zero,zero,xmm0[11],zero,zero,zero,xmm0[12],zero,zero,zero,xmm0[13],zero,zero,zero,xmm0[14],zero,zero,zero,xmm0[15],zero,zero,zero			; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero,xmm0[8],zero,zero,zero,xmm0[9],zero,zero,zero,xmm0[10],zero,zero,zero,xmm0[11],zero,zero,zero,xmm0[12],zero,zero,zero,xmm0[13],zero,zero,zero,xmm0[14],zero,zero,zero,xmm0[15],zero,zero,zero
	; AVX512VL-NEXT: vpsllvd %zmm4, %zmm0, %zmm0			; AVX512VL-NEXT: vpsllvd %zmm4, %zmm0, %zmm0
	; AVX512VL-NEXT: vpandn %xmm3, %xmm2, %xmm2			; AVX512VL-NEXT: vpandn %xmm3, %xmm2, %xmm2
	; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm2 = xmm2[0],zero,zero,zero,xmm2[1],zero,zero,zero,xmm2[2],zero,zero,zero,xmm2[3],zero,zero,zero,xmm2[4],zero,zero,zero,xmm2[5],zero,zero,zero,xmm2[6],zero,zero,zero,xmm2[7],zero,zero,zero,xmm2[8],zero,zero,zero,xmm2[9],zero,zero,zero,xmm2[10],zero,zero,zero,xmm2[11],zero,zero,zero,xmm2[12],zero,zero,zero,xmm2[13],zero,zero,zero,xmm2[14],zero,zero,zero,xmm2[15],zero,zero,zero			; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm2 = xmm2[0],zero,zero,zero,xmm2[1],zero,zero,zero,xmm2[2],zero,zero,zero,xmm2[3],zero,zero,zero,xmm2[4],zero,zero,zero,xmm2[5],zero,zero,zero,xmm2[6],zero,zero,zero,xmm2[7],zero,zero,zero,xmm2[8],zero,zero,zero,xmm2[9],zero,zero,zero,xmm2[10],zero,zero,zero,xmm2[11],zero,zero,zero,xmm2[12],zero,zero,zero,xmm2[13],zero,zero,zero,xmm2[14],zero,zero,zero,xmm2[15],zero,zero,zero
	; AVX512VL-NEXT: vpsrlw $1, %xmm1, %xmm1			; AVX512VL-NEXT: vpsrlw $1, %xmm1, %xmm1
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero,xmm1[4],zero,zero,zero,xmm1[5],zero,zero,zero,xmm1[6],zero,zero,zero,xmm1[7],zero,zero,zero,xmm1[8],zero,zero,zero,xmm1[9],zero,zero,zero,xmm1[10],zero,zero,zero,xmm1[11],zero,zero,zero,xmm1[12],zero,zero,zero,xmm1[13],zero,zero,zero,xmm1[14],zero,zero,zero,xmm1[15],zero,zero,zero			; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero,xmm1[4],zero,zero,zero,xmm1[5],zero,zero,zero,xmm1[6],zero,zero,zero,xmm1[7],zero,zero,zero,xmm1[8],zero,zero,zero,xmm1[9],zero,zero,zero,xmm1[10],zero,zero,zero,xmm1[11],zero,zero,zero,xmm1[12],zero,zero,zero,xmm1[13],zero,zero,zero,xmm1[14],zero,zero,zero,xmm1[15],zero,zero,zero
	; AVX512VL-NEXT: vpsrlvd %zmm2, %zmm1, %zmm1			; AVX512VL-NEXT: vpsrlvd %zmm2, %zmm1, %zmm1
	; AVX512VL-NEXT: vpord %zmm1, %zmm0, %zmm0			; AVX512VL-NEXT: vpord %zmm1, %zmm0, %zmm0
	; AVX512VL-NEXT: vpmovdb %zmm0, %xmm0			; AVX512VL-NEXT: vpmovdb %zmm0, %xmm0
	; AVX512VL-NEXT: vzeroupper			; AVX512VL-NEXT: vzeroupper
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_funnnel_v16i8:			; AVX512BW-LABEL: var_funnnel_v16i8:
	; AVX512VBMI2-NEXT: retq			; AVX512VBMI2-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_funnnel_v16i8:			; AVX512VLBW-LABEL: var_funnnel_v16i8:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} ymm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero,xmm1[8],zero,xmm1[9],zero,xmm1[10],zero,xmm1[11],zero,xmm1[12],zero,xmm1[13],zero,xmm1[14],zero,xmm1[15],zero			; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} ymm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero,xmm1[8],zero,xmm1[9],zero,xmm1[10],zero,xmm1[11],zero,xmm1[12],zero,xmm1[13],zero,xmm1[14],zero,xmm1[15],zero
	; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero			; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero
	; AVX512VLBW-NEXT: vpsllw $8, %ymm0, %ymm0			; AVX512VLBW-NEXT: vpsllw $8, %ymm0, %ymm0
	; AVX512VLBW-NEXT: vpor %ymm1, %ymm0, %ymm0			; AVX512VLBW-NEXT: vpor %ymm1, %ymm0, %ymm0
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm2, %xmm1
	; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} ymm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero,xmm1[8],zero,xmm1[9],zero,xmm1[10],zero,xmm1[11],zero,xmm1[12],zero,xmm1[13],zero,xmm1[14],zero,xmm1[15],zero			; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} ymm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero,xmm1[8],zero,xmm1[9],zero,xmm1[10],zero,xmm1[11],zero,xmm1[12],zero,xmm1[13],zero,xmm1[14],zero,xmm1[15],zero
	; AVX512VLBW-NEXT: vpsllvw %ymm1, %ymm0, %ymm0			; AVX512VLBW-NEXT: vpsllvw %ymm1, %ymm0, %ymm0
	; AVX512VLBW-NEXT: vpsrlw $8, %ymm0, %ymm0			; AVX512VLBW-NEXT: vpsrlw $8, %ymm0, %ymm0
	; AVX512VLBW-NEXT: vpmovwb %ymm0, %xmm0			; AVX512VLBW-NEXT: vpmovwb %ymm0, %xmm0
	; AVX512VLBW-NEXT: vzeroupper			; AVX512VLBW-NEXT: vzeroupper
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VLVBMI2-LABEL: var_funnnel_v16i8:			; AVX512VLVBMI2-LABEL: var_funnnel_v16i8:
	; AVX512VLVBMI2: # %bb.0:			; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: # kill: def $xmm1 killed $xmm1 def $ymm1			; AVX512VLVBMI2-NEXT: # kill: def $xmm1 killed $xmm1 def $ymm1
	; AVX512VLVBMI2-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0			; AVX512VLVBMI2-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
	; AVX512VLVBMI2-NEXT: vmovdqa {{.*#+}} ymm3 = [0,32,1,33,2,34,3,35,4,36,5,37,6,38,7,39,8,40,9,41,10,42,11,43,12,44,13,45,14,46,15,47]			; AVX512VLVBMI2-NEXT: vmovdqa {{.*#+}} ymm3 = [0,32,1,33,2,34,3,35,4,36,5,37,6,38,7,39,8,40,9,41,10,42,11,43,12,44,13,45,14,46,15,47]
	; AVX512VLVBMI2-NEXT: vpermi2b %ymm0, %ymm1, %ymm3			; AVX512VLVBMI2-NEXT: vpermi2b %ymm0, %ymm1, %ymm3
	; AVX512VLVBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm0			; AVX512VLVBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm2, %xmm0
	; AVX512VLVBMI2-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero			; AVX512VLVBMI2-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero
	; AVX512VLVBMI2-NEXT: vpsllvw %ymm0, %ymm3, %ymm0			; AVX512VLVBMI2-NEXT: vpsllvw %ymm0, %ymm3, %ymm0
	; AVX512VLVBMI2-NEXT: vpsrlw $8, %ymm0, %ymm0			; AVX512VLVBMI2-NEXT: vpsrlw $8, %ymm0, %ymm0
	; AVX512VLVBMI2-NEXT: vpmovwb %ymm0, %xmm0			; AVX512VLVBMI2-NEXT: vpmovwb %ymm0, %xmm0
	; AVX512VLVBMI2-NEXT: vzeroupper			; AVX512VLVBMI2-NEXT: vzeroupper
	; AVX512VLVBMI2-NEXT: retq			; AVX512VLVBMI2-NEXT: retq
	;			;
	; XOP-LABEL: var_funnnel_v16i8:			; XOP-LABEL: var_funnnel_v16i8:

llvm/test/CodeGen/X86/vector-fshl-256.ll

	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: var_funnnel_v16i16:			; AVX512VL-LABEL: var_funnnel_v16i16:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero
	; AVX512VL-NEXT: vpslld $16, %zmm0, %zmm0			; AVX512VL-NEXT: vpslld $16, %zmm0, %zmm0
	; AVX512VL-NEXT: vpord %zmm1, %zmm0, %zmm0			; AVX512VL-NEXT: vpord %zmm1, %zmm0, %zmm0
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm1			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm1
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512VL-NEXT: vpsllvd %zmm1, %zmm0, %zmm0			; AVX512VL-NEXT: vpsllvd %zmm1, %zmm0, %zmm0
	; AVX512VL-NEXT: vpsrld $16, %zmm0, %zmm0			; AVX512VL-NEXT: vpsrld $16, %zmm0, %zmm0
	; AVX512VL-NEXT: vpmovdw %zmm0, %ymm0			; AVX512VL-NEXT: vpmovdw %zmm0, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_funnnel_v16i16:			; AVX512BW-LABEL: var_funnnel_v16i16:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm3 = [7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7]			; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm3 = [7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7]
	; AVX512VL-NEXT: vpandn %ymm3, %ymm2, %ymm4			; AVX512VL-NEXT: vpandn %ymm3, %ymm2, %ymm4
	; AVX512VL-NEXT: vpsllw $5, %ymm4, %ymm4			; AVX512VL-NEXT: vpsllw $5, %ymm4, %ymm4
	; AVX512VL-NEXT: vpsrlw $1, %ymm1, %ymm1			; AVX512VL-NEXT: vpsrlw $1, %ymm1, %ymm1
	; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm5 = [127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127]			; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm5 = [127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127]
	; AVX512VL-NEXT: vpand %ymm5, %ymm1, %ymm1			; AVX512VL-NEXT: vpand %ymm5, %ymm1, %ymm1
	; AVX512VL-NEXT: vpsrlw $4, %ymm1, %ymm6			; AVX512VL-NEXT: vpsrlw $4, %ymm1, %ymm6
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm6, %ymm6			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm6, %ymm6
	; AVX512VL-NEXT: vpblendvb %ymm4, %ymm6, %ymm1, %ymm1			; AVX512VL-NEXT: vpblendvb %ymm4, %ymm6, %ymm1, %ymm1
	; AVX512VL-NEXT: vpsrlw $2, %ymm1, %ymm6			; AVX512VL-NEXT: vpsrlw $2, %ymm1, %ymm6
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm6, %ymm6			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm6, %ymm6
	; AVX512VL-NEXT: vpaddb %ymm4, %ymm4, %ymm4			; AVX512VL-NEXT: vpaddb %ymm4, %ymm4, %ymm4
	; AVX512VL-NEXT: vpblendvb %ymm4, %ymm6, %ymm1, %ymm1			; AVX512VL-NEXT: vpblendvb %ymm4, %ymm6, %ymm1, %ymm1
	; AVX512VL-NEXT: vpsrlw $1, %ymm1, %ymm6			; AVX512VL-NEXT: vpsrlw $1, %ymm1, %ymm6
	; AVX512VL-NEXT: vpand %ymm5, %ymm6, %ymm5			; AVX512VL-NEXT: vpand %ymm5, %ymm6, %ymm5
	; AVX512VL-NEXT: vpaddb %ymm4, %ymm4, %ymm4			; AVX512VL-NEXT: vpaddb %ymm4, %ymm4, %ymm4
	; AVX512VL-NEXT: vpblendvb %ymm4, %ymm5, %ymm1, %ymm1			; AVX512VL-NEXT: vpblendvb %ymm4, %ymm5, %ymm1, %ymm1
	; AVX512VL-NEXT: vpand %ymm3, %ymm2, %ymm2			; AVX512VL-NEXT: vpand %ymm3, %ymm2, %ymm2
	; AVX512VL-NEXT: vpsllw $5, %ymm2, %ymm2			; AVX512VL-NEXT: vpsllw $5, %ymm2, %ymm2
	; AVX512VL-NEXT: vpaddb %ymm2, %ymm2, %ymm3			; AVX512VL-NEXT: vpaddb %ymm2, %ymm2, %ymm3
	; AVX512VL-NEXT: vpsllw $4, %ymm0, %ymm4			; AVX512VL-NEXT: vpsllw $4, %ymm0, %ymm4
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm4, %ymm4			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm4, %ymm4
	; AVX512VL-NEXT: vpblendvb %ymm2, %ymm4, %ymm0, %ymm0			; AVX512VL-NEXT: vpblendvb %ymm2, %ymm4, %ymm0, %ymm0
	; AVX512VL-NEXT: vpsllw $2, %ymm0, %ymm2			; AVX512VL-NEXT: vpsllw $2, %ymm0, %ymm2
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm2
	; AVX512VL-NEXT: vpblendvb %ymm3, %ymm2, %ymm0, %ymm0			; AVX512VL-NEXT: vpblendvb %ymm3, %ymm2, %ymm0, %ymm0
	; AVX512VL-NEXT: vpaddb %ymm0, %ymm0, %ymm2			; AVX512VL-NEXT: vpaddb %ymm0, %ymm0, %ymm2
	; AVX512VL-NEXT: vpaddb %ymm3, %ymm3, %ymm3			; AVX512VL-NEXT: vpaddb %ymm3, %ymm3, %ymm3
	; AVX512VL-NEXT: vpblendvb %ymm3, %ymm2, %ymm0, %ymm0			; AVX512VL-NEXT: vpblendvb %ymm3, %ymm2, %ymm0, %ymm0
	; AVX512VL-NEXT: vpor %ymm1, %ymm0, %ymm0			; AVX512VL-NEXT: vpor %ymm1, %ymm0, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_funnnel_v32i8:			; AVX512BW-LABEL: var_funnnel_v32i8:
	; AVX512VBMI2-NEXT: retq			; AVX512VBMI2-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_funnnel_v32i8:			; AVX512VLBW-LABEL: var_funnnel_v32i8:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero,ymm1[16],zero,ymm1[17],zero,ymm1[18],zero,ymm1[19],zero,ymm1[20],zero,ymm1[21],zero,ymm1[22],zero,ymm1[23],zero,ymm1[24],zero,ymm1[25],zero,ymm1[26],zero,ymm1[27],zero,ymm1[28],zero,ymm1[29],zero,ymm1[30],zero,ymm1[31],zero			; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero,ymm1[16],zero,ymm1[17],zero,ymm1[18],zero,ymm1[19],zero,ymm1[20],zero,ymm1[21],zero,ymm1[22],zero,ymm1[23],zero,ymm1[24],zero,ymm1[25],zero,ymm1[26],zero,ymm1[27],zero,ymm1[28],zero,ymm1[29],zero,ymm1[30],zero,ymm1[31],zero
	; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero			; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero
	; AVX512VLBW-NEXT: vpsllw $8, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpsllw $8, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vporq %zmm1, %zmm0, %zmm0			; AVX512VLBW-NEXT: vporq %zmm1, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm1
	; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero,ymm1[16],zero,ymm1[17],zero,ymm1[18],zero,ymm1[19],zero,ymm1[20],zero,ymm1[21],zero,ymm1[22],zero,ymm1[23],zero,ymm1[24],zero,ymm1[25],zero,ymm1[26],zero,ymm1[27],zero,ymm1[28],zero,ymm1[29],zero,ymm1[30],zero,ymm1[31],zero			; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero,ymm1[16],zero,ymm1[17],zero,ymm1[18],zero,ymm1[19],zero,ymm1[20],zero,ymm1[21],zero,ymm1[22],zero,ymm1[23],zero,ymm1[24],zero,ymm1[25],zero,ymm1[26],zero,ymm1[27],zero,ymm1[28],zero,ymm1[29],zero,ymm1[30],zero,ymm1[31],zero
	; AVX512VLBW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vpsrlw $8, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpsrlw $8, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vpmovwb %zmm0, %ymm0			; AVX512VLBW-NEXT: vpmovwb %zmm0, %ymm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VLVBMI2-LABEL: var_funnnel_v32i8:			; AVX512VLVBMI2-LABEL: var_funnnel_v32i8:
	; AVX512VLVBMI2: # %bb.0:			; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: # kill: def $ymm1 killed $ymm1 def $zmm1			; AVX512VLVBMI2-NEXT: # kill: def $ymm1 killed $ymm1 def $zmm1
	; AVX512VLVBMI2-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0			; AVX512VLVBMI2-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0
	; AVX512VLVBMI2-NEXT: vmovdqa64 {{.*#+}} zmm3 = [0,64,1,65,2,66,3,67,4,68,5,69,6,70,7,71,8,72,9,73,10,74,11,75,12,76,13,77,14,78,15,79,16,80,17,81,18,82,19,83,20,84,21,85,22,86,23,87,24,88,25,89,26,90,27,91,28,92,29,93,30,94,31,95]			; AVX512VLVBMI2-NEXT: vmovdqa64 {{.*#+}} zmm3 = [0,64,1,65,2,66,3,67,4,68,5,69,6,70,7,71,8,72,9,73,10,74,11,75,12,76,13,77,14,78,15,79,16,80,17,81,18,82,19,83,20,84,21,85,22,86,23,87,24,88,25,89,26,90,27,91,28,92,29,93,30,94,31,95]
	; AVX512VLVBMI2-NEXT: vpermi2b %zmm0, %zmm1, %zmm3			; AVX512VLVBMI2-NEXT: vpermi2b %zmm0, %zmm1, %zmm3
	; AVX512VLVBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm0			; AVX512VLVBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm0
	; AVX512VLVBMI2-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero			; AVX512VLVBMI2-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero
	; AVX512VLVBMI2-NEXT: vpsllvw %zmm0, %zmm3, %zmm0			; AVX512VLVBMI2-NEXT: vpsllvw %zmm0, %zmm3, %zmm0
	; AVX512VLVBMI2-NEXT: vpsrlw $8, %zmm0, %zmm0			; AVX512VLVBMI2-NEXT: vpsrlw $8, %zmm0, %zmm0
	; AVX512VLVBMI2-NEXT: vpmovwb %zmm0, %ymm0			; AVX512VLVBMI2-NEXT: vpmovwb %zmm0, %ymm0
	; AVX512VLVBMI2-NEXT: retq			; AVX512VLVBMI2-NEXT: retq
	;			;
	; XOPAVX1-LABEL: var_funnnel_v32i8:			; XOPAVX1-LABEL: var_funnnel_v32i8:
	; XOPAVX1: # %bb.0:			; XOPAVX1: # %bb.0:

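Note on the hunks above: the only functional change is that a full-width `vpand` constant-pool load becomes `vpandd` with a 32-bit embedded broadcast (`{1to8}` for ymm). The sketch below is a rough Python model of why the two forms can agree, and of the widen/shift/truncate sequence visible in the AVX512VLBW checks. It assumes the masked constant is a splat of 7 in every byte; the diff hides the actual value behind the `LCPI` regex, so that value is an assumption. All function names are hypothetical and are not taken from the patch.

```python
def broadcast(elem: bytes, count: int) -> bytes:
    """Model an embedded broadcast {1toN}: one element repeated N times."""
    return elem * count


def can_fold_to_broadcast(const: bytes, elem_size: int) -> bool:
    """True if a full-width constant is a single elem_size-byte element repeated."""
    if elem_size <= 0 or len(const) % elem_size != 0:
        return False
    return const == broadcast(const[:elem_size], len(const) // elem_size)


def fshl8_reference(x: int, y: int, amt: int) -> int:
    """Funnel shift left on one 8-bit lane: high byte of (x:y) << (amt mod 8)."""
    amt %= 8
    return ((((x & 0xFF) << 8) | (y & 0xFF)) << amt) >> 8 & 0xFF


def fshl8_widened(x: int, y: int, amt: int, mask: int = 7) -> int:
    """Mirror the AVX512VLBW check lines for one lane.

    mask=7 is an assumed value for the hidden LCPI constant.
    """
    wide = ((x & 0xFF) << 8) | (y & 0xFF)   # vpmovzxbw x2, vpsllw $8, vporq
    shift = amt & mask                      # vpand / vpandd {1to8}
    wide = (wide << shift) & 0xFFFF         # vpsllvw on 16-bit lanes
    return (wide >> 8) & 0xFF               # vpsrlw $8, vpmovwb


if __name__ == "__main__":
    splat7_ymm = bytes([7]) * 32            # assumed 256-bit constant
    print(can_fold_to_broadcast(splat7_ymm, 4))
    print(hex(fshl8_widened(0x81, 0x80, 1)))
```

Under the same assumption, the 512-bit hunks in the next file follow the same pattern with `{1to16}`: a 64-byte splat is 16 copies of one dword.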
llvm/test/CodeGen/X86/vector-fshl-512.ll

	define <32 x i16> @var_funnnel_v32i16(<32 x i16> %x, <32 x i16> %y, <32 x i16> %amt) nounwind {			define <32 x i16> @var_funnnel_v32i16(<32 x i16> %x, <32 x i16> %y, <32 x i16> %amt) nounwind {
	; AVX512F-LABEL: var_funnnel_v32i16:			; AVX512F-LABEL: var_funnnel_v32i16:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm3 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm3 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm4 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm4 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero
	; AVX512F-NEXT: vpslld $16, %zmm4, %zmm4			; AVX512F-NEXT: vpslld $16, %zmm4, %zmm4
	; AVX512F-NEXT: vpord %zmm3, %zmm4, %zmm3			; AVX512F-NEXT: vpord %zmm3, %zmm4, %zmm3
	; AVX512F-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512F-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm4 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm4 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero
	; AVX512F-NEXT: vpsllvd %zmm4, %zmm3, %zmm3			; AVX512F-NEXT: vpsllvd %zmm4, %zmm3, %zmm3
	; AVX512F-NEXT: vpsrld $16, %zmm3, %zmm3			; AVX512F-NEXT: vpsrld $16, %zmm3, %zmm3
	; AVX512F-NEXT: vpmovdw %zmm3, %ymm3			; AVX512F-NEXT: vpmovdw %zmm3, %ymm3
	; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm1			; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm1
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm0			; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm0
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero
	; AVX512F-NEXT: vpslld $16, %zmm0, %zmm0			; AVX512F-NEXT: vpslld $16, %zmm0, %zmm0
	; AVX512F-NEXT: vpord %zmm1, %zmm0, %zmm0			; AVX512F-NEXT: vpord %zmm1, %zmm0, %zmm0
	; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm1			; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm1
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512F-NEXT: vpsllvd %zmm1, %zmm0, %zmm0			; AVX512F-NEXT: vpsllvd %zmm1, %zmm0, %zmm0
	; AVX512F-NEXT: vpsrld $16, %zmm0, %zmm0			; AVX512F-NEXT: vpsrld $16, %zmm0, %zmm0
	; AVX512F-NEXT: vpmovdw %zmm0, %ymm0			; AVX512F-NEXT: vpmovdw %zmm0, %ymm0
	; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm3, %zmm0			; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm3, %zmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: var_funnnel_v32i16:			; AVX512VL-LABEL: var_funnnel_v32i16:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm3 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm3 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm4 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm4 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero
	; AVX512VL-NEXT: vpslld $16, %zmm4, %zmm4			; AVX512VL-NEXT: vpslld $16, %zmm4, %zmm4
	; AVX512VL-NEXT: vpord %zmm3, %zmm4, %zmm3			; AVX512VL-NEXT: vpord %zmm3, %zmm4, %zmm3
	; AVX512VL-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm4 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm4 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero
	; AVX512VL-NEXT: vpsllvd %zmm4, %zmm3, %zmm3			; AVX512VL-NEXT: vpsllvd %zmm4, %zmm3, %zmm3
	; AVX512VL-NEXT: vpsrld $16, %zmm3, %zmm3			; AVX512VL-NEXT: vpsrld $16, %zmm3, %zmm3
	; AVX512VL-NEXT: vpmovdw %zmm3, %ymm3			; AVX512VL-NEXT: vpmovdw %zmm3, %ymm3
	; AVX512VL-NEXT: vextracti64x4 $1, %zmm1, %ymm1			; AVX512VL-NEXT: vextracti64x4 $1, %zmm1, %ymm1
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm0			; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm0
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm3			; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm3
	; AVX512F-NEXT: vpsrlw $1, %ymm3, %ymm3			; AVX512F-NEXT: vpsrlw $1, %ymm3, %ymm3
	; AVX512F-NEXT: vmovdqa {{.*#+}} ymm4 = [127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127]			; AVX512F-NEXT: vmovdqa {{.*#+}} ymm4 = [127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127]
	; AVX512F-NEXT: vpand %ymm4, %ymm3, %ymm5			; AVX512F-NEXT: vpand %ymm4, %ymm3, %ymm5
	; AVX512F-NEXT: vpsrlw $4, %ymm5, %ymm3			; AVX512F-NEXT: vpsrlw $4, %ymm5, %ymm3
	; AVX512F-NEXT: vmovdqa {{.*#+}} ymm6 = [15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15]			; AVX512F-NEXT: vmovdqa {{.*#+}} ymm6 = [15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15]
	; AVX512F-NEXT: vpand %ymm6, %ymm3, %ymm7			; AVX512F-NEXT: vpand %ymm6, %ymm3, %ymm7
	; AVX512F-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512F-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm3			; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm3
	; AVX512F-NEXT: vmovdqa {{.*#+}} ymm8 = [7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7]			; AVX512F-NEXT: vmovdqa {{.*#+}} ymm8 = [7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7]
	; AVX512F-NEXT: vpxor %ymm3, %ymm8, %ymm9			; AVX512F-NEXT: vpxor %ymm3, %ymm8, %ymm9
	; AVX512F-NEXT: vpsllw $5, %ymm9, %ymm9			; AVX512F-NEXT: vpsllw $5, %ymm9, %ymm9
	; AVX512F-NEXT: vpblendvb %ymm9, %ymm7, %ymm5, %ymm5			; AVX512F-NEXT: vpblendvb %ymm9, %ymm7, %ymm5, %ymm5
	; AVX512F-NEXT: vpsrlw $2, %ymm5, %ymm7			; AVX512F-NEXT: vpsrlw $2, %ymm5, %ymm7
	; AVX512F-NEXT: vmovdqa {{.*#+}} ymm10 = [63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63]			; AVX512F-NEXT: vmovdqa {{.*#+}} ymm10 = [63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63]
	; AVX512F-NEXT: vpand %ymm7, %ymm10, %ymm7			; AVX512F-NEXT: vpand %ymm7, %ymm10, %ymm7
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vextracti64x4 $1, %zmm1, %ymm3			; AVX512VL-NEXT: vextracti64x4 $1, %zmm1, %ymm3
	; AVX512VL-NEXT: vpsrlw $1, %ymm3, %ymm3			; AVX512VL-NEXT: vpsrlw $1, %ymm3, %ymm3
	; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm4 = [127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127]			; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm4 = [127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127]
	; AVX512VL-NEXT: vpand %ymm4, %ymm3, %ymm5			; AVX512VL-NEXT: vpand %ymm4, %ymm3, %ymm5
	; AVX512VL-NEXT: vpsrlw $4, %ymm5, %ymm3			; AVX512VL-NEXT: vpsrlw $4, %ymm5, %ymm3
	; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm6 = [15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15]			; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm6 = [15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15]
	; AVX512VL-NEXT: vpand %ymm6, %ymm3, %ymm7			; AVX512VL-NEXT: vpand %ymm6, %ymm3, %ymm7
	; AVX512VL-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512VL-NEXT: vextracti64x4 $1, %zmm2, %ymm3			; AVX512VL-NEXT: vextracti64x4 $1, %zmm2, %ymm3
	; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm8 = [7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7]			; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm8 = [7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7]
	; AVX512VL-NEXT: vpxor %ymm3, %ymm8, %ymm9			; AVX512VL-NEXT: vpxor %ymm3, %ymm8, %ymm9
	; AVX512VL-NEXT: vpsllw $5, %ymm9, %ymm9			; AVX512VL-NEXT: vpsllw $5, %ymm9, %ymm9
	; AVX512VL-NEXT: vpblendvb %ymm9, %ymm7, %ymm5, %ymm5			; AVX512VL-NEXT: vpblendvb %ymm9, %ymm7, %ymm5, %ymm5
	; AVX512VL-NEXT: vpsrlw $2, %ymm5, %ymm7			; AVX512VL-NEXT: vpsrlw $2, %ymm5, %ymm7
	; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm10 = [63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63]			; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm10 = [63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63]
	; AVX512VL-NEXT: vpand %ymm7, %ymm10, %ymm7			; AVX512VL-NEXT: vpand %ymm7, %ymm10, %ymm7
	; AVX512VL-NEXT: vpblendvb %ymm2, %ymm4, %ymm0, %ymm0			; AVX512VL-NEXT: vpblendvb %ymm2, %ymm4, %ymm0, %ymm0
	; AVX512VL-NEXT: vinserti64x4 $1, %ymm3, %zmm0, %zmm0			; AVX512VL-NEXT: vinserti64x4 $1, %ymm3, %zmm0, %zmm0
	; AVX512VL-NEXT: vporq %zmm1, %zmm0, %zmm0			; AVX512VL-NEXT: vporq %zmm1, %zmm0, %zmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_funnnel_v64i8:			; AVX512BW-LABEL: var_funnnel_v64i8:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm0[8],zmm1[9],zmm0[9],zmm1[10],zmm0[10],zmm1[11],zmm0[11],zmm1[12],zmm0[12],zmm1[13],zmm0[13],zmm1[14],zmm0[14],zmm1[15],zmm0[15],zmm1[24],zmm0[24],zmm1[25],zmm0[25],zmm1[26],zmm0[26],zmm1[27],zmm0[27],zmm1[28],zmm0[28],zmm1[29],zmm0[29],zmm1[30],zmm0[30],zmm1[31],zmm0[31],zmm1[40],zmm0[40],zmm1[41],zmm0[41],zmm1[42],zmm0[42],zmm1[43],zmm0[43],zmm1[44],zmm0[44],zmm1[45],zmm0[45],zmm1[46],zmm0[46],zmm1[47],zmm0[47],zmm1[56],zmm0[56],zmm1[57],zmm0[57],zmm1[58],zmm0[58],zmm1[59],zmm0[59],zmm1[60],zmm0[60],zmm1[61],zmm0[61],zmm1[62],zmm0[62],zmm1[63],zmm0[63]			; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm0[8],zmm1[9],zmm0[9],zmm1[10],zmm0[10],zmm1[11],zmm0[11],zmm1[12],zmm0[12],zmm1[13],zmm0[13],zmm1[14],zmm0[14],zmm1[15],zmm0[15],zmm1[24],zmm0[24],zmm1[25],zmm0[25],zmm1[26],zmm0[26],zmm1[27],zmm0[27],zmm1[28],zmm0[28],zmm1[29],zmm0[29],zmm1[30],zmm0[30],zmm1[31],zmm0[31],zmm1[40],zmm0[40],zmm1[41],zmm0[41],zmm1[42],zmm0[42],zmm1[43],zmm0[43],zmm1[44],zmm0[44],zmm1[45],zmm0[45],zmm1[46],zmm0[46],zmm1[47],zmm0[47],zmm1[56],zmm0[56],zmm1[57],zmm0[57],zmm1[58],zmm0[58],zmm1[59],zmm0[59],zmm1[60],zmm0[60],zmm1[61],zmm0[61],zmm1[62],zmm0[62],zmm1[63],zmm0[63]
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512BW-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512BW-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm5 = zmm2[8],zmm4[8],zmm2[9],zmm4[9],zmm2[10],zmm4[10],zmm2[11],zmm4[11],zmm2[12],zmm4[12],zmm2[13],zmm4[13],zmm2[14],zmm4[14],zmm2[15],zmm4[15],zmm2[24],zmm4[24],zmm2[25],zmm4[25],zmm2[26],zmm4[26],zmm2[27],zmm4[27],zmm2[28],zmm4[28],zmm2[29],zmm4[29],zmm2[30],zmm4[30],zmm2[31],zmm4[31],zmm2[40],zmm4[40],zmm2[41],zmm4[41],zmm2[42],zmm4[42],zmm2[43],zmm4[43],zmm2[44],zmm4[44],zmm2[45],zmm4[45],zmm2[46],zmm4[46],zmm2[47],zmm4[47],zmm2[56],zmm4[56],zmm2[57],zmm4[57],zmm2[58],zmm4[58],zmm2[59],zmm4[59],zmm2[60],zmm4[60],zmm2[61],zmm4[61],zmm2[62],zmm4[62],zmm2[63],zmm4[63]			; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm5 = zmm2[8],zmm4[8],zmm2[9],zmm4[9],zmm2[10],zmm4[10],zmm2[11],zmm4[11],zmm2[12],zmm4[12],zmm2[13],zmm4[13],zmm2[14],zmm4[14],zmm2[15],zmm4[15],zmm2[24],zmm4[24],zmm2[25],zmm4[25],zmm2[26],zmm4[26],zmm2[27],zmm4[27],zmm2[28],zmm4[28],zmm2[29],zmm4[29],zmm2[30],zmm4[30],zmm2[31],zmm4[31],zmm2[40],zmm4[40],zmm2[41],zmm4[41],zmm2[42],zmm4[42],zmm2[43],zmm4[43],zmm2[44],zmm4[44],zmm2[45],zmm4[45],zmm2[46],zmm4[46],zmm2[47],zmm4[47],zmm2[56],zmm4[56],zmm2[57],zmm4[57],zmm2[58],zmm4[58],zmm2[59],zmm4[59],zmm2[60],zmm4[60],zmm2[61],zmm4[61],zmm2[62],zmm4[62],zmm2[63],zmm4[63]
	; AVX512BW-NEXT: vpsllvw %zmm5, %zmm3, %zmm3			; AVX512BW-NEXT: vpsllvw %zmm5, %zmm3, %zmm3
	; AVX512BW-NEXT: vpsrlw $8, %zmm3, %zmm3			; AVX512BW-NEXT: vpsrlw $8, %zmm3, %zmm3
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm2[0],zmm4[0],zmm2[1],zmm4[1],zmm2[2],zmm4[2],zmm2[3],zmm4[3],zmm2[4],zmm4[4],zmm2[5],zmm4[5],zmm2[6],zmm4[6],zmm2[7],zmm4[7],zmm2[16],zmm4[16],zmm2[17],zmm4[17],zmm2[18],zmm4[18],zmm2[19],zmm4[19],zmm2[20],zmm4[20],zmm2[21],zmm4[21],zmm2[22],zmm4[22],zmm2[23],zmm4[23],zmm2[32],zmm4[32],zmm2[33],zmm4[33],zmm2[34],zmm4[34],zmm2[35],zmm4[35],zmm2[36],zmm4[36],zmm2[37],zmm4[37],zmm2[38],zmm4[38],zmm2[39],zmm4[39],zmm2[48],zmm4[48],zmm2[49],zmm4[49],zmm2[50],zmm4[50],zmm2[51],zmm4[51],zmm2[52],zmm4[52],zmm2[53],zmm4[53],zmm2[54],zmm4[54],zmm2[55],zmm4[55]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm2[0],zmm4[0],zmm2[1],zmm4[1],zmm2[2],zmm4[2],zmm2[3],zmm4[3],zmm2[4],zmm4[4],zmm2[5],zmm4[5],zmm2[6],zmm4[6],zmm2[7],zmm4[7],zmm2[16],zmm4[16],zmm2[17],zmm4[17],zmm2[18],zmm4[18],zmm2[19],zmm4[19],zmm2[20],zmm4[20],zmm2[21],zmm4[21],zmm2[22],zmm4[22],zmm2[23],zmm4[23],zmm2[32],zmm4[32],zmm2[33],zmm4[33],zmm2[34],zmm4[34],zmm2[35],zmm4[35],zmm2[36],zmm4[36],zmm2[37],zmm4[37],zmm2[38],zmm4[38],zmm2[39],zmm4[39],zmm2[48],zmm4[48],zmm2[49],zmm4[49],zmm2[50],zmm4[50],zmm2[51],zmm4[51],zmm2[52],zmm4[52],zmm2[53],zmm4[53],zmm2[54],zmm4[54],zmm2[55],zmm4[55]
	; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpsrlw $8, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlw $8, %zmm0, %zmm0
	; AVX512BW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0			; AVX512BW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VBMI2-LABEL: var_funnnel_v64i8:			; AVX512VBMI2-LABEL: var_funnnel_v64i8:
	; AVX512VBMI2: # %bb.0:			; AVX512VBMI2: # %bb.0:
	; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm0[8],zmm1[9],zmm0[9],zmm1[10],zmm0[10],zmm1[11],zmm0[11],zmm1[12],zmm0[12],zmm1[13],zmm0[13],zmm1[14],zmm0[14],zmm1[15],zmm0[15],zmm1[24],zmm0[24],zmm1[25],zmm0[25],zmm1[26],zmm0[26],zmm1[27],zmm0[27],zmm1[28],zmm0[28],zmm1[29],zmm0[29],zmm1[30],zmm0[30],zmm1[31],zmm0[31],zmm1[40],zmm0[40],zmm1[41],zmm0[41],zmm1[42],zmm0[42],zmm1[43],zmm0[43],zmm1[44],zmm0[44],zmm1[45],zmm0[45],zmm1[46],zmm0[46],zmm1[47],zmm0[47],zmm1[56],zmm0[56],zmm1[57],zmm0[57],zmm1[58],zmm0[58],zmm1[59],zmm0[59],zmm1[60],zmm0[60],zmm1[61],zmm0[61],zmm1[62],zmm0[62],zmm1[63],zmm0[63]			; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm0[8],zmm1[9],zmm0[9],zmm1[10],zmm0[10],zmm1[11],zmm0[11],zmm1[12],zmm0[12],zmm1[13],zmm0[13],zmm1[14],zmm0[14],zmm1[15],zmm0[15],zmm1[24],zmm0[24],zmm1[25],zmm0[25],zmm1[26],zmm0[26],zmm1[27],zmm0[27],zmm1[28],zmm0[28],zmm1[29],zmm0[29],zmm1[30],zmm0[30],zmm1[31],zmm0[31],zmm1[40],zmm0[40],zmm1[41],zmm0[41],zmm1[42],zmm0[42],zmm1[43],zmm0[43],zmm1[44],zmm0[44],zmm1[45],zmm0[45],zmm1[46],zmm0[46],zmm1[47],zmm0[47],zmm1[56],zmm0[56],zmm1[57],zmm0[57],zmm1[58],zmm0[58],zmm1[59],zmm0[59],zmm1[60],zmm0[60],zmm1[61],zmm0[61],zmm1[62],zmm0[62],zmm1[63],zmm0[63]
	; AVX512VBMI2-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512VBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512VBMI2-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512VBMI2-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm5 = zmm2[8],zmm4[8],zmm2[9],zmm4[9],zmm2[10],zmm4[10],zmm2[11],zmm4[11],zmm2[12],zmm4[12],zmm2[13],zmm4[13],zmm2[14],zmm4[14],zmm2[15],zmm4[15],zmm2[24],zmm4[24],zmm2[25],zmm4[25],zmm2[26],zmm4[26],zmm2[27],zmm4[27],zmm2[28],zmm4[28],zmm2[29],zmm4[29],zmm2[30],zmm4[30],zmm2[31],zmm4[31],zmm2[40],zmm4[40],zmm2[41],zmm4[41],zmm2[42],zmm4[42],zmm2[43],zmm4[43],zmm2[44],zmm4[44],zmm2[45],zmm4[45],zmm2[46],zmm4[46],zmm2[47],zmm4[47],zmm2[56],zmm4[56],zmm2[57],zmm4[57],zmm2[58],zmm4[58],zmm2[59],zmm4[59],zmm2[60],zmm4[60],zmm2[61],zmm4[61],zmm2[62],zmm4[62],zmm2[63],zmm4[63]			; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm5 = zmm2[8],zmm4[8],zmm2[9],zmm4[9],zmm2[10],zmm4[10],zmm2[11],zmm4[11],zmm2[12],zmm4[12],zmm2[13],zmm4[13],zmm2[14],zmm4[14],zmm2[15],zmm4[15],zmm2[24],zmm4[24],zmm2[25],zmm4[25],zmm2[26],zmm4[26],zmm2[27],zmm4[27],zmm2[28],zmm4[28],zmm2[29],zmm4[29],zmm2[30],zmm4[30],zmm2[31],zmm4[31],zmm2[40],zmm4[40],zmm2[41],zmm4[41],zmm2[42],zmm4[42],zmm2[43],zmm4[43],zmm2[44],zmm4[44],zmm2[45],zmm4[45],zmm2[46],zmm4[46],zmm2[47],zmm4[47],zmm2[56],zmm4[56],zmm2[57],zmm4[57],zmm2[58],zmm4[58],zmm2[59],zmm4[59],zmm2[60],zmm4[60],zmm2[61],zmm4[61],zmm2[62],zmm4[62],zmm2[63],zmm4[63]
	; AVX512VBMI2-NEXT: vpsllvw %zmm5, %zmm3, %zmm3			; AVX512VBMI2-NEXT: vpsllvw %zmm5, %zmm3, %zmm3
	; AVX512VBMI2-NEXT: vpsrlw $8, %zmm3, %zmm3			; AVX512VBMI2-NEXT: vpsrlw $8, %zmm3, %zmm3
	; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]			; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]
	; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm2[0],zmm4[0],zmm2[1],zmm4[1],zmm2[2],zmm4[2],zmm2[3],zmm4[3],zmm2[4],zmm4[4],zmm2[5],zmm4[5],zmm2[6],zmm4[6],zmm2[7],zmm4[7],zmm2[16],zmm4[16],zmm2[17],zmm4[17],zmm2[18],zmm4[18],zmm2[19],zmm4[19],zmm2[20],zmm4[20],zmm2[21],zmm4[21],zmm2[22],zmm4[22],zmm2[23],zmm4[23],zmm2[32],zmm4[32],zmm2[33],zmm4[33],zmm2[34],zmm4[34],zmm2[35],zmm4[35],zmm2[36],zmm4[36],zmm2[37],zmm4[37],zmm2[38],zmm4[38],zmm2[39],zmm4[39],zmm2[48],zmm4[48],zmm2[49],zmm4[49],zmm2[50],zmm4[50],zmm2[51],zmm4[51],zmm2[52],zmm4[52],zmm2[53],zmm4[53],zmm2[54],zmm4[54],zmm2[55],zmm4[55]			; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm2[0],zmm4[0],zmm2[1],zmm4[1],zmm2[2],zmm4[2],zmm2[3],zmm4[3],zmm2[4],zmm4[4],zmm2[5],zmm4[5],zmm2[6],zmm4[6],zmm2[7],zmm4[7],zmm2[16],zmm4[16],zmm2[17],zmm4[17],zmm2[18],zmm4[18],zmm2[19],zmm4[19],zmm2[20],zmm4[20],zmm2[21],zmm4[21],zmm2[22],zmm4[22],zmm2[23],zmm4[23],zmm2[32],zmm4[32],zmm2[33],zmm4[33],zmm2[34],zmm4[34],zmm2[35],zmm4[35],zmm2[36],zmm4[36],zmm2[37],zmm4[37],zmm2[38],zmm4[38],zmm2[39],zmm4[39],zmm2[48],zmm4[48],zmm2[49],zmm4[49],zmm2[50],zmm4[50],zmm2[51],zmm4[51],zmm2[52],zmm4[52],zmm2[53],zmm4[53],zmm2[54],zmm4[54],zmm2[55],zmm4[55]
	; AVX512VBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512VBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: vpsrlw $8, %zmm0, %zmm0			; AVX512VBMI2-NEXT: vpsrlw $8, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: vpackuswb %zmm3, %zmm0, %zmm0			; AVX512VBMI2-NEXT: vpackuswb %zmm3, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: retq			; AVX512VBMI2-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_funnnel_v64i8:			; AVX512VLBW-LABEL: var_funnnel_v64i8:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm0[8],zmm1[9],zmm0[9],zmm1[10],zmm0[10],zmm1[11],zmm0[11],zmm1[12],zmm0[12],zmm1[13],zmm0[13],zmm1[14],zmm0[14],zmm1[15],zmm0[15],zmm1[24],zmm0[24],zmm1[25],zmm0[25],zmm1[26],zmm0[26],zmm1[27],zmm0[27],zmm1[28],zmm0[28],zmm1[29],zmm0[29],zmm1[30],zmm0[30],zmm1[31],zmm0[31],zmm1[40],zmm0[40],zmm1[41],zmm0[41],zmm1[42],zmm0[42],zmm1[43],zmm0[43],zmm1[44],zmm0[44],zmm1[45],zmm0[45],zmm1[46],zmm0[46],zmm1[47],zmm0[47],zmm1[56],zmm0[56],zmm1[57],zmm0[57],zmm1[58],zmm0[58],zmm1[59],zmm0[59],zmm1[60],zmm0[60],zmm1[61],zmm0[61],zmm1[62],zmm0[62],zmm1[63],zmm0[63]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm0[8],zmm1[9],zmm0[9],zmm1[10],zmm0[10],zmm1[11],zmm0[11],zmm1[12],zmm0[12],zmm1[13],zmm0[13],zmm1[14],zmm0[14],zmm1[15],zmm0[15],zmm1[24],zmm0[24],zmm1[25],zmm0[25],zmm1[26],zmm0[26],zmm1[27],zmm0[27],zmm1[28],zmm0[28],zmm1[29],zmm0[29],zmm1[30],zmm0[30],zmm1[31],zmm0[31],zmm1[40],zmm0[40],zmm1[41],zmm0[41],zmm1[42],zmm0[42],zmm1[43],zmm0[43],zmm1[44],zmm0[44],zmm1[45],zmm0[45],zmm1[46],zmm0[46],zmm1[47],zmm0[47],zmm1[56],zmm0[56],zmm1[57],zmm0[57],zmm1[58],zmm0[58],zmm1[59],zmm0[59],zmm1[60],zmm0[60],zmm1[61],zmm0[61],zmm1[62],zmm0[62],zmm1[63],zmm0[63]
	; AVX512VLBW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512VLBW-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512VLBW-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm5 = zmm2[8],zmm4[8],zmm2[9],zmm4[9],zmm2[10],zmm4[10],zmm2[11],zmm4[11],zmm2[12],zmm4[12],zmm2[13],zmm4[13],zmm2[14],zmm4[14],zmm2[15],zmm4[15],zmm2[24],zmm4[24],zmm2[25],zmm4[25],zmm2[26],zmm4[26],zmm2[27],zmm4[27],zmm2[28],zmm4[28],zmm2[29],zmm4[29],zmm2[30],zmm4[30],zmm2[31],zmm4[31],zmm2[40],zmm4[40],zmm2[41],zmm4[41],zmm2[42],zmm4[42],zmm2[43],zmm4[43],zmm2[44],zmm4[44],zmm2[45],zmm4[45],zmm2[46],zmm4[46],zmm2[47],zmm4[47],zmm2[56],zmm4[56],zmm2[57],zmm4[57],zmm2[58],zmm4[58],zmm2[59],zmm4[59],zmm2[60],zmm4[60],zmm2[61],zmm4[61],zmm2[62],zmm4[62],zmm2[63],zmm4[63]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm5 = zmm2[8],zmm4[8],zmm2[9],zmm4[9],zmm2[10],zmm4[10],zmm2[11],zmm4[11],zmm2[12],zmm4[12],zmm2[13],zmm4[13],zmm2[14],zmm4[14],zmm2[15],zmm4[15],zmm2[24],zmm4[24],zmm2[25],zmm4[25],zmm2[26],zmm4[26],zmm2[27],zmm4[27],zmm2[28],zmm4[28],zmm2[29],zmm4[29],zmm2[30],zmm4[30],zmm2[31],zmm4[31],zmm2[40],zmm4[40],zmm2[41],zmm4[41],zmm2[42],zmm4[42],zmm2[43],zmm4[43],zmm2[44],zmm4[44],zmm2[45],zmm4[45],zmm2[46],zmm4[46],zmm2[47],zmm4[47],zmm2[56],zmm4[56],zmm2[57],zmm4[57],zmm2[58],zmm4[58],zmm2[59],zmm4[59],zmm2[60],zmm4[60],zmm2[61],zmm4[61],zmm2[62],zmm4[62],zmm2[63],zmm4[63]
	; AVX512VLBW-NEXT: vpsllvw %zmm5, %zmm3, %zmm3			; AVX512VLBW-NEXT: vpsllvw %zmm5, %zmm3, %zmm3
	; AVX512VLBW-NEXT: vpsrlw $8, %zmm3, %zmm3			; AVX512VLBW-NEXT: vpsrlw $8, %zmm3, %zmm3
	; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]			; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]
	; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm2[0],zmm4[0],zmm2[1],zmm4[1],zmm2[2],zmm4[2],zmm2[3],zmm4[3],zmm2[4],zmm4[4],zmm2[5],zmm4[5],zmm2[6],zmm4[6],zmm2[7],zmm4[7],zmm2[16],zmm4[16],zmm2[17],zmm4[17],zmm2[18],zmm4[18],zmm2[19],zmm4[19],zmm2[20],zmm4[20],zmm2[21],zmm4[21],zmm2[22],zmm4[22],zmm2[23],zmm4[23],zmm2[32],zmm4[32],zmm2[33],zmm4[33],zmm2[34],zmm4[34],zmm2[35],zmm4[35],zmm2[36],zmm4[36],zmm2[37],zmm4[37],zmm2[38],zmm4[38],zmm2[39],zmm4[39],zmm2[48],zmm4[48],zmm2[49],zmm4[49],zmm2[50],zmm4[50],zmm2[51],zmm4[51],zmm2[52],zmm4[52],zmm2[53],zmm4[53],zmm2[54],zmm4[54],zmm2[55],zmm4[55]			; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm2[0],zmm4[0],zmm2[1],zmm4[1],zmm2[2],zmm4[2],zmm2[3],zmm4[3],zmm2[4],zmm4[4],zmm2[5],zmm4[5],zmm2[6],zmm4[6],zmm2[7],zmm4[7],zmm2[16],zmm4[16],zmm2[17],zmm4[17],zmm2[18],zmm4[18],zmm2[19],zmm4[19],zmm2[20],zmm4[20],zmm2[21],zmm4[21],zmm2[22],zmm4[22],zmm2[23],zmm4[23],zmm2[32],zmm4[32],zmm2[33],zmm4[33],zmm2[34],zmm4[34],zmm2[35],zmm4[35],zmm2[36],zmm4[36],zmm2[37],zmm4[37],zmm2[38],zmm4[38],zmm2[39],zmm4[39],zmm2[48],zmm4[48],zmm2[49],zmm4[49],zmm2[50],zmm4[50],zmm2[51],zmm4[51],zmm2[52],zmm4[52],zmm2[53],zmm4[53],zmm2[54],zmm4[54],zmm2[55],zmm4[55]
	; AVX512VLBW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vpsrlw $8, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpsrlw $8, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VLVBMI2-LABEL: var_funnnel_v64i8:			; AVX512VLVBMI2-LABEL: var_funnnel_v64i8:
	; AVX512VLVBMI2: # %bb.0:			; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm0[8],zmm1[9],zmm0[9],zmm1[10],zmm0[10],zmm1[11],zmm0[11],zmm1[12],zmm0[12],zmm1[13],zmm0[13],zmm1[14],zmm0[14],zmm1[15],zmm0[15],zmm1[24],zmm0[24],zmm1[25],zmm0[25],zmm1[26],zmm0[26],zmm1[27],zmm0[27],zmm1[28],zmm0[28],zmm1[29],zmm0[29],zmm1[30],zmm0[30],zmm1[31],zmm0[31],zmm1[40],zmm0[40],zmm1[41],zmm0[41],zmm1[42],zmm0[42],zmm1[43],zmm0[43],zmm1[44],zmm0[44],zmm1[45],zmm0[45],zmm1[46],zmm0[46],zmm1[47],zmm0[47],zmm1[56],zmm0[56],zmm1[57],zmm0[57],zmm1[58],zmm0[58],zmm1[59],zmm0[59],zmm1[60],zmm0[60],zmm1[61],zmm0[61],zmm1[62],zmm0[62],zmm1[63],zmm0[63]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm0[8],zmm1[9],zmm0[9],zmm1[10],zmm0[10],zmm1[11],zmm0[11],zmm1[12],zmm0[12],zmm1[13],zmm0[13],zmm1[14],zmm0[14],zmm1[15],zmm0[15],zmm1[24],zmm0[24],zmm1[25],zmm0[25],zmm1[26],zmm0[26],zmm1[27],zmm0[27],zmm1[28],zmm0[28],zmm1[29],zmm0[29],zmm1[30],zmm0[30],zmm1[31],zmm0[31],zmm1[40],zmm0[40],zmm1[41],zmm0[41],zmm1[42],zmm0[42],zmm1[43],zmm0[43],zmm1[44],zmm0[44],zmm1[45],zmm0[45],zmm1[46],zmm0[46],zmm1[47],zmm0[47],zmm1[56],zmm0[56],zmm1[57],zmm0[57],zmm1[58],zmm0[58],zmm1[59],zmm0[59],zmm1[60],zmm0[60],zmm1[61],zmm0[61],zmm1[62],zmm0[62],zmm1[63],zmm0[63]
	; AVX512VLVBMI2-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512VLVBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512VLVBMI2-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512VLVBMI2-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm5 = zmm2[8],zmm4[8],zmm2[9],zmm4[9],zmm2[10],zmm4[10],zmm2[11],zmm4[11],zmm2[12],zmm4[12],zmm2[13],zmm4[13],zmm2[14],zmm4[14],zmm2[15],zmm4[15],zmm2[24],zmm4[24],zmm2[25],zmm4[25],zmm2[26],zmm4[26],zmm2[27],zmm4[27],zmm2[28],zmm4[28],zmm2[29],zmm4[29],zmm2[30],zmm4[30],zmm2[31],zmm4[31],zmm2[40],zmm4[40],zmm2[41],zmm4[41],zmm2[42],zmm4[42],zmm2[43],zmm4[43],zmm2[44],zmm4[44],zmm2[45],zmm4[45],zmm2[46],zmm4[46],zmm2[47],zmm4[47],zmm2[56],zmm4[56],zmm2[57],zmm4[57],zmm2[58],zmm4[58],zmm2[59],zmm4[59],zmm2[60],zmm4[60],zmm2[61],zmm4[61],zmm2[62],zmm4[62],zmm2[63],zmm4[63]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm5 = zmm2[8],zmm4[8],zmm2[9],zmm4[9],zmm2[10],zmm4[10],zmm2[11],zmm4[11],zmm2[12],zmm4[12],zmm2[13],zmm4[13],zmm2[14],zmm4[14],zmm2[15],zmm4[15],zmm2[24],zmm4[24],zmm2[25],zmm4[25],zmm2[26],zmm4[26],zmm2[27],zmm4[27],zmm2[28],zmm4[28],zmm2[29],zmm4[29],zmm2[30],zmm4[30],zmm2[31],zmm4[31],zmm2[40],zmm4[40],zmm2[41],zmm4[41],zmm2[42],zmm4[42],zmm2[43],zmm4[43],zmm2[44],zmm4[44],zmm2[45],zmm4[45],zmm2[46],zmm4[46],zmm2[47],zmm4[47],zmm2[56],zmm4[56],zmm2[57],zmm4[57],zmm2[58],zmm4[58],zmm2[59],zmm4[59],zmm2[60],zmm4[60],zmm2[61],zmm4[61],zmm2[62],zmm4[62],zmm2[63],zmm4[63]
	; AVX512VLVBMI2-NEXT: vpsllvw %zmm5, %zmm3, %zmm3			; AVX512VLVBMI2-NEXT: vpsllvw %zmm5, %zmm3, %zmm3
	; AVX512VLVBMI2-NEXT: vpsrlw $8, %zmm3, %zmm3			; AVX512VLVBMI2-NEXT: vpsrlw $8, %zmm3, %zmm3
	; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]			; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]
	; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm2[0],zmm4[0],zmm2[1],zmm4[1],zmm2[2],zmm4[2],zmm2[3],zmm4[3],zmm2[4],zmm4[4],zmm2[5],zmm4[5],zmm2[6],zmm4[6],zmm2[7],zmm4[7],zmm2[16],zmm4[16],zmm2[17],zmm4[17],zmm2[18],zmm4[18],zmm2[19],zmm4[19],zmm2[20],zmm4[20],zmm2[21],zmm4[21],zmm2[22],zmm4[22],zmm2[23],zmm4[23],zmm2[32],zmm4[32],zmm2[33],zmm4[33],zmm2[34],zmm4[34],zmm2[35],zmm4[35],zmm2[36],zmm4[36],zmm2[37],zmm4[37],zmm2[38],zmm4[38],zmm2[39],zmm4[39],zmm2[48],zmm4[48],zmm2[49],zmm4[49],zmm2[50],zmm4[50],zmm2[51],zmm4[51],zmm2[52],zmm4[52],zmm2[53],zmm4[53],zmm2[54],zmm4[54],zmm2[55],zmm4[55]			; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm2[0],zmm4[0],zmm2[1],zmm4[1],zmm2[2],zmm4[2],zmm2[3],zmm4[3],zmm2[4],zmm4[4],zmm2[5],zmm4[5],zmm2[6],zmm4[6],zmm2[7],zmm4[7],zmm2[16],zmm4[16],zmm2[17],zmm4[17],zmm2[18],zmm4[18],zmm2[19],zmm4[19],zmm2[20],zmm4[20],zmm2[21],zmm4[21],zmm2[22],zmm4[22],zmm2[23],zmm4[23],zmm2[32],zmm4[32],zmm2[33],zmm4[33],zmm2[34],zmm4[34],zmm2[35],zmm4[35],zmm2[36],zmm4[36],zmm2[37],zmm4[37],zmm2[38],zmm4[38],zmm2[39],zmm4[39],zmm2[48],zmm4[48],zmm2[49],zmm4[49],zmm2[50],zmm4[50],zmm2[51],zmm4[51],zmm2[52],zmm4[52],zmm2[53],zmm4[53],zmm2[54],zmm4[54],zmm2[55],zmm4[55]
	; AVX512VLVBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512VLVBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512VLVBMI2-NEXT: vpsrlw $8, %zmm0, %zmm0			; AVX512VLVBMI2-NEXT: vpsrlw $8, %zmm0, %zmm0
	[... 757 lines not shown ...]

llvm/test/CodeGen/X86/vector-fshl-rot-128.ll

	[... 390 lines not shown ...]
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero
	; AVX512F-NEXT: vpsllvd %xmm1, %xmm0, %xmm0			; AVX512F-NEXT: vpsllvd %xmm1, %xmm0, %xmm0
	; AVX512F-NEXT: vpsrld $16, %xmm0, %xmm0			; AVX512F-NEXT: vpsrld $16, %xmm0, %xmm0
	; AVX512F-NEXT: vpackusdw %xmm2, %xmm0, %xmm0			; AVX512F-NEXT: vpackusdw %xmm2, %xmm0, %xmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: var_funnnel_v8i16:			; AVX512VL-LABEL: var_funnnel_v8i16:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VL-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VL-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VL-NEXT: vpunpckhwd {{.*#+}} xmm2 = xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]			; AVX512VL-NEXT: vpunpckhwd {{.*#+}} xmm2 = xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
	; AVX512VL-NEXT: vpunpckhwd {{.*#+}} xmm3 = xmm0[4,4,5,5,6,6,7,7]			; AVX512VL-NEXT: vpunpckhwd {{.*#+}} xmm3 = xmm0[4,4,5,5,6,6,7,7]
	; AVX512VL-NEXT: vpsllvd %xmm2, %xmm3, %xmm2			; AVX512VL-NEXT: vpsllvd %xmm2, %xmm3, %xmm2
	; AVX512VL-NEXT: vpsrld $16, %xmm2, %xmm2			; AVX512VL-NEXT: vpsrld $16, %xmm2, %xmm2
	; AVX512VL-NEXT: vpunpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]			; AVX512VL-NEXT: vpunpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero
	; AVX512VL-NEXT: vpsllvd %xmm1, %xmm0, %xmm0			; AVX512VL-NEXT: vpsllvd %xmm1, %xmm0, %xmm0
	[... 10 lines not shown ...]
	; AVX512BW-NEXT: vpsubw %xmm1, %xmm3, %xmm1			; AVX512BW-NEXT: vpsubw %xmm1, %xmm3, %xmm1
	; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpor %xmm0, %xmm2, %xmm0			; AVX512BW-NEXT: vpor %xmm0, %xmm2, %xmm0
	; AVX512BW-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_funnnel_v8i16:			; AVX512VLBW-LABEL: var_funnnel_v8i16:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VLBW-NEXT: vpsllvw %xmm1, %xmm0, %xmm2			; AVX512VLBW-NEXT: vpsllvw %xmm1, %xmm0, %xmm2
	; AVX512VLBW-NEXT: vmovdqa {{.*#+}} xmm3 = [16,16,16,16,16,16,16,16]			; AVX512VLBW-NEXT: vmovdqa {{.*#+}} xmm3 = [16,16,16,16,16,16,16,16]
	; AVX512VLBW-NEXT: vpsubw %xmm1, %xmm3, %xmm1			; AVX512VLBW-NEXT: vpsubw %xmm1, %xmm3, %xmm1
	; AVX512VLBW-NEXT: vpsrlvw %xmm1, %xmm0, %xmm0			; AVX512VLBW-NEXT: vpsrlvw %xmm1, %xmm0, %xmm0
	; AVX512VLBW-NEXT: vpor %xmm0, %xmm2, %xmm0			; AVX512VLBW-NEXT: vpor %xmm0, %xmm2, %xmm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VBMI2-LABEL: var_funnnel_v8i16:			; AVX512VBMI2-LABEL: var_funnnel_v8i16:
	[... 158 lines not shown ...]
	; AVX512F-NEXT: vzeroupper			; AVX512F-NEXT: vzeroupper
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: var_funnnel_v16i8:			; AVX512VL-LABEL: var_funnnel_v16i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero,xmm0[8],zero,zero,zero,xmm0[9],zero,zero,zero,xmm0[10],zero,zero,zero,xmm0[11],zero,zero,zero,xmm0[12],zero,zero,zero,xmm0[13],zero,zero,zero,xmm0[14],zero,zero,zero,xmm0[15],zero,zero,zero			; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero,xmm0[8],zero,zero,zero,xmm0[9],zero,zero,zero,xmm0[10],zero,zero,zero,xmm0[11],zero,zero,zero,xmm0[12],zero,zero,zero,xmm0[13],zero,zero,zero,xmm0[14],zero,zero,zero,xmm0[15],zero,zero,zero
	; AVX512VL-NEXT: vpslld $8, %zmm0, %zmm2			; AVX512VL-NEXT: vpslld $8, %zmm0, %zmm2
	; AVX512VL-NEXT: vpord %zmm2, %zmm0, %zmm0			; AVX512VL-NEXT: vpord %zmm2, %zmm0, %zmm0
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero,xmm1[4],zero,zero,zero,xmm1[5],zero,zero,zero,xmm1[6],zero,zero,zero,xmm1[7],zero,zero,zero,xmm1[8],zero,zero,zero,xmm1[9],zero,zero,zero,xmm1[10],zero,zero,zero,xmm1[11],zero,zero,zero,xmm1[12],zero,zero,zero,xmm1[13],zero,zero,zero,xmm1[14],zero,zero,zero,xmm1[15],zero,zero,zero			; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero,xmm1[4],zero,zero,zero,xmm1[5],zero,zero,zero,xmm1[6],zero,zero,zero,xmm1[7],zero,zero,zero,xmm1[8],zero,zero,zero,xmm1[9],zero,zero,zero,xmm1[10],zero,zero,zero,xmm1[11],zero,zero,zero,xmm1[12],zero,zero,zero,xmm1[13],zero,zero,zero,xmm1[14],zero,zero,zero,xmm1[15],zero,zero,zero
	; AVX512VL-NEXT: vpsllvd %zmm1, %zmm0, %zmm0			; AVX512VL-NEXT: vpsllvd %zmm1, %zmm0, %zmm0
	; AVX512VL-NEXT: vpsrld $8, %zmm0, %zmm0			; AVX512VL-NEXT: vpsrld $8, %zmm0, %zmm0
	; AVX512VL-NEXT: vpmovdb %zmm0, %xmm0			; AVX512VL-NEXT: vpmovdb %zmm0, %xmm0
	; AVX512VL-NEXT: vzeroupper			; AVX512VL-NEXT: vzeroupper
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_funnnel_v16i8:			; AVX512BW-LABEL: var_funnnel_v16i8:
	[... 9 lines not shown ...]
	; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpsrlw $8, %xmm0, %xmm0			; AVX512BW-NEXT: vpsrlw $8, %xmm0, %xmm0
	; AVX512BW-NEXT: vpackuswb %xmm2, %xmm0, %xmm0			; AVX512BW-NEXT: vpackuswb %xmm2, %xmm0, %xmm0
	; AVX512BW-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_funnnel_v16i8:			; AVX512VLBW-LABEL: var_funnnel_v16i8:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} xmm2 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} xmm2 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} xmm3 = xmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} xmm3 = xmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]
	; AVX512VLBW-NEXT: vpsllvw %xmm2, %xmm3, %xmm2			; AVX512VLBW-NEXT: vpsllvw %xmm2, %xmm3, %xmm2
	; AVX512VLBW-NEXT: vpsrlw $8, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpsrlw $8, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]			; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
	; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero			; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero
	; AVX512VLBW-NEXT: vpsllvw %xmm1, %xmm0, %xmm0			; AVX512VLBW-NEXT: vpsllvw %xmm1, %xmm0, %xmm0
	[... 14 lines not shown ...]
	; AVX512VBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512VBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: vpsrlw $8, %xmm0, %xmm0			; AVX512VBMI2-NEXT: vpsrlw $8, %xmm0, %xmm0
	; AVX512VBMI2-NEXT: vpackuswb %xmm2, %xmm0, %xmm0			; AVX512VBMI2-NEXT: vpackuswb %xmm2, %xmm0, %xmm0
	; AVX512VBMI2-NEXT: vzeroupper			; AVX512VBMI2-NEXT: vzeroupper
	; AVX512VBMI2-NEXT: retq			; AVX512VBMI2-NEXT: retq
	;			;
	; AVX512VLVBMI2-LABEL: var_funnnel_v16i8:			; AVX512VLVBMI2-LABEL: var_funnnel_v16i8:
	; AVX512VLVBMI2: # %bb.0:			; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VLVBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VLVBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLVBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} xmm2 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} xmm2 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} xmm3 = xmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} xmm3 = xmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]
	; AVX512VLVBMI2-NEXT: vpsllvw %xmm2, %xmm3, %xmm2			; AVX512VLVBMI2-NEXT: vpsllvw %xmm2, %xmm3, %xmm2
	; AVX512VLVBMI2-NEXT: vpsrlw $8, %xmm2, %xmm2			; AVX512VLVBMI2-NEXT: vpsrlw $8, %xmm2, %xmm2
	; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]			; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
	; AVX512VLVBMI2-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero			; AVX512VLVBMI2-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero
	; AVX512VLVBMI2-NEXT: vpsllvw %xmm1, %xmm0, %xmm0			; AVX512VLVBMI2-NEXT: vpsllvw %xmm1, %xmm0, %xmm0
	[... 1,234 lines not shown ...]

llvm/test/CodeGen/X86/vector-fshl-rot-256.ll

	[... 294 lines not shown ...]
	; AVX512F-NEXT: vpunpcklwd {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,8,8,9,9,10,10,11,11]			; AVX512F-NEXT: vpunpcklwd {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,8,8,9,9,10,10,11,11]
	; AVX512F-NEXT: vpsllvd %ymm1, %ymm0, %ymm0			; AVX512F-NEXT: vpsllvd %ymm1, %ymm0, %ymm0
	; AVX512F-NEXT: vpsrld $16, %ymm0, %ymm0			; AVX512F-NEXT: vpsrld $16, %ymm0, %ymm0
	; AVX512F-NEXT: vpackusdw %ymm3, %ymm0, %ymm0			; AVX512F-NEXT: vpackusdw %ymm3, %ymm0, %ymm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: var_funnnel_v16i16:			; AVX512VL-LABEL: var_funnnel_v16i16:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512VL-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VL-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VL-NEXT: vpunpckhwd {{.*#+}} ymm3 = ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15]			; AVX512VL-NEXT: vpunpckhwd {{.*#+}} ymm3 = ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15]
	; AVX512VL-NEXT: vpunpckhwd {{.*#+}} ymm4 = ymm0[4,4,5,5,6,6,7,7,12,12,13,13,14,14,15,15]			; AVX512VL-NEXT: vpunpckhwd {{.*#+}} ymm4 = ymm0[4,4,5,5,6,6,7,7,12,12,13,13,14,14,15,15]
	; AVX512VL-NEXT: vpsllvd %ymm3, %ymm4, %ymm3			; AVX512VL-NEXT: vpsllvd %ymm3, %ymm4, %ymm3
	; AVX512VL-NEXT: vpsrld $16, %ymm3, %ymm3			; AVX512VL-NEXT: vpsrld $16, %ymm3, %ymm3
	; AVX512VL-NEXT: vpunpcklwd {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11]			; AVX512VL-NEXT: vpunpcklwd {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11]
	; AVX512VL-NEXT: vpunpcklwd {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,8,8,9,9,10,10,11,11]			; AVX512VL-NEXT: vpunpcklwd {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,8,8,9,9,10,10,11,11]
	; AVX512VL-NEXT: vpsllvd %ymm1, %ymm0, %ymm0			; AVX512VL-NEXT: vpsllvd %ymm1, %ymm0, %ymm0
	[... 9 lines not shown ...]
	; AVX512BW-NEXT: vmovdqa {{.*#+}} ymm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; AVX512BW-NEXT: vmovdqa {{.*#+}} ymm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	; AVX512BW-NEXT: vpsubw %ymm1, %ymm3, %ymm1			; AVX512BW-NEXT: vpsubw %ymm1, %ymm3, %ymm1
	; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpor %ymm0, %ymm2, %ymm0			; AVX512BW-NEXT: vpor %ymm0, %ymm2, %ymm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_funnnel_v16i16:			; AVX512VLBW-LABEL: var_funnnel_v16i16:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512VLBW-NEXT: vpsllvw %ymm1, %ymm0, %ymm2			; AVX512VLBW-NEXT: vpsllvw %ymm1, %ymm0, %ymm2
	; AVX512VLBW-NEXT: vmovdqa {{.*#+}} ymm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; AVX512VLBW-NEXT: vmovdqa {{.*#+}} ymm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	; AVX512VLBW-NEXT: vpsubw %ymm1, %ymm3, %ymm1			; AVX512VLBW-NEXT: vpsubw %ymm1, %ymm3, %ymm1
	; AVX512VLBW-NEXT: vpsrlvw %ymm1, %ymm0, %ymm0			; AVX512VLBW-NEXT: vpsrlvw %ymm1, %ymm0, %ymm0
	; AVX512VLBW-NEXT: vpor %ymm0, %ymm2, %ymm0			; AVX512VLBW-NEXT: vpor %ymm0, %ymm2, %ymm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VBMI2-LABEL: var_funnnel_v16i16:			; AVX512VBMI2-LABEL: var_funnnel_v16i16:
	[... 134 lines not shown ...]
	; AVX512VL-NEXT: vpblendvb %ymm1, %ymm3, %ymm0, %ymm0			; AVX512VL-NEXT: vpblendvb %ymm1, %ymm3, %ymm0, %ymm0
	; AVX512VL-NEXT: vpsllw $2, %ymm0, %ymm2			; AVX512VL-NEXT: vpsllw $2, %ymm0, %ymm2
	; AVX512VL-NEXT: vpsrlw $6, %ymm0, %ymm3			; AVX512VL-NEXT: vpsrlw $6, %ymm0, %ymm3
	; AVX512VL-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm2, %ymm3			; AVX512VL-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm2, %ymm3
	; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1			; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1
	; AVX512VL-NEXT: vpblendvb %ymm1, %ymm3, %ymm0, %ymm0			; AVX512VL-NEXT: vpblendvb %ymm1, %ymm3, %ymm0, %ymm0
	; AVX512VL-NEXT: vpsrlw $7, %ymm0, %ymm2			; AVX512VL-NEXT: vpsrlw $7, %ymm0, %ymm2
	; AVX512VL-NEXT: vpaddb %ymm0, %ymm0, %ymm3			; AVX512VL-NEXT: vpaddb %ymm0, %ymm0, %ymm3
	; AVX512VL-NEXT: vpternlogq $248, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm3			; AVX512VL-NEXT: vpternlogd $248, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm3
	; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1			; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1
	; AVX512VL-NEXT: vpblendvb %ymm1, %ymm3, %ymm0, %ymm0			; AVX512VL-NEXT: vpblendvb %ymm1, %ymm3, %ymm0, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_funnnel_v32i8:			; AVX512BW-LABEL: var_funnnel_v32i8:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpunpckhbw {{.*#+}} ymm2 = ymm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31]			; AVX512BW-NEXT: vpunpckhbw {{.*#+}} ymm2 = ymm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31]
	; AVX512BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1
	; AVX512BW-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX512BW-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX512BW-NEXT: vpunpckhbw {{.*#+}} ymm4 = ymm1[8],ymm3[8],ymm1[9],ymm3[9],ymm1[10],ymm3[10],ymm1[11],ymm3[11],ymm1[12],ymm3[12],ymm1[13],ymm3[13],ymm1[14],ymm3[14],ymm1[15],ymm3[15],ymm1[24],ymm3[24],ymm1[25],ymm3[25],ymm1[26],ymm3[26],ymm1[27],ymm3[27],ymm1[28],ymm3[28],ymm1[29],ymm3[29],ymm1[30],ymm3[30],ymm1[31],ymm3[31]			; AVX512BW-NEXT: vpunpckhbw {{.*#+}} ymm4 = ymm1[8],ymm3[8],ymm1[9],ymm3[9],ymm1[10],ymm3[10],ymm1[11],ymm3[11],ymm1[12],ymm3[12],ymm1[13],ymm3[13],ymm1[14],ymm3[14],ymm1[15],ymm3[15],ymm1[24],ymm3[24],ymm1[25],ymm3[25],ymm1[26],ymm3[26],ymm1[27],ymm3[27],ymm1[28],ymm3[28],ymm1[29],ymm3[29],ymm1[30],ymm3[30],ymm1[31],ymm3[31]
	; AVX512BW-NEXT: vpsllvw %zmm4, %zmm2, %zmm2			; AVX512BW-NEXT: vpsllvw %zmm4, %zmm2, %zmm2
	; AVX512BW-NEXT: vpsrlw $8, %ymm2, %ymm2			; AVX512BW-NEXT: vpsrlw $8, %ymm2, %ymm2
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm3[0],ymm1[1],ymm3[1],ymm1[2],ymm3[2],ymm1[3],ymm3[3],ymm1[4],ymm3[4],ymm1[5],ymm3[5],ymm1[6],ymm3[6],ymm1[7],ymm3[7],ymm1[16],ymm3[16],ymm1[17],ymm3[17],ymm1[18],ymm3[18],ymm1[19],ymm3[19],ymm1[20],ymm3[20],ymm1[21],ymm3[21],ymm1[22],ymm3[22],ymm1[23],ymm3[23]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm3[0],ymm1[1],ymm3[1],ymm1[2],ymm3[2],ymm1[3],ymm3[3],ymm1[4],ymm3[4],ymm1[5],ymm3[5],ymm1[6],ymm3[6],ymm1[7],ymm3[7],ymm1[16],ymm3[16],ymm1[17],ymm3[17],ymm1[18],ymm3[18],ymm1[19],ymm3[19],ymm1[20],ymm3[20],ymm1[21],ymm3[21],ymm1[22],ymm3[22],ymm1[23],ymm3[23]
	; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpsrlw $8, %ymm0, %ymm0			; AVX512BW-NEXT: vpsrlw $8, %ymm0, %ymm0
	; AVX512BW-NEXT: vpackuswb %ymm2, %ymm0, %ymm0			; AVX512BW-NEXT: vpackuswb %ymm2, %ymm0, %ymm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_funnnel_v32i8:			; AVX512VLBW-LABEL: var_funnnel_v32i8:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} ymm3 = ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15],ymm1[24],ymm2[24],ymm1[25],ymm2[25],ymm1[26],ymm2[26],ymm1[27],ymm2[27],ymm1[28],ymm2[28],ymm1[29],ymm2[29],ymm1[30],ymm2[30],ymm1[31],ymm2[31]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} ymm3 = ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15],ymm1[24],ymm2[24],ymm1[25],ymm2[25],ymm1[26],ymm2[26],ymm1[27],ymm2[27],ymm1[28],ymm2[28],ymm1[29],ymm2[29],ymm1[30],ymm2[30],ymm1[31],ymm2[31]
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} ymm4 = ymm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} ymm4 = ymm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31]
	; AVX512VLBW-NEXT: vpsllvw %ymm3, %ymm4, %ymm3			; AVX512VLBW-NEXT: vpsllvw %ymm3, %ymm4, %ymm3
	; AVX512VLBW-NEXT: vpsrlw $8, %ymm3, %ymm3			; AVX512VLBW-NEXT: vpsrlw $8, %ymm3, %ymm3
	; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[16],ymm2[16],ymm1[17],ymm2[17],ymm1[18],ymm2[18],ymm1[19],ymm2[19],ymm1[20],ymm2[20],ymm1[21],ymm2[21],ymm1[22],ymm2[22],ymm1[23],ymm2[23]			; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[16],ymm2[16],ymm1[17],ymm2[17],ymm1[18],ymm2[18],ymm1[19],ymm2[19],ymm1[20],ymm2[20],ymm1[21],ymm2[21],ymm1[22],ymm2[22],ymm1[23],ymm2[23]
	; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]			; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]
	; AVX512VLBW-NEXT: vpsllvw %ymm1, %ymm0, %ymm0			; AVX512VLBW-NEXT: vpsllvw %ymm1, %ymm0, %ymm0
	Show All 13 Lines
	; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm3[0],ymm1[1],ymm3[1],ymm1[2],ymm3[2],ymm1[3],ymm3[3],ymm1[4],ymm3[4],ymm1[5],ymm3[5],ymm1[6],ymm3[6],ymm1[7],ymm3[7],ymm1[16],ymm3[16],ymm1[17],ymm3[17],ymm1[18],ymm3[18],ymm1[19],ymm3[19],ymm1[20],ymm3[20],ymm1[21],ymm3[21],ymm1[22],ymm3[22],ymm1[23],ymm3[23]			; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm3[0],ymm1[1],ymm3[1],ymm1[2],ymm3[2],ymm1[3],ymm3[3],ymm1[4],ymm3[4],ymm1[5],ymm3[5],ymm1[6],ymm3[6],ymm1[7],ymm3[7],ymm1[16],ymm3[16],ymm1[17],ymm3[17],ymm1[18],ymm3[18],ymm1[19],ymm3[19],ymm1[20],ymm3[20],ymm1[21],ymm3[21],ymm1[22],ymm3[22],ymm1[23],ymm3[23]
	; AVX512VBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512VBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: vpsrlw $8, %ymm0, %ymm0			; AVX512VBMI2-NEXT: vpsrlw $8, %ymm0, %ymm0
	; AVX512VBMI2-NEXT: vpackuswb %ymm2, %ymm0, %ymm0			; AVX512VBMI2-NEXT: vpackuswb %ymm2, %ymm0, %ymm0
	; AVX512VBMI2-NEXT: retq			; AVX512VBMI2-NEXT: retq
	;			;
	; AVX512VLVBMI2-LABEL: var_funnnel_v32i8:			; AVX512VLVBMI2-LABEL: var_funnnel_v32i8:
	; AVX512VLVBMI2: # %bb.0:			; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512VLVBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512VLVBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLVBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} ymm3 = ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15],ymm1[24],ymm2[24],ymm1[25],ymm2[25],ymm1[26],ymm2[26],ymm1[27],ymm2[27],ymm1[28],ymm2[28],ymm1[29],ymm2[29],ymm1[30],ymm2[30],ymm1[31],ymm2[31]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} ymm3 = ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15],ymm1[24],ymm2[24],ymm1[25],ymm2[25],ymm1[26],ymm2[26],ymm1[27],ymm2[27],ymm1[28],ymm2[28],ymm1[29],ymm2[29],ymm1[30],ymm2[30],ymm1[31],ymm2[31]
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} ymm4 = ymm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} ymm4 = ymm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31]
	; AVX512VLVBMI2-NEXT: vpsllvw %ymm3, %ymm4, %ymm3			; AVX512VLVBMI2-NEXT: vpsllvw %ymm3, %ymm4, %ymm3
	; AVX512VLVBMI2-NEXT: vpsrlw $8, %ymm3, %ymm3			; AVX512VLVBMI2-NEXT: vpsrlw $8, %ymm3, %ymm3
	; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[16],ymm2[16],ymm1[17],ymm2[17],ymm1[18],ymm2[18],ymm1[19],ymm2[19],ymm1[20],ymm2[20],ymm1[21],ymm2[21],ymm1[22],ymm2[22],ymm1[23],ymm2[23]			; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[16],ymm2[16],ymm1[17],ymm2[17],ymm1[18],ymm2[18],ymm1[19],ymm2[19],ymm1[20],ymm2[20],ymm1[21],ymm2[21],ymm1[22],ymm2[22],ymm1[23],ymm2[23]
	; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]			; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]
	; AVX512VLVBMI2-NEXT: vpsllvw %ymm1, %ymm0, %ymm0			; AVX512VLVBMI2-NEXT: vpsllvw %ymm1, %ymm0, %ymm0

llvm/test/CodeGen/X86/vector-fshl-rot-512.ll

	; AVX512VL-NEXT: vpsllvd %ymm1, %ymm0, %ymm0			; AVX512VL-NEXT: vpsllvd %ymm1, %ymm0, %ymm0
	; AVX512VL-NEXT: vpsrld $16, %ymm0, %ymm0			; AVX512VL-NEXT: vpsrld $16, %ymm0, %ymm0
	; AVX512VL-NEXT: vpackusdw %ymm3, %ymm0, %ymm0			; AVX512VL-NEXT: vpackusdw %ymm3, %ymm0, %ymm0
	; AVX512VL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; AVX512VL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_funnnel_v32i16:			; AVX512BW-LABEL: var_funnnel_v32i16:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm2			; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm2
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	; AVX512BW-NEXT: vpsubw %zmm1, %zmm3, %zmm1			; AVX512BW-NEXT: vpsubw %zmm1, %zmm3, %zmm1
	; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vporq %zmm0, %zmm2, %zmm0			; AVX512BW-NEXT: vporq %zmm0, %zmm2, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_funnnel_v32i16:			; AVX512VLBW-LABEL: var_funnnel_v32i16:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512VLBW-NEXT: vpsllvw %zmm1, %zmm0, %zmm2			; AVX512VLBW-NEXT: vpsllvw %zmm1, %zmm0, %zmm2
	; AVX512VLBW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; AVX512VLBW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	; AVX512VLBW-NEXT: vpsubw %zmm1, %zmm3, %zmm1			; AVX512VLBW-NEXT: vpsubw %zmm1, %zmm3, %zmm1
	; AVX512VLBW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vporq %zmm0, %zmm2, %zmm0			; AVX512VLBW-NEXT: vporq %zmm0, %zmm2, %zmm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VBMI2-LABEL: var_funnnel_v32i16:			; AVX512VBMI2-LABEL: var_funnnel_v32i16:
	▲ Show 20 Lines • Show All 89 Lines • ▼ Show 20 Lines
	; AVX512VL-NEXT: vpternlogq $248, %ymm8, %ymm3, %ymm4			; AVX512VL-NEXT: vpternlogq $248, %ymm8, %ymm3, %ymm4
	; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1			; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1
	; AVX512VL-NEXT: vpblendvb %ymm1, %ymm4, %ymm0, %ymm0			; AVX512VL-NEXT: vpblendvb %ymm1, %ymm4, %ymm0, %ymm0
	; AVX512VL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; AVX512VL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_funnnel_v64i8:			; AVX512BW-LABEL: var_funnnel_v64i8:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512BW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512BW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]			; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]
	; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]			; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]
	; AVX512BW-NEXT: vpsllvw %zmm3, %zmm4, %zmm3			; AVX512BW-NEXT: vpsllvw %zmm3, %zmm4, %zmm3
	; AVX512BW-NEXT: vpsrlw $8, %zmm3, %zmm3			; AVX512BW-NEXT: vpsrlw $8, %zmm3, %zmm3
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]
	; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpsrlw $8, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlw $8, %zmm0, %zmm0
	; AVX512BW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0			; AVX512BW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_funnnel_v64i8:			; AVX512VLBW-LABEL: var_funnnel_v64i8:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]
	; AVX512VLBW-NEXT: vpsllvw %zmm3, %zmm4, %zmm3			; AVX512VLBW-NEXT: vpsllvw %zmm3, %zmm4, %zmm3
	; AVX512VLBW-NEXT: vpsrlw $8, %zmm3, %zmm3			; AVX512VLBW-NEXT: vpsrlw $8, %zmm3, %zmm3
	; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]			; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]
	; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]			; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]
	; AVX512VLBW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vpsrlw $8, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpsrlw $8, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VBMI2-LABEL: var_funnnel_v64i8:			; AVX512VBMI2-LABEL: var_funnnel_v64i8:
	; AVX512VBMI2: # %bb.0:			; AVX512VBMI2: # %bb.0:
	; AVX512VBMI2-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512VBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512VBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]			; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]
	; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]			; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]
	; AVX512VBMI2-NEXT: vpsllvw %zmm3, %zmm4, %zmm3			; AVX512VBMI2-NEXT: vpsllvw %zmm3, %zmm4, %zmm3
	; AVX512VBMI2-NEXT: vpsrlw $8, %zmm3, %zmm3			; AVX512VBMI2-NEXT: vpsrlw $8, %zmm3, %zmm3
	; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]			; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]
	; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]			; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]
	; AVX512VBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512VBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: vpsrlw $8, %zmm0, %zmm0			; AVX512VBMI2-NEXT: vpsrlw $8, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: vpackuswb %zmm3, %zmm0, %zmm0			; AVX512VBMI2-NEXT: vpackuswb %zmm3, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: retq			; AVX512VBMI2-NEXT: retq
	;			;
	; AVX512VLVBMI2-LABEL: var_funnnel_v64i8:			; AVX512VLVBMI2-LABEL: var_funnnel_v64i8:
	; AVX512VLVBMI2: # %bb.0:			; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512VLVBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512VLVBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLVBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]
	; AVX512VLVBMI2-NEXT: vpsllvw %zmm3, %zmm4, %zmm3			; AVX512VLVBMI2-NEXT: vpsllvw %zmm3, %zmm4, %zmm3
	; AVX512VLVBMI2-NEXT: vpsrlw $8, %zmm3, %zmm3			; AVX512VLVBMI2-NEXT: vpsrlw $8, %zmm3, %zmm3
	; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]			; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]
	; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]			; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]
	; AVX512VLVBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512VLVBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0

llvm/test/CodeGen/X86/vector-fshr-128.ll

	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: var_funnnel_v8i16:			; AVX512VL-LABEL: var_funnnel_v8i16:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} ymm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} ymm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX512VL-NEXT: vpslld $16, %ymm0, %ymm0			; AVX512VL-NEXT: vpslld $16, %ymm0, %ymm0
	; AVX512VL-NEXT: vpblendw {{.*#+}} ymm0 = ymm1[0],ymm0[1],ymm1[2],ymm0[3],ymm1[4],ymm0[5],ymm1[6],ymm0[7],ymm1[8],ymm0[9],ymm1[10],ymm0[11],ymm1[12],ymm0[13],ymm1[14],ymm0[15]			; AVX512VL-NEXT: vpblendw {{.*#+}} ymm0 = ymm1[0],ymm0[1],ymm1[2],ymm0[3],ymm1[4],ymm0[5],ymm1[6],ymm0[7],ymm1[8],ymm0[9],ymm1[10],ymm0[11],ymm1[12],ymm0[13],ymm1[14],ymm0[15]
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm1			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm2, %xmm1
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} ymm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} ymm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero
	; AVX512VL-NEXT: vpsrlvd %ymm1, %ymm0, %ymm0			; AVX512VL-NEXT: vpsrlvd %ymm1, %ymm0, %ymm0
	; AVX512VL-NEXT: vpmovdw %ymm0, %xmm0			; AVX512VL-NEXT: vpmovdw %ymm0, %xmm0
	; AVX512VL-NEXT: vzeroupper			; AVX512VL-NEXT: vzeroupper
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_funnnel_v8i16:			; AVX512BW-LABEL: var_funnnel_v8i16:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	▲ Show 20 Lines • Show All 314 Lines • ▼ Show 20 Lines
	; AVX512VBMI2-NEXT: retq			; AVX512VBMI2-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_funnnel_v16i8:			; AVX512VLBW-LABEL: var_funnnel_v16i8:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} ymm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero,xmm1[8],zero,xmm1[9],zero,xmm1[10],zero,xmm1[11],zero,xmm1[12],zero,xmm1[13],zero,xmm1[14],zero,xmm1[15],zero			; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} ymm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero,xmm1[8],zero,xmm1[9],zero,xmm1[10],zero,xmm1[11],zero,xmm1[12],zero,xmm1[13],zero,xmm1[14],zero,xmm1[15],zero
	; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero			; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero
	; AVX512VLBW-NEXT: vpsllw $8, %ymm0, %ymm0			; AVX512VLBW-NEXT: vpsllw $8, %ymm0, %ymm0
	; AVX512VLBW-NEXT: vpor %ymm1, %ymm0, %ymm0			; AVX512VLBW-NEXT: vpor %ymm1, %ymm0, %ymm0
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm2, %xmm1
	; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} ymm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero,xmm1[8],zero,xmm1[9],zero,xmm1[10],zero,xmm1[11],zero,xmm1[12],zero,xmm1[13],zero,xmm1[14],zero,xmm1[15],zero			; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} ymm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero,xmm1[8],zero,xmm1[9],zero,xmm1[10],zero,xmm1[11],zero,xmm1[12],zero,xmm1[13],zero,xmm1[14],zero,xmm1[15],zero
	; AVX512VLBW-NEXT: vpsrlvw %ymm1, %ymm0, %ymm0			; AVX512VLBW-NEXT: vpsrlvw %ymm1, %ymm0, %ymm0
	; AVX512VLBW-NEXT: vpmovwb %ymm0, %xmm0			; AVX512VLBW-NEXT: vpmovwb %ymm0, %xmm0
	; AVX512VLBW-NEXT: vzeroupper			; AVX512VLBW-NEXT: vzeroupper
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VLVBMI2-LABEL: var_funnnel_v16i8:			; AVX512VLVBMI2-LABEL: var_funnnel_v16i8:
	; AVX512VLVBMI2: # %bb.0:			; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: # kill: def $xmm1 killed $xmm1 def $ymm1			; AVX512VLVBMI2-NEXT: # kill: def $xmm1 killed $xmm1 def $ymm1
	; AVX512VLVBMI2-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0			; AVX512VLVBMI2-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
	; AVX512VLVBMI2-NEXT: vmovdqa {{.*#+}} ymm3 = [0,32,1,33,2,34,3,35,4,36,5,37,6,38,7,39,8,40,9,41,10,42,11,43,12,44,13,45,14,46,15,47]			; AVX512VLVBMI2-NEXT: vmovdqa {{.*#+}} ymm3 = [0,32,1,33,2,34,3,35,4,36,5,37,6,38,7,39,8,40,9,41,10,42,11,43,12,44,13,45,14,46,15,47]
	; AVX512VLVBMI2-NEXT: vpermi2b %ymm0, %ymm1, %ymm3			; AVX512VLVBMI2-NEXT: vpermi2b %ymm0, %ymm1, %ymm3
	; AVX512VLVBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm0			; AVX512VLVBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm2, %xmm0
	; AVX512VLVBMI2-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero			; AVX512VLVBMI2-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero
	; AVX512VLVBMI2-NEXT: vpsrlvw %ymm0, %ymm3, %ymm0			; AVX512VLVBMI2-NEXT: vpsrlvw %ymm0, %ymm3, %ymm0
	; AVX512VLVBMI2-NEXT: vpmovwb %ymm0, %xmm0			; AVX512VLVBMI2-NEXT: vpmovwb %ymm0, %xmm0
	; AVX512VLVBMI2-NEXT: vzeroupper			; AVX512VLVBMI2-NEXT: vzeroupper
	; AVX512VLVBMI2-NEXT: retq			; AVX512VLVBMI2-NEXT: retq
	;			;
	; XOP-LABEL: var_funnnel_v16i8:			; XOP-LABEL: var_funnnel_v16i8:
	; XOP: # %bb.0:			; XOP: # %bb.0:

llvm/test/CodeGen/X86/vector-fshr-256.ll

	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: var_funnnel_v16i16:			; AVX512VL-LABEL: var_funnnel_v16i16:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero
	; AVX512VL-NEXT: vpslld $16, %zmm0, %zmm0			; AVX512VL-NEXT: vpslld $16, %zmm0, %zmm0
	; AVX512VL-NEXT: vpord %zmm1, %zmm0, %zmm0			; AVX512VL-NEXT: vpord %zmm1, %zmm0, %zmm0
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm1			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm1
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512VL-NEXT: vpsrlvd %zmm1, %zmm0, %zmm0			; AVX512VL-NEXT: vpsrlvd %zmm1, %zmm0, %zmm0
	; AVX512VL-NEXT: vpmovdw %zmm0, %ymm0			; AVX512VL-NEXT: vpmovdw %zmm0, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_funnnel_v16i16:			; AVX512BW-LABEL: var_funnnel_v16i16:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: # kill: def $ymm1 killed $ymm1 def $zmm1			; AVX512BW-NEXT: # kill: def $ymm1 killed $ymm1 def $zmm1
	(214 lines not shown)
	;			;
	; AVX512VL-LABEL: var_funnnel_v32i8:			; AVX512VL-LABEL: var_funnnel_v32i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm3 = [7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7]			; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm3 = [7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7]
	; AVX512VL-NEXT: vpand %ymm3, %ymm2, %ymm4			; AVX512VL-NEXT: vpand %ymm3, %ymm2, %ymm4
	; AVX512VL-NEXT: vpsllw $5, %ymm4, %ymm4			; AVX512VL-NEXT: vpsllw $5, %ymm4, %ymm4
	; AVX512VL-NEXT: vpaddb %ymm4, %ymm4, %ymm5			; AVX512VL-NEXT: vpaddb %ymm4, %ymm4, %ymm5
	; AVX512VL-NEXT: vpsrlw $4, %ymm1, %ymm6			; AVX512VL-NEXT: vpsrlw $4, %ymm1, %ymm6
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm6, %ymm6			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm6, %ymm6
	; AVX512VL-NEXT: vpblendvb %ymm4, %ymm6, %ymm1, %ymm1			; AVX512VL-NEXT: vpblendvb %ymm4, %ymm6, %ymm1, %ymm1
	; AVX512VL-NEXT: vpsrlw $2, %ymm1, %ymm4			; AVX512VL-NEXT: vpsrlw $2, %ymm1, %ymm4
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm4, %ymm4			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm4, %ymm4
	; AVX512VL-NEXT: vpblendvb %ymm5, %ymm4, %ymm1, %ymm1			; AVX512VL-NEXT: vpblendvb %ymm5, %ymm4, %ymm1, %ymm1
	; AVX512VL-NEXT: vpsrlw $1, %ymm1, %ymm4			; AVX512VL-NEXT: vpsrlw $1, %ymm1, %ymm4
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm4, %ymm4			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm4, %ymm4
	; AVX512VL-NEXT: vpaddb %ymm5, %ymm5, %ymm5			; AVX512VL-NEXT: vpaddb %ymm5, %ymm5, %ymm5
	; AVX512VL-NEXT: vpblendvb %ymm5, %ymm4, %ymm1, %ymm1			; AVX512VL-NEXT: vpblendvb %ymm5, %ymm4, %ymm1, %ymm1
	; AVX512VL-NEXT: vpandn %ymm3, %ymm2, %ymm2			; AVX512VL-NEXT: vpandn %ymm3, %ymm2, %ymm2
	; AVX512VL-NEXT: vpsllw $5, %ymm2, %ymm2			; AVX512VL-NEXT: vpsllw $5, %ymm2, %ymm2
	; AVX512VL-NEXT: vpaddb %ymm2, %ymm2, %ymm3			; AVX512VL-NEXT: vpaddb %ymm2, %ymm2, %ymm3
	; AVX512VL-NEXT: vpaddb %ymm0, %ymm0, %ymm0			; AVX512VL-NEXT: vpaddb %ymm0, %ymm0, %ymm0
	; AVX512VL-NEXT: vpsllw $4, %ymm0, %ymm4			; AVX512VL-NEXT: vpsllw $4, %ymm0, %ymm4
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm4, %ymm4			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm4, %ymm4
	; AVX512VL-NEXT: vpblendvb %ymm2, %ymm4, %ymm0, %ymm0			; AVX512VL-NEXT: vpblendvb %ymm2, %ymm4, %ymm0, %ymm0
	; AVX512VL-NEXT: vpsllw $2, %ymm0, %ymm2			; AVX512VL-NEXT: vpsllw $2, %ymm0, %ymm2
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm2
	; AVX512VL-NEXT: vpblendvb %ymm3, %ymm2, %ymm0, %ymm0			; AVX512VL-NEXT: vpblendvb %ymm3, %ymm2, %ymm0, %ymm0
	; AVX512VL-NEXT: vpaddb %ymm0, %ymm0, %ymm2			; AVX512VL-NEXT: vpaddb %ymm0, %ymm0, %ymm2
	; AVX512VL-NEXT: vpaddb %ymm3, %ymm3, %ymm3			; AVX512VL-NEXT: vpaddb %ymm3, %ymm3, %ymm3
	; AVX512VL-NEXT: vpblendvb %ymm3, %ymm2, %ymm0, %ymm0			; AVX512VL-NEXT: vpblendvb %ymm3, %ymm2, %ymm0, %ymm0
	; AVX512VL-NEXT: vpor %ymm1, %ymm0, %ymm0			; AVX512VL-NEXT: vpor %ymm1, %ymm0, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_funnnel_v32i8:			; AVX512BW-LABEL: var_funnnel_v32i8:
	(21 lines not shown)
	; AVX512VBMI2-NEXT: retq			; AVX512VBMI2-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_funnnel_v32i8:			; AVX512VLBW-LABEL: var_funnnel_v32i8:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero,ymm1[16],zero,ymm1[17],zero,ymm1[18],zero,ymm1[19],zero,ymm1[20],zero,ymm1[21],zero,ymm1[22],zero,ymm1[23],zero,ymm1[24],zero,ymm1[25],zero,ymm1[26],zero,ymm1[27],zero,ymm1[28],zero,ymm1[29],zero,ymm1[30],zero,ymm1[31],zero			; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero,ymm1[16],zero,ymm1[17],zero,ymm1[18],zero,ymm1[19],zero,ymm1[20],zero,ymm1[21],zero,ymm1[22],zero,ymm1[23],zero,ymm1[24],zero,ymm1[25],zero,ymm1[26],zero,ymm1[27],zero,ymm1[28],zero,ymm1[29],zero,ymm1[30],zero,ymm1[31],zero
	; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero			; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero
	; AVX512VLBW-NEXT: vpsllw $8, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpsllw $8, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vporq %zmm1, %zmm0, %zmm0			; AVX512VLBW-NEXT: vporq %zmm1, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm1
	; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero,ymm1[16],zero,ymm1[17],zero,ymm1[18],zero,ymm1[19],zero,ymm1[20],zero,ymm1[21],zero,ymm1[22],zero,ymm1[23],zero,ymm1[24],zero,ymm1[25],zero,ymm1[26],zero,ymm1[27],zero,ymm1[28],zero,ymm1[29],zero,ymm1[30],zero,ymm1[31],zero			; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero,ymm1[16],zero,ymm1[17],zero,ymm1[18],zero,ymm1[19],zero,ymm1[20],zero,ymm1[21],zero,ymm1[22],zero,ymm1[23],zero,ymm1[24],zero,ymm1[25],zero,ymm1[26],zero,ymm1[27],zero,ymm1[28],zero,ymm1[29],zero,ymm1[30],zero,ymm1[31],zero
	; AVX512VLBW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vpmovwb %zmm0, %ymm0			; AVX512VLBW-NEXT: vpmovwb %zmm0, %ymm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VLVBMI2-LABEL: var_funnnel_v32i8:			; AVX512VLVBMI2-LABEL: var_funnnel_v32i8:
	; AVX512VLVBMI2: # %bb.0:			; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: # kill: def $ymm1 killed $ymm1 def $zmm1			; AVX512VLVBMI2-NEXT: # kill: def $ymm1 killed $ymm1 def $zmm1
	; AVX512VLVBMI2-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0			; AVX512VLVBMI2-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0
	; AVX512VLVBMI2-NEXT: vmovdqa64 {{.*#+}} zmm3 = [0,64,1,65,2,66,3,67,4,68,5,69,6,70,7,71,8,72,9,73,10,74,11,75,12,76,13,77,14,78,15,79,16,80,17,81,18,82,19,83,20,84,21,85,22,86,23,87,24,88,25,89,26,90,27,91,28,92,29,93,30,94,31,95]			; AVX512VLVBMI2-NEXT: vmovdqa64 {{.*#+}} zmm3 = [0,64,1,65,2,66,3,67,4,68,5,69,6,70,7,71,8,72,9,73,10,74,11,75,12,76,13,77,14,78,15,79,16,80,17,81,18,82,19,83,20,84,21,85,22,86,23,87,24,88,25,89,26,90,27,91,28,92,29,93,30,94,31,95]
	; AVX512VLVBMI2-NEXT: vpermi2b %zmm0, %zmm1, %zmm3			; AVX512VLVBMI2-NEXT: vpermi2b %zmm0, %zmm1, %zmm3
	; AVX512VLVBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm0			; AVX512VLVBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm0
	; AVX512VLVBMI2-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero			; AVX512VLVBMI2-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero
	; AVX512VLVBMI2-NEXT: vpsrlvw %zmm0, %zmm3, %zmm0			; AVX512VLVBMI2-NEXT: vpsrlvw %zmm0, %zmm3, %zmm0
	; AVX512VLVBMI2-NEXT: vpmovwb %zmm0, %ymm0			; AVX512VLVBMI2-NEXT: vpmovwb %zmm0, %ymm0
	; AVX512VLVBMI2-NEXT: retq			; AVX512VLVBMI2-NEXT: retq
	;			;
	; XOPAVX1-LABEL: var_funnnel_v32i8:			; XOPAVX1-LABEL: var_funnnel_v32i8:
	; XOPAVX1: # %bb.0:			; XOPAVX1: # %bb.0:
	; XOPAVX1-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2			; XOPAVX1-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2
	(1,420 lines not shown)

llvm/test/CodeGen/X86/vector-fshr-512.ll

	(136 lines not shown)

	define <32 x i16> @var_funnnel_v32i16(<32 x i16> %x, <32 x i16> %y, <32 x i16> %amt) nounwind {			define <32 x i16> @var_funnnel_v32i16(<32 x i16> %x, <32 x i16> %y, <32 x i16> %amt) nounwind {
	; AVX512F-LABEL: var_funnnel_v32i16:			; AVX512F-LABEL: var_funnnel_v32i16:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm3 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm3 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm4 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm4 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero
	; AVX512F-NEXT: vpslld $16, %zmm4, %zmm4			; AVX512F-NEXT: vpslld $16, %zmm4, %zmm4
	; AVX512F-NEXT: vpord %zmm3, %zmm4, %zmm3			; AVX512F-NEXT: vpord %zmm3, %zmm4, %zmm3
	; AVX512F-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512F-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm4 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm4 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero
	; AVX512F-NEXT: vpsrlvd %zmm4, %zmm3, %zmm3			; AVX512F-NEXT: vpsrlvd %zmm4, %zmm3, %zmm3
	; AVX512F-NEXT: vpmovdw %zmm3, %ymm3			; AVX512F-NEXT: vpmovdw %zmm3, %ymm3
	; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm1			; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm1
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm0			; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm0
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero
	; AVX512F-NEXT: vpslld $16, %zmm0, %zmm0			; AVX512F-NEXT: vpslld $16, %zmm0, %zmm0
	; AVX512F-NEXT: vpord %zmm1, %zmm0, %zmm0			; AVX512F-NEXT: vpord %zmm1, %zmm0, %zmm0
	; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm1			; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm1
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512F-NEXT: vpsrlvd %zmm1, %zmm0, %zmm0			; AVX512F-NEXT: vpsrlvd %zmm1, %zmm0, %zmm0
	; AVX512F-NEXT: vpmovdw %zmm0, %ymm0			; AVX512F-NEXT: vpmovdw %zmm0, %ymm0
	; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm3, %zmm0			; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm3, %zmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: var_funnnel_v32i16:			; AVX512VL-LABEL: var_funnnel_v32i16:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm3 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm3 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm4 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm4 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero
	; AVX512VL-NEXT: vpslld $16, %zmm4, %zmm4			; AVX512VL-NEXT: vpslld $16, %zmm4, %zmm4
	; AVX512VL-NEXT: vpord %zmm3, %zmm4, %zmm3			; AVX512VL-NEXT: vpord %zmm3, %zmm4, %zmm3
	; AVX512VL-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm4 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm4 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero
	; AVX512VL-NEXT: vpsrlvd %zmm4, %zmm3, %zmm3			; AVX512VL-NEXT: vpsrlvd %zmm4, %zmm3, %zmm3
	; AVX512VL-NEXT: vpmovdw %zmm3, %ymm3			; AVX512VL-NEXT: vpmovdw %zmm3, %ymm3
	; AVX512VL-NEXT: vextracti64x4 $1, %zmm1, %ymm1			; AVX512VL-NEXT: vextracti64x4 $1, %zmm1, %ymm1
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm0			; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm0
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero
	; AVX512VL-NEXT: vpslld $16, %zmm0, %zmm0			; AVX512VL-NEXT: vpslld $16, %zmm0, %zmm0
	(45 lines not shown)
	define <64 x i8> @var_funnnel_v64i8(<64 x i8> %x, <64 x i8> %y, <64 x i8> %amt) nounwind {			define <64 x i8> @var_funnnel_v64i8(<64 x i8> %x, <64 x i8> %y, <64 x i8> %amt) nounwind {
	; AVX512F-LABEL: var_funnnel_v64i8:			; AVX512F-LABEL: var_funnnel_v64i8:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm3			; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm3
	; AVX512F-NEXT: vpaddb %ymm3, %ymm3, %ymm4			; AVX512F-NEXT: vpaddb %ymm3, %ymm3, %ymm4
	; AVX512F-NEXT: vpsllw $4, %ymm4, %ymm3			; AVX512F-NEXT: vpsllw $4, %ymm4, %ymm3
	; AVX512F-NEXT: vmovdqa {{.*#+}} ymm5 = [240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240]			; AVX512F-NEXT: vmovdqa {{.*#+}} ymm5 = [240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240]
	; AVX512F-NEXT: vpand %ymm5, %ymm3, %ymm6			; AVX512F-NEXT: vpand %ymm5, %ymm3, %ymm6
	; AVX512F-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512F-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm3			; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm3
	; AVX512F-NEXT: vmovdqa {{.*#+}} ymm7 = [7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7]			; AVX512F-NEXT: vmovdqa {{.*#+}} ymm7 = [7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7]
	; AVX512F-NEXT: vpxor %ymm7, %ymm3, %ymm8			; AVX512F-NEXT: vpxor %ymm7, %ymm3, %ymm8
	; AVX512F-NEXT: vpsllw $5, %ymm8, %ymm8			; AVX512F-NEXT: vpsllw $5, %ymm8, %ymm8
	; AVX512F-NEXT: vpblendvb %ymm8, %ymm6, %ymm4, %ymm4			; AVX512F-NEXT: vpblendvb %ymm8, %ymm6, %ymm4, %ymm4
	; AVX512F-NEXT: vpsllw $2, %ymm4, %ymm6			; AVX512F-NEXT: vpsllw $2, %ymm4, %ymm6
	; AVX512F-NEXT: vmovdqa {{.*#+}} ymm9 = [252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252]			; AVX512F-NEXT: vmovdqa {{.*#+}} ymm9 = [252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252]
	; AVX512F-NEXT: vpand %ymm6, %ymm9, %ymm6			; AVX512F-NEXT: vpand %ymm6, %ymm9, %ymm6
	(50 lines not shown)
	;			;
	; AVX512VL-LABEL: var_funnnel_v64i8:			; AVX512VL-LABEL: var_funnnel_v64i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm3			; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm3
	; AVX512VL-NEXT: vpaddb %ymm3, %ymm3, %ymm4			; AVX512VL-NEXT: vpaddb %ymm3, %ymm3, %ymm4
	; AVX512VL-NEXT: vpsllw $4, %ymm4, %ymm3			; AVX512VL-NEXT: vpsllw $4, %ymm4, %ymm3
	; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm5 = [240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240]			; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm5 = [240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240]
	; AVX512VL-NEXT: vpand %ymm5, %ymm3, %ymm6			; AVX512VL-NEXT: vpand %ymm5, %ymm3, %ymm6
	; AVX512VL-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512VL-NEXT: vextracti64x4 $1, %zmm2, %ymm3			; AVX512VL-NEXT: vextracti64x4 $1, %zmm2, %ymm3
	; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm7 = [7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7]			; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm7 = [7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7]
	; AVX512VL-NEXT: vpxor %ymm7, %ymm3, %ymm8			; AVX512VL-NEXT: vpxor %ymm7, %ymm3, %ymm8
	; AVX512VL-NEXT: vpsllw $5, %ymm8, %ymm8			; AVX512VL-NEXT: vpsllw $5, %ymm8, %ymm8
	; AVX512VL-NEXT: vpblendvb %ymm8, %ymm6, %ymm4, %ymm4			; AVX512VL-NEXT: vpblendvb %ymm8, %ymm6, %ymm4, %ymm4
	; AVX512VL-NEXT: vpsllw $2, %ymm4, %ymm6			; AVX512VL-NEXT: vpsllw $2, %ymm4, %ymm6
	; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm9 = [252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252]			; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm9 = [252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252,252]
	; AVX512VL-NEXT: vpand %ymm6, %ymm9, %ymm6			; AVX512VL-NEXT: vpand %ymm6, %ymm9, %ymm6
	(46 lines not shown)
	; AVX512VL-NEXT: vpblendvb %ymm2, %ymm4, %ymm1, %ymm1			; AVX512VL-NEXT: vpblendvb %ymm2, %ymm4, %ymm1, %ymm1
	; AVX512VL-NEXT: vinserti64x4 $1, %ymm3, %zmm1, %zmm1			; AVX512VL-NEXT: vinserti64x4 $1, %ymm3, %zmm1, %zmm1
	; AVX512VL-NEXT: vporq %zmm1, %zmm0, %zmm0			; AVX512VL-NEXT: vporq %zmm1, %zmm0, %zmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_funnnel_v64i8:			; AVX512BW-LABEL: var_funnnel_v64i8:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm0[8],zmm1[9],zmm0[9],zmm1[10],zmm0[10],zmm1[11],zmm0[11],zmm1[12],zmm0[12],zmm1[13],zmm0[13],zmm1[14],zmm0[14],zmm1[15],zmm0[15],zmm1[24],zmm0[24],zmm1[25],zmm0[25],zmm1[26],zmm0[26],zmm1[27],zmm0[27],zmm1[28],zmm0[28],zmm1[29],zmm0[29],zmm1[30],zmm0[30],zmm1[31],zmm0[31],zmm1[40],zmm0[40],zmm1[41],zmm0[41],zmm1[42],zmm0[42],zmm1[43],zmm0[43],zmm1[44],zmm0[44],zmm1[45],zmm0[45],zmm1[46],zmm0[46],zmm1[47],zmm0[47],zmm1[56],zmm0[56],zmm1[57],zmm0[57],zmm1[58],zmm0[58],zmm1[59],zmm0[59],zmm1[60],zmm0[60],zmm1[61],zmm0[61],zmm1[62],zmm0[62],zmm1[63],zmm0[63]			; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm0[8],zmm1[9],zmm0[9],zmm1[10],zmm0[10],zmm1[11],zmm0[11],zmm1[12],zmm0[12],zmm1[13],zmm0[13],zmm1[14],zmm0[14],zmm1[15],zmm0[15],zmm1[24],zmm0[24],zmm1[25],zmm0[25],zmm1[26],zmm0[26],zmm1[27],zmm0[27],zmm1[28],zmm0[28],zmm1[29],zmm0[29],zmm1[30],zmm0[30],zmm1[31],zmm0[31],zmm1[40],zmm0[40],zmm1[41],zmm0[41],zmm1[42],zmm0[42],zmm1[43],zmm0[43],zmm1[44],zmm0[44],zmm1[45],zmm0[45],zmm1[46],zmm0[46],zmm1[47],zmm0[47],zmm1[56],zmm0[56],zmm1[57],zmm0[57],zmm1[58],zmm0[58],zmm1[59],zmm0[59],zmm1[60],zmm0[60],zmm1[61],zmm0[61],zmm1[62],zmm0[62],zmm1[63],zmm0[63]
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512BW-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512BW-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm5 = zmm2[8],zmm4[8],zmm2[9],zmm4[9],zmm2[10],zmm4[10],zmm2[11],zmm4[11],zmm2[12],zmm4[12],zmm2[13],zmm4[13],zmm2[14],zmm4[14],zmm2[15],zmm4[15],zmm2[24],zmm4[24],zmm2[25],zmm4[25],zmm2[26],zmm4[26],zmm2[27],zmm4[27],zmm2[28],zmm4[28],zmm2[29],zmm4[29],zmm2[30],zmm4[30],zmm2[31],zmm4[31],zmm2[40],zmm4[40],zmm2[41],zmm4[41],zmm2[42],zmm4[42],zmm2[43],zmm4[43],zmm2[44],zmm4[44],zmm2[45],zmm4[45],zmm2[46],zmm4[46],zmm2[47],zmm4[47],zmm2[56],zmm4[56],zmm2[57],zmm4[57],zmm2[58],zmm4[58],zmm2[59],zmm4[59],zmm2[60],zmm4[60],zmm2[61],zmm4[61],zmm2[62],zmm4[62],zmm2[63],zmm4[63]			; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm5 = zmm2[8],zmm4[8],zmm2[9],zmm4[9],zmm2[10],zmm4[10],zmm2[11],zmm4[11],zmm2[12],zmm4[12],zmm2[13],zmm4[13],zmm2[14],zmm4[14],zmm2[15],zmm4[15],zmm2[24],zmm4[24],zmm2[25],zmm4[25],zmm2[26],zmm4[26],zmm2[27],zmm4[27],zmm2[28],zmm4[28],zmm2[29],zmm4[29],zmm2[30],zmm4[30],zmm2[31],zmm4[31],zmm2[40],zmm4[40],zmm2[41],zmm4[41],zmm2[42],zmm4[42],zmm2[43],zmm4[43],zmm2[44],zmm4[44],zmm2[45],zmm4[45],zmm2[46],zmm4[46],zmm2[47],zmm4[47],zmm2[56],zmm4[56],zmm2[57],zmm4[57],zmm2[58],zmm4[58],zmm2[59],zmm4[59],zmm2[60],zmm4[60],zmm2[61],zmm4[61],zmm2[62],zmm4[62],zmm2[63],zmm4[63]
	; AVX512BW-NEXT: vpsrlvw %zmm5, %zmm3, %zmm3			; AVX512BW-NEXT: vpsrlvw %zmm5, %zmm3, %zmm3
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm5 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm5 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]
	; AVX512BW-NEXT: vpandq %zmm5, %zmm3, %zmm3			; AVX512BW-NEXT: vpandq %zmm5, %zmm3, %zmm3
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm2[0],zmm4[0],zmm2[1],zmm4[1],zmm2[2],zmm4[2],zmm2[3],zmm4[3],zmm2[4],zmm4[4],zmm2[5],zmm4[5],zmm2[6],zmm4[6],zmm2[7],zmm4[7],zmm2[16],zmm4[16],zmm2[17],zmm4[17],zmm2[18],zmm4[18],zmm2[19],zmm4[19],zmm2[20],zmm4[20],zmm2[21],zmm4[21],zmm2[22],zmm4[22],zmm2[23],zmm4[23],zmm2[32],zmm4[32],zmm2[33],zmm4[33],zmm2[34],zmm4[34],zmm2[35],zmm4[35],zmm2[36],zmm4[36],zmm2[37],zmm4[37],zmm2[38],zmm4[38],zmm2[39],zmm4[39],zmm2[48],zmm4[48],zmm2[49],zmm4[49],zmm2[50],zmm4[50],zmm2[51],zmm4[51],zmm2[52],zmm4[52],zmm2[53],zmm4[53],zmm2[54],zmm4[54],zmm2[55],zmm4[55]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm2[0],zmm4[0],zmm2[1],zmm4[1],zmm2[2],zmm4[2],zmm2[3],zmm4[3],zmm2[4],zmm4[4],zmm2[5],zmm4[5],zmm2[6],zmm4[6],zmm2[7],zmm4[7],zmm2[16],zmm4[16],zmm2[17],zmm4[17],zmm2[18],zmm4[18],zmm2[19],zmm4[19],zmm2[20],zmm4[20],zmm2[21],zmm4[21],zmm2[22],zmm4[22],zmm2[23],zmm4[23],zmm2[32],zmm4[32],zmm2[33],zmm4[33],zmm2[34],zmm4[34],zmm2[35],zmm4[35],zmm2[36],zmm4[36],zmm2[37],zmm4[37],zmm2[38],zmm4[38],zmm2[39],zmm4[39],zmm2[48],zmm4[48],zmm2[49],zmm4[49],zmm2[50],zmm4[50],zmm2[51],zmm4[51],zmm2[52],zmm4[52],zmm2[53],zmm4[53],zmm2[54],zmm4[54],zmm2[55],zmm4[55]
	; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpandq %zmm5, %zmm0, %zmm0			; AVX512BW-NEXT: vpandq %zmm5, %zmm0, %zmm0
	; AVX512BW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0			; AVX512BW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VBMI2-LABEL: var_funnnel_v64i8:			; AVX512VBMI2-LABEL: var_funnnel_v64i8:
	; AVX512VBMI2: # %bb.0:			; AVX512VBMI2: # %bb.0:
	; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm0[8],zmm1[9],zmm0[9],zmm1[10],zmm0[10],zmm1[11],zmm0[11],zmm1[12],zmm0[12],zmm1[13],zmm0[13],zmm1[14],zmm0[14],zmm1[15],zmm0[15],zmm1[24],zmm0[24],zmm1[25],zmm0[25],zmm1[26],zmm0[26],zmm1[27],zmm0[27],zmm1[28],zmm0[28],zmm1[29],zmm0[29],zmm1[30],zmm0[30],zmm1[31],zmm0[31],zmm1[40],zmm0[40],zmm1[41],zmm0[41],zmm1[42],zmm0[42],zmm1[43],zmm0[43],zmm1[44],zmm0[44],zmm1[45],zmm0[45],zmm1[46],zmm0[46],zmm1[47],zmm0[47],zmm1[56],zmm0[56],zmm1[57],zmm0[57],zmm1[58],zmm0[58],zmm1[59],zmm0[59],zmm1[60],zmm0[60],zmm1[61],zmm0[61],zmm1[62],zmm0[62],zmm1[63],zmm0[63]			; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm0[8],zmm1[9],zmm0[9],zmm1[10],zmm0[10],zmm1[11],zmm0[11],zmm1[12],zmm0[12],zmm1[13],zmm0[13],zmm1[14],zmm0[14],zmm1[15],zmm0[15],zmm1[24],zmm0[24],zmm1[25],zmm0[25],zmm1[26],zmm0[26],zmm1[27],zmm0[27],zmm1[28],zmm0[28],zmm1[29],zmm0[29],zmm1[30],zmm0[30],zmm1[31],zmm0[31],zmm1[40],zmm0[40],zmm1[41],zmm0[41],zmm1[42],zmm0[42],zmm1[43],zmm0[43],zmm1[44],zmm0[44],zmm1[45],zmm0[45],zmm1[46],zmm0[46],zmm1[47],zmm0[47],zmm1[56],zmm0[56],zmm1[57],zmm0[57],zmm1[58],zmm0[58],zmm1[59],zmm0[59],zmm1[60],zmm0[60],zmm1[61],zmm0[61],zmm1[62],zmm0[62],zmm1[63],zmm0[63]
	; AVX512VBMI2-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512VBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512VBMI2-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512VBMI2-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm5 = zmm2[8],zmm4[8],zmm2[9],zmm4[9],zmm2[10],zmm4[10],zmm2[11],zmm4[11],zmm2[12],zmm4[12],zmm2[13],zmm4[13],zmm2[14],zmm4[14],zmm2[15],zmm4[15],zmm2[24],zmm4[24],zmm2[25],zmm4[25],zmm2[26],zmm4[26],zmm2[27],zmm4[27],zmm2[28],zmm4[28],zmm2[29],zmm4[29],zmm2[30],zmm4[30],zmm2[31],zmm4[31],zmm2[40],zmm4[40],zmm2[41],zmm4[41],zmm2[42],zmm4[42],zmm2[43],zmm4[43],zmm2[44],zmm4[44],zmm2[45],zmm4[45],zmm2[46],zmm4[46],zmm2[47],zmm4[47],zmm2[56],zmm4[56],zmm2[57],zmm4[57],zmm2[58],zmm4[58],zmm2[59],zmm4[59],zmm2[60],zmm4[60],zmm2[61],zmm4[61],zmm2[62],zmm4[62],zmm2[63],zmm4[63]			; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm5 = zmm2[8],zmm4[8],zmm2[9],zmm4[9],zmm2[10],zmm4[10],zmm2[11],zmm4[11],zmm2[12],zmm4[12],zmm2[13],zmm4[13],zmm2[14],zmm4[14],zmm2[15],zmm4[15],zmm2[24],zmm4[24],zmm2[25],zmm4[25],zmm2[26],zmm4[26],zmm2[27],zmm4[27],zmm2[28],zmm4[28],zmm2[29],zmm4[29],zmm2[30],zmm4[30],zmm2[31],zmm4[31],zmm2[40],zmm4[40],zmm2[41],zmm4[41],zmm2[42],zmm4[42],zmm2[43],zmm4[43],zmm2[44],zmm4[44],zmm2[45],zmm4[45],zmm2[46],zmm4[46],zmm2[47],zmm4[47],zmm2[56],zmm4[56],zmm2[57],zmm4[57],zmm2[58],zmm4[58],zmm2[59],zmm4[59],zmm2[60],zmm4[60],zmm2[61],zmm4[61],zmm2[62],zmm4[62],zmm2[63],zmm4[63]
	; AVX512VBMI2-NEXT: vpsrlvw %zmm5, %zmm3, %zmm3			; AVX512VBMI2-NEXT: vpsrlvw %zmm5, %zmm3, %zmm3
	; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]			; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]
	; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm2[0],zmm4[0],zmm2[1],zmm4[1],zmm2[2],zmm4[2],zmm2[3],zmm4[3],zmm2[4],zmm4[4],zmm2[5],zmm4[5],zmm2[6],zmm4[6],zmm2[7],zmm4[7],zmm2[16],zmm4[16],zmm2[17],zmm4[17],zmm2[18],zmm4[18],zmm2[19],zmm4[19],zmm2[20],zmm4[20],zmm2[21],zmm4[21],zmm2[22],zmm4[22],zmm2[23],zmm4[23],zmm2[32],zmm4[32],zmm2[33],zmm4[33],zmm2[34],zmm4[34],zmm2[35],zmm4[35],zmm2[36],zmm4[36],zmm2[37],zmm4[37],zmm2[38],zmm4[38],zmm2[39],zmm4[39],zmm2[48],zmm4[48],zmm2[49],zmm4[49],zmm2[50],zmm4[50],zmm2[51],zmm4[51],zmm2[52],zmm4[52],zmm2[53],zmm4[53],zmm2[54],zmm4[54],zmm2[55],zmm4[55]			; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm2[0],zmm4[0],zmm2[1],zmm4[1],zmm2[2],zmm4[2],zmm2[3],zmm4[3],zmm2[4],zmm4[4],zmm2[5],zmm4[5],zmm2[6],zmm4[6],zmm2[7],zmm4[7],zmm2[16],zmm4[16],zmm2[17],zmm4[17],zmm2[18],zmm4[18],zmm2[19],zmm4[19],zmm2[20],zmm4[20],zmm2[21],zmm4[21],zmm2[22],zmm4[22],zmm2[23],zmm4[23],zmm2[32],zmm4[32],zmm2[33],zmm4[33],zmm2[34],zmm4[34],zmm2[35],zmm4[35],zmm2[36],zmm4[36],zmm2[37],zmm4[37],zmm2[38],zmm4[38],zmm2[39],zmm4[39],zmm2[48],zmm4[48],zmm2[49],zmm4[49],zmm2[50],zmm4[50],zmm2[51],zmm4[51],zmm2[52],zmm4[52],zmm2[53],zmm4[53],zmm2[54],zmm4[54],zmm2[55],zmm4[55]
	; AVX512VBMI2-NEXT: vpsrlvw %zmm1, %zmm0, %zmm1			; AVX512VBMI2-NEXT: vpsrlvw %zmm1, %zmm0, %zmm1
	; AVX512VBMI2-NEXT: vmovdqa64 {{.*#+}} zmm0 = [0,2,4,6,8,10,12,14,64,66,68,70,72,74,76,78,16,18,20,22,24,26,28,30,80,82,84,86,88,90,92,94,32,34,36,38,40,42,44,46,96,98,100,102,104,106,108,110,48,50,52,54,56,58,60,62,112,114,116,118,120,122,124,126]			; AVX512VBMI2-NEXT: vmovdqa64 {{.*#+}} zmm0 = [0,2,4,6,8,10,12,14,64,66,68,70,72,74,76,78,16,18,20,22,24,26,28,30,80,82,84,86,88,90,92,94,32,34,36,38,40,42,44,46,96,98,100,102,104,106,108,110,48,50,52,54,56,58,60,62,112,114,116,118,120,122,124,126]
	; AVX512VBMI2-NEXT: vpermi2b %zmm3, %zmm1, %zmm0			; AVX512VBMI2-NEXT: vpermi2b %zmm3, %zmm1, %zmm0
	; AVX512VBMI2-NEXT: retq			; AVX512VBMI2-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_funnnel_v64i8:			; AVX512VLBW-LABEL: var_funnnel_v64i8:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm0[8],zmm1[9],zmm0[9],zmm1[10],zmm0[10],zmm1[11],zmm0[11],zmm1[12],zmm0[12],zmm1[13],zmm0[13],zmm1[14],zmm0[14],zmm1[15],zmm0[15],zmm1[24],zmm0[24],zmm1[25],zmm0[25],zmm1[26],zmm0[26],zmm1[27],zmm0[27],zmm1[28],zmm0[28],zmm1[29],zmm0[29],zmm1[30],zmm0[30],zmm1[31],zmm0[31],zmm1[40],zmm0[40],zmm1[41],zmm0[41],zmm1[42],zmm0[42],zmm1[43],zmm0[43],zmm1[44],zmm0[44],zmm1[45],zmm0[45],zmm1[46],zmm0[46],zmm1[47],zmm0[47],zmm1[56],zmm0[56],zmm1[57],zmm0[57],zmm1[58],zmm0[58],zmm1[59],zmm0[59],zmm1[60],zmm0[60],zmm1[61],zmm0[61],zmm1[62],zmm0[62],zmm1[63],zmm0[63]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm0[8],zmm1[9],zmm0[9],zmm1[10],zmm0[10],zmm1[11],zmm0[11],zmm1[12],zmm0[12],zmm1[13],zmm0[13],zmm1[14],zmm0[14],zmm1[15],zmm0[15],zmm1[24],zmm0[24],zmm1[25],zmm0[25],zmm1[26],zmm0[26],zmm1[27],zmm0[27],zmm1[28],zmm0[28],zmm1[29],zmm0[29],zmm1[30],zmm0[30],zmm1[31],zmm0[31],zmm1[40],zmm0[40],zmm1[41],zmm0[41],zmm1[42],zmm0[42],zmm1[43],zmm0[43],zmm1[44],zmm0[44],zmm1[45],zmm0[45],zmm1[46],zmm0[46],zmm1[47],zmm0[47],zmm1[56],zmm0[56],zmm1[57],zmm0[57],zmm1[58],zmm0[58],zmm1[59],zmm0[59],zmm1[60],zmm0[60],zmm1[61],zmm0[61],zmm1[62],zmm0[62],zmm1[63],zmm0[63]
	; AVX512VLBW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512VLBW-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512VLBW-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm5 = zmm2[8],zmm4[8],zmm2[9],zmm4[9],zmm2[10],zmm4[10],zmm2[11],zmm4[11],zmm2[12],zmm4[12],zmm2[13],zmm4[13],zmm2[14],zmm4[14],zmm2[15],zmm4[15],zmm2[24],zmm4[24],zmm2[25],zmm4[25],zmm2[26],zmm4[26],zmm2[27],zmm4[27],zmm2[28],zmm4[28],zmm2[29],zmm4[29],zmm2[30],zmm4[30],zmm2[31],zmm4[31],zmm2[40],zmm4[40],zmm2[41],zmm4[41],zmm2[42],zmm4[42],zmm2[43],zmm4[43],zmm2[44],zmm4[44],zmm2[45],zmm4[45],zmm2[46],zmm4[46],zmm2[47],zmm4[47],zmm2[56],zmm4[56],zmm2[57],zmm4[57],zmm2[58],zmm4[58],zmm2[59],zmm4[59],zmm2[60],zmm4[60],zmm2[61],zmm4[61],zmm2[62],zmm4[62],zmm2[63],zmm4[63]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm5 = zmm2[8],zmm4[8],zmm2[9],zmm4[9],zmm2[10],zmm4[10],zmm2[11],zmm4[11],zmm2[12],zmm4[12],zmm2[13],zmm4[13],zmm2[14],zmm4[14],zmm2[15],zmm4[15],zmm2[24],zmm4[24],zmm2[25],zmm4[25],zmm2[26],zmm4[26],zmm2[27],zmm4[27],zmm2[28],zmm4[28],zmm2[29],zmm4[29],zmm2[30],zmm4[30],zmm2[31],zmm4[31],zmm2[40],zmm4[40],zmm2[41],zmm4[41],zmm2[42],zmm4[42],zmm2[43],zmm4[43],zmm2[44],zmm4[44],zmm2[45],zmm4[45],zmm2[46],zmm4[46],zmm2[47],zmm4[47],zmm2[56],zmm4[56],zmm2[57],zmm4[57],zmm2[58],zmm4[58],zmm2[59],zmm4[59],zmm2[60],zmm4[60],zmm2[61],zmm4[61],zmm2[62],zmm4[62],zmm2[63],zmm4[63]
	; AVX512VLBW-NEXT: vpsrlvw %zmm5, %zmm3, %zmm3			; AVX512VLBW-NEXT: vpsrlvw %zmm5, %zmm3, %zmm3
	; AVX512VLBW-NEXT: vmovdqa64 {{.*#+}} zmm5 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]			; AVX512VLBW-NEXT: vmovdqa64 {{.*#+}} zmm5 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]
	; AVX512VLBW-NEXT: vpandq %zmm5, %zmm3, %zmm3			; AVX512VLBW-NEXT: vpandq %zmm5, %zmm3, %zmm3
	; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]			; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]
	; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm2[0],zmm4[0],zmm2[1],zmm4[1],zmm2[2],zmm4[2],zmm2[3],zmm4[3],zmm2[4],zmm4[4],zmm2[5],zmm4[5],zmm2[6],zmm4[6],zmm2[7],zmm4[7],zmm2[16],zmm4[16],zmm2[17],zmm4[17],zmm2[18],zmm4[18],zmm2[19],zmm4[19],zmm2[20],zmm4[20],zmm2[21],zmm4[21],zmm2[22],zmm4[22],zmm2[23],zmm4[23],zmm2[32],zmm4[32],zmm2[33],zmm4[33],zmm2[34],zmm4[34],zmm2[35],zmm4[35],zmm2[36],zmm4[36],zmm2[37],zmm4[37],zmm2[38],zmm4[38],zmm2[39],zmm4[39],zmm2[48],zmm4[48],zmm2[49],zmm4[49],zmm2[50],zmm4[50],zmm2[51],zmm4[51],zmm2[52],zmm4[52],zmm2[53],zmm4[53],zmm2[54],zmm4[54],zmm2[55],zmm4[55]			; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm2[0],zmm4[0],zmm2[1],zmm4[1],zmm2[2],zmm4[2],zmm2[3],zmm4[3],zmm2[4],zmm4[4],zmm2[5],zmm4[5],zmm2[6],zmm4[6],zmm2[7],zmm4[7],zmm2[16],zmm4[16],zmm2[17],zmm4[17],zmm2[18],zmm4[18],zmm2[19],zmm4[19],zmm2[20],zmm4[20],zmm2[21],zmm4[21],zmm2[22],zmm4[22],zmm2[23],zmm4[23],zmm2[32],zmm4[32],zmm2[33],zmm4[33],zmm2[34],zmm4[34],zmm2[35],zmm4[35],zmm2[36],zmm4[36],zmm2[37],zmm4[37],zmm2[38],zmm4[38],zmm2[39],zmm4[39],zmm2[48],zmm4[48],zmm2[49],zmm4[49],zmm2[50],zmm4[50],zmm2[51],zmm4[51],zmm2[52],zmm4[52],zmm2[53],zmm4[53],zmm2[54],zmm4[54],zmm2[55],zmm4[55]
	; AVX512VLBW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vpandq %zmm5, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpandq %zmm5, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VLVBMI2-LABEL: var_funnnel_v64i8:			; AVX512VLVBMI2-LABEL: var_funnnel_v64i8:
	; AVX512VLVBMI2: # %bb.0:			; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm0[8],zmm1[9],zmm0[9],zmm1[10],zmm0[10],zmm1[11],zmm0[11],zmm1[12],zmm0[12],zmm1[13],zmm0[13],zmm1[14],zmm0[14],zmm1[15],zmm0[15],zmm1[24],zmm0[24],zmm1[25],zmm0[25],zmm1[26],zmm0[26],zmm1[27],zmm0[27],zmm1[28],zmm0[28],zmm1[29],zmm0[29],zmm1[30],zmm0[30],zmm1[31],zmm0[31],zmm1[40],zmm0[40],zmm1[41],zmm0[41],zmm1[42],zmm0[42],zmm1[43],zmm0[43],zmm1[44],zmm0[44],zmm1[45],zmm0[45],zmm1[46],zmm0[46],zmm1[47],zmm0[47],zmm1[56],zmm0[56],zmm1[57],zmm0[57],zmm1[58],zmm0[58],zmm1[59],zmm0[59],zmm1[60],zmm0[60],zmm1[61],zmm0[61],zmm1[62],zmm0[62],zmm1[63],zmm0[63]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm0[8],zmm1[9],zmm0[9],zmm1[10],zmm0[10],zmm1[11],zmm0[11],zmm1[12],zmm0[12],zmm1[13],zmm0[13],zmm1[14],zmm0[14],zmm1[15],zmm0[15],zmm1[24],zmm0[24],zmm1[25],zmm0[25],zmm1[26],zmm0[26],zmm1[27],zmm0[27],zmm1[28],zmm0[28],zmm1[29],zmm0[29],zmm1[30],zmm0[30],zmm1[31],zmm0[31],zmm1[40],zmm0[40],zmm1[41],zmm0[41],zmm1[42],zmm0[42],zmm1[43],zmm0[43],zmm1[44],zmm0[44],zmm1[45],zmm0[45],zmm1[46],zmm0[46],zmm1[47],zmm0[47],zmm1[56],zmm0[56],zmm1[57],zmm0[57],zmm1[58],zmm0[58],zmm1[59],zmm0[59],zmm1[60],zmm0[60],zmm1[61],zmm0[61],zmm1[62],zmm0[62],zmm1[63],zmm0[63]
	; AVX512VLVBMI2-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512VLVBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512VLVBMI2-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512VLVBMI2-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm5 = zmm2[8],zmm4[8],zmm2[9],zmm4[9],zmm2[10],zmm4[10],zmm2[11],zmm4[11],zmm2[12],zmm4[12],zmm2[13],zmm4[13],zmm2[14],zmm4[14],zmm2[15],zmm4[15],zmm2[24],zmm4[24],zmm2[25],zmm4[25],zmm2[26],zmm4[26],zmm2[27],zmm4[27],zmm2[28],zmm4[28],zmm2[29],zmm4[29],zmm2[30],zmm4[30],zmm2[31],zmm4[31],zmm2[40],zmm4[40],zmm2[41],zmm4[41],zmm2[42],zmm4[42],zmm2[43],zmm4[43],zmm2[44],zmm4[44],zmm2[45],zmm4[45],zmm2[46],zmm4[46],zmm2[47],zmm4[47],zmm2[56],zmm4[56],zmm2[57],zmm4[57],zmm2[58],zmm4[58],zmm2[59],zmm4[59],zmm2[60],zmm4[60],zmm2[61],zmm4[61],zmm2[62],zmm4[62],zmm2[63],zmm4[63]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm5 = zmm2[8],zmm4[8],zmm2[9],zmm4[9],zmm2[10],zmm4[10],zmm2[11],zmm4[11],zmm2[12],zmm4[12],zmm2[13],zmm4[13],zmm2[14],zmm4[14],zmm2[15],zmm4[15],zmm2[24],zmm4[24],zmm2[25],zmm4[25],zmm2[26],zmm4[26],zmm2[27],zmm4[27],zmm2[28],zmm4[28],zmm2[29],zmm4[29],zmm2[30],zmm4[30],zmm2[31],zmm4[31],zmm2[40],zmm4[40],zmm2[41],zmm4[41],zmm2[42],zmm4[42],zmm2[43],zmm4[43],zmm2[44],zmm4[44],zmm2[45],zmm4[45],zmm2[46],zmm4[46],zmm2[47],zmm4[47],zmm2[56],zmm4[56],zmm2[57],zmm4[57],zmm2[58],zmm4[58],zmm2[59],zmm4[59],zmm2[60],zmm4[60],zmm2[61],zmm4[61],zmm2[62],zmm4[62],zmm2[63],zmm4[63]
	; AVX512VLVBMI2-NEXT: vpsrlvw %zmm5, %zmm3, %zmm3			; AVX512VLVBMI2-NEXT: vpsrlvw %zmm5, %zmm3, %zmm3
	; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]			; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]
	; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm2[0],zmm4[0],zmm2[1],zmm4[1],zmm2[2],zmm4[2],zmm2[3],zmm4[3],zmm2[4],zmm4[4],zmm2[5],zmm4[5],zmm2[6],zmm4[6],zmm2[7],zmm4[7],zmm2[16],zmm4[16],zmm2[17],zmm4[17],zmm2[18],zmm4[18],zmm2[19],zmm4[19],zmm2[20],zmm4[20],zmm2[21],zmm4[21],zmm2[22],zmm4[22],zmm2[23],zmm4[23],zmm2[32],zmm4[32],zmm2[33],zmm4[33],zmm2[34],zmm4[34],zmm2[35],zmm4[35],zmm2[36],zmm4[36],zmm2[37],zmm4[37],zmm2[38],zmm4[38],zmm2[39],zmm4[39],zmm2[48],zmm4[48],zmm2[49],zmm4[49],zmm2[50],zmm4[50],zmm2[51],zmm4[51],zmm2[52],zmm4[52],zmm2[53],zmm4[53],zmm2[54],zmm4[54],zmm2[55],zmm4[55]			; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm2[0],zmm4[0],zmm2[1],zmm4[1],zmm2[2],zmm4[2],zmm2[3],zmm4[3],zmm2[4],zmm4[4],zmm2[5],zmm4[5],zmm2[6],zmm4[6],zmm2[7],zmm4[7],zmm2[16],zmm4[16],zmm2[17],zmm4[17],zmm2[18],zmm4[18],zmm2[19],zmm4[19],zmm2[20],zmm4[20],zmm2[21],zmm4[21],zmm2[22],zmm4[22],zmm2[23],zmm4[23],zmm2[32],zmm4[32],zmm2[33],zmm4[33],zmm2[34],zmm4[34],zmm2[35],zmm4[35],zmm2[36],zmm4[36],zmm2[37],zmm4[37],zmm2[38],zmm4[38],zmm2[39],zmm4[39],zmm2[48],zmm4[48],zmm2[49],zmm4[49],zmm2[50],zmm4[50],zmm2[51],zmm4[51],zmm2[52],zmm4[52],zmm2[53],zmm4[53],zmm2[54],zmm4[54],zmm2[55],zmm4[55]
	; AVX512VLVBMI2-NEXT: vpsrlvw %zmm1, %zmm0, %zmm1			; AVX512VLVBMI2-NEXT: vpsrlvw %zmm1, %zmm0, %zmm1
	; AVX512VLVBMI2-NEXT: vmovdqa64 {{.*#+}} zmm0 = [0,2,4,6,8,10,12,14,64,66,68,70,72,74,76,78,16,18,20,22,24,26,28,30,80,82,84,86,88,90,92,94,32,34,36,38,40,42,44,46,96,98,100,102,104,106,108,110,48,50,52,54,56,58,60,62,112,114,116,118,120,122,124,126]			; AVX512VLVBMI2-NEXT: vmovdqa64 {{.*#+}} zmm0 = [0,2,4,6,8,10,12,14,64,66,68,70,72,74,76,78,16,18,20,22,24,26,28,30,80,82,84,86,88,90,92,94,32,34,36,38,40,42,44,46,96,98,100,102,104,106,108,110,48,50,52,54,56,58,60,62,112,114,116,118,120,122,124,126]
	; AVX512VLVBMI2-NEXT: vpermi2b %zmm3, %zmm1, %zmm0			; AVX512VLVBMI2-NEXT: vpermi2b %zmm3, %zmm1, %zmm0
	... 814 lines not shown ...

llvm/test/CodeGen/X86/vector-fshr-rot-128.ll

	... 408 lines not shown ...
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero
	; AVX512F-NEXT: vpsrlvd %xmm1, %xmm0, %xmm0			; AVX512F-NEXT: vpsrlvd %xmm1, %xmm0, %xmm0
	; AVX512F-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0],xmm2[1],xmm0[2],xmm2[3],xmm0[4],xmm2[5],xmm0[6],xmm2[7]			; AVX512F-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0],xmm2[1],xmm0[2],xmm2[3],xmm0[4],xmm2[5],xmm0[6],xmm2[7]
	; AVX512F-NEXT: vpackusdw %xmm3, %xmm0, %xmm0			; AVX512F-NEXT: vpackusdw %xmm3, %xmm0, %xmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: var_funnnel_v8i16:			; AVX512VL-LABEL: var_funnnel_v8i16:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VL-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VL-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VL-NEXT: vpunpckhwd {{.*#+}} xmm2 = xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]			; AVX512VL-NEXT: vpunpckhwd {{.*#+}} xmm2 = xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
	; AVX512VL-NEXT: vpunpckhwd {{.*#+}} xmm3 = xmm0[4,4,5,5,6,6,7,7]			; AVX512VL-NEXT: vpunpckhwd {{.*#+}} xmm3 = xmm0[4,4,5,5,6,6,7,7]
	; AVX512VL-NEXT: vpsrlvd %xmm2, %xmm3, %xmm2			; AVX512VL-NEXT: vpsrlvd %xmm2, %xmm3, %xmm2
	; AVX512VL-NEXT: vpunpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]			; AVX512VL-NEXT: vpunpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero
	; AVX512VL-NEXT: vpsrlvd %xmm1, %xmm0, %xmm0			; AVX512VL-NEXT: vpsrlvd %xmm1, %xmm0, %xmm0
	; AVX512VL-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0			; AVX512VL-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0
	... 10 lines not shown ...
	; AVX512BW-NEXT: vpsubw %xmm1, %xmm3, %xmm1			; AVX512BW-NEXT: vpsubw %xmm1, %xmm3, %xmm1
	; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpor %xmm0, %xmm2, %xmm0			; AVX512BW-NEXT: vpor %xmm0, %xmm2, %xmm0
	; AVX512BW-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_funnnel_v8i16:			; AVX512VLBW-LABEL: var_funnnel_v8i16:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VLBW-NEXT: vpsrlvw %xmm1, %xmm0, %xmm2			; AVX512VLBW-NEXT: vpsrlvw %xmm1, %xmm0, %xmm2
	; AVX512VLBW-NEXT: vmovdqa {{.*#+}} xmm3 = [16,16,16,16,16,16,16,16]			; AVX512VLBW-NEXT: vmovdqa {{.*#+}} xmm3 = [16,16,16,16,16,16,16,16]
	; AVX512VLBW-NEXT: vpsubw %xmm1, %xmm3, %xmm1			; AVX512VLBW-NEXT: vpsubw %xmm1, %xmm3, %xmm1
	; AVX512VLBW-NEXT: vpsllvw %xmm1, %xmm0, %xmm0			; AVX512VLBW-NEXT: vpsllvw %xmm1, %xmm0, %xmm0
	; AVX512VLBW-NEXT: vpor %xmm0, %xmm2, %xmm0			; AVX512VLBW-NEXT: vpor %xmm0, %xmm2, %xmm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VBMI2-LABEL: var_funnnel_v8i16:			; AVX512VBMI2-LABEL: var_funnnel_v8i16:
	... 163 lines not shown ...
	; AVX512F-NEXT: vzeroupper			; AVX512F-NEXT: vzeroupper
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: var_funnnel_v16i8:			; AVX512VL-LABEL: var_funnnel_v16i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero,xmm0[8],zero,zero,zero,xmm0[9],zero,zero,zero,xmm0[10],zero,zero,zero,xmm0[11],zero,zero,zero,xmm0[12],zero,zero,zero,xmm0[13],zero,zero,zero,xmm0[14],zero,zero,zero,xmm0[15],zero,zero,zero			; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero,xmm0[8],zero,zero,zero,xmm0[9],zero,zero,zero,xmm0[10],zero,zero,zero,xmm0[11],zero,zero,zero,xmm0[12],zero,zero,zero,xmm0[13],zero,zero,zero,xmm0[14],zero,zero,zero,xmm0[15],zero,zero,zero
	; AVX512VL-NEXT: vpslld $8, %zmm0, %zmm2			; AVX512VL-NEXT: vpslld $8, %zmm0, %zmm2
	; AVX512VL-NEXT: vpord %zmm2, %zmm0, %zmm0			; AVX512VL-NEXT: vpord %zmm2, %zmm0, %zmm0
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero,xmm1[4],zero,zero,zero,xmm1[5],zero,zero,zero,xmm1[6],zero,zero,zero,xmm1[7],zero,zero,zero,xmm1[8],zero,zero,zero,xmm1[9],zero,zero,zero,xmm1[10],zero,zero,zero,xmm1[11],zero,zero,zero,xmm1[12],zero,zero,zero,xmm1[13],zero,zero,zero,xmm1[14],zero,zero,zero,xmm1[15],zero,zero,zero			; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero,xmm1[4],zero,zero,zero,xmm1[5],zero,zero,zero,xmm1[6],zero,zero,zero,xmm1[7],zero,zero,zero,xmm1[8],zero,zero,zero,xmm1[9],zero,zero,zero,xmm1[10],zero,zero,zero,xmm1[11],zero,zero,zero,xmm1[12],zero,zero,zero,xmm1[13],zero,zero,zero,xmm1[14],zero,zero,zero,xmm1[15],zero,zero,zero
	; AVX512VL-NEXT: vpsrlvd %zmm1, %zmm0, %zmm0			; AVX512VL-NEXT: vpsrlvd %zmm1, %zmm0, %zmm0
	; AVX512VL-NEXT: vpmovdb %zmm0, %xmm0			; AVX512VL-NEXT: vpmovdb %zmm0, %xmm0
	; AVX512VL-NEXT: vzeroupper			; AVX512VL-NEXT: vzeroupper
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_funnnel_v16i8:			; AVX512BW-LABEL: var_funnnel_v16i8:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	... 9 lines not shown ...
	; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpand %xmm3, %xmm0, %xmm0			; AVX512BW-NEXT: vpand %xmm3, %xmm0, %xmm0
	; AVX512BW-NEXT: vpackuswb %xmm2, %xmm0, %xmm0			; AVX512BW-NEXT: vpackuswb %xmm2, %xmm0, %xmm0
	; AVX512BW-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_funnnel_v16i8:			; AVX512VLBW-LABEL: var_funnnel_v16i8:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} xmm2 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} xmm2 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} xmm3 = xmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} xmm3 = xmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]
	; AVX512VLBW-NEXT: vpsrlvw %xmm2, %xmm3, %xmm2			; AVX512VLBW-NEXT: vpsrlvw %xmm2, %xmm3, %xmm2
	; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]			; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
	; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero			; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero
	; AVX512VLBW-NEXT: vpsrlvw %xmm1, %xmm0, %xmm0			; AVX512VLBW-NEXT: vpsrlvw %xmm1, %xmm0, %xmm0
	; AVX512VLBW-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0			; AVX512VLBW-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0
	... 14 lines not shown ...
	; AVX512VBMI2-NEXT: vmovdqa {{.*#+}} xmm0 = [0,2,4,6,8,10,12,14,64,66,68,70,72,74,76,78]			; AVX512VBMI2-NEXT: vmovdqa {{.*#+}} xmm0 = [0,2,4,6,8,10,12,14,64,66,68,70,72,74,76,78]
	; AVX512VBMI2-NEXT: vpermi2b %zmm2, %zmm1, %zmm0			; AVX512VBMI2-NEXT: vpermi2b %zmm2, %zmm1, %zmm0
	; AVX512VBMI2-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0			; AVX512VBMI2-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0
	; AVX512VBMI2-NEXT: vzeroupper			; AVX512VBMI2-NEXT: vzeroupper
	; AVX512VBMI2-NEXT: retq			; AVX512VBMI2-NEXT: retq
	;			;
	; AVX512VLVBMI2-LABEL: var_funnnel_v16i8:			; AVX512VLVBMI2-LABEL: var_funnnel_v16i8:
	; AVX512VLVBMI2: # %bb.0:			; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VLVBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VLVBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLVBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} xmm2 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} xmm2 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} xmm3 = xmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} xmm3 = xmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]
	; AVX512VLVBMI2-NEXT: vpsrlvw %xmm2, %xmm3, %xmm2			; AVX512VLVBMI2-NEXT: vpsrlvw %xmm2, %xmm3, %xmm2
	; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]			; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
	; AVX512VLVBMI2-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero			; AVX512VLVBMI2-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero
	; AVX512VLVBMI2-NEXT: vpsrlvw %xmm1, %xmm0, %xmm0			; AVX512VLVBMI2-NEXT: vpsrlvw %xmm1, %xmm0, %xmm0
	; AVX512VLVBMI2-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0			; AVX512VLVBMI2-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0
	... 1,262 lines not shown ...

llvm/test/CodeGen/X86/vector-fshr-rot-256.ll

	... 310 lines not shown ...
	; AVX512F-NEXT: vpunpcklwd {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,8,8,9,9,10,10,11,11]			; AVX512F-NEXT: vpunpcklwd {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,8,8,9,9,10,10,11,11]
	; AVX512F-NEXT: vpsrlvd %ymm1, %ymm0, %ymm0			; AVX512F-NEXT: vpsrlvd %ymm1, %ymm0, %ymm0
	; AVX512F-NEXT: vpblendw {{.*#+}} ymm0 = ymm0[0],ymm2[1],ymm0[2],ymm2[3],ymm0[4],ymm2[5],ymm0[6],ymm2[7],ymm0[8],ymm2[9],ymm0[10],ymm2[11],ymm0[12],ymm2[13],ymm0[14],ymm2[15]			; AVX512F-NEXT: vpblendw {{.*#+}} ymm0 = ymm0[0],ymm2[1],ymm0[2],ymm2[3],ymm0[4],ymm2[5],ymm0[6],ymm2[7],ymm0[8],ymm2[9],ymm0[10],ymm2[11],ymm0[12],ymm2[13],ymm0[14],ymm2[15]
	; AVX512F-NEXT: vpackusdw %ymm3, %ymm0, %ymm0			; AVX512F-NEXT: vpackusdw %ymm3, %ymm0, %ymm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: var_funnnel_v16i16:			; AVX512VL-LABEL: var_funnnel_v16i16:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512VL-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VL-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VL-NEXT: vpunpckhwd {{.*#+}} ymm3 = ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15]			; AVX512VL-NEXT: vpunpckhwd {{.*#+}} ymm3 = ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15]
	; AVX512VL-NEXT: vpunpckhwd {{.*#+}} ymm4 = ymm0[4,4,5,5,6,6,7,7,12,12,13,13,14,14,15,15]			; AVX512VL-NEXT: vpunpckhwd {{.*#+}} ymm4 = ymm0[4,4,5,5,6,6,7,7,12,12,13,13,14,14,15,15]
	; AVX512VL-NEXT: vpsrlvd %ymm3, %ymm4, %ymm3			; AVX512VL-NEXT: vpsrlvd %ymm3, %ymm4, %ymm3
	; AVX512VL-NEXT: vpblendw {{.*#+}} ymm3 = ymm3[0],ymm2[1],ymm3[2],ymm2[3],ymm3[4],ymm2[5],ymm3[6],ymm2[7],ymm3[8],ymm2[9],ymm3[10],ymm2[11],ymm3[12],ymm2[13],ymm3[14],ymm2[15]			; AVX512VL-NEXT: vpblendw {{.*#+}} ymm3 = ymm3[0],ymm2[1],ymm3[2],ymm2[3],ymm3[4],ymm2[5],ymm3[6],ymm2[7],ymm3[8],ymm2[9],ymm3[10],ymm2[11],ymm3[12],ymm2[13],ymm3[14],ymm2[15]
	; AVX512VL-NEXT: vpunpcklwd {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11]			; AVX512VL-NEXT: vpunpcklwd {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11]
	; AVX512VL-NEXT: vpunpcklwd {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,8,8,9,9,10,10,11,11]			; AVX512VL-NEXT: vpunpcklwd {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,8,8,9,9,10,10,11,11]
	; AVX512VL-NEXT: vpsrlvd %ymm1, %ymm0, %ymm0			; AVX512VL-NEXT: vpsrlvd %ymm1, %ymm0, %ymm0
	... 9 lines not shown ...
	; AVX512BW-NEXT: vmovdqa {{.*#+}} ymm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; AVX512BW-NEXT: vmovdqa {{.*#+}} ymm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	; AVX512BW-NEXT: vpsubw %ymm1, %ymm3, %ymm1			; AVX512BW-NEXT: vpsubw %ymm1, %ymm3, %ymm1
	; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpor %ymm0, %ymm2, %ymm0			; AVX512BW-NEXT: vpor %ymm0, %ymm2, %ymm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_funnnel_v16i16:			; AVX512VLBW-LABEL: var_funnnel_v16i16:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512VLBW-NEXT: vpsrlvw %ymm1, %ymm0, %ymm2			; AVX512VLBW-NEXT: vpsrlvw %ymm1, %ymm0, %ymm2
	; AVX512VLBW-NEXT: vmovdqa {{.*#+}} ymm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; AVX512VLBW-NEXT: vmovdqa {{.*#+}} ymm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	; AVX512VLBW-NEXT: vpsubw %ymm1, %ymm3, %ymm1			; AVX512VLBW-NEXT: vpsubw %ymm1, %ymm3, %ymm1
	; AVX512VLBW-NEXT: vpsllvw %ymm1, %ymm0, %ymm0			; AVX512VLBW-NEXT: vpsllvw %ymm1, %ymm0, %ymm0
	; AVX512VLBW-NEXT: vpor %ymm0, %ymm2, %ymm0			; AVX512VLBW-NEXT: vpor %ymm0, %ymm2, %ymm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VBMI2-LABEL: var_funnnel_v16i16:			; AVX512VBMI2-LABEL: var_funnnel_v16i16:
	... 166 lines not shown ...
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm3[0],ymm1[1],ymm3[1],ymm1[2],ymm3[2],ymm1[3],ymm3[3],ymm1[4],ymm3[4],ymm1[5],ymm3[5],ymm1[6],ymm3[6],ymm1[7],ymm3[7],ymm1[16],ymm3[16],ymm1[17],ymm3[17],ymm1[18],ymm3[18],ymm1[19],ymm3[19],ymm1[20],ymm3[20],ymm1[21],ymm3[21],ymm1[22],ymm3[22],ymm1[23],ymm3[23]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm3[0],ymm1[1],ymm3[1],ymm1[2],ymm3[2],ymm1[3],ymm3[3],ymm1[4],ymm3[4],ymm1[5],ymm3[5],ymm1[6],ymm3[6],ymm1[7],ymm3[7],ymm1[16],ymm3[16],ymm1[17],ymm3[17],ymm1[18],ymm3[18],ymm1[19],ymm3[19],ymm1[20],ymm3[20],ymm1[21],ymm3[21],ymm1[22],ymm3[22],ymm1[23],ymm3[23]
	; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpand %ymm4, %ymm0, %ymm0			; AVX512BW-NEXT: vpand %ymm4, %ymm0, %ymm0
	; AVX512BW-NEXT: vpackuswb %ymm2, %ymm0, %ymm0			; AVX512BW-NEXT: vpackuswb %ymm2, %ymm0, %ymm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_funnnel_v32i8:			; AVX512VLBW-LABEL: var_funnnel_v32i8:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} ymm3 = ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15],ymm1[24],ymm2[24],ymm1[25],ymm2[25],ymm1[26],ymm2[26],ymm1[27],ymm2[27],ymm1[28],ymm2[28],ymm1[29],ymm2[29],ymm1[30],ymm2[30],ymm1[31],ymm2[31]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} ymm3 = ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15],ymm1[24],ymm2[24],ymm1[25],ymm2[25],ymm1[26],ymm2[26],ymm1[27],ymm2[27],ymm1[28],ymm2[28],ymm1[29],ymm2[29],ymm1[30],ymm2[30],ymm1[31],ymm2[31]
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} ymm4 = ymm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} ymm4 = ymm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31]
	; AVX512VLBW-NEXT: vpsrlvw %ymm3, %ymm4, %ymm3			; AVX512VLBW-NEXT: vpsrlvw %ymm3, %ymm4, %ymm3
	; AVX512VLBW-NEXT: vmovdqa {{.*#+}} ymm4 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]			; AVX512VLBW-NEXT: vmovdqa {{.*#+}} ymm4 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]
	; AVX512VLBW-NEXT: vpand %ymm4, %ymm3, %ymm3			; AVX512VLBW-NEXT: vpand %ymm4, %ymm3, %ymm3
	; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[16],ymm2[16],ymm1[17],ymm2[17],ymm1[18],ymm2[18],ymm1[19],ymm2[19],ymm1[20],ymm2[20],ymm1[21],ymm2[21],ymm1[22],ymm2[22],ymm1[23],ymm2[23]			; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[16],ymm2[16],ymm1[17],ymm2[17],ymm1[18],ymm2[18],ymm1[19],ymm2[19],ymm1[20],ymm2[20],ymm1[21],ymm2[21],ymm1[22],ymm2[22],ymm1[23],ymm2[23]
	; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]			; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]
	(14 lines not shown)
	; AVX512VBMI2-NEXT: vpsrlvw %zmm1, %zmm0, %zmm1			; AVX512VBMI2-NEXT: vpsrlvw %zmm1, %zmm0, %zmm1
	; AVX512VBMI2-NEXT: vmovdqa {{.*#+}} ymm0 = [0,2,4,6,8,10,12,14,64,66,68,70,72,74,76,78,16,18,20,22,24,26,28,30,80,82,84,86,88,90,92,94]			; AVX512VBMI2-NEXT: vmovdqa {{.*#+}} ymm0 = [0,2,4,6,8,10,12,14,64,66,68,70,72,74,76,78,16,18,20,22,24,26,28,30,80,82,84,86,88,90,92,94]
	; AVX512VBMI2-NEXT: vpermi2b %zmm2, %zmm1, %zmm0			; AVX512VBMI2-NEXT: vpermi2b %zmm2, %zmm1, %zmm0
	; AVX512VBMI2-NEXT: # kill: def $ymm0 killed $ymm0 killed $zmm0			; AVX512VBMI2-NEXT: # kill: def $ymm0 killed $ymm0 killed $zmm0
	; AVX512VBMI2-NEXT: retq			; AVX512VBMI2-NEXT: retq
	;			;
	; AVX512VLVBMI2-LABEL: var_funnnel_v32i8:			; AVX512VLVBMI2-LABEL: var_funnnel_v32i8:
	; AVX512VLVBMI2: # %bb.0:			; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512VLVBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512VLVBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLVBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} ymm3 = ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15],ymm1[24],ymm2[24],ymm1[25],ymm2[25],ymm1[26],ymm2[26],ymm1[27],ymm2[27],ymm1[28],ymm2[28],ymm1[29],ymm2[29],ymm1[30],ymm2[30],ymm1[31],ymm2[31]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} ymm3 = ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15],ymm1[24],ymm2[24],ymm1[25],ymm2[25],ymm1[26],ymm2[26],ymm1[27],ymm2[27],ymm1[28],ymm2[28],ymm1[29],ymm2[29],ymm1[30],ymm2[30],ymm1[31],ymm2[31]
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} ymm4 = ymm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} ymm4 = ymm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31]
	; AVX512VLVBMI2-NEXT: vpsrlvw %ymm3, %ymm4, %ymm3			; AVX512VLVBMI2-NEXT: vpsrlvw %ymm3, %ymm4, %ymm3
	; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[16],ymm2[16],ymm1[17],ymm2[17],ymm1[18],ymm2[18],ymm1[19],ymm2[19],ymm1[20],ymm2[20],ymm1[21],ymm2[21],ymm1[22],ymm2[22],ymm1[23],ymm2[23]			; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[16],ymm2[16],ymm1[17],ymm2[17],ymm1[18],ymm2[18],ymm1[19],ymm2[19],ymm1[20],ymm2[20],ymm1[21],ymm2[21],ymm1[22],ymm2[22],ymm1[23],ymm2[23]
	; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]			; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]
	; AVX512VLVBMI2-NEXT: vpsrlvw %ymm1, %ymm0, %ymm1			; AVX512VLVBMI2-NEXT: vpsrlvw %ymm1, %ymm0, %ymm1
	; AVX512VLVBMI2-NEXT: vmovdqa {{.*#+}} ymm0 = [0,2,4,6,8,10,12,14,32,34,36,38,40,42,44,46,16,18,20,22,24,26,28,30,48,50,52,54,56,58,60,62]			; AVX512VLVBMI2-NEXT: vmovdqa {{.*#+}} ymm0 = [0,2,4,6,8,10,12,14,32,34,36,38,40,42,44,46,16,18,20,22,24,26,28,30,48,50,52,54,56,58,60,62]
	(1,192 lines not shown)

llvm/test/CodeGen/X86/vector-fshr-rot-512.ll

	(88 lines not shown)
	; AVX512VL-NEXT: vpsrlvd %ymm1, %ymm0, %ymm0			; AVX512VL-NEXT: vpsrlvd %ymm1, %ymm0, %ymm0
	; AVX512VL-NEXT: vpblendw {{.*#+}} ymm0 = ymm0[0],ymm4[1],ymm0[2],ymm4[3],ymm0[4],ymm4[5],ymm0[6],ymm4[7],ymm0[8],ymm4[9],ymm0[10],ymm4[11],ymm0[12],ymm4[13],ymm0[14],ymm4[15]			; AVX512VL-NEXT: vpblendw {{.*#+}} ymm0 = ymm0[0],ymm4[1],ymm0[2],ymm4[3],ymm0[4],ymm4[5],ymm0[6],ymm4[7],ymm0[8],ymm4[9],ymm0[10],ymm4[11],ymm0[12],ymm4[13],ymm0[14],ymm4[15]
	; AVX512VL-NEXT: vpackusdw %ymm3, %ymm0, %ymm0			; AVX512VL-NEXT: vpackusdw %ymm3, %ymm0, %ymm0
	; AVX512VL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; AVX512VL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_funnnel_v32i16:			; AVX512BW-LABEL: var_funnnel_v32i16:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm2			; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm2
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	; AVX512BW-NEXT: vpsubw %zmm1, %zmm3, %zmm1			; AVX512BW-NEXT: vpsubw %zmm1, %zmm3, %zmm1
	; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vporq %zmm0, %zmm2, %zmm0			; AVX512BW-NEXT: vporq %zmm0, %zmm2, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_funnnel_v32i16:			; AVX512VLBW-LABEL: var_funnnel_v32i16:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512VLBW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm2			; AVX512VLBW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm2
	; AVX512VLBW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; AVX512VLBW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	; AVX512VLBW-NEXT: vpsubw %zmm1, %zmm3, %zmm1			; AVX512VLBW-NEXT: vpsubw %zmm1, %zmm3, %zmm1
	; AVX512VLBW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vporq %zmm0, %zmm2, %zmm0			; AVX512VLBW-NEXT: vporq %zmm0, %zmm2, %zmm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VBMI2-LABEL: var_funnnel_v32i16:			; AVX512VBMI2-LABEL: var_funnnel_v32i16:
	(87 lines not shown)
	; AVX512VL-NEXT: vpternlogq $226, %ymm3, %ymm8, %ymm4			; AVX512VL-NEXT: vpternlogq $226, %ymm3, %ymm8, %ymm4
	; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1			; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1
	; AVX512VL-NEXT: vpblendvb %ymm1, %ymm4, %ymm0, %ymm0			; AVX512VL-NEXT: vpblendvb %ymm1, %ymm4, %ymm0, %ymm0
	; AVX512VL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; AVX512VL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_funnnel_v64i8:			; AVX512BW-LABEL: var_funnnel_v64i8:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512BW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512BW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]			; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]
	; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]			; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]
	; AVX512BW-NEXT: vpsrlvw %zmm3, %zmm4, %zmm3			; AVX512BW-NEXT: vpsrlvw %zmm3, %zmm4, %zmm3
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm4 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm4 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]
	; AVX512BW-NEXT: vpandq %zmm4, %zmm3, %zmm3			; AVX512BW-NEXT: vpandq %zmm4, %zmm3, %zmm3
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]
	; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpandq %zmm4, %zmm0, %zmm0			; AVX512BW-NEXT: vpandq %zmm4, %zmm0, %zmm0
	; AVX512BW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0			; AVX512BW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_funnnel_v64i8:			; AVX512VLBW-LABEL: var_funnnel_v64i8:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]
	; AVX512VLBW-NEXT: vpsrlvw %zmm3, %zmm4, %zmm3			; AVX512VLBW-NEXT: vpsrlvw %zmm3, %zmm4, %zmm3
	; AVX512VLBW-NEXT: vmovdqa64 {{.*#+}} zmm4 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]			; AVX512VLBW-NEXT: vmovdqa64 {{.*#+}} zmm4 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]
	; AVX512VLBW-NEXT: vpandq %zmm4, %zmm3, %zmm3			; AVX512VLBW-NEXT: vpandq %zmm4, %zmm3, %zmm3
	; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]			; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]
	; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]			; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]
	; AVX512VLBW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vpandq %zmm4, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpandq %zmm4, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VBMI2-LABEL: var_funnnel_v64i8:			; AVX512VBMI2-LABEL: var_funnnel_v64i8:
	; AVX512VBMI2: # %bb.0:			; AVX512VBMI2: # %bb.0:
	; AVX512VBMI2-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512VBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512VBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]			; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]
	; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]			; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]
	; AVX512VBMI2-NEXT: vpsrlvw %zmm3, %zmm4, %zmm3			; AVX512VBMI2-NEXT: vpsrlvw %zmm3, %zmm4, %zmm3
	; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]			; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]
	; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]			; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]
	; AVX512VBMI2-NEXT: vpsrlvw %zmm1, %zmm0, %zmm1			; AVX512VBMI2-NEXT: vpsrlvw %zmm1, %zmm0, %zmm1
	; AVX512VBMI2-NEXT: vmovdqa64 {{.*#+}} zmm0 = [0,2,4,6,8,10,12,14,64,66,68,70,72,74,76,78,16,18,20,22,24,26,28,30,80,82,84,86,88,90,92,94,32,34,36,38,40,42,44,46,96,98,100,102,104,106,108,110,48,50,52,54,56,58,60,62,112,114,116,118,120,122,124,126]			; AVX512VBMI2-NEXT: vmovdqa64 {{.*#+}} zmm0 = [0,2,4,6,8,10,12,14,64,66,68,70,72,74,76,78,16,18,20,22,24,26,28,30,80,82,84,86,88,90,92,94,32,34,36,38,40,42,44,46,96,98,100,102,104,106,108,110,48,50,52,54,56,58,60,62,112,114,116,118,120,122,124,126]
	; AVX512VBMI2-NEXT: vpermi2b %zmm3, %zmm1, %zmm0			; AVX512VBMI2-NEXT: vpermi2b %zmm3, %zmm1, %zmm0
	; AVX512VBMI2-NEXT: retq			; AVX512VBMI2-NEXT: retq
	;			;
	; AVX512VLVBMI2-LABEL: var_funnnel_v64i8:			; AVX512VLVBMI2-LABEL: var_funnnel_v64i8:
	; AVX512VLVBMI2: # %bb.0:			; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512VLVBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512VLVBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLVBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]
	; AVX512VLVBMI2-NEXT: vpsrlvw %zmm3, %zmm4, %zmm3			; AVX512VLVBMI2-NEXT: vpsrlvw %zmm3, %zmm4, %zmm3
	; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]			; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]
	; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]			; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]
	; AVX512VLVBMI2-NEXT: vpsrlvw %zmm1, %zmm0, %zmm1			; AVX512VLVBMI2-NEXT: vpsrlvw %zmm1, %zmm0, %zmm1
	; AVX512VLVBMI2-NEXT: vmovdqa64 {{.*#+}} zmm0 = [0,2,4,6,8,10,12,14,64,66,68,70,72,74,76,78,16,18,20,22,24,26,28,30,80,82,84,86,88,90,92,94,32,34,36,38,40,42,44,46,96,98,100,102,104,106,108,110,48,50,52,54,56,58,60,62,112,114,116,118,120,122,124,126]			; AVX512VLVBMI2-NEXT: vmovdqa64 {{.*#+}} zmm0 = [0,2,4,6,8,10,12,14,64,66,68,70,72,74,76,78,16,18,20,22,24,26,28,30,80,82,84,86,88,90,92,94,32,34,36,38,40,42,44,46,96,98,100,102,104,106,108,110,48,50,52,54,56,58,60,62,112,114,116,118,120,122,124,126]
	(503 lines not shown)

llvm/test/CodeGen/X86/vector-idiv-sdiv-512.ll

	(176 lines not shown)
	; AVX512BW-NEXT: vpsrlw $8, %zmm2, %zmm2			; AVX512BW-NEXT: vpsrlw $8, %zmm2, %zmm2
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]
	; AVX512BW-NEXT: vpmulhw %zmm3, %zmm1, %zmm1			; AVX512BW-NEXT: vpmulhw %zmm3, %zmm1, %zmm1
	; AVX512BW-NEXT: vpsrlw $8, %zmm1, %zmm1			; AVX512BW-NEXT: vpsrlw $8, %zmm1, %zmm1
	; AVX512BW-NEXT: vpackuswb %zmm2, %zmm1, %zmm1			; AVX512BW-NEXT: vpackuswb %zmm2, %zmm1, %zmm1
	; AVX512BW-NEXT: vpaddb %zmm0, %zmm1, %zmm0			; AVX512BW-NEXT: vpaddb %zmm0, %zmm1, %zmm0
	; AVX512BW-NEXT: vpsrlw $2, %zmm0, %zmm1			; AVX512BW-NEXT: vpsrlw $2, %zmm0, %zmm1
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm2 = [32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm2 = [32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32]
	; AVX512BW-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm1			; AVX512BW-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm1
	; AVX512BW-NEXT: vpsrlw $7, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlw $7, %zmm0, %zmm0
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512BW-NEXT: vpaddb %zmm0, %zmm1, %zmm0			; AVX512BW-NEXT: vpaddb %zmm0, %zmm1, %zmm0
	; AVX512BW-NEXT: vpsubb %zmm2, %zmm0, %zmm0			; AVX512BW-NEXT: vpsubb %zmm2, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%res = sdiv <64 x i8> %a, <i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7>			%res = sdiv <64 x i8> %a, <i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7>
	ret <64 x i8> %res			ret <64 x i8> %res
	}			}

	;			;
	(68 lines not shown)
	; AVX512BW-NEXT: vpsllvw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpsllvw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1
	; AVX512BW-NEXT: vpsrlw $8, %zmm1, %zmm1			; AVX512BW-NEXT: vpsrlw $8, %zmm1, %zmm1
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm2 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm2 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]
	; AVX512BW-NEXT: vpsraw $8, %zmm2, %zmm2			; AVX512BW-NEXT: vpsraw $8, %zmm2, %zmm2
	; AVX512BW-NEXT: vpsllvw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512BW-NEXT: vpsllvw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2
	; AVX512BW-NEXT: vpsrlw $8, %zmm2, %zmm2			; AVX512BW-NEXT: vpsrlw $8, %zmm2, %zmm2
	; AVX512BW-NEXT: vpackuswb %zmm1, %zmm2, %zmm1			; AVX512BW-NEXT: vpackuswb %zmm1, %zmm2, %zmm1
	; AVX512BW-NEXT: vpsrlw $7, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlw $7, %zmm0, %zmm0
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512BW-NEXT: vpaddb %zmm0, %zmm1, %zmm0			; AVX512BW-NEXT: vpaddb %zmm0, %zmm1, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%res = sdiv <64 x i8> %a, <i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 20, i8 21, i8 22, i8 23, i8 24, i8 25, i8 26, i8 27, i8 28, i8 29, i8 30, i8 31, i8 32, i8 33, i8 34, i8 35, i8 36, i8 37, i8 38, i8 38, i8 37, i8 36, i8 35, i8 34, i8 33, i8 32, i8 31, i8 30, i8 29, i8 28, i8 27, i8 26, i8 25, i8 24, i8 23, i8 22, i8 21, i8 20, i8 19, i8 18, i8 17, i8 16, i8 15, i8 14, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7>			%res = sdiv <64 x i8> %a, <i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 20, i8 21, i8 22, i8 23, i8 24, i8 25, i8 26, i8 27, i8 28, i8 29, i8 30, i8 31, i8 32, i8 33, i8 34, i8 35, i8 36, i8 37, i8 38, i8 38, i8 37, i8 36, i8 35, i8 34, i8 33, i8 32, i8 31, i8 30, i8 29, i8 28, i8 27, i8 26, i8 25, i8 24, i8 23, i8 22, i8 21, i8 20, i8 19, i8 18, i8 17, i8 16, i8 15, i8 14, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7>
	ret <64 x i8> %res			ret <64 x i8> %res
	}			}

	;			;
	; srem by 7			; srem by 7
	[221 lines not shown]
	; AVX512BW-NEXT: vpsrlw $8, %zmm2, %zmm2			; AVX512BW-NEXT: vpsrlw $8, %zmm2, %zmm2
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm0[0],zmm1[1],zmm0[1],zmm1[2],zmm0[2],zmm1[3],zmm0[3],zmm1[4],zmm0[4],zmm1[5],zmm0[5],zmm1[6],zmm0[6],zmm1[7],zmm0[7],zmm1[16],zmm0[16],zmm1[17],zmm0[17],zmm1[18],zmm0[18],zmm1[19],zmm0[19],zmm1[20],zmm0[20],zmm1[21],zmm0[21],zmm1[22],zmm0[22],zmm1[23],zmm0[23],zmm1[32],zmm0[32],zmm1[33],zmm0[33],zmm1[34],zmm0[34],zmm1[35],zmm0[35],zmm1[36],zmm0[36],zmm1[37],zmm0[37],zmm1[38],zmm0[38],zmm1[39],zmm0[39],zmm1[48],zmm0[48],zmm1[49],zmm0[49],zmm1[50],zmm0[50],zmm1[51],zmm0[51],zmm1[52],zmm0[52],zmm1[53],zmm0[53],zmm1[54],zmm0[54],zmm1[55],zmm0[55]
	; AVX512BW-NEXT: vpmulhw %zmm3, %zmm1, %zmm1			; AVX512BW-NEXT: vpmulhw %zmm3, %zmm1, %zmm1
	; AVX512BW-NEXT: vpsrlw $8, %zmm1, %zmm1			; AVX512BW-NEXT: vpsrlw $8, %zmm1, %zmm1
	; AVX512BW-NEXT: vpackuswb %zmm2, %zmm1, %zmm1			; AVX512BW-NEXT: vpackuswb %zmm2, %zmm1, %zmm1
	; AVX512BW-NEXT: vpaddb %zmm0, %zmm1, %zmm1			; AVX512BW-NEXT: vpaddb %zmm0, %zmm1, %zmm1
	; AVX512BW-NEXT: vpsrlw $2, %zmm1, %zmm2			; AVX512BW-NEXT: vpsrlw $2, %zmm1, %zmm2
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32]
	; AVX512BW-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm3, %zmm2			; AVX512BW-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm3, %zmm2
	; AVX512BW-NEXT: vpsrlw $7, %zmm1, %zmm1			; AVX512BW-NEXT: vpsrlw $7, %zmm1, %zmm1
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512BW-NEXT: vpaddb %zmm1, %zmm2, %zmm1			; AVX512BW-NEXT: vpaddb %zmm1, %zmm2, %zmm1
	; AVX512BW-NEXT: vpsubb %zmm3, %zmm1, %zmm1			; AVX512BW-NEXT: vpsubb %zmm3, %zmm1, %zmm1
	; AVX512BW-NEXT: vpsllw $3, %zmm1, %zmm2			; AVX512BW-NEXT: vpsllw $3, %zmm1, %zmm2
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512BW-NEXT: vpsubb %zmm2, %zmm1, %zmm1			; AVX512BW-NEXT: vpsubb %zmm2, %zmm1, %zmm1
	; AVX512BW-NEXT: vpaddb %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpaddb %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%res = srem <64 x i8> %a, <i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7>			%res = srem <64 x i8> %a, <i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7>
	ret <64 x i8> %res			ret <64 x i8> %res
	}			}

	;			;
	[85 lines not shown]
	; AVX512BW-NEXT: vpsllvw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512BW-NEXT: vpsllvw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2
	; AVX512BW-NEXT: vpsrlw $8, %zmm2, %zmm2			; AVX512BW-NEXT: vpsrlw $8, %zmm2, %zmm2
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm3 = zmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm3 = zmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]
	; AVX512BW-NEXT: vpsraw $8, %zmm3, %zmm3			; AVX512BW-NEXT: vpsraw $8, %zmm3, %zmm3
	; AVX512BW-NEXT: vpsllvw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm3, %zmm3			; AVX512BW-NEXT: vpsllvw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm3, %zmm3
	; AVX512BW-NEXT: vpsrlw $8, %zmm3, %zmm3			; AVX512BW-NEXT: vpsrlw $8, %zmm3, %zmm3
	; AVX512BW-NEXT: vpackuswb %zmm2, %zmm3, %zmm2			; AVX512BW-NEXT: vpackuswb %zmm2, %zmm3, %zmm2
	; AVX512BW-NEXT: vpsrlw $7, %zmm1, %zmm1			; AVX512BW-NEXT: vpsrlw $7, %zmm1, %zmm1
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512BW-NEXT: vpaddb %zmm1, %zmm2, %zmm1			; AVX512BW-NEXT: vpaddb %zmm1, %zmm2, %zmm1
	; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm2 = zmm1[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]			; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm2 = zmm1[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]
	; AVX512BW-NEXT: vpmullw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512BW-NEXT: vpmullw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]
	; AVX512BW-NEXT: vpandq %zmm3, %zmm2, %zmm2			; AVX512BW-NEXT: vpandq %zmm3, %zmm2, %zmm2
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]
	; AVX512BW-NEXT: vpmullw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpmullw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1
	; AVX512BW-NEXT: vpandq %zmm3, %zmm1, %zmm1			; AVX512BW-NEXT: vpandq %zmm3, %zmm1, %zmm1
	; AVX512BW-NEXT: vpackuswb %zmm2, %zmm1, %zmm1			; AVX512BW-NEXT: vpackuswb %zmm2, %zmm1, %zmm1
	; AVX512BW-NEXT: vpsubb %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsubb %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%res = srem <64 x i8> %a, <i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 20, i8 21, i8 22, i8 23, i8 24, i8 25, i8 26, i8 27, i8 28, i8 29, i8 30, i8 31, i8 32, i8 33, i8 34, i8 35, i8 36, i8 37, i8 38, i8 38, i8 37, i8 36, i8 35, i8 34, i8 33, i8 32, i8 31, i8 30, i8 29, i8 28, i8 27, i8 26, i8 25, i8 24, i8 23, i8 22, i8 21, i8 20, i8 19, i8 18, i8 17, i8 16, i8 15, i8 14, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7>			%res = srem <64 x i8> %a, <i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 20, i8 21, i8 22, i8 23, i8 24, i8 25, i8 26, i8 27, i8 28, i8 29, i8 30, i8 31, i8 32, i8 33, i8 34, i8 35, i8 36, i8 37, i8 38, i8 38, i8 37, i8 36, i8 35, i8 34, i8 33, i8 32, i8 31, i8 30, i8 29, i8 28, i8 27, i8 26, i8 25, i8 24, i8 23, i8 22, i8 21, i8 20, i8 19, i8 18, i8 17, i8 16, i8 15, i8 14, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7>
	ret <64 x i8> %res			ret <64 x i8> %res
	}			}

llvm/test/CodeGen/X86/vector-idiv-udiv-512.ll

	[156 lines not shown]
	; AVX512F-NEXT: vpsrlw $8, %ymm1, %ymm1			; AVX512F-NEXT: vpsrlw $8, %ymm1, %ymm1
	; AVX512F-NEXT: vpackuswb %ymm4, %ymm1, %ymm1			; AVX512F-NEXT: vpackuswb %ymm4, %ymm1, %ymm1
	; AVX512F-NEXT: vpsubb %ymm1, %ymm0, %ymm0			; AVX512F-NEXT: vpsubb %ymm1, %ymm0, %ymm0
	; AVX512F-NEXT: vpsrlw $1, %ymm0, %ymm0			; AVX512F-NEXT: vpsrlw $1, %ymm0, %ymm0
	; AVX512F-NEXT: vpand %ymm5, %ymm0, %ymm0			; AVX512F-NEXT: vpand %ymm5, %ymm0, %ymm0
	; AVX512F-NEXT: vpaddb %ymm1, %ymm0, %ymm0			; AVX512F-NEXT: vpaddb %ymm1, %ymm0, %ymm0
	; AVX512F-NEXT: vpsrlw $2, %ymm0, %ymm0			; AVX512F-NEXT: vpsrlw $2, %ymm0, %ymm0
	; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm2, %zmm0			; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm2, %zmm0
	; AVX512F-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512F-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BW-LABEL: test_div7_64i8:			; AVX512BW-LABEL: test_div7_64i8:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpxor %xmm1, %xmm1, %xmm1			; AVX512BW-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm2 = zmm0[8],zmm1[8],zmm0[9],zmm1[9],zmm0[10],zmm1[10],zmm0[11],zmm1[11],zmm0[12],zmm1[12],zmm0[13],zmm1[13],zmm0[14],zmm1[14],zmm0[15],zmm1[15],zmm0[24],zmm1[24],zmm0[25],zmm1[25],zmm0[26],zmm1[26],zmm0[27],zmm1[27],zmm0[28],zmm1[28],zmm0[29],zmm1[29],zmm0[30],zmm1[30],zmm0[31],zmm1[31],zmm0[40],zmm1[40],zmm0[41],zmm1[41],zmm0[42],zmm1[42],zmm0[43],zmm1[43],zmm0[44],zmm1[44],zmm0[45],zmm1[45],zmm0[46],zmm1[46],zmm0[47],zmm1[47],zmm0[56],zmm1[56],zmm0[57],zmm1[57],zmm0[58],zmm1[58],zmm0[59],zmm1[59],zmm0[60],zmm1[60],zmm0[61],zmm1[61],zmm0[62],zmm1[62],zmm0[63],zmm1[63]			; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm2 = zmm0[8],zmm1[8],zmm0[9],zmm1[9],zmm0[10],zmm1[10],zmm0[11],zmm1[11],zmm0[12],zmm1[12],zmm0[13],zmm1[13],zmm0[14],zmm1[14],zmm0[15],zmm1[15],zmm0[24],zmm1[24],zmm0[25],zmm1[25],zmm0[26],zmm1[26],zmm0[27],zmm1[27],zmm0[28],zmm1[28],zmm0[29],zmm1[29],zmm0[30],zmm1[30],zmm0[31],zmm1[31],zmm0[40],zmm1[40],zmm0[41],zmm1[41],zmm0[42],zmm1[42],zmm0[43],zmm1[43],zmm0[44],zmm1[44],zmm0[45],zmm1[45],zmm0[46],zmm1[46],zmm0[47],zmm1[47],zmm0[56],zmm1[56],zmm0[57],zmm1[57],zmm0[58],zmm1[58],zmm0[59],zmm1[59],zmm0[60],zmm1[60],zmm0[61],zmm1[61],zmm0[62],zmm1[62],zmm0[63],zmm1[63]
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37,37]
	; AVX512BW-NEXT: vpmullw %zmm3, %zmm2, %zmm2			; AVX512BW-NEXT: vpmullw %zmm3, %zmm2, %zmm2
	; AVX512BW-NEXT: vpsrlw $8, %zmm2, %zmm2			; AVX512BW-NEXT: vpsrlw $8, %zmm2, %zmm2
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm0[0],zmm1[0],zmm0[1],zmm1[1],zmm0[2],zmm1[2],zmm0[3],zmm1[3],zmm0[4],zmm1[4],zmm0[5],zmm1[5],zmm0[6],zmm1[6],zmm0[7],zmm1[7],zmm0[16],zmm1[16],zmm0[17],zmm1[17],zmm0[18],zmm1[18],zmm0[19],zmm1[19],zmm0[20],zmm1[20],zmm0[21],zmm1[21],zmm0[22],zmm1[22],zmm0[23],zmm1[23],zmm0[32],zmm1[32],zmm0[33],zmm1[33],zmm0[34],zmm1[34],zmm0[35],zmm1[35],zmm0[36],zmm1[36],zmm0[37],zmm1[37],zmm0[38],zmm1[38],zmm0[39],zmm1[39],zmm0[48],zmm1[48],zmm0[49],zmm1[49],zmm0[50],zmm1[50],zmm0[51],zmm1[51],zmm0[52],zmm1[52],zmm0[53],zmm1[53],zmm0[54],zmm1[54],zmm0[55],zmm1[55]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm0[0],zmm1[0],zmm0[1],zmm1[1],zmm0[2],zmm1[2],zmm0[3],zmm1[3],zmm0[4],zmm1[4],zmm0[5],zmm1[5],zmm0[6],zmm1[6],zmm0[7],zmm1[7],zmm0[16],zmm1[16],zmm0[17],zmm1[17],zmm0[18],zmm1[18],zmm0[19],zmm1[19],zmm0[20],zmm1[20],zmm0[21],zmm1[21],zmm0[22],zmm1[22],zmm0[23],zmm1[23],zmm0[32],zmm1[32],zmm0[33],zmm1[33],zmm0[34],zmm1[34],zmm0[35],zmm1[35],zmm0[36],zmm1[36],zmm0[37],zmm1[37],zmm0[38],zmm1[38],zmm0[39],zmm1[39],zmm0[48],zmm1[48],zmm0[49],zmm1[49],zmm0[50],zmm1[50],zmm0[51],zmm1[51],zmm0[52],zmm1[52],zmm0[53],zmm1[53],zmm0[54],zmm1[54],zmm0[55],zmm1[55]
	; AVX512BW-NEXT: vpmullw %zmm3, %zmm1, %zmm1			; AVX512BW-NEXT: vpmullw %zmm3, %zmm1, %zmm1
	; AVX512BW-NEXT: vpsrlw $8, %zmm1, %zmm1			; AVX512BW-NEXT: vpsrlw $8, %zmm1, %zmm1
	; AVX512BW-NEXT: vpackuswb %zmm2, %zmm1, %zmm1			; AVX512BW-NEXT: vpackuswb %zmm2, %zmm1, %zmm1
	; AVX512BW-NEXT: vpsubb %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsubb %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpsrlw $1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlw $1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512BW-NEXT: vpaddb %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpaddb %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpsrlw $2, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlw $2, %zmm0, %zmm0
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%res = udiv <64 x i8> %a, <i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7>			%res = udiv <64 x i8> %a, <i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7>
	ret <64 x i8> %res			ret <64 x i8> %res
	}			}

	;			;
	; udiv by non-splat constant			; udiv by non-splat constant
	;			;
	[326 lines not shown]
	; AVX512BW-NEXT: vpmullw %zmm3, %zmm2, %zmm2			; AVX512BW-NEXT: vpmullw %zmm3, %zmm2, %zmm2
	; AVX512BW-NEXT: vpsrlw $8, %zmm2, %zmm2			; AVX512BW-NEXT: vpsrlw $8, %zmm2, %zmm2
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm0[0],zmm1[0],zmm0[1],zmm1[1],zmm0[2],zmm1[2],zmm0[3],zmm1[3],zmm0[4],zmm1[4],zmm0[5],zmm1[5],zmm0[6],zmm1[6],zmm0[7],zmm1[7],zmm0[16],zmm1[16],zmm0[17],zmm1[17],zmm0[18],zmm1[18],zmm0[19],zmm1[19],zmm0[20],zmm1[20],zmm0[21],zmm1[21],zmm0[22],zmm1[22],zmm0[23],zmm1[23],zmm0[32],zmm1[32],zmm0[33],zmm1[33],zmm0[34],zmm1[34],zmm0[35],zmm1[35],zmm0[36],zmm1[36],zmm0[37],zmm1[37],zmm0[38],zmm1[38],zmm0[39],zmm1[39],zmm0[48],zmm1[48],zmm0[49],zmm1[49],zmm0[50],zmm1[50],zmm0[51],zmm1[51],zmm0[52],zmm1[52],zmm0[53],zmm1[53],zmm0[54],zmm1[54],zmm0[55],zmm1[55]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm0[0],zmm1[0],zmm0[1],zmm1[1],zmm0[2],zmm1[2],zmm0[3],zmm1[3],zmm0[4],zmm1[4],zmm0[5],zmm1[5],zmm0[6],zmm1[6],zmm0[7],zmm1[7],zmm0[16],zmm1[16],zmm0[17],zmm1[17],zmm0[18],zmm1[18],zmm0[19],zmm1[19],zmm0[20],zmm1[20],zmm0[21],zmm1[21],zmm0[22],zmm1[22],zmm0[23],zmm1[23],zmm0[32],zmm1[32],zmm0[33],zmm1[33],zmm0[34],zmm1[34],zmm0[35],zmm1[35],zmm0[36],zmm1[36],zmm0[37],zmm1[37],zmm0[38],zmm1[38],zmm0[39],zmm1[39],zmm0[48],zmm1[48],zmm0[49],zmm1[49],zmm0[50],zmm1[50],zmm0[51],zmm1[51],zmm0[52],zmm1[52],zmm0[53],zmm1[53],zmm0[54],zmm1[54],zmm0[55],zmm1[55]
	; AVX512BW-NEXT: vpmullw %zmm3, %zmm1, %zmm1			; AVX512BW-NEXT: vpmullw %zmm3, %zmm1, %zmm1
	; AVX512BW-NEXT: vpsrlw $8, %zmm1, %zmm1			; AVX512BW-NEXT: vpsrlw $8, %zmm1, %zmm1
	; AVX512BW-NEXT: vpackuswb %zmm2, %zmm1, %zmm1			; AVX512BW-NEXT: vpackuswb %zmm2, %zmm1, %zmm1
	; AVX512BW-NEXT: vpsubb %zmm1, %zmm0, %zmm2			; AVX512BW-NEXT: vpsubb %zmm1, %zmm0, %zmm2
	; AVX512BW-NEXT: vpsrlw $1, %zmm2, %zmm2			; AVX512BW-NEXT: vpsrlw $1, %zmm2, %zmm2
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512BW-NEXT: vpaddb %zmm1, %zmm2, %zmm1			; AVX512BW-NEXT: vpaddb %zmm1, %zmm2, %zmm1
	; AVX512BW-NEXT: vpsrlw $2, %zmm1, %zmm1			; AVX512BW-NEXT: vpsrlw $2, %zmm1, %zmm1
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512BW-NEXT: vpsllw $3, %zmm1, %zmm2			; AVX512BW-NEXT: vpsllw $3, %zmm1, %zmm2
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512BW-NEXT: vpsubb %zmm2, %zmm1, %zmm1			; AVX512BW-NEXT: vpsubb %zmm2, %zmm1, %zmm1
	; AVX512BW-NEXT: vpaddb %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpaddb %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%res = urem <64 x i8> %a, <i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7>			%res = urem <64 x i8> %a, <i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,i8 7, i8 7, i8 7, i8 7>
	ret <64 x i8> %res			ret <64 x i8> %res
	}			}

	;			;
	[116 lines not shown]

llvm/test/CodeGen/X86/vector-lzcnt-128.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefixes=SSE,SSE2		; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefixes=SSE,SSE2
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse3 | FileCheck %s --check-prefixes=SSE,SSE3		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse3 | FileCheck %s --check-prefixes=SSE,SSE3
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+ssse3 | FileCheck %s --check-prefixes=SSE,SSSE3		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+ssse3 | FileCheck %s --check-prefixes=SSE,SSSE3
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefixes=SSE,SSE41		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefixes=SSE,SSE41
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefixes=NOBW,AVX		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefixes=NOBW,AVX,AVX1OR2
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefixes=NOBW,AVX		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefixes=NOBW,AVX,AVX1OR2
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl | FileCheck %s --check-prefixes=NOBW,AVX		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl | FileCheck %s --check-prefixes=NOBW,AVX,AVX512VL
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl,+avx512bw,+avx512dq | FileCheck %s --check-prefix=AVX512VLBWDQ		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl,+avx512bw,+avx512dq | FileCheck %s --check-prefixes=AVX512VLBWDQ
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512dq,+avx512cd,+avx512vl | FileCheck %s --check-prefixes=NOBW,AVX512,AVX512VLCD		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512dq,+avx512cd,+avx512vl | FileCheck %s --check-prefixes=NOBW,AVX512,AVX512VLCD
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512dq,+avx512cd | FileCheck %s --check-prefixes=NOBW,AVX512,AVX512CD		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512dq,+avx512cd | FileCheck %s --check-prefixes=NOBW,AVX512,AVX512CD
;		;
; Just one 32-bit run to make sure we do reasonable things for i64 lzcnt.		; Just one 32-bit run to make sure we do reasonable things for i64 lzcnt.
; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefix=X86-SSE		; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefix=X86-SSE

define <2 x i64> @testv2i64(<2 x i64> %in) nounwind {		define <2 x i64> @testv2i64(<2 x i64> %in) nounwind {
; SSE2-LABEL: testv2i64:		; SSE2-LABEL: testv2i64:
[135 lines not shown]
; SSE41-NEXT: paddd %xmm2, %xmm1		; SSE41-NEXT: paddd %xmm2, %xmm1
; SSE41-NEXT: pcmpeqd %xmm4, %xmm0		; SSE41-NEXT: pcmpeqd %xmm4, %xmm0
; SSE41-NEXT: psrlq $32, %xmm0		; SSE41-NEXT: psrlq $32, %xmm0
; SSE41-NEXT: pand %xmm1, %xmm0		; SSE41-NEXT: pand %xmm1, %xmm0
; SSE41-NEXT: psrlq $32, %xmm1		; SSE41-NEXT: psrlq $32, %xmm1
; SSE41-NEXT: paddq %xmm1, %xmm0		; SSE41-NEXT: paddq %xmm1, %xmm0
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-LABEL: testv2i64:		; AVX1OR2-LABEL: testv2i64:
; AVX: # %bb.0:		; AVX1OR2: # %bb.0:
; AVX-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]		; AVX1OR2-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
; AVX-NEXT: vpshufb %xmm0, %xmm1, %xmm2		; AVX1OR2-NEXT: vpshufb %xmm0, %xmm1, %xmm2
; AVX-NEXT: vpsrlw $4, %xmm0, %xmm3		; AVX1OR2-NEXT: vpsrlw $4, %xmm0, %xmm3
; AVX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3, %xmm3		; AVX1OR2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3, %xmm3
; AVX-NEXT: vpxor %xmm4, %xmm4, %xmm4		; AVX1OR2-NEXT: vpxor %xmm4, %xmm4, %xmm4
; AVX-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5		; AVX1OR2-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5
; AVX-NEXT: vpand %xmm5, %xmm2, %xmm2		; AVX1OR2-NEXT: vpand %xmm5, %xmm2, %xmm2
; AVX-NEXT: vpshufb %xmm3, %xmm1, %xmm1		; AVX1OR2-NEXT: vpshufb %xmm3, %xmm1, %xmm1
; AVX-NEXT: vpaddb %xmm1, %xmm2, %xmm1		; AVX1OR2-NEXT: vpaddb %xmm1, %xmm2, %xmm1
; AVX-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2		; AVX1OR2-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2
; AVX-NEXT: vpsrlw $8, %xmm2, %xmm2		; AVX1OR2-NEXT: vpsrlw $8, %xmm2, %xmm2
; AVX-NEXT: vpand %xmm2, %xmm1, %xmm2		; AVX1OR2-NEXT: vpand %xmm2, %xmm1, %xmm2
; AVX-NEXT: vpsrlw $8, %xmm1, %xmm1		; AVX1OR2-NEXT: vpsrlw $8, %xmm1, %xmm1
; AVX-NEXT: vpaddw %xmm2, %xmm1, %xmm1		; AVX1OR2-NEXT: vpaddw %xmm2, %xmm1, %xmm1
; AVX-NEXT: vpcmpeqw %xmm4, %xmm0, %xmm2		; AVX1OR2-NEXT: vpcmpeqw %xmm4, %xmm0, %xmm2
; AVX-NEXT: vpsrld $16, %xmm2, %xmm2		; AVX1OR2-NEXT: vpsrld $16, %xmm2, %xmm2
; AVX-NEXT: vpand %xmm2, %xmm1, %xmm2		; AVX1OR2-NEXT: vpand %xmm2, %xmm1, %xmm2
; AVX-NEXT: vpsrld $16, %xmm1, %xmm1		; AVX1OR2-NEXT: vpsrld $16, %xmm1, %xmm1
; AVX-NEXT: vpaddd %xmm2, %xmm1, %xmm1		; AVX1OR2-NEXT: vpaddd %xmm2, %xmm1, %xmm1
; AVX-NEXT: vpcmpeqd %xmm4, %xmm0, %xmm0		; AVX1OR2-NEXT: vpcmpeqd %xmm4, %xmm0, %xmm0
; AVX-NEXT: vpsrlq $32, %xmm0, %xmm0		; AVX1OR2-NEXT: vpsrlq $32, %xmm0, %xmm0
; AVX-NEXT: vpand %xmm0, %xmm1, %xmm0		; AVX1OR2-NEXT: vpand %xmm0, %xmm1, %xmm0
; AVX-NEXT: vpsrlq $32, %xmm1, %xmm1		; AVX1OR2-NEXT: vpsrlq $32, %xmm1, %xmm1
; AVX-NEXT: vpaddq %xmm0, %xmm1, %xmm0		; AVX1OR2-NEXT: vpaddq %xmm0, %xmm1, %xmm0
; AVX-NEXT: retq		; AVX1OR2-NEXT: retq
		;
		; AVX512VL-LABEL: testv2i64:
		; AVX512VL: # %bb.0:
		; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
		; AVX512VL-NEXT: vpshufb %xmm0, %xmm1, %xmm2
		; AVX512VL-NEXT: vpsrlw $4, %xmm0, %xmm3
		; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm3, %xmm3
		; AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4
		; AVX512VL-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5
		; AVX512VL-NEXT: vpand %xmm5, %xmm2, %xmm2
		; AVX512VL-NEXT: vpshufb %xmm3, %xmm1, %xmm1
		; AVX512VL-NEXT: vpaddb %xmm1, %xmm2, %xmm1
		; AVX512VL-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2
		; AVX512VL-NEXT: vpsrlw $8, %xmm2, %xmm2
		; AVX512VL-NEXT: vpand %xmm2, %xmm1, %xmm2
		; AVX512VL-NEXT: vpsrlw $8, %xmm1, %xmm1
		; AVX512VL-NEXT: vpaddw %xmm2, %xmm1, %xmm1
		; AVX512VL-NEXT: vpcmpeqw %xmm4, %xmm0, %xmm2
		; AVX512VL-NEXT: vpsrld $16, %xmm2, %xmm2
		; AVX512VL-NEXT: vpand %xmm2, %xmm1, %xmm2
		; AVX512VL-NEXT: vpsrld $16, %xmm1, %xmm1
		; AVX512VL-NEXT: vpaddd %xmm2, %xmm1, %xmm1
		; AVX512VL-NEXT: vpcmpeqd %xmm4, %xmm0, %xmm0
		; AVX512VL-NEXT: vpsrlq $32, %xmm0, %xmm0
		; AVX512VL-NEXT: vpand %xmm0, %xmm1, %xmm0
		; AVX512VL-NEXT: vpsrlq $32, %xmm1, %xmm1
		; AVX512VL-NEXT: vpaddq %xmm0, %xmm1, %xmm0
		; AVX512VL-NEXT: retq
;		;
; AVX512VLBWDQ-LABEL: testv2i64:		; AVX512VLBWDQ-LABEL: testv2i64:
; AVX512VLBWDQ: # %bb.0:		; AVX512VLBWDQ: # %bb.0:
; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]		; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm2		; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm2
; AVX512VLBWDQ-NEXT: vpsrlw $4, %xmm0, %xmm3		; AVX512VLBWDQ-NEXT: vpsrlw $4, %xmm0, %xmm3
; AVX512VLBWDQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3, %xmm3		; AVX512VLBWDQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm3, %xmm3
; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4		; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4
; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5		; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5
; AVX512VLBWDQ-NEXT: vpand %xmm5, %xmm2, %xmm2		; AVX512VLBWDQ-NEXT: vpand %xmm5, %xmm2, %xmm2
; AVX512VLBWDQ-NEXT: vpshufb %xmm3, %xmm1, %xmm1		; AVX512VLBWDQ-NEXT: vpshufb %xmm3, %xmm1, %xmm1
; AVX512VLBWDQ-NEXT: vpaddb %xmm1, %xmm2, %xmm1		; AVX512VLBWDQ-NEXT: vpaddb %xmm1, %xmm2, %xmm1
; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2		; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2
; AVX512VLBWDQ-NEXT: vpsrlw $8, %xmm2, %xmm2		; AVX512VLBWDQ-NEXT: vpsrlw $8, %xmm2, %xmm2
; AVX512VLBWDQ-NEXT: vpand %xmm2, %xmm1, %xmm2		; AVX512VLBWDQ-NEXT: vpand %xmm2, %xmm1, %xmm2
[200 lines not shown]
; SSE41-NEXT: paddd %xmm2, %xmm1		; SSE41-NEXT: paddd %xmm2, %xmm1
; SSE41-NEXT: pcmpeqd %xmm4, %xmm0		; SSE41-NEXT: pcmpeqd %xmm4, %xmm0
; SSE41-NEXT: psrlq $32, %xmm0		; SSE41-NEXT: psrlq $32, %xmm0
; SSE41-NEXT: pand %xmm1, %xmm0		; SSE41-NEXT: pand %xmm1, %xmm0
; SSE41-NEXT: psrlq $32, %xmm1		; SSE41-NEXT: psrlq $32, %xmm1
; SSE41-NEXT: paddq %xmm1, %xmm0		; SSE41-NEXT: paddq %xmm1, %xmm0
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-LABEL: testv2i64u:		; AVX1OR2-LABEL: testv2i64u:
; AVX: # %bb.0:		; AVX1OR2: # %bb.0:
; AVX-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]		; AVX1OR2-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
; AVX-NEXT: vpshufb %xmm0, %xmm1, %xmm2		; AVX1OR2-NEXT: vpshufb %xmm0, %xmm1, %xmm2
; AVX-NEXT: vpsrlw $4, %xmm0, %xmm3		; AVX1OR2-NEXT: vpsrlw $4, %xmm0, %xmm3
; AVX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3, %xmm3		; AVX1OR2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3, %xmm3
; AVX-NEXT: vpxor %xmm4, %xmm4, %xmm4		; AVX1OR2-NEXT: vpxor %xmm4, %xmm4, %xmm4
; AVX-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5		; AVX1OR2-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5
; AVX-NEXT: vpand %xmm5, %xmm2, %xmm2		; AVX1OR2-NEXT: vpand %xmm5, %xmm2, %xmm2
; AVX-NEXT: vpshufb %xmm3, %xmm1, %xmm1		; AVX1OR2-NEXT: vpshufb %xmm3, %xmm1, %xmm1
; AVX-NEXT: vpaddb %xmm1, %xmm2, %xmm1		; AVX1OR2-NEXT: vpaddb %xmm1, %xmm2, %xmm1
; AVX-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2		; AVX1OR2-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2
; AVX-NEXT: vpsrlw $8, %xmm2, %xmm2		; AVX1OR2-NEXT: vpsrlw $8, %xmm2, %xmm2
; AVX-NEXT: vpand %xmm2, %xmm1, %xmm2		; AVX1OR2-NEXT: vpand %xmm2, %xmm1, %xmm2
; AVX-NEXT: vpsrlw $8, %xmm1, %xmm1		; AVX1OR2-NEXT: vpsrlw $8, %xmm1, %xmm1
; AVX-NEXT: vpaddw %xmm2, %xmm1, %xmm1		; AVX1OR2-NEXT: vpaddw %xmm2, %xmm1, %xmm1
; AVX-NEXT: vpcmpeqw %xmm4, %xmm0, %xmm2		; AVX1OR2-NEXT: vpcmpeqw %xmm4, %xmm0, %xmm2
; AVX-NEXT: vpsrld $16, %xmm2, %xmm2		; AVX1OR2-NEXT: vpsrld $16, %xmm2, %xmm2
; AVX-NEXT: vpand %xmm2, %xmm1, %xmm2		; AVX1OR2-NEXT: vpand %xmm2, %xmm1, %xmm2
; AVX-NEXT: vpsrld $16, %xmm1, %xmm1		; AVX1OR2-NEXT: vpsrld $16, %xmm1, %xmm1
; AVX-NEXT: vpaddd %xmm2, %xmm1, %xmm1		; AVX1OR2-NEXT: vpaddd %xmm2, %xmm1, %xmm1
; AVX-NEXT: vpcmpeqd %xmm4, %xmm0, %xmm0		; AVX1OR2-NEXT: vpcmpeqd %xmm4, %xmm0, %xmm0
; AVX-NEXT: vpsrlq $32, %xmm0, %xmm0		; AVX1OR2-NEXT: vpsrlq $32, %xmm0, %xmm0
; AVX-NEXT: vpand %xmm0, %xmm1, %xmm0		; AVX1OR2-NEXT: vpand %xmm0, %xmm1, %xmm0
; AVX-NEXT: vpsrlq $32, %xmm1, %xmm1		; AVX1OR2-NEXT: vpsrlq $32, %xmm1, %xmm1
; AVX-NEXT: vpaddq %xmm0, %xmm1, %xmm0		; AVX1OR2-NEXT: vpaddq %xmm0, %xmm1, %xmm0
; AVX-NEXT: retq		; AVX1OR2-NEXT: retq
		;
		; AVX512VL-LABEL: testv2i64u:
		; AVX512VL: # %bb.0:
		; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
		; AVX512VL-NEXT: vpshufb %xmm0, %xmm1, %xmm2
		; AVX512VL-NEXT: vpsrlw $4, %xmm0, %xmm3
		; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm3, %xmm3
		; AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4
		; AVX512VL-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5
		; AVX512VL-NEXT: vpand %xmm5, %xmm2, %xmm2
		; AVX512VL-NEXT: vpshufb %xmm3, %xmm1, %xmm1
		; AVX512VL-NEXT: vpaddb %xmm1, %xmm2, %xmm1
		; AVX512VL-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2
		; AVX512VL-NEXT: vpsrlw $8, %xmm2, %xmm2
		; AVX512VL-NEXT: vpand %xmm2, %xmm1, %xmm2
		; AVX512VL-NEXT: vpsrlw $8, %xmm1, %xmm1
		; AVX512VL-NEXT: vpaddw %xmm2, %xmm1, %xmm1
		; AVX512VL-NEXT: vpcmpeqw %xmm4, %xmm0, %xmm2
		; AVX512VL-NEXT: vpsrld $16, %xmm2, %xmm2
		; AVX512VL-NEXT: vpand %xmm2, %xmm1, %xmm2
		; AVX512VL-NEXT: vpsrld $16, %xmm1, %xmm1
		; AVX512VL-NEXT: vpaddd %xmm2, %xmm1, %xmm1
		; AVX512VL-NEXT: vpcmpeqd %xmm4, %xmm0, %xmm0
		; AVX512VL-NEXT: vpsrlq $32, %xmm0, %xmm0
		; AVX512VL-NEXT: vpand %xmm0, %xmm1, %xmm0
		; AVX512VL-NEXT: vpsrlq $32, %xmm1, %xmm1
		; AVX512VL-NEXT: vpaddq %xmm0, %xmm1, %xmm0
		; AVX512VL-NEXT: retq
;		;
; AVX512VLBWDQ-LABEL: testv2i64u:		; AVX512VLBWDQ-LABEL: testv2i64u:
; AVX512VLBWDQ: # %bb.0:		; AVX512VLBWDQ: # %bb.0:
; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]		; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm2		; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm2
; AVX512VLBWDQ-NEXT: vpsrlw $4, %xmm0, %xmm3		; AVX512VLBWDQ-NEXT: vpsrlw $4, %xmm0, %xmm3
; AVX512VLBWDQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3, %xmm3		; AVX512VLBWDQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm3, %xmm3
; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4		; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4
; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5		; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5
; AVX512VLBWDQ-NEXT: vpand %xmm5, %xmm2, %xmm2		; AVX512VLBWDQ-NEXT: vpand %xmm5, %xmm2, %xmm2
; AVX512VLBWDQ-NEXT: vpshufb %xmm3, %xmm1, %xmm1		; AVX512VLBWDQ-NEXT: vpshufb %xmm3, %xmm1, %xmm1
; AVX512VLBWDQ-NEXT: vpaddb %xmm1, %xmm2, %xmm1		; AVX512VLBWDQ-NEXT: vpaddb %xmm1, %xmm2, %xmm1
; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2		; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2
; AVX512VLBWDQ-NEXT: vpsrlw $8, %xmm2, %xmm2		; AVX512VLBWDQ-NEXT: vpsrlw $8, %xmm2, %xmm2
; AVX512VLBWDQ-NEXT: vpand %xmm2, %xmm1, %xmm2		; AVX512VLBWDQ-NEXT: vpand %xmm2, %xmm1, %xmm2
[192 lines not shown]
; SSE41-NEXT: paddw %xmm1, %xmm3		; SSE41-NEXT: paddw %xmm1, %xmm3
; SSE41-NEXT: pcmpeqw %xmm4, %xmm0		; SSE41-NEXT: pcmpeqw %xmm4, %xmm0
; SSE41-NEXT: psrld $16, %xmm0		; SSE41-NEXT: psrld $16, %xmm0
; SSE41-NEXT: pand %xmm3, %xmm0		; SSE41-NEXT: pand %xmm3, %xmm0
; SSE41-NEXT: psrld $16, %xmm3		; SSE41-NEXT: psrld $16, %xmm3
; SSE41-NEXT: paddd %xmm3, %xmm0		; SSE41-NEXT: paddd %xmm3, %xmm0
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-LABEL: testv4i32:		; AVX1OR2-LABEL: testv4i32:
; AVX: # %bb.0:		; AVX1OR2: # %bb.0:
; AVX-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]		; AVX1OR2-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
; AVX-NEXT: vpshufb %xmm0, %xmm1, %xmm2		; AVX1OR2-NEXT: vpshufb %xmm0, %xmm1, %xmm2
; AVX-NEXT: vpsrlw $4, %xmm0, %xmm3		; AVX1OR2-NEXT: vpsrlw $4, %xmm0, %xmm3
; AVX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3, %xmm3		; AVX1OR2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3, %xmm3
; AVX-NEXT: vpxor %xmm4, %xmm4, %xmm4		; AVX1OR2-NEXT: vpxor %xmm4, %xmm4, %xmm4
; AVX-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5		; AVX1OR2-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5
; AVX-NEXT: vpand %xmm5, %xmm2, %xmm2		; AVX1OR2-NEXT: vpand %xmm5, %xmm2, %xmm2
; AVX-NEXT: vpshufb %xmm3, %xmm1, %xmm1		; AVX1OR2-NEXT: vpshufb %xmm3, %xmm1, %xmm1
; AVX-NEXT: vpaddb %xmm1, %xmm2, %xmm1		; AVX1OR2-NEXT: vpaddb %xmm1, %xmm2, %xmm1
; AVX-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2		; AVX1OR2-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2
; AVX-NEXT: vpsrlw $8, %xmm2, %xmm2		; AVX1OR2-NEXT: vpsrlw $8, %xmm2, %xmm2
; AVX-NEXT: vpand %xmm2, %xmm1, %xmm2		; AVX1OR2-NEXT: vpand %xmm2, %xmm1, %xmm2
; AVX-NEXT: vpsrlw $8, %xmm1, %xmm1		; AVX1OR2-NEXT: vpsrlw $8, %xmm1, %xmm1
; AVX-NEXT: vpaddw %xmm2, %xmm1, %xmm1		; AVX1OR2-NEXT: vpaddw %xmm2, %xmm1, %xmm1
; AVX-NEXT: vpcmpeqw %xmm4, %xmm0, %xmm0		; AVX1OR2-NEXT: vpcmpeqw %xmm4, %xmm0, %xmm0
; AVX-NEXT: vpsrld $16, %xmm0, %xmm0		; AVX1OR2-NEXT: vpsrld $16, %xmm0, %xmm0
; AVX-NEXT: vpand %xmm0, %xmm1, %xmm0		; AVX1OR2-NEXT: vpand %xmm0, %xmm1, %xmm0
; AVX-NEXT: vpsrld $16, %xmm1, %xmm1		; AVX1OR2-NEXT: vpsrld $16, %xmm1, %xmm1
; AVX-NEXT: vpaddd %xmm0, %xmm1, %xmm0		; AVX1OR2-NEXT: vpaddd %xmm0, %xmm1, %xmm0
; AVX-NEXT: retq		; AVX1OR2-NEXT: retq
		;
		; AVX512VL-LABEL: testv4i32:
		; AVX512VL: # %bb.0:
		; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
		; AVX512VL-NEXT: vpshufb %xmm0, %xmm1, %xmm2
		; AVX512VL-NEXT: vpsrlw $4, %xmm0, %xmm3
		; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm3, %xmm3
		; AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4
		; AVX512VL-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5
		; AVX512VL-NEXT: vpand %xmm5, %xmm2, %xmm2
		; AVX512VL-NEXT: vpshufb %xmm3, %xmm1, %xmm1
		; AVX512VL-NEXT: vpaddb %xmm1, %xmm2, %xmm1
		; AVX512VL-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2
		; AVX512VL-NEXT: vpsrlw $8, %xmm2, %xmm2
		; AVX512VL-NEXT: vpand %xmm2, %xmm1, %xmm2
		; AVX512VL-NEXT: vpsrlw $8, %xmm1, %xmm1
		; AVX512VL-NEXT: vpaddw %xmm2, %xmm1, %xmm1
		; AVX512VL-NEXT: vpcmpeqw %xmm4, %xmm0, %xmm0
		; AVX512VL-NEXT: vpsrld $16, %xmm0, %xmm0
		; AVX512VL-NEXT: vpand %xmm0, %xmm1, %xmm0
		; AVX512VL-NEXT: vpsrld $16, %xmm1, %xmm1
		; AVX512VL-NEXT: vpaddd %xmm0, %xmm1, %xmm0
		; AVX512VL-NEXT: retq
;		;
; AVX512VLBWDQ-LABEL: testv4i32:		; AVX512VLBWDQ-LABEL: testv4i32:
; AVX512VLBWDQ: # %bb.0:		; AVX512VLBWDQ: # %bb.0:
; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]		; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm2		; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm2
; AVX512VLBWDQ-NEXT: vpsrlw $4, %xmm0, %xmm3		; AVX512VLBWDQ-NEXT: vpsrlw $4, %xmm0, %xmm3
; AVX512VLBWDQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3, %xmm3		; AVX512VLBWDQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm3, %xmm3
; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4		; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4
; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5		; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5
; AVX512VLBWDQ-NEXT: vpand %xmm5, %xmm2, %xmm2		; AVX512VLBWDQ-NEXT: vpand %xmm5, %xmm2, %xmm2
; AVX512VLBWDQ-NEXT: vpshufb %xmm3, %xmm1, %xmm1		; AVX512VLBWDQ-NEXT: vpshufb %xmm3, %xmm1, %xmm1
; AVX512VLBWDQ-NEXT: vpaddb %xmm1, %xmm2, %xmm1		; AVX512VLBWDQ-NEXT: vpaddb %xmm1, %xmm2, %xmm1
; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2		; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2
; AVX512VLBWDQ-NEXT: vpsrlw $8, %xmm2, %xmm2		; AVX512VLBWDQ-NEXT: vpsrlw $8, %xmm2, %xmm2
; AVX512VLBWDQ-NEXT: vpand %xmm2, %xmm1, %xmm2		; AVX512VLBWDQ-NEXT: vpand %xmm2, %xmm1, %xmm2
[181 lines not shown]
; SSE41-NEXT: paddw %xmm1, %xmm3		; SSE41-NEXT: paddw %xmm1, %xmm3
; SSE41-NEXT: pcmpeqw %xmm4, %xmm0		; SSE41-NEXT: pcmpeqw %xmm4, %xmm0
; SSE41-NEXT: psrld $16, %xmm0		; SSE41-NEXT: psrld $16, %xmm0
; SSE41-NEXT: pand %xmm3, %xmm0		; SSE41-NEXT: pand %xmm3, %xmm0
; SSE41-NEXT: psrld $16, %xmm3		; SSE41-NEXT: psrld $16, %xmm3
; SSE41-NEXT: paddd %xmm3, %xmm0		; SSE41-NEXT: paddd %xmm3, %xmm0
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-LABEL: testv4i32u:		; AVX1OR2-LABEL: testv4i32u:
; AVX: # %bb.0:		; AVX1OR2: # %bb.0:
; AVX-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]		; AVX1OR2-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
; AVX-NEXT: vpshufb %xmm0, %xmm1, %xmm2		; AVX1OR2-NEXT: vpshufb %xmm0, %xmm1, %xmm2
; AVX-NEXT: vpsrlw $4, %xmm0, %xmm3		; AVX1OR2-NEXT: vpsrlw $4, %xmm0, %xmm3
; AVX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3, %xmm3		; AVX1OR2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3, %xmm3
; AVX-NEXT: vpxor %xmm4, %xmm4, %xmm4		; AVX1OR2-NEXT: vpxor %xmm4, %xmm4, %xmm4
; AVX-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5		; AVX1OR2-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5
; AVX-NEXT: vpand %xmm5, %xmm2, %xmm2		; AVX1OR2-NEXT: vpand %xmm5, %xmm2, %xmm2
; AVX-NEXT: vpshufb %xmm3, %xmm1, %xmm1		; AVX1OR2-NEXT: vpshufb %xmm3, %xmm1, %xmm1
; AVX-NEXT: vpaddb %xmm1, %xmm2, %xmm1		; AVX1OR2-NEXT: vpaddb %xmm1, %xmm2, %xmm1
; AVX-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2		; AVX1OR2-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2
; AVX-NEXT: vpsrlw $8, %xmm2, %xmm2		; AVX1OR2-NEXT: vpsrlw $8, %xmm2, %xmm2
; AVX-NEXT: vpand %xmm2, %xmm1, %xmm2		; AVX1OR2-NEXT: vpand %xmm2, %xmm1, %xmm2
; AVX-NEXT: vpsrlw $8, %xmm1, %xmm1		; AVX1OR2-NEXT: vpsrlw $8, %xmm1, %xmm1
; AVX-NEXT: vpaddw %xmm2, %xmm1, %xmm1		; AVX1OR2-NEXT: vpaddw %xmm2, %xmm1, %xmm1
; AVX-NEXT: vpcmpeqw %xmm4, %xmm0, %xmm0		; AVX1OR2-NEXT: vpcmpeqw %xmm4, %xmm0, %xmm0
; AVX-NEXT: vpsrld $16, %xmm0, %xmm0		; AVX1OR2-NEXT: vpsrld $16, %xmm0, %xmm0
; AVX-NEXT: vpand %xmm0, %xmm1, %xmm0		; AVX1OR2-NEXT: vpand %xmm0, %xmm1, %xmm0
; AVX-NEXT: vpsrld $16, %xmm1, %xmm1		; AVX1OR2-NEXT: vpsrld $16, %xmm1, %xmm1
; AVX-NEXT: vpaddd %xmm0, %xmm1, %xmm0		; AVX1OR2-NEXT: vpaddd %xmm0, %xmm1, %xmm0
; AVX-NEXT: retq		; AVX1OR2-NEXT: retq
		;
		; AVX512VL-LABEL: testv4i32u:
		; AVX512VL: # %bb.0:
		; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
		; AVX512VL-NEXT: vpshufb %xmm0, %xmm1, %xmm2
		; AVX512VL-NEXT: vpsrlw $4, %xmm0, %xmm3
		; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm3, %xmm3
		; AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4
		; AVX512VL-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5
		; AVX512VL-NEXT: vpand %xmm5, %xmm2, %xmm2
		; AVX512VL-NEXT: vpshufb %xmm3, %xmm1, %xmm1
		; AVX512VL-NEXT: vpaddb %xmm1, %xmm2, %xmm1
		; AVX512VL-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2
		; AVX512VL-NEXT: vpsrlw $8, %xmm2, %xmm2
		; AVX512VL-NEXT: vpand %xmm2, %xmm1, %xmm2
		; AVX512VL-NEXT: vpsrlw $8, %xmm1, %xmm1
		; AVX512VL-NEXT: vpaddw %xmm2, %xmm1, %xmm1
		; AVX512VL-NEXT: vpcmpeqw %xmm4, %xmm0, %xmm0
		; AVX512VL-NEXT: vpsrld $16, %xmm0, %xmm0
		; AVX512VL-NEXT: vpand %xmm0, %xmm1, %xmm0
		; AVX512VL-NEXT: vpsrld $16, %xmm1, %xmm1
		; AVX512VL-NEXT: vpaddd %xmm0, %xmm1, %xmm0
		; AVX512VL-NEXT: retq
;		;
; AVX512VLBWDQ-LABEL: testv4i32u:		; AVX512VLBWDQ-LABEL: testv4i32u:
; AVX512VLBWDQ: # %bb.0:		; AVX512VLBWDQ: # %bb.0:
; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]		; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm2		; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm2
; AVX512VLBWDQ-NEXT: vpsrlw $4, %xmm0, %xmm3		; AVX512VLBWDQ-NEXT: vpsrlw $4, %xmm0, %xmm3
; AVX512VLBWDQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3, %xmm3		; AVX512VLBWDQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm3, %xmm3
; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4		; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4
; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5		; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5
; AVX512VLBWDQ-NEXT: vpand %xmm5, %xmm2, %xmm2		; AVX512VLBWDQ-NEXT: vpand %xmm5, %xmm2, %xmm2
; AVX512VLBWDQ-NEXT: vpshufb %xmm3, %xmm1, %xmm1		; AVX512VLBWDQ-NEXT: vpshufb %xmm3, %xmm1, %xmm1
; AVX512VLBWDQ-NEXT: vpaddb %xmm1, %xmm2, %xmm1		; AVX512VLBWDQ-NEXT: vpaddb %xmm1, %xmm2, %xmm1
; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2		; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2
; AVX512VLBWDQ-NEXT: vpsrlw $8, %xmm2, %xmm2		; AVX512VLBWDQ-NEXT: vpsrlw $8, %xmm2, %xmm2
; AVX512VLBWDQ-NEXT: vpand %xmm2, %xmm1, %xmm2		; AVX512VLBWDQ-NEXT: vpand %xmm2, %xmm1, %xmm2
[157 lines not shown]
; SSE41-NEXT: paddb %xmm1, %xmm3		; SSE41-NEXT: paddb %xmm1, %xmm3
; SSE41-NEXT: pcmpeqb %xmm4, %xmm0		; SSE41-NEXT: pcmpeqb %xmm4, %xmm0
; SSE41-NEXT: psrlw $8, %xmm0		; SSE41-NEXT: psrlw $8, %xmm0
; SSE41-NEXT: pand %xmm3, %xmm0		; SSE41-NEXT: pand %xmm3, %xmm0
; SSE41-NEXT: psrlw $8, %xmm3		; SSE41-NEXT: psrlw $8, %xmm3
; SSE41-NEXT: paddw %xmm3, %xmm0		; SSE41-NEXT: paddw %xmm3, %xmm0
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-LABEL: testv8i16:		; AVX1OR2-LABEL: testv8i16:
; AVX: # %bb.0:		; AVX1OR2: # %bb.0:
; AVX-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]		; AVX1OR2-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
; AVX-NEXT: vpshufb %xmm0, %xmm1, %xmm2		; AVX1OR2-NEXT: vpshufb %xmm0, %xmm1, %xmm2
; AVX-NEXT: vpsrlw $4, %xmm0, %xmm3		; AVX1OR2-NEXT: vpsrlw $4, %xmm0, %xmm3
; AVX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3, %xmm3		; AVX1OR2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3, %xmm3
; AVX-NEXT: vpxor %xmm4, %xmm4, %xmm4		; AVX1OR2-NEXT: vpxor %xmm4, %xmm4, %xmm4
; AVX-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5		; AVX1OR2-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5
; AVX-NEXT: vpand %xmm5, %xmm2, %xmm2		; AVX1OR2-NEXT: vpand %xmm5, %xmm2, %xmm2
; AVX-NEXT: vpshufb %xmm3, %xmm1, %xmm1		; AVX1OR2-NEXT: vpshufb %xmm3, %xmm1, %xmm1
; AVX-NEXT: vpaddb %xmm1, %xmm2, %xmm1		; AVX1OR2-NEXT: vpaddb %xmm1, %xmm2, %xmm1
; AVX-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm0		; AVX1OR2-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm0
; AVX-NEXT: vpsrlw $8, %xmm0, %xmm0		; AVX1OR2-NEXT: vpsrlw $8, %xmm0, %xmm0
; AVX-NEXT: vpand %xmm0, %xmm1, %xmm0		; AVX1OR2-NEXT: vpand %xmm0, %xmm1, %xmm0
; AVX-NEXT: vpsrlw $8, %xmm1, %xmm1		; AVX1OR2-NEXT: vpsrlw $8, %xmm1, %xmm1
; AVX-NEXT: vpaddw %xmm0, %xmm1, %xmm0		; AVX1OR2-NEXT: vpaddw %xmm0, %xmm1, %xmm0
; AVX-NEXT: retq		; AVX1OR2-NEXT: retq
		;
		; AVX512VL-LABEL: testv8i16:
		; AVX512VL: # %bb.0:
		; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
		; AVX512VL-NEXT: vpshufb %xmm0, %xmm1, %xmm2
		; AVX512VL-NEXT: vpsrlw $4, %xmm0, %xmm3
		; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm3, %xmm3
		; AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4
		; AVX512VL-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5
		; AVX512VL-NEXT: vpand %xmm5, %xmm2, %xmm2
		; AVX512VL-NEXT: vpshufb %xmm3, %xmm1, %xmm1
		; AVX512VL-NEXT: vpaddb %xmm1, %xmm2, %xmm1
		; AVX512VL-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm0
		; AVX512VL-NEXT: vpsrlw $8, %xmm0, %xmm0
		; AVX512VL-NEXT: vpand %xmm0, %xmm1, %xmm0
		; AVX512VL-NEXT: vpsrlw $8, %xmm1, %xmm1
		; AVX512VL-NEXT: vpaddw %xmm0, %xmm1, %xmm0
		; AVX512VL-NEXT: retq
;		;
; AVX512VLBWDQ-LABEL: testv8i16:		; AVX512VLBWDQ-LABEL: testv8i16:
; AVX512VLBWDQ: # %bb.0:		; AVX512VLBWDQ: # %bb.0:
; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]		; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm2		; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm2
; AVX512VLBWDQ-NEXT: vpsrlw $4, %xmm0, %xmm3		; AVX512VLBWDQ-NEXT: vpsrlw $4, %xmm0, %xmm3
; AVX512VLBWDQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3, %xmm3		; AVX512VLBWDQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm3, %xmm3
; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4		; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4
; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5		; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5
; AVX512VLBWDQ-NEXT: vpand %xmm5, %xmm2, %xmm2		; AVX512VLBWDQ-NEXT: vpand %xmm5, %xmm2, %xmm2
; AVX512VLBWDQ-NEXT: vpshufb %xmm3, %xmm1, %xmm1		; AVX512VLBWDQ-NEXT: vpshufb %xmm3, %xmm1, %xmm1
; AVX512VLBWDQ-NEXT: vpaddb %xmm1, %xmm2, %xmm1		; AVX512VLBWDQ-NEXT: vpaddb %xmm1, %xmm2, %xmm1
; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm0		; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm0
; AVX512VLBWDQ-NEXT: vpsrlw $8, %xmm0, %xmm0		; AVX512VLBWDQ-NEXT: vpsrlw $8, %xmm0, %xmm0
; AVX512VLBWDQ-NEXT: vpand %xmm0, %xmm1, %xmm0		; AVX512VLBWDQ-NEXT: vpand %xmm0, %xmm1, %xmm0
[150 lines not shown]
; SSE41-NEXT: paddb %xmm1, %xmm3		; SSE41-NEXT: paddb %xmm1, %xmm3
; SSE41-NEXT: pcmpeqb %xmm4, %xmm0		; SSE41-NEXT: pcmpeqb %xmm4, %xmm0
; SSE41-NEXT: psrlw $8, %xmm0		; SSE41-NEXT: psrlw $8, %xmm0
; SSE41-NEXT: pand %xmm3, %xmm0		; SSE41-NEXT: pand %xmm3, %xmm0
; SSE41-NEXT: psrlw $8, %xmm3		; SSE41-NEXT: psrlw $8, %xmm3
; SSE41-NEXT: paddw %xmm3, %xmm0		; SSE41-NEXT: paddw %xmm3, %xmm0
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-LABEL: testv8i16u:		; AVX1OR2-LABEL: testv8i16u:
; AVX: # %bb.0:		; AVX1OR2: # %bb.0:
; AVX-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]		; AVX1OR2-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
; AVX-NEXT: vpshufb %xmm0, %xmm1, %xmm2		; AVX1OR2-NEXT: vpshufb %xmm0, %xmm1, %xmm2
; AVX-NEXT: vpsrlw $4, %xmm0, %xmm3		; AVX1OR2-NEXT: vpsrlw $4, %xmm0, %xmm3
; AVX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3, %xmm3		; AVX1OR2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3, %xmm3
; AVX-NEXT: vpxor %xmm4, %xmm4, %xmm4		; AVX1OR2-NEXT: vpxor %xmm4, %xmm4, %xmm4
; AVX-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5		; AVX1OR2-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5
; AVX-NEXT: vpand %xmm5, %xmm2, %xmm2		; AVX1OR2-NEXT: vpand %xmm5, %xmm2, %xmm2
; AVX-NEXT: vpshufb %xmm3, %xmm1, %xmm1		; AVX1OR2-NEXT: vpshufb %xmm3, %xmm1, %xmm1
; AVX-NEXT: vpaddb %xmm1, %xmm2, %xmm1		; AVX1OR2-NEXT: vpaddb %xmm1, %xmm2, %xmm1
; AVX-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm0		; AVX1OR2-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm0
; AVX-NEXT: vpsrlw $8, %xmm0, %xmm0		; AVX1OR2-NEXT: vpsrlw $8, %xmm0, %xmm0
; AVX-NEXT: vpand %xmm0, %xmm1, %xmm0		; AVX1OR2-NEXT: vpand %xmm0, %xmm1, %xmm0
; AVX-NEXT: vpsrlw $8, %xmm1, %xmm1		; AVX1OR2-NEXT: vpsrlw $8, %xmm1, %xmm1
; AVX-NEXT: vpaddw %xmm0, %xmm1, %xmm0		; AVX1OR2-NEXT: vpaddw %xmm0, %xmm1, %xmm0
; AVX-NEXT: retq		; AVX1OR2-NEXT: retq
		;
		; AVX512VL-LABEL: testv8i16u:
		; AVX512VL: # %bb.0:
		; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
		; AVX512VL-NEXT: vpshufb %xmm0, %xmm1, %xmm2
		; AVX512VL-NEXT: vpsrlw $4, %xmm0, %xmm3
		; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm3, %xmm3
		; AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4
		; AVX512VL-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5
		; AVX512VL-NEXT: vpand %xmm5, %xmm2, %xmm2
		; AVX512VL-NEXT: vpshufb %xmm3, %xmm1, %xmm1
		; AVX512VL-NEXT: vpaddb %xmm1, %xmm2, %xmm1
		; AVX512VL-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm0
		; AVX512VL-NEXT: vpsrlw $8, %xmm0, %xmm0
		; AVX512VL-NEXT: vpand %xmm0, %xmm1, %xmm0
		; AVX512VL-NEXT: vpsrlw $8, %xmm1, %xmm1
		; AVX512VL-NEXT: vpaddw %xmm0, %xmm1, %xmm0
		; AVX512VL-NEXT: retq
;		;
; AVX512VLBWDQ-LABEL: testv8i16u:		; AVX512VLBWDQ-LABEL: testv8i16u:
; AVX512VLBWDQ: # %bb.0:		; AVX512VLBWDQ: # %bb.0:
; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]		; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm2		; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm2
; AVX512VLBWDQ-NEXT: vpsrlw $4, %xmm0, %xmm3		; AVX512VLBWDQ-NEXT: vpsrlw $4, %xmm0, %xmm3
; AVX512VLBWDQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3, %xmm3		; AVX512VLBWDQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm3, %xmm3
; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4		; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4
; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5		; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm3, %xmm5
; AVX512VLBWDQ-NEXT: vpand %xmm5, %xmm2, %xmm2		; AVX512VLBWDQ-NEXT: vpand %xmm5, %xmm2, %xmm2
; AVX512VLBWDQ-NEXT: vpshufb %xmm3, %xmm1, %xmm1		; AVX512VLBWDQ-NEXT: vpshufb %xmm3, %xmm1, %xmm1
; AVX512VLBWDQ-NEXT: vpaddb %xmm1, %xmm2, %xmm1		; AVX512VLBWDQ-NEXT: vpaddb %xmm1, %xmm2, %xmm1
; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm0		; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm0
; AVX512VLBWDQ-NEXT: vpsrlw $8, %xmm0, %xmm0		; AVX512VLBWDQ-NEXT: vpsrlw $8, %xmm0, %xmm0
; AVX512VLBWDQ-NEXT: vpand %xmm0, %xmm1, %xmm0		; AVX512VLBWDQ-NEXT: vpand %xmm0, %xmm1, %xmm0
[134 lines not shown]
; SSE41-NEXT: pxor %xmm3, %xmm3		; SSE41-NEXT: pxor %xmm3, %xmm3
; SSE41-NEXT: pcmpeqb %xmm0, %xmm3		; SSE41-NEXT: pcmpeqb %xmm0, %xmm3
; SSE41-NEXT: pand %xmm2, %xmm3		; SSE41-NEXT: pand %xmm2, %xmm3
; SSE41-NEXT: pshufb %xmm0, %xmm1		; SSE41-NEXT: pshufb %xmm0, %xmm1
; SSE41-NEXT: paddb %xmm3, %xmm1		; SSE41-NEXT: paddb %xmm3, %xmm1
; SSE41-NEXT: movdqa %xmm1, %xmm0		; SSE41-NEXT: movdqa %xmm1, %xmm0
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-LABEL: testv16i8:		; AVX1OR2-LABEL: testv16i8:
; AVX: # %bb.0:		; AVX1OR2: # %bb.0:
; AVX-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]		; AVX1OR2-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
; AVX-NEXT: vpshufb %xmm0, %xmm1, %xmm2		; AVX1OR2-NEXT: vpshufb %xmm0, %xmm1, %xmm2
; AVX-NEXT: vpsrlw $4, %xmm0, %xmm0		; AVX1OR2-NEXT: vpsrlw $4, %xmm0, %xmm0
; AVX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; AVX1OR2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; AVX-NEXT: vpxor %xmm3, %xmm3, %xmm3		; AVX1OR2-NEXT: vpxor %xmm3, %xmm3, %xmm3
; AVX-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm3		; AVX1OR2-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm3
; AVX-NEXT: vpand %xmm3, %xmm2, %xmm2		; AVX1OR2-NEXT: vpand %xmm3, %xmm2, %xmm2
; AVX-NEXT: vpshufb %xmm0, %xmm1, %xmm0		; AVX1OR2-NEXT: vpshufb %xmm0, %xmm1, %xmm0
; AVX-NEXT: vpaddb %xmm0, %xmm2, %xmm0		; AVX1OR2-NEXT: vpaddb %xmm0, %xmm2, %xmm0
; AVX-NEXT: retq		; AVX1OR2-NEXT: retq
		;
		; AVX512VL-LABEL: testv16i8:
		; AVX512VL: # %bb.0:
		; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
		; AVX512VL-NEXT: vpshufb %xmm0, %xmm1, %xmm2
		; AVX512VL-NEXT: vpsrlw $4, %xmm0, %xmm0
		; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
		; AVX512VL-NEXT: vpxor %xmm3, %xmm3, %xmm3
		; AVX512VL-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm3
		; AVX512VL-NEXT: vpand %xmm3, %xmm2, %xmm2
		; AVX512VL-NEXT: vpshufb %xmm0, %xmm1, %xmm0
		; AVX512VL-NEXT: vpaddb %xmm0, %xmm2, %xmm0
		; AVX512VL-NEXT: retq
;		;
; AVX512VLBWDQ-LABEL: testv16i8:		; AVX512VLBWDQ-LABEL: testv16i8:
; AVX512VLBWDQ: # %bb.0:		; AVX512VLBWDQ: # %bb.0:
; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]		; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm2		; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm2
; AVX512VLBWDQ-NEXT: vpsrlw $4, %xmm0, %xmm0		; AVX512VLBWDQ-NEXT: vpsrlw $4, %xmm0, %xmm0
; AVX512VLBWDQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; AVX512VLBWDQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
; AVX512VLBWDQ-NEXT: vpxor %xmm3, %xmm3, %xmm3		; AVX512VLBWDQ-NEXT: vpxor %xmm3, %xmm3, %xmm3
; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm3		; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm3
; AVX512VLBWDQ-NEXT: vpand %xmm3, %xmm2, %xmm2		; AVX512VLBWDQ-NEXT: vpand %xmm3, %xmm2, %xmm2
; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm0		; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm0
; AVX512VLBWDQ-NEXT: vpaddb %xmm0, %xmm2, %xmm0		; AVX512VLBWDQ-NEXT: vpaddb %xmm0, %xmm2, %xmm0
; AVX512VLBWDQ-NEXT: retq		; AVX512VLBWDQ-NEXT: retq
;		;
; AVX512-LABEL: testv16i8:		; AVX512-LABEL: testv16i8:
[115 lines not shown]
; SSE41-NEXT: pxor %xmm3, %xmm3		; SSE41-NEXT: pxor %xmm3, %xmm3
; SSE41-NEXT: pcmpeqb %xmm0, %xmm3		; SSE41-NEXT: pcmpeqb %xmm0, %xmm3
; SSE41-NEXT: pand %xmm2, %xmm3		; SSE41-NEXT: pand %xmm2, %xmm3
; SSE41-NEXT: pshufb %xmm0, %xmm1		; SSE41-NEXT: pshufb %xmm0, %xmm1
; SSE41-NEXT: paddb %xmm3, %xmm1		; SSE41-NEXT: paddb %xmm3, %xmm1
; SSE41-NEXT: movdqa %xmm1, %xmm0		; SSE41-NEXT: movdqa %xmm1, %xmm0
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-LABEL: testv16i8u:		; AVX1OR2-LABEL: testv16i8u:
; AVX: # %bb.0:		; AVX1OR2: # %bb.0:
; AVX-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]		; AVX1OR2-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
; AVX-NEXT: vpshufb %xmm0, %xmm1, %xmm2		; AVX1OR2-NEXT: vpshufb %xmm0, %xmm1, %xmm2
; AVX-NEXT: vpsrlw $4, %xmm0, %xmm0		; AVX1OR2-NEXT: vpsrlw $4, %xmm0, %xmm0
; AVX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; AVX1OR2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; AVX-NEXT: vpxor %xmm3, %xmm3, %xmm3		; AVX1OR2-NEXT: vpxor %xmm3, %xmm3, %xmm3
; AVX-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm3		; AVX1OR2-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm3
; AVX-NEXT: vpand %xmm3, %xmm2, %xmm2		; AVX1OR2-NEXT: vpand %xmm3, %xmm2, %xmm2
; AVX-NEXT: vpshufb %xmm0, %xmm1, %xmm0		; AVX1OR2-NEXT: vpshufb %xmm0, %xmm1, %xmm0
; AVX-NEXT: vpaddb %xmm0, %xmm2, %xmm0		; AVX1OR2-NEXT: vpaddb %xmm0, %xmm2, %xmm0
; AVX-NEXT: retq		; AVX1OR2-NEXT: retq
		;
		; AVX512VL-LABEL: testv16i8u:
		; AVX512VL: # %bb.0:
		; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
		; AVX512VL-NEXT: vpshufb %xmm0, %xmm1, %xmm2
		; AVX512VL-NEXT: vpsrlw $4, %xmm0, %xmm0
		; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
		; AVX512VL-NEXT: vpxor %xmm3, %xmm3, %xmm3
		; AVX512VL-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm3
		; AVX512VL-NEXT: vpand %xmm3, %xmm2, %xmm2
		; AVX512VL-NEXT: vpshufb %xmm0, %xmm1, %xmm0
		; AVX512VL-NEXT: vpaddb %xmm0, %xmm2, %xmm0
		; AVX512VL-NEXT: retq
;		;
; AVX512VLBWDQ-LABEL: testv16i8u:		; AVX512VLBWDQ-LABEL: testv16i8u:
; AVX512VLBWDQ: # %bb.0:		; AVX512VLBWDQ: # %bb.0:
; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]		; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} xmm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm2		; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm2
; AVX512VLBWDQ-NEXT: vpsrlw $4, %xmm0, %xmm0		; AVX512VLBWDQ-NEXT: vpsrlw $4, %xmm0, %xmm0
; AVX512VLBWDQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; AVX512VLBWDQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
; AVX512VLBWDQ-NEXT: vpxor %xmm3, %xmm3, %xmm3		; AVX512VLBWDQ-NEXT: vpxor %xmm3, %xmm3, %xmm3
; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm3		; AVX512VLBWDQ-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm3
; AVX512VLBWDQ-NEXT: vpand %xmm3, %xmm2, %xmm2		; AVX512VLBWDQ-NEXT: vpand %xmm3, %xmm2, %xmm2
; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm0		; AVX512VLBWDQ-NEXT: vpshufb %xmm0, %xmm1, %xmm0
; AVX512VLBWDQ-NEXT: vpaddb %xmm0, %xmm2, %xmm0		; AVX512VLBWDQ-NEXT: vpaddb %xmm0, %xmm2, %xmm0
; AVX512VLBWDQ-NEXT: retq		; AVX512VLBWDQ-NEXT: retq
;		;
; AVX512-LABEL: testv16i8u:		; AVX512-LABEL: testv16i8u:
[214 lines not shown]	; X86-SSE-NEXT: retl
%out = call <16 x i8> @llvm.ctlz.v16i8(<16 x i8> <i8 256, i8 -1, i8 0, i8 255, i8 -65536, i8 7, i8 24, i8 88, i8 -2, i8 254, i8 1, i8 2, i8 4, i8 8, i8 16, i8 32>, i1 -1)		%out = call <16 x i8> @llvm.ctlz.v16i8(<16 x i8> <i8 256, i8 -1, i8 0, i8 255, i8 -65536, i8 7, i8 24, i8 88, i8 -2, i8 254, i8 1, i8 2, i8 4, i8 8, i8 16, i8 32>, i1 -1)
ret <16 x i8> %out		ret <16 x i8> %out
}		}

declare <2 x i64> @llvm.ctlz.v2i64(<2 x i64>, i1)		declare <2 x i64> @llvm.ctlz.v2i64(<2 x i64>, i1)
declare <4 x i32> @llvm.ctlz.v4i32(<4 x i32>, i1)		declare <4 x i32> @llvm.ctlz.v4i32(<4 x i32>, i1)
declare <8 x i16> @llvm.ctlz.v8i16(<8 x i16>, i1)		declare <8 x i16> @llvm.ctlz.v8i16(<8 x i16>, i1)
declare <16 x i8> @llvm.ctlz.v16i8(<16 x i8>, i1)		declare <16 x i8> @llvm.ctlz.v16i8(<16 x i8>, i1)
		;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
		; AVX: {{.*}}

llvm/test/CodeGen/X86/vector-lzcnt-256.ll

	[90 lines not shown]
	; AVX2-NEXT: vpaddq %ymm0, %ymm1, %ymm0			; AVX2-NEXT: vpaddq %ymm0, %ymm1, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512VL-LABEL: testv4i64:			; AVX512VL-LABEL: testv4i64:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm2			; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm2
	; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm3			; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm3
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm3, %ymm3			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm3, %ymm3
	; AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5			; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5
	; AVX512VL-NEXT: vpand %ymm5, %ymm2, %ymm2			; AVX512VL-NEXT: vpand %ymm5, %ymm2, %ymm2
	; AVX512VL-NEXT: vpshufb %ymm3, %ymm1, %ymm1			; AVX512VL-NEXT: vpshufb %ymm3, %ymm1, %ymm1
	; AVX512VL-NEXT: vpaddb %ymm1, %ymm2, %ymm1			; AVX512VL-NEXT: vpaddb %ymm1, %ymm2, %ymm1
	; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm2			; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm2
	; AVX512VL-NEXT: vpsrlw $8, %ymm2, %ymm2			; AVX512VL-NEXT: vpsrlw $8, %ymm2, %ymm2
	; AVX512VL-NEXT: vpand %ymm2, %ymm1, %ymm2			; AVX512VL-NEXT: vpand %ymm2, %ymm1, %ymm2
	[11 lines not shown]
	; AVX512VL-NEXT: vpaddq %ymm0, %ymm1, %ymm0			; AVX512VL-NEXT: vpaddq %ymm0, %ymm1, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512VLBWDQ-LABEL: testv4i64:			; AVX512VLBWDQ-LABEL: testv4i64:
	; AVX512VLBWDQ: # %bb.0:			; AVX512VLBWDQ: # %bb.0:
	; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm2			; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm2
	; AVX512VLBWDQ-NEXT: vpsrlw $4, %ymm0, %ymm3			; AVX512VLBWDQ-NEXT: vpsrlw $4, %ymm0, %ymm3
	; AVX512VLBWDQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm3, %ymm3			; AVX512VLBWDQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm3, %ymm3
	; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5			; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5
	; AVX512VLBWDQ-NEXT: vpand %ymm5, %ymm2, %ymm2			; AVX512VLBWDQ-NEXT: vpand %ymm5, %ymm2, %ymm2
	; AVX512VLBWDQ-NEXT: vpshufb %ymm3, %ymm1, %ymm1			; AVX512VLBWDQ-NEXT: vpshufb %ymm3, %ymm1, %ymm1
	; AVX512VLBWDQ-NEXT: vpaddb %ymm1, %ymm2, %ymm1			; AVX512VLBWDQ-NEXT: vpaddb %ymm1, %ymm2, %ymm1
	; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm2			; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm2
	; AVX512VLBWDQ-NEXT: vpsrlw $8, %ymm2, %ymm2			; AVX512VLBWDQ-NEXT: vpsrlw $8, %ymm2, %ymm2
	; AVX512VLBWDQ-NEXT: vpand %ymm2, %ymm1, %ymm2			; AVX512VLBWDQ-NEXT: vpand %ymm2, %ymm1, %ymm2
	[137 lines not shown]
	; AVX2-NEXT: vpaddq %ymm0, %ymm1, %ymm0			; AVX2-NEXT: vpaddq %ymm0, %ymm1, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512VL-LABEL: testv4i64u:			; AVX512VL-LABEL: testv4i64u:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm2			; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm2
	; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm3			; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm3
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm3, %ymm3			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm3, %ymm3
	; AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5			; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5
	; AVX512VL-NEXT: vpand %ymm5, %ymm2, %ymm2			; AVX512VL-NEXT: vpand %ymm5, %ymm2, %ymm2
	; AVX512VL-NEXT: vpshufb %ymm3, %ymm1, %ymm1			; AVX512VL-NEXT: vpshufb %ymm3, %ymm1, %ymm1
	; AVX512VL-NEXT: vpaddb %ymm1, %ymm2, %ymm1			; AVX512VL-NEXT: vpaddb %ymm1, %ymm2, %ymm1
	; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm2			; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm2
	; AVX512VL-NEXT: vpsrlw $8, %ymm2, %ymm2			; AVX512VL-NEXT: vpsrlw $8, %ymm2, %ymm2
	; AVX512VL-NEXT: vpand %ymm2, %ymm1, %ymm2			; AVX512VL-NEXT: vpand %ymm2, %ymm1, %ymm2
	[11 lines not shown]
	; AVX512VL-NEXT: vpaddq %ymm0, %ymm1, %ymm0			; AVX512VL-NEXT: vpaddq %ymm0, %ymm1, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512VLBWDQ-LABEL: testv4i64u:			; AVX512VLBWDQ-LABEL: testv4i64u:
	; AVX512VLBWDQ: # %bb.0:			; AVX512VLBWDQ: # %bb.0:
	; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm2			; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm2
	; AVX512VLBWDQ-NEXT: vpsrlw $4, %ymm0, %ymm3			; AVX512VLBWDQ-NEXT: vpsrlw $4, %ymm0, %ymm3
	; AVX512VLBWDQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm3, %ymm3			; AVX512VLBWDQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm3, %ymm3
	; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5			; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5
	; AVX512VLBWDQ-NEXT: vpand %ymm5, %ymm2, %ymm2			; AVX512VLBWDQ-NEXT: vpand %ymm5, %ymm2, %ymm2
	; AVX512VLBWDQ-NEXT: vpshufb %ymm3, %ymm1, %ymm1			; AVX512VLBWDQ-NEXT: vpshufb %ymm3, %ymm1, %ymm1
	; AVX512VLBWDQ-NEXT: vpaddb %ymm1, %ymm2, %ymm1			; AVX512VLBWDQ-NEXT: vpaddb %ymm1, %ymm2, %ymm1
	; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm2			; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm2
	; AVX512VLBWDQ-NEXT: vpsrlw $8, %ymm2, %ymm2			; AVX512VLBWDQ-NEXT: vpsrlw $8, %ymm2, %ymm2
	; AVX512VLBWDQ-NEXT: vpand %ymm2, %ymm1, %ymm2			; AVX512VLBWDQ-NEXT: vpand %ymm2, %ymm1, %ymm2
	[122 lines hidden]
	; AVX2-NEXT: vpaddd %ymm0, %ymm1, %ymm0			; AVX2-NEXT: vpaddd %ymm0, %ymm1, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512VL-LABEL: testv8i32:			; AVX512VL-LABEL: testv8i32:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm2			; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm2
	; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm3			; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm3
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm3, %ymm3			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm3, %ymm3
	; AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5			; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5
	; AVX512VL-NEXT: vpand %ymm5, %ymm2, %ymm2			; AVX512VL-NEXT: vpand %ymm5, %ymm2, %ymm2
	; AVX512VL-NEXT: vpshufb %ymm3, %ymm1, %ymm1			; AVX512VL-NEXT: vpshufb %ymm3, %ymm1, %ymm1
	; AVX512VL-NEXT: vpaddb %ymm1, %ymm2, %ymm1			; AVX512VL-NEXT: vpaddb %ymm1, %ymm2, %ymm1
	; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm2			; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm2
	; AVX512VL-NEXT: vpsrlw $8, %ymm2, %ymm2			; AVX512VL-NEXT: vpsrlw $8, %ymm2, %ymm2
	; AVX512VL-NEXT: vpand %ymm2, %ymm1, %ymm2			; AVX512VL-NEXT: vpand %ymm2, %ymm1, %ymm2
	; AVX512VL-NEXT: vpsrlw $8, %ymm1, %ymm1			; AVX512VL-NEXT: vpsrlw $8, %ymm1, %ymm1
	; AVX512VL-NEXT: vpaddw %ymm2, %ymm1, %ymm1			; AVX512VL-NEXT: vpaddw %ymm2, %ymm1, %ymm1
	; AVX512VL-NEXT: vpcmpeqw %ymm4, %ymm0, %ymm0			; AVX512VL-NEXT: vpcmpeqw %ymm4, %ymm0, %ymm0
	; AVX512VL-NEXT: vpsrld $16, %ymm0, %ymm0			; AVX512VL-NEXT: vpsrld $16, %ymm0, %ymm0
	; AVX512VL-NEXT: vpand %ymm0, %ymm1, %ymm0			; AVX512VL-NEXT: vpand %ymm0, %ymm1, %ymm0
	; AVX512VL-NEXT: vpsrld $16, %ymm1, %ymm1			; AVX512VL-NEXT: vpsrld $16, %ymm1, %ymm1
	; AVX512VL-NEXT: vpaddd %ymm0, %ymm1, %ymm0			; AVX512VL-NEXT: vpaddd %ymm0, %ymm1, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512VLBWDQ-LABEL: testv8i32:			; AVX512VLBWDQ-LABEL: testv8i32:
	; AVX512VLBWDQ: # %bb.0:			; AVX512VLBWDQ: # %bb.0:
	; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm2			; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm2
	; AVX512VLBWDQ-NEXT: vpsrlw $4, %ymm0, %ymm3			; AVX512VLBWDQ-NEXT: vpsrlw $4, %ymm0, %ymm3
	; AVX512VLBWDQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm3, %ymm3			; AVX512VLBWDQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm3, %ymm3
	; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5			; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5
	; AVX512VLBWDQ-NEXT: vpand %ymm5, %ymm2, %ymm2			; AVX512VLBWDQ-NEXT: vpand %ymm5, %ymm2, %ymm2
	; AVX512VLBWDQ-NEXT: vpshufb %ymm3, %ymm1, %ymm1			; AVX512VLBWDQ-NEXT: vpshufb %ymm3, %ymm1, %ymm1
	; AVX512VLBWDQ-NEXT: vpaddb %ymm1, %ymm2, %ymm1			; AVX512VLBWDQ-NEXT: vpaddb %ymm1, %ymm2, %ymm1
	; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm2			; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm2
	; AVX512VLBWDQ-NEXT: vpsrlw $8, %ymm2, %ymm2			; AVX512VLBWDQ-NEXT: vpsrlw $8, %ymm2, %ymm2
	; AVX512VLBWDQ-NEXT: vpand %ymm2, %ymm1, %ymm2			; AVX512VLBWDQ-NEXT: vpand %ymm2, %ymm1, %ymm2
	[112 lines hidden]
	; AVX2-NEXT: vpaddd %ymm0, %ymm1, %ymm0			; AVX2-NEXT: vpaddd %ymm0, %ymm1, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512VL-LABEL: testv8i32u:			; AVX512VL-LABEL: testv8i32u:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm2			; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm2
	; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm3			; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm3
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm3, %ymm3			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm3, %ymm3
	; AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5			; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5
	; AVX512VL-NEXT: vpand %ymm5, %ymm2, %ymm2			; AVX512VL-NEXT: vpand %ymm5, %ymm2, %ymm2
	; AVX512VL-NEXT: vpshufb %ymm3, %ymm1, %ymm1			; AVX512VL-NEXT: vpshufb %ymm3, %ymm1, %ymm1
	; AVX512VL-NEXT: vpaddb %ymm1, %ymm2, %ymm1			; AVX512VL-NEXT: vpaddb %ymm1, %ymm2, %ymm1
	; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm2			; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm2
	; AVX512VL-NEXT: vpsrlw $8, %ymm2, %ymm2			; AVX512VL-NEXT: vpsrlw $8, %ymm2, %ymm2
	; AVX512VL-NEXT: vpand %ymm2, %ymm1, %ymm2			; AVX512VL-NEXT: vpand %ymm2, %ymm1, %ymm2
	; AVX512VL-NEXT: vpsrlw $8, %ymm1, %ymm1			; AVX512VL-NEXT: vpsrlw $8, %ymm1, %ymm1
	; AVX512VL-NEXT: vpaddw %ymm2, %ymm1, %ymm1			; AVX512VL-NEXT: vpaddw %ymm2, %ymm1, %ymm1
	; AVX512VL-NEXT: vpcmpeqw %ymm4, %ymm0, %ymm0			; AVX512VL-NEXT: vpcmpeqw %ymm4, %ymm0, %ymm0
	; AVX512VL-NEXT: vpsrld $16, %ymm0, %ymm0			; AVX512VL-NEXT: vpsrld $16, %ymm0, %ymm0
	; AVX512VL-NEXT: vpand %ymm0, %ymm1, %ymm0			; AVX512VL-NEXT: vpand %ymm0, %ymm1, %ymm0
	; AVX512VL-NEXT: vpsrld $16, %ymm1, %ymm1			; AVX512VL-NEXT: vpsrld $16, %ymm1, %ymm1
	; AVX512VL-NEXT: vpaddd %ymm0, %ymm1, %ymm0			; AVX512VL-NEXT: vpaddd %ymm0, %ymm1, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512VLBWDQ-LABEL: testv8i32u:			; AVX512VLBWDQ-LABEL: testv8i32u:
	; AVX512VLBWDQ: # %bb.0:			; AVX512VLBWDQ: # %bb.0:
	; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm2			; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm2
	; AVX512VLBWDQ-NEXT: vpsrlw $4, %ymm0, %ymm3			; AVX512VLBWDQ-NEXT: vpsrlw $4, %ymm0, %ymm3
	; AVX512VLBWDQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm3, %ymm3			; AVX512VLBWDQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm3, %ymm3
	; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5			; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5
	; AVX512VLBWDQ-NEXT: vpand %ymm5, %ymm2, %ymm2			; AVX512VLBWDQ-NEXT: vpand %ymm5, %ymm2, %ymm2
	; AVX512VLBWDQ-NEXT: vpshufb %ymm3, %ymm1, %ymm1			; AVX512VLBWDQ-NEXT: vpshufb %ymm3, %ymm1, %ymm1
	; AVX512VLBWDQ-NEXT: vpaddb %ymm1, %ymm2, %ymm1			; AVX512VLBWDQ-NEXT: vpaddb %ymm1, %ymm2, %ymm1
	; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm2			; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm2
	; AVX512VLBWDQ-NEXT: vpsrlw $8, %ymm2, %ymm2			; AVX512VLBWDQ-NEXT: vpsrlw $8, %ymm2, %ymm2
	; AVX512VLBWDQ-NEXT: vpand %ymm2, %ymm1, %ymm2			; AVX512VLBWDQ-NEXT: vpand %ymm2, %ymm1, %ymm2
	[97 lines hidden]
	; AVX2-NEXT: vpaddw %ymm0, %ymm1, %ymm0			; AVX2-NEXT: vpaddw %ymm0, %ymm1, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512VL-LABEL: testv16i16:			; AVX512VL-LABEL: testv16i16:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm2			; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm2
	; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm3			; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm3
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm3, %ymm3			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm3, %ymm3
	; AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5			; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5
	; AVX512VL-NEXT: vpand %ymm5, %ymm2, %ymm2			; AVX512VL-NEXT: vpand %ymm5, %ymm2, %ymm2
	; AVX512VL-NEXT: vpshufb %ymm3, %ymm1, %ymm1			; AVX512VL-NEXT: vpshufb %ymm3, %ymm1, %ymm1
	; AVX512VL-NEXT: vpaddb %ymm1, %ymm2, %ymm1			; AVX512VL-NEXT: vpaddb %ymm1, %ymm2, %ymm1
	; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm0			; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm0
	; AVX512VL-NEXT: vpsrlw $8, %ymm0, %ymm0			; AVX512VL-NEXT: vpsrlw $8, %ymm0, %ymm0
	; AVX512VL-NEXT: vpand %ymm0, %ymm1, %ymm0			; AVX512VL-NEXT: vpand %ymm0, %ymm1, %ymm0
	; AVX512VL-NEXT: vpsrlw $8, %ymm1, %ymm1			; AVX512VL-NEXT: vpsrlw $8, %ymm1, %ymm1
	; AVX512VL-NEXT: vpaddw %ymm0, %ymm1, %ymm0			; AVX512VL-NEXT: vpaddw %ymm0, %ymm1, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512VLBWDQ-LABEL: testv16i16:			; AVX512VLBWDQ-LABEL: testv16i16:
	; AVX512VLBWDQ: # %bb.0:			; AVX512VLBWDQ: # %bb.0:
	; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm2			; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm2
	; AVX512VLBWDQ-NEXT: vpsrlw $4, %ymm0, %ymm3			; AVX512VLBWDQ-NEXT: vpsrlw $4, %ymm0, %ymm3
	; AVX512VLBWDQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm3, %ymm3			; AVX512VLBWDQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm3, %ymm3
	; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5			; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5
	; AVX512VLBWDQ-NEXT: vpand %ymm5, %ymm2, %ymm2			; AVX512VLBWDQ-NEXT: vpand %ymm5, %ymm2, %ymm2
	; AVX512VLBWDQ-NEXT: vpshufb %ymm3, %ymm1, %ymm1			; AVX512VLBWDQ-NEXT: vpshufb %ymm3, %ymm1, %ymm1
	; AVX512VLBWDQ-NEXT: vpaddb %ymm1, %ymm2, %ymm1			; AVX512VLBWDQ-NEXT: vpaddb %ymm1, %ymm2, %ymm1
	; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm0			; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm0
	; AVX512VLBWDQ-NEXT: vpsrlw $8, %ymm0, %ymm0			; AVX512VLBWDQ-NEXT: vpsrlw $8, %ymm0, %ymm0
	; AVX512VLBWDQ-NEXT: vpand %ymm0, %ymm1, %ymm0			; AVX512VLBWDQ-NEXT: vpand %ymm0, %ymm1, %ymm0
	[82 lines hidden]
	; AVX2-NEXT: vpaddw %ymm0, %ymm1, %ymm0			; AVX2-NEXT: vpaddw %ymm0, %ymm1, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512VL-LABEL: testv16i16u:			; AVX512VL-LABEL: testv16i16u:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm2			; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm2
	; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm3			; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm3
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm3, %ymm3			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm3, %ymm3
	; AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512VL-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5			; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5
	; AVX512VL-NEXT: vpand %ymm5, %ymm2, %ymm2			; AVX512VL-NEXT: vpand %ymm5, %ymm2, %ymm2
	; AVX512VL-NEXT: vpshufb %ymm3, %ymm1, %ymm1			; AVX512VL-NEXT: vpshufb %ymm3, %ymm1, %ymm1
	; AVX512VL-NEXT: vpaddb %ymm1, %ymm2, %ymm1			; AVX512VL-NEXT: vpaddb %ymm1, %ymm2, %ymm1
	; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm0			; AVX512VL-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm0
	; AVX512VL-NEXT: vpsrlw $8, %ymm0, %ymm0			; AVX512VL-NEXT: vpsrlw $8, %ymm0, %ymm0
	; AVX512VL-NEXT: vpand %ymm0, %ymm1, %ymm0			; AVX512VL-NEXT: vpand %ymm0, %ymm1, %ymm0
	; AVX512VL-NEXT: vpsrlw $8, %ymm1, %ymm1			; AVX512VL-NEXT: vpsrlw $8, %ymm1, %ymm1
	; AVX512VL-NEXT: vpaddw %ymm0, %ymm1, %ymm0			; AVX512VL-NEXT: vpaddw %ymm0, %ymm1, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512VLBWDQ-LABEL: testv16i16u:			; AVX512VLBWDQ-LABEL: testv16i16u:
	; AVX512VLBWDQ: # %bb.0:			; AVX512VLBWDQ: # %bb.0:
	; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm2			; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm2
	; AVX512VLBWDQ-NEXT: vpsrlw $4, %ymm0, %ymm3			; AVX512VLBWDQ-NEXT: vpsrlw $4, %ymm0, %ymm3
	; AVX512VLBWDQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm3, %ymm3			; AVX512VLBWDQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm3, %ymm3
	; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4			; AVX512VLBWDQ-NEXT: vpxor %xmm4, %xmm4, %xmm4
	; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5			; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm3, %ymm5
	; AVX512VLBWDQ-NEXT: vpand %ymm5, %ymm2, %ymm2			; AVX512VLBWDQ-NEXT: vpand %ymm5, %ymm2, %ymm2
	; AVX512VLBWDQ-NEXT: vpshufb %ymm3, %ymm1, %ymm1			; AVX512VLBWDQ-NEXT: vpshufb %ymm3, %ymm1, %ymm1
	; AVX512VLBWDQ-NEXT: vpaddb %ymm1, %ymm2, %ymm1			; AVX512VLBWDQ-NEXT: vpaddb %ymm1, %ymm2, %ymm1
	; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm0			; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm4, %ymm0, %ymm0
	; AVX512VLBWDQ-NEXT: vpsrlw $8, %ymm0, %ymm0			; AVX512VLBWDQ-NEXT: vpsrlw $8, %ymm0, %ymm0
	; AVX512VLBWDQ-NEXT: vpand %ymm0, %ymm1, %ymm0			; AVX512VLBWDQ-NEXT: vpand %ymm0, %ymm1, %ymm0
	[67 lines hidden]
	; AVX2-NEXT: vpaddb %ymm0, %ymm2, %ymm0			; AVX2-NEXT: vpaddb %ymm0, %ymm2, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512VL-LABEL: testv32i8:			; AVX512VL-LABEL: testv32i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm2			; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm2
	; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm0			; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm0
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX512VL-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX512VL-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX512VL-NEXT: vpcmpeqb %ymm3, %ymm0, %ymm3			; AVX512VL-NEXT: vpcmpeqb %ymm3, %ymm0, %ymm3
	; AVX512VL-NEXT: vpand %ymm3, %ymm2, %ymm2			; AVX512VL-NEXT: vpand %ymm3, %ymm2, %ymm2
	; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm0			; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm0
	; AVX512VL-NEXT: vpaddb %ymm0, %ymm2, %ymm0			; AVX512VL-NEXT: vpaddb %ymm0, %ymm2, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512VLBWDQ-LABEL: testv32i8:			; AVX512VLBWDQ-LABEL: testv32i8:
	; AVX512VLBWDQ: # %bb.0:			; AVX512VLBWDQ: # %bb.0:
	; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm2			; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm2
	; AVX512VLBWDQ-NEXT: vpsrlw $4, %ymm0, %ymm0			; AVX512VLBWDQ-NEXT: vpsrlw $4, %ymm0, %ymm0
	; AVX512VLBWDQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512VLBWDQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX512VLBWDQ-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX512VLBWDQ-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm3, %ymm0, %ymm3			; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm3, %ymm0, %ymm3
	; AVX512VLBWDQ-NEXT: vpand %ymm3, %ymm2, %ymm2			; AVX512VLBWDQ-NEXT: vpand %ymm3, %ymm2, %ymm2
	; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm0			; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm0
	; AVX512VLBWDQ-NEXT: vpaddb %ymm0, %ymm2, %ymm0			; AVX512VLBWDQ-NEXT: vpaddb %ymm0, %ymm2, %ymm0
	; AVX512VLBWDQ-NEXT: retq			; AVX512VLBWDQ-NEXT: retq
	;			;
	; AVX512-LABEL: testv32i8:			; AVX512-LABEL: testv32i8:
	[62 lines hidden]
	; AVX2-NEXT: vpaddb %ymm0, %ymm2, %ymm0			; AVX2-NEXT: vpaddb %ymm0, %ymm2, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512VL-LABEL: testv32i8u:			; AVX512VL-LABEL: testv32i8u:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm2			; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm2
	; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm0			; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm0
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX512VL-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX512VL-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX512VL-NEXT: vpcmpeqb %ymm3, %ymm0, %ymm3			; AVX512VL-NEXT: vpcmpeqb %ymm3, %ymm0, %ymm3
	; AVX512VL-NEXT: vpand %ymm3, %ymm2, %ymm2			; AVX512VL-NEXT: vpand %ymm3, %ymm2, %ymm2
	; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm0			; AVX512VL-NEXT: vpshufb %ymm0, %ymm1, %ymm0
	; AVX512VL-NEXT: vpaddb %ymm0, %ymm2, %ymm0			; AVX512VL-NEXT: vpaddb %ymm0, %ymm2, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512VLBWDQ-LABEL: testv32i8u:			; AVX512VLBWDQ-LABEL: testv32i8u:
	; AVX512VLBWDQ: # %bb.0:			; AVX512VLBWDQ: # %bb.0:
	; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512VLBWDQ-NEXT: vmovdqa {{.*#+}} ymm1 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm2			; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm2
	; AVX512VLBWDQ-NEXT: vpsrlw $4, %ymm0, %ymm0			; AVX512VLBWDQ-NEXT: vpsrlw $4, %ymm0, %ymm0
	; AVX512VLBWDQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512VLBWDQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX512VLBWDQ-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX512VLBWDQ-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm3, %ymm0, %ymm3			; AVX512VLBWDQ-NEXT: vpcmpeqb %ymm3, %ymm0, %ymm3
	; AVX512VLBWDQ-NEXT: vpand %ymm3, %ymm2, %ymm2			; AVX512VLBWDQ-NEXT: vpand %ymm3, %ymm2, %ymm2
	; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm0			; AVX512VLBWDQ-NEXT: vpshufb %ymm0, %ymm1, %ymm0
	; AVX512VLBWDQ-NEXT: vpaddb %ymm0, %ymm2, %ymm0			; AVX512VLBWDQ-NEXT: vpaddb %ymm0, %ymm2, %ymm0
	; AVX512VLBWDQ-NEXT: retq			; AVX512VLBWDQ-NEXT: retq
	;			;
	; AVX512-LABEL: testv32i8u:			; AVX512-LABEL: testv32i8u:
	[144 lines hidden]

llvm/test/CodeGen/X86/vector-lzcnt-512.ll

	[354 lines hidden]
	; AVX512CDBW-NEXT: vpmovdw %zmm0, %ymm0			; AVX512CDBW-NEXT: vpmovdw %zmm0, %ymm0
	; AVX512CDBW-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0			; AVX512CDBW-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0
	; AVX512CDBW-NEXT: vpsubw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512CDBW-NEXT: vpsubw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0
	; AVX512CDBW-NEXT: retq			; AVX512CDBW-NEXT: retq
	;			;
	; AVX512BW-LABEL: testv32i16:			; AVX512BW-LABEL: testv32i16:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsrlw $4, %zmm0, %zmm1			; AVX512BW-NEXT: vpsrlw $4, %zmm0, %zmm1
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm2 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm2 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512BW-NEXT: vpshufb %zmm1, %zmm2, %zmm3			; AVX512BW-NEXT: vpshufb %zmm1, %zmm2, %zmm3
	; AVX512BW-NEXT: vpshufb %zmm0, %zmm2, %zmm2			; AVX512BW-NEXT: vpshufb %zmm0, %zmm2, %zmm2
	; AVX512BW-NEXT: vptestnmb %zmm1, %zmm1, %k0			; AVX512BW-NEXT: vptestnmb %zmm1, %zmm1, %k0
	; AVX512BW-NEXT: vpmovm2b %k0, %zmm1			; AVX512BW-NEXT: vpmovm2b %k0, %zmm1
	; AVX512BW-NEXT: vpandq %zmm1, %zmm2, %zmm1			; AVX512BW-NEXT: vpandq %zmm1, %zmm2, %zmm1
	; AVX512BW-NEXT: vpaddb %zmm3, %zmm1, %zmm1			; AVX512BW-NEXT: vpaddb %zmm3, %zmm1, %zmm1
	; AVX512BW-NEXT: vptestnmb %zmm0, %zmm0, %k0			; AVX512BW-NEXT: vptestnmb %zmm0, %zmm0, %k0
	[67 lines hidden]
	; AVX512CDBW-NEXT: vpmovdw %zmm0, %ymm0			; AVX512CDBW-NEXT: vpmovdw %zmm0, %ymm0
	; AVX512CDBW-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0			; AVX512CDBW-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0
	; AVX512CDBW-NEXT: vpsubw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512CDBW-NEXT: vpsubw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0
	; AVX512CDBW-NEXT: retq			; AVX512CDBW-NEXT: retq
	;			;
	; AVX512BW-LABEL: testv32i16u:			; AVX512BW-LABEL: testv32i16u:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsrlw $4, %zmm0, %zmm1			; AVX512BW-NEXT: vpsrlw $4, %zmm0, %zmm1
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm2 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm2 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512BW-NEXT: vpshufb %zmm1, %zmm2, %zmm3			; AVX512BW-NEXT: vpshufb %zmm1, %zmm2, %zmm3
	; AVX512BW-NEXT: vpshufb %zmm0, %zmm2, %zmm2			; AVX512BW-NEXT: vpshufb %zmm0, %zmm2, %zmm2
	; AVX512BW-NEXT: vptestnmb %zmm1, %zmm1, %k0			; AVX512BW-NEXT: vptestnmb %zmm1, %zmm1, %k0
	; AVX512BW-NEXT: vpmovm2b %k0, %zmm1			; AVX512BW-NEXT: vpmovm2b %k0, %zmm1
	; AVX512BW-NEXT: vpandq %zmm1, %zmm2, %zmm1			; AVX512BW-NEXT: vpandq %zmm1, %zmm2, %zmm1
	; AVX512BW-NEXT: vpaddb %zmm3, %zmm1, %zmm1			; AVX512BW-NEXT: vpaddb %zmm3, %zmm1, %zmm1
	; AVX512BW-NEXT: vptestnmb %zmm0, %zmm0, %k0			; AVX512BW-NEXT: vptestnmb %zmm0, %zmm0, %k0
	[87 lines hidden]
	; AVX512CDBW-NEXT: vinserti128 $1, %xmm0, %ymm2, %ymm0			; AVX512CDBW-NEXT: vinserti128 $1, %xmm0, %ymm2, %ymm0
	; AVX512CDBW-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0			; AVX512CDBW-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
	; AVX512CDBW-NEXT: vpsubb {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512CDBW-NEXT: vpsubb {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0
	; AVX512CDBW-NEXT: retq			; AVX512CDBW-NEXT: retq
	;			;
	; AVX512BW-LABEL: testv64i8:			; AVX512BW-LABEL: testv64i8:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsrlw $4, %zmm0, %zmm1			; AVX512BW-NEXT: vpsrlw $4, %zmm0, %zmm1
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm2 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm2 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512BW-NEXT: vpshufb %zmm1, %zmm2, %zmm3			; AVX512BW-NEXT: vpshufb %zmm1, %zmm2, %zmm3
	; AVX512BW-NEXT: vpshufb %zmm0, %zmm2, %zmm0			; AVX512BW-NEXT: vpshufb %zmm0, %zmm2, %zmm0
	; AVX512BW-NEXT: vptestnmb %zmm1, %zmm1, %k0			; AVX512BW-NEXT: vptestnmb %zmm1, %zmm1, %k0
	; AVX512BW-NEXT: vpmovm2b %k0, %zmm1			; AVX512BW-NEXT: vpmovm2b %k0, %zmm1
	; AVX512BW-NEXT: vpandq %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpandq %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpaddb %zmm3, %zmm0, %zmm0			; AVX512BW-NEXT: vpaddb %zmm3, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	[71 lines hidden]
	; AVX512CDBW-NEXT: vinserti128 $1, %xmm0, %ymm2, %ymm0			; AVX512CDBW-NEXT: vinserti128 $1, %xmm0, %ymm2, %ymm0
	; AVX512CDBW-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0			; AVX512CDBW-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
	; AVX512CDBW-NEXT: vpsubb {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512CDBW-NEXT: vpsubb {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0
	; AVX512CDBW-NEXT: retq			; AVX512CDBW-NEXT: retq
	;			;
	; AVX512BW-LABEL: testv64i8u:			; AVX512BW-LABEL: testv64i8u:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsrlw $4, %zmm0, %zmm1			; AVX512BW-NEXT: vpsrlw $4, %zmm0, %zmm1
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm2 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm2 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0,4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
	; AVX512BW-NEXT: vpshufb %zmm1, %zmm2, %zmm3			; AVX512BW-NEXT: vpshufb %zmm1, %zmm2, %zmm3
	; AVX512BW-NEXT: vpshufb %zmm0, %zmm2, %zmm0			; AVX512BW-NEXT: vpshufb %zmm0, %zmm2, %zmm0
	; AVX512BW-NEXT: vptestnmb %zmm1, %zmm1, %k0			; AVX512BW-NEXT: vptestnmb %zmm1, %zmm1, %k0
	; AVX512BW-NEXT: vpmovm2b %k0, %zmm1			; AVX512BW-NEXT: vpmovm2b %k0, %zmm1
	; AVX512BW-NEXT: vpandq %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpandq %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpaddb %zmm3, %zmm0, %zmm0			; AVX512BW-NEXT: vpaddb %zmm3, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	[31 lines hidden]

llvm/test/CodeGen/X86/vector-mul.ll

	[75 lines hidden]
	; X64-AVX2: # %bb.0:			; X64-AVX2: # %bb.0:
	; X64-AVX2-NEXT: vpsllw $5, %xmm0, %xmm0			; X64-AVX2-NEXT: vpsllw $5, %xmm0, %xmm0
	; X64-AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512DQ-LABEL: mul_v16i8_32:			; X64-AVX512DQ-LABEL: mul_v16i8_32:
	; X64-AVX512DQ: # %bb.0:			; X64-AVX512DQ: # %bb.0:
	; X64-AVX512DQ-NEXT: vpsllw $5, %xmm0, %xmm0			; X64-AVX512DQ-NEXT: vpsllw $5, %xmm0, %xmm0
	; X64-AVX512DQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; X64-AVX512DQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; X64-AVX512DQ-NEXT: retq			; X64-AVX512DQ-NEXT: retq
	%1 = mul <16 x i8> %a0, <i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32>			%1 = mul <16 x i8> %a0, <i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32>
	ret <16 x i8> %1			ret <16 x i8> %1
	}			}

	;			;
	; PowOf2 (non-uniform)			; PowOf2 (non-uniform)
	;			;
	[319 lines hidden]
	; X64-AVX2-NEXT: vpsllw $4, %xmm0, %xmm1			; X64-AVX2-NEXT: vpsllw $4, %xmm0, %xmm1
	; X64-AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; X64-AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
	; X64-AVX2-NEXT: vpaddb %xmm0, %xmm1, %xmm0			; X64-AVX2-NEXT: vpaddb %xmm0, %xmm1, %xmm0
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512DQ-LABEL: mul_v16i8_17:			; X64-AVX512DQ-LABEL: mul_v16i8_17:
	; X64-AVX512DQ: # %bb.0:			; X64-AVX512DQ: # %bb.0:
	; X64-AVX512DQ-NEXT: vpsllw $4, %xmm0, %xmm1			; X64-AVX512DQ-NEXT: vpsllw $4, %xmm0, %xmm1
	; X64-AVX512DQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; X64-AVX512DQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; X64-AVX512DQ-NEXT: vpaddb %xmm0, %xmm1, %xmm0			; X64-AVX512DQ-NEXT: vpaddb %xmm0, %xmm1, %xmm0
	; X64-AVX512DQ-NEXT: retq			; X64-AVX512DQ-NEXT: retq
	%1 = mul <16 x i8> %a0, <i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17>			%1 = mul <16 x i8> %a0, <i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17>
	ret <16 x i8> %1			ret <16 x i8> %1
	}			}

	define <4 x i64> @mul_v4i64_17(<4 x i64> %a0) nounwind {			define <4 x i64> @mul_v4i64_17(<4 x i64> %a0) nounwind {
	; SSE-LABEL: mul_v4i64_17:			; SSE-LABEL: mul_v4i64_17:
	[151 lines hidden]
	; X64-AVX2-NEXT: vpsllw $4, %ymm0, %ymm1			; X64-AVX2-NEXT: vpsllw $4, %ymm0, %ymm1
	; X64-AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; X64-AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1
	; X64-AVX2-NEXT: vpaddb %ymm0, %ymm1, %ymm0			; X64-AVX2-NEXT: vpaddb %ymm0, %ymm1, %ymm0
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512DQ-LABEL: mul_v32i8_17:			; X64-AVX512DQ-LABEL: mul_v32i8_17:
	; X64-AVX512DQ: # %bb.0:			; X64-AVX512DQ: # %bb.0:
	; X64-AVX512DQ-NEXT: vpsllw $4, %ymm0, %ymm1			; X64-AVX512DQ-NEXT: vpsllw $4, %ymm0, %ymm1
	; X64-AVX512DQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; X64-AVX512DQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; X64-AVX512DQ-NEXT: vpaddb %ymm0, %ymm1, %ymm0			; X64-AVX512DQ-NEXT: vpaddb %ymm0, %ymm1, %ymm0
	; X64-AVX512DQ-NEXT: retq			; X64-AVX512DQ-NEXT: retq
	%1 = mul <32 x i8> %a0, <i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17>			%1 = mul <32 x i8> %a0, <i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17, i8 17>
	ret <32 x i8> %1			ret <32 x i8> %1
	}			}

	;			;
	; -(PowOf2 + 1) (uniform)			; -(PowOf2 + 1) (uniform)
	(123 lines not shown)
	; X64-AVX2-NEXT: vpaddb %xmm0, %xmm1, %xmm0			; X64-AVX2-NEXT: vpaddb %xmm0, %xmm1, %xmm0
	; X64-AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1			; X64-AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; X64-AVX2-NEXT: vpsubb %xmm0, %xmm1, %xmm0			; X64-AVX2-NEXT: vpsubb %xmm0, %xmm1, %xmm0
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512DQ-LABEL: mul_v16i8_neg5:			; X64-AVX512DQ-LABEL: mul_v16i8_neg5:
	; X64-AVX512DQ: # %bb.0:			; X64-AVX512DQ: # %bb.0:
	; X64-AVX512DQ-NEXT: vpsllw $2, %xmm0, %xmm1			; X64-AVX512DQ-NEXT: vpsllw $2, %xmm0, %xmm1
	; X64-AVX512DQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; X64-AVX512DQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; X64-AVX512DQ-NEXT: vpaddb %xmm0, %xmm1, %xmm0			; X64-AVX512DQ-NEXT: vpaddb %xmm0, %xmm1, %xmm0
	; X64-AVX512DQ-NEXT: vpxor %xmm1, %xmm1, %xmm1			; X64-AVX512DQ-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; X64-AVX512DQ-NEXT: vpsubb %xmm0, %xmm1, %xmm0			; X64-AVX512DQ-NEXT: vpsubb %xmm0, %xmm1, %xmm0
	; X64-AVX512DQ-NEXT: retq			; X64-AVX512DQ-NEXT: retq
	%1 = mul <16 x i8> %a0, <i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5>			%1 = mul <16 x i8> %a0, <i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5>
	ret <16 x i8> %1			ret <16 x i8> %1
	}			}

	(191 lines not shown)
	; X64-AVX2-NEXT: vpaddb %ymm0, %ymm1, %ymm0			; X64-AVX2-NEXT: vpaddb %ymm0, %ymm1, %ymm0
	; X64-AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1			; X64-AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; X64-AVX2-NEXT: vpsubb %ymm0, %ymm1, %ymm0			; X64-AVX2-NEXT: vpsubb %ymm0, %ymm1, %ymm0
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512DQ-LABEL: mul_v32i8_neg5:			; X64-AVX512DQ-LABEL: mul_v32i8_neg5:
	; X64-AVX512DQ: # %bb.0:			; X64-AVX512DQ: # %bb.0:
	; X64-AVX512DQ-NEXT: vpsllw $2, %ymm0, %ymm1			; X64-AVX512DQ-NEXT: vpsllw $2, %ymm0, %ymm1
	; X64-AVX512DQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; X64-AVX512DQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; X64-AVX512DQ-NEXT: vpaddb %ymm0, %ymm1, %ymm0			; X64-AVX512DQ-NEXT: vpaddb %ymm0, %ymm1, %ymm0
	; X64-AVX512DQ-NEXT: vpxor %xmm1, %xmm1, %xmm1			; X64-AVX512DQ-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; X64-AVX512DQ-NEXT: vpsubb %ymm0, %ymm1, %ymm0			; X64-AVX512DQ-NEXT: vpsubb %ymm0, %ymm1, %ymm0
	; X64-AVX512DQ-NEXT: retq			; X64-AVX512DQ-NEXT: retq
	%1 = mul <32 x i8> %a0, <i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5>			%1 = mul <32 x i8> %a0, <i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5, i8 -5>
	ret <32 x i8> %1			ret <32 x i8> %1
	}			}

	(311 lines not shown)
	; X64-AVX2-NEXT: vpsllw $5, %xmm0, %xmm1			; X64-AVX2-NEXT: vpsllw $5, %xmm0, %xmm1
	; X64-AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; X64-AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
	; X64-AVX2-NEXT: vpsubb %xmm0, %xmm1, %xmm0			; X64-AVX2-NEXT: vpsubb %xmm0, %xmm1, %xmm0
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512DQ-LABEL: mul_v16i8_31:			; X64-AVX512DQ-LABEL: mul_v16i8_31:
	; X64-AVX512DQ: # %bb.0:			; X64-AVX512DQ: # %bb.0:
	; X64-AVX512DQ-NEXT: vpsllw $5, %xmm0, %xmm1			; X64-AVX512DQ-NEXT: vpsllw $5, %xmm0, %xmm1
	; X64-AVX512DQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; X64-AVX512DQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; X64-AVX512DQ-NEXT: vpsubb %xmm0, %xmm1, %xmm0			; X64-AVX512DQ-NEXT: vpsubb %xmm0, %xmm1, %xmm0
	; X64-AVX512DQ-NEXT: retq			; X64-AVX512DQ-NEXT: retq
	%1 = mul <16 x i8> %a0, <i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31>			%1 = mul <16 x i8> %a0, <i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31>
	ret <16 x i8> %1			ret <16 x i8> %1
	}			}

	;			;
	; -(PowOf2 - 1) (uniform)			; -(PowOf2 - 1) (uniform)
	(107 lines not shown)
	; X64-AVX2-NEXT: vpsllw $4, %xmm0, %xmm1			; X64-AVX2-NEXT: vpsllw $4, %xmm0, %xmm1
	; X64-AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; X64-AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
	; X64-AVX2-NEXT: vpsubb %xmm1, %xmm0, %xmm0			; X64-AVX2-NEXT: vpsubb %xmm1, %xmm0, %xmm0
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512DQ-LABEL: mul_v16i8_neg15:			; X64-AVX512DQ-LABEL: mul_v16i8_neg15:
	; X64-AVX512DQ: # %bb.0:			; X64-AVX512DQ: # %bb.0:
	; X64-AVX512DQ-NEXT: vpsllw $4, %xmm0, %xmm1			; X64-AVX512DQ-NEXT: vpsllw $4, %xmm0, %xmm1
	; X64-AVX512DQ-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; X64-AVX512DQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; X64-AVX512DQ-NEXT: vpsubb %xmm1, %xmm0, %xmm0			; X64-AVX512DQ-NEXT: vpsubb %xmm1, %xmm0, %xmm0
	; X64-AVX512DQ-NEXT: retq			; X64-AVX512DQ-NEXT: retq
	%1 = mul <16 x i8> %a0, <i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15>			%1 = mul <16 x i8> %a0, <i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15, i8 -15>
	ret <16 x i8> %1			ret <16 x i8> %1
	}			}

	;			;
	; PowOf2 - 1 (non-uniform)			; PowOf2 - 1 (non-uniform)
	(696 lines not shown)

llvm/test/CodeGen/X86/vector-pack-128.ll

	(89 lines not shown)
	define <16 x i8> @trunc_concat_packsswb_128(<8 x i16> %a0, <8 x i16> %a1) nounwind {			define <16 x i8> @trunc_concat_packsswb_128(<8 x i16> %a0, <8 x i16> %a1) nounwind {
	; SSE-LABEL: trunc_concat_packsswb_128:			; SSE-LABEL: trunc_concat_packsswb_128:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: psraw $15, %xmm0			; SSE-NEXT: psraw $15, %xmm0
	; SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1			; SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
	; SSE-NEXT: packsswb %xmm1, %xmm0			; SSE-NEXT: packsswb %xmm1, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: trunc_concat_packsswb_128:			; AVX1-LABEL: trunc_concat_packsswb_128:
	; AVX: # %bb.0:			; AVX1: # %bb.0:
	; AVX-NEXT: vpsraw $15, %xmm0, %xmm0			; AVX1-NEXT: vpsraw $15, %xmm0, %xmm0
	; AVX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX1-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
	; AVX-NEXT: vpacksswb %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpacksswb %xmm1, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: trunc_concat_packsswb_128:
				; AVX2: # %bb.0:
				; AVX2-NEXT: vpsraw $15, %xmm0, %xmm0
				; AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
				; AVX2-NEXT: vpacksswb %xmm1, %xmm0, %xmm0
				; AVX2-NEXT: retq
				;
				; AVX512-LABEL: trunc_concat_packsswb_128:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vpsraw $15, %xmm0, %xmm0
				; AVX512-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
				; AVX512-NEXT: vpacksswb %xmm1, %xmm0, %xmm0
				; AVX512-NEXT: retq
	%1 = ashr <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>			%1 = ashr <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
	%2 = and <8 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>			%2 = and <8 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
	%3 = shufflevector <8 x i16> %1, <8 x i16> %2, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			%3 = shufflevector <8 x i16> %1, <8 x i16> %2, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	%4 = trunc <16 x i16> %3 to <16 x i8>			%4 = trunc <16 x i16> %3 to <16 x i8>
	ret <16 x i8> %4			ret <16 x i8> %4
	}			}

	define <16 x i8> @trunc_concat_packuswb_128(<8 x i16> %a0, <8 x i16> %a1) nounwind {			define <16 x i8> @trunc_concat_packuswb_128(<8 x i16> %a0, <8 x i16> %a1) nounwind {
	; SSE-LABEL: trunc_concat_packuswb_128:			; SSE-LABEL: trunc_concat_packuswb_128:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: psrlw $15, %xmm0			; SSE-NEXT: psrlw $15, %xmm0
	; SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1			; SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
	; SSE-NEXT: packuswb %xmm1, %xmm0			; SSE-NEXT: packuswb %xmm1, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: trunc_concat_packuswb_128:			; AVX1-LABEL: trunc_concat_packuswb_128:
	; AVX: # %bb.0:			; AVX1: # %bb.0:
	; AVX-NEXT: vpsrlw $15, %xmm0, %xmm0			; AVX1-NEXT: vpsrlw $15, %xmm0, %xmm0
	; AVX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX1-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
	; AVX-NEXT: vpackuswb %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: trunc_concat_packuswb_128:
				; AVX2: # %bb.0:
				; AVX2-NEXT: vpsrlw $15, %xmm0, %xmm0
				; AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
				; AVX2-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
				; AVX2-NEXT: retq
				;
				; AVX512-LABEL: trunc_concat_packuswb_128:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vpsrlw $15, %xmm0, %xmm0
				; AVX512-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
				; AVX512-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
				; AVX512-NEXT: retq
	%1 = lshr <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>			%1 = lshr <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
	%2 = and <8 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>			%2 = and <8 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
	%3 = shufflevector <8 x i16> %1, <8 x i16> %2, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			%3 = shufflevector <8 x i16> %1, <8 x i16> %2, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	%4 = trunc <16 x i16> %3 to <16 x i8>			%4 = trunc <16 x i16> %3 to <16 x i8>
	ret <16 x i8> %4			ret <16 x i8> %4
	}			}

	; concat(trunc(x),trunc(y)) -> pack			; concat(trunc(x),trunc(y)) -> pack
	(93 lines not shown)
	define <16 x i8> @concat_trunc_packsswb_128(<8 x i16> %a0, <8 x i16> %a1) nounwind {			define <16 x i8> @concat_trunc_packsswb_128(<8 x i16> %a0, <8 x i16> %a1) nounwind {
	; SSE-LABEL: concat_trunc_packsswb_128:			; SSE-LABEL: concat_trunc_packsswb_128:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: psraw $15, %xmm0			; SSE-NEXT: psraw $15, %xmm0
	; SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1			; SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
	; SSE-NEXT: packsswb %xmm1, %xmm0			; SSE-NEXT: packsswb %xmm1, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: concat_trunc_packsswb_128:			; AVX1-LABEL: concat_trunc_packsswb_128:
	; AVX: # %bb.0:			; AVX1: # %bb.0:
	; AVX-NEXT: vpsraw $15, %xmm0, %xmm0			; AVX1-NEXT: vpsraw $15, %xmm0, %xmm0
	; AVX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX1-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
	; AVX-NEXT: vpacksswb %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpacksswb %xmm1, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: concat_trunc_packsswb_128:
				; AVX2: # %bb.0:
				; AVX2-NEXT: vpsraw $15, %xmm0, %xmm0
				; AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
				; AVX2-NEXT: vpacksswb %xmm1, %xmm0, %xmm0
				; AVX2-NEXT: retq
				;
				; AVX512-LABEL: concat_trunc_packsswb_128:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vpsraw $15, %xmm0, %xmm0
				; AVX512-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
				; AVX512-NEXT: vpacksswb %xmm1, %xmm0, %xmm0
				; AVX512-NEXT: retq
	%1 = ashr <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>			%1 = ashr <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
	%2 = and <8 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>			%2 = and <8 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
	%3 = trunc <8 x i16> %1 to <8 x i8>			%3 = trunc <8 x i16> %1 to <8 x i8>
	%4 = trunc <8 x i16> %2 to <8 x i8>			%4 = trunc <8 x i16> %2 to <8 x i8>
	%5 = shufflevector <8 x i8> %3, <8 x i8> %4, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			%5 = shufflevector <8 x i8> %3, <8 x i8> %4, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	ret <16 x i8> %5			ret <16 x i8> %5
	}			}

	define <16 x i8> @concat_trunc_packuswb_128(<8 x i16> %a0, <8 x i16> %a1) nounwind {			define <16 x i8> @concat_trunc_packuswb_128(<8 x i16> %a0, <8 x i16> %a1) nounwind {
	; SSE-LABEL: concat_trunc_packuswb_128:			; SSE-LABEL: concat_trunc_packuswb_128:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: psrlw $15, %xmm0			; SSE-NEXT: psrlw $15, %xmm0
	; SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1			; SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
	; SSE-NEXT: packuswb %xmm1, %xmm0			; SSE-NEXT: packuswb %xmm1, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: concat_trunc_packuswb_128:			; AVX1-LABEL: concat_trunc_packuswb_128:
	; AVX: # %bb.0:			; AVX1: # %bb.0:
	; AVX-NEXT: vpsrlw $15, %xmm0, %xmm0			; AVX1-NEXT: vpsrlw $15, %xmm0, %xmm0
	; AVX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX1-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
	; AVX-NEXT: vpackuswb %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: concat_trunc_packuswb_128:
				; AVX2: # %bb.0:
				; AVX2-NEXT: vpsrlw $15, %xmm0, %xmm0
				; AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
				; AVX2-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
				; AVX2-NEXT: retq
				;
				; AVX512-LABEL: concat_trunc_packuswb_128:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vpsrlw $15, %xmm0, %xmm0
				; AVX512-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
				; AVX512-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
				; AVX512-NEXT: retq
	%1 = lshr <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>			%1 = lshr <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
	%2 = and <8 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>			%2 = and <8 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
	%3 = trunc <8 x i16> %1 to <8 x i8>			%3 = trunc <8 x i16> %1 to <8 x i8>
	%4 = trunc <8 x i16> %2 to <8 x i8>			%4 = trunc <8 x i16> %2 to <8 x i8>
	%5 = shufflevector <8 x i8> %3, <8 x i8> %4, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			%5 = shufflevector <8 x i8> %3, <8 x i8> %4, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	ret <16 x i8> %5			ret <16 x i8> %5
	}			}
				;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
				; AVX: {{.*}}

llvm/test/CodeGen/X86/vector-pack-256.ll

	(97 lines not shown)
	; AVX2-NEXT: vpsraw $15, %ymm0, %ymm0			; AVX2-NEXT: vpsraw $15, %ymm0, %ymm0
	; AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1
	; AVX2-NEXT: vpacksswb %ymm1, %ymm0, %ymm0			; AVX2-NEXT: vpacksswb %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512F-LABEL: trunc_concat_packsswb_256:			; AVX512F-LABEL: trunc_concat_packsswb_256:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vpsraw $15, %ymm0, %ymm0			; AVX512F-NEXT: vpsraw $15, %ymm0, %ymm0
	; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512F-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512F-NEXT: vperm2i128 {{.*#+}} ymm2 = ymm0[2,3],ymm1[2,3]			; AVX512F-NEXT: vperm2i128 {{.*#+}} ymm2 = ymm0[2,3],ymm1[2,3]
	; AVX512F-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0			; AVX512F-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero
	; AVX512F-NEXT: vpmovdb %zmm0, %xmm0			; AVX512F-NEXT: vpmovdb %zmm0, %xmm0
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero
	; AVX512F-NEXT: vpmovdb %zmm1, %xmm1			; AVX512F-NEXT: vpmovdb %zmm1, %xmm1
	; AVX512F-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0			; AVX512F-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BW-LABEL: trunc_concat_packsswb_256:			; AVX512BW-LABEL: trunc_concat_packsswb_256:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsraw $15, %ymm0, %ymm0			; AVX512BW-NEXT: vpsraw $15, %ymm0, %ymm0
	; AVX512BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512BW-NEXT: vperm2i128 {{.*#+}} ymm2 = ymm0[2,3],ymm1[2,3]			; AVX512BW-NEXT: vperm2i128 {{.*#+}} ymm2 = ymm0[2,3],ymm1[2,3]
	; AVX512BW-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0			; AVX512BW-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
	; AVX512BW-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; AVX512BW-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0			; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%1 = ashr <16 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>			%1 = ashr <16 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
	%2 = and <16 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>			%2 = and <16 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
	%3 = shufflevector <16 x i16> %1, <16 x i16> %2, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>			%3 = shufflevector <16 x i16> %1, <16 x i16> %2, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
	(19 lines not shown)
	; AVX2-NEXT: vpsrlw $15, %ymm0, %ymm0			; AVX2-NEXT: vpsrlw $15, %ymm0, %ymm0
	; AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1
	; AVX2-NEXT: vpackuswb %ymm1, %ymm0, %ymm0			; AVX2-NEXT: vpackuswb %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512F-LABEL: trunc_concat_packuswb_256:			; AVX512F-LABEL: trunc_concat_packuswb_256:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vpsrlw $15, %ymm0, %ymm0			; AVX512F-NEXT: vpsrlw $15, %ymm0, %ymm0
	; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512F-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512F-NEXT: vperm2i128 {{.*#+}} ymm2 = ymm0[2,3],ymm1[2,3]			; AVX512F-NEXT: vperm2i128 {{.*#+}} ymm2 = ymm0[2,3],ymm1[2,3]
	; AVX512F-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0			; AVX512F-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero
	; AVX512F-NEXT: vpmovdb %zmm0, %xmm0			; AVX512F-NEXT: vpmovdb %zmm0, %xmm0
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero
	; AVX512F-NEXT: vpmovdb %zmm1, %xmm1			; AVX512F-NEXT: vpmovdb %zmm1, %xmm1
	; AVX512F-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0			; AVX512F-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BW-LABEL: trunc_concat_packuswb_256:			; AVX512BW-LABEL: trunc_concat_packuswb_256:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsrlw $15, %ymm0, %ymm0			; AVX512BW-NEXT: vpsrlw $15, %ymm0, %ymm0
	; AVX512BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512BW-NEXT: vperm2i128 {{.*#+}} ymm2 = ymm0[2,3],ymm1[2,3]			; AVX512BW-NEXT: vperm2i128 {{.*#+}} ymm2 = ymm0[2,3],ymm1[2,3]
	; AVX512BW-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0			; AVX512BW-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
	; AVX512BW-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; AVX512BW-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0			; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%1 = lshr <16 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>			%1 = lshr <16 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
	%2 = and <16 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>			%2 = and <16 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
	%3 = shufflevector <16 x i16> %1, <16 x i16> %2, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>			%3 = shufflevector <16 x i16> %1, <16 x i16> %2, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
	(72 lines not shown)
	; AVX2-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0			; AVX2-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: concat_trunc_packusdw_256:			; AVX512-LABEL: concat_trunc_packusdw_256:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpsrld $17, %ymm0, %ymm0			; AVX512-NEXT: vpsrld $17, %ymm0, %ymm0
	; AVX512-NEXT: vpmovdw %ymm0, %xmm0			; AVX512-NEXT: vpmovdw %ymm0, %xmm0
	; AVX512-NEXT: vpmovdw %ymm1, %xmm1			; AVX512-NEXT: vpmovdw %ymm1, %xmm1
	; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512-NEXT: vpunpckhqdq {{.*#+}} xmm2 = xmm0[1],xmm1[1]			; AVX512-NEXT: vpunpckhqdq {{.*#+}} xmm2 = xmm0[1],xmm1[1]
	; AVX512-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX512-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX512-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0			; AVX512-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%1 = lshr <8 x i32> %a0, <i32 17, i32 17, i32 17, i32 17, i32 17, i32 17, i32 17, i32 17>			%1 = lshr <8 x i32> %a0, <i32 17, i32 17, i32 17, i32 17, i32 17, i32 17, i32 17, i32 17>
	%2 = and <8 x i32> %a1, <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>			%2 = and <8 x i32> %a1, <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
	%3 = trunc <8 x i32> %1 to <8 x i16>			%3 = trunc <8 x i32> %1 to <8 x i16>
	%4 = trunc <8 x i32> %2 to <8 x i16>			%4 = trunc <8 x i32> %2 to <8 x i16>
	(33 lines not shown)
	;			;
	; AVX512F-LABEL: concat_trunc_packsswb_256:			; AVX512F-LABEL: concat_trunc_packsswb_256:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vpsraw $15, %ymm0, %ymm0			; AVX512F-NEXT: vpsraw $15, %ymm0, %ymm0
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero
	; AVX512F-NEXT: vpmovdb %zmm0, %xmm0			; AVX512F-NEXT: vpmovdb %zmm0, %xmm0
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512F-NEXT: vpmovdb %zmm1, %xmm1			; AVX512F-NEXT: vpmovdb %zmm1, %xmm1
	; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512F-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512F-NEXT: vpunpckhqdq {{.*#+}} xmm2 = xmm0[1],xmm1[1]			; AVX512F-NEXT: vpunpckhqdq {{.*#+}} xmm2 = xmm0[1],xmm1[1]
	; AVX512F-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX512F-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX512F-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0			; AVX512F-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BW-LABEL: concat_trunc_packsswb_256:			; AVX512BW-LABEL: concat_trunc_packsswb_256:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsraw $15, %ymm0, %ymm0			; AVX512BW-NEXT: vpsraw $15, %ymm0, %ymm0
	; AVX512BW-NEXT: vpmovwb %ymm0, %xmm0			; AVX512BW-NEXT: vpmovwb %ymm0, %xmm0
	; AVX512BW-NEXT: vpmovwb %ymm1, %xmm1			; AVX512BW-NEXT: vpmovwb %ymm1, %xmm1
	; AVX512BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512BW-NEXT: vpunpckhqdq {{.*#+}} xmm2 = xmm0[1],xmm1[1]			; AVX512BW-NEXT: vpunpckhqdq {{.*#+}} xmm2 = xmm0[1],xmm1[1]
	; AVX512BW-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX512BW-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX512BW-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0			; AVX512BW-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%1 = ashr <16 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>			%1 = ashr <16 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
	%2 = and <16 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>			%2 = and <16 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
	%3 = trunc <16 x i16> %1 to <16 x i8>			%3 = trunc <16 x i16> %1 to <16 x i8>
	%4 = trunc <16 x i16> %2 to <16 x i8>			%4 = trunc <16 x i16> %2 to <16 x i8>
	(33 lines not shown)
	;			;
	; AVX512F-LABEL: concat_trunc_packuswb_256:			; AVX512F-LABEL: concat_trunc_packuswb_256:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vpsrlw $15, %ymm0, %ymm0			; AVX512F-NEXT: vpsrlw $15, %ymm0, %ymm0
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero
	; AVX512F-NEXT: vpmovdb %zmm0, %xmm0			; AVX512F-NEXT: vpmovdb %zmm0, %xmm0
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512F-NEXT: vpmovdb %zmm1, %xmm1			; AVX512F-NEXT: vpmovdb %zmm1, %xmm1
	; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512F-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512F-NEXT: vpunpckhqdq {{.*#+}} xmm2 = xmm0[1],xmm1[1]			; AVX512F-NEXT: vpunpckhqdq {{.*#+}} xmm2 = xmm0[1],xmm1[1]
	; AVX512F-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX512F-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX512F-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0			; AVX512F-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BW-LABEL: concat_trunc_packuswb_256:			; AVX512BW-LABEL: concat_trunc_packuswb_256:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsrlw $15, %ymm0, %ymm0			; AVX512BW-NEXT: vpsrlw $15, %ymm0, %ymm0
	; AVX512BW-NEXT: vpmovwb %ymm0, %xmm0			; AVX512BW-NEXT: vpmovwb %ymm0, %xmm0
	; AVX512BW-NEXT: vpmovwb %ymm1, %xmm1			; AVX512BW-NEXT: vpmovwb %ymm1, %xmm1
	; AVX512BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512BW-NEXT: vpunpckhqdq {{.*#+}} xmm2 = xmm0[1],xmm1[1]			; AVX512BW-NEXT: vpunpckhqdq {{.*#+}} xmm2 = xmm0[1],xmm1[1]
	; AVX512BW-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX512BW-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX512BW-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0			; AVX512BW-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%1 = lshr <16 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>			%1 = lshr <16 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
	%2 = and <16 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>			%2 = and <16 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
	%3 = trunc <16 x i16> %1 to <16 x i8>			%3 = trunc <16 x i16> %1 to <16 x i8>
	%4 = trunc <16 x i16> %2 to <16 x i8>			%4 = trunc <16 x i16> %2 to <16 x i8>
	%5 = shufflevector <16 x i8> %3, <16 x i8> %4, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>			%5 = shufflevector <16 x i8> %3, <16 x i8> %4, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
	ret <32 x i8> %5			ret <32 x i8> %5
	}			}

llvm/test/CodeGen/X86/vector-pack-512.ll

	(45 lines not shown)

	define <64 x i8> @trunc_concat_packsswb_512(<32 x i16> %a0, <32 x i16> %a1) nounwind {			define <64 x i8> @trunc_concat_packsswb_512(<32 x i16> %a0, <32 x i16> %a1) nounwind {
	; AVX512F-LABEL: trunc_concat_packsswb_512:			; AVX512F-LABEL: trunc_concat_packsswb_512:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vpsraw $15, %ymm0, %ymm2			; AVX512F-NEXT: vpsraw $15, %ymm0, %ymm2
	; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm0			; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm0
	; AVX512F-NEXT: vpsraw $15, %ymm0, %ymm0			; AVX512F-NEXT: vpsraw $15, %ymm0, %ymm0
	; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm2, %zmm0			; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm2, %zmm0
	; AVX512F-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512F-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512F-NEXT: vmovdqa64 {{.*#+}} zmm2 = [0,1,8,9,2,3,10,11]			; AVX512F-NEXT: vmovdqa64 {{.*#+}} zmm2 = [0,1,8,9,2,3,10,11]
	; AVX512F-NEXT: vpermi2q %zmm1, %zmm0, %zmm2			; AVX512F-NEXT: vpermi2q %zmm1, %zmm0, %zmm2
	; AVX512F-NEXT: vmovdqa64 {{.*#+}} zmm3 = [4,5,12,13,6,7,14,15]			; AVX512F-NEXT: vmovdqa64 {{.*#+}} zmm3 = [4,5,12,13,6,7,14,15]
	; AVX512F-NEXT: vpermi2q %zmm1, %zmm0, %zmm3			; AVX512F-NEXT: vpermi2q %zmm1, %zmm0, %zmm3
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm3[0],zero,ymm3[1],zero,ymm3[2],zero,ymm3[3],zero,ymm3[4],zero,ymm3[5],zero,ymm3[6],zero,ymm3[7],zero,ymm3[8],zero,ymm3[9],zero,ymm3[10],zero,ymm3[11],zero,ymm3[12],zero,ymm3[13],zero,ymm3[14],zero,ymm3[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm3[0],zero,ymm3[1],zero,ymm3[2],zero,ymm3[3],zero,ymm3[4],zero,ymm3[5],zero,ymm3[6],zero,ymm3[7],zero,ymm3[8],zero,ymm3[9],zero,ymm3[10],zero,ymm3[11],zero,ymm3[12],zero,ymm3[13],zero,ymm3[14],zero,ymm3[15],zero
	; AVX512F-NEXT: vpmovdb %zmm0, %xmm0			; AVX512F-NEXT: vpmovdb %zmm0, %xmm0
	; AVX512F-NEXT: vextracti64x4 $1, %zmm3, %ymm1			; AVX512F-NEXT: vextracti64x4 $1, %zmm3, %ymm1
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512F-NEXT: vpmovdb %zmm1, %xmm1			; AVX512F-NEXT: vpmovdb %zmm1, %xmm1
	; AVX512F-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0			; AVX512F-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero
	; AVX512F-NEXT: vpmovdb %zmm1, %xmm1			; AVX512F-NEXT: vpmovdb %zmm1, %xmm1
	; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm2			; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm2
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm2 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm2 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero
	; AVX512F-NEXT: vpmovdb %zmm2, %xmm2			; AVX512F-NEXT: vpmovdb %zmm2, %xmm2
	; AVX512F-NEXT: vinserti128 $1, %xmm2, %ymm1, %ymm1			; AVX512F-NEXT: vinserti128 $1, %xmm2, %ymm1, %ymm1
	; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0			; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BW-LABEL: trunc_concat_packsswb_512:			; AVX512BW-LABEL: trunc_concat_packsswb_512:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsraw $15, %zmm0, %zmm0			; AVX512BW-NEXT: vpsraw $15, %zmm0, %zmm0
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm2 = [4,5,12,13,6,7,14,15]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm2 = [4,5,12,13,6,7,14,15]
	; AVX512BW-NEXT: vpermi2q %zmm1, %zmm0, %zmm2			; AVX512BW-NEXT: vpermi2q %zmm1, %zmm0, %zmm2
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [0,1,8,9,2,3,10,11]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [0,1,8,9,2,3,10,11]
	; AVX512BW-NEXT: vpermi2q %zmm1, %zmm0, %zmm3			; AVX512BW-NEXT: vpermi2q %zmm1, %zmm0, %zmm3
	; AVX512BW-NEXT: vpmovwb %zmm3, %ymm0			; AVX512BW-NEXT: vpmovwb %zmm3, %ymm0
	; AVX512BW-NEXT: vpmovwb %zmm2, %ymm1			; AVX512BW-NEXT: vpmovwb %zmm2, %ymm1
	; AVX512BW-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0			; AVX512BW-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%1 = ashr <32 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>			%1 = ashr <32 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
	%2 = and <32 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>			%2 = and <32 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
	%3 = shufflevector <32 x i16> %1, <32 x i16> %2, <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>			%3 = shufflevector <32 x i16> %1, <32 x i16> %2, <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
	%4 = trunc <64 x i16> %3 to <64 x i8>			%4 = trunc <64 x i16> %3 to <64 x i8>
	ret <64 x i8> %4			ret <64 x i8> %4
	}			}

	define <64 x i8> @trunc_concat_packuswb_512(<32 x i16> %a0, <32 x i16> %a1) nounwind {			define <64 x i8> @trunc_concat_packuswb_512(<32 x i16> %a0, <32 x i16> %a1) nounwind {
	; AVX512F-LABEL: trunc_concat_packuswb_512:			; AVX512F-LABEL: trunc_concat_packuswb_512:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vpsrlw $15, %ymm0, %ymm2			; AVX512F-NEXT: vpsrlw $15, %ymm0, %ymm2
	; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm0			; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm0
	; AVX512F-NEXT: vpsrlw $15, %ymm0, %ymm0			; AVX512F-NEXT: vpsrlw $15, %ymm0, %ymm0
	; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm2, %zmm0			; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm2, %zmm0
	; AVX512F-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512F-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512F-NEXT: vmovdqa64 {{.*#+}} zmm2 = [0,1,8,9,2,3,10,11]			; AVX512F-NEXT: vmovdqa64 {{.*#+}} zmm2 = [0,1,8,9,2,3,10,11]
	; AVX512F-NEXT: vpermi2q %zmm1, %zmm0, %zmm2			; AVX512F-NEXT: vpermi2q %zmm1, %zmm0, %zmm2
	; AVX512F-NEXT: vmovdqa64 {{.*#+}} zmm3 = [4,5,12,13,6,7,14,15]			; AVX512F-NEXT: vmovdqa64 {{.*#+}} zmm3 = [4,5,12,13,6,7,14,15]
	; AVX512F-NEXT: vpermi2q %zmm1, %zmm0, %zmm3			; AVX512F-NEXT: vpermi2q %zmm1, %zmm0, %zmm3
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm3[0],zero,ymm3[1],zero,ymm3[2],zero,ymm3[3],zero,ymm3[4],zero,ymm3[5],zero,ymm3[6],zero,ymm3[7],zero,ymm3[8],zero,ymm3[9],zero,ymm3[10],zero,ymm3[11],zero,ymm3[12],zero,ymm3[13],zero,ymm3[14],zero,ymm3[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm0 = ymm3[0],zero,ymm3[1],zero,ymm3[2],zero,ymm3[3],zero,ymm3[4],zero,ymm3[5],zero,ymm3[6],zero,ymm3[7],zero,ymm3[8],zero,ymm3[9],zero,ymm3[10],zero,ymm3[11],zero,ymm3[12],zero,ymm3[13],zero,ymm3[14],zero,ymm3[15],zero
	; AVX512F-NEXT: vpmovdb %zmm0, %xmm0			; AVX512F-NEXT: vpmovdb %zmm0, %xmm0
	; AVX512F-NEXT: vextracti64x4 $1, %zmm3, %ymm1			; AVX512F-NEXT: vextracti64x4 $1, %zmm3, %ymm1
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512F-NEXT: vpmovdb %zmm1, %xmm1			; AVX512F-NEXT: vpmovdb %zmm1, %xmm1
	; AVX512F-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0			; AVX512F-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero
	; AVX512F-NEXT: vpmovdb %zmm1, %xmm1			; AVX512F-NEXT: vpmovdb %zmm1, %xmm1
	; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm2			; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm2
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm2 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm2 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero
	; AVX512F-NEXT: vpmovdb %zmm2, %xmm2			; AVX512F-NEXT: vpmovdb %zmm2, %xmm2
	; AVX512F-NEXT: vinserti128 $1, %xmm2, %ymm1, %ymm1			; AVX512F-NEXT: vinserti128 $1, %xmm2, %ymm1, %ymm1
	; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0			; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BW-LABEL: trunc_concat_packuswb_512:			; AVX512BW-LABEL: trunc_concat_packuswb_512:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsrlw $15, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlw $15, %zmm0, %zmm0
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm2 = [4,5,12,13,6,7,14,15]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm2 = [4,5,12,13,6,7,14,15]
	; AVX512BW-NEXT: vpermi2q %zmm1, %zmm0, %zmm2			; AVX512BW-NEXT: vpermi2q %zmm1, %zmm0, %zmm2
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [0,1,8,9,2,3,10,11]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [0,1,8,9,2,3,10,11]
	; AVX512BW-NEXT: vpermi2q %zmm1, %zmm0, %zmm3			; AVX512BW-NEXT: vpermi2q %zmm1, %zmm0, %zmm3
	; AVX512BW-NEXT: vpmovwb %zmm3, %ymm0			; AVX512BW-NEXT: vpmovwb %zmm3, %ymm0
	; AVX512BW-NEXT: vpmovwb %zmm2, %ymm1			; AVX512BW-NEXT: vpmovwb %zmm2, %ymm1
	; AVX512BW-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0			; AVX512BW-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	(58 lines not shown)
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm2 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm2 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero
	; AVX512F-NEXT: vpmovdb %zmm2, %xmm2			; AVX512F-NEXT: vpmovdb %zmm2, %xmm2
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm3 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm3 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512F-NEXT: vpmovdb %zmm3, %xmm3			; AVX512F-NEXT: vpmovdb %zmm3, %xmm3
	; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm1			; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm1
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512F-NEXT: vpmovdb %zmm1, %xmm1			; AVX512F-NEXT: vpmovdb %zmm1, %xmm1
	; AVX512F-NEXT: vinserti128 $1, %xmm1, %ymm3, %ymm1			; AVX512F-NEXT: vinserti128 $1, %xmm1, %ymm3, %ymm1
	; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512F-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm2, %zmm2			; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm2, %zmm2
	; AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm1, %zmm1			; AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm1, %zmm1
	; AVX512F-NEXT: vmovdqa64 {{.*#+}} zmm0 = [0,8,1,9,6,14,7,15]			; AVX512F-NEXT: vmovdqa64 {{.*#+}} zmm0 = [0,8,1,9,6,14,7,15]
	; AVX512F-NEXT: vpermi2q %zmm1, %zmm2, %zmm0			; AVX512F-NEXT: vpermi2q %zmm1, %zmm2, %zmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BW-LABEL: concat_trunc_packsswb_512:			; AVX512BW-LABEL: concat_trunc_packsswb_512:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsraw $15, %zmm0, %zmm0			; AVX512BW-NEXT: vpsraw $15, %zmm0, %zmm0
	; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0			; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0
	; AVX512BW-NEXT: vpmovwb %zmm1, %ymm1			; AVX512BW-NEXT: vpmovwb %zmm1, %ymm1
	; AVX512BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512BW-NEXT: vinserti64x4 $1, %ymm1, %zmm1, %zmm1			; AVX512BW-NEXT: vinserti64x4 $1, %ymm1, %zmm1, %zmm1
	; AVX512BW-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm2			; AVX512BW-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm2
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm0 = [0,8,1,9,6,14,7,15]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm0 = [0,8,1,9,6,14,7,15]
	; AVX512BW-NEXT: vpermi2q %zmm1, %zmm2, %zmm0			; AVX512BW-NEXT: vpermi2q %zmm1, %zmm2, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%1 = ashr <32 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>			%1 = ashr <32 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
	%2 = and <32 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>			%2 = and <32 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
	%3 = trunc <32 x i16> %1 to <32 x i8>			%3 = trunc <32 x i16> %1 to <32 x i8>
	(14 lines not shown)
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm2 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm2 = ymm2[0],zero,ymm2[1],zero,ymm2[2],zero,ymm2[3],zero,ymm2[4],zero,ymm2[5],zero,ymm2[6],zero,ymm2[7],zero,ymm2[8],zero,ymm2[9],zero,ymm2[10],zero,ymm2[11],zero,ymm2[12],zero,ymm2[13],zero,ymm2[14],zero,ymm2[15],zero
	; AVX512F-NEXT: vpmovdb %zmm2, %xmm2			; AVX512F-NEXT: vpmovdb %zmm2, %xmm2
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm3 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm3 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512F-NEXT: vpmovdb %zmm3, %xmm3			; AVX512F-NEXT: vpmovdb %zmm3, %xmm3
	; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm1			; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm1
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero
	; AVX512F-NEXT: vpmovdb %zmm1, %xmm1			; AVX512F-NEXT: vpmovdb %zmm1, %xmm1
	; AVX512F-NEXT: vinserti128 $1, %xmm1, %ymm3, %ymm1			; AVX512F-NEXT: vinserti128 $1, %xmm1, %ymm3, %ymm1
	; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512F-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm2, %zmm2			; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm2, %zmm2
	; AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm1, %zmm1			; AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm1, %zmm1
	; AVX512F-NEXT: vmovdqa64 {{.*#+}} zmm0 = [0,8,1,9,6,14,7,15]			; AVX512F-NEXT: vmovdqa64 {{.*#+}} zmm0 = [0,8,1,9,6,14,7,15]
	; AVX512F-NEXT: vpermi2q %zmm1, %zmm2, %zmm0			; AVX512F-NEXT: vpermi2q %zmm1, %zmm2, %zmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BW-LABEL: concat_trunc_packuswb_512:			; AVX512BW-LABEL: concat_trunc_packuswb_512:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsrlw $15, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlw $15, %zmm0, %zmm0
	; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0			; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0
	; AVX512BW-NEXT: vpmovwb %zmm1, %ymm1			; AVX512BW-NEXT: vpmovwb %zmm1, %ymm1
	; AVX512BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512BW-NEXT: vinserti64x4 $1, %ymm1, %zmm1, %zmm1			; AVX512BW-NEXT: vinserti64x4 $1, %ymm1, %zmm1, %zmm1
	; AVX512BW-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm2			; AVX512BW-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm2
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm0 = [0,8,1,9,6,14,7,15]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm0 = [0,8,1,9,6,14,7,15]
	; AVX512BW-NEXT: vpermi2q %zmm1, %zmm2, %zmm0			; AVX512BW-NEXT: vpermi2q %zmm1, %zmm2, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%1 = lshr <32 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>			%1 = lshr <32 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
	%2 = and <32 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>			%2 = and <32 x i16> %a1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
	%3 = trunc <32 x i16> %1 to <32 x i8>			%3 = trunc <32 x i16> %1 to <32 x i8>
	%4 = trunc <32 x i16> %2 to <32 x i8>			%4 = trunc <32 x i16> %2 to <32 x i8>
	%5 = shufflevector <32 x i8> %3, <32 x i8> %4, <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>			%5 = shufflevector <32 x i8> %3, <32 x i8> %4, <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
	ret <64 x i8> %5			ret <64 x i8> %5
	}			}

llvm/test/CodeGen/X86/vector-pcmp.ll

	(249 lines not shown)

	define <16 x i8> @cmpeq_zext_v16i8(<16 x i8> %a, <16 x i8> %b) {			define <16 x i8> @cmpeq_zext_v16i8(<16 x i8> %a, <16 x i8> %b) {
	; SSE-LABEL: cmpeq_zext_v16i8:			; SSE-LABEL: cmpeq_zext_v16i8:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: pcmpeqb %xmm1, %xmm0			; SSE-NEXT: pcmpeqb %xmm1, %xmm0
	; SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0			; SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: cmpeq_zext_v16i8:			; AVX1-LABEL: cmpeq_zext_v16i8:
	; AVX: # %bb.0:			; AVX1: # %bb.0:
	; AVX-NEXT: vpcmpeqb %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpcmpeqb %xmm1, %xmm0, %xmm0
	; AVX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX1-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: cmpeq_zext_v16i8:
				; AVX2: # %bb.0:
				; AVX2-NEXT: vpcmpeqb %xmm1, %xmm0, %xmm0
				; AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX2-NEXT: retq
				;
				; AVX512-LABEL: cmpeq_zext_v16i8:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vpcmpeqb %xmm1, %xmm0, %xmm0
				; AVX512-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512-NEXT: retq
	%cmp = icmp eq <16 x i8> %a, %b			%cmp = icmp eq <16 x i8> %a, %b
	%zext = zext <16 x i1> %cmp to <16 x i8>			%zext = zext <16 x i1> %cmp to <16 x i8>
	ret <16 x i8> %zext			ret <16 x i8> %zext
	}			}

	define <16 x i16> @cmpeq_zext_v16i16(<16 x i16> %a, <16 x i16> %b) {			define <16 x i16> @cmpeq_zext_v16i16(<16 x i16> %a, <16 x i16> %b) {
	; SSE-LABEL: cmpeq_zext_v16i16:			; SSE-LABEL: cmpeq_zext_v16i16:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	(118 lines not shown)
	; AVX2: # %bb.0:			; AVX2: # %bb.0:
	; AVX2-NEXT: vpcmpgtb %ymm1, %ymm0, %ymm0			; AVX2-NEXT: vpcmpgtb %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: cmpgt_zext_v32i8:			; AVX512-LABEL: cmpgt_zext_v32i8:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpcmpgtb %ymm1, %ymm0, %ymm0			; AVX512-NEXT: vpcmpgtb %ymm1, %ymm0, %ymm0
	; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%cmp = icmp sgt <32 x i8> %a, %b			%cmp = icmp sgt <32 x i8> %a, %b
	%zext = zext <32 x i1> %cmp to <32 x i8>			%zext = zext <32 x i1> %cmp to <32 x i8>
	ret <32 x i8> %zext			ret <32 x i8> %zext
	}			}

	define <8 x i16> @cmpgt_zext_v8i16(<8 x i16> %a, <8 x i16> %b) {			define <8 x i16> @cmpgt_zext_v8i16(<8 x i16> %a, <8 x i16> %b) {
	; SSE-LABEL: cmpgt_zext_v8i16:			; SSE-LABEL: cmpgt_zext_v8i16:
	(1,588 lines not shown)

llvm/test/CodeGen/X86/vector-reduce-add-mask.ll

	(22 lines not shown)
	; SSE41-LABEL: test_v2i64_v2i32:			; SSE41-LABEL: test_v2i64_v2i32:
	; SSE41: # %bb.0:			; SSE41: # %bb.0:
	; SSE41-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0			; SSE41-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; SSE41-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]			; SSE41-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
	; SSE41-NEXT: paddq %xmm0, %xmm1			; SSE41-NEXT: paddq %xmm0, %xmm1
	; SSE41-NEXT: movq %xmm1, %rax			; SSE41-NEXT: movq %xmm1, %rax
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: test_v2i64_v2i32:			; AVX1-LABEL: test_v2i64_v2i32:
	; AVX: # %bb.0:			; AVX1: # %bb.0:
	; AVX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX1-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]			; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
	; AVX-NEXT: vpaddq %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpaddq %xmm1, %xmm0, %xmm0
	; AVX-NEXT: vmovq %xmm0, %rax			; AVX1-NEXT: vmovq %xmm0, %rax
	; AVX-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-LABEL: test_v2i64_v2i32:
				; AVX2: # %bb.0:
				; AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX2-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
				; AVX2-NEXT: vpaddq %xmm1, %xmm0, %xmm0
				; AVX2-NEXT: vmovq %xmm0, %rax
				; AVX2-NEXT: retq
				;
				; AVX512BW-LABEL: test_v2i64_v2i32:
				; AVX512BW: # %bb.0:
				; AVX512BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX512BW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
				; AVX512BW-NEXT: vpaddq %xmm1, %xmm0, %xmm0
				; AVX512BW-NEXT: vmovq %xmm0, %rax
				; AVX512BW-NEXT: retq
				;
				; AVX512BWVL-LABEL: test_v2i64_v2i32:
				; AVX512BWVL: # %bb.0:
				; AVX512BWVL-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to2}, %xmm0, %xmm0
				; AVX512BWVL-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
				; AVX512BWVL-NEXT: vpaddq %xmm1, %xmm0, %xmm0
				; AVX512BWVL-NEXT: vmovq %xmm0, %rax
				; AVX512BWVL-NEXT: retq
	%1 = and <2 x i64> %a0, <i64 255, i64 255>			%1 = and <2 x i64> %a0, <i64 255, i64 255>
	%2 = call i64 @llvm.vector.reduce.add.v2i64(<2 x i64> %1)			%2 = call i64 @llvm.vector.reduce.add.v2i64(<2 x i64> %1)
	ret i64 %2			ret i64 %2
	}			}

	define i64 @test_v4i64_v4i16(<4 x i64> %a0) {			define i64 @test_v4i64_v4i16(<4 x i64> %a0) {
	; SSE2-LABEL: test_v4i64_v4i16:			; SSE2-LABEL: test_v4i64_v4i16:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	(220 lines not shown)
	; AVX2-NEXT: vextracti128 $1, %ymm0, %xmm1			; AVX2-NEXT: vextracti128 $1, %ymm0, %xmm1
	; AVX2-NEXT: vpaddq %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpaddq %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]			; AVX2-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
	; AVX2-NEXT: vpaddq %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpaddq %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: vmovq %xmm0, %rax			; AVX2-NEXT: vmovq %xmm0, %rax
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test_v16i64_v16i8:			; AVX512BW-LABEL: test_v16i64_v16i8:
	; AVX512: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512-NEXT: vpmovqb %zmm1, %xmm1			; AVX512BW-NEXT: vpmovqb %zmm1, %xmm1
	; AVX512-NEXT: vpmovqb %zmm0, %xmm0			; AVX512BW-NEXT: vpmovqb %zmm0, %xmm0
	; AVX512-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX512BW-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: vpxor %xmm1, %xmm1, %xmm1			; AVX512BW-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; AVX512-NEXT: vpsadbw %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpsadbw %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]			; AVX512BW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
	; AVX512-NEXT: vpaddq %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpaddq %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vmovq %xmm0, %rax			; AVX512BW-NEXT: vmovq %xmm0, %rax
	; AVX512-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512BW-NEXT: retq
				;
				; AVX512BWVL-LABEL: test_v16i64_v16i8:
				; AVX512BWVL: # %bb.0:
				; AVX512BWVL-NEXT: vpmovqb %zmm1, %xmm1
				; AVX512BWVL-NEXT: vpmovqb %zmm0, %xmm0
				; AVX512BWVL-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
				; AVX512BWVL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512BWVL-NEXT: vpxor %xmm1, %xmm1, %xmm1
				; AVX512BWVL-NEXT: vpsadbw %xmm1, %xmm0, %xmm0
				; AVX512BWVL-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
				; AVX512BWVL-NEXT: vpaddq %xmm1, %xmm0, %xmm0
				; AVX512BWVL-NEXT: vmovq %xmm0, %rax
				; AVX512BWVL-NEXT: vzeroupper
				; AVX512BWVL-NEXT: retq
	%1 = and <16 x i64> %a0, <i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1>			%1 = and <16 x i64> %a0, <i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1>
	%2 = call i64 @llvm.vector.reduce.add.v16i64(<16 x i64> %1)			%2 = call i64 @llvm.vector.reduce.add.v16i64(<16 x i64> %1)
	ret i64 %2			ret i64 %2
	}			}

	;			;
	; vXi32			; vXi32
	;			;
	[715 lines not shown]
	; AVX512BW-NEXT: vmovd %xmm0, %eax			; AVX512BW-NEXT: vmovd %xmm0, %eax
	; AVX512BW-NEXT: # kill: def $ax killed $ax killed $eax			; AVX512BW-NEXT: # kill: def $ax killed $ax killed $eax
	; AVX512BW-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512BWVL-LABEL: test_v16i16_v16i8:			; AVX512BWVL-LABEL: test_v16i16_v16i8:
	; AVX512BWVL: # %bb.0:			; AVX512BWVL: # %bb.0:
	; AVX512BWVL-NEXT: vpmovwb %ymm0, %xmm0			; AVX512BWVL-NEXT: vpmovwb %ymm0, %xmm0
	; AVX512BWVL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512BWVL-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to2}, %xmm0, %xmm0
	; AVX512BWVL-NEXT: vpxor %xmm1, %xmm1, %xmm1			; AVX512BWVL-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; AVX512BWVL-NEXT: vpsadbw %xmm1, %xmm0, %xmm0			; AVX512BWVL-NEXT: vpsadbw %xmm1, %xmm0, %xmm0
	; AVX512BWVL-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]			; AVX512BWVL-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
	; AVX512BWVL-NEXT: vpaddq %xmm1, %xmm0, %xmm0			; AVX512BWVL-NEXT: vpaddq %xmm1, %xmm0, %xmm0
	; AVX512BWVL-NEXT: vmovd %xmm0, %eax			; AVX512BWVL-NEXT: vmovd %xmm0, %eax
	; AVX512BWVL-NEXT: # kill: def $ax killed $ax killed $eax			; AVX512BWVL-NEXT: # kill: def $ax killed $ax killed $eax
	; AVX512BWVL-NEXT: vzeroupper			; AVX512BWVL-NEXT: vzeroupper
	; AVX512BWVL-NEXT: retq			; AVX512BWVL-NEXT: retq
	[209 lines not shown]
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test_v64i16_v64i8:			; AVX512-LABEL: test_v64i16_v64i8:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpmovwb %zmm0, %ymm0			; AVX512-NEXT: vpmovwb %zmm0, %ymm0
	; AVX512-NEXT: vpmovwb %zmm1, %ymm1			; AVX512-NEXT: vpmovwb %zmm1, %ymm1
	; AVX512-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0			; AVX512-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
	; AVX512-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512-NEXT: vpxor %xmm1, %xmm1, %xmm1			; AVX512-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; AVX512-NEXT: vpsadbw %zmm1, %zmm0, %zmm0			; AVX512-NEXT: vpsadbw %zmm1, %zmm0, %zmm0
	; AVX512-NEXT: vextracti64x4 $1, %zmm0, %ymm1			; AVX512-NEXT: vextracti64x4 $1, %zmm0, %ymm1
	; AVX512-NEXT: vpaddq %ymm1, %ymm0, %ymm0			; AVX512-NEXT: vpaddq %ymm1, %ymm0, %ymm0
	; AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1
	; AVX512-NEXT: vpaddq %xmm1, %xmm0, %xmm0			; AVX512-NEXT: vpaddq %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]			; AVX512-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
	; AVX512-NEXT: vpaddq %xmm1, %xmm0, %xmm0			; AVX512-NEXT: vpaddq %xmm1, %xmm0, %xmm0
	[34 lines not shown]

llvm/test/CodeGen/X86/vector-reduce-or-bool.ll

	[430 lines not shown]
	; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0			; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: vptest {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0			; AVX2-NEXT: vptest {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0
	; AVX2-NEXT: setne %al			; AVX2-NEXT: setne %al
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: trunc_v32i16_v32i1:			; AVX512-LABEL: trunc_v32i16_v32i1:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vptestmd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %k0			; AVX512-NEXT: vptestmd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %k0
	; AVX512-NEXT: kortestw %k0, %k0			; AVX512-NEXT: kortestw %k0, %k0
	; AVX512-NEXT: setne %al			; AVX512-NEXT: setne %al
	; AVX512-NEXT: vzeroupper			; AVX512-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%a = trunc <32 x i16> %0 to <32 x i1>			%a = trunc <32 x i16> %0 to <32 x i1>
	%b = call i1 @llvm.vector.reduce.or.v32i1(<32 x i1> %a)			%b = call i1 @llvm.vector.reduce.or.v32i1(<32 x i1> %a)
	ret i1 %b			ret i1 %b
	}			}
	[32 lines not shown]
	; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0			; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: vptest {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0			; AVX2-NEXT: vptest {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0
	; AVX2-NEXT: setne %al			; AVX2-NEXT: setne %al
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: trunc_v64i8_v64i1:			; AVX512-LABEL: trunc_v64i8_v64i1:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vptestmd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %k0			; AVX512-NEXT: vptestmd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %k0
	; AVX512-NEXT: kortestw %k0, %k0			; AVX512-NEXT: kortestw %k0, %k0
	; AVX512-NEXT: setne %al			; AVX512-NEXT: setne %al
	; AVX512-NEXT: vzeroupper			; AVX512-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%a = trunc <64 x i8> %0 to <64 x i1>			%a = trunc <64 x i8> %0 to <64 x i1>
	%b = call i1 @llvm.vector.reduce.or.v64i1(<64 x i1> %a)			%b = call i1 @llvm.vector.reduce.or.v64i1(<64 x i1> %a)
	ret i1 %b			ret i1 %b
	}			}
	[1,611 lines not shown]

llvm/test/CodeGen/X86/vector-reduce-or-cmp.ll

	[981 lines not shown]
	; AVX2-NEXT: vptest %ymm1, %ymm0			; AVX2-NEXT: vptest %ymm1, %ymm0
	; AVX2-NEXT: sete %al			; AVX2-NEXT: sete %al
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: mask_v128i8:			; AVX512-LABEL: mask_v128i8:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vporq %zmm1, %zmm0, %zmm0			; AVX512-NEXT: vporq %zmm1, %zmm0, %zmm0
	; AVX512-NEXT: vptestmd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %k0			; AVX512-NEXT: vptestmd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %k0
	; AVX512-NEXT: kortestw %k0, %k0			; AVX512-NEXT: kortestw %k0, %k0
	; AVX512-NEXT: sete %al			; AVX512-NEXT: sete %al
	; AVX512-NEXT: vzeroupper			; AVX512-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%1 = call i8 @llvm.vector.reduce.or.v128i8(<128 x i8> %a0)			%1 = call i8 @llvm.vector.reduce.or.v128i8(<128 x i8> %a0)
	%2 = and i8 %1, 1			%2 = and i8 %1, 1
	%3 = icmp eq i8 %2, 0			%3 = icmp eq i8 %2, 0
	ret i1 %3			ret i1 %3
	[205 lines not shown]

llvm/test/CodeGen/X86/vector-reduce-smax.ll

	[1,202 lines not shown]
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX-NEXT: vphminposuw %xmm0, %xmm0			; AVX-NEXT: vphminposuw %xmm0, %xmm0
	; AVX-NEXT: vmovd %xmm0, %eax			; AVX-NEXT: vmovd %xmm0, %eax
	; AVX-NEXT: xorl $32767, %eax # imm = 0x7FFF			; AVX-NEXT: xorl $32767, %eax # imm = 0x7FFF
	; AVX-NEXT: # kill: def $ax killed $ax killed $eax			; AVX-NEXT: # kill: def $ax killed $ax killed $eax
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; AVX512-LABEL: test_v8i16:			; AVX512BW-LABEL: test_v8i16:
	; AVX512: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512BW-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: vphminposuw %xmm0, %xmm0			; AVX512BW-NEXT: vphminposuw %xmm0, %xmm0
	; AVX512-NEXT: vmovd %xmm0, %eax			; AVX512BW-NEXT: vmovd %xmm0, %eax
	; AVX512-NEXT: xorl $32767, %eax # imm = 0x7FFF			; AVX512BW-NEXT: xorl $32767, %eax # imm = 0x7FFF
	; AVX512-NEXT: # kill: def $ax killed $ax killed $eax			; AVX512BW-NEXT: # kill: def $ax killed $ax killed $eax
	; AVX512-NEXT: retq			; AVX512BW-NEXT: retq
				;
				; AVX512VL-LABEL: test_v8i16:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512VL-NEXT: vphminposuw %xmm0, %xmm0
				; AVX512VL-NEXT: vmovd %xmm0, %eax
				; AVX512VL-NEXT: xorl $32767, %eax # imm = 0x7FFF
				; AVX512VL-NEXT: # kill: def $ax killed $ax killed $eax
				; AVX512VL-NEXT: retq
	%1 = call i16 @llvm.vector.reduce.smax.v8i16(<8 x i16> %a0)			%1 = call i16 @llvm.vector.reduce.smax.v8i16(<8 x i16> %a0)
	ret i16 %1			ret i16 %1
	}			}

	define i16 @test_v16i16(<16 x i16> %a0) {			define i16 @test_v16i16(<16 x i16> %a0) {
	; SSE2-LABEL: test_v16i16:			; SSE2-LABEL: test_v16i16:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: pmaxsw %xmm1, %xmm0			; SSE2-NEXT: pmaxsw %xmm1, %xmm0
	[37 lines not shown]
	; AVX2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX2-NEXT: vphminposuw %xmm0, %xmm0			; AVX2-NEXT: vphminposuw %xmm0, %xmm0
	; AVX2-NEXT: vmovd %xmm0, %eax			; AVX2-NEXT: vmovd %xmm0, %eax
	; AVX2-NEXT: xorl $32767, %eax # imm = 0x7FFF			; AVX2-NEXT: xorl $32767, %eax # imm = 0x7FFF
	; AVX2-NEXT: # kill: def $ax killed $ax killed $eax			; AVX2-NEXT: # kill: def $ax killed $ax killed $eax
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test_v16i16:			; AVX512BW-LABEL: test_v16i16:
	; AVX512: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; AVX512BW-NEXT: vextracti128 $1, %ymm0, %xmm1
	; AVX512-NEXT: vpmaxsw %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpmaxsw %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512BW-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: vphminposuw %xmm0, %xmm0			; AVX512BW-NEXT: vphminposuw %xmm0, %xmm0
	; AVX512-NEXT: vmovd %xmm0, %eax			; AVX512BW-NEXT: vmovd %xmm0, %eax
	; AVX512-NEXT: xorl $32767, %eax # imm = 0x7FFF			; AVX512BW-NEXT: xorl $32767, %eax # imm = 0x7FFF
	; AVX512-NEXT: # kill: def $ax killed $ax killed $eax			; AVX512BW-NEXT: # kill: def $ax killed $ax killed $eax
	; AVX512-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512BW-NEXT: retq
				;
				; AVX512VL-LABEL: test_v16i16:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vextracti128 $1, %ymm0, %xmm1
				; AVX512VL-NEXT: vpmaxsw %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512VL-NEXT: vphminposuw %xmm0, %xmm0
				; AVX512VL-NEXT: vmovd %xmm0, %eax
				; AVX512VL-NEXT: xorl $32767, %eax # imm = 0x7FFF
				; AVX512VL-NEXT: # kill: def $ax killed $ax killed $eax
				; AVX512VL-NEXT: vzeroupper
				; AVX512VL-NEXT: retq
	%1 = call i16 @llvm.vector.reduce.smax.v16i16(<16 x i16> %a0)			%1 = call i16 @llvm.vector.reduce.smax.v16i16(<16 x i16> %a0)
	ret i16 %1			ret i16 %1
	}			}

	define i16 @test_v32i16(<32 x i16> %a0) {			define i16 @test_v32i16(<32 x i16> %a0) {
	; SSE2-LABEL: test_v32i16:			; SSE2-LABEL: test_v32i16:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: pmaxsw %xmm3, %xmm1			; SSE2-NEXT: pmaxsw %xmm3, %xmm1
	[45 lines not shown]
	; AVX2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX2-NEXT: vphminposuw %xmm0, %xmm0			; AVX2-NEXT: vphminposuw %xmm0, %xmm0
	; AVX2-NEXT: vmovd %xmm0, %eax			; AVX2-NEXT: vmovd %xmm0, %eax
	; AVX2-NEXT: xorl $32767, %eax # imm = 0x7FFF			; AVX2-NEXT: xorl $32767, %eax # imm = 0x7FFF
	; AVX2-NEXT: # kill: def $ax killed $ax killed $eax			; AVX2-NEXT: # kill: def $ax killed $ax killed $eax
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test_v32i16:			; AVX512BW-LABEL: test_v32i16:
	; AVX512: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512-NEXT: vextracti64x4 $1, %zmm0, %ymm1			; AVX512BW-NEXT: vextracti64x4 $1, %zmm0, %ymm1
	; AVX512-NEXT: vpmaxsw %ymm1, %ymm0, %ymm0			; AVX512BW-NEXT: vpmaxsw %ymm1, %ymm0, %ymm0
	; AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; AVX512BW-NEXT: vextracti128 $1, %ymm0, %xmm1
	; AVX512-NEXT: vpmaxsw %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpmaxsw %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512BW-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: vphminposuw %xmm0, %xmm0			; AVX512BW-NEXT: vphminposuw %xmm0, %xmm0
	; AVX512-NEXT: vmovd %xmm0, %eax			; AVX512BW-NEXT: vmovd %xmm0, %eax
	; AVX512-NEXT: xorl $32767, %eax # imm = 0x7FFF			; AVX512BW-NEXT: xorl $32767, %eax # imm = 0x7FFF
	; AVX512-NEXT: # kill: def $ax killed $ax killed $eax			; AVX512BW-NEXT: # kill: def $ax killed $ax killed $eax
	; AVX512-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512BW-NEXT: retq
				;
				; AVX512VL-LABEL: test_v32i16:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm1
				; AVX512VL-NEXT: vpmaxsw %ymm1, %ymm0, %ymm0
				; AVX512VL-NEXT: vextracti128 $1, %ymm0, %xmm1
				; AVX512VL-NEXT: vpmaxsw %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512VL-NEXT: vphminposuw %xmm0, %xmm0
				; AVX512VL-NEXT: vmovd %xmm0, %eax
				; AVX512VL-NEXT: xorl $32767, %eax # imm = 0x7FFF
				; AVX512VL-NEXT: # kill: def $ax killed $ax killed $eax
				; AVX512VL-NEXT: vzeroupper
				; AVX512VL-NEXT: retq
	%1 = call i16 @llvm.vector.reduce.smax.v32i16(<32 x i16> %a0)			%1 = call i16 @llvm.vector.reduce.smax.v32i16(<32 x i16> %a0)
	ret i16 %1			ret i16 %1
	}			}

	define i16 @test_v64i16(<64 x i16> %a0) {			define i16 @test_v64i16(<64 x i16> %a0) {
	; SSE2-LABEL: test_v64i16:			; SSE2-LABEL: test_v64i16:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: pmaxsw %xmm6, %xmm2			; SSE2-NEXT: pmaxsw %xmm6, %xmm2
	[61 lines not shown]
	; AVX2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX2-NEXT: vphminposuw %xmm0, %xmm0			; AVX2-NEXT: vphminposuw %xmm0, %xmm0
	; AVX2-NEXT: vmovd %xmm0, %eax			; AVX2-NEXT: vmovd %xmm0, %eax
	; AVX2-NEXT: xorl $32767, %eax # imm = 0x7FFF			; AVX2-NEXT: xorl $32767, %eax # imm = 0x7FFF
	; AVX2-NEXT: # kill: def $ax killed $ax killed $eax			; AVX2-NEXT: # kill: def $ax killed $ax killed $eax
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test_v64i16:			; AVX512BW-LABEL: test_v64i16:
	; AVX512: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512-NEXT: vpmaxsw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpmaxsw %zmm1, %zmm0, %zmm0
	; AVX512-NEXT: vextracti64x4 $1, %zmm0, %ymm1			; AVX512BW-NEXT: vextracti64x4 $1, %zmm0, %ymm1
	; AVX512-NEXT: vpmaxsw %ymm1, %ymm0, %ymm0			; AVX512BW-NEXT: vpmaxsw %ymm1, %ymm0, %ymm0
	; AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; AVX512BW-NEXT: vextracti128 $1, %ymm0, %xmm1
	; AVX512-NEXT: vpmaxsw %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpmaxsw %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512BW-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: vphminposuw %xmm0, %xmm0			; AVX512BW-NEXT: vphminposuw %xmm0, %xmm0
	; AVX512-NEXT: vmovd %xmm0, %eax			; AVX512BW-NEXT: vmovd %xmm0, %eax
	; AVX512-NEXT: xorl $32767, %eax # imm = 0x7FFF			; AVX512BW-NEXT: xorl $32767, %eax # imm = 0x7FFF
	; AVX512-NEXT: # kill: def $ax killed $ax killed $eax			; AVX512BW-NEXT: # kill: def $ax killed $ax killed $eax
	; AVX512-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512BW-NEXT: retq
				;
				; AVX512VL-LABEL: test_v64i16:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vpmaxsw %zmm1, %zmm0, %zmm0
				; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm1
				; AVX512VL-NEXT: vpmaxsw %ymm1, %ymm0, %ymm0
				; AVX512VL-NEXT: vextracti128 $1, %ymm0, %xmm1
				; AVX512VL-NEXT: vpmaxsw %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512VL-NEXT: vphminposuw %xmm0, %xmm0
				; AVX512VL-NEXT: vmovd %xmm0, %eax
				; AVX512VL-NEXT: xorl $32767, %eax # imm = 0x7FFF
				; AVX512VL-NEXT: # kill: def $ax killed $ax killed $eax
				; AVX512VL-NEXT: vzeroupper
				; AVX512VL-NEXT: retq
	%1 = call i16 @llvm.vector.reduce.smax.v64i16(<64 x i16> %a0)			%1 = call i16 @llvm.vector.reduce.smax.v64i16(<64 x i16> %a0)
	ret i16 %1			ret i16 %1
	}			}

	;			;
	; vXi8			; vXi8
	;			;

	[214 lines not shown]
	; AVX-NEXT: vpsrlw $8, %xmm0, %xmm1			; AVX-NEXT: vpsrlw $8, %xmm0, %xmm1
	; AVX-NEXT: vpminub %xmm1, %xmm0, %xmm0			; AVX-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; AVX-NEXT: vphminposuw %xmm0, %xmm0			; AVX-NEXT: vphminposuw %xmm0, %xmm0
	; AVX-NEXT: vmovd %xmm0, %eax			; AVX-NEXT: vmovd %xmm0, %eax
	; AVX-NEXT: xorb $127, %al			; AVX-NEXT: xorb $127, %al
	; AVX-NEXT: # kill: def $al killed $al killed $eax			; AVX-NEXT: # kill: def $al killed $al killed $eax
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; AVX512-LABEL: test_v16i8:			; AVX512BW-LABEL: test_v16i8:
	; AVX512: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512BW-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1			; AVX512BW-NEXT: vpsrlw $8, %xmm0, %xmm1
	; AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vphminposuw %xmm0, %xmm0			; AVX512BW-NEXT: vphminposuw %xmm0, %xmm0
	; AVX512-NEXT: vmovd %xmm0, %eax			; AVX512BW-NEXT: vmovd %xmm0, %eax
	; AVX512-NEXT: xorb $127, %al			; AVX512BW-NEXT: xorb $127, %al
	; AVX512-NEXT: # kill: def $al killed $al killed $eax			; AVX512BW-NEXT: # kill: def $al killed $al killed $eax
	; AVX512-NEXT: retq			; AVX512BW-NEXT: retq
				;
				; AVX512VL-LABEL: test_v16i8:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512VL-NEXT: vpsrlw $8, %xmm0, %xmm1
				; AVX512VL-NEXT: vpminub %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vphminposuw %xmm0, %xmm0
				; AVX512VL-NEXT: vmovd %xmm0, %eax
				; AVX512VL-NEXT: xorb $127, %al
				; AVX512VL-NEXT: # kill: def $al killed $al killed $eax
				; AVX512VL-NEXT: retq
	%1 = call i8 @llvm.vector.reduce.smax.v16i8(<16 x i8> %a0)			%1 = call i8 @llvm.vector.reduce.smax.v16i8(<16 x i8> %a0)
	ret i8 %1			ret i8 %1
	}			}

	define i8 @test_v32i8(<32 x i8> %a0) {			define i8 @test_v32i8(<32 x i8> %a0) {
	; SSE2-LABEL: test_v32i8:			; SSE2-LABEL: test_v32i8:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: movdqa %xmm0, %xmm2			; SSE2-NEXT: movdqa %xmm0, %xmm2
	[67 lines not shown]
	; AVX2-NEXT: vpminub %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: vphminposuw %xmm0, %xmm0			; AVX2-NEXT: vphminposuw %xmm0, %xmm0
	; AVX2-NEXT: vmovd %xmm0, %eax			; AVX2-NEXT: vmovd %xmm0, %eax
	; AVX2-NEXT: xorb $127, %al			; AVX2-NEXT: xorb $127, %al
	; AVX2-NEXT: # kill: def $al killed $al killed $eax			; AVX2-NEXT: # kill: def $al killed $al killed $eax
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test_v32i8:			; AVX512BW-LABEL: test_v32i8:
	; AVX512: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; AVX512BW-NEXT: vextracti128 $1, %ymm0, %xmm1
	; AVX512-NEXT: vpmaxsb %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpmaxsb %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512BW-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1			; AVX512BW-NEXT: vpsrlw $8, %xmm0, %xmm1
	; AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vphminposuw %xmm0, %xmm0			; AVX512BW-NEXT: vphminposuw %xmm0, %xmm0
	; AVX512-NEXT: vmovd %xmm0, %eax			; AVX512BW-NEXT: vmovd %xmm0, %eax
	; AVX512-NEXT: xorb $127, %al			; AVX512BW-NEXT: xorb $127, %al
	; AVX512-NEXT: # kill: def $al killed $al killed $eax			; AVX512BW-NEXT: # kill: def $al killed $al killed $eax
	; AVX512-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512BW-NEXT: retq
				;
				; AVX512VL-LABEL: test_v32i8:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vextracti128 $1, %ymm0, %xmm1
				; AVX512VL-NEXT: vpmaxsb %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512VL-NEXT: vpsrlw $8, %xmm0, %xmm1
				; AVX512VL-NEXT: vpminub %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vphminposuw %xmm0, %xmm0
				; AVX512VL-NEXT: vmovd %xmm0, %eax
				; AVX512VL-NEXT: xorb $127, %al
				; AVX512VL-NEXT: # kill: def $al killed $al killed $eax
				; AVX512VL-NEXT: vzeroupper
				; AVX512VL-NEXT: retq
	%1 = call i8 @llvm.vector.reduce.smax.v32i8(<32 x i8> %a0)			%1 = call i8 @llvm.vector.reduce.smax.v32i8(<32 x i8> %a0)
	ret i8 %1			ret i8 %1
	}			}

	define i8 @test_v64i8(<64 x i8> %a0) {			define i8 @test_v64i8(<64 x i8> %a0) {
	; SSE2-LABEL: test_v64i8:			; SSE2-LABEL: test_v64i8:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: movdqa %xmm1, %xmm4			; SSE2-NEXT: movdqa %xmm1, %xmm4
	[83 lines not shown]
	; AVX2-NEXT: vpminub %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: vphminposuw %xmm0, %xmm0			; AVX2-NEXT: vphminposuw %xmm0, %xmm0
	; AVX2-NEXT: vmovd %xmm0, %eax			; AVX2-NEXT: vmovd %xmm0, %eax
	; AVX2-NEXT: xorb $127, %al			; AVX2-NEXT: xorb $127, %al
	; AVX2-NEXT: # kill: def $al killed $al killed $eax			; AVX2-NEXT: # kill: def $al killed $al killed $eax
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test_v64i8:			; AVX512BW-LABEL: test_v64i8:
	; AVX512: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512-NEXT: vextracti64x4 $1, %zmm0, %ymm1			; AVX512BW-NEXT: vextracti64x4 $1, %zmm0, %ymm1
	; AVX512-NEXT: vpmaxsb %ymm1, %ymm0, %ymm0			; AVX512BW-NEXT: vpmaxsb %ymm1, %ymm0, %ymm0
	; AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; AVX512BW-NEXT: vextracti128 $1, %ymm0, %xmm1
	; AVX512-NEXT: vpmaxsb %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpmaxsb %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512BW-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1			; AVX512BW-NEXT: vpsrlw $8, %xmm0, %xmm1
	; AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vphminposuw %xmm0, %xmm0			; AVX512BW-NEXT: vphminposuw %xmm0, %xmm0
	; AVX512-NEXT: vmovd %xmm0, %eax			; AVX512BW-NEXT: vmovd %xmm0, %eax
	; AVX512-NEXT: xorb $127, %al			; AVX512BW-NEXT: xorb $127, %al
	; AVX512-NEXT: # kill: def $al killed $al killed $eax			; AVX512BW-NEXT: # kill: def $al killed $al killed $eax
	; AVX512-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512BW-NEXT: retq
				;
				; AVX512VL-LABEL: test_v64i8:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm1
				; AVX512VL-NEXT: vpmaxsb %ymm1, %ymm0, %ymm0
				; AVX512VL-NEXT: vextracti128 $1, %ymm0, %xmm1
				; AVX512VL-NEXT: vpmaxsb %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512VL-NEXT: vpsrlw $8, %xmm0, %xmm1
				; AVX512VL-NEXT: vpminub %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vphminposuw %xmm0, %xmm0
				; AVX512VL-NEXT: vmovd %xmm0, %eax
				; AVX512VL-NEXT: xorb $127, %al
				; AVX512VL-NEXT: # kill: def $al killed $al killed $eax
				; AVX512VL-NEXT: vzeroupper
				; AVX512VL-NEXT: retq
	%1 = call i8 @llvm.vector.reduce.smax.v64i8(<64 x i8> %a0)			%1 = call i8 @llvm.vector.reduce.smax.v64i8(<64 x i8> %a0)
	ret i8 %1			ret i8 %1
	}			}

	define i8 @test_v128i8(<128 x i8> %a0) {			define i8 @test_v128i8(<128 x i8> %a0) {
	; SSE2-LABEL: test_v128i8:			; SSE2-LABEL: test_v128i8:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: movdqa %xmm2, %xmm8			; SSE2-NEXT: movdqa %xmm2, %xmm8
	[115 lines not shown]
	; AVX2-NEXT: vpminub %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: vphminposuw %xmm0, %xmm0			; AVX2-NEXT: vphminposuw %xmm0, %xmm0
	; AVX2-NEXT: vmovd %xmm0, %eax			; AVX2-NEXT: vmovd %xmm0, %eax
	; AVX2-NEXT: xorb $127, %al			; AVX2-NEXT: xorb $127, %al
	; AVX2-NEXT: # kill: def $al killed $al killed $eax			; AVX2-NEXT: # kill: def $al killed $al killed $eax
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test_v128i8:			; AVX512BW-LABEL: test_v128i8:
	; AVX512: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512-NEXT: vpmaxsb %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpmaxsb %zmm1, %zmm0, %zmm0
	; AVX512-NEXT: vextracti64x4 $1, %zmm0, %ymm1			; AVX512BW-NEXT: vextracti64x4 $1, %zmm0, %ymm1
	; AVX512-NEXT: vpmaxsb %ymm1, %ymm0, %ymm0			; AVX512BW-NEXT: vpmaxsb %ymm1, %ymm0, %ymm0
	; AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; AVX512BW-NEXT: vextracti128 $1, %ymm0, %xmm1
	; AVX512-NEXT: vpmaxsb %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpmaxsb %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512BW-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1			; AVX512BW-NEXT: vpsrlw $8, %xmm0, %xmm1
	; AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vphminposuw %xmm0, %xmm0			; AVX512BW-NEXT: vphminposuw %xmm0, %xmm0
	; AVX512-NEXT: vmovd %xmm0, %eax			; AVX512BW-NEXT: vmovd %xmm0, %eax
	; AVX512-NEXT: xorb $127, %al			; AVX512BW-NEXT: xorb $127, %al
	; AVX512-NEXT: # kill: def $al killed $al killed $eax			; AVX512BW-NEXT: # kill: def $al killed $al killed $eax
	; AVX512-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512BW-NEXT: retq
				;
				; AVX512VL-LABEL: test_v128i8:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vpmaxsb %zmm1, %zmm0, %zmm0
				; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm1
				; AVX512VL-NEXT: vpmaxsb %ymm1, %ymm0, %ymm0
				; AVX512VL-NEXT: vextracti128 $1, %ymm0, %xmm1
				; AVX512VL-NEXT: vpmaxsb %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512VL-NEXT: vpsrlw $8, %xmm0, %xmm1
				; AVX512VL-NEXT: vpminub %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vphminposuw %xmm0, %xmm0
				; AVX512VL-NEXT: vmovd %xmm0, %eax
				; AVX512VL-NEXT: xorb $127, %al
				; AVX512VL-NEXT: # kill: def $al killed $al killed $eax
				; AVX512VL-NEXT: vzeroupper
				; AVX512VL-NEXT: retq
	%1 = call i8 @llvm.vector.reduce.smax.v128i8(<128 x i8> %a0)			%1 = call i8 @llvm.vector.reduce.smax.v128i8(<128 x i8> %a0)
	ret i8 %1			ret i8 %1
	}			}

	declare i64 @llvm.vector.reduce.smax.v2i64(<2 x i64>)			declare i64 @llvm.vector.reduce.smax.v2i64(<2 x i64>)
	declare i64 @llvm.vector.reduce.smax.v4i64(<4 x i64>)			declare i64 @llvm.vector.reduce.smax.v4i64(<4 x i64>)
	declare i64 @llvm.vector.reduce.smax.v8i64(<8 x i64>)			declare i64 @llvm.vector.reduce.smax.v8i64(<8 x i64>)
	declare i64 @llvm.vector.reduce.smax.v16i64(<16 x i64>)			declare i64 @llvm.vector.reduce.smax.v16i64(<16 x i64>)
	[21 lines not shown]

llvm/test/CodeGen/X86/vector-reduce-smin.ll

	[1,202 lines not shown]
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX-NEXT: vphminposuw %xmm0, %xmm0			; AVX-NEXT: vphminposuw %xmm0, %xmm0
	; AVX-NEXT: vmovd %xmm0, %eax			; AVX-NEXT: vmovd %xmm0, %eax
	; AVX-NEXT: xorl $32768, %eax # imm = 0x8000			; AVX-NEXT: xorl $32768, %eax # imm = 0x8000
	; AVX-NEXT: # kill: def $ax killed $ax killed $eax			; AVX-NEXT: # kill: def $ax killed $ax killed $eax
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; AVX512-LABEL: test_v8i16:			; AVX512BW-LABEL: test_v8i16:
	; AVX512: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512BW-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: vphminposuw %xmm0, %xmm0			; AVX512BW-NEXT: vphminposuw %xmm0, %xmm0
	; AVX512-NEXT: vmovd %xmm0, %eax			; AVX512BW-NEXT: vmovd %xmm0, %eax
	; AVX512-NEXT: xorl $32768, %eax # imm = 0x8000			; AVX512BW-NEXT: xorl $32768, %eax # imm = 0x8000
	; AVX512-NEXT: # kill: def $ax killed $ax killed $eax			; AVX512BW-NEXT: # kill: def $ax killed $ax killed $eax
	; AVX512-NEXT: retq			; AVX512BW-NEXT: retq
				;
				; AVX512VL-LABEL: test_v8i16:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512VL-NEXT: vphminposuw %xmm0, %xmm0
				; AVX512VL-NEXT: vmovd %xmm0, %eax
				; AVX512VL-NEXT: xorl $32768, %eax # imm = 0x8000
				; AVX512VL-NEXT: # kill: def $ax killed $ax killed $eax
				; AVX512VL-NEXT: retq
	%1 = call i16 @llvm.vector.reduce.smin.v8i16(<8 x i16> %a0)			%1 = call i16 @llvm.vector.reduce.smin.v8i16(<8 x i16> %a0)
	ret i16 %1			ret i16 %1
	}			}

	define i16 @test_v16i16(<16 x i16> %a0) {			define i16 @test_v16i16(<16 x i16> %a0) {
	; SSE2-LABEL: test_v16i16:			; SSE2-LABEL: test_v16i16:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: pminsw %xmm1, %xmm0			; SSE2-NEXT: pminsw %xmm1, %xmm0
	[37 lines not shown]
	; AVX2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX2-NEXT: vphminposuw %xmm0, %xmm0			; AVX2-NEXT: vphminposuw %xmm0, %xmm0
	; AVX2-NEXT: vmovd %xmm0, %eax			; AVX2-NEXT: vmovd %xmm0, %eax
	; AVX2-NEXT: xorl $32768, %eax # imm = 0x8000			; AVX2-NEXT: xorl $32768, %eax # imm = 0x8000
	; AVX2-NEXT: # kill: def $ax killed $ax killed $eax			; AVX2-NEXT: # kill: def $ax killed $ax killed $eax
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test_v16i16:			; AVX512BW-LABEL: test_v16i16:
	; AVX512: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; AVX512BW-NEXT: vextracti128 $1, %ymm0, %xmm1
	; AVX512-NEXT: vpminsw %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpminsw %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512BW-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: vphminposuw %xmm0, %xmm0			; AVX512BW-NEXT: vphminposuw %xmm0, %xmm0
	; AVX512-NEXT: vmovd %xmm0, %eax			; AVX512BW-NEXT: vmovd %xmm0, %eax
	; AVX512-NEXT: xorl $32768, %eax # imm = 0x8000			; AVX512BW-NEXT: xorl $32768, %eax # imm = 0x8000
	; AVX512-NEXT: # kill: def $ax killed $ax killed $eax			; AVX512BW-NEXT: # kill: def $ax killed $ax killed $eax
	; AVX512-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512BW-NEXT: retq
				;
				; AVX512VL-LABEL: test_v16i16:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vextracti128 $1, %ymm0, %xmm1
				; AVX512VL-NEXT: vpminsw %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512VL-NEXT: vphminposuw %xmm0, %xmm0
				; AVX512VL-NEXT: vmovd %xmm0, %eax
				; AVX512VL-NEXT: xorl $32768, %eax # imm = 0x8000
				; AVX512VL-NEXT: # kill: def $ax killed $ax killed $eax
				; AVX512VL-NEXT: vzeroupper
				; AVX512VL-NEXT: retq
	%1 = call i16 @llvm.vector.reduce.smin.v16i16(<16 x i16> %a0)			%1 = call i16 @llvm.vector.reduce.smin.v16i16(<16 x i16> %a0)
	ret i16 %1			ret i16 %1
	}			}

	define i16 @test_v32i16(<32 x i16> %a0) {			define i16 @test_v32i16(<32 x i16> %a0) {
	; SSE2-LABEL: test_v32i16:			; SSE2-LABEL: test_v32i16:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: pminsw %xmm3, %xmm1			; SSE2-NEXT: pminsw %xmm3, %xmm1
	[45 lines not shown]
	; AVX2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX2-NEXT: vphminposuw %xmm0, %xmm0			; AVX2-NEXT: vphminposuw %xmm0, %xmm0
	; AVX2-NEXT: vmovd %xmm0, %eax			; AVX2-NEXT: vmovd %xmm0, %eax
	; AVX2-NEXT: xorl $32768, %eax # imm = 0x8000			; AVX2-NEXT: xorl $32768, %eax # imm = 0x8000
	; AVX2-NEXT: # kill: def $ax killed $ax killed $eax			; AVX2-NEXT: # kill: def $ax killed $ax killed $eax
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test_v32i16:			; AVX512BW-LABEL: test_v32i16:
	; AVX512: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512-NEXT: vextracti64x4 $1, %zmm0, %ymm1			; AVX512BW-NEXT: vextracti64x4 $1, %zmm0, %ymm1
	; AVX512-NEXT: vpminsw %ymm1, %ymm0, %ymm0			; AVX512BW-NEXT: vpminsw %ymm1, %ymm0, %ymm0
	; AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; AVX512BW-NEXT: vextracti128 $1, %ymm0, %xmm1
	; AVX512-NEXT: vpminsw %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpminsw %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512BW-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: vphminposuw %xmm0, %xmm0			; AVX512BW-NEXT: vphminposuw %xmm0, %xmm0
	; AVX512-NEXT: vmovd %xmm0, %eax			; AVX512BW-NEXT: vmovd %xmm0, %eax
	; AVX512-NEXT: xorl $32768, %eax # imm = 0x8000			; AVX512BW-NEXT: xorl $32768, %eax # imm = 0x8000
	; AVX512-NEXT: # kill: def $ax killed $ax killed $eax			; AVX512BW-NEXT: # kill: def $ax killed $ax killed $eax
	; AVX512-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512BW-NEXT: retq
				;
				; AVX512VL-LABEL: test_v32i16:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm1
				; AVX512VL-NEXT: vpminsw %ymm1, %ymm0, %ymm0
				; AVX512VL-NEXT: vextracti128 $1, %ymm0, %xmm1
				; AVX512VL-NEXT: vpminsw %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512VL-NEXT: vphminposuw %xmm0, %xmm0
				; AVX512VL-NEXT: vmovd %xmm0, %eax
				; AVX512VL-NEXT: xorl $32768, %eax # imm = 0x8000
				; AVX512VL-NEXT: # kill: def $ax killed $ax killed $eax
				; AVX512VL-NEXT: vzeroupper
				; AVX512VL-NEXT: retq
	%1 = call i16 @llvm.vector.reduce.smin.v32i16(<32 x i16> %a0)			%1 = call i16 @llvm.vector.reduce.smin.v32i16(<32 x i16> %a0)
	ret i16 %1			ret i16 %1
	}			}

	define i16 @test_v64i16(<64 x i16> %a0) {			define i16 @test_v64i16(<64 x i16> %a0) {
	; SSE2-LABEL: test_v64i16:			; SSE2-LABEL: test_v64i16:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: pminsw %xmm6, %xmm2			; SSE2-NEXT: pminsw %xmm6, %xmm2
	[61 lines not shown]
	; AVX2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX2-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX2-NEXT: vphminposuw %xmm0, %xmm0			; AVX2-NEXT: vphminposuw %xmm0, %xmm0
	; AVX2-NEXT: vmovd %xmm0, %eax			; AVX2-NEXT: vmovd %xmm0, %eax
	; AVX2-NEXT: xorl $32768, %eax # imm = 0x8000			; AVX2-NEXT: xorl $32768, %eax # imm = 0x8000
	; AVX2-NEXT: # kill: def $ax killed $ax killed $eax			; AVX2-NEXT: # kill: def $ax killed $ax killed $eax
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test_v64i16:			; AVX512BW-LABEL: test_v64i16:
	; AVX512: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512-NEXT: vpminsw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpminsw %zmm1, %zmm0, %zmm0
	; AVX512-NEXT: vextracti64x4 $1, %zmm0, %ymm1			; AVX512BW-NEXT: vextracti64x4 $1, %zmm0, %ymm1
	; AVX512-NEXT: vpminsw %ymm1, %ymm0, %ymm0			; AVX512BW-NEXT: vpminsw %ymm1, %ymm0, %ymm0
	; AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; AVX512BW-NEXT: vextracti128 $1, %ymm0, %xmm1
	; AVX512-NEXT: vpminsw %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpminsw %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512BW-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: vphminposuw %xmm0, %xmm0			; AVX512BW-NEXT: vphminposuw %xmm0, %xmm0
	; AVX512-NEXT: vmovd %xmm0, %eax			; AVX512BW-NEXT: vmovd %xmm0, %eax
	; AVX512-NEXT: xorl $32768, %eax # imm = 0x8000			; AVX512BW-NEXT: xorl $32768, %eax # imm = 0x8000
	; AVX512-NEXT: # kill: def $ax killed $ax killed $eax			; AVX512BW-NEXT: # kill: def $ax killed $ax killed $eax
	; AVX512-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512BW-NEXT: retq
				;
				; AVX512VL-LABEL: test_v64i16:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vpminsw %zmm1, %zmm0, %zmm0
				; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm1
				; AVX512VL-NEXT: vpminsw %ymm1, %ymm0, %ymm0
				; AVX512VL-NEXT: vextracti128 $1, %ymm0, %xmm1
				; AVX512VL-NEXT: vpminsw %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512VL-NEXT: vphminposuw %xmm0, %xmm0
				; AVX512VL-NEXT: vmovd %xmm0, %eax
				; AVX512VL-NEXT: xorl $32768, %eax # imm = 0x8000
				; AVX512VL-NEXT: # kill: def $ax killed $ax killed $eax
				; AVX512VL-NEXT: vzeroupper
				; AVX512VL-NEXT: retq
	%1 = call i16 @llvm.vector.reduce.smin.v64i16(<64 x i16> %a0)			%1 = call i16 @llvm.vector.reduce.smin.v64i16(<64 x i16> %a0)
	ret i16 %1			ret i16 %1
	}			}

	;			;
	; vXi8			; vXi8
	;			;

	[214 lines not shown]
	; AVX-NEXT: vpsrlw $8, %xmm0, %xmm1			; AVX-NEXT: vpsrlw $8, %xmm0, %xmm1
	; AVX-NEXT: vpminub %xmm1, %xmm0, %xmm0			; AVX-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; AVX-NEXT: vphminposuw %xmm0, %xmm0			; AVX-NEXT: vphminposuw %xmm0, %xmm0
	; AVX-NEXT: vmovd %xmm0, %eax			; AVX-NEXT: vmovd %xmm0, %eax
	; AVX-NEXT: addb $-128, %al			; AVX-NEXT: addb $-128, %al
	; AVX-NEXT: # kill: def $al killed $al killed $eax			; AVX-NEXT: # kill: def $al killed $al killed $eax
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; AVX512-LABEL: test_v16i8:			; AVX512BW-LABEL: test_v16i8:
	; AVX512: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512BW-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1			; AVX512BW-NEXT: vpsrlw $8, %xmm0, %xmm1
	; AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vphminposuw %xmm0, %xmm0			; AVX512BW-NEXT: vphminposuw %xmm0, %xmm0
	; AVX512-NEXT: vmovd %xmm0, %eax			; AVX512BW-NEXT: vmovd %xmm0, %eax
	; AVX512-NEXT: addb $-128, %al			; AVX512BW-NEXT: addb $-128, %al
	; AVX512-NEXT: # kill: def $al killed $al killed $eax			; AVX512BW-NEXT: # kill: def $al killed $al killed $eax
	; AVX512-NEXT: retq			; AVX512BW-NEXT: retq
				;
				; AVX512VL-LABEL: test_v16i8:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512VL-NEXT: vpsrlw $8, %xmm0, %xmm1
				; AVX512VL-NEXT: vpminub %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vphminposuw %xmm0, %xmm0
				; AVX512VL-NEXT: vmovd %xmm0, %eax
				; AVX512VL-NEXT: addb $-128, %al
				; AVX512VL-NEXT: # kill: def $al killed $al killed $eax
				; AVX512VL-NEXT: retq
	%1 = call i8 @llvm.vector.reduce.smin.v16i8(<16 x i8> %a0)			%1 = call i8 @llvm.vector.reduce.smin.v16i8(<16 x i8> %a0)
	ret i8 %1			ret i8 %1
	}			}

	define i8 @test_v32i8(<32 x i8> %a0) {			define i8 @test_v32i8(<32 x i8> %a0) {
	; SSE2-LABEL: test_v32i8:			; SSE2-LABEL: test_v32i8:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: movdqa %xmm1, %xmm2			; SSE2-NEXT: movdqa %xmm1, %xmm2
	[67 lines not shown]
	; AVX2-NEXT: vpminub %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: vphminposuw %xmm0, %xmm0			; AVX2-NEXT: vphminposuw %xmm0, %xmm0
	; AVX2-NEXT: vmovd %xmm0, %eax			; AVX2-NEXT: vmovd %xmm0, %eax
	; AVX2-NEXT: addb $-128, %al			; AVX2-NEXT: addb $-128, %al
	; AVX2-NEXT: # kill: def $al killed $al killed $eax			; AVX2-NEXT: # kill: def $al killed $al killed $eax
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test_v32i8:			; AVX512BW-LABEL: test_v32i8:
	; AVX512: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; AVX512BW-NEXT: vextracti128 $1, %ymm0, %xmm1
	; AVX512-NEXT: vpminsb %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpminsb %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512BW-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1			; AVX512BW-NEXT: vpsrlw $8, %xmm0, %xmm1
	; AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vphminposuw %xmm0, %xmm0			; AVX512BW-NEXT: vphminposuw %xmm0, %xmm0
	; AVX512-NEXT: vmovd %xmm0, %eax			; AVX512BW-NEXT: vmovd %xmm0, %eax
	; AVX512-NEXT: addb $-128, %al			; AVX512BW-NEXT: addb $-128, %al
	; AVX512-NEXT: # kill: def $al killed $al killed $eax			; AVX512BW-NEXT: # kill: def $al killed $al killed $eax
	; AVX512-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512BW-NEXT: retq
				;
				; AVX512VL-LABEL: test_v32i8:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vextracti128 $1, %ymm0, %xmm1
				; AVX512VL-NEXT: vpminsb %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512VL-NEXT: vpsrlw $8, %xmm0, %xmm1
				; AVX512VL-NEXT: vpminub %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vphminposuw %xmm0, %xmm0
				; AVX512VL-NEXT: vmovd %xmm0, %eax
				; AVX512VL-NEXT: addb $-128, %al
				; AVX512VL-NEXT: # kill: def $al killed $al killed $eax
				; AVX512VL-NEXT: vzeroupper
				; AVX512VL-NEXT: retq
	%1 = call i8 @llvm.vector.reduce.smin.v32i8(<32 x i8> %a0)			%1 = call i8 @llvm.vector.reduce.smin.v32i8(<32 x i8> %a0)
	ret i8 %1			ret i8 %1
	}			}

	define i8 @test_v64i8(<64 x i8> %a0) {			define i8 @test_v64i8(<64 x i8> %a0) {
	; SSE2-LABEL: test_v64i8:			; SSE2-LABEL: test_v64i8:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: movdqa %xmm2, %xmm4			; SSE2-NEXT: movdqa %xmm2, %xmm4
	[83 lines not shown]
	; AVX2-NEXT: vpminub %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: vphminposuw %xmm0, %xmm0			; AVX2-NEXT: vphminposuw %xmm0, %xmm0
	; AVX2-NEXT: vmovd %xmm0, %eax			; AVX2-NEXT: vmovd %xmm0, %eax
	; AVX2-NEXT: addb $-128, %al			; AVX2-NEXT: addb $-128, %al
	; AVX2-NEXT: # kill: def $al killed $al killed $eax			; AVX2-NEXT: # kill: def $al killed $al killed $eax
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test_v64i8:			; AVX512BW-LABEL: test_v64i8:
	; AVX512: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512-NEXT: vextracti64x4 $1, %zmm0, %ymm1			; AVX512BW-NEXT: vextracti64x4 $1, %zmm0, %ymm1
	; AVX512-NEXT: vpminsb %ymm1, %ymm0, %ymm0			; AVX512BW-NEXT: vpminsb %ymm1, %ymm0, %ymm0
	; AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; AVX512BW-NEXT: vextracti128 $1, %ymm0, %xmm1
	; AVX512-NEXT: vpminsb %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpminsb %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512BW-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1			; AVX512BW-NEXT: vpsrlw $8, %xmm0, %xmm1
	; AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vphminposuw %xmm0, %xmm0			; AVX512BW-NEXT: vphminposuw %xmm0, %xmm0
	; AVX512-NEXT: vmovd %xmm0, %eax			; AVX512BW-NEXT: vmovd %xmm0, %eax
	; AVX512-NEXT: addb $-128, %al			; AVX512BW-NEXT: addb $-128, %al
	; AVX512-NEXT: # kill: def $al killed $al killed $eax			; AVX512BW-NEXT: # kill: def $al killed $al killed $eax
	; AVX512-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512BW-NEXT: retq
				;
				; AVX512VL-LABEL: test_v64i8:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm1
				; AVX512VL-NEXT: vpminsb %ymm1, %ymm0, %ymm0
				; AVX512VL-NEXT: vextracti128 $1, %ymm0, %xmm1
				; AVX512VL-NEXT: vpminsb %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512VL-NEXT: vpsrlw $8, %xmm0, %xmm1
				; AVX512VL-NEXT: vpminub %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vphminposuw %xmm0, %xmm0
				; AVX512VL-NEXT: vmovd %xmm0, %eax
				; AVX512VL-NEXT: addb $-128, %al
				; AVX512VL-NEXT: # kill: def $al killed $al killed $eax
				; AVX512VL-NEXT: vzeroupper
				; AVX512VL-NEXT: retq
	%1 = call i8 @llvm.vector.reduce.smin.v64i8(<64 x i8> %a0)			%1 = call i8 @llvm.vector.reduce.smin.v64i8(<64 x i8> %a0)
	ret i8 %1			ret i8 %1
	}			}

	define i8 @test_v128i8(<128 x i8> %a0) {			define i8 @test_v128i8(<128 x i8> %a0) {
	; SSE2-LABEL: test_v128i8:			; SSE2-LABEL: test_v128i8:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: movdqa %xmm5, %xmm8			; SSE2-NEXT: movdqa %xmm5, %xmm8
	[115 lines not shown]
	; AVX2-NEXT: vpminub %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: vphminposuw %xmm0, %xmm0			; AVX2-NEXT: vphminposuw %xmm0, %xmm0
	; AVX2-NEXT: vmovd %xmm0, %eax			; AVX2-NEXT: vmovd %xmm0, %eax
	; AVX2-NEXT: addb $-128, %al			; AVX2-NEXT: addb $-128, %al
	; AVX2-NEXT: # kill: def $al killed $al killed $eax			; AVX2-NEXT: # kill: def $al killed $al killed $eax
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: test_v128i8:			; AVX512BW-LABEL: test_v128i8:
	; AVX512: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512-NEXT: vpminsb %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpminsb %zmm1, %zmm0, %zmm0
	; AVX512-NEXT: vextracti64x4 $1, %zmm0, %ymm1			; AVX512BW-NEXT: vextracti64x4 $1, %zmm0, %ymm1
	; AVX512-NEXT: vpminsb %ymm1, %ymm0, %ymm0			; AVX512BW-NEXT: vpminsb %ymm1, %ymm0, %ymm0
	; AVX512-NEXT: vextracti128 $1, %ymm0, %xmm1			; AVX512BW-NEXT: vextracti128 $1, %ymm0, %xmm1
	; AVX512-NEXT: vpminsb %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpminsb %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512BW-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: vpsrlw $8, %xmm0, %xmm1			; AVX512BW-NEXT: vpsrlw $8, %xmm0, %xmm1
	; AVX512-NEXT: vpminub %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpminub %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vphminposuw %xmm0, %xmm0			; AVX512BW-NEXT: vphminposuw %xmm0, %xmm0
	; AVX512-NEXT: vmovd %xmm0, %eax			; AVX512BW-NEXT: vmovd %xmm0, %eax
	; AVX512-NEXT: addb $-128, %al			; AVX512BW-NEXT: addb $-128, %al
	; AVX512-NEXT: # kill: def $al killed $al killed $eax			; AVX512BW-NEXT: # kill: def $al killed $al killed $eax
	; AVX512-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512BW-NEXT: retq
				;
				; AVX512VL-LABEL: test_v128i8:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vpminsb %zmm1, %zmm0, %zmm0
				; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm1
				; AVX512VL-NEXT: vpminsb %ymm1, %ymm0, %ymm0
				; AVX512VL-NEXT: vextracti128 $1, %ymm0, %xmm1
				; AVX512VL-NEXT: vpminsb %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vpxord {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
				; AVX512VL-NEXT: vpsrlw $8, %xmm0, %xmm1
				; AVX512VL-NEXT: vpminub %xmm1, %xmm0, %xmm0
				; AVX512VL-NEXT: vphminposuw %xmm0, %xmm0
				; AVX512VL-NEXT: vmovd %xmm0, %eax
				; AVX512VL-NEXT: addb $-128, %al
				; AVX512VL-NEXT: # kill: def $al killed $al killed $eax
				; AVX512VL-NEXT: vzeroupper
				; AVX512VL-NEXT: retq
	%1 = call i8 @llvm.vector.reduce.smin.v128i8(<128 x i8> %a0)			%1 = call i8 @llvm.vector.reduce.smin.v128i8(<128 x i8> %a0)
	ret i8 %1			ret i8 %1
	}			}

	declare i64 @llvm.vector.reduce.smin.v2i64(<2 x i64>)			declare i64 @llvm.vector.reduce.smin.v2i64(<2 x i64>)
	declare i64 @llvm.vector.reduce.smin.v4i64(<4 x i64>)			declare i64 @llvm.vector.reduce.smin.v4i64(<4 x i64>)
	declare i64 @llvm.vector.reduce.smin.v8i64(<8 x i64>)			declare i64 @llvm.vector.reduce.smin.v8i64(<8 x i64>)
	declare i64 @llvm.vector.reduce.smin.v16i64(<16 x i64>)			declare i64 @llvm.vector.reduce.smin.v16i64(<16 x i64>)
	[21 lines not shown]

llvm/test/CodeGen/X86/vector-rotate-128.ll

	[320 lines not shown]
	; AVX512F-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero			; AVX512F-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero
	; AVX512F-NEXT: vpsllvd %xmm1, %xmm0, %xmm0			; AVX512F-NEXT: vpsllvd %xmm1, %xmm0, %xmm0
	; AVX512F-NEXT: vpsrld $16, %xmm0, %xmm0			; AVX512F-NEXT: vpsrld $16, %xmm0, %xmm0
	; AVX512F-NEXT: vpackusdw %xmm2, %xmm0, %xmm0			; AVX512F-NEXT: vpackusdw %xmm2, %xmm0, %xmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: var_rotate_v8i16:			; AVX512VL-LABEL: var_rotate_v8i16:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VL-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VL-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VL-NEXT: vpunpckhwd {{.*#+}} xmm2 = xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]			; AVX512VL-NEXT: vpunpckhwd {{.*#+}} xmm2 = xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
	; AVX512VL-NEXT: vpunpckhwd {{.*#+}} xmm3 = xmm0[4,4,5,5,6,6,7,7]			; AVX512VL-NEXT: vpunpckhwd {{.*#+}} xmm3 = xmm0[4,4,5,5,6,6,7,7]
	; AVX512VL-NEXT: vpsllvd %xmm2, %xmm3, %xmm2			; AVX512VL-NEXT: vpsllvd %xmm2, %xmm3, %xmm2
	; AVX512VL-NEXT: vpsrld $16, %xmm2, %xmm2			; AVX512VL-NEXT: vpsrld $16, %xmm2, %xmm2
	; AVX512VL-NEXT: vpunpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]			; AVX512VL-NEXT: vpunpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
	; AVX512VL-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero			; AVX512VL-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero
	; AVX512VL-NEXT: vpsllvd %xmm1, %xmm0, %xmm0			; AVX512VL-NEXT: vpsllvd %xmm1, %xmm0, %xmm0
	[10 lines not shown]
	; AVX512BW-NEXT: vpsubw %xmm1, %xmm3, %xmm1			; AVX512BW-NEXT: vpsubw %xmm1, %xmm3, %xmm1
	; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpor %xmm0, %xmm2, %xmm0			; AVX512BW-NEXT: vpor %xmm0, %xmm2, %xmm0
	; AVX512BW-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_rotate_v8i16:			; AVX512VLBW-LABEL: var_rotate_v8i16:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VLBW-NEXT: vpsllvw %xmm1, %xmm0, %xmm2			; AVX512VLBW-NEXT: vpsllvw %xmm1, %xmm0, %xmm2
	; AVX512VLBW-NEXT: vmovdqa {{.*#+}} xmm3 = [16,16,16,16,16,16,16,16]			; AVX512VLBW-NEXT: vmovdqa {{.*#+}} xmm3 = [16,16,16,16,16,16,16,16]
	; AVX512VLBW-NEXT: vpsubw %xmm1, %xmm3, %xmm1			; AVX512VLBW-NEXT: vpsubw %xmm1, %xmm3, %xmm1
	; AVX512VLBW-NEXT: vpsrlvw %xmm1, %xmm0, %xmm0			; AVX512VLBW-NEXT: vpsrlvw %xmm1, %xmm0, %xmm0
	; AVX512VLBW-NEXT: vpor %xmm0, %xmm2, %xmm0			; AVX512VLBW-NEXT: vpor %xmm0, %xmm2, %xmm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VBMI2-LABEL: var_rotate_v8i16:			; AVX512VBMI2-LABEL: var_rotate_v8i16:
	[161 lines not shown]
	; AVX512F-NEXT: vzeroupper			; AVX512F-NEXT: vzeroupper
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: var_rotate_v16i8:			; AVX512VL-LABEL: var_rotate_v16i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero,xmm0[8],zero,zero,zero,xmm0[9],zero,zero,zero,xmm0[10],zero,zero,zero,xmm0[11],zero,zero,zero,xmm0[12],zero,zero,zero,xmm0[13],zero,zero,zero,xmm0[14],zero,zero,zero,xmm0[15],zero,zero,zero			; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero,xmm0[8],zero,zero,zero,xmm0[9],zero,zero,zero,xmm0[10],zero,zero,zero,xmm0[11],zero,zero,zero,xmm0[12],zero,zero,zero,xmm0[13],zero,zero,zero,xmm0[14],zero,zero,zero,xmm0[15],zero,zero,zero
	; AVX512VL-NEXT: vpslld $8, %zmm0, %zmm2			; AVX512VL-NEXT: vpslld $8, %zmm0, %zmm2
	; AVX512VL-NEXT: vpord %zmm2, %zmm0, %zmm0			; AVX512VL-NEXT: vpord %zmm2, %zmm0, %zmm0
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero,xmm1[4],zero,zero,zero,xmm1[5],zero,zero,zero,xmm1[6],zero,zero,zero,xmm1[7],zero,zero,zero,xmm1[8],zero,zero,zero,xmm1[9],zero,zero,zero,xmm1[10],zero,zero,zero,xmm1[11],zero,zero,zero,xmm1[12],zero,zero,zero,xmm1[13],zero,zero,zero,xmm1[14],zero,zero,zero,xmm1[15],zero,zero,zero			; AVX512VL-NEXT: vpmovzxbd {{.*#+}} zmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero,xmm1[4],zero,zero,zero,xmm1[5],zero,zero,zero,xmm1[6],zero,zero,zero,xmm1[7],zero,zero,zero,xmm1[8],zero,zero,zero,xmm1[9],zero,zero,zero,xmm1[10],zero,zero,zero,xmm1[11],zero,zero,zero,xmm1[12],zero,zero,zero,xmm1[13],zero,zero,zero,xmm1[14],zero,zero,zero,xmm1[15],zero,zero,zero
	; AVX512VL-NEXT: vpsllvd %zmm1, %zmm0, %zmm0			; AVX512VL-NEXT: vpsllvd %zmm1, %zmm0, %zmm0
	; AVX512VL-NEXT: vpsrld $8, %zmm0, %zmm0			; AVX512VL-NEXT: vpsrld $8, %zmm0, %zmm0
	; AVX512VL-NEXT: vpmovdb %zmm0, %xmm0			; AVX512VL-NEXT: vpmovdb %zmm0, %xmm0
	; AVX512VL-NEXT: vzeroupper			; AVX512VL-NEXT: vzeroupper
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_rotate_v16i8:			; AVX512BW-LABEL: var_rotate_v16i8:
	[9 lines not shown]
	; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpsrlw $8, %xmm0, %xmm0			; AVX512BW-NEXT: vpsrlw $8, %xmm0, %xmm0
	; AVX512BW-NEXT: vpackuswb %xmm2, %xmm0, %xmm0			; AVX512BW-NEXT: vpackuswb %xmm2, %xmm0, %xmm0
	; AVX512BW-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_rotate_v16i8:			; AVX512VLBW-LABEL: var_rotate_v16i8:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} xmm2 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} xmm2 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} xmm3 = xmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} xmm3 = xmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]
	; AVX512VLBW-NEXT: vpsllvw %xmm2, %xmm3, %xmm2			; AVX512VLBW-NEXT: vpsllvw %xmm2, %xmm3, %xmm2
	; AVX512VLBW-NEXT: vpsrlw $8, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpsrlw $8, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]			; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
	; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero			; AVX512VLBW-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero
	; AVX512VLBW-NEXT: vpsllvw %xmm1, %xmm0, %xmm0			; AVX512VLBW-NEXT: vpsllvw %xmm1, %xmm0, %xmm0
	[14 lines not shown]
	; AVX512VBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512VBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: vpsrlw $8, %xmm0, %xmm0			; AVX512VBMI2-NEXT: vpsrlw $8, %xmm0, %xmm0
	; AVX512VBMI2-NEXT: vpackuswb %xmm2, %xmm0, %xmm0			; AVX512VBMI2-NEXT: vpackuswb %xmm2, %xmm0, %xmm0
	; AVX512VBMI2-NEXT: vzeroupper			; AVX512VBMI2-NEXT: vzeroupper
	; AVX512VBMI2-NEXT: retq			; AVX512VBMI2-NEXT: retq
	;			;
	; AVX512VLVBMI2-LABEL: var_rotate_v16i8:			; AVX512VLVBMI2-LABEL: var_rotate_v16i8:
	; AVX512VLVBMI2: # %bb.0:			; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512VLVBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512VLVBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLVBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} xmm2 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} xmm2 = xmm1[8],xmm2[8],xmm1[9],xmm2[9],xmm1[10],xmm2[10],xmm1[11],xmm2[11],xmm1[12],xmm2[12],xmm1[13],xmm2[13],xmm1[14],xmm2[14],xmm1[15],xmm2[15]
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} xmm3 = xmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} xmm3 = xmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]
	; AVX512VLVBMI2-NEXT: vpsllvw %xmm2, %xmm3, %xmm2			; AVX512VLVBMI2-NEXT: vpsllvw %xmm2, %xmm3, %xmm2
	; AVX512VLVBMI2-NEXT: vpsrlw $8, %xmm2, %xmm2			; AVX512VLVBMI2-NEXT: vpsrlw $8, %xmm2, %xmm2
	; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]			; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
	; AVX512VLVBMI2-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero			; AVX512VLVBMI2-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero
	; AVX512VLVBMI2-NEXT: vpsllvw %xmm1, %xmm0, %xmm0			; AVX512VLVBMI2-NEXT: vpsllvw %xmm1, %xmm0, %xmm0
	[1,115 lines not shown]
	; AVX512F-NEXT: vpor %xmm1, %xmm0, %xmm0			; AVX512F-NEXT: vpor %xmm1, %xmm0, %xmm0
	; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: splatconstant_rotate_mask_v8i16:			; AVX512VL-LABEL: splatconstant_rotate_mask_v8i16:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsllw $5, %xmm0, %xmm1			; AVX512VL-NEXT: vpsllw $5, %xmm0, %xmm1
	; AVX512VL-NEXT: vpsrlw $11, %xmm0, %xmm0			; AVX512VL-NEXT: vpsrlw $11, %xmm0, %xmm0
	; AVX512VL-NEXT: vpternlogq $168, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0			; AVX512VL-NEXT: vpternlogd $168, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: splatconstant_rotate_mask_v8i16:			; AVX512BW-LABEL: splatconstant_rotate_mask_v8i16:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsrlw $11, %xmm0, %xmm1			; AVX512BW-NEXT: vpsrlw $11, %xmm0, %xmm1
	; AVX512BW-NEXT: vpsllw $5, %xmm0, %xmm0			; AVX512BW-NEXT: vpsllw $5, %xmm0, %xmm0
	; AVX512BW-NEXT: vpor %xmm1, %xmm0, %xmm0			; AVX512BW-NEXT: vpor %xmm1, %xmm0, %xmm0
	; AVX512BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: splatconstant_rotate_mask_v8i16:			; AVX512VLBW-LABEL: splatconstant_rotate_mask_v8i16:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpsllw $5, %xmm0, %xmm1			; AVX512VLBW-NEXT: vpsllw $5, %xmm0, %xmm1
	; AVX512VLBW-NEXT: vpsrlw $11, %xmm0, %xmm0			; AVX512VLBW-NEXT: vpsrlw $11, %xmm0, %xmm0
	; AVX512VLBW-NEXT: vpternlogq $168, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0			; AVX512VLBW-NEXT: vpternlogd $168, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VBMI2-LABEL: splatconstant_rotate_mask_v8i16:			; AVX512VBMI2-LABEL: splatconstant_rotate_mask_v8i16:
	; AVX512VBMI2: # %bb.0:			; AVX512VBMI2: # %bb.0:
	; AVX512VBMI2-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; AVX512VBMI2-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; AVX512VBMI2-NEXT: vpshldw $5, %zmm0, %zmm0, %zmm0			; AVX512VBMI2-NEXT: vpshldw $5, %zmm0, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512VBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512VBMI2-NEXT: vzeroupper			; AVX512VBMI2-NEXT: vzeroupper
	; AVX512VBMI2-NEXT: retq			; AVX512VBMI2-NEXT: retq
	;			;
	; AVX512VLVBMI2-LABEL: splatconstant_rotate_mask_v8i16:			; AVX512VLVBMI2-LABEL: splatconstant_rotate_mask_v8i16:
	; AVX512VLVBMI2: # %bb.0:			; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: vpshldw $5, %xmm0, %xmm0, %xmm0			; AVX512VLVBMI2-NEXT: vpshldw $5, %xmm0, %xmm0, %xmm0
	; AVX512VLVBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512VLVBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; AVX512VLVBMI2-NEXT: retq			; AVX512VLVBMI2-NEXT: retq
	;			;
	; XOP-LABEL: splatconstant_rotate_mask_v8i16:			; XOP-LABEL: splatconstant_rotate_mask_v8i16:
	; XOP: # %bb.0:			; XOP: # %bb.0:
	; XOP-NEXT: vprotw $5, %xmm0, %xmm0			; XOP-NEXT: vprotw $5, %xmm0, %xmm0
	; XOP-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; XOP-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; XOP-NEXT: retq			; XOP-NEXT: retq
	;			;
	[44 lines not shown]
	; AVX512NOVLX-NEXT: vzeroupper			; AVX512NOVLX-NEXT: vzeroupper
	; AVX512NOVLX-NEXT: retq			; AVX512NOVLX-NEXT: retq
	;			;
	; AVX512VLX-LABEL: splatconstant_rotate_mask_v16i8:			; AVX512VLX-LABEL: splatconstant_rotate_mask_v16i8:
	; AVX512VLX: # %bb.0:			; AVX512VLX: # %bb.0:
	; AVX512VLX-NEXT: vpsllw $4, %xmm0, %xmm1			; AVX512VLX-NEXT: vpsllw $4, %xmm0, %xmm1
	; AVX512VLX-NEXT: vpsrlw $4, %xmm0, %xmm0			; AVX512VLX-NEXT: vpsrlw $4, %xmm0, %xmm0
	; AVX512VLX-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to2}, %xmm1, %xmm0			; AVX512VLX-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to2}, %xmm1, %xmm0
	; AVX512VLX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512VLX-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; AVX512VLX-NEXT: retq			; AVX512VLX-NEXT: retq
	;			;
	; XOP-LABEL: splatconstant_rotate_mask_v16i8:			; XOP-LABEL: splatconstant_rotate_mask_v16i8:
	; XOP: # %bb.0:			; XOP: # %bb.0:
	; XOP-NEXT: vprotb $4, %xmm0, %xmm0			; XOP-NEXT: vprotb $4, %xmm0, %xmm0
	; XOP-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; XOP-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; XOP-NEXT: retq			; XOP-NEXT: retq
	;			;

llvm/test/CodeGen/X86/vector-rotate-256.ll

	; AVX512F-NEXT: vpunpcklwd {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,8,8,9,9,10,10,11,11]			; AVX512F-NEXT: vpunpcklwd {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,8,8,9,9,10,10,11,11]
	; AVX512F-NEXT: vpsllvd %ymm1, %ymm0, %ymm0			; AVX512F-NEXT: vpsllvd %ymm1, %ymm0, %ymm0
	; AVX512F-NEXT: vpsrld $16, %ymm0, %ymm0			; AVX512F-NEXT: vpsrld $16, %ymm0, %ymm0
	; AVX512F-NEXT: vpackusdw %ymm3, %ymm0, %ymm0			; AVX512F-NEXT: vpackusdw %ymm3, %ymm0, %ymm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: var_rotate_v16i16:			; AVX512VL-LABEL: var_rotate_v16i16:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512VL-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VL-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VL-NEXT: vpunpckhwd {{.*#+}} ymm3 = ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15]			; AVX512VL-NEXT: vpunpckhwd {{.*#+}} ymm3 = ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15]
	; AVX512VL-NEXT: vpunpckhwd {{.*#+}} ymm4 = ymm0[4,4,5,5,6,6,7,7,12,12,13,13,14,14,15,15]			; AVX512VL-NEXT: vpunpckhwd {{.*#+}} ymm4 = ymm0[4,4,5,5,6,6,7,7,12,12,13,13,14,14,15,15]
	; AVX512VL-NEXT: vpsllvd %ymm3, %ymm4, %ymm3			; AVX512VL-NEXT: vpsllvd %ymm3, %ymm4, %ymm3
	; AVX512VL-NEXT: vpsrld $16, %ymm3, %ymm3			; AVX512VL-NEXT: vpsrld $16, %ymm3, %ymm3
	; AVX512VL-NEXT: vpunpcklwd {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11]			; AVX512VL-NEXT: vpunpcklwd {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11]
	; AVX512VL-NEXT: vpunpcklwd {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,8,8,9,9,10,10,11,11]			; AVX512VL-NEXT: vpunpcklwd {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,8,8,9,9,10,10,11,11]
	; AVX512VL-NEXT: vpsllvd %ymm1, %ymm0, %ymm0			; AVX512VL-NEXT: vpsllvd %ymm1, %ymm0, %ymm0
	[9 lines not shown]
	; AVX512BW-NEXT: vmovdqa {{.*#+}} ymm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; AVX512BW-NEXT: vmovdqa {{.*#+}} ymm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	; AVX512BW-NEXT: vpsubw %ymm1, %ymm3, %ymm1			; AVX512BW-NEXT: vpsubw %ymm1, %ymm3, %ymm1
	; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpor %ymm0, %ymm2, %ymm0			; AVX512BW-NEXT: vpor %ymm0, %ymm2, %ymm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_rotate_v16i16:			; AVX512VLBW-LABEL: var_rotate_v16i16:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512VLBW-NEXT: vpsllvw %ymm1, %ymm0, %ymm2			; AVX512VLBW-NEXT: vpsllvw %ymm1, %ymm0, %ymm2
	; AVX512VLBW-NEXT: vmovdqa {{.*#+}} ymm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; AVX512VLBW-NEXT: vmovdqa {{.*#+}} ymm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	; AVX512VLBW-NEXT: vpsubw %ymm1, %ymm3, %ymm1			; AVX512VLBW-NEXT: vpsubw %ymm1, %ymm3, %ymm1
	; AVX512VLBW-NEXT: vpsrlvw %ymm1, %ymm0, %ymm0			; AVX512VLBW-NEXT: vpsrlvw %ymm1, %ymm0, %ymm0
	; AVX512VLBW-NEXT: vpor %ymm0, %ymm2, %ymm0			; AVX512VLBW-NEXT: vpor %ymm0, %ymm2, %ymm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VBMI2-LABEL: var_rotate_v16i16:			; AVX512VBMI2-LABEL: var_rotate_v16i16:
	[137 lines not shown]
	; AVX512VL-NEXT: vpblendvb %ymm1, %ymm3, %ymm0, %ymm0			; AVX512VL-NEXT: vpblendvb %ymm1, %ymm3, %ymm0, %ymm0
	; AVX512VL-NEXT: vpsllw $2, %ymm0, %ymm2			; AVX512VL-NEXT: vpsllw $2, %ymm0, %ymm2
	; AVX512VL-NEXT: vpsrlw $6, %ymm0, %ymm3			; AVX512VL-NEXT: vpsrlw $6, %ymm0, %ymm3
	; AVX512VL-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm2, %ymm3			; AVX512VL-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm2, %ymm3
	; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1			; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1
	; AVX512VL-NEXT: vpblendvb %ymm1, %ymm3, %ymm0, %ymm0			; AVX512VL-NEXT: vpblendvb %ymm1, %ymm3, %ymm0, %ymm0
	; AVX512VL-NEXT: vpsrlw $7, %ymm0, %ymm2			; AVX512VL-NEXT: vpsrlw $7, %ymm0, %ymm2
	; AVX512VL-NEXT: vpaddb %ymm0, %ymm0, %ymm3			; AVX512VL-NEXT: vpaddb %ymm0, %ymm0, %ymm3
	; AVX512VL-NEXT: vpternlogq $248, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm3			; AVX512VL-NEXT: vpternlogd $248, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm3
	; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1			; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1
	; AVX512VL-NEXT: vpblendvb %ymm1, %ymm3, %ymm0, %ymm0			; AVX512VL-NEXT: vpblendvb %ymm1, %ymm3, %ymm0, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_rotate_v32i8:			; AVX512BW-LABEL: var_rotate_v32i8:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpunpckhbw {{.*#+}} ymm2 = ymm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31]			; AVX512BW-NEXT: vpunpckhbw {{.*#+}} ymm2 = ymm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31]
	; AVX512BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1
	; AVX512BW-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX512BW-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX512BW-NEXT: vpunpckhbw {{.*#+}} ymm4 = ymm1[8],ymm3[8],ymm1[9],ymm3[9],ymm1[10],ymm3[10],ymm1[11],ymm3[11],ymm1[12],ymm3[12],ymm1[13],ymm3[13],ymm1[14],ymm3[14],ymm1[15],ymm3[15],ymm1[24],ymm3[24],ymm1[25],ymm3[25],ymm1[26],ymm3[26],ymm1[27],ymm3[27],ymm1[28],ymm3[28],ymm1[29],ymm3[29],ymm1[30],ymm3[30],ymm1[31],ymm3[31]			; AVX512BW-NEXT: vpunpckhbw {{.*#+}} ymm4 = ymm1[8],ymm3[8],ymm1[9],ymm3[9],ymm1[10],ymm3[10],ymm1[11],ymm3[11],ymm1[12],ymm3[12],ymm1[13],ymm3[13],ymm1[14],ymm3[14],ymm1[15],ymm3[15],ymm1[24],ymm3[24],ymm1[25],ymm3[25],ymm1[26],ymm3[26],ymm1[27],ymm3[27],ymm1[28],ymm3[28],ymm1[29],ymm3[29],ymm1[30],ymm3[30],ymm1[31],ymm3[31]
	; AVX512BW-NEXT: vpsllvw %zmm4, %zmm2, %zmm2			; AVX512BW-NEXT: vpsllvw %zmm4, %zmm2, %zmm2
	; AVX512BW-NEXT: vpsrlw $8, %ymm2, %ymm2			; AVX512BW-NEXT: vpsrlw $8, %ymm2, %ymm2
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm3[0],ymm1[1],ymm3[1],ymm1[2],ymm3[2],ymm1[3],ymm3[3],ymm1[4],ymm3[4],ymm1[5],ymm3[5],ymm1[6],ymm3[6],ymm1[7],ymm3[7],ymm1[16],ymm3[16],ymm1[17],ymm3[17],ymm1[18],ymm3[18],ymm1[19],ymm3[19],ymm1[20],ymm3[20],ymm1[21],ymm3[21],ymm1[22],ymm3[22],ymm1[23],ymm3[23]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm3[0],ymm1[1],ymm3[1],ymm1[2],ymm3[2],ymm1[3],ymm3[3],ymm1[4],ymm3[4],ymm1[5],ymm3[5],ymm1[6],ymm3[6],ymm1[7],ymm3[7],ymm1[16],ymm3[16],ymm1[17],ymm3[17],ymm1[18],ymm3[18],ymm1[19],ymm3[19],ymm1[20],ymm3[20],ymm1[21],ymm3[21],ymm1[22],ymm3[22],ymm1[23],ymm3[23]
	; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpsrlw $8, %ymm0, %ymm0			; AVX512BW-NEXT: vpsrlw $8, %ymm0, %ymm0
	; AVX512BW-NEXT: vpackuswb %ymm2, %ymm0, %ymm0			; AVX512BW-NEXT: vpackuswb %ymm2, %ymm0, %ymm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_rotate_v32i8:			; AVX512VLBW-LABEL: var_rotate_v32i8:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} ymm3 = ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15],ymm1[24],ymm2[24],ymm1[25],ymm2[25],ymm1[26],ymm2[26],ymm1[27],ymm2[27],ymm1[28],ymm2[28],ymm1[29],ymm2[29],ymm1[30],ymm2[30],ymm1[31],ymm2[31]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} ymm3 = ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15],ymm1[24],ymm2[24],ymm1[25],ymm2[25],ymm1[26],ymm2[26],ymm1[27],ymm2[27],ymm1[28],ymm2[28],ymm1[29],ymm2[29],ymm1[30],ymm2[30],ymm1[31],ymm2[31]
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} ymm4 = ymm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} ymm4 = ymm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31]
	; AVX512VLBW-NEXT: vpsllvw %ymm3, %ymm4, %ymm3			; AVX512VLBW-NEXT: vpsllvw %ymm3, %ymm4, %ymm3
	; AVX512VLBW-NEXT: vpsrlw $8, %ymm3, %ymm3			; AVX512VLBW-NEXT: vpsrlw $8, %ymm3, %ymm3
	; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[16],ymm2[16],ymm1[17],ymm2[17],ymm1[18],ymm2[18],ymm1[19],ymm2[19],ymm1[20],ymm2[20],ymm1[21],ymm2[21],ymm1[22],ymm2[22],ymm1[23],ymm2[23]			; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[16],ymm2[16],ymm1[17],ymm2[17],ymm1[18],ymm2[18],ymm1[19],ymm2[19],ymm1[20],ymm2[20],ymm1[21],ymm2[21],ymm1[22],ymm2[22],ymm1[23],ymm2[23]
	; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]			; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]
	; AVX512VLBW-NEXT: vpsllvw %ymm1, %ymm0, %ymm0			; AVX512VLBW-NEXT: vpsllvw %ymm1, %ymm0, %ymm0
	[13 lines not shown]
	; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm3[0],ymm1[1],ymm3[1],ymm1[2],ymm3[2],ymm1[3],ymm3[3],ymm1[4],ymm3[4],ymm1[5],ymm3[5],ymm1[6],ymm3[6],ymm1[7],ymm3[7],ymm1[16],ymm3[16],ymm1[17],ymm3[17],ymm1[18],ymm3[18],ymm1[19],ymm3[19],ymm1[20],ymm3[20],ymm1[21],ymm3[21],ymm1[22],ymm3[22],ymm1[23],ymm3[23]			; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm3[0],ymm1[1],ymm3[1],ymm1[2],ymm3[2],ymm1[3],ymm3[3],ymm1[4],ymm3[4],ymm1[5],ymm3[5],ymm1[6],ymm3[6],ymm1[7],ymm3[7],ymm1[16],ymm3[16],ymm1[17],ymm3[17],ymm1[18],ymm3[18],ymm1[19],ymm3[19],ymm1[20],ymm3[20],ymm1[21],ymm3[21],ymm1[22],ymm3[22],ymm1[23],ymm3[23]
	; AVX512VBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512VBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: vpsrlw $8, %ymm0, %ymm0			; AVX512VBMI2-NEXT: vpsrlw $8, %ymm0, %ymm0
	; AVX512VBMI2-NEXT: vpackuswb %ymm2, %ymm0, %ymm0			; AVX512VBMI2-NEXT: vpackuswb %ymm2, %ymm0, %ymm0
	; AVX512VBMI2-NEXT: retq			; AVX512VBMI2-NEXT: retq
	;			;
	; AVX512VLVBMI2-LABEL: var_rotate_v32i8:			; AVX512VLVBMI2-LABEL: var_rotate_v32i8:
	; AVX512VLVBMI2: # %bb.0:			; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1			; AVX512VLVBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm1
	; AVX512VLVBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLVBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} ymm3 = ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15],ymm1[24],ymm2[24],ymm1[25],ymm2[25],ymm1[26],ymm2[26],ymm1[27],ymm2[27],ymm1[28],ymm2[28],ymm1[29],ymm2[29],ymm1[30],ymm2[30],ymm1[31],ymm2[31]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} ymm3 = ymm1[8],ymm2[8],ymm1[9],ymm2[9],ymm1[10],ymm2[10],ymm1[11],ymm2[11],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15],ymm1[24],ymm2[24],ymm1[25],ymm2[25],ymm1[26],ymm2[26],ymm1[27],ymm2[27],ymm1[28],ymm2[28],ymm1[29],ymm2[29],ymm1[30],ymm2[30],ymm1[31],ymm2[31]
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} ymm4 = ymm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} ymm4 = ymm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31]
	; AVX512VLVBMI2-NEXT: vpsllvw %ymm3, %ymm4, %ymm3			; AVX512VLVBMI2-NEXT: vpsllvw %ymm3, %ymm4, %ymm3
	; AVX512VLVBMI2-NEXT: vpsrlw $8, %ymm3, %ymm3			; AVX512VLVBMI2-NEXT: vpsrlw $8, %ymm3, %ymm3
	; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[16],ymm2[16],ymm1[17],ymm2[17],ymm1[18],ymm2[18],ymm1[19],ymm2[19],ymm1[20],ymm2[20],ymm1[21],ymm2[21],ymm1[22],ymm2[22],ymm1[23],ymm2[23]			; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} ymm1 = ymm1[0],ymm2[0],ymm1[1],ymm2[1],ymm1[2],ymm2[2],ymm1[3],ymm2[3],ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[16],ymm2[16],ymm1[17],ymm2[17],ymm1[18],ymm2[18],ymm1[19],ymm2[19],ymm1[20],ymm2[20],ymm1[21],ymm2[21],ymm1[22],ymm2[22],ymm1[23],ymm2[23]
	; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]			; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} ymm0 = ymm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23]
	; AVX512VLVBMI2-NEXT: vpsllvw %ymm1, %ymm0, %ymm0			; AVX512VLVBMI2-NEXT: vpsllvw %ymm1, %ymm0, %ymm0
	[1,082 lines not shown]
	; AVX512F-NEXT: vpor %ymm1, %ymm0, %ymm0			; AVX512F-NEXT: vpor %ymm1, %ymm0, %ymm0
	; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: splatconstant_rotate_mask_v16i16:			; AVX512VL-LABEL: splatconstant_rotate_mask_v16i16:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsllw $5, %ymm0, %ymm1			; AVX512VL-NEXT: vpsllw $5, %ymm0, %ymm1
	; AVX512VL-NEXT: vpsrlw $11, %ymm0, %ymm0			; AVX512VL-NEXT: vpsrlw $11, %ymm0, %ymm0
	; AVX512VL-NEXT: vpternlogq $168, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm0			; AVX512VL-NEXT: vpternlogd $168, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: splatconstant_rotate_mask_v16i16:			; AVX512BW-LABEL: splatconstant_rotate_mask_v16i16:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsrlw $11, %ymm0, %ymm1			; AVX512BW-NEXT: vpsrlw $11, %ymm0, %ymm1
	; AVX512BW-NEXT: vpsllw $5, %ymm0, %ymm0			; AVX512BW-NEXT: vpsllw $5, %ymm0, %ymm0
	; AVX512BW-NEXT: vpor %ymm1, %ymm0, %ymm0			; AVX512BW-NEXT: vpor %ymm1, %ymm0, %ymm0
	; AVX512BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: splatconstant_rotate_mask_v16i16:			; AVX512VLBW-LABEL: splatconstant_rotate_mask_v16i16:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpsllw $5, %ymm0, %ymm1			; AVX512VLBW-NEXT: vpsllw $5, %ymm0, %ymm1
	; AVX512VLBW-NEXT: vpsrlw $11, %ymm0, %ymm0			; AVX512VLBW-NEXT: vpsrlw $11, %ymm0, %ymm0
	; AVX512VLBW-NEXT: vpternlogq $168, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm0			; AVX512VLBW-NEXT: vpternlogd $168, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VBMI2-LABEL: splatconstant_rotate_mask_v16i16:			; AVX512VBMI2-LABEL: splatconstant_rotate_mask_v16i16:
	; AVX512VBMI2: # %bb.0:			; AVX512VBMI2: # %bb.0:
	; AVX512VBMI2-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0			; AVX512VBMI2-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0
	; AVX512VBMI2-NEXT: vpshldw $5, %zmm0, %zmm0, %zmm0			; AVX512VBMI2-NEXT: vpshldw $5, %zmm0, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512VBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; AVX512VBMI2-NEXT: retq			; AVX512VBMI2-NEXT: retq
	;			;
	; AVX512VLVBMI2-LABEL: splatconstant_rotate_mask_v16i16:			; AVX512VLVBMI2-LABEL: splatconstant_rotate_mask_v16i16:
	; AVX512VLVBMI2: # %bb.0:			; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: vpshldw $5, %ymm0, %ymm0, %ymm0			; AVX512VLVBMI2-NEXT: vpshldw $5, %ymm0, %ymm0, %ymm0
	; AVX512VLVBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512VLVBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX512VLVBMI2-NEXT: retq			; AVX512VLVBMI2-NEXT: retq
	;			;
	; XOPAVX1-LABEL: splatconstant_rotate_mask_v16i16:			; XOPAVX1-LABEL: splatconstant_rotate_mask_v16i16:
	; XOPAVX1: # %bb.0:			; XOPAVX1: # %bb.0:
	; XOPAVX1-NEXT: vprotw $5, %xmm0, %xmm1			; XOPAVX1-NEXT: vprotw $5, %xmm0, %xmm1
	; XOPAVX1-NEXT: vextractf128 $1, %ymm0, %xmm0			; XOPAVX1-NEXT: vextractf128 $1, %ymm0, %xmm0
	; XOPAVX1-NEXT: vprotw $5, %xmm0, %xmm0			; XOPAVX1-NEXT: vprotw $5, %xmm0, %xmm0
	; XOPAVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0			; XOPAVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	[53 lines not shown]
	; AVX512NOVLX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512NOVLX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; AVX512NOVLX-NEXT: retq			; AVX512NOVLX-NEXT: retq
	;			;
	; AVX512VLX-LABEL: splatconstant_rotate_mask_v32i8:			; AVX512VLX-LABEL: splatconstant_rotate_mask_v32i8:
	; AVX512VLX: # %bb.0:			; AVX512VLX: # %bb.0:
	; AVX512VLX-NEXT: vpsllw $4, %ymm0, %ymm1			; AVX512VLX-NEXT: vpsllw $4, %ymm0, %ymm1
	; AVX512VLX-NEXT: vpsrlw $4, %ymm0, %ymm0			; AVX512VLX-NEXT: vpsrlw $4, %ymm0, %ymm0
	; AVX512VLX-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm1, %ymm0			; AVX512VLX-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm1, %ymm0
	; AVX512VLX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512VLX-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX512VLX-NEXT: retq			; AVX512VLX-NEXT: retq
	;			;
	; XOPAVX1-LABEL: splatconstant_rotate_mask_v32i8:			; XOPAVX1-LABEL: splatconstant_rotate_mask_v32i8:
	; XOPAVX1: # %bb.0:			; XOPAVX1: # %bb.0:
	; XOPAVX1-NEXT: vprotb $4, %xmm0, %xmm1			; XOPAVX1-NEXT: vprotb $4, %xmm0, %xmm1
	; XOPAVX1-NEXT: vextractf128 $1, %ymm0, %xmm0			; XOPAVX1-NEXT: vextractf128 $1, %ymm0, %xmm0
	; XOPAVX1-NEXT: vprotb $4, %xmm0, %xmm0			; XOPAVX1-NEXT: vprotb $4, %xmm0, %xmm0
	; XOPAVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0			; XOPAVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0

llvm/test/CodeGen/X86/vector-rotate-512.ll

	; AVX512VL-NEXT: vpsllvd %ymm1, %ymm0, %ymm0			; AVX512VL-NEXT: vpsllvd %ymm1, %ymm0, %ymm0
	; AVX512VL-NEXT: vpsrld $16, %ymm0, %ymm0			; AVX512VL-NEXT: vpsrld $16, %ymm0, %ymm0
	; AVX512VL-NEXT: vpackusdw %ymm3, %ymm0, %ymm0			; AVX512VL-NEXT: vpackusdw %ymm3, %ymm0, %ymm0
	; AVX512VL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; AVX512VL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_rotate_v32i16:			; AVX512BW-LABEL: var_rotate_v32i16:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm2			; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm2
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	; AVX512BW-NEXT: vpsubw %zmm1, %zmm3, %zmm1			; AVX512BW-NEXT: vpsubw %zmm1, %zmm3, %zmm1
	; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vporq %zmm0, %zmm2, %zmm0			; AVX512BW-NEXT: vporq %zmm0, %zmm2, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_rotate_v32i16:			; AVX512VLBW-LABEL: var_rotate_v32i16:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512VLBW-NEXT: vpsllvw %zmm1, %zmm0, %zmm2			; AVX512VLBW-NEXT: vpsllvw %zmm1, %zmm0, %zmm2
	; AVX512VLBW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; AVX512VLBW-NEXT: vmovdqa64 {{.*#+}} zmm3 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	; AVX512VLBW-NEXT: vpsubw %zmm1, %zmm3, %zmm1			; AVX512VLBW-NEXT: vpsubw %zmm1, %zmm3, %zmm1
	; AVX512VLBW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vporq %zmm0, %zmm2, %zmm0			; AVX512VLBW-NEXT: vporq %zmm0, %zmm2, %zmm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VBMI2-LABEL: var_rotate_v32i16:			; AVX512VBMI2-LABEL: var_rotate_v32i16:
	[92 lines not shown]
	; AVX512VL-NEXT: vpternlogq $248, %ymm8, %ymm3, %ymm4			; AVX512VL-NEXT: vpternlogq $248, %ymm8, %ymm3, %ymm4
	; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1			; AVX512VL-NEXT: vpaddb %ymm1, %ymm1, %ymm1
	; AVX512VL-NEXT: vpblendvb %ymm1, %ymm4, %ymm0, %ymm0			; AVX512VL-NEXT: vpblendvb %ymm1, %ymm4, %ymm0, %ymm0
	; AVX512VL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; AVX512VL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_rotate_v64i8:			; AVX512BW-LABEL: var_rotate_v64i8:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512BW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512BW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]			; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]
	; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]			; AVX512BW-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]
	; AVX512BW-NEXT: vpsllvw %zmm3, %zmm4, %zmm3			; AVX512BW-NEXT: vpsllvw %zmm3, %zmm4, %zmm3
	; AVX512BW-NEXT: vpsrlw $8, %zmm3, %zmm3			; AVX512BW-NEXT: vpsrlw $8, %zmm3, %zmm3
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]
	; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]			; AVX512BW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]
	; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpsrlw $8, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlw $8, %zmm0, %zmm0
	; AVX512BW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0			; AVX512BW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: var_rotate_v64i8:			; AVX512VLBW-LABEL: var_rotate_v64i8:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]
	; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]			; AVX512VLBW-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]
	; AVX512VLBW-NEXT: vpsllvw %zmm3, %zmm4, %zmm3			; AVX512VLBW-NEXT: vpsllvw %zmm3, %zmm4, %zmm3
	; AVX512VLBW-NEXT: vpsrlw $8, %zmm3, %zmm3			; AVX512VLBW-NEXT: vpsrlw $8, %zmm3, %zmm3
	; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]			; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]
	; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]			; AVX512VLBW-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]
	; AVX512VLBW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vpsrlw $8, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpsrlw $8, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpackuswb %zmm3, %zmm0, %zmm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VBMI2-LABEL: var_rotate_v64i8:			; AVX512VBMI2-LABEL: var_rotate_v64i8:
	; AVX512VBMI2: # %bb.0:			; AVX512VBMI2: # %bb.0:
	; AVX512VBMI2-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512VBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512VBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]			; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]
	; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]			; AVX512VBMI2-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]
	; AVX512VBMI2-NEXT: vpsllvw %zmm3, %zmm4, %zmm3			; AVX512VBMI2-NEXT: vpsllvw %zmm3, %zmm4, %zmm3
	; AVX512VBMI2-NEXT: vpsrlw $8, %zmm3, %zmm3			; AVX512VBMI2-NEXT: vpsrlw $8, %zmm3, %zmm3
	; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]			; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]
	; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]			; AVX512VBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]
	; AVX512VBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512VBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: vpsrlw $8, %zmm0, %zmm0			; AVX512VBMI2-NEXT: vpsrlw $8, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: vpackuswb %zmm3, %zmm0, %zmm0			; AVX512VBMI2-NEXT: vpackuswb %zmm3, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: retq			; AVX512VBMI2-NEXT: retq
	;			;
	; AVX512VLVBMI2-LABEL: var_rotate_v64i8:			; AVX512VLVBMI2-LABEL: var_rotate_v64i8:
	; AVX512VLVBMI2: # %bb.0:			; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm1			; AVX512VLVBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm1
	; AVX512VLVBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX512VLVBMI2-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm3 = zmm1[8],zmm2[8],zmm1[9],zmm2[9],zmm1[10],zmm2[10],zmm1[11],zmm2[11],zmm1[12],zmm2[12],zmm1[13],zmm2[13],zmm1[14],zmm2[14],zmm1[15],zmm2[15],zmm1[24],zmm2[24],zmm1[25],zmm2[25],zmm1[26],zmm2[26],zmm1[27],zmm2[27],zmm1[28],zmm2[28],zmm1[29],zmm2[29],zmm1[30],zmm2[30],zmm1[31],zmm2[31],zmm1[40],zmm2[40],zmm1[41],zmm2[41],zmm1[42],zmm2[42],zmm1[43],zmm2[43],zmm1[44],zmm2[44],zmm1[45],zmm2[45],zmm1[46],zmm2[46],zmm1[47],zmm2[47],zmm1[56],zmm2[56],zmm1[57],zmm2[57],zmm1[58],zmm2[58],zmm1[59],zmm2[59],zmm1[60],zmm2[60],zmm1[61],zmm2[61],zmm1[62],zmm2[62],zmm1[63],zmm2[63]
	; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]			; AVX512VLVBMI2-NEXT: vpunpckhbw {{.*#+}} zmm4 = zmm0[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,24,24,25,25,26,26,27,27,28,28,29,29,30,30,31,31,40,40,41,41,42,42,43,43,44,44,45,45,46,46,47,47,56,56,57,57,58,58,59,59,60,60,61,61,62,62,63,63]
	; AVX512VLVBMI2-NEXT: vpsllvw %zmm3, %zmm4, %zmm3			; AVX512VLVBMI2-NEXT: vpsllvw %zmm3, %zmm4, %zmm3
	; AVX512VLVBMI2-NEXT: vpsrlw $8, %zmm3, %zmm3			; AVX512VLVBMI2-NEXT: vpsrlw $8, %zmm3, %zmm3
	; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]			; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm1 = zmm1[0],zmm2[0],zmm1[1],zmm2[1],zmm1[2],zmm2[2],zmm1[3],zmm2[3],zmm1[4],zmm2[4],zmm1[5],zmm2[5],zmm1[6],zmm2[6],zmm1[7],zmm2[7],zmm1[16],zmm2[16],zmm1[17],zmm2[17],zmm1[18],zmm2[18],zmm1[19],zmm2[19],zmm1[20],zmm2[20],zmm1[21],zmm2[21],zmm1[22],zmm2[22],zmm1[23],zmm2[23],zmm1[32],zmm2[32],zmm1[33],zmm2[33],zmm1[34],zmm2[34],zmm1[35],zmm2[35],zmm1[36],zmm2[36],zmm1[37],zmm2[37],zmm1[38],zmm2[38],zmm1[39],zmm2[39],zmm1[48],zmm2[48],zmm1[49],zmm2[49],zmm1[50],zmm2[50],zmm1[51],zmm2[51],zmm1[52],zmm2[52],zmm1[53],zmm2[53],zmm1[54],zmm2[54],zmm1[55],zmm2[55]
	; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]			; AVX512VLVBMI2-NEXT: vpunpcklbw {{.*#+}} zmm0 = zmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,16,16,17,17,18,18,19,19,20,20,21,21,22,22,23,23,32,32,33,33,34,34,35,35,36,36,37,37,38,38,39,39,48,48,49,49,50,50,51,51,52,52,53,53,54,54,55,55]
	; AVX512VLVBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512VLVBMI2-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	[568 lines not shown]
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vpsllw $5, %ymm0, %ymm1			; AVX512F-NEXT: vpsllw $5, %ymm0, %ymm1
	; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm2			; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm2
	; AVX512F-NEXT: vpsllw $5, %ymm2, %ymm3			; AVX512F-NEXT: vpsllw $5, %ymm2, %ymm3
	; AVX512F-NEXT: vinserti64x4 $1, %ymm3, %zmm1, %zmm1			; AVX512F-NEXT: vinserti64x4 $1, %ymm3, %zmm1, %zmm1
	; AVX512F-NEXT: vpsrlw $11, %ymm0, %ymm0			; AVX512F-NEXT: vpsrlw $11, %ymm0, %ymm0
	; AVX512F-NEXT: vpsrlw $11, %ymm2, %ymm2			; AVX512F-NEXT: vpsrlw $11, %ymm2, %ymm2
	; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; AVX512F-NEXT: vpternlogq $168, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm0			; AVX512F-NEXT: vpternlogd $168, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: splatconstant_rotate_mask_v32i16:			; AVX512VL-LABEL: splatconstant_rotate_mask_v32i16:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsllw $5, %ymm0, %ymm1			; AVX512VL-NEXT: vpsllw $5, %ymm0, %ymm1
	; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm2			; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm2
	; AVX512VL-NEXT: vpsllw $5, %ymm2, %ymm3			; AVX512VL-NEXT: vpsllw $5, %ymm2, %ymm3
	; AVX512VL-NEXT: vinserti64x4 $1, %ymm3, %zmm1, %zmm1			; AVX512VL-NEXT: vinserti64x4 $1, %ymm3, %zmm1, %zmm1
	; AVX512VL-NEXT: vpsrlw $11, %ymm0, %ymm0			; AVX512VL-NEXT: vpsrlw $11, %ymm0, %ymm0
	; AVX512VL-NEXT: vpsrlw $11, %ymm2, %ymm2			; AVX512VL-NEXT: vpsrlw $11, %ymm2, %ymm2
	; AVX512VL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; AVX512VL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; AVX512VL-NEXT: vpternlogq $168, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm0			; AVX512VL-NEXT: vpternlogd $168, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: splatconstant_rotate_mask_v32i16:			; AVX512BW-LABEL: splatconstant_rotate_mask_v32i16:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsllw $5, %zmm0, %zmm1			; AVX512BW-NEXT: vpsllw $5, %zmm0, %zmm1
	; AVX512BW-NEXT: vpsrlw $11, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlw $11, %zmm0, %zmm0
	; AVX512BW-NEXT: vpternlogq $168, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm0			; AVX512BW-NEXT: vpternlogd $168, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: splatconstant_rotate_mask_v32i16:			; AVX512VLBW-LABEL: splatconstant_rotate_mask_v32i16:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpsllw $5, %zmm0, %zmm1			; AVX512VLBW-NEXT: vpsllw $5, %zmm0, %zmm1
	; AVX512VLBW-NEXT: vpsrlw $11, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpsrlw $11, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vpternlogq $168, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm0			; AVX512VLBW-NEXT: vpternlogd $168, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VBMI2-LABEL: splatconstant_rotate_mask_v32i16:			; AVX512VBMI2-LABEL: splatconstant_rotate_mask_v32i16:
	; AVX512VBMI2: # %bb.0:			; AVX512VBMI2: # %bb.0:
	; AVX512VBMI2-NEXT: vpshldw $5, %zmm0, %zmm0, %zmm0			; AVX512VBMI2-NEXT: vpshldw $5, %zmm0, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512VBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: retq			; AVX512VBMI2-NEXT: retq
	;			;
	; AVX512VLVBMI2-LABEL: splatconstant_rotate_mask_v32i16:			; AVX512VLVBMI2-LABEL: splatconstant_rotate_mask_v32i16:
	; AVX512VLVBMI2: # %bb.0:			; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: vpshldw $5, %zmm0, %zmm0, %zmm0			; AVX512VLVBMI2-NEXT: vpshldw $5, %zmm0, %zmm0, %zmm0
	; AVX512VLVBMI2-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512VLVBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512VLVBMI2-NEXT: retq			; AVX512VLVBMI2-NEXT: retq
	%shl = shl <32 x i16> %a, <i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5>			%shl = shl <32 x i16> %a, <i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5>
	%lshr = lshr <32 x i16> %a, <i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11>			%lshr = lshr <32 x i16> %a, <i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11, i16 11>
	%rmask = and <32 x i16> %lshr, <i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55>			%rmask = and <32 x i16> %lshr, <i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55, i16 55>
	%lmask = and <32 x i16> %shl, <i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33>			%lmask = and <32 x i16> %shl, <i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33, i16 33>
	%or = or <32 x i16> %lmask, %rmask			%or = or <32 x i16> %lmask, %rmask
	ret <32 x i16> %or			ret <32 x i16> %or
	}			}

	define <64 x i8> @splatconstant_rotate_mask_v64i8(<64 x i8> %a) nounwind {			define <64 x i8> @splatconstant_rotate_mask_v64i8(<64 x i8> %a) nounwind {
	; AVX512F-LABEL: splatconstant_rotate_mask_v64i8:			; AVX512F-LABEL: splatconstant_rotate_mask_v64i8:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vpsllw $4, %ymm0, %ymm1			; AVX512F-NEXT: vpsllw $4, %ymm0, %ymm1
	; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm2			; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm2
	; AVX512F-NEXT: vpsllw $4, %ymm2, %ymm3			; AVX512F-NEXT: vpsllw $4, %ymm2, %ymm3
	; AVX512F-NEXT: vinserti64x4 $1, %ymm3, %zmm1, %zmm1			; AVX512F-NEXT: vinserti64x4 $1, %ymm3, %zmm1, %zmm1
	; AVX512F-NEXT: vpsrlw $4, %ymm0, %ymm0			; AVX512F-NEXT: vpsrlw $4, %ymm0, %ymm0
	; AVX512F-NEXT: vpsrlw $4, %ymm2, %ymm2			; AVX512F-NEXT: vpsrlw $4, %ymm2, %ymm2
	; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; AVX512F-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm1, %zmm0			; AVX512F-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm1, %zmm0
	; AVX512F-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512F-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: splatconstant_rotate_mask_v64i8:			; AVX512VL-LABEL: splatconstant_rotate_mask_v64i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsllw $4, %ymm0, %ymm1			; AVX512VL-NEXT: vpsllw $4, %ymm0, %ymm1
	; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm2			; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm2
	; AVX512VL-NEXT: vpsllw $4, %ymm2, %ymm3			; AVX512VL-NEXT: vpsllw $4, %ymm2, %ymm3
	; AVX512VL-NEXT: vinserti64x4 $1, %ymm3, %zmm1, %zmm1			; AVX512VL-NEXT: vinserti64x4 $1, %ymm3, %zmm1, %zmm1
	; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm0			; AVX512VL-NEXT: vpsrlw $4, %ymm0, %ymm0
	; AVX512VL-NEXT: vpsrlw $4, %ymm2, %ymm2			; AVX512VL-NEXT: vpsrlw $4, %ymm2, %ymm2
	; AVX512VL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; AVX512VL-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; AVX512VL-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm1, %zmm0			; AVX512VL-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm1, %zmm0
	; AVX512VL-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: splatconstant_rotate_mask_v64i8:			; AVX512BW-LABEL: splatconstant_rotate_mask_v64i8:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsllw $4, %zmm0, %zmm1			; AVX512BW-NEXT: vpsllw $4, %zmm0, %zmm1
	; AVX512BW-NEXT: vpsrlw $4, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlw $4, %zmm0, %zmm0
	; AVX512BW-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm1, %zmm0			; AVX512BW-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm1, %zmm0
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512VLBW-LABEL: splatconstant_rotate_mask_v64i8:			; AVX512VLBW-LABEL: splatconstant_rotate_mask_v64i8:
	; AVX512VLBW: # %bb.0:			; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vpsllw $4, %zmm0, %zmm1			; AVX512VLBW-NEXT: vpsllw $4, %zmm0, %zmm1
	; AVX512VLBW-NEXT: vpsrlw $4, %zmm0, %zmm0			; AVX512VLBW-NEXT: vpsrlw $4, %zmm0, %zmm0
	; AVX512VLBW-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm1, %zmm0			; AVX512VLBW-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm1, %zmm0
	; AVX512VLBW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512VLBW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512VLBW-NEXT: retq			; AVX512VLBW-NEXT: retq
	;			;
	; AVX512VBMI2-LABEL: splatconstant_rotate_mask_v64i8:			; AVX512VBMI2-LABEL: splatconstant_rotate_mask_v64i8:
	; AVX512VBMI2: # %bb.0:			; AVX512VBMI2: # %bb.0:
	; AVX512VBMI2-NEXT: vpsllw $4, %zmm0, %zmm1			; AVX512VBMI2-NEXT: vpsllw $4, %zmm0, %zmm1
	; AVX512VBMI2-NEXT: vpsrlw $4, %zmm0, %zmm0			; AVX512VBMI2-NEXT: vpsrlw $4, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm1, %zmm0			; AVX512VBMI2-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm1, %zmm0
	; AVX512VBMI2-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512VBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: retq			; AVX512VBMI2-NEXT: retq
	;			;
	; AVX512VLVBMI2-LABEL: splatconstant_rotate_mask_v64i8:			; AVX512VLVBMI2-LABEL: splatconstant_rotate_mask_v64i8:
	; AVX512VLVBMI2: # %bb.0:			; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: vpsllw $4, %zmm0, %zmm1			; AVX512VLVBMI2-NEXT: vpsllw $4, %zmm0, %zmm1
	; AVX512VLVBMI2-NEXT: vpsrlw $4, %zmm0, %zmm0			; AVX512VLVBMI2-NEXT: vpsrlw $4, %zmm0, %zmm0
	; AVX512VLVBMI2-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm1, %zmm0			; AVX512VLVBMI2-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm1, %zmm0
	; AVX512VLVBMI2-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512VLVBMI2-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512VLVBMI2-NEXT: retq			; AVX512VLVBMI2-NEXT: retq
	%shl = shl <64 x i8> %a, <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>			%shl = shl <64 x i8> %a, <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>
	%lshr = lshr <64 x i8> %a, <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>			%lshr = lshr <64 x i8> %a, <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>
	%rmask = and <64 x i8> %lshr, <i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55>			%rmask = and <64 x i8> %lshr, <i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55, i8 55>
	%lmask = and <64 x i8> %shl, <i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33>			%lmask = and <64 x i8> %shl, <i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33, i8 33>
	%or = or <64 x i8> %lmask, %rmask			%or = or <64 x i8> %lmask, %rmask
	ret <64 x i8> %or			ret <64 x i8> %or
	}			}

llvm/test/CodeGen/X86/vector-shift-ashr-128.ll

	[1,171 lines not shown]
	; AVX512BW-NEXT: vpsraw %xmm1, %ymm0, %ymm0			; AVX512BW-NEXT: vpsraw %xmm1, %ymm0, %ymm0
	; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0			; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0
	; AVX512BW-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0			; AVX512BW-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0
	; AVX512BW-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512DQVL-LABEL: splatvar_modulo_shift_v16i8:			; AVX512DQVL-LABEL: splatvar_modulo_shift_v16i8:
	; AVX512DQVL: # %bb.0:			; AVX512DQVL: # %bb.0:
	; AVX512DQVL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512DQVL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512DQVL-NEXT: vpmovsxbd %xmm0, %zmm0			; AVX512DQVL-NEXT: vpmovsxbd %xmm0, %zmm0
	; AVX512DQVL-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,zero,zero,zero,zero,xmm1[1],zero,zero,zero,zero,zero,zero,zero			; AVX512DQVL-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,zero,zero,zero,zero,xmm1[1],zero,zero,zero,zero,zero,zero,zero
	; AVX512DQVL-NEXT: vpsrad %xmm1, %zmm0, %zmm0			; AVX512DQVL-NEXT: vpsrad %xmm1, %zmm0, %zmm0
	; AVX512DQVL-NEXT: vpmovdb %zmm0, %xmm0			; AVX512DQVL-NEXT: vpmovdb %zmm0, %xmm0
	; AVX512DQVL-NEXT: vzeroupper			; AVX512DQVL-NEXT: vzeroupper
	; AVX512DQVL-NEXT: retq			; AVX512DQVL-NEXT: retq
	;			;
	; AVX512BWVL-LABEL: splatvar_modulo_shift_v16i8:			; AVX512BWVL-LABEL: splatvar_modulo_shift_v16i8:
	; AVX512BWVL: # %bb.0:			; AVX512BWVL: # %bb.0:
	; AVX512BWVL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512BWVL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512BWVL-NEXT: vpmovsxbw %xmm0, %ymm0			; AVX512BWVL-NEXT: vpmovsxbw %xmm0, %ymm0
	; AVX512BWVL-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,zero,zero,zero,zero,xmm1[1],zero,zero,zero,zero,zero,zero,zero			; AVX512BWVL-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,zero,zero,zero,zero,xmm1[1],zero,zero,zero,zero,zero,zero,zero
	; AVX512BWVL-NEXT: vpsraw %xmm1, %ymm0, %ymm0			; AVX512BWVL-NEXT: vpsraw %xmm1, %ymm0, %ymm0
	; AVX512BWVL-NEXT: vpmovwb %ymm0, %xmm0			; AVX512BWVL-NEXT: vpmovwb %ymm0, %xmm0
	; AVX512BWVL-NEXT: vzeroupper			; AVX512BWVL-NEXT: vzeroupper
	; AVX512BWVL-NEXT: retq			; AVX512BWVL-NEXT: retq
	;			;
	; X86-SSE-LABEL: splatvar_modulo_shift_v16i8:			; X86-SSE-LABEL: splatvar_modulo_shift_v16i8:
	[531 lines not shown]
	; AVX512-NEXT: vpxor %xmm1, %xmm0, %xmm0			; AVX512-NEXT: vpxor %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpsubb %xmm1, %xmm0, %xmm0			; AVX512-NEXT: vpsubb %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	;			;
	; AVX512VL-LABEL: splatconstant_shift_v16i8:			; AVX512VL-LABEL: splatconstant_shift_v16i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsrlw $3, %xmm0, %xmm0			; AVX512VL-NEXT: vpsrlw $3, %xmm0, %xmm0
	; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	; AVX512VL-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0			; AVX512VL-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm0
	; AVX512VL-NEXT: vpsubb %xmm1, %xmm0, %xmm0			; AVX512VL-NEXT: vpsubb %xmm1, %xmm0, %xmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; X86-SSE-LABEL: splatconstant_shift_v16i8:			; X86-SSE-LABEL: splatconstant_shift_v16i8:
	; X86-SSE: # %bb.0:			; X86-SSE: # %bb.0:
	; X86-SSE-NEXT: psrlw $3, %xmm0			; X86-SSE-NEXT: psrlw $3, %xmm0
	; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0			; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE-NEXT: movdqa {{.*#+}} xmm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; X86-SSE-NEXT: movdqa {{.*#+}} xmm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	[76 lines not shown]

llvm/test/CodeGen/X86/vector-shift-ashr-256.ll

	[1,317 lines not shown]
	; AVX512DQVL-NEXT: vpbroadcastb %xmm1, %ymm1			; AVX512DQVL-NEXT: vpbroadcastb %xmm1, %ymm1
	; AVX512DQVL-NEXT: vpternlogq $108, %ymm0, %ymm2, %ymm1			; AVX512DQVL-NEXT: vpternlogq $108, %ymm0, %ymm2, %ymm1
	; AVX512DQVL-NEXT: vpsubb %ymm2, %ymm1, %ymm0			; AVX512DQVL-NEXT: vpsubb %ymm2, %ymm1, %ymm0
	; AVX512DQVL-NEXT: retq			; AVX512DQVL-NEXT: retq
	;			;
	; AVX512BWVL-LABEL: splatvar_modulo_shift_v32i8:			; AVX512BWVL-LABEL: splatvar_modulo_shift_v32i8:
	; AVX512BWVL: # %bb.0:			; AVX512BWVL: # %bb.0:
	; AVX512BWVL-NEXT: vpmovsxbw %ymm0, %zmm0			; AVX512BWVL-NEXT: vpmovsxbw %ymm0, %zmm0
	; AVX512BWVL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512BWVL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512BWVL-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,zero,zero,zero,zero,xmm1[1],zero,zero,zero,zero,zero,zero,zero			; AVX512BWVL-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,zero,zero,zero,zero,xmm1[1],zero,zero,zero,zero,zero,zero,zero
	; AVX512BWVL-NEXT: vpsraw %xmm1, %zmm0, %zmm0			; AVX512BWVL-NEXT: vpsraw %xmm1, %zmm0, %zmm0
	; AVX512BWVL-NEXT: vpmovwb %zmm0, %ymm0			; AVX512BWVL-NEXT: vpmovwb %zmm0, %ymm0
	; AVX512BWVL-NEXT: retq			; AVX512BWVL-NEXT: retq
	;			;
	; X86-AVX1-LABEL: splatvar_modulo_shift_v32i8:			; X86-AVX1-LABEL: splatvar_modulo_shift_v32i8:
	; X86-AVX1: # %bb.0:			; X86-AVX1: # %bb.0:
	; X86-AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2			; X86-AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2
	[655 lines not shown]
	; AVX512-NEXT: vpxor %ymm1, %ymm0, %ymm0			; AVX512-NEXT: vpxor %ymm1, %ymm0, %ymm0
	; AVX512-NEXT: vpsubb %ymm1, %ymm0, %ymm0			; AVX512-NEXT: vpsubb %ymm1, %ymm0, %ymm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	;			;
	; AVX512VL-LABEL: splatconstant_shift_v32i8:			; AVX512VL-LABEL: splatconstant_shift_v32i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsrlw $3, %ymm0, %ymm0			; AVX512VL-NEXT: vpsrlw $3, %ymm0, %ymm0
	; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	; AVX512VL-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm0			; AVX512VL-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm1, %ymm0
	; AVX512VL-NEXT: vpsubb %ymm1, %ymm0, %ymm0			; AVX512VL-NEXT: vpsubb %ymm1, %ymm0, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; X86-AVX1-LABEL: splatconstant_shift_v32i8:			; X86-AVX1-LABEL: splatconstant_shift_v32i8:
	; X86-AVX1: # %bb.0:			; X86-AVX1: # %bb.0:
	; X86-AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1			; X86-AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
	; X86-AVX1-NEXT: vpsrlw $3, %xmm1, %xmm1			; X86-AVX1-NEXT: vpsrlw $3, %xmm1, %xmm1
	; X86-AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [31,31,31,31,31,31,31,31,31,31,31,31,31,31,31,31]			; X86-AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [31,31,31,31,31,31,31,31,31,31,31,31,31,31,31,31]
	[193 lines not shown]

llvm/test/CodeGen/X86/vector-shift-ashr-512.ll

	[459 lines not shown]
	; AVX512DQ-NEXT: vpsubb %ymm3, %ymm0, %ymm0			; AVX512DQ-NEXT: vpsubb %ymm3, %ymm0, %ymm0
	; AVX512DQ-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0			; AVX512DQ-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
	; AVX512DQ-NEXT: retq			; AVX512DQ-NEXT: retq
	;			;
	; AVX512BW-LABEL: splatconstant_shift_v64i8:			; AVX512BW-LABEL: splatconstant_shift_v64i8:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsrlw $3, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlw $3, %zmm0, %zmm0
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	; AVX512BW-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm1, %zmm0			; AVX512BW-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm1, %zmm0
	; AVX512BW-NEXT: vpsubb %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsubb %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%shift = ashr <64 x i8> %a, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>			%shift = ashr <64 x i8> %a, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
	ret <64 x i8> %shift			ret <64 x i8> %shift
	}			}

	define <64 x i8> @ashr_const7_v64i8(<64 x i8> %a) {			define <64 x i8> @ashr_const7_v64i8(<64 x i8> %a) {
	; AVX512DQ-LABEL: ashr_const7_v64i8:			; AVX512DQ-LABEL: ashr_const7_v64i8:
	(29 lines not shown)

llvm/test/CodeGen/X86/vector-shift-ashr-sub128.ll

	(2,329 lines not shown)
	; AVX512-NEXT: vpxor %xmm1, %xmm0, %xmm0			; AVX512-NEXT: vpxor %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpsubb %xmm1, %xmm0, %xmm0			; AVX512-NEXT: vpsubb %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	;			;
	; AVX512VL-LABEL: splatconstant_shift_v8i8:			; AVX512VL-LABEL: splatconstant_shift_v8i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsrlw $3, %xmm0, %xmm0			; AVX512VL-NEXT: vpsrlw $3, %xmm0, %xmm0
	; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	; AVX512VL-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0			; AVX512VL-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm0
	; AVX512VL-NEXT: vpsubb %xmm1, %xmm0, %xmm0			; AVX512VL-NEXT: vpsubb %xmm1, %xmm0, %xmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; X86-SSE-LABEL: splatconstant_shift_v8i8:			; X86-SSE-LABEL: splatconstant_shift_v8i8:
	; X86-SSE: # %bb.0:			; X86-SSE: # %bb.0:
	; X86-SSE-NEXT: psrlw $3, %xmm0			; X86-SSE-NEXT: psrlw $3, %xmm0
	; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0			; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE-NEXT: movdqa {{.*#+}} xmm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; X86-SSE-NEXT: movdqa {{.*#+}} xmm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	(36 lines not shown)
	; AVX512-NEXT: vpxor %xmm1, %xmm0, %xmm0			; AVX512-NEXT: vpxor %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpsubb %xmm1, %xmm0, %xmm0			; AVX512-NEXT: vpsubb %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	;			;
	; AVX512VL-LABEL: splatconstant_shift_v4i8:			; AVX512VL-LABEL: splatconstant_shift_v4i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsrlw $3, %xmm0, %xmm0			; AVX512VL-NEXT: vpsrlw $3, %xmm0, %xmm0
	; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	; AVX512VL-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0			; AVX512VL-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm0
	; AVX512VL-NEXT: vpsubb %xmm1, %xmm0, %xmm0			; AVX512VL-NEXT: vpsubb %xmm1, %xmm0, %xmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; X86-SSE-LABEL: splatconstant_shift_v4i8:			; X86-SSE-LABEL: splatconstant_shift_v4i8:
	; X86-SSE: # %bb.0:			; X86-SSE: # %bb.0:
	; X86-SSE-NEXT: psrlw $3, %xmm0			; X86-SSE-NEXT: psrlw $3, %xmm0
	; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0			; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE-NEXT: movdqa {{.*#+}} xmm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; X86-SSE-NEXT: movdqa {{.*#+}} xmm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	(36 lines not shown)
	; AVX512-NEXT: vpxor %xmm1, %xmm0, %xmm0			; AVX512-NEXT: vpxor %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: vpsubb %xmm1, %xmm0, %xmm0			; AVX512-NEXT: vpsubb %xmm1, %xmm0, %xmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	;			;
	; AVX512VL-LABEL: splatconstant_shift_v2i8:			; AVX512VL-LABEL: splatconstant_shift_v2i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsrlw $3, %xmm0, %xmm0			; AVX512VL-NEXT: vpsrlw $3, %xmm0, %xmm0
	; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	; AVX512VL-NEXT: vpternlogq $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0			; AVX512VL-NEXT: vpternlogd $108, {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm0
	; AVX512VL-NEXT: vpsubb %xmm1, %xmm0, %xmm0			; AVX512VL-NEXT: vpsubb %xmm1, %xmm0, %xmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; X86-SSE-LABEL: splatconstant_shift_v2i8:			; X86-SSE-LABEL: splatconstant_shift_v2i8:
	; X86-SSE: # %bb.0:			; X86-SSE: # %bb.0:
	; X86-SSE-NEXT: psrlw $3, %xmm0			; X86-SSE-NEXT: psrlw $3, %xmm0
	; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0			; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE-NEXT: movdqa {{.*#+}} xmm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]			; X86-SSE-NEXT: movdqa {{.*#+}} xmm1 = [16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
	; X86-SSE-NEXT: pxor %xmm1, %xmm0			; X86-SSE-NEXT: pxor %xmm1, %xmm0
	; X86-SSE-NEXT: psubb %xmm1, %xmm0			; X86-SSE-NEXT: psubb %xmm1, %xmm0
	; X86-SSE-NEXT: retl			; X86-SSE-NEXT: retl
	%shift = ashr <2 x i8> %a, <i8 3, i8 3>			%shift = ashr <2 x i8> %a, <i8 3, i8 3>
	ret <2 x i8> %shift			ret <2 x i8> %shift
	}			}

llvm/test/CodeGen/X86/vector-shift-lshr-128.ll

	(974 lines not shown)
	; AVX512BW-NEXT: vpsrlw %xmm1, %ymm0, %ymm0			; AVX512BW-NEXT: vpsrlw %xmm1, %ymm0, %ymm0
	; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0			; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0
	; AVX512BW-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0			; AVX512BW-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0
	; AVX512BW-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512DQVL-LABEL: splatvar_modulo_shift_v16i8:			; AVX512DQVL-LABEL: splatvar_modulo_shift_v16i8:
	; AVX512DQVL: # %bb.0:			; AVX512DQVL: # %bb.0:
	; AVX512DQVL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512DQVL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512DQVL-NEXT: vpmovzxbd {{.*#+}} zmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero,xmm0[8],zero,zero,zero,xmm0[9],zero,zero,zero,xmm0[10],zero,zero,zero,xmm0[11],zero,zero,zero,xmm0[12],zero,zero,zero,xmm0[13],zero,zero,zero,xmm0[14],zero,zero,zero,xmm0[15],zero,zero,zero			; AVX512DQVL-NEXT: vpmovzxbd {{.*#+}} zmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero,xmm0[8],zero,zero,zero,xmm0[9],zero,zero,zero,xmm0[10],zero,zero,zero,xmm0[11],zero,zero,zero,xmm0[12],zero,zero,zero,xmm0[13],zero,zero,zero,xmm0[14],zero,zero,zero,xmm0[15],zero,zero,zero
	; AVX512DQVL-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,zero,zero,zero,zero,xmm1[1],zero,zero,zero,zero,zero,zero,zero			; AVX512DQVL-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,zero,zero,zero,zero,xmm1[1],zero,zero,zero,zero,zero,zero,zero
	; AVX512DQVL-NEXT: vpsrld %xmm1, %zmm0, %zmm0			; AVX512DQVL-NEXT: vpsrld %xmm1, %zmm0, %zmm0
	; AVX512DQVL-NEXT: vpmovdb %zmm0, %xmm0			; AVX512DQVL-NEXT: vpmovdb %zmm0, %xmm0
	; AVX512DQVL-NEXT: vzeroupper			; AVX512DQVL-NEXT: vzeroupper
	; AVX512DQVL-NEXT: retq			; AVX512DQVL-NEXT: retq
	;			;
	; AVX512BWVL-LABEL: splatvar_modulo_shift_v16i8:			; AVX512BWVL-LABEL: splatvar_modulo_shift_v16i8:
	; AVX512BWVL: # %bb.0:			; AVX512BWVL: # %bb.0:
	; AVX512BWVL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512BWVL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512BWVL-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero			; AVX512BWVL-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero
	; AVX512BWVL-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,zero,zero,zero,zero,xmm1[1],zero,zero,zero,zero,zero,zero,zero			; AVX512BWVL-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,zero,zero,zero,zero,xmm1[1],zero,zero,zero,zero,zero,zero,zero
	; AVX512BWVL-NEXT: vpsrlw %xmm1, %ymm0, %ymm0			; AVX512BWVL-NEXT: vpsrlw %xmm1, %ymm0, %ymm0
	; AVX512BWVL-NEXT: vpmovwb %ymm0, %xmm0			; AVX512BWVL-NEXT: vpmovwb %ymm0, %xmm0
	; AVX512BWVL-NEXT: vzeroupper			; AVX512BWVL-NEXT: vzeroupper
	; AVX512BWVL-NEXT: retq			; AVX512BWVL-NEXT: retq
	;			;
	; X86-SSE-LABEL: splatvar_modulo_shift_v16i8:			; X86-SSE-LABEL: splatvar_modulo_shift_v16i8:
	(456 lines not shown)
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpsrlw $3, %xmm0, %xmm0			; AVX512-NEXT: vpsrlw $3, %xmm0, %xmm0
	; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	;			;
	; AVX512VL-LABEL: splatconstant_shift_v16i8:			; AVX512VL-LABEL: splatconstant_shift_v16i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsrlw $3, %xmm0, %xmm0			; AVX512VL-NEXT: vpsrlw $3, %xmm0, %xmm0
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; X86-SSE-LABEL: splatconstant_shift_v16i8:			; X86-SSE-LABEL: splatconstant_shift_v16i8:
	; X86-SSE: # %bb.0:			; X86-SSE: # %bb.0:
	; X86-SSE-NEXT: psrlw $3, %xmm0			; X86-SSE-NEXT: psrlw $3, %xmm0
	; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0			; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE-NEXT: retl			; X86-SSE-NEXT: retl
	%shift = lshr <16 x i8> %a, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>			%shift = lshr <16 x i8> %a, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
	(139 lines not shown)

llvm/test/CodeGen/X86/vector-shift-lshr-256.ll

	(440 lines not shown)
	; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0			; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512DQVL-LABEL: var_shift_v32i8:			; AVX512DQVL-LABEL: var_shift_v32i8:
	; AVX512DQVL: # %bb.0:			; AVX512DQVL: # %bb.0:
	; AVX512DQVL-NEXT: vpsllw $5, %ymm1, %ymm1			; AVX512DQVL-NEXT: vpsllw $5, %ymm1, %ymm1
	; AVX512DQVL-NEXT: vpsrlw $4, %ymm0, %ymm2			; AVX512DQVL-NEXT: vpsrlw $4, %ymm0, %ymm2
	; AVX512DQVL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2			; AVX512DQVL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm2
	; AVX512DQVL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0			; AVX512DQVL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0
	; AVX512DQVL-NEXT: vpsrlw $2, %ymm0, %ymm2			; AVX512DQVL-NEXT: vpsrlw $2, %ymm0, %ymm2
	; AVX512DQVL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2			; AVX512DQVL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm2
	; AVX512DQVL-NEXT: vpaddb %ymm1, %ymm1, %ymm1			; AVX512DQVL-NEXT: vpaddb %ymm1, %ymm1, %ymm1
	; AVX512DQVL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0			; AVX512DQVL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0
	; AVX512DQVL-NEXT: vpsrlw $1, %ymm0, %ymm2			; AVX512DQVL-NEXT: vpsrlw $1, %ymm0, %ymm2
	; AVX512DQVL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2			; AVX512DQVL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm2
	; AVX512DQVL-NEXT: vpaddb %ymm1, %ymm1, %ymm1			; AVX512DQVL-NEXT: vpaddb %ymm1, %ymm1, %ymm1
	; AVX512DQVL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0			; AVX512DQVL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0
	; AVX512DQVL-NEXT: retq			; AVX512DQVL-NEXT: retq
	;			;
	; AVX512BWVL-LABEL: var_shift_v32i8:			; AVX512BWVL-LABEL: var_shift_v32i8:
	; AVX512BWVL: # %bb.0:			; AVX512BWVL: # %bb.0:
	; AVX512BWVL-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero,ymm1[16],zero,ymm1[17],zero,ymm1[18],zero,ymm1[19],zero,ymm1[20],zero,ymm1[21],zero,ymm1[22],zero,ymm1[23],zero,ymm1[24],zero,ymm1[25],zero,ymm1[26],zero,ymm1[27],zero,ymm1[28],zero,ymm1[29],zero,ymm1[30],zero,ymm1[31],zero			; AVX512BWVL-NEXT: vpmovzxbw {{.*#+}} zmm1 = ymm1[0],zero,ymm1[1],zero,ymm1[2],zero,ymm1[3],zero,ymm1[4],zero,ymm1[5],zero,ymm1[6],zero,ymm1[7],zero,ymm1[8],zero,ymm1[9],zero,ymm1[10],zero,ymm1[11],zero,ymm1[12],zero,ymm1[13],zero,ymm1[14],zero,ymm1[15],zero,ymm1[16],zero,ymm1[17],zero,ymm1[18],zero,ymm1[19],zero,ymm1[20],zero,ymm1[21],zero,ymm1[22],zero,ymm1[23],zero,ymm1[24],zero,ymm1[25],zero,ymm1[26],zero,ymm1[27],zero,ymm1[28],zero,ymm1[29],zero,ymm1[30],zero,ymm1[31],zero
	; AVX512BWVL-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero			; AVX512BWVL-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero
	(623 lines not shown)
	; AVX512DQVL-NEXT: vpsrlw $8, %xmm1, %xmm1			; AVX512DQVL-NEXT: vpsrlw $8, %xmm1, %xmm1
	; AVX512DQVL-NEXT: vpbroadcastb %xmm1, %ymm1			; AVX512DQVL-NEXT: vpbroadcastb %xmm1, %ymm1
	; AVX512DQVL-NEXT: vpand %ymm1, %ymm0, %ymm0			; AVX512DQVL-NEXT: vpand %ymm1, %ymm0, %ymm0
	; AVX512DQVL-NEXT: retq			; AVX512DQVL-NEXT: retq
	;			;
	; AVX512BWVL-LABEL: splatvar_modulo_shift_v32i8:			; AVX512BWVL-LABEL: splatvar_modulo_shift_v32i8:
	; AVX512BWVL: # %bb.0:			; AVX512BWVL: # %bb.0:
	; AVX512BWVL-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero			; AVX512BWVL-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero
	; AVX512BWVL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512BWVL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512BWVL-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,zero,zero,zero,zero,xmm1[1],zero,zero,zero,zero,zero,zero,zero			; AVX512BWVL-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,zero,zero,zero,zero,xmm1[1],zero,zero,zero,zero,zero,zero,zero
	; AVX512BWVL-NEXT: vpsrlw %xmm1, %zmm0, %zmm0			; AVX512BWVL-NEXT: vpsrlw %xmm1, %zmm0, %zmm0
	; AVX512BWVL-NEXT: vpmovwb %zmm0, %ymm0			; AVX512BWVL-NEXT: vpmovwb %zmm0, %ymm0
	; AVX512BWVL-NEXT: retq			; AVX512BWVL-NEXT: retq
	;			;
	; X86-AVX1-LABEL: splatvar_modulo_shift_v32i8:			; X86-AVX1-LABEL: splatvar_modulo_shift_v32i8:
	; X86-AVX1: # %bb.0:			; X86-AVX1: # %bb.0:
	; X86-AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2			; X86-AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2
	(570 lines not shown)
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpsrlw $3, %ymm0, %ymm0			; AVX512-NEXT: vpsrlw $3, %ymm0, %ymm0
	; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	;			;
	; AVX512VL-LABEL: splatconstant_shift_v32i8:			; AVX512VL-LABEL: splatconstant_shift_v32i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsrlw $3, %ymm0, %ymm0			; AVX512VL-NEXT: vpsrlw $3, %ymm0, %ymm0
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; X86-AVX1-LABEL: splatconstant_shift_v32i8:			; X86-AVX1-LABEL: splatconstant_shift_v32i8:
	; X86-AVX1: # %bb.0:			; X86-AVX1: # %bb.0:
	; X86-AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1			; X86-AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
	; X86-AVX1-NEXT: vpsrlw $3, %xmm1, %xmm1			; X86-AVX1-NEXT: vpsrlw $3, %xmm1, %xmm1
	; X86-AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [31,31,31,31,31,31,31,31,31,31,31,31,31,31,31,31]			; X86-AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [31,31,31,31,31,31,31,31,31,31,31,31,31,31,31,31]
	; X86-AVX1-NEXT: vpand %xmm2, %xmm1, %xmm1			; X86-AVX1-NEXT: vpand %xmm2, %xmm1, %xmm1
	(150 lines not shown)

llvm/test/CodeGen/X86/vector-shift-lshr-512.ll

	(80 lines not shown)
	; AVX512DQ-NEXT: vpaddb %ymm1, %ymm1, %ymm1			; AVX512DQ-NEXT: vpaddb %ymm1, %ymm1, %ymm1
	; AVX512DQ-NEXT: vpblendvb %ymm1, %ymm3, %ymm0, %ymm0			; AVX512DQ-NEXT: vpblendvb %ymm1, %ymm3, %ymm0, %ymm0
	; AVX512DQ-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; AVX512DQ-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; AVX512DQ-NEXT: retq			; AVX512DQ-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_shift_v64i8:			; AVX512BW-LABEL: var_shift_v64i8:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsrlw $4, %zmm0, %zmm2			; AVX512BW-NEXT: vpsrlw $4, %zmm0, %zmm2
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512BW-NEXT: vpsllw $5, %zmm1, %zmm1			; AVX512BW-NEXT: vpsllw $5, %zmm1, %zmm1
	; AVX512BW-NEXT: vpmovb2m %zmm1, %k1			; AVX512BW-NEXT: vpmovb2m %zmm1, %k1
	; AVX512BW-NEXT: vmovdqu8 %zmm2, %zmm0 {%k1}			; AVX512BW-NEXT: vmovdqu8 %zmm2, %zmm0 {%k1}
	; AVX512BW-NEXT: vpsrlw $2, %zmm0, %zmm2			; AVX512BW-NEXT: vpsrlw $2, %zmm0, %zmm2
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512BW-NEXT: vpaddb %zmm1, %zmm1, %zmm1			; AVX512BW-NEXT: vpaddb %zmm1, %zmm1, %zmm1
	; AVX512BW-NEXT: vpmovb2m %zmm1, %k1			; AVX512BW-NEXT: vpmovb2m %zmm1, %k1
	; AVX512BW-NEXT: vmovdqu8 %zmm2, %zmm0 {%k1}			; AVX512BW-NEXT: vmovdqu8 %zmm2, %zmm0 {%k1}
	; AVX512BW-NEXT: vpsrlw $1, %zmm0, %zmm2			; AVX512BW-NEXT: vpsrlw $1, %zmm0, %zmm2
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512BW-NEXT: vpaddb %zmm1, %zmm1, %zmm1			; AVX512BW-NEXT: vpaddb %zmm1, %zmm1, %zmm1
	; AVX512BW-NEXT: vpmovb2m %zmm1, %k1			; AVX512BW-NEXT: vpmovb2m %zmm1, %k1
	; AVX512BW-NEXT: vmovdqu8 %zmm2, %zmm0 {%k1}			; AVX512BW-NEXT: vmovdqu8 %zmm2, %zmm0 {%k1}
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%shift = lshr <64 x i8> %a, %b			%shift = lshr <64 x i8> %a, %b
	ret <64 x i8> %shift			ret <64 x i8> %shift
	}			}

	(280 lines not shown)

	define <64 x i8> @splatconstant_shift_v64i8(<64 x i8> %a) nounwind {			define <64 x i8> @splatconstant_shift_v64i8(<64 x i8> %a) nounwind {
	; AVX512DQ-LABEL: splatconstant_shift_v64i8:			; AVX512DQ-LABEL: splatconstant_shift_v64i8:
	; AVX512DQ: # %bb.0:			; AVX512DQ: # %bb.0:
	; AVX512DQ-NEXT: vpsrlw $3, %ymm0, %ymm1			; AVX512DQ-NEXT: vpsrlw $3, %ymm0, %ymm1
	; AVX512DQ-NEXT: vextracti64x4 $1, %zmm0, %ymm0			; AVX512DQ-NEXT: vextracti64x4 $1, %zmm0, %ymm0
	; AVX512DQ-NEXT: vpsrlw $3, %ymm0, %ymm0			; AVX512DQ-NEXT: vpsrlw $3, %ymm0, %ymm0
	; AVX512DQ-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0			; AVX512DQ-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0
	; AVX512DQ-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512DQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512DQ-NEXT: retq			; AVX512DQ-NEXT: retq
	;			;
	; AVX512BW-LABEL: splatconstant_shift_v64i8:			; AVX512BW-LABEL: splatconstant_shift_v64i8:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsrlw $3, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlw $3, %zmm0, %zmm0
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%shift = lshr <64 x i8> %a, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>			%shift = lshr <64 x i8> %a, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
	ret <64 x i8> %shift			ret <64 x i8> %shift
	}			}

llvm/test/CodeGen/X86/vector-shift-lshr-sub128.ll

	(2,028 lines not shown)
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpsrlw $3, %xmm0, %xmm0			; AVX512-NEXT: vpsrlw $3, %xmm0, %xmm0
	; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	;			;
	; AVX512VL-LABEL: splatconstant_shift_v8i8:			; AVX512VL-LABEL: splatconstant_shift_v8i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsrlw $3, %xmm0, %xmm0			; AVX512VL-NEXT: vpsrlw $3, %xmm0, %xmm0
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; X86-SSE-LABEL: splatconstant_shift_v8i8:			; X86-SSE-LABEL: splatconstant_shift_v8i8:
	; X86-SSE: # %bb.0:			; X86-SSE: # %bb.0:
	; X86-SSE-NEXT: psrlw $3, %xmm0			; X86-SSE-NEXT: psrlw $3, %xmm0
	; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0			; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE-NEXT: retl			; X86-SSE-NEXT: retl
	%shift = lshr <8 x i8> %a, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>			%shift = lshr <8 x i8> %a, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
	(22 lines not shown)
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpsrlw $3, %xmm0, %xmm0			; AVX512-NEXT: vpsrlw $3, %xmm0, %xmm0
	; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	;			;
	; AVX512VL-LABEL: splatconstant_shift_v4i8:			; AVX512VL-LABEL: splatconstant_shift_v4i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsrlw $3, %xmm0, %xmm0			; AVX512VL-NEXT: vpsrlw $3, %xmm0, %xmm0
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; X86-SSE-LABEL: splatconstant_shift_v4i8:			; X86-SSE-LABEL: splatconstant_shift_v4i8:
	; X86-SSE: # %bb.0:			; X86-SSE: # %bb.0:
	; X86-SSE-NEXT: psrlw $3, %xmm0			; X86-SSE-NEXT: psrlw $3, %xmm0
	; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0			; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE-NEXT: retl			; X86-SSE-NEXT: retl
	%shift = lshr <4 x i8> %a, <i8 3, i8 3, i8 3, i8 3>			%shift = lshr <4 x i8> %a, <i8 3, i8 3, i8 3, i8 3>
	(22 lines not shown)
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpsrlw $3, %xmm0, %xmm0			; AVX512-NEXT: vpsrlw $3, %xmm0, %xmm0
	; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	;			;
	; AVX512VL-LABEL: splatconstant_shift_v2i8:			; AVX512VL-LABEL: splatconstant_shift_v2i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsrlw $3, %xmm0, %xmm0			; AVX512VL-NEXT: vpsrlw $3, %xmm0, %xmm0
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; X86-SSE-LABEL: splatconstant_shift_v2i8:			; X86-SSE-LABEL: splatconstant_shift_v2i8:
	; X86-SSE: # %bb.0:			; X86-SSE: # %bb.0:
	; X86-SSE-NEXT: psrlw $3, %xmm0			; X86-SSE-NEXT: psrlw $3, %xmm0
	; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0			; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE-NEXT: retl			; X86-SSE-NEXT: retl
	%shift = lshr <2 x i8> %a, <i8 3, i8 3>			%shift = lshr <2 x i8> %a, <i8 3, i8 3>
	ret <2 x i8> %shift			ret <2 x i8> %shift
	}			}

llvm/test/CodeGen/X86/vector-shift-shl-128.ll

	(878 lines not shown)
	; AVX512BW-NEXT: vpsllw %xmm1, %ymm0, %ymm0			; AVX512BW-NEXT: vpsllw %xmm1, %ymm0, %ymm0
	; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0			; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0
	; AVX512BW-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0			; AVX512BW-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0
	; AVX512BW-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512DQVL-LABEL: splatvar_modulo_shift_v16i8:			; AVX512DQVL-LABEL: splatvar_modulo_shift_v16i8:
	; AVX512DQVL: # %bb.0:			; AVX512DQVL: # %bb.0:
	; AVX512DQVL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512DQVL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512DQVL-NEXT: vpmovzxbd {{.*#+}} zmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero,xmm0[8],zero,zero,zero,xmm0[9],zero,zero,zero,xmm0[10],zero,zero,zero,xmm0[11],zero,zero,zero,xmm0[12],zero,zero,zero,xmm0[13],zero,zero,zero,xmm0[14],zero,zero,zero,xmm0[15],zero,zero,zero			; AVX512DQVL-NEXT: vpmovzxbd {{.*#+}} zmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero,xmm0[8],zero,zero,zero,xmm0[9],zero,zero,zero,xmm0[10],zero,zero,zero,xmm0[11],zero,zero,zero,xmm0[12],zero,zero,zero,xmm0[13],zero,zero,zero,xmm0[14],zero,zero,zero,xmm0[15],zero,zero,zero
	; AVX512DQVL-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,zero,zero,zero,zero,xmm1[1],zero,zero,zero,zero,zero,zero,zero			; AVX512DQVL-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,zero,zero,zero,zero,xmm1[1],zero,zero,zero,zero,zero,zero,zero
	; AVX512DQVL-NEXT: vpslld %xmm1, %zmm0, %zmm0			; AVX512DQVL-NEXT: vpslld %xmm1, %zmm0, %zmm0
	; AVX512DQVL-NEXT: vpmovdb %zmm0, %xmm0			; AVX512DQVL-NEXT: vpmovdb %zmm0, %xmm0
	; AVX512DQVL-NEXT: vzeroupper			; AVX512DQVL-NEXT: vzeroupper
	; AVX512DQVL-NEXT: retq			; AVX512DQVL-NEXT: retq
	;			;
	; AVX512BWVL-LABEL: splatvar_modulo_shift_v16i8:			; AVX512BWVL-LABEL: splatvar_modulo_shift_v16i8:
	; AVX512BWVL: # %bb.0:			; AVX512BWVL: # %bb.0:
	; AVX512BWVL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512BWVL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512BWVL-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero			; AVX512BWVL-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero
	; AVX512BWVL-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,zero,zero,zero,zero,xmm1[1],zero,zero,zero,zero,zero,zero,zero			; AVX512BWVL-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,zero,zero,zero,zero,xmm1[1],zero,zero,zero,zero,zero,zero,zero
	; AVX512BWVL-NEXT: vpsllw %xmm1, %ymm0, %ymm0			; AVX512BWVL-NEXT: vpsllw %xmm1, %ymm0, %ymm0
	; AVX512BWVL-NEXT: vpmovwb %ymm0, %xmm0			; AVX512BWVL-NEXT: vpmovwb %ymm0, %xmm0
	; AVX512BWVL-NEXT: vzeroupper			; AVX512BWVL-NEXT: vzeroupper
	; AVX512BWVL-NEXT: retq			; AVX512BWVL-NEXT: retq
	;			;
	; X86-SSE-LABEL: splatvar_modulo_shift_v16i8:			; X86-SSE-LABEL: splatvar_modulo_shift_v16i8:
	(416 lines not shown)
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpsllw $3, %xmm0, %xmm0			; AVX512-NEXT: vpsllw $3, %xmm0, %xmm0
	; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	;			;
	; AVX512VL-LABEL: splatconstant_shift_v16i8:			; AVX512VL-LABEL: splatconstant_shift_v16i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsllw $3, %xmm0, %xmm0			; AVX512VL-NEXT: vpsllw $3, %xmm0, %xmm0
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; X86-SSE-LABEL: splatconstant_shift_v16i8:			; X86-SSE-LABEL: splatconstant_shift_v16i8:
	; X86-SSE: # %bb.0:			; X86-SSE: # %bb.0:
	; X86-SSE-NEXT: psllw $3, %xmm0			; X86-SSE-NEXT: psllw $3, %xmm0
	; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0			; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE-NEXT: retl			; X86-SSE-NEXT: retl
	%shift = shl <16 x i8> %a, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>			%shift = shl <16 x i8> %a, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
	ret <16 x i8> %shift			ret <16 x i8> %shift
	}			}

llvm/test/CodeGen/X86/vector-shift-shl-256.ll

	[... 375 lines not shown ...]
	; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsllvw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0			; AVX512BW-NEXT: vpmovwb %zmm0, %ymm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512DQVL-LABEL: var_shift_v32i8:			; AVX512DQVL-LABEL: var_shift_v32i8:
	; AVX512DQVL: # %bb.0:			; AVX512DQVL: # %bb.0:
	; AVX512DQVL-NEXT: vpsllw $5, %ymm1, %ymm1			; AVX512DQVL-NEXT: vpsllw $5, %ymm1, %ymm1
	; AVX512DQVL-NEXT: vpsllw $4, %ymm0, %ymm2			; AVX512DQVL-NEXT: vpsllw $4, %ymm0, %ymm2
	; AVX512DQVL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2			; AVX512DQVL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm2
	; AVX512DQVL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0			; AVX512DQVL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0
	; AVX512DQVL-NEXT: vpsllw $2, %ymm0, %ymm2			; AVX512DQVL-NEXT: vpsllw $2, %ymm0, %ymm2
	; AVX512DQVL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2			; AVX512DQVL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm2, %ymm2
	; AVX512DQVL-NEXT: vpaddb %ymm1, %ymm1, %ymm1			; AVX512DQVL-NEXT: vpaddb %ymm1, %ymm1, %ymm1
	; AVX512DQVL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0			; AVX512DQVL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0
	; AVX512DQVL-NEXT: vpaddb %ymm0, %ymm0, %ymm2			; AVX512DQVL-NEXT: vpaddb %ymm0, %ymm0, %ymm2
	; AVX512DQVL-NEXT: vpaddb %ymm1, %ymm1, %ymm1			; AVX512DQVL-NEXT: vpaddb %ymm1, %ymm1, %ymm1
	; AVX512DQVL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0			; AVX512DQVL-NEXT: vpblendvb %ymm1, %ymm2, %ymm0, %ymm0
	; AVX512DQVL-NEXT: retq			; AVX512DQVL-NEXT: retq
	;			;
	; AVX512BWVL-LABEL: var_shift_v32i8:			; AVX512BWVL-LABEL: var_shift_v32i8:
	[... 612 lines not shown ...]
	; AVX512DQVL-NEXT: vpsllw %xmm1, %xmm2, %xmm1			; AVX512DQVL-NEXT: vpsllw %xmm1, %xmm2, %xmm1
	; AVX512DQVL-NEXT: vpbroadcastb %xmm1, %ymm1			; AVX512DQVL-NEXT: vpbroadcastb %xmm1, %ymm1
	; AVX512DQVL-NEXT: vpand %ymm1, %ymm0, %ymm0			; AVX512DQVL-NEXT: vpand %ymm1, %ymm0, %ymm0
	; AVX512DQVL-NEXT: retq			; AVX512DQVL-NEXT: retq
	;			;
	; AVX512BWVL-LABEL: splatvar_modulo_shift_v32i8:			; AVX512BWVL-LABEL: splatvar_modulo_shift_v32i8:
	; AVX512BWVL: # %bb.0:			; AVX512BWVL: # %bb.0:
	; AVX512BWVL-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero			; AVX512BWVL-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero
	; AVX512BWVL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1			; AVX512BWVL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm1, %xmm1
	; AVX512BWVL-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,zero,zero,zero,zero,xmm1[1],zero,zero,zero,zero,zero,zero,zero			; AVX512BWVL-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,zero,zero,zero,zero,xmm1[1],zero,zero,zero,zero,zero,zero,zero
	; AVX512BWVL-NEXT: vpsllw %xmm1, %zmm0, %zmm0			; AVX512BWVL-NEXT: vpsllw %xmm1, %zmm0, %zmm0
	; AVX512BWVL-NEXT: vpmovwb %zmm0, %ymm0			; AVX512BWVL-NEXT: vpmovwb %zmm0, %ymm0
	; AVX512BWVL-NEXT: retq			; AVX512BWVL-NEXT: retq
	;			;
	; X86-AVX1-LABEL: splatvar_modulo_shift_v32i8:			; X86-AVX1-LABEL: splatvar_modulo_shift_v32i8:
	; X86-AVX1: # %bb.0:			; X86-AVX1: # %bb.0:
	; X86-AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2			; X86-AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2
	[... 538 lines not shown ...]
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpsllw $3, %ymm0, %ymm0			; AVX512-NEXT: vpsllw $3, %ymm0, %ymm0
	; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	;			;
	; AVX512VL-LABEL: splatconstant_shift_v32i8:			; AVX512VL-LABEL: splatconstant_shift_v32i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsllw $3, %ymm0, %ymm0			; AVX512VL-NEXT: vpsllw $3, %ymm0, %ymm0
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; X86-AVX1-LABEL: splatconstant_shift_v32i8:			; X86-AVX1-LABEL: splatconstant_shift_v32i8:
	; X86-AVX1: # %bb.0:			; X86-AVX1: # %bb.0:
	; X86-AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1			; X86-AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
	; X86-AVX1-NEXT: vpsllw $3, %xmm1, %xmm1			; X86-AVX1-NEXT: vpsllw $3, %xmm1, %xmm1
	; X86-AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [248,248,248,248,248,248,248,248,248,248,248,248,248,248,248,248]			; X86-AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [248,248,248,248,248,248,248,248,248,248,248,248,248,248,248,248]
	; X86-AVX1-NEXT: vpand %xmm2, %xmm1, %xmm1			; X86-AVX1-NEXT: vpand %xmm2, %xmm1, %xmm1
	[... 67 lines not shown ...]

llvm/test/CodeGen/X86/vector-shift-shl-512.ll

	[... 77 lines not shown ...]
	; AVX512DQ-NEXT: vpaddb %ymm1, %ymm1, %ymm1			; AVX512DQ-NEXT: vpaddb %ymm1, %ymm1, %ymm1
	; AVX512DQ-NEXT: vpblendvb %ymm1, %ymm3, %ymm0, %ymm0			; AVX512DQ-NEXT: vpblendvb %ymm1, %ymm3, %ymm0, %ymm0
	; AVX512DQ-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; AVX512DQ-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; AVX512DQ-NEXT: retq			; AVX512DQ-NEXT: retq
	;			;
	; AVX512BW-LABEL: var_shift_v64i8:			; AVX512BW-LABEL: var_shift_v64i8:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsllw $4, %zmm0, %zmm2			; AVX512BW-NEXT: vpsllw $4, %zmm0, %zmm2
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512BW-NEXT: vpsllw $5, %zmm1, %zmm1			; AVX512BW-NEXT: vpsllw $5, %zmm1, %zmm1
	; AVX512BW-NEXT: vpmovb2m %zmm1, %k1			; AVX512BW-NEXT: vpmovb2m %zmm1, %k1
	; AVX512BW-NEXT: vmovdqu8 %zmm2, %zmm0 {%k1}			; AVX512BW-NEXT: vmovdqu8 %zmm2, %zmm0 {%k1}
	; AVX512BW-NEXT: vpsllw $2, %zmm0, %zmm2			; AVX512BW-NEXT: vpsllw $2, %zmm0, %zmm2
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm2			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm2, %zmm2
	; AVX512BW-NEXT: vpaddb %zmm1, %zmm1, %zmm1			; AVX512BW-NEXT: vpaddb %zmm1, %zmm1, %zmm1
	; AVX512BW-NEXT: vpmovb2m %zmm1, %k1			; AVX512BW-NEXT: vpmovb2m %zmm1, %k1
	; AVX512BW-NEXT: vmovdqu8 %zmm2, %zmm0 {%k1}			; AVX512BW-NEXT: vmovdqu8 %zmm2, %zmm0 {%k1}
	; AVX512BW-NEXT: vpaddb %zmm1, %zmm1, %zmm1			; AVX512BW-NEXT: vpaddb %zmm1, %zmm1, %zmm1
	; AVX512BW-NEXT: vpmovb2m %zmm1, %k1			; AVX512BW-NEXT: vpmovb2m %zmm1, %k1
	; AVX512BW-NEXT: vpaddb %zmm0, %zmm0, %zmm0 {%k1}			; AVX512BW-NEXT: vpaddb %zmm0, %zmm0, %zmm0 {%k1}
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%shift = shl <64 x i8> %a, %b			%shift = shl <64 x i8> %a, %b
	[... 275 lines not shown ...]

	define <64 x i8> @splatconstant_shift_v64i8(<64 x i8> %a) nounwind {			define <64 x i8> @splatconstant_shift_v64i8(<64 x i8> %a) nounwind {
	; AVX512DQ-LABEL: splatconstant_shift_v64i8:			; AVX512DQ-LABEL: splatconstant_shift_v64i8:
	; AVX512DQ: # %bb.0:			; AVX512DQ: # %bb.0:
	; AVX512DQ-NEXT: vpsllw $3, %ymm0, %ymm1			; AVX512DQ-NEXT: vpsllw $3, %ymm0, %ymm1
	; AVX512DQ-NEXT: vextracti64x4 $1, %zmm0, %ymm0			; AVX512DQ-NEXT: vextracti64x4 $1, %zmm0, %ymm0
	; AVX512DQ-NEXT: vpsllw $3, %ymm0, %ymm0			; AVX512DQ-NEXT: vpsllw $3, %ymm0, %ymm0
	; AVX512DQ-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0			; AVX512DQ-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0
	; AVX512DQ-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512DQ-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512DQ-NEXT: retq			; AVX512DQ-NEXT: retq
	;			;
	; AVX512BW-LABEL: splatconstant_shift_v64i8:			; AVX512BW-LABEL: splatconstant_shift_v64i8:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpsllw $3, %zmm0, %zmm0			; AVX512BW-NEXT: vpsllw $3, %zmm0, %zmm0
	; AVX512BW-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512BW-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to16}, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%shift = shl <64 x i8> %a, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>			%shift = shl <64 x i8> %a, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
	ret <64 x i8> %shift			ret <64 x i8> %shift
	}			}
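The `splatconstant_shift` checks above implement a per-byte `shl` by 3 as a 16-bit `vpsllw $3` followed by an AND with a constant, which the X86-AVX1 checks print as 248 (0xF8) in every byte. The mask clears the three bits that the word shift carries from each low byte into its high-byte neighbour. Because every byte of the mask is identical, the constant is a splat at any element width, which is what allows the full-width load fold to be rewritten as a `{1toN}` broadcast fold. The Python sketch below is illustrative only: the function names are hypothetical and it is not code from the patch. It checks the shift-and-mask identity exhaustively for one pair of adjacent bytes.

```python
def shl_bytes_via_word_shift(lo, hi, amount):
    """Shift two adjacent bytes left using one 16-bit shift plus a byte mask.

    Models one 16-bit lane of `vpsllw $amount` followed by the AND with the
    splatted byte mask. Returns the resulting (low byte, high byte).
    """
    word = (hi << 8) | lo                # two i8 lanes viewed as one i16 lane
    shifted = (word << amount) & 0xFFFF  # the 16-bit shift
    mask = (0xFF << amount) & 0xFF       # 0xF8 (248) when amount == 3
    splat = (mask << 8) | mask           # same mask in both byte positions
    masked = shifted & splat             # the AND with the constant
    return masked & 0xFF, masked >> 8


def check_all(amount=3):
    """Compare against a true per-byte shift for every (lo, hi) pair."""
    for lo in range(256):
        for hi in range(256):
            want = ((lo << amount) & 0xFF, (hi << amount) & 0xFF)
            if shl_bytes_via_word_shift(lo, hi, amount) != want:
                return False
    return True
```

Calling `check_all(3)` walks all 65,536 byte pairs; without the mask, the top bits of the low byte would leak into the bottom of the high byte.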

llvm/test/CodeGen/X86/vector-shift-shl-sub128.ll

	[... 1,819 lines not shown ...]
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpsllw $3, %xmm0, %xmm0			; AVX512-NEXT: vpsllw $3, %xmm0, %xmm0
	; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	;			;
	; AVX512VL-LABEL: splatconstant_shift_v8i8:			; AVX512VL-LABEL: splatconstant_shift_v8i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsllw $3, %xmm0, %xmm0			; AVX512VL-NEXT: vpsllw $3, %xmm0, %xmm0
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; X86-SSE-LABEL: splatconstant_shift_v8i8:			; X86-SSE-LABEL: splatconstant_shift_v8i8:
	; X86-SSE: # %bb.0:			; X86-SSE: # %bb.0:
	; X86-SSE-NEXT: psllw $3, %xmm0			; X86-SSE-NEXT: psllw $3, %xmm0
	; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0			; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE-NEXT: retl			; X86-SSE-NEXT: retl
	%shift = shl <8 x i8> %a, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>			%shift = shl <8 x i8> %a, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
	[... 22 lines not shown ...]
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpsllw $3, %xmm0, %xmm0			; AVX512-NEXT: vpsllw $3, %xmm0, %xmm0
	; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	;			;
	; AVX512VL-LABEL: splatconstant_shift_v4i8:			; AVX512VL-LABEL: splatconstant_shift_v4i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsllw $3, %xmm0, %xmm0			; AVX512VL-NEXT: vpsllw $3, %xmm0, %xmm0
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; X86-SSE-LABEL: splatconstant_shift_v4i8:			; X86-SSE-LABEL: splatconstant_shift_v4i8:
	; X86-SSE: # %bb.0:			; X86-SSE: # %bb.0:
	; X86-SSE-NEXT: psllw $3, %xmm0			; X86-SSE-NEXT: psllw $3, %xmm0
	; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0			; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE-NEXT: retl			; X86-SSE-NEXT: retl
	%shift = shl <4 x i8> %a, <i8 3, i8 3, i8 3, i8 3>			%shift = shl <4 x i8> %a, <i8 3, i8 3, i8 3, i8 3>
	[... 22 lines not shown ...]
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vpsllw $3, %xmm0, %xmm0			; AVX512-NEXT: vpsllw $3, %xmm0, %xmm0
	; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	;			;
	; AVX512VL-LABEL: splatconstant_shift_v2i8:			; AVX512VL-LABEL: splatconstant_shift_v2i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsllw $3, %xmm0, %xmm0			; AVX512VL-NEXT: vpsllw $3, %xmm0, %xmm0
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; X86-SSE-LABEL: splatconstant_shift_v2i8:			; X86-SSE-LABEL: splatconstant_shift_v2i8:
	; X86-SSE: # %bb.0:			; X86-SSE: # %bb.0:
	; X86-SSE-NEXT: psllw $3, %xmm0			; X86-SSE-NEXT: psllw $3, %xmm0
	; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0			; X86-SSE-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE-NEXT: retl			; X86-SSE-NEXT: retl
	%shift = shl <2 x i8> %a, <i8 3, i8 3>			%shift = shl <2 x i8> %a, <i8 3, i8 3>
	ret <2 x i8> %shift			ret <2 x i8> %shift
	}			}

llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll

[... 154 lines not shown ...]
; ALL-NEXT: retq
%shuffle = shufflevector <16 x float> %a, <16 x float> %b, <16 x i32> <i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 4, i32 undef, i32 undef, i32 undef, i32 undef, i32 11, i32 undef, i32 undef, i32 undef, i32 undef, i32 12>		%shuffle = shufflevector <16 x float> %a, <16 x float> %b, <16 x i32> <i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 4, i32 undef, i32 undef, i32 undef, i32 undef, i32 11, i32 undef, i32 undef, i32 undef, i32 undef, i32 12>
ret <16 x float> %shuffle		ret <16 x float> %shuffle
}		}

; PR41203		; PR41203
define <16 x float> @shuffle_v16f32_00_17_02_19_04_21_06_23_08_25_10_27_12_29_14_31(<16 x float> %a) {		define <16 x float> @shuffle_v16f32_00_17_02_19_04_21_06_23_08_25_10_27_12_29_14_31(<16 x float> %a) {
; ALL-LABEL: shuffle_v16f32_00_17_02_19_04_21_06_23_08_25_10_27_12_29_14_31:		; ALL-LABEL: shuffle_v16f32_00_17_02_19_04_21_06_23_08_25_10_27_12_29_14_31:
; ALL: # %bb.0:		; ALL: # %bb.0:
; ALL-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0		; ALL-NEXT: vandpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm0, %zmm0
; ALL-NEXT: retq		; ALL-NEXT: retq
%tmp1 = shufflevector <16 x float> %a, <16 x float> undef, <16 x i32> <i32 undef, i32 17, i32 undef, i32 19, i32 undef, i32 5, i32 undef, i32 7, i32 undef, i32 9, i32 undef, i32 11, i32 undef, i32 13, i32 undef, i32 15>		%tmp1 = shufflevector <16 x float> %a, <16 x float> undef, <16 x i32> <i32 undef, i32 17, i32 undef, i32 19, i32 undef, i32 5, i32 undef, i32 7, i32 undef, i32 9, i32 undef, i32 11, i32 undef, i32 13, i32 undef, i32 15>
%tmp2 = shufflevector <16 x float> %tmp1, <16 x float> <float 0.000000e+00, float undef, float 0.000000e+00, float undef, float 0.000000e+00, float undef, float 0.000000e+00, float undef, float 0.000000e+00, float undef, float 0.000000e+00, float undef, float 0.000000e+00, float undef, float 0.000000e+00, float undef>, <16 x i32> <i32 16, i32 1, i32 18, i32 3, i32 20, i32 5, i32 22, i32 7, i32 24, i32 9, i32 26, i32 11, i32 28, i32 13, i32 30, i32 15>		%tmp2 = shufflevector <16 x float> %tmp1, <16 x float> <float 0.000000e+00, float undef, float 0.000000e+00, float undef, float 0.000000e+00, float undef, float 0.000000e+00, float undef, float 0.000000e+00, float undef, float 0.000000e+00, float undef, float 0.000000e+00, float undef, float 0.000000e+00, float undef>, <16 x i32> <i32 16, i32 1, i32 18, i32 3, i32 20, i32 5, i32 22, i32 7, i32 24, i32 9, i32 26, i32 11, i32 28, i32 13, i32 30, i32 15>
ret <16 x float> %tmp2		ret <16 x float> %tmp2
}		}

; PR48322		; PR48322
define <16 x float> @shuffle_v16f32_02_03_16_17_06_07_20_21_10_11_24_25_14_15_28_29(<16 x float> %a, <16 x float> %b) {		define <16 x float> @shuffle_v16f32_02_03_16_17_06_07_20_21_10_11_24_25_14_15_28_29(<16 x float> %a, <16 x float> %b) {
[... 725 lines not shown ...]
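In the PR41203 test above, the folded constant changes from a full 512-bit `vandps` operand to a `vandpd` operand with `{1to8}`, whereas the byte-mask tests elsewhere in this diff use `{1to4}`, `{1to8}` (on ymm) or `{1to16}` with `vpandd`. One plausible reading is that the broadcast element width is the narrowest width at which the constant's bytes repeat: a mask that zeroes even float lanes and keeps odd ones repeats every 64 bits, not every 32. The Python sketch below illustrates that idea only. The function name and the candidate widths `(32, 64)` are assumptions made for this example, and it is not the pass's implementation.

```python
def smallest_broadcast_bits(data, candidates=(32, 64)):
    """Return the narrowest candidate width (in bits) whose first element,
    repeated, reproduces `data`; return None if no candidate does."""
    for bits in candidates:
        step = bits // 8
        if len(data) % step != 0:
            continue
        if data == data[:step] * (len(data) // step):
            return bits
    return None


# 16 bytes of 248: the byte mask used by the splatconstant_shift tests.
byte_mask = bytes([248] * 16)

# 64 bytes alternating a zero dword and an all-ones dword (little-endian):
# an assumed encoding of "zero the even float lanes, keep the odd ones".
lane_mask = (b"\x00" * 4 + b"\xff" * 4) * 8

# A constant with no repeating element cannot be broadcast at these widths.
no_splat = bytes(range(16))
```

Under these assumptions `byte_mask` is reported as a 32-bit splat (four elements in an xmm register), `lane_mask` as a 64-bit splat (eight elements in a zmm register), and `no_splat` yields `None`, meaning the full-width load would have to stay.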

llvm/test/CodeGen/X86/vselect-pcmp.ll

	[... 606 lines not shown ...]
	; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX512F-NEXT: vpcmpeqw %ymm3, %ymm0, %ymm0			; AVX512F-NEXT: vpcmpeqw %ymm3, %ymm0, %ymm0
	; AVX512F-NEXT: vpblendvb %ymm0, %ymm1, %ymm2, %ymm0			; AVX512F-NEXT: vpblendvb %ymm0, %ymm1, %ymm2, %ymm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: blend_splat1_mask_cond_v16i16:			; AVX512VL-LABEL: blend_splat1_mask_cond_v16i16:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX512VL-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX512VL-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX512VL-NEXT: vpcmpeqw %ymm3, %ymm0, %ymm0			; AVX512VL-NEXT: vpcmpeqw %ymm3, %ymm0, %ymm0
	; AVX512VL-NEXT: vpternlogq $202, %ymm2, %ymm1, %ymm0			; AVX512VL-NEXT: vpternlogq $202, %ymm2, %ymm1, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; XOP-LABEL: blend_splat1_mask_cond_v16i16:			; XOP-LABEL: blend_splat1_mask_cond_v16i16:
	; XOP: # %bb.0:			; XOP: # %bb.0:
	; XOP-NEXT: vpsllw $15, %xmm0, %xmm3			; XOP-NEXT: vpsllw $15, %xmm0, %xmm3
	[... 22 lines not shown ...]
	; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX512F-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm0			; AVX512F-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm0
	; AVX512F-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0			; AVX512F-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: blend_splat1_mask_cond_v16i8:			; AVX512VL-LABEL: blend_splat1_mask_cond_v16i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; AVX512VL-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX512VL-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX512VL-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm0			; AVX512VL-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm0
	; AVX512VL-NEXT: vpternlogq $202, %xmm2, %xmm1, %xmm0			; AVX512VL-NEXT: vpternlogq $202, %xmm2, %xmm1, %xmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; XOP-LABEL: blend_splat1_mask_cond_v16i8:			; XOP-LABEL: blend_splat1_mask_cond_v16i8:
	; XOP: # %bb.0:			; XOP: # %bb.0:
	; XOP-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; XOP-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	[... 88 lines not shown ...]
	; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX512F-NEXT: vpcmpeqw %xmm3, %xmm0, %xmm0			; AVX512F-NEXT: vpcmpeqw %xmm3, %xmm0, %xmm0
	; AVX512F-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0			; AVX512F-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: blend_splatmax_mask_cond_v8i16:			; AVX512VL-LABEL: blend_splatmax_mask_cond_v8i16:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; AVX512VL-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX512VL-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX512VL-NEXT: vpcmpeqw %xmm3, %xmm0, %xmm0			; AVX512VL-NEXT: vpcmpeqw %xmm3, %xmm0, %xmm0
	; AVX512VL-NEXT: vpternlogq $202, %xmm2, %xmm1, %xmm0			; AVX512VL-NEXT: vpternlogq $202, %xmm2, %xmm1, %xmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; XOP-LABEL: blend_splatmax_mask_cond_v8i16:			; XOP-LABEL: blend_splatmax_mask_cond_v8i16:
	; XOP: # %bb.0:			; XOP: # %bb.0:
	; XOP-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; XOP-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	[... 30 lines not shown ...]
	; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX512F-NEXT: vpcmpeqb %ymm3, %ymm0, %ymm0			; AVX512F-NEXT: vpcmpeqb %ymm3, %ymm0, %ymm0
	; AVX512F-NEXT: vpblendvb %ymm0, %ymm1, %ymm2, %ymm0			; AVX512F-NEXT: vpblendvb %ymm0, %ymm1, %ymm2, %ymm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: blend_splatmax_mask_cond_v32i8:			; AVX512VL-LABEL: blend_splatmax_mask_cond_v32i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX512VL-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX512VL-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX512VL-NEXT: vpcmpeqb %ymm3, %ymm0, %ymm0			; AVX512VL-NEXT: vpcmpeqb %ymm3, %ymm0, %ymm0
	; AVX512VL-NEXT: vpternlogq $202, %ymm2, %ymm1, %ymm0			; AVX512VL-NEXT: vpternlogq $202, %ymm2, %ymm1, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; XOP-LABEL: blend_splatmax_mask_cond_v32i8:			; XOP-LABEL: blend_splatmax_mask_cond_v32i8:
	; XOP: # %bb.0:			; XOP: # %bb.0:
	; XOP-NEXT: vextractf128 $1, %ymm0, %xmm3			; XOP-NEXT: vextractf128 $1, %ymm0, %xmm3
	[... 121 lines not shown ...]
	; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX512F-NEXT: vpcmpeqw %ymm3, %ymm0, %ymm0			; AVX512F-NEXT: vpcmpeqw %ymm3, %ymm0, %ymm0
	; AVX512F-NEXT: vpblendvb %ymm0, %ymm1, %ymm2, %ymm0			; AVX512F-NEXT: vpblendvb %ymm0, %ymm1, %ymm2, %ymm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: blend_splat_mask_cond_v16i16:			; AVX512VL-LABEL: blend_splat_mask_cond_v16i16:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %ymm0, %ymm0
	; AVX512VL-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX512VL-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX512VL-NEXT: vpcmpeqw %ymm3, %ymm0, %ymm0			; AVX512VL-NEXT: vpcmpeqw %ymm3, %ymm0, %ymm0
	; AVX512VL-NEXT: vpternlogq $202, %ymm2, %ymm1, %ymm0			; AVX512VL-NEXT: vpternlogq $202, %ymm2, %ymm1, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; XOP-LABEL: blend_splat_mask_cond_v16i16:			; XOP-LABEL: blend_splat_mask_cond_v16i16:
	; XOP: # %bb.0:			; XOP: # %bb.0:
	; XOP-NEXT: vpsllw $5, %xmm0, %xmm3			; XOP-NEXT: vpsllw $5, %xmm0, %xmm3
	[... 22 lines not shown ...]
	; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX512F-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm0			; AVX512F-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm0
	; AVX512F-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0			; AVX512F-NEXT: vpblendvb %xmm0, %xmm1, %xmm2, %xmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: blend_splat_mask_cond_v16i8:			; AVX512VL-LABEL: blend_splat_mask_cond_v16i8:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512VL-NEXT: vpandd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
	; AVX512VL-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX512VL-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX512VL-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm0			; AVX512VL-NEXT: vpcmpeqb %xmm3, %xmm0, %xmm0
	; AVX512VL-NEXT: vpternlogq $202, %xmm2, %xmm1, %xmm0			; AVX512VL-NEXT: vpternlogq $202, %xmm2, %xmm1, %xmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; XOP-LABEL: blend_splat_mask_cond_v16i8:			; XOP-LABEL: blend_splat_mask_cond_v16i8:
	; XOP: # %bb.0:			; XOP: # %bb.0:
	; XOP-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; XOP-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	[... 436 lines not shown ...]

[X86] Add X86FixupVectorConstantsPass to re-fold AVX512 vector load folds as broadcast folds
Closed, Public

Revision Contents

Diff 524633

llvm/lib/Target/X86/CMakeLists.txt

llvm/lib/Target/X86/X86.h

llvm/lib/Target/X86/X86FixupVectorConstants.cpp

llvm/lib/Target/X86/X86InstrFoldTables.h

llvm/lib/Target/X86/X86InstrFoldTables.cpp

llvm/lib/Target/X86/X86TargetMachine.cpp

llvm/test/CodeGen/X86/avx512-calling-conv.ll

llvm/test/CodeGen/X86/avx512-ext.ll

llvm/test/CodeGen/X86/avx512-logic.ll

llvm/test/CodeGen/X86/avx512fp16-cvt-ph-w-vl-intrinsics.ll

llvm/test/CodeGen/X86/avx512vl-logic.ll

llvm/test/CodeGen/X86/bitcast-vector-bool.ll

llvm/test/CodeGen/X86/combine-and.ll

llvm/test/CodeGen/X86/combine-sdiv.ll

llvm/test/CodeGen/X86/dpbusd_const.ll

llvm/test/CodeGen/X86/dpbusd_i4.ll

llvm/test/CodeGen/X86/gfni-funnel-shifts.ll

llvm/test/CodeGen/X86/gfni-rotates.ll

llvm/test/CodeGen/X86/gfni-shifts.ll

llvm/test/CodeGen/X86/horizontal-reduce-smax.ll

llvm/test/CodeGen/X86/horizontal-reduce-smin.ll

llvm/test/CodeGen/X86/i64-to-float.ll

llvm/test/CodeGen/X86/icmp-pow2-diff.ll

llvm/test/CodeGen/X86/midpoint-int-vec-128.ll

llvm/test/CodeGen/X86/midpoint-int-vec-256.ll

llvm/test/CodeGen/X86/midpoint-int-vec-512.ll

llvm/test/CodeGen/X86/min-legal-vector-width.ll

llvm/test/CodeGen/X86/movmsk-cmp.ll

llvm/test/CodeGen/X86/opt-pipeline.ll

llvm/test/CodeGen/X86/paddus.ll

llvm/test/CodeGen/X86/prefer-avx256-lzcnt.ll

llvm/test/CodeGen/X86/prefer-avx256-mulo.ll

llvm/test/CodeGen/X86/prefer-avx256-shift.ll

llvm/test/CodeGen/X86/prefer-avx256-trunc.ll

llvm/test/CodeGen/X86/prefer-avx256-wide-mul.ll

llvm/test/CodeGen/X86/psubus.ll

llvm/test/CodeGen/X86/rotate-extract-vector.ll

llvm/test/CodeGen/X86/rotate_vec.ll

llvm/test/CodeGen/X86/sadd_sat_vec.ll

llvm/test/CodeGen/X86/srem-seteq-vec-nonsplat.ll

llvm/test/CodeGen/X86/ssub_sat_vec.ll

llvm/test/CodeGen/X86/usub_sat_vec.ll

llvm/test/CodeGen/X86/vec-strict-inttofp-128-fp16.ll

llvm/test/CodeGen/X86/vec-strict-inttofp-256-fp16.ll

llvm/test/CodeGen/X86/vec-strict-inttofp-256.ll

llvm/test/CodeGen/X86/vec-strict-inttofp-512-fp16.ll

llvm/test/CodeGen/X86/vector-fshl-128.ll

llvm/test/CodeGen/X86/vector-fshl-256.ll

llvm/test/CodeGen/X86/vector-fshl-512.ll

llvm/test/CodeGen/X86/vector-fshl-rot-128.ll

llvm/test/CodeGen/X86/vector-fshl-rot-256.ll

llvm/test/CodeGen/X86/vector-fshl-rot-512.ll

llvm/test/CodeGen/X86/vector-fshr-128.ll

llvm/test/CodeGen/X86/vector-fshr-256.ll

llvm/test/CodeGen/X86/vector-fshr-512.ll

llvm/test/CodeGen/X86/vector-fshr-rot-128.ll

llvm/test/CodeGen/X86/vector-fshr-rot-256.ll

llvm/test/CodeGen/X86/vector-fshr-rot-512.ll

llvm/test/CodeGen/X86/vector-idiv-sdiv-512.ll

llvm/test/CodeGen/X86/vector-idiv-udiv-512.ll

llvm/test/CodeGen/X86/vector-lzcnt-128.ll

llvm/test/CodeGen/X86/vector-lzcnt-256.ll

llvm/test/CodeGen/X86/vector-lzcnt-512.ll

llvm/test/CodeGen/X86/vector-mul.ll

llvm/test/CodeGen/X86/vector-pack-128.ll

llvm/test/CodeGen/X86/vector-pack-256.ll

llvm/test/CodeGen/X86/vector-pack-512.ll

llvm/test/CodeGen/X86/vector-pcmp.ll

llvm/test/CodeGen/X86/vector-reduce-add-mask.ll

llvm/test/CodeGen/X86/vector-reduce-or-bool.ll

llvm/test/CodeGen/X86/vector-reduce-or-cmp.ll

llvm/test/CodeGen/X86/vector-reduce-smax.ll

llvm/test/CodeGen/X86/vector-reduce-smin.ll
