Given a shuffle with 4 elements of size 16 or 32, we can use the costs directly from the PerfectShuffle tables to get a slightly more accurate cost for the resulting shuffle.
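For anyone following along, here is roughly what such a lookup looks like. This is a sketch, not the patch itself: `lookupPerfectShuffleCost` is a made-up name, and `PerfectShuffleTable` is declared `extern` here only to keep the snippet self-contained (in-tree it is a static table defined in AArch64PerfectShuffle.h). As I read the table encoding, the instruction count sits in the top two bits of each entry.

```cpp
#include <cassert>
#include "llvm/ADT/ArrayRef.h"

// The 6561-entry table defined in
// llvm/lib/Target/AArch64/AArch64PerfectShuffle.h.
extern const unsigned PerfectShuffleTable[6561 + 1];

// Hypothetical helper: look up the cost (instruction count) of a
// 4-element shuffle mask. Each mask entry is a lane index 0-7 into the
// two input vectors, or -1 for undef (stored as 8), so the table is
// indexed in base 9.
static unsigned lookupPerfectShuffleCost(llvm::ArrayRef<int> M) {
  assert(M.size() == 4 && "expected a 4-entry shuffle mask");
  unsigned PFIndexes[4];
  for (unsigned I = 0; I != 4; ++I)
    PFIndexes[I] = M[I] < 0 ? 8 : unsigned(M[I]);
  unsigned PFTableIndex = PFIndexes[0] * 9 * 9 * 9 + PFIndexes[1] * 9 * 9 +
                          PFIndexes[2] * 9 + PFIndexes[3];
  // Bits 31-30 of each entry hold the number of instructions needed.
  return PerfectShuffleTable[PFTableIndex] >> 30;
}
```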
Event Timeline
llvm/lib/Target/AArch64/AArch64PerfectShuffle.h

| Lines | Comment |
| --- | --- |
| 6590 | There is a comment in the summary of D123379 that might help explain perfect shuffles. The quick version is that they only support 4 entry shuffles, because otherwise the tables we store would just be too large. |
| 6611–6617 | I'm not sure exactly what else to say, other than this is how perfect shuffle tables work :) |
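To put rough numbers on "too large" (my back-of-the-envelope, not from the thread): each entry of a 4-element mask can name any of the 8 lanes of the two input vectors, or be undef, giving 9 possibilities per entry, so the table needs 9^4 = 6561 entries. The same scheme for 8-element shuffles would need (2·8 + 1)^8 = 17^8, roughly 7 billion entries, which is clearly impractical to store.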
Just a heads up, I'm seeing a few 4-8% regressions on different AArch64 CPUs with this change for a few benchmarks. I still need to isolate the binary changes.
Did you ever manage to come up with a reproducer? I hope this new cost model is generally more accurate, but you know cost modelling... The codegen might be off or there might be any number of second-order effects going wrong. Let me know.
The issue was that after this patch some code got vectorized when it wasn't profitable, but it looked like a general SLP issue. Previously it just didn't get vectorized because some ridiculously high costs were used for some shuffles. It looks like D115750 fixed the underlying issue and the code is back to not getting vectorized and original performance is restored :)
Why do we have to limit this to 4x16 or 4x32 shuffles?