This is an archive of the discontinued LLVM Phabricator instance.

Differential D112307

[X86] `X86TTIImpl::getInterleavedMemoryOpCost()`: scale interleaving cost by the fraction of live members
ClosedPublic

Authored by lebedev.ri on Oct 22 2021, 4:33 AM.

Details

Reviewers

RKSimon

Commits

rG8fac9e95ade9: [X86] `X86TTIImpl::getInterleavedMemoryOpCost()`: scale interleaving cost by…

Summary

By definition, an interleaved load of stride N means:
load N*VF elements, and shuffle them into N VF-sized vectors,
with the 0th vector containing elements [0, VF)*stride + 0,
the 1st vector containing elements [0, VF)*stride + 1, and so on.
Example: https://godbolt.org/z/df561Me5E (i64 stride 4 vf 2 => cost 6)
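
To make the definition concrete, here is a minimal scalar sketch of what a fully-interleaved load computes. It is not LLVM code: `deinterleave` is a hypothetical helper name, and the real lowering uses wide vector loads plus shuffles rather than a scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only (not part of LLVM).
// Splits a flat buffer of Stride * VF elements into Stride vectors of VF
// elements each; member M receives elements M, M + Stride, M + 2 * Stride, ...
template <typename T>
std::vector<std::vector<T>> deinterleave(const std::vector<T> &Wide,
                                         std::size_t Stride) {
  assert(Stride != 0 && Wide.size() % Stride == 0 &&
         "buffer size must be Stride * VF");
  const std::size_t VF = Wide.size() / Stride;
  std::vector<std::vector<T>> Members(Stride, std::vector<T>(VF));
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    for (std::size_t M = 0; M < Stride; ++M)
      Members[M][Lane] = Wide[Lane * Stride + M];
  return Members;
}
```

With stride 4 and VF 2, the buffer {0, 1, 2, 3, 4, 5, 6, 7} splits into members {0, 4}, {1, 5}, {2, 6} and {3, 7}. A group with indices 01uu would only ever use the first two of these.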

Now, a not-fully-interleaved load is one where not all of these vectors are demanded.
So at worst, we could just pretend that everything is demanded,
and discard the non-demanded vectors. This means that the cost
for a not-fully-interleaved group should be no greater than the cost
for the same fully-interleaved group, and perhaps somewhat less.
Examples:
https://godbolt.org/z/a78dK5Geq (i64 stride 4 (indices 012u) vf 2 => cost 4)
https://godbolt.org/z/G91ceo8dM (i64 stride 4 (indices 01uu) vf 2 => cost 2)
https://godbolt.org/z/5joYob9rx (i64 stride 4 (indices 0uuu) vf 2 => cost 1)

Right now, for such not-fully-interleaved loads we just use the costs
for fully-interleaved loads. But in general that is overly pessimistic,
because not all the shuffles needed to perform the full interleaving
will end up being live.

So what I propose is to naively scale the interleaving cost
by the fraction of live members. I believe this should still result
in a cost estimate in the right ballpark.
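
As a rough, standalone sketch of the proposed scaling (not the LLVM implementation itself): the helper below mirrors the formula used in the patch, memory-op cost plus the table shuffle cost scaled by NumMembers / Factor and rounded up. The name `discountedLoadCost` is hypothetical, and the local `divideCeil` only imitates the LLVM helper of the same name. The sample numbers are assumptions inferred from the updated test expectations, not read from the cost tables: for AVX2, stride 4, VF 8, a memory-op cost of 4 and a full shuffle cost of 16 reproduce the old cost of 20 and the new costs of 16 (indices 012u) and 12 (indices 01uu).

```cpp
#include <cassert>
#include <cstdint>

// Integer division rounding up; a local stand-in for llvm::divideCeil.
constexpr std::uint64_t divideCeil(std::uint64_t Numerator,
                                   std::uint64_t Denominator) {
  return (Numerator + Denominator - 1) / Denominator;
}

// Hypothetical helper for illustration only.
// MemOpCost:       cost of the wide loads (not discounted).
// FullShuffleCost: table cost of de-interleaving a fully-demanded group.
// NumMembers:      number of demanded members of the group.
// Factor:          the interleave factor (stride).
constexpr std::uint64_t discountedLoadCost(std::uint64_t MemOpCost,
                                           std::uint64_t FullShuffleCost,
                                           std::uint64_t NumMembers,
                                           std::uint64_t Factor) {
  return MemOpCost + divideCeil(NumMembers * FullShuffleCost, Factor);
}
```

Because the result is rounded up and NumMembers never exceeds Factor, the discounted cost cannot exceed the fully-interleaved cost. Only the shuffle part is scaled; the load part is left as-is.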

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lebedev.ri created this revision. Oct 22 2021, 4:33 AM

Herald added subscribers: pengfei, hiraditya. Oct 22 2021, 4:33 AM

lebedev.ri requested review of this revision. Oct 22 2021, 4:33 AM

Harbormaster completed remote builds in B130116: Diff 381506. Oct 22 2021, 4:36 AM

Rebased, NFC.

Harbormaster completed remote builds in B130121: Diff 381516. Oct 22 2021, 5:12 AM

SGTM - cheers

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
Lines 5428–5434: Add a comment explaining this is just an approximation

This revision is now accepted and ready to land. Oct 22 2021, 5:57 AM

In D112307#3080480, @RKSimon wrote:

SGTM - cheers

To check - was it clear from my explanation that this is a *rough* approximation,
that can deviate from reality in either direction, i.e. it may both be
higher cost than in reality, and for some cases it might be lower cost than in reality?

Yes - which is why I asked you to include that as a comment to GetDiscountedCost. Cheers.

Awesome, thank you for the review!

This revision was landed with ongoing or failed builds. Oct 22 2021, 6:34 AM

Closed by commit rG8fac9e95ade9: [X86] `X86TTIImpl::getInterleavedMemoryOpCost()`: scale interleaving cost by… (authored by lebedev.ri).

This revision was automatically updated to reflect the committed changes.

lebedev.ri added a commit: rG8fac9e95ade9: [X86] `X86TTIImpl::getInterleavedMemoryOpCost()`: scale interleaving cost by….

Harbormaster completed remote builds in B130127: Diff 381527. Oct 22 2021, 6:39 AM

Revision Contents

Path                                                                               Size
llvm/lib/Target/X86/X86TargetTransformInfo.cpp                                     16 lines
llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-2-indices-0u.ll       18 lines
llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3-indices-01u.ll      10 lines
llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3-indices-0uu.ll      10 lines
llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-012u.ll     10 lines
llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-01uu.ll     10 lines
llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-0uuu.ll     10 lines

Diff 381530

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

@@ 5,199 lines not shown @@ InstructionCost X86TTIImpl::getInterleavedMemoryOpCost(
   unsigned VF = VecTy->getNumElements() / Factor;
   Type *ScalarTy = VecTy->getElementType();
   // Deduplicate entries, model floats/pointers as appropriately-sized integers.
   if (!ScalarTy->isIntegerTy())
     ScalarTy =
         Type::getIntNTy(ScalarTy->getContext(), DL.getTypeSizeInBits(ScalarTy));

   // Get the cost of all the memory operations.
+  // FIXME: discount dead loads.
   InstructionCost MemOpCosts = getMemoryOpCost(
       Opcode, VecTy, MaybeAlign(Alignment), AddressSpace, CostKind);

   auto *VT = FixedVectorType::get(ScalarTy, VF);
   EVT ETy = TLI->getValueType(DL, VT);
   if (!ETy.isSimple())
     return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
                                              Alignment, AddressSpace, CostKind);
@@ 203 lines not shown @@ static const CostTblEntry SSE2InterleavedStoreTbl[] = {

       {2, MVT::v2i16, 1}, // interleave 2 x 2i16 into 4i16 (and store)
       {2, MVT::v4i16, 1}, // interleave 2 x 4i16 into 8i16 (and store)

       {2, MVT::v2i32, 1}, // interleave 2 x 2i32 into 4i32 (and store)
   };

   if (Opcode == Instruction::Load) {
-    // FIXME: if we have a partially-interleaved groups, with gaps,
-    //        should we discount the not-demanded indicies?
+    auto GetDiscountedCost = [Factor, NumMembers = Indices.size(),
+                              MemOpCosts](const CostTblEntry *Entry) {
+      // NOTE: this is just an approximation!
+      //       It can over/under -estimate the cost!
+      return MemOpCosts + divideCeil(NumMembers * Entry->Cost, Factor);
+    };
+
[Inline comment by RKSimon (Done): Add a comment explaining this is just an approximation]
     if (ST->hasAVX2())
       if (const auto *Entry = CostTableLookup(AVX2InterleavedLoadTbl, Factor,
                                               ETy.getSimpleVT()))
-        return MemOpCosts + Entry->Cost;
+        return GetDiscountedCost(Entry);

     if (ST->hasSSSE3())
       if (const auto *Entry = CostTableLookup(SSSE3InterleavedLoadTbl, Factor,
                                               ETy.getSimpleVT()))
-        return MemOpCosts + Entry->Cost;
+        return GetDiscountedCost(Entry);

     if (ST->hasSSE2())
       if (const auto *Entry = CostTableLookup(SSE2InterleavedLoadTbl, Factor,
                                               ETy.getSimpleVT()))
-        return MemOpCosts + Entry->Cost;
+        return GetDiscountedCost(Entry);
   } else {
     assert(Opcode == Instruction::Store &&
            "Expected Store Instruction at this point");
     assert((!Indices.size() || Indices.size() == Factor) &&
            "Interleaved store only supports fully-interleaved groups.");
     if (ST->hasAVX2())
       if (const auto *Entry = CostTableLookup(AVX2InterleavedStoreTbl, Factor,
                                               ETy.getSimpleVT()))
@@ 12 lines not shown @@

llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-2-indices-0u.ll

 ; RUN: opt -loop-vectorize -vectorizer-maximize-bandwidth -S -mattr=+sse2 --debug-only=loop-vectorize < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,SSE2
 ; RUN: opt -loop-vectorize -vectorizer-maximize-bandwidth -S -mattr=+avx --debug-only=loop-vectorize < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,AVX1
 ; RUN: opt -loop-vectorize -vectorizer-maximize-bandwidth -S -mattr=+avx2 --debug-only=loop-vectorize < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,AVX2
 ; RUN: opt -loop-vectorize -vectorizer-maximize-bandwidth -S -mattr=+avx512bw,+avx512vl --debug-only=loop-vectorize < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,AVX512
 ; REQUIRES: asserts

 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"

 @A = global [1024 x i32] zeroinitializer, align 128
 @B = global [1024 x i8] zeroinitializer, align 128

 ; CHECK: LV: Checking a loop in "test"
 ;
 ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4
-; SSE2: LV: Found an estimated cost of 3 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
-; SSE2: LV: Found an estimated cost of 4 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
+; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
+; SSE2: LV: Found an estimated cost of 3 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
 ; SSE2: LV: Found an estimated cost of 30 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
 ; SSE2: LV: Found an estimated cost of 60 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
 ;
 ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX1: LV: Found an estimated cost of 3 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX1: LV: Found an estimated cost of 3 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX1: LV: Found an estimated cost of 2 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX1: LV: Found an estimated cost of 24 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX1: LV: Found an estimated cost of 48 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX1: LV: Found an estimated cost of 96 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
 ;
 ; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 3 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 3 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 6 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 12 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 24 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 2 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 2 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 4 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 8 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 16 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
 ;
 ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 1 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 1 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 1 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 2 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 13 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 50 for VF 64 For instruction: %v0 = load i32, i32* %in0, align 4
@@ 30 lines not shown @@

llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3-indices-01u.ll

@@ 20 lines not shown @@
 ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX1: LV: Found an estimated cost of 12 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX1: LV: Found an estimated cost of 21 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX1: LV: Found an estimated cost of 47 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX1: LV: Found an estimated cost of 94 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX1: LV: Found an estimated cost of 188 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
 ;
 ; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 6 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 5 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 10 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 20 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 44 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 5 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 4 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 8 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 16 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 34 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
 ;
 ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 3 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 3 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 5 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 9 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 36 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 144 for VF 64 For instruction: %v0 = load i32, i32* %in0, align 4
@@ 34 lines not shown @@

llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3-indices-0uu.ll

@@ 20 lines not shown @@
 ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX1: LV: Found an estimated cost of 7 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX1: LV: Found an estimated cost of 11 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX1: LV: Found an estimated cost of 25 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX1: LV: Found an estimated cost of 50 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX1: LV: Found an estimated cost of 100 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
 ;
 ; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 6 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 5 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 10 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 20 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 44 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 4 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 3 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 6 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 11 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 23 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
 ;
 ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 1 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 1 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 2 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 3 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 21 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 78 for VF 64 For instruction: %v0 = load i32, i32* %in0, align 4
@@ 31 lines not shown @@

llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-012u.ll

@@ 20 lines not shown @@
 ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX1: LV: Found an estimated cost of 16 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX1: LV: Found an estimated cost of 32 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX1: LV: Found an estimated cost of 70 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX1: LV: Found an estimated cost of 140 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX1: LV: Found an estimated cost of 280 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
 ;
 ; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 5 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 10 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 20 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 40 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
-; AVX2: LV: Found an estimated cost of 84 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 4 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 8 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 16 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 32 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
+; AVX2: LV: Found an estimated cost of 67 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
 ;
 ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 4 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 4 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 6 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 17 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
 ; AVX512: LV: Found an estimated cost of 71 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
 ;
@@ 37 lines not shown @@

llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-01uu.ll

	Show All 20 Lines
	; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX1: LV: Found an estimated cost of 11 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX1: LV: Found an estimated cost of 11 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX1: LV: Found an estimated cost of 22 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX1: LV: Found an estimated cost of 22 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX1: LV: Found an estimated cost of 48 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX1: LV: Found an estimated cost of 48 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX1: LV: Found an estimated cost of 96 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX1: LV: Found an estimated cost of 96 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX1: LV: Found an estimated cost of 192 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX1: LV: Found an estimated cost of 192 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
	;			;
	; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX2: LV: Found an estimated cost of 5 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX2: LV: Found an estimated cost of 3 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX2: LV: Found an estimated cost of 10 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX2: LV: Found an estimated cost of 6 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX2: LV: Found an estimated cost of 20 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX2: LV: Found an estimated cost of 12 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX2: LV: Found an estimated cost of 40 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX2: LV: Found an estimated cost of 24 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX2: LV: Found an estimated cost of 84 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX2: LV: Found an estimated cost of 50 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
	;			;
	; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX512: LV: Found an estimated cost of 3 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX512: LV: Found an estimated cost of 3 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX512: LV: Found an estimated cost of 3 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX512: LV: Found an estimated cost of 3 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX512: LV: Found an estimated cost of 5 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX512: LV: Found an estimated cost of 5 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX512: LV: Found an estimated cost of 13 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX512: LV: Found an estimated cost of 13 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX512: LV: Found an estimated cost of 50 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX512: LV: Found an estimated cost of 50 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX512: LV: Found an estimated cost of 160 for VF 64 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX512: LV: Found an estimated cost of 160 for VF 64 For instruction: %v0 = load i32, i32* %in0, align 4

llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-0uuu.ll

	; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX1: LV: Found an estimated cost of 6 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX1: LV: Found an estimated cost of 6 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX1: LV: Found an estimated cost of 12 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX1: LV: Found an estimated cost of 12 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX1: LV: Found an estimated cost of 26 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX1: LV: Found an estimated cost of 26 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX1: LV: Found an estimated cost of 52 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX1: LV: Found an estimated cost of 52 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX1: LV: Found an estimated cost of 104 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX1: LV: Found an estimated cost of 104 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
	;			;
	; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX2: LV: Found an estimated cost of 5 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX2: LV: Found an estimated cost of 2 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX2: LV: Found an estimated cost of 10 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX2: LV: Found an estimated cost of 4 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX2: LV: Found an estimated cost of 20 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX2: LV: Found an estimated cost of 8 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX2: LV: Found an estimated cost of 40 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX2: LV: Found an estimated cost of 16 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX2: LV: Found an estimated cost of 84 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX2: LV: Found an estimated cost of 33 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
	;			;
	; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX512: LV: Found an estimated cost of 1 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX512: LV: Found an estimated cost of 1 for VF 2 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX512: LV: Found an estimated cost of 1 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX512: LV: Found an estimated cost of 1 for VF 4 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX512: LV: Found an estimated cost of 2 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX512: LV: Found an estimated cost of 2 for VF 8 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX512: LV: Found an estimated cost of 5 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX512: LV: Found an estimated cost of 5 for VF 16 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX512: LV: Found an estimated cost of 29 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX512: LV: Found an estimated cost of 29 for VF 32 For instruction: %v0 = load i32, i32* %in0, align 4
	; AVX512: LV: Found an estimated cost of 80 for VF 64 For instruction: %v0 = load i32, i32* %in0, align 4			; AVX512: LV: Found an estimated cost of 80 for VF 64 For instruction: %v0 = load i32, i32* %in0, align 4
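
A quick sanity check on the AVX2 rows of this file (old cost on the left, new cost on the right): the new numbers are consistent with splitting each fully-interleaved cost into a memory part and a shuffle part, and scaling only the shuffle part by the fraction of live members, rounded up. The sketch below is illustrative only. The memory part (one unit per 256 bits loaded) and the helper names `assumed_mem_cost` and `discounted_cost` are assumptions made for this check, not something taken from the patch.

```python
# Illustrative check only; the cost split below is an assumption, not the patch's code.

def assumed_mem_cost(vf, factor, elt_bits=32, vec_bits=256):
    """Assumed memory cost: one unit per 256-bit chunk of the wide load."""
    total_bits = vf * factor * elt_bits
    return -(-total_bits // vec_bits)  # ceiling division

def discounted_cost(full_cost, mem_cost, live_members, factor):
    """Keep the memory part, scale the shuffle part by live_members / factor (rounded up)."""
    shuffle_cost = full_cost - mem_cost
    return mem_cost + -(-(live_members * shuffle_cost) // factor)

# AVX2 rows for stride 4, indices 0uuu (one live member): VF -> (old cost, new cost)
avx2_0uuu = {2: (5, 2), 4: (10, 4), 8: (20, 8), 16: (40, 16), 32: (84, 33)}

for vf, (old, new) in avx2_0uuu.items():
    mem = assumed_mem_cost(vf, factor=4)
    predicted = discounted_cost(old, mem, live_members=1, factor=4)
    print(f"VF {vf}: old {old}, predicted {predicted}, listed {new}")
```

Under these assumptions, the predicted value matches the listed new cost for every AVX2 row above. The AVX1 and AVX512 rows are unchanged in this diff, so the check does not apply to them.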

Diff 381530
