This is an archive of the discontinued LLVM Phabricator instance.

Differential D111094

[X86][Costmodel] Load/store i64/f64 Stride=6 VF=8 interleaving costs
Closed · Public

Authored by lebedev.ri on Oct 4 2021, 12:42 PM.

Details

Reviewers

RKSimon

Commits

rG3f9b235482a0: [X86][Costmodel] Load/store i64/f64 Stride=6 VF=8 interleaving costs

Summary

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For loads we have:
https://godbolt.org/z/1jfGddcre - for intels Block RThroughput: =36.0; for ryzens, Block RThroughput: =12.0
So we could pick a cost of 36.

For stores we have:
https://godbolt.org/z/ao9srMT8r - for intels Block RThroughput: =30.0; for ryzens, Block RThroughput: =12.0
So we could pick a cost of 30.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
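
The choice of 36 and 30 above is consistent with taking the worst-case Block RThroughput across the measured CPUs. The summary does not state that rule explicitly, so the following is only a sketch of that reading; `pickWorstCaseCost` is a hypothetical helper and is not part of LLVM.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not an LLVM API): derive one table cost from
// per-CPU llvm-mca "Block RThroughput" figures by taking the largest value.
// Assumption: the worst case across CPUs is the rule being applied.
inline unsigned pickWorstCaseCost(const std::vector<double> &RThroughputs) {
  assert(!RThroughputs.empty() && "need at least one measurement");
  const double Worst =
      *std::max_element(RThroughputs.begin(), RThroughputs.end());
  // Round to the nearest integer, since table costs are integers.
  return static_cast<unsigned>(Worst + 0.5);
}
```

With the figures quoted above, `pickWorstCaseCost({36.0, 12.0})` gives the load cost and `pickWorstCaseCost({30.0, 12.0})` gives the store cost.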

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lebedev.ri created this revision. Oct 4 2021, 12:42 PM

Herald added subscribers: pengfei, hiraditya. Oct 4 2021, 12:42 PM

lebedev.ri requested review of this revision. Oct 4 2021, 12:42 PM

lebedev.ri added a parent revision: D111093: [X86][Costmodel] Load/store i64/f64 Stride=6 VF=4 interleaving costs.

Harbormaster completed remote builds in B126911: Diff 377007. Oct 4 2021, 2:17 PM

LGTM

This revision is now accepted and ready to land. Oct 5 2021, 6:21 AM

In D111094#3042506, @RKSimon wrote:

LGTM

HURRAY! Thank you for the reviews!
Let's see how far this has gotten us.

This revision was landed with ongoing or failed builds. Oct 5 2021, 7:00 AM

Closed by commit rG3f9b235482a0: [X86][Costmodel] Load/store i64/f64 Stride=6 VF=8 interleaving costs (authored by lebedev.ri).

This revision was automatically updated to reflect the committed changes.

lebedev.ri added a commit: rG3f9b235482a0: [X86][Costmodel] Load/store i64/f64 Stride=6 VF=8 interleaving costs.

So i've rechecked, and as far as the full interleave groups go, this has fully covered my interest in rawspeed+darktable (and ended up vectorizing 12% (113) more loops).
I've looked at tertiary relevant projects (i didn't before), and despite my best hopes,
stride=5/7/8 comes up in rawtherapee/babl/gegl/gimp :/
I don't really want to deal with that again right away, so i'll instead look into what's missing for non-fully-interleaved groups.

Revision Contents

Path: Size

llvm/lib/Target/X86/X86TargetTransformInfo.cpp: 2 lines
llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-6.ll: 2 lines
llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-6.ll: 2 lines
llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-6.ll: 2 lines
llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-6.ll: 2 lines

Diff 377220

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

@@ static const CostTblEntry AVX2InterleavedLoadTbl[] = {
...
    {6, MVT::v2i32, 6}, // (load 12i32 and) deinterleave into 6 x 2i32
    {6, MVT::v4i32, 15}, // (load 24i32 and) deinterleave into 6 x 4i32
    {6, MVT::v8i32, 31}, // (load 48i32 and) deinterleave into 6 x 8i32
    {6, MVT::v16i32, 64}, // (load 96i32 and) deinterleave into 6 x 16i32

    {6, MVT::v2i64, 6}, // (load 12i64 and) deinterleave into 6 x 2i64
    {6, MVT::v4i64, 18}, // (load 24i64 and) deinterleave into 6 x 4i64
+   {6, MVT::v8i64, 36}, // (load 48i64 and) deinterleave into 6 x 8i64

    {8, MVT::v8i32, 40} // (load 64i32 and) deinterleave into 8 x 8i32
  };

  static const CostTblEntry AVX2InterleavedStoreTbl[] = {
    {2, MVT::v2i8, 1}, // interleave 2 x 2i8 into 4i8 (and store)
    {2, MVT::v4i8, 1}, // interleave 2 x 4i8 into 8i8 (and store)
    {2, MVT::v8i8, 1}, // interleave 2 x 8i8 into 16i8 (and store)
...
@@ static const CostTblEntry AVX2InterleavedStoreTbl[] = {
...
    {6, MVT::v2i32, 9}, // interleave 6 x 2i32 into 12i32 (and store)
    {6, MVT::v4i32, 12}, // interleave 6 x 4i32 into 24i32 (and store)
    {6, MVT::v8i32, 33}, // interleave 6 x 8i32 into 48i32 (and store)
    {6, MVT::v16i32, 66}, // interleave 6 x 16i32 into 96i32 (and store)

    {6, MVT::v2i64, 8}, // interleave 6 x 2i64 into 12i64 (and store)
    {6, MVT::v4i64, 15}, // interleave 6 x 4i64 into 24i64 (and store)
+   {6, MVT::v8i64, 30}, // interleave 6 x 8i64 into 48i64 (and store)
  };

  if (Opcode == Instruction::Load) {
    if (const auto *Entry =
            CostTableLookup(AVX2InterleavedLoadTbl, Factor, ETy.getSimpleVT()))
      return MemOpCosts + Entry->Cost;
  } else {
    assert(Opcode == Instruction::Store &&
...
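
To make the lookup in this hunk easier to follow, here is a minimal, self-contained sketch of the same idea: find the row matching (Factor, type) and add the table cost to the cost of the plain memory operations. `VecTy`, `InterleaveCostEntry` and `interleavedCost` are simplified, hypothetical stand-ins for the `MVT`, `CostTblEntry` and `CostTableLookup` names seen in the diff, not the real LLVM declarations. Only the stride-6 i64 rows are reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical stand-in for the MVT values used in the diff.
enum class VecTy { v2i64, v4i64, v8i64 };

// Hypothetical stand-in for a cost table row: (Factor, type) -> shuffle cost.
struct InterleaveCostEntry {
  unsigned Factor;
  VecTy Ty;
  unsigned Cost;
};

// Stride-6 i64 rows as they read after this patch.
constexpr InterleaveCostEntry LoadTbl[] = {
    {6, VecTy::v2i64, 6}, {6, VecTy::v4i64, 18}, {6, VecTy::v8i64, 36}};
constexpr InterleaveCostEntry StoreTbl[] = {
    {6, VecTy::v2i64, 8}, {6, VecTy::v4i64, 15}, {6, VecTy::v8i64, 30}};

// Linear scan for a matching row; on a hit, return memory-op cost plus the
// table cost, mirroring `return MemOpCosts + Entry->Cost;` in the hunk.
template <std::size_t N>
std::optional<unsigned> interleavedCost(const InterleaveCostEntry (&Tbl)[N],
                                        unsigned Factor, VecTy Ty,
                                        unsigned MemOpCosts) {
  for (const InterleaveCostEntry &E : Tbl)
    if (E.Factor == Factor && E.Ty == Ty)
      return MemOpCosts + E.Cost;
  return std::nullopt; // no row: the caller would fall back to another model
}
```

Assuming a memory-op cost of 12 at VF 8 (a value inferred here from the updated test expectations, 48 - 36 and 42 - 30, and not shown in the diff itself), the sketch yields 48 for the load and 42 for the store.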

llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-6.ll

...
  ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load double, double* %in0, align 8
  ; AVX1: LV: Found an estimated cost of 21 for VF 2 For instruction: %v0 = load double, double* %in0, align 8
  ; AVX1: LV: Found an estimated cost of 48 for VF 4 For instruction: %v0 = load double, double* %in0, align 8
  ; AVX1: LV: Found an estimated cost of 96 for VF 8 For instruction: %v0 = load double, double* %in0, align 8
  ;
  ; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load double, double* %in0, align 8
  ; AVX2: LV: Found an estimated cost of 9 for VF 2 For instruction: %v0 = load double, double* %in0, align 8
  ; AVX2: LV: Found an estimated cost of 24 for VF 4 For instruction: %v0 = load double, double* %in0, align 8
- ; AVX2: LV: Found an estimated cost of 96 for VF 8 For instruction: %v0 = load double, double* %in0, align 8
+ ; AVX2: LV: Found an estimated cost of 48 for VF 8 For instruction: %v0 = load double, double* %in0, align 8
  ;
  ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load double, double* %in0, align 8
  ; AVX512: LV: Found an estimated cost of 11 for VF 2 For instruction: %v0 = load double, double* %in0, align 8
  ; AVX512: LV: Found an estimated cost of 21 for VF 4 For instruction: %v0 = load double, double* %in0, align 8
  ; AVX512: LV: Found an estimated cost of 51 for VF 8 For instruction: %v0 = load double, double* %in0, align 8
  ; AVX512: LV: Found an estimated cost of 120 for VF 16 For instruction: %v0 = load double, double* %in0, align 8
  ; AVX512: LV: Found an estimated cost of 240 for VF 32 For instruction: %v0 = load double, double* %in0, align 8
  ;
...

llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-6.ll

...
  ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i64, i64* %in0, align 8
  ; AVX1: LV: Found an estimated cost of 33 for VF 2 For instruction: %v0 = load i64, i64* %in0, align 8
  ; AVX1: LV: Found an estimated cost of 78 for VF 4 For instruction: %v0 = load i64, i64* %in0, align 8
  ; AVX1: LV: Found an estimated cost of 156 for VF 8 For instruction: %v0 = load i64, i64* %in0, align 8
  ;
  ; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i64, i64* %in0, align 8
  ; AVX2: LV: Found an estimated cost of 9 for VF 2 For instruction: %v0 = load i64, i64* %in0, align 8
  ; AVX2: LV: Found an estimated cost of 24 for VF 4 For instruction: %v0 = load i64, i64* %in0, align 8
- ; AVX2: LV: Found an estimated cost of 156 for VF 8 For instruction: %v0 = load i64, i64* %in0, align 8
+ ; AVX2: LV: Found an estimated cost of 48 for VF 8 For instruction: %v0 = load i64, i64* %in0, align 8
  ;
  ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i64, i64* %in0, align 8
  ; AVX512: LV: Found an estimated cost of 11 for VF 2 For instruction: %v0 = load i64, i64* %in0, align 8
  ; AVX512: LV: Found an estimated cost of 21 for VF 4 For instruction: %v0 = load i64, i64* %in0, align 8
  ; AVX512: LV: Found an estimated cost of 51 for VF 8 For instruction: %v0 = load i64, i64* %in0, align 8
  ; AVX512: LV: Found an estimated cost of 120 for VF 16 For instruction: %v0 = load i64, i64* %in0, align 8
  ; AVX512: LV: Found an estimated cost of 240 for VF 32 For instruction: %v0 = load i64, i64* %in0, align 8
  ;
...

llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-6.ll

...
  ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: store double %v5, double* %out5, align 8
  ; AVX1: LV: Found an estimated cost of 21 for VF 2 For instruction: store double %v5, double* %out5, align 8
  ; AVX1: LV: Found an estimated cost of 54 for VF 4 For instruction: store double %v5, double* %out5, align 8
  ; AVX1: LV: Found an estimated cost of 108 for VF 8 For instruction: store double %v5, double* %out5, align 8
  ;
  ; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: store double %v5, double* %out5, align 8
  ; AVX2: LV: Found an estimated cost of 11 for VF 2 For instruction: store double %v5, double* %out5, align 8
  ; AVX2: LV: Found an estimated cost of 21 for VF 4 For instruction: store double %v5, double* %out5, align 8
- ; AVX2: LV: Found an estimated cost of 108 for VF 8 For instruction: store double %v5, double* %out5, align 8
+ ; AVX2: LV: Found an estimated cost of 42 for VF 8 For instruction: store double %v5, double* %out5, align 8
  ;
  ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: store double %v5, double* %out5, align 8
  ; AVX512: LV: Found an estimated cost of 17 for VF 2 For instruction: store double %v5, double* %out5, align 8
  ; AVX512: LV: Found an estimated cost of 25 for VF 4 For instruction: store double %v5, double* %out5, align 8
  ; AVX512: LV: Found an estimated cost of 51 for VF 8 For instruction: store double %v5, double* %out5, align 8
  ; AVX512: LV: Found an estimated cost of 102 for VF 16 For instruction: store double %v5, double* %out5, align 8
  ; AVX512: LV: Found an estimated cost of 204 for VF 32 For instruction: store double %v5, double* %out5, align 8
  ;
...

llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-6.ll

...
  ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: store i64 %v5, i64* %out5, align 8
  ; AVX1: LV: Found an estimated cost of 33 for VF 2 For instruction: store i64 %v5, i64* %out5, align 8
  ; AVX1: LV: Found an estimated cost of 78 for VF 4 For instruction: store i64 %v5, i64* %out5, align 8
  ; AVX1: LV: Found an estimated cost of 156 for VF 8 For instruction: store i64 %v5, i64* %out5, align 8
  ;
  ; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: store i64 %v5, i64* %out5, align 8
  ; AVX2: LV: Found an estimated cost of 11 for VF 2 For instruction: store i64 %v5, i64* %out5, align 8
  ; AVX2: LV: Found an estimated cost of 21 for VF 4 For instruction: store i64 %v5, i64* %out5, align 8
- ; AVX2: LV: Found an estimated cost of 156 for VF 8 For instruction: store i64 %v5, i64* %out5, align 8
+ ; AVX2: LV: Found an estimated cost of 42 for VF 8 For instruction: store i64 %v5, i64* %out5, align 8
  ;
  ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: store i64 %v5, i64* %out5, align 8
  ; AVX512: LV: Found an estimated cost of 17 for VF 2 For instruction: store i64 %v5, i64* %out5, align 8
  ; AVX512: LV: Found an estimated cost of 25 for VF 4 For instruction: store i64 %v5, i64* %out5, align 8
  ; AVX512: LV: Found an estimated cost of 51 for VF 8 For instruction: store i64 %v5, i64* %out5, align 8
  ; AVX512: LV: Found an estimated cost of 102 for VF 16 For instruction: store i64 %v5, i64* %out5, align 8
  ; AVX512: LV: Found an estimated cost of 204 for VF 32 For instruction: store i64 %v5, i64* %out5, align 8
  ;
...
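
For readers unfamiliar with what a "stride 6" interleaved group looks like at the source level, the following is a hypothetical loop of the kind these cost entries are meant to price. The bodies of the test files are collapsed above, so this is an illustration only and not the test input; `sumStride6` is an invented name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example (not taken from the tests): each iteration reads six
// consecutive i64 values, i.e. one full stride-6 interleave group.
// Vectorized at VF=8, this corresponds to the table comment
// "(load 48i64 and) deinterleave into 6 x 8i64".
inline void sumStride6(const std::int64_t *In, std::int64_t *Out,
                       std::size_t N) {
  for (std::size_t I = 0; I != N; ++I) {
    const std::int64_t *Group = In + 6 * I; // start of the I-th group
    Out[I] = Group[0] + Group[1] + Group[2] + Group[3] + Group[4] + Group[5];
  }
}
```

All six members of each group are used here, which makes it a full interleave group; a loop touching only some of the six would be the non-fully-interleaved case mentioned in the final comment of the timeline.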
