This is an archive of the discontinued LLVM Phabricator instance.

Differential D110958

[X86][Costmodel] Load/store i8 Stride=3 VF=4 interleaving costs
ClosedPublic

Authored by lebedev.ri on Oct 1 2021, 12:14 PM.

Details

Reviewers

RKSimon

Commits

rGf1df2d8eaf18: [X86][Costmodel] Load/store i8 Stride=3 VF=4 interleaving costs

Summary

While we already model this tuple, the values diverge from reality, so fix them.

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have:
https://godbolt.org/z/obWz3PrfK - for Intel CPUs, Block RThroughput: =3.0; for Ryzen CPUs, Block RThroughput: <=1.5
So pick a cost of 3.

For store we have:
https://godbolt.org/z/orjPshn3h - for Intel CPUs, Block RThroughput: =4.0; for Ryzen CPUs, Block RThroughput: <=2.0
So pick a cost of 4.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
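
The selection rule used in the summary appears to be "take the worst Block RThroughput across the relevant sched models and round up to a whole cost". A minimal sketch of that rule follows; the helper name pickCost is hypothetical and is not part of LLVM, and the rule itself is inferred from the two data points above rather than stated by the author.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper (not an LLVM API): derive one table cost from
// per-CPU "Block RThroughput" readings by taking the worst case and
// rounding it up to a whole number.
inline int pickCost(const std::vector<double> &RThroughputs) {
  double Worst = 0.0;
  for (double R : RThroughputs)
    Worst = std::max(Worst, R);
  return static_cast<int>(std::ceil(Worst));
}
```

With the readings quoted above, pickCost({3.0, 1.5}) yields 3 for the load and pickCost({4.0, 2.0}) yields 4 for the store. The Ryzen figures are upper bounds in the summary, so they are used here as worst-case values.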

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lebedev.ri created this revision. Oct 1 2021, 12:14 PM

Herald added subscribers: pengfei, hiraditya. Oct 1 2021, 12:14 PM

lebedev.ri requested review of this revision. Oct 1 2021, 12:14 PM

lebedev.ri added a parent revision: D110956: [X86][Costmodel] Load/store i8 Stride=3 VF=2 interleaving costs.

lebedev.ri added a child revision: D110960: [X86][Costmodel] Load/store i8 Stride=3 VF=8 interleaving costs. Oct 1 2021, 12:19 PM

LGTM

This revision is now accepted and ready to land. Oct 2 2021, 3:15 AM

Closed by commit rGf1df2d8eaf18: [X86][Costmodel] Load/store i8 Stride=3 VF=4 interleaving costs (authored by lebedev.ri). Oct 2 2021, 3:52 AM

This revision was automatically updated to reflect the committed changes.

lebedev.ri added a commit: rGf1df2d8eaf18: [X86][Costmodel] Load/store i8 Stride=3 VF=4 interleaving costs.

Revision Contents

Path                                                                 Size
llvm/lib/Target/X86/X86TargetTransformInfo.cpp                       4 lines
llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll     2 lines
llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-3.ll    2 lines

Diff 376691

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

[5,081 lines not shown]
 static const CostTblEntry AVX2InterleavedLoadTbl[] = {
   {2, MVT::v32i32, 16}, // (load 64i32 and) deinterleave into 2 x 32i32

   {2, MVT::v2i64, 2}, // (load 4i64 and) deinterleave into 2 x 2i64
   {2, MVT::v4i64, 4}, // (load 8i64 and) deinterleave into 2 x 4i64
   {2, MVT::v8i64, 8}, // (load 16i64 and) deinterleave into 2 x 8i64
   {2, MVT::v16i64, 16}, // (load 32i64 and) deinterleave into 2 x 16i64

   {3, MVT::v2i8, 3}, // (load 6i8 and) deinterleave into 3 x 2i8
-  {3, MVT::v4i8, 4}, // (load 12i8 and) deinterleave into 3 x 4i8
+  {3, MVT::v4i8, 3}, // (load 12i8 and) deinterleave into 3 x 4i8
   {3, MVT::v8i8, 9}, // (load 24i8 and) deinterleave into 3 x 8i8
   {3, MVT::v16i8, 11}, // (load 48i8 and) deinterleave into 3 x 16i8
   {3, MVT::v32i8, 13}, // (load 96i8 and) deinterleave into 3 x 32i8

   {3, MVT::v8i32, 17}, // (load 24i32 and) deinterleave into 3 x 8i32

   {4, MVT::v2i8, 12}, // (load 8i8 and) deinterleave into 4 x 2i8
   {4, MVT::v4i8, 4}, // (load 16i8 and) deinterleave into 4 x 4i8
[35 lines not shown]
 static const CostTblEntry AVX2InterleavedStoreTbl[] = {
   {2, MVT::v32i32, 16}, // interleave 2 x 32i32 into 64i32 (and store)

   {2, MVT::v2i64, 2}, // interleave 2 x 2i64 into 4i64 (and store)
   {2, MVT::v4i64, 4}, // interleave 2 x 4i64 into 8i64 (and store)
   {2, MVT::v8i64, 8}, // interleave 2 x 8i64 into 16i64 (and store)
   {2, MVT::v16i64, 16}, // interleave 2 x 16i64 into 32i64 (and store)

   {3, MVT::v2i8, 4}, // interleave 3 x 2i8 into 6i8 (and store)
-  {3, MVT::v4i8, 8}, // interleave 3 x 4i8 into 12i8 (and store)
+  {3, MVT::v4i8, 4}, // interleave 3 x 4i8 into 12i8 (and store)
   {3, MVT::v8i8, 11}, // interleave 3 x 8i8 into 24i8 (and store)
   {3, MVT::v16i8, 11}, // interleave 3 x 16i8 into 48i8 (and store)
   {3, MVT::v32i8, 13}, // interleave 3 x 32i8 into 96i8 (and store)

   {4, MVT::v2i8, 12}, // interleave 4 x 2i8 into 8i8 (and store)
   {4, MVT::v4i8, 9}, // interleave 4 x 4i8 into 16i8 (and store)
   {4, MVT::v8i8, 10}, // interleave 4 x 8i8 into 32i8 (and store)
   {4, MVT::v16i8, 10}, // interleave 4 x 16i8 into 64i8 (and store)
[184 lines not shown]
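
Each entry in the tables above is keyed by an interleave factor and a vector type, and a query returns the matching cost. The sketch below illustrates that lookup with simplified stand-in types; SimpleVT, InterleaveCostEntry, and lookupCost are hypothetical names chosen for illustration and are not LLVM's CostTblEntry machinery. Only the stride-3 i8 entries, with the values as updated by this revision, are reproduced.

```cpp
#include <cstddef>

// Simplified stand-in for the MVT vector types used in the tables (hypothetical).
enum class SimpleVT { v2i8, v4i8, v8i8, v16i8, v32i8 };

// Simplified stand-in for a cost table entry (hypothetical).
struct InterleaveCostEntry {
  unsigned Factor; // interleave factor, i.e. the stride
  SimpleVT Type;   // per-member vector type
  unsigned Cost;   // modeled cost of the (de)interleaving shuffles
};

// Stride-3 i8 load entries as they read after this revision.
static const InterleaveCostEntry LoadTbl[] = {
    {3, SimpleVT::v2i8, 3},   {3, SimpleVT::v4i8, 3},   {3, SimpleVT::v8i8, 9},
    {3, SimpleVT::v16i8, 11}, {3, SimpleVT::v32i8, 13},
};

// Stride-3 i8 store entries as they read after this revision.
static const InterleaveCostEntry StoreTbl[] = {
    {3, SimpleVT::v2i8, 4},   {3, SimpleVT::v4i8, 4},   {3, SimpleVT::v8i8, 11},
    {3, SimpleVT::v16i8, 11}, {3, SimpleVT::v32i8, 13},
};

// Linear search for the first entry matching (Factor, VT).
// Returns nullptr when the tuple is not modeled in the table.
template <std::size_t N>
const InterleaveCostEntry *lookupCost(const InterleaveCostEntry (&Tbl)[N],
                                      unsigned Factor, SimpleVT VT) {
  for (const InterleaveCostEntry &E : Tbl)
    if (E.Factor == Factor && E.Type == VT)
      return &E;
  return nullptr;
}
```

For the tuple this revision changes, lookupCost(LoadTbl, 3, SimpleVT::v4i8)->Cost is 3 and lookupCost(StoreTbl, 3, SimpleVT::v4i8)->Cost is 4. A tuple that is absent from the table returns nullptr, and a caller would then need some fallback estimate.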

llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll

[21 lines not shown]
 ; AVX1: LV: Found an estimated cost of 15 for VF 2 For instruction: %v0 = load i8, i8* %in0, align 1
 ; AVX1: LV: Found an estimated cost of 27 for VF 4 For instruction: %v0 = load i8, i8* %in0, align 1
 ; AVX1: LV: Found an estimated cost of 59 for VF 8 For instruction: %v0 = load i8, i8* %in0, align 1
 ; AVX1: LV: Found an estimated cost of 114 for VF 16 For instruction: %v0 = load i8, i8* %in0, align 1
 ; AVX1: LV: Found an estimated cost of 249 for VF 32 For instruction: %v0 = load i8, i8* %in0, align 1
 ;
 ; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i8, i8* %in0, align 1
 ; AVX2: LV: Found an estimated cost of 6 for VF 2 For instruction: %v0 = load i8, i8* %in0, align 1
-; AVX2: LV: Found an estimated cost of 7 for VF 4 For instruction: %v0 = load i8, i8* %in0, align 1
+; AVX2: LV: Found an estimated cost of 6 for VF 4 For instruction: %v0 = load i8, i8* %in0, align 1
 ; AVX2: LV: Found an estimated cost of 12 for VF 8 For instruction: %v0 = load i8, i8* %in0, align 1
 ; AVX2: LV: Found an estimated cost of 13 for VF 16 For instruction: %v0 = load i8, i8* %in0, align 1
 ; AVX2: LV: Found an estimated cost of 16 for VF 32 For instruction: %v0 = load i8, i8* %in0, align 1
 ;
 ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i8, i8* %in0, align 1
 ; AVX512: LV: Found an estimated cost of 4 for VF 2 For instruction: %v0 = load i8, i8* %in0, align 1
 ; AVX512: LV: Found an estimated cost of 4 for VF 4 For instruction: %v0 = load i8, i8* %in0, align 1
 ; AVX512: LV: Found an estimated cost of 13 for VF 8 For instruction: %v0 = load i8, i8* %in0, align 1
[38 lines not shown]

llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-3.ll

[21 lines not shown]
 ; AVX1: LV: Found an estimated cost of 15 for VF 2 For instruction: store i8 %v2, i8* %out2, align 1
 ; AVX1: LV: Found an estimated cost of 27 for VF 4 For instruction: store i8 %v2, i8* %out2, align 1
 ; AVX1: LV: Found an estimated cost of 54 for VF 8 For instruction: store i8 %v2, i8* %out2, align 1
 ; AVX1: LV: Found an estimated cost of 101 for VF 16 For instruction: store i8 %v2, i8* %out2, align 1
 ; AVX1: LV: Found an estimated cost of 249 for VF 32 For instruction: store i8 %v2, i8* %out2, align 1
 ;
 ; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: store i8 %v2, i8* %out2, align 1
 ; AVX2: LV: Found an estimated cost of 7 for VF 2 For instruction: store i8 %v2, i8* %out2, align 1
-; AVX2: LV: Found an estimated cost of 11 for VF 4 For instruction: store i8 %v2, i8* %out2, align 1
+; AVX2: LV: Found an estimated cost of 7 for VF 4 For instruction: store i8 %v2, i8* %out2, align 1
 ; AVX2: LV: Found an estimated cost of 14 for VF 8 For instruction: store i8 %v2, i8* %out2, align 1
 ; AVX2: LV: Found an estimated cost of 13 for VF 16 For instruction: store i8 %v2, i8* %out2, align 1
 ; AVX2: LV: Found an estimated cost of 16 for VF 32 For instruction: store i8 %v2, i8* %out2, align 1
 ;
 ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: store i8 %v2, i8* %out2, align 1
 ; AVX512: LV: Found an estimated cost of 8 for VF 2 For instruction: store i8 %v2, i8* %out2, align 1
 ; AVX512: LV: Found an estimated cost of 8 for VF 4 For instruction: store i8 %v2, i8* %out2, align 1
 ; AVX512: LV: Found an estimated cost of 16 for VF 8 For instruction: store i8 %v2, i8* %out2, align 1
[39 lines not shown]
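
As a consistency check on the two test updates: for AVX2 at VF 4, the reported estimate drops by exactly as much as the table entry does (load: 7 to 6 while the entry goes from 4 to 3; store: 11 to 7 while the entry goes from 8 to 4). The sketch below encodes those four pairs and checks the arithmetic at compile time. The struct and function names are hypothetical, and reading the constant remainder of 3 as the cost of the memory operations themselves is an assumption, not something the revision states.

```cpp
// Values copied from the AVX2, stride-3, VF 4 lines of this revision.
struct CostChange {
  int OldTable;    // table entry before the change
  int NewTable;    // table entry after the change
  int OldReported; // "estimated cost" in the test before the change
  int NewReported; // "estimated cost" in the test after the change
};

constexpr CostChange Load{4, 3, 7, 6};
constexpr CostChange Store{8, 4, 11, 7};

// How much the table entry dropped.
constexpr int tableDelta(const CostChange &C) {
  return C.OldTable - C.NewTable;
}

// How much the reported estimate dropped.
constexpr int reportedDelta(const CostChange &C) {
  return C.OldReported - C.NewReported;
}

// The reported estimate moves one-for-one with the table entry.
static_assert(tableDelta(Load) == reportedDelta(Load), "load deltas differ");
static_assert(tableDelta(Store) == reportedDelta(Store), "store deltas differ");

// The part of the estimate not explained by the table entry is the same
// before and after (assumed here to be the memory-operation cost).
static_assert(Load.OldReported - Load.OldTable ==
                  Load.NewReported - Load.NewTable,
              "load remainder changed");
static_assert(Store.OldReported - Store.OldTable ==
                  Store.NewReported - Store.NewTable,
              "store remainder changed");
```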