This is an archive of the discontinued LLVM Phabricator instance.

Differential D110970

[X86][Costmodel] Load/store i8 Stride=4 VF=16 interleaving costs
Closed, Public

Authored by lebedev.ri on Oct 1 2021, 1:27 PM.

Details

Reviewers

RKSimon

Commits

rG0e71ae6da8f3: [X86][Costmodel] Load/store i8 Stride=4 VF=16 interleaving costs

Summary

While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have:
https://godbolt.org/z/TrGW7cKsE - for the Intel models, Block RThroughput: =24.0; for the Ryzen models, Block RThroughput: <=12.0
So pick a cost of 24.

For store we have:
https://godbolt.org/z/Mh7qaqEfe - for the Intel models, Block RThroughput: =8.0; for the Ryzen models, Block RThroughput: <=4.0
So pick a cost of 8.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
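
The selection rule in the summary appears to be "take the worst Block RThroughput across the scheduling models considered". A minimal sketch of that rule follows; `pickWorstCaseCost` is a hypothetical helper written for illustration and is not part of LLVM, and the Ryzen figures are treated as their stated upper bounds (12.0 and 4.0).

```cpp
#include <algorithm>
#include <cmath>
#include <initializer_list>

// Hypothetical helper (not an LLVM API): derive one cost-table value from
// the Block RThroughput figures reported for several scheduling models.
// Taking the maximum keeps the cost an upper bound for every model listed.
inline int pickWorstCaseCost(std::initializer_list<double> RThroughputs) {
  double Worst = 0.0;
  for (double R : RThroughputs)
    Worst = std::max(Worst, R);
  // Cost tables hold integers, so round any fractional value up.
  return static_cast<int>(std::ceil(Worst));
}
```

With the numbers quoted above, `pickWorstCaseCost({24.0, 12.0})` gives 24 for the load and `pickWorstCaseCost({8.0, 4.0})` gives 8 for the store, matching the values chosen in this revision.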

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lebedev.ri created this revision. Oct 1 2021, 1:27 PM

Herald added subscribers: pengfei, hiraditya. Oct 1 2021, 1:27 PM

lebedev.ri requested review of this revision. Oct 1 2021, 1:27 PM

lebedev.ri added a parent revision: D110969: [X86][Costmodel] Load/store i8 Stride=4 VF=8 interleaving costs.

lebedev.ri added a child revision: D110971: [X86][Costmodel] Load/store i8 Stride=4 VF=32 interleaving costs. Oct 1 2021, 1:33 PM

LGTM

This revision is now accepted and ready to land. Oct 2 2021, 3:21 AM

Closed by commit rG0e71ae6da8f3: [X86][Costmodel] Load/store i8 Stride=4 VF=16 interleaving costs (authored by lebedev.ri). Oct 2 2021, 3:53 AM

This revision was automatically updated to reflect the committed changes.

lebedev.ri added a commit: rG0e71ae6da8f3: [X86][Costmodel] Load/store i8 Stride=4 VF=16 interleaving costs.

Revision Contents

Path                                                                Size
llvm/lib/Target/X86/X86TargetTransformInfo.cpp                      4 lines
llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll    2 lines
llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll   2 lines

Diff 376697

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

[5,091 lines hidden] static const CostTblEntry AVX2InterleavedLoadTbl[] = {
  {3, MVT::v16i8, 11}, // (load 48i8 and) deinterleave into 3 x 16i8
  {3, MVT::v32i8, 14}, // (load 96i8 and) deinterleave into 3 x 32i8

  {3, MVT::v8i32, 17}, // (load 24i32 and) deinterleave into 3 x 8i32

  {4, MVT::v2i8, 4}, // (load 8i8 and) deinterleave into 4 x 2i8
  {4, MVT::v4i8, 4}, // (load 16i8 and) deinterleave into 4 x 4i8
  {4, MVT::v8i8, 12}, // (load 32i8 and) deinterleave into 4 x 8i8
- {4, MVT::v16i8, 39}, // (load 64i8 and) deinterleave into 4 x 16i8
+ {4, MVT::v16i8, 24}, // (load 64i8 and) deinterleave into 4 x 16i8
  {4, MVT::v32i8, 80}, // (load 128i8 and) deinterleave into 4 x 32i8

  {4, MVT::v2i16, 6}, // (load 8i16 and) deinterleave into 4 x 2i16
  {4, MVT::v4i16, 17}, // (load 16i16 and) deinterleave into 4 x 4i16
  {4, MVT::v8i16, 33}, // (load 32i16 and) deinterleave into 4 x 8i16
  {4, MVT::v16i16, 75}, // (load 64i16 and) deinterleave into 4 x 16i16
  {4, MVT::v32i16, 150}, // (load 128i16 and) deinterleave into 4 x 32i16

[33 lines hidden] static const CostTblEntry AVX2InterleavedStoreTbl[] = {
  {3, MVT::v4i8, 4}, // interleave 3 x 4i8 into 12i8 (and store)
  {3, MVT::v8i8, 6}, // interleave 3 x 8i8 into 24i8 (and store)
  {3, MVT::v16i8, 11}, // interleave 3 x 16i8 into 48i8 (and store)
  {3, MVT::v32i8, 13}, // interleave 3 x 32i8 into 96i8 (and store)

  {4, MVT::v2i8, 4}, // interleave 4 x 2i8 into 8i8 (and store)
  {4, MVT::v4i8, 4}, // interleave 4 x 4i8 into 16i8 (and store)
  {4, MVT::v8i8, 4}, // interleave 4 x 8i8 into 32i8 (and store)
- {4, MVT::v16i8, 10}, // interleave 4 x 16i8 into 64i8 (and store)
+ {4, MVT::v16i8, 8}, // interleave 4 x 16i8 into 64i8 (and store)
  {4, MVT::v32i8, 12}, // interleave 4 x 32i8 into 128i8 (and store)

  {4, MVT::v2i16, 2}, // interleave 4 x 2i16 into 8i16 (and store)
  {4, MVT::v4i16, 6}, // interleave 4 x 4i16 into 16i16 (and store)
  {4, MVT::v8i16, 10}, // interleave 4 x 8i16 into 32i16 (and store)
  {4, MVT::v16i16, 32}, // interleave 4 x 16i16 into 64i16 (and store)
  {4, MVT::v32i16, 64}, // interleave 4 x 32i16 into 128i16 (and store)

[176 lines hidden]
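
Each row of the tables above is keyed by an interleave factor and a per-member vector type. A self-contained sketch of how such a table can be queried follows. `InterleaveCostEntry` and `lookupInterleaveCost` are hypothetical stand-ins written for this illustration; the real code uses LLVM's `CostTblEntry` with `MVT` enum values, and strings are used here only so the sketch builds without LLVM headers. Only the stride-4 i8 rows, with the values from this revision, are reproduced.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a cost-table row (not LLVM's CostTblEntry).
struct InterleaveCostEntry {
  unsigned Factor;   // interleave stride
  const char *VecTy; // per-member vector type, spelled as in the diff
  unsigned Cost;     // shuffle cost assigned to the whole group
};

// Stride-4 i8 rows of the AVX2 tables as they read after this revision.
static const InterleaveCostEntry AVX2LoadStride4I8[] = {
    {4, "v2i8", 4},   {4, "v4i8", 4},   {4, "v8i8", 12},
    {4, "v16i8", 24}, {4, "v32i8", 80},
};
static const InterleaveCostEntry AVX2StoreStride4I8[] = {
    {4, "v2i8", 4},  {4, "v4i8", 4},  {4, "v8i8", 4},
    {4, "v16i8", 8}, {4, "v32i8", 12},
};

// Linear scan over a table; returns the matching row or nullptr if the
// (Factor, VecTy) tuple is not modelled.
template <std::size_t N>
const InterleaveCostEntry *
lookupInterleaveCost(const InterleaveCostEntry (&Tbl)[N], unsigned Factor,
                     const char *VecTy) {
  for (const InterleaveCostEntry &E : Tbl)
    if (E.Factor == Factor && std::strcmp(E.VecTy, VecTy) == 0)
      return &E;
  return nullptr;
}
```

For example, `lookupInterleaveCost(AVX2LoadStride4I8, 4, "v16i8")->Cost` is 24 and the same query against `AVX2StoreStride4I8` is 8; a tuple that is not in the table yields `nullptr`, which a caller would presumably treat as "fall back to a generic estimate".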

llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll

[23 lines hidden]
  ; AVX1: LV: Found an estimated cost of 81 for VF 8 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX1: LV: Found an estimated cost of 162 for VF 16 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX1: LV: Found an estimated cost of 332 for VF 32 For instruction: %v0 = load i8, i8* %in0, align 1
  ;
  ; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX2: LV: Found an estimated cost of 5 for VF 2 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX2: LV: Found an estimated cost of 5 for VF 4 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX2: LV: Found an estimated cost of 13 for VF 8 For instruction: %v0 = load i8, i8* %in0, align 1
- ; AVX2: LV: Found an estimated cost of 41 for VF 16 For instruction: %v0 = load i8, i8* %in0, align 1
+ ; AVX2: LV: Found an estimated cost of 26 for VF 16 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX2: LV: Found an estimated cost of 84 for VF 32 For instruction: %v0 = load i8, i8* %in0, align 1
  ;
  ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX512: LV: Found an estimated cost of 5 for VF 2 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX512: LV: Found an estimated cost of 5 for VF 4 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX512: LV: Found an estimated cost of 17 for VF 8 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX512: LV: Found an estimated cost of 33 for VF 16 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX512: LV: Found an estimated cost of 80 for VF 32 For instruction: %v0 = load i8, i8* %in0, align 1
[40 lines hidden]

llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll

[23 lines hidden]
  ; AVX1: LV: Found an estimated cost of 67 for VF 8 For instruction: store i8 %v3, i8* %out3, align 1
  ; AVX1: LV: Found an estimated cost of 134 for VF 16 For instruction: store i8 %v3, i8* %out3, align 1
  ; AVX1: LV: Found an estimated cost of 332 for VF 32 For instruction: store i8 %v3, i8* %out3, align 1
  ;
  ; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: store i8 %v3, i8* %out3, align 1
  ; AVX2: LV: Found an estimated cost of 5 for VF 2 For instruction: store i8 %v3, i8* %out3, align 1
  ; AVX2: LV: Found an estimated cost of 5 for VF 4 For instruction: store i8 %v3, i8* %out3, align 1
  ; AVX2: LV: Found an estimated cost of 5 for VF 8 For instruction: store i8 %v3, i8* %out3, align 1
- ; AVX2: LV: Found an estimated cost of 12 for VF 16 For instruction: store i8 %v3, i8* %out3, align 1
+ ; AVX2: LV: Found an estimated cost of 10 for VF 16 For instruction: store i8 %v3, i8* %out3, align 1
  ; AVX2: LV: Found an estimated cost of 16 for VF 32 For instruction: store i8 %v3, i8* %out3, align 1
  ;
  ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: store i8 %v3, i8* %out3, align 1
  ; AVX512: LV: Found an estimated cost of 11 for VF 2 For instruction: store i8 %v3, i8* %out3, align 1
  ; AVX512: LV: Found an estimated cost of 11 for VF 4 For instruction: store i8 %v3, i8* %out3, align 1
  ; AVX512: LV: Found an estimated cost of 11 for VF 8 For instruction: store i8 %v3, i8* %out3, align 1
  ; AVX512: LV: Found an estimated cost of 12 for VF 16 For instruction: store i8 %v3, i8* %out3, align 1
  ; AVX512: LV: Found an estimated cost of 16 for VF 32 For instruction: store i8 %v3, i8* %out3, align 1
[41 lines hidden]
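
The AVX2 costs in the two tests differ from the table values by a small, regular amount (26 vs. 24 for the load, 10 vs. 8 for the store). The numbers are consistent with "table cost plus one unit per 32-byte memory operation needed to cover the whole interleave group". The sketch below encodes that reading. It is an inference from the figures in this diff, not code taken from X86TargetTransformInfo.cpp; the function name, the 32-byte legal vector width, and the unit memory-op cost are all assumptions.

```cpp
// Hypothetical model of the AVX2 interleaved-access cost seen in the tests.
// Assumptions (not confirmed by the source): legal vectors are 32 bytes wide
// and each vector load/store costs 1.
inline unsigned estimateInterleavedCost(unsigned Factor, unsigned VF,
                                        unsigned ElemBytes,
                                        unsigned TableCost) {
  const unsigned LegalVecBytes = 32; // assumed YMM width
  const unsigned MemOpCost = 1;      // assumed cost per vector load/store
  // Bytes touched by the whole interleave group.
  unsigned TotalBytes = Factor * VF * ElemBytes;
  // Number of legal-width memory operations needed, rounded up.
  unsigned NumMemOps = (TotalBytes + LegalVecBytes - 1) / LegalVecBytes;
  return NumMemOps * MemOpCost + TableCost;
}
```

Under these assumptions the stride-4 i8 load at VF 16 covers 64 bytes, so two memory operations plus the table cost of 24 give 26 (previously 2 + 39 = 41), and the store gives 2 + 8 = 10 (previously 2 + 10 = 12). The unchanged rows fit the same pattern, for instance VF 32 loads: 4 + 80 = 84.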