This is an archive of the discontinued LLVM Phabricator instance.

Differential D110961

[X86][Costmodel] Load/store i8 Stride=3 VF=32 interleaving costs
ClosedPublic

Authored by lebedev.ri on Oct 1 2021, 12:28 PM.

Details

Reviewers

Commits

rG448c93983999: [X86][Costmodel] Load/store i8 Stride=3 VF=32 interleaving costs

Summary

For VF=16, the existing costs are correct.
For VF=32, the load cost diverges: the table has 13, and this patch changes it to 14.

The only scheduling models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have:
https://godbolt.org/z/qKjevqf4W - for Intel CPUs, Block RThroughput: <=14.0; for Ryzen CPUs, Block RThroughput: <=4.5
So pick a cost of 14.

For store we have:
https://godbolt.org/z/xTssTq319 - for Intel CPUs, Block RThroughput: =13.0; for Ryzen CPUs, Block RThroughput: <=5.5
So pick a cost of 13.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
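
The selection rule used in the summary can be sketched as follows. This is only an illustration under an assumption: the summary states the chosen numbers but not a formula, so taking the worst (largest) Block RThroughput bound across the CPU families and rounding it up is my reading of it. `pickCost`, `loadCost` and `storeCost` are hypothetical names and are not part of LLVM.

```cpp
#include <algorithm>
#include <cmath>
#include <initializer_list>

// Hypothetical helper (not LLVM code): assume the cost is the largest
// Block RThroughput bound seen across the CPU families, rounded up.
inline int pickCost(std::initializer_list<double> RThroughputBounds) {
  double Worst = 0.0;
  for (double R : RThroughputBounds)
    Worst = std::max(Worst, R);
  return static_cast<int>(std::ceil(Worst));
}

// Bounds quoted in the summary, in the order {Intel, Ryzen}.
inline int loadCost() { return pickCost({14.0, 4.5}); }
inline int storeCost() { return pickCost({13.0, 5.5}); }
```

With the numbers from the summary, `loadCost()` yields 14 and `storeCost()` yields 13, matching the costs picked above.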

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lebedev.ri created this revision. · Oct 1 2021, 12:28 PM

Herald added subscribers: pengfei, hiraditya. · Oct 1 2021, 12:28 PM

lebedev.ri requested review of this revision. · Oct 1 2021, 12:28 PM

lebedev.ri added a parent revision: D110960: [X86][Costmodel] Load/store i8 Stride=3 VF=8 interleaving costs.

LGTM

This revision is now accepted and ready to land. · Oct 2 2021, 3:18 AM

In D110961#3037993, @RKSimon wrote:

LGTM

Thank you for the reviews!

Closed by commit rG448c93983999: [X86][Costmodel] Load/store i8 Stride=3 VF=32 interleaving costs (authored by lebedev.ri). · Oct 2 2021, 3:52 AM

This revision was automatically updated to reflect the committed changes.

lebedev.ri added a commit: rG448c93983999: [X86][Costmodel] Load/store i8 Stride=3 VF=32 interleaving costs.

Revision Contents

Path                                                               Size
llvm/lib/Target/X86/X86TargetTransformInfo.cpp                     2 lines
llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll   2 lines

Diff 376693

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

(lines elided) static const CostTblEntry AVX2InterleavedLoadTbl[] = {
  {2, MVT::v4i64, 4}, // (load 8i64 and) deinterleave into 2 x 4i64
  {2, MVT::v8i64, 8}, // (load 16i64 and) deinterleave into 2 x 8i64
  {2, MVT::v16i64, 16}, // (load 32i64 and) deinterleave into 2 x 16i64

  {3, MVT::v2i8, 3}, // (load 6i8 and) deinterleave into 3 x 2i8
  {3, MVT::v4i8, 3}, // (load 12i8 and) deinterleave into 3 x 4i8
  {3, MVT::v8i8, 6}, // (load 24i8 and) deinterleave into 3 x 8i8
  {3, MVT::v16i8, 11}, // (load 48i8 and) deinterleave into 3 x 16i8
- {3, MVT::v32i8, 13}, // (load 96i8 and) deinterleave into 3 x 32i8
+ {3, MVT::v32i8, 14}, // (load 96i8 and) deinterleave into 3 x 32i8

  {3, MVT::v8i32, 17}, // (load 24i32 and) deinterleave into 3 x 8i32

  {4, MVT::v2i8, 12}, // (load 8i8 and) deinterleave into 4 x 2i8
  {4, MVT::v4i8, 4}, // (load 16i8 and) deinterleave into 4 x 4i8
  {4, MVT::v8i8, 20}, // (load 32i8 and) deinterleave into 4 x 8i8
  {4, MVT::v16i8, 39}, // (load 64i8 and) deinterleave into 4 x 16i8
  {4, MVT::v32i8, 80}, // (load 128i8 and) deinterleave into 4 x 32i8
(lines elided)
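
To illustrate how a table like this is consulted, here is a minimal, self-contained sketch of a lookup keyed on (interleave factor, vector type). It is a stand-in only: `VecTy`, `Entry`, `LoadTbl` and `lookupCost` are simplified hypothetical names and not LLVM's `CostTblEntry`, `MVT` or its lookup helper. Only the stride-3 i8 rows are reproduced, with the v32i8 value as it reads after this change.

```cpp
// Simplified stand-ins for illustration; these are NOT LLVM's types.
enum class VecTy { v2i8, v4i8, v8i8, v16i8, v32i8 };

struct Entry {
  unsigned Factor; // interleave factor (stride)
  VecTy Ty;        // type of each deinterleaved vector
  unsigned Cost;   // cost of the deinterleaving shuffle sequence
};

// Stride-3 i8 load rows, mirroring the table in the diff after the change.
static const Entry LoadTbl[] = {
    {3, VecTy::v2i8, 3},   {3, VecTy::v4i8, 3},   {3, VecTy::v8i8, 6},
    {3, VecTy::v16i8, 11}, {3, VecTy::v32i8, 14},
};

// Linear scan; returns the matching row, or nullptr when the
// (Factor, Ty) pair has no dedicated entry.
inline const Entry *lookupCost(unsigned Factor, VecTy Ty) {
  for (const Entry &E : LoadTbl)
    if (E.Factor == Factor && E.Ty == Ty)
      return &E;
  return nullptr;
}
```

A caller would check the returned pointer before reading `Cost`, and fall back to some generic estimate when no row matches.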

llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll

(24 lines elided)
  ; AVX1: LV: Found an estimated cost of 114 for VF 16 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX1: LV: Found an estimated cost of 249 for VF 32 For instruction: %v0 = load i8, i8* %in0, align 1
  ;
  ; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX2: LV: Found an estimated cost of 6 for VF 2 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX2: LV: Found an estimated cost of 6 for VF 4 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX2: LV: Found an estimated cost of 9 for VF 8 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX2: LV: Found an estimated cost of 13 for VF 16 For instruction: %v0 = load i8, i8* %in0, align 1
- ; AVX2: LV: Found an estimated cost of 16 for VF 32 For instruction: %v0 = load i8, i8* %in0, align 1
+ ; AVX2: LV: Found an estimated cost of 17 for VF 32 For instruction: %v0 = load i8, i8* %in0, align 1
  ;
  ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX512: LV: Found an estimated cost of 4 for VF 2 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX512: LV: Found an estimated cost of 4 for VF 4 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX512: LV: Found an estimated cost of 13 for VF 8 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX512: LV: Found an estimated cost of 13 for VF 16 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX512: LV: Found an estimated cost of 16 for VF 32 For instruction: %v0 = load i8, i8* %in0, align 1
  ; AVX512: LV: Found an estimated cost of 25 for VF 64 For instruction: %v0 = load i8, i8* %in0, align 1
(35 lines elided)
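
The AVX2 VF 32 expectation moves from 16 to 17, the same +1 step as the v32i8 table entry (13 to 14). A small arithmetic sketch of that relationship follows. Treating the remainder as a fixed non-shuffle part of the estimate (for example the cost of the wide loads themselves) is an assumption on my part and is not stated in the review; `nonTableShare` is a hypothetical name.

```cpp
// Numbers taken from the two diffs in this revision (AVX2, stride 3, VF 32).
constexpr int OldTableCost = 13, NewTableCost = 14; // table entry for v32i8
constexpr int OldLVCost = 16, NewLVCost = 17;       // cost reported by LV

// Hypothetical helper: the part of the reported cost that does not come
// from the table entry (assumed here to be unaffected by this patch).
constexpr int nonTableShare(int LVCost, int TableCost) {
  return LVCost - TableCost;
}

// The remainder is the same before and after, so the whole change in the
// test expectation is accounted for by the table update.
static_assert(nonTableShare(OldLVCost, OldTableCost) ==
                  nonTableShare(NewLVCost, NewTableCost),
              "table delta should explain the test delta");
```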